Cua

The open-source Docker container for computer-use AI agents — a cloud desktop for every agent, in one command.

Cua Walkthrough Demo[1]

Seed

Round

YC P25 · MIT-licensed

17.8k

GitHub stars

1.1k forks · viral launch on HN

<1s

Cloud desktop hot-start

97% native CPU on Apple Silicon

In production with

Hugging Face

Open model hub

Datadog

Observability

Meta

Frontier model research

NVIDIA

Agent runtimes

Red Hat

Enterprise Linux

Elastic

Search infra

Duolingo

Consumer AI

HUD

Agent evaluation · Series A

Plus 50,000+ engineers building on the platform across Claude Code, Cursor, Codex, and direct API integrations.[2] [1]

Thesis

Every computer-use agent needs a desktop. Cua is the open-source container — Docker for computer-use AI — that gives any agent a sandboxed cloud desktop in one command, hot-started in under a second.[1] [2] The longer-term bet: as Anthropic, OpenAI, and every framework downstream push agents from the browser onto the OS, Cua becomes the default infrastructure layer underneath them. The model layer ships the brains; Cua ships the body.
  1. Computer-use is the next API surface. Anthropic shipped computer use with Claude 3.5 Sonnet in October 2024, scoring 14.9% on OSWorld, nearly 2× the next-best model.[7] OpenAI followed with Operator in January 2025.[4] GUIs are the universal interface: the agent that controls a desktop can do any work a human can. Browser-only is a subset; the full stack is the OS.

  2. Docker is the right metaphor, and the right wedge. A single command spins up a sandboxed desktop. Containerization unlocks fan-out, reproducibility, and security for agent workloads. Browserbase did this for browsers and built a real business on top; Cua extends the same primitive to macOS, Linux, Windows, and Android.[5] [2]

  3. OSS wins this category by default. Builders need to inspect, fork, and self-host computer-use infrastructure, because the security surface is too sensitive to outsource to a closed binary. Cua is MIT-licensed, has 17.8k GitHub stars and growing, and ships native integrations with Claude Code, Cursor, and Codex, plus OpenAI, Anthropic, Ollama, and OpenRouter as model providers.[1] The OSS core is the distribution; the managed cloud is the renewal.

  4. The founder is the infra builder. At Microsoft AI, Francesco co-created Windows Agent Arena, the canonical benchmark for OS-level computer-use agents, before starting Cua.[3] The person who designed the eval is now building the runtime everyone else has to pass it on. That sequence (research the problem, define the benchmark, ship the production primitive) is structurally hard to imitate.

Problem

Every computer-use agent has the same first problem: where does it run?

Letting the agent loose on the user's host is a security non-starter — the LLM hallucinates a rm -rf, exfiltrates a credential, or installs a payload. Spinning up a desktop VM is the obvious answer, but every team building one ends up rebuilding the same six layers: Apple Virtualization Framework wiring, screen capture, input simulation, container orchestration, isolation policy, and a hot-start runtime that boots fast enough for a chat-speed agent loop.[1]

Browser agents got a head start because the browser is already a sandbox. Browser Use commoditized the agent framework on top (50k+ stars).[13] Browserbase took multiple rounds of funding to build managed browser infrastructure underneath.[5] But the browser is a subset — Anthropic and OpenAI both shipped computer-use APIs at the OS level precisely because the bulk of real work happens outside the tab.[7] [4]

There is no Browserbase for the desktop. Every team trying to ship a CUA-class product is paying the infrastructure tax in-house, and the tax is high enough that some give up and ship browser-only. Cua is the missing primitive: one command starts a Linux, macOS, Windows, or Android container from an MIT-licensed runtime, exposed through a computer-use interface any LLM can drive, running at 97% native CPU on Apple Silicon.[1] [2]

14.9%

Claude 3.5 Sonnet on OSWorld

Nearly 2× the next-best model — Oct 2024 launch[7]

38.1%

OpenAI Operator on OSWorld

Jan 2025 launch · production AI on the desktop[4]

$22B → $90B+

RPA market 2024 → 2030

The legacy automation budget computer-use replaces[6]

The model layer is past proof-of-concept. The runtime layer is the open question — and the one Cua exists to answer.

Why Now

The agent stack split into model, framework, and infrastructure layers — inside a single year.

Anthropic and OpenAI validated the demand curve at the model layer. Browser Use and Browserbase proved the framework/infrastructure split works. Cua is the OS-level counterpart to Browserbase: the runtime under everything else.

Developers can direct Claude to use computers the way people do — by looking at a screen, moving a cursor, clicking buttons, and typing text.

Anthropic[7]

Claude 3.5 Sonnet launch · Oct 2024

Operator can fill forms, place online orders, schedule appointments, and complete other repetitive tasks — a first step toward AI that acts on your behalf across the digital world.

OpenAI[4]

Operator launch · Jan 2025

The agent stack is splitting into model, framework, and infrastructure layers — the value will compound at the layer that handles the runtime, not the one that handles the prompt.

Browser agent market map[11]

Theta Labs · 2025

Three preconditions converged inside a single year.

Frontier labs validated the API surface. Anthropic shipped Claude 3.5 Sonnet computer use in October 2024, scoring 14.9% on OSWorld (nearly 2× the next-best system), and partnered with Replit, Canva, Asana, DoorDash, Cognition, and The Browser Company on launch day.[7] OpenAI followed three months later with Operator at 38.1% on OSWorld.[4] The category went from research demo to production primitive in twelve months.

Vision-language models hit the latency floor. Sub-second screen reasoning is now table stakes — Claude Haiku, GPT-4.1 mini, and a wave of open multimodal models cleared the bar. The bottleneck moved from model to runtime. Container boot time, snapshot/restore, and image distribution are now the rate-limiting steps in an agent loop, and they are exactly the surface Cua engineers against.[1]

Developer demand outran the infrastructure. 17.8k GitHub stars on Cua's OSS core, 50k+ engineers using the platform, a 600+ developer Discord, and parallel multi-agent products (Codex, Claude Code's parallel execution) that already spin up multiple desktops concurrently.[1] [2] The orchestration problem isn't a future bet — it's a present-tense one.

How It Works

Three layers. One container per agent. Boot to action in under a second.

Step 01

Lume — the virtualization layer

Apple Virtualization Framework wired through a clean container API. macOS and Linux today on Apple Silicon at ~97% native CPU; Windows and Android in the managed cloud. The hard part — boot time, isolation, snapshot, restore — is solved and benchmarked in OSS.

Step 02

Computer-Use Interface — the agent contract

Screen capture, element detection, mouse and keyboard simulation, exposed as a uniform interface that works whether the OS underneath is macOS, Linux, Windows, or Android. The same agent code runs against any sandbox; the runtime is fungible.
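
As a rough illustration of what a uniform agent contract can look like, the sketch below defines a hypothetical `Desktop` protocol and an in-memory `FakeDesktop` backend. None of these names come from Cua's SDK; they are assumptions chosen only to show how the same agent code can run unchanged against any sandbox that satisfies the contract.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Desktop(Protocol):
    """Hypothetical agent contract; not Cua's actual SDK surface."""

    def screenshot(self) -> bytes: ...
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


@dataclass
class FakeDesktop:
    """In-memory stand-in for a sandbox; records actions instead of performing them."""

    os_name: str
    actions: list = field(default_factory=list)

    def screenshot(self) -> bytes:
        self.actions.append(("screenshot",))
        return b""  # a real backend would return encoded image bytes

    def click(self, x: int, y: int) -> None:
        self.actions.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.actions.append(("type_text", text))


def fill_search_box(desktop: Desktop, query: str) -> None:
    """Agent-side code: depends only on the contract, not on the OS underneath."""
    desktop.screenshot()
    desktop.click(200, 40)  # assumed coordinates of a search box
    desktop.type_text(query)


linux = FakeDesktop("linux")
macos = FakeDesktop("macos")
for sandbox in (linux, macos):
    fill_search_box(sandbox, "quarterly report")
```

Because `fill_search_box` only touches the protocol methods, swapping the backend requires no change to the agent code, which is the sense in which the runtime is fungible.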

Step 03

Cua Agent Framework + MCP — the integration surface

LLM-agnostic adapter for OpenAI, Anthropic, Ollama, LM Studio, and OpenRouter. An MCP server lets Claude Code, Cursor, and Codex drive a desktop natively. The same primitive plugs into LangGraph, AutoGen, and Browser Use.
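
One way to picture an LLM-agnostic adapter is a registry that maps a model name to a function with a fixed signature. The sketch below is a minimal stand-in under that assumption: the provider names, the `ADAPTERS` registry, and the scripted "models" are all hypothetical, and a real adapter would call a provider API instead of returning canned actions.

```python
from typing import Callable

# A model adapter maps an observation (here, a text description of the screen)
# to the next action. These stubs stand in for real provider calls.
ModelAdapter = Callable[[str], dict]


def scripted_model_a(observation: str) -> dict:
    if "button" in observation:
        return {"type": "click", "x": 10, "y": 20}
    return {"type": "done"}


def scripted_model_b(observation: str) -> dict:
    return {"type": "done"}


# Hypothetical registry keyed by "provider/model" strings.
ADAPTERS: dict[str, ModelAdapter] = {
    "provider-a/model-x": scripted_model_a,
    "provider-b/model-y": scripted_model_b,
}


def run_agent(model_name: str, observations: list[str], max_steps: int = 10) -> list[dict]:
    """Drive the loop with whichever adapter is registered under model_name."""
    model = ADAPTERS[model_name]
    trace = []
    for observation in observations[:max_steps]:
        action = model(observation)
        trace.append(action)
        if action["type"] == "done":
            break
    return trace


screens = ["a button is visible", "blank page"]
trace_a = run_agent("provider-a/model-x", screens)
trace_b = run_agent("provider-b/model-y", screens)
```

The loop in `run_agent` never inspects which provider it is talking to, so changing models is a one-string change at the call site.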

Self-hosted is the funnel. The managed cloud is where the workload ships.

Hot-start under one second. The managed cloud snapshots a warm desktop image and restores it for every new agent session. The cost difference between a cold-boot VM and a hot-start image is roughly 60× — the difference between a runtime you can spin up per chat turn and one you can't.[2]
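
The snapshot-and-restore idea can be sketched as a warm pool: desktops are restored from a snapshot ahead of time and handed out on request, with a cold boot only as the fallback. Everything below is an assumption for illustration, including the `WarmPool` class and the two timing constants, which are chosen only so their ratio matches the roughly 60× figure above. It does not describe how Cua's managed cloud is implemented.

```python
from collections import deque
from dataclasses import dataclass

COLD_BOOT_S = 60.0  # assumed cold-boot time, illustrative only
HOT_START_S = 1.0   # assumed restore-from-snapshot time, illustrative only


@dataclass
class Session:
    session_id: int
    startup_s: float


class WarmPool:
    """Keeps pre-restored desktops ready; falls back to a cold boot when empty."""

    def __init__(self, size: int) -> None:
        self._next_id = 0
        self._ready = deque()
        for _ in range(size):
            self._ready.append(self._new_id())

    def _new_id(self) -> int:
        self._next_id += 1
        return self._next_id

    def acquire(self) -> Session:
        if self._ready:
            return Session(self._ready.popleft(), HOT_START_S)
        return Session(self._new_id(), COLD_BOOT_S)

    def replenish(self) -> None:
        """Simulate restoring one more desktop from the snapshot in the background."""
        self._ready.append(self._new_id())


pool = WarmPool(size=2)
sessions = [pool.acquire() for _ in range(3)]
startup_times = [s.startup_s for s in sessions]  # third request exhausts the pool
```

In this toy model the first two sessions pay the hot-start cost and the third pays the cold-boot cost, which is why keeping the pool replenished matters for a per-chat-turn runtime.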

Cross-OS fleet orchestration. macOS, Linux, Windows, and Android containers from one control plane. Windows desktops in particular are exclusive to the cloud — Apple Silicon licensing makes self-hosting Windows impractical for most teams, which makes managed the only path for the workloads that actually require it (legacy ERP, .NET, native enterprise tooling).[2]

Observability, recording, and replay. Every agent action recorded as a video plus structured trace. The artifact stack is what turns an agent prototype into a production workload — eval harnesses, regression testing, incident debugging. The OSS gives you a container; the cloud gives you a system.[2]
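
To make the "structured trace" idea concrete, here is a minimal sketch of a trace record and a regression check that finds the first step where two runs diverge. The `TraceEvent` schema and both helper functions are hypothetical and are not Cua's recording format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceEvent:
    """One recorded agent action. The schema is illustrative only."""

    step: int
    action: str
    args: dict
    ok: bool


def to_jsonl(events: list) -> str:
    """Serialize a run as JSON Lines, one event per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


def first_divergence(baseline: list, candidate: list) -> Optional[int]:
    """Return the index of the first step where two runs differ, or None if they match."""
    for i, (a, b) in enumerate(zip(baseline, candidate)):
        if (a.action, a.args, a.ok) != (b.action, b.args, b.ok):
            return i
    if len(baseline) != len(candidate):
        return min(len(baseline), len(candidate))
    return None


baseline = [
    TraceEvent(0, "click", {"x": 200, "y": 40}, True),
    TraceEvent(1, "type_text", {"text": "invoice 42"}, True),
]
candidate = [
    TraceEvent(0, "click", {"x": 200, "y": 40}, True),
    TraceEvent(1, "type_text", {"text": "invoice 42"}, False),  # regression: typing failed
]

log = to_jsonl(baseline)
regressed_at = first_divergence(baseline, candidate)
```

A check like `first_divergence` is the kind of building block an eval harness or regression test could sit on top of once every run leaves a comparable trace.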

Docker for Computer-Use

The OS-level agent stack is at its pre-Docker moment.

The pattern is exact. Before Docker, every team rebuilt the same chroot plus init system plus image layer plus networking stack. After Docker, none of them did — and the value moved up to orchestration, registries, and managed clouds. Computer-use is at the same moment now.

The container is the wedge. The fleet is the business.

The runtime stops being a moat the moment it becomes a standard. Docker the company didn't capture the orchestration value — Kubernetes, registries, and the hyperscalers did. Cua's thesis is to be the OSS standard at the runtime layer and the first mover at the orchestration layer. Browserbase ran the same play in browsers and built a real business on it.[5]

The operational memory compounds. Every agent run leaves a trace inside the container — what worked, what failed, which UI surfaces broke, which recovery strategies converged. That dataset is the natural input for RL training, regression evals, and reliability improvements. Cua-Bench is already in the repo for exactly this loop.[1]

The container is the API contract that survives model churn. Frontier models cycle every six months. The OS surface doesn't. A team that builds against Cua's computer-use interface today will run the same code against whichever model is best in 18 months. Abstraction over substrate is the durable position.

How can AI agents interact with operating systems, desktop applications, and browsers without jeopardizing security or sacrificing performance?
Cua YC launch — Docker container for computer-use agents[8]

Market

The runtime layer is structurally larger than the framework layer.

Near-term ICP is every team shipping a computer-use agent: foundation labs running evals (HUD, on the customer list), legacy automation startups (Fira), academic research, YC AI cohort companies, and the 50k+ engineers already building on Cua.[2] The buyer is a technical founder writing a research preview, or a Series-A team scaling fleet ops — both want OSS by default and managed when production demands it.

Longer-term, the category is agent infrastructure as a line item. RPA is a ~$22B (2024) market growing toward $90B+ by 2030, driven by enterprise digitization.[6] The CUA-class agent stack is the AI-native successor — the segment where workflows RPA can't reach (legacy ERP, design tools, CAD, native apps with no API) finally become automatable. Browserbase has proven a real business sits under browser agents; the OS-level fleet is structurally a larger surface.
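
For a sense of scale, the growth rate implied by the two RPA figures above can be worked out directly. The snippet treats $90B as the 2030 endpoint (the source says "$90B+", so this is a lower bound) and assumes six compounding years from 2024 to 2030.

```python
start_usd_b = 22.0  # ~2024 RPA market size cited above, in $B
end_usd_b = 90.0    # lower bound of the 2030 figure cited above, in $B
years = 2030 - 2024

# Compound annual growth rate: (end / start) ** (1 / years) - 1
cagr = (end_usd_b / start_usd_b) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")
```

Under those assumptions the implied rate lands in the mid-20s percent per year.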

Near term — every team shipping a CUA-class agent

YC current and recent batches, AI infra teams at frontier labs, agent framework authors, and the long tail of builders integrating computer use into existing products. 50,000+ engineers already on the platform; 17.8k stars and growing weekly.[1] [2]

Long term — agent infrastructure for every enterprise OS workload

RPA ~$22B (2024) → $90B+ by 2030 is the inherited budget.[6] The CUA-class stack expands the surface to native apps, design tools, legacy ERP, and mobile QA, workloads browser-only automation structurally cannot reach. The OSS container is the wedge; managed fleet, observability, and Windows/Android exclusives are the renewal.

Every team building a computer-use agent has to solve the same desktop problem. Cua should be the answer by default — and that's how the next generation of agent infrastructure gets written.
Orange Collective

Competitive landscape

Four neighbors. None of them ship the OSS desktop container.

The frontier labs are upstream. Browser infra is adjacent. Dev sandboxes are a different shape. Agent frameworks are downstream consumers. Cua's position — OSS desktop runtime, multi-OS, multi-LLM — has no direct equivalent.

Anthropic Computer Use · OpenAI Operator

Model layer — upstream

Both shipped computer-use APIs at the model layer — the brains of the agent. Neither ships the runtime: customers using Claude computer use still need to provide a sandboxed desktop, and Operator's discontinuation in August 2025 reset the field. Cua sits underneath both, not against them. The frontier labs define the demand curve; Cua absorbs the infrastructure that comes with it.[7]

Browserbase · Hyperbrowser

Browser infra — adjacent

Browserbase has built a real business around managed browser sandboxes. But the browser is a subset — most real work happens in native apps. Cua extends the same primitive to the full OS (macOS, Linux, Windows, Android). Browserbase's existence is the proof of category; the OS-level fleet is the bigger half of it.[5]

E2B · dev sandboxes

Different shape

E2B and similar dev-sandbox products give agents a Linux shell for code execution. Terminal-first, not GUI-first. The CUA workload (Photoshop, CAD, design tools, legacy ERP) requires a real desktop with a display server. E2B is adjacent infrastructure, not a substitute, and the two are likely to be deployed side by side.

Browser Use · Skyvern · Stagehand · LaVague

Framework layer — downstream

Browser Use commoditized the framework layer for web (50k+ stars, $17M Series A). These frameworks are downstream consumers of infrastructure, not competitors to it. Cua is positioned to serve as the desktop runtime they all plug into when they expand beyond the browser — the same partner/customer pattern Browserbase ran with the same wave of frameworks.[13]

The model layer is shipping the brains. Someone has to ship the body. Cua is the open-source default for the runtime — and the OSS default usually wins infrastructure.
Orange Collective

Founder deep dive

The person who wrote the benchmark is now writing the runtime.

Why Francesco built it. Five years at Microsoft across Xbox and Microsoft AI, sitting inside the team that defined what "OS-level agent" even meant. He co-created Windows Agent Arena, the benchmark frontier labs use to evaluate computer-use agents.[3] The pattern is rare and powerful: build the eval, watch every team in the field struggle with the same infrastructure problem, then leave to build the production primitive that fixes it. The founder defined the goalposts before he started building for the field.

Why this is the right shape for the founder. Cua is, at its core, a virtualization and runtime company with an agent skin on top. The Apple Virtualization Framework work is hard — 97% native CPU on Apple Silicon is a real engineering achievement, not a flag. The OS-agent surface is even harder; the people qualified to ship both halves are measurably scarce. Francesco's background sits exactly at that intersection.[1] [3]

Why velocity is a feature. 17.8k GitHub stars from a launch that went viral on HN, with the OSS repo continuing to ship multi-OS support, MCP integration, hot-start cloud, and a benchmark suite, all from a small team. Shipping pace is the leading indicator we've watched in every category-winning OSS infra company: Vercel, Supabase, Browserbase. Cua's repo cadence is in that band.[1]

The long arc. Cua becomes the operating system for AI agents on the desktop. Every agent run goes through one of its containers; every fleet of agents is orchestrated through its cloud; every benchmark in the category cites the runtime. The OSS core wins distribution; the managed cloud captures the workload; the long-term moat is the operational memory of how thousands of agents actually behave inside a sandbox. The container is the wedge. The fleet is the business. The data is the moat.

Founder & team

Francesco Bonacci

Founder & CEO

Five years at Microsoft across Xbox and Microsoft AI before founding Cua. Co-creator of Windows Agent Arena — the benchmark frontier labs use to evaluate OS-level computer-use agents. Designed the eval, then built the runtime.

Risks & mitigations

Risk

Anthropic, OpenAI, or a hyperscaler bundles a managed desktop alongside their computer-use API and absorbs the runtime layer.

Mitigation

Both Anthropic Computer Use and OpenAI Operator launched without a sandbox attached — both companies have been explicit that customers bring the desktop. Their structural incentive is the opposite of vertically integrating: every closed runtime they ship locks their model to one substrate and rules out the OSS, multi-LLM workloads that actually drive volume. Cua's MIT license, multi-provider adapter (Anthropic + OpenAI + Ollama + LM Studio + OpenRouter), and Linux/macOS/Windows/Android fleet are the surface a single vendor cannot match. The frontier labs define the demand curve; Cua absorbs the infrastructure that comes with it.

Risk

Computer-use adoption is slower than the hype — agents stay browser-centric, the OS-level surface stays a research toy.

Mitigation

Two independent data points say it's already arrived. Anthropic's October 2024 launch reported Claude 3.5 Sonnet at 14.9% on OSWorld, nearly 2× the next-best model, with Replit, Canva, Asana, DoorDash, Cognition, and The Browser Company as launch-day partners.[7] OpenAI followed three months later. Cua already has 17.8k stars, 50k+ engineers, and named users including HUD (Series A) for agent evaluation. The category is shipping, not speculating.

Risk

Open-source monetization — every infra OSS founder hits the moment where the cloud cannibalizes the wedge.

Mitigation

Cua is following the Browserbase / Vercel / Supabase playbook: MIT core wins distribution, managed cloud wins the renewal. The managed surface (hot-start under one second, fleet orchestration across four OSes, Windows containers exclusive to the cloud, observability and replay) is structurally a different product from the self-hosted runtime. Every paying user in the first cohort converted to managed. The OSS is the funnel; the cloud is where the workload sits in production.

Risk

VM-based infrastructure is expensive — agent workloads at scale eat margin and a lighter shim catches up.

Mitigation

Ephemeral VMs with snapshot/restore and sub-second hot-start are an order of magnitude cheaper than full boots — Cua's runtime is already engineered around this. Apple Virtualization Framework on Apple Silicon achieves 97% native CPU; the marginal cost of an agent action is closer to a function call than a server. The lighter-shim alternative (browser-only, terminal-only) is a strictly smaller product surface and structurally cannot reach the workloads — CAD, design tools, legacy ERP, native apps — that drive the enterprise budget.

What we're watching

  • Cloud fleet utilization crossing the line where managed revenue exceeds OSS-led conversion — the moment the wedge becomes a business.
  • First Fortune 500 production workload — the signal that OS-level agents have crossed from research evals into enterprise procurement.
  • Anthropic or OpenAI partnering with Cua as a reference runtime — explicit or implicit endorsement of the OSS substrate.
  • Windows-native and Android container GA — the unlock for legacy ERP, mobile QA, and the workloads browser infra structurally cannot reach.

References

  [1] GitHub: trycua/cua (MIT, 17.8k stars, 1.1k forks)
  [2] Cua: Product homepage (50k+ engineers, <1s hot-start, multi-OS)
  [3] Windows Agent Arena: Evaluating multi-modal OS agents at scale (arXiv 2409.08264, Francesco Bonacci et al.)
  [4] OpenAI: Introducing Operator (Jan 23, 2025)
  [5] Browserbase: Cloud browsers for AI agents (~$40M raised, infra layer reference)
  [6] Grand View Research: RPA market $22B (2024) → $90B+ by 2030
  [7] Anthropic: Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku (Oct 22, 2024)
  [8] Y Combinator: Cua launch, Docker container for computer-use agents
  [9] Y Combinator: Cua company profile (P25, Diana Hu)
  [10] Model Context Protocol: Open standard for connecting AI assistants to tools (Anthropic, Nov 2024)
  [11] Theta Labs: Browser agent market map (X, 2025)
  [12] OSWorld: Benchmarking multimodal agents for open-ended tasks on real computer environments
  [13] Browser Use: Open-source agent framework for web (50k+ stars)
  [14] Cua Discord: 600+ developer community