
Halluminate
The RL environment and data factory for computer-use AI.
Seed
Round
YC S25 · revenue-generating
2.8×
MoM revenue growth
Excluding active frontier-lab PoC · mid-5-figure MRR base
$344B
Hyperscaler AI capex (2025)
Spilling into training, eval, and env infra
In production at
Two of the largest browser-agent companies plus an active frontier-lab pilot targeted to convert this quarter.[2]
Thesis
- 01
Post-training is the new pre-training. Frontier model gains from scaling parameters and tokens are flatlining. The capability gains that matter now — agentic planning, long-horizon tool use, computer use — are coming from reinforcement learning in real-world environments. Every recent jump (o-series reasoning, ChatGPT Agent, Claude Computer Use) is a post-training story, not a pre-training one.[3] [4]
- 02
Environment scarcity is the bottleneck. Every lab is building computer-use agents; every lab is starving for environments to train and grade them on. Live web training is unsafe, unreliable, and unscalable — websites rate-limit, change weekly, and reset state in ways that destroy training signal.[1] Halluminate ships the missing piece as a product, not a side project.
- 03
The data flywheel compounds. Every agent run produces a labeled trajectory. Every environment seeds the next. Every checker becomes a reward model. The lead Halluminate accumulates is not the individual sandbox — it's the catalog of reusable modules, the library of verifiable tasks, and the corpus of expert trajectories that compound into a permanent advantage.
- 04
The customer is every frontier lab and every serious agent company. Hyperscalers spent $344B on AI in 2025 alone.[5] The same dollars that funded pre-training are now flowing into post-training stacks — and the post-training stack is inseparable from environment infrastructure. This is the most capital-rich, urgency-pressed customer set in software.
Problem
Computer-use agents work in the demo. They break in production. The gap is environment quality.
Browser and computer-use AI is the most-watched capability in the model lab roadmap — and the most fragile in deployment. WebArena, the canonical research benchmark, still shows agents underperforming humans by enormous margins on routine multi-app workflows.[1] a16z's framing is identical: "current agent offerings more closely resemble advanced RPA tools than true autonomous systems."[13]
The reason isn't model capacity. The reason is that labs train agents on whatever data they can scrape — public web recordings, OSS web benchmarks, synthetic prompts — and then fine-tune them on a few thousand hand-curated trajectories. None of that mirrors the production stack the agent will actually run against. The Salesforce instance, the Slack workspace, the ServiceNow ticket queue, the QuickBooks reconciliation flow — these are the surfaces the customer cares about, and they look nothing like the public web.
Live-web training makes it worse. Real websites can't be reset. They rate-limit aggressively, ban automation, and punish exploration. They change layout overnight. They cost real money when an agent buys the wrong flight. Every lab knows this; every lab has tried to build a stand-in internally; every lab has discovered that environment authoring is a product problem masquerading as a research problem.
$344B
AI capex (2025)
Hyperscaler spend now spilling into training, eval, and env infra
$14.3B
Meta → Scale AI
Single deal — labs paying premium prices for training data
$15B
Applied Intuition
AV simulation infra — proves the env-infra business model
LA Times AI capex 2025[5] · NYT Meta-Scale deal[7] · Reuters Applied Intuition valuation[11]
Why Now
The post-training era is here. Environment quality is the next constraint.
Three trends collided in the same eighteen months: agents shipped to production, pre-training gains flatlined, and every lab woke up needing the same thing — verifiable environments to train and grade computer-use models on.
Until AIs can learn through real-world trial and error like humans do, we must create custom environments that can faithfully simulate reality and accurately reward AIs for skillfully navigating the simulation.
Mechanize[12]
RL environments lab
Current agents still exhibit significant limitations in capability — struggling with complex or unfamiliar interfaces — and efficiency, operating too slowly and expensively to compete effectively with human operators.
a16z[13]
Computer-Use & Agentic Coworkers
The RL environment platforms are becoming foundational infrastructure for anyone looking to train generalist AI workers.
Felicis[14]
Rocket Fuel for AI
Three preconditions converged in the same eighteen months.
Computer-use agents are now first-party products. OpenAI shipped ChatGPT Agent. Anthropic shipped Claude Computer Use. Google launched Project Mariner. Browserbase shipped Director.[3] [4] [13] [15] The category went from research demo to flagship product in twelve months. Every one of those launches has the same next-step problem: making the agent reliable on the long tail of enterprise apps.
Post-training is where capability now comes from. The o-series, Claude 3.5 Computer Use, and ChatGPT Agent were all RL post-training stories. As Mechanize put it, "until AIs can learn through real-world trial and error like humans do, we must create custom environments that can faithfully simulate reality and accurately reward AIs for skillfully navigating the simulation."[12] The output of post-training is no better than the environment it was trained against.
The money is following the bottleneck. Hyperscaler AI capex hit $344B in 2025.[5] [6] Meta paid $14.3B for Scale.[7] Surge is raising $1B at a $30B+ valuation.[9] The same labs and acquirers that bid up the pre-training data stack are turning their attention to the post-training environment stack. Felicis calls RL environment platforms "foundational infrastructure for anyone looking to train generalist AI workers."[14]
How It Works
Two products. One loop. Environments and the data they produce.
The loop is the product.
Design task → simulate → instrument → verify → train → review. Halluminate runs the same loop the best in-house lab teams run, packaged as infrastructure. Customers pick a workflow, Westworld stages a sandbox for it, checkers score every episode, and Athena's reviewers triage the failures into reward models.
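As a rough illustration of the simulate, verify, and record steps in that loop, here is a minimal Python sketch. It is not Westworld's or Athena's API. `TicketSandbox`, `all_closed`, `close_first_open`, and `run_episode` are hypothetical names invented for this example, and the "agent" is a hard-coded policy standing in for a model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[int, str]
Action = Tuple[str, int]


@dataclass
class TicketSandbox:
    """Hypothetical resettable stand-in for a ticket-queue app."""
    tickets: State = field(default_factory=dict)

    def reset(self) -> State:
        # Every episode starts from the same known state,
        # which a live website cannot guarantee.
        self.tickets = {1: "open", 2: "open"}
        return dict(self.tickets)

    def step(self, action: str, ticket_id: int) -> State:
        # Only one action is modeled here: closing an existing ticket.
        if action == "close" and ticket_id in self.tickets:
            self.tickets[ticket_id] = "closed"
        return dict(self.tickets)


def all_closed(state: State) -> float:
    """Checker: 1.0 if every ticket is closed, otherwise 0.0."""
    return 1.0 if all(s == "closed" for s in state.values()) else 0.0


def close_first_open(state: State) -> Optional[Action]:
    """Toy policy: close the lowest-numbered open ticket, or stop."""
    for ticket_id, status in sorted(state.items()):
        if status == "open":
            return ("close", ticket_id)
    return None


def run_episode(
    env: TicketSandbox,
    policy: Callable[[State], Optional[Action]],
    checker: Callable[[State], float],
    max_steps: int = 5,
) -> dict:
    """Run one episode and return the trajectory plus the checker's score."""
    state = env.reset()
    trajectory: List[Tuple[Action, State]] = []
    for _ in range(max_steps):
        action = policy(state)
        if action is None:
            break
        state = env.step(*action)
        trajectory.append((action, state))
    return {"trajectory": trajectory, "reward": checker(state)}
```

Calling `run_episode(TicketSandbox(), close_first_open, all_closed)` yields a two-step trajectory and a reward of 1.0. The design point is that the checker reads the final sandbox state rather than the agent's own report, which is what makes the task verifiable.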
Reusable modules bend the curve. Authentication, billing, search, form fills, ticket queues, notification banners: the same primitives show up in every enterprise UI. Each new environment shares more of its scaffolding with the last, so the catalog grows superlinearly in engineering hours, the same way Applied Intuition's scenario library did in AV.[11]
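The reuse argument can be made concrete with a small sketch. The module names below come from the list above; the three environments and their module lists are made-up illustrations, not Halluminate's catalog or figures, and `new_modules_per_env` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def new_modules_per_env(
    env_specs: Iterable[Tuple[str, List[str]]],
) -> List[Tuple[str, int, int]]:
    """For each environment, in build order, count how many of its
    modules are not yet in the shared catalog.

    Returns (environment name, new modules, total modules) tuples.
    """
    catalog = set()
    report = []
    for name, modules in env_specs:
        needed = set(modules)
        fresh = needed - catalog  # modules that must be built from scratch
        report.append((name, len(fresh), len(needed)))
        catalog |= needed  # everything built so far is reusable later
    return report


# Illustrative data only (assumption, not real catalog contents).
specs = [
    ("crm", ["auth", "search", "form_fill"]),
    ("helpdesk", ["auth", "search", "ticket_queue"]),
    ("billing_portal", ["auth", "form_fill", "billing"]),
]
```

With this toy data, the first environment needs all three of its modules built, while the second and third each need only one new module. That is the shape of the claim: the marginal cost of an environment falls as the catalog fills in.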
Interoperable with the agent stack labs already use. Westworld plugs into popular agent frameworks, browser automation infra (including Browserbase),[15] and the standard training pipelines labs run. No new framework to adopt — Halluminate becomes the env layer beneath whatever the customer already runs.
Environments Are the New Datasets
Static data taught models to predict. Environments teach agents to act.
The shift from "sweatshop data" to simulation-as-data is the through-line of the last twelve months of frontier research. Every lab is saying the same thing. The companies that build the environment layer become the data layer of the next generation.
Datasets won the last era. Environments win the next.
Why static data ran out of room. Pre-training was about scraping the world. Post-training is about practicing in it. A static label set can teach a model what a "good" outcome looks like once. An interactive environment teaches it how to recover when something unexpected happens — and that is the entire content of agent reliability.
Why OpenAI built Procgen. The earliest lab investments in environment infrastructure — Procgen, DeepMind's StarCraft sandbox, the OpenAI Gym lineage — were research bets that environments are the scarcest resource in RL.[8] The pattern is now repeating one level up the stack: instead of toy gym tasks, the bottleneck environments are enterprise workflows. Same shape of problem, same shape of moat.
Why the data flywheel compounds. Every agent episode against a Halluminate environment produces a trajectory, a checker score, an annotator review, and a reward signal. That data is the canonical training input for the next model release. Customers buy environments today; they end up renting the resulting data corpus for the next decade. The catalog and the corpus grow together.
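One way to picture the per-episode output is as a single record. The sketch below is an assumption about shape, not Halluminate's schema: `EpisodeRecord`, its field names, and the rule that a human verdict overrides the checker are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EpisodeRecord:
    """Hypothetical record of one agent episode."""
    task_id: str
    trajectory: List[str]          # ordered agent actions
    checker_score: float           # automated score in [0.0, 1.0]
    reviewer_verdict: Optional[bool] = None  # None means not yet reviewed

    @property
    def reward(self) -> float:
        # Assumption: when a reviewer has ruled, the human verdict
        # overrides the automated checker score.
        if self.reviewer_verdict is None:
            return self.checker_score
        return 1.0 if self.reviewer_verdict else 0.0


def training_corpus(
    records: List[EpisodeRecord], min_reward: float = 0.5
) -> List[EpisodeRecord]:
    """Keep only episodes whose final reward clears the threshold."""
    return [r for r in records if r.reward >= min_reward]
```

Under these assumptions, an episode the checker passed but a reviewer rejected drops out of the corpus, and one the checker failed but a reviewer approved stays in. Keeping the checker score and the review side by side also leaves a trail of checker disagreements to audit.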
Market
The buyer set is small. The budget is enormous.
Frontier model labs. Three to five labs control the high end of post-training spend. Each one has a small team trying to produce computer-use environments fast enough to keep up with the agent roadmap. Halluminate is already in an active pilot with one of them, targeted to convert this quarter. Sustained AI capex creates adjacent demand for training and eval infra that helps labs realize their model investments.[5] [6]
Serious agent companies. Browser Use, Yutori, Manus, Browserbase Director, and the next wave of agent products all need the same thing the labs need.[13] [15] Their differentiator is reliability in the customer's actual stack — which means training and grading against environments that mirror that stack. Halluminate sells the same product on the same loop.
Enterprise. The medium-term buyer is the enterprise platform team deploying internal agents. Vertical functions — marketing, finance, sales, HR — all require company-specific tuning against company-specific surfaces.[13] The same Westworld + Athena loop powers internal agent evaluation before the agent ever touches production.
Every frontier lab is trying to build the environment stack in-house. Every one of them is failing to keep up with their own agent roadmap. Halluminate is the only company shipping the catalog as a product.
Competitive landscape
Data factories. In-house teams. Halluminate is the only one built for environments.
The adjacent categories all touch the same loop — data-labeling shops, RLaaS platforms, OSS research benches, and lab-internal teams — but none of them are organized around the catalog of verifiable enterprise environments. That gap is the wedge.
Scale and Surge sell workforce. Snorkel sold labels. Labs build sandboxes between releases. Halluminate is the only company building the environment catalog as a product — and the product is exactly what the entire post-training stack is bottlenecked on.
Founder deep dive
A product-research operator and a startup data engineer building the part of the lab stack no lab has bandwidth for.
References
- [1] WebArena. Realistic web environment for autonomous agents
- [2] Y Combinator. Halluminate company profile
- [3] OpenAI. Introducing ChatGPT agent (computer-use launch)
- [4] Anthropic. Computer Use documentation
- [5] LA Times. Big Tech AI spending to reach $344B in 2025
- [6] New York Times. AI spending and the real economy (2025)
- [7] New York Times. Meta invests $14.3B in Scale AI
- [8] OpenAI. Procgen Benchmark: gym environments for generalization in RL
- [9] Reuters. Surge AI explores $1B raise at $30B+ valuation
- [10] TechCrunch. Adept raises $350M for computer-use agents
- [11] Reuters. Applied Intuition valued at $15B (AV simulation infrastructure)
- [12] Mechanize. Sweatshop data is over (RL environments thesis)
- [13] a16z. The rise of computer use and agentic coworkers
- [14] Felicis. Rocket Fuel for AI: RL environments and the RLaaS market
- [15] BuiltIn SF. Browserbase Director and $40M Series B


