Decisions and Dollars

How application companies survive the "what if Anthropic builds this" question

Jun 12, 2026

Anthropic shipped Claude Fable 5 yesterday, the first Mythos-class model the public can use. It tops nearly every benchmark there is, with the lead widening the longer the task runs. The smarter the model, the less your software is worth on its own.

I tweeted last week that every venture-backed application company now has to be a data company or a fintech company, ideally both. This essay is the long version.

Let’s start with the shift in who is using the software. I wrote two years ago that per-seat pricing cracks once agents become the users, and we seem to have crossed the line: Cloudflare says agent traffic passed human traffic for the first time. The figure has been debated online, but the trend is clear: agents are going to be the primary customer for all software. Think about what this does to the business model. A thousand employees running a hundred thousand agents isn’t a hundred thousand seats. So what can an application company charge for?

An agent leaves behind two things worth metering: the decisions it makes, and the money it moves. The decisions are data. The money is fintech. Those are the two companies you have to become.

Decisions

xAI has an option to buy Cursor, a company that’s now doing about $4B in annualized revenue, for $60B. The software is NOT the main reason xAI had to pay up. Anthropic and OpenAI were already watching developers work in real time through Claude Code and Codex. Buying Cursor was the fastest way for xAI to get into the token flow. Musk (the world’s first trillionaire, mind you) said as much: the record of how a million developers actually use models would go straight into Grok’s training, and the high price was the toll for skipping the years it would take to collect that data the slow way.

People rebuilt working Cursor clones within weeks of its launch, and none of them caught on, because Cursor won on taste: the thousand small calls about what to surface and when to disappear. A clone copies the interface but inherits none of that. It can never reproduce the years of developers accepting, rejecting, and rewriting what the model handed back. Cursor now trains its own models on those diffs. The product won on taste. The data, however, became its primary moat.

To see why those diffs are worth sixty billion, imagine replacing 90% of your employees with a team of geniuses who have no idea how your company operates. It’s just chaos. That is roughly what dropping a frontier model into your business feels like, and Fable 5 just made the problem more obvious. A model that solves 80% of real software tasks, where last year’s best managed barely half, is not the thing you’re short of. The geniuses are interchangeable, all brilliant, all hard to tell apart on any of these benchmarks.

They fail for one reason. None of them knows what the people you replaced knew.

The band-aid has been to pull that knowledge out of people’s heads and hand it to the model as context. But most of it was never in a structure you could empty out. It’s tacit, and it only ever surfaced in the choices people made. The deal they walked away from. The line of code reverted at 2am. That customer nobody chased, and nobody wrote down why. That’s the real stuff. You can’t write it down as workflows, because much of it is judgment that nobody is storing today.

To bridge this, we are now moving through three layers: context → harness → judgment. Context was retrieval, the right pieces in front of the model. Harness was the scaffolding, the loop the model could run inside. Judgment is the last layer and the only one that compounds: everything left behind by every call, correction, and reversal made on top of the data.

Every AI application pitch I see right now has the context slide as the moat. Context graphs, the why behind every decision, wire it all into the model. That part is table stakes now, because context is the one thing every competitor is assembling the same way.

The corrections are different. Think of them as a scorecard. Every time a user fixed what the model did, they recorded what right looks like in your business. That scorecard does two jobs nothing else can. It’s the training signal that tunes a rented model to your business. And it’s the test set, the only way to know whether your agent is actually getting better at the job, because no public benchmark measures your workflow. You don’t need to pretrain a model from scratch. Even Cursor didn’t. Its in-house models reportedly sit on top of an open-source base, with the diffs doing the differentiating. Fine-tuning and RL on top of frontier models got cheap enough that a Series B company can run this loop today. Two years ago you needed a lab.
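To make the scorecard's two jobs concrete, here is a minimal sketch, not anyone's actual pipeline. Every name in it (`Correction`, `is_held_out`, `split`, `score`) is hypothetical, and the similarity measure is a stand-in: a real team would grade against whatever "right" means in its own workflow. The idea is only that the same log of corrections can be split once into a training set and a held-out test set, and that the held-out half lets you ask whether the agent is getting closer to what users actually kept.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
import hashlib


@dataclass(frozen=True)
class Correction:
    task: str          # what the model was asked to do
    model_output: str  # what the model handed back
    user_final: str    # what the user kept after fixing it


def is_held_out(c: Correction, eval_percent: int = 20) -> bool:
    """Route a correction to the test set by hashing its task.

    Hashing keeps the split stable as the log grows, so a task never
    drifts from the test set into the training set.
    """
    digest = hashlib.sha256(c.task.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100 < eval_percent


def split(corrections):
    """Job one and job two: training signal and test set from one log."""
    train = [c for c in corrections if not is_held_out(c)]
    evals = [c for c in corrections if is_held_out(c)]
    return train, evals


def score(agent, eval_set) -> float:
    """Mean text similarity between the agent's output and what users kept.

    `agent` is any callable mapping a task string to an output string.
    Similarity here is difflib's ratio, an assumption for illustration.
    """
    if not eval_set:
        return 0.0
    total = sum(
        SequenceMatcher(None, agent(c.task), c.user_final).ratio()
        for c in eval_set
    )
    return total / len(eval_set)
```

Run `score` on the held-out set before and after each round of tuning. If the number doesn't move, the tuning didn't teach the model anything about your business.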

Sarah Guo calls this territory the untrainable: work whose correctness can’t be scored from the outside. The corrections are how you come to own it.

The vertical AI leaders already run this play. Harvey is worth $11B and Legora past $5B, both selling into law, both racing past the standalone tool toward owning the entire matter, because the lawyer’s edits on a draft are the corrections nobody else gets to see. Rogo is doing the same inside finance, capturing how analysts actually build the model and revise the memo.

None of these companies trained a foundation model. They built the harness around a rented one and kept the judgment that runs through it. That’s the thing that compounds.

An incumbent like Figma owns more than SVGs. It has the history of how a design got from v1 to v47 and every version someone killed on the way, a graded record of design taste. Linear holds the argument under every closed ticket. Notion holds the shape a team’s thinking takes across a thousand edits. You can’t export any of this when a competitor tries to pull the customer off, and all of it is the answer a generic model doesn’t have.

Which is why the labs are buying judgment off the shelf. It started with human-labeled data: Mercor is worth $10B paying a network of experts $85 an hour. Meta paid $14B for Scale to own the pipeline. A startup in New York will now clean your apartment for free if you let it film the whole thing, because the robotics teams need to watch a human decide what to do next. And it’s led to many RL environment companies reaching hundreds of millions of dollars in annualized revenue selling this same judgment over long-horizon tasks.

The labs trained on the whole internet and ran out, so now they buy decisions directly.

Dollars

23andMe sat on DNA from fifteen million people, a dataset pharma would kill for in this day and age, and still went bankrupt last year. If money doesn’t flow through your data, you are just funding a science project. Most founders are still sleeping on this half.

Toast figured this out years ago. A restaurant is basically a payment processor with a kitchen attached, and the payments make Toast far more money than the software running the floor. Ramp took it further. Free corporate card, no fees anywhere, a cent or two skimmed off every dollar of the hundred billion that runs across it. That’s a $32B company built on rounding errors. The free card was just the front door to the interchange, and the swipe fee holds because the network holds. Money even pays you while it sits, collecting float before it ever moves.

Not all money meters have a moat. One popular vibe-coding app reportedly makes about 50% margin on the credits it sells, most of its annualized revenue simply a markup on inference. But a token markup has no network behind it, and your own inference bill falls every quarter, so that margin melts as the models get cheaper. The durable fintech is the kind with lock-in underneath it: payments where the network holds, lending where the data underwrites a loan a bank can’t see.

Payment infrastructure for agents is now finally live. When an agent books the flight and orders the parts and pays the vendor, something has to authorize the charge, carry it, and take a cut. Stripe shipped a protocol for it, and Visa and Mastercard are racing to set the standard. OpenAI already skims a few points off whatever its agents buy. A trillion agents transacting is soon going to be the largest payments economy ever built.

Turn one into the other

The application companies that last will stop treating the two halves as separate. Judgment is the record of decisions about the work. Fintech is the record of decisions about money. The strongest companies turn one into the other.

Shopify is the best example. It started as store software. Then it attached payments, then Shopify Capital, lending merchants money underwritten by the sales data already flowing through the store, loans a bank couldn’t underwrite on its own. The merchant grows, the sales grow, the data helps with the next loan. Roughly three quarters of Shopify’s revenue now comes from the money side, not the software subscription. You can take that data and offer products only you could offer. Stripe is doing the same with Radar, and Ramp runs the identical loop with spend data and the card.

Rippling is trying to do something similar. Its core object is the employee, so payroll and benefits and devices and the card all draw on one source of truth. It still hasn’t locked anyone out, and Gusto and Deel are growing right alongside it, but the company that owns the object compounds on it while everyone else still tries to assemble it by hand.

Guard the writes

But there’s the “headless” tension that no one has resolved. If all software will be used by agents, to stay useful you have to let the agents in, and to stay alive you can’t let them take everything. Every system of record is being asked to be open enough that an agent can plug in through whatever protocol is on offer, and closed enough that nobody migrates off once they have what they need. Salesforce cut Slack’s data off from Glean and the other outside agents this year. They’re just the first to do it out loud.

The way to build a durable application company will be a split. Let the agents read, since reads are cheap and important no matter what you do. Guard the writes.

The place where new judgment gets entered, where people and agents approve and correct and reverse each other. That is the part a rival can’t migrate, at least not easily. What they will scrape is yesterday’s state. The decision being made right now is the only thing that stays yours.
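A rough sketch of what that split could look like in code, under loud assumptions: `SystemOfRecord`, `trusted_writers`, and `decision_log` are names I made up for illustration, and a real product would sit behind proper authentication and whatever agent protocol it supports. The point is the asymmetry. Any agent can read current state, while every write has to come from a party you recognize and leaves a decision behind in a log that never goes out through the read path.

```python
class SystemOfRecord:
    """Toy store: open reads, guarded writes, and a private decision log."""

    def __init__(self, trusted_writers):
        self._state = {}
        self._trusted = set(trusted_writers)
        # The record of who changed what, from what, and why.
        # Reads never expose it.
        self.decision_log = []

    def read(self, agent_id, key):
        """Any agent may read current state (yesterday's state, to a scraper)."""
        return self._state.get(key)

    def write(self, agent_id, key, value, reason):
        """Only trusted writers may change state, and each change is logged."""
        if agent_id not in self._trusted:
            raise PermissionError(f"{agent_id} is not allowed to write")
        self.decision_log.append(
            {
                "agent": agent_id,
                "key": key,
                "old": self._state.get(key),
                "new": value,
                "reason": reason,
            }
        )
        self._state[key] = value
```

An outside agent calling `read` gets the current value and nothing else. The approvals, corrections, and reversals stay in `decision_log`, which is the part a rival can't scrape.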

What if Anthropic builds this?

You can argue that the labs are already in the token flow, so where’s the moat? Claude Code sees every command a developer runs and every suggestion they wave off, and ChatGPT watches more decisions in a day than your product logs in a year. If the lab already sees everything, what do they need you for?

My answer: what their tool sees is mostly generic. The same coding and writing every model sees, the exact stuff the labs are racing each other to commoditize. The rare judgment lives deep inside one company: how your hospital reads scans, which deals your firm learned to walk away from. None of it ever touches a lab’s chat box.

And the labs spent years telling enterprises they don’t train on their data. They mean it, or at least I hope so. The tacit knowledge moving through the model inside your product stays yours by contract. They see the trace go by, and they agreed not to keep it. You’re the only one allowed to.

The fintech half is the one they don’t want anyway. A lab will happily take your data. It has no use for your loan book, your fraud losses, or your money-transmitter licenses in forty states. So for your buyers, the data makes you worth buying. The fintech makes you hard to dislodge.

Cursor built the best data engine in its category, and a lab paid sixty billion to own it. That’s either your dream or your warning, and you won’t know which until it’s too late to change course. So build the two things that survive. Accumulate the judgment. Sit in the path of the money. That’s how you outlast the “what if Anthropic builds this” question.

Thanks to Ethan Ding, Hrishi Olickel, Christina Li, Tina He & Vivek Trivedi for reading drafts of this essay!

Sagar Kalarkopp

Jun 19

Really enjoyed this, especially the framing around distribution vs. product.

One thing I keep wondering about: as agents get better at driving other apps’ UIs, doesn’t that push companies to start blocking or tightly gating agent access? If an agent can treat most products as interchangeable backends, that feels like it erodes a lot of the moat for many apps.

Curious how you think about that dynamic—do we end up in a world where agents are the default interface, or one where incumbents push back hard and force everything through controlled APIs and partnerships?

Audrey Choy

Jun 20

eval is the decision and judgement that is the proprietary data of the company
