
Your business is easier to copy than Claude

Anthropic told the Senate a rival copied Claude by querying it more than 28.8 million times. This is the same Claude that Anthropic agreed to pay 1.5 billion dollars to settle claims it was built on pirated books. If a frontier lab cannot stop its work from being copied, your pricing, process, and positioning do not stand a chance.

June 25, 2026 · 5 min read · Michael Fellner

I'm on Claude (and Codex) for hours a day, mostly for two things: writing code and running agents. So a story from this week got my attention. Anthropic told the US Senate that a rival had copied those exact two capabilities, not by breaking in, but by using Claude until it had enough to build its own. The effort is kind of impressive, no?

Anthropic is treating this as theft, and maybe it was. Maybe it's also a sign of how much of any business is copyable, including yours, and I bet yours is more exposed than Claude was.

The theft they took to the Senate

In a letter to the Senate Banking Committee, Anthropic described what it called the largest known distillation attack against it to date. Nearly 25,000 fake accounts sent more than 28.8 million queries to Claude over about six weeks, aimed squarely at the model's most valuable skills: writing software and reasoning like an agent. No server was breached. No model file walked out the door. Someone asked Claude millions of questions and trained their own model on the answers. Again, the effort...
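Dividing the reported totals shows why that many accounts could pass unnoticed. Here is a quick back-of-the-envelope sketch, assuming "about six weeks" means 42 days and treating the "nearly 25,000" and "more than 28.8 million" figures as exact:

```python
# Figures as reported in Anthropic's letter, treated as exact.
# Assumption: "about six weeks" is taken as 42 days.
TOTAL_QUERIES = 28_800_000
ACCOUNTS = 25_000
DAYS = 42

per_account = TOTAL_QUERIES / ACCOUNTS       # queries each fake account sent in total
per_day = TOTAL_QUERIES / DAYS               # queries per day across all accounts
per_account_per_day = per_account / DAYS     # daily volume for a single account

print(f"{per_account:,.0f} queries per account")
print(f"{per_day:,.0f} queries per day overall")
print(f"{per_account_per_day:.1f} queries per account per day")
```

That works out to roughly 27 queries per account per day. The attack only becomes visible in aggregate, where the volume reaches hundreds of thousands of queries a day.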

Anthropic is pointing at Alibaba and its Qwen team. Alibaba has not responded, so for now this is one company's account, not an established fact. The mechanism is not in dispute, because it is the whole business model of a chatbot: you ask, it answers. Do that 28.8 million times with intent and you have a teacher.
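To make the mechanism concrete, here is a deliberately tiny toy in Python. It is not how Claude or Qwen were trained. A hidden linear function (the hypothetical `teacher`) stands in for the model, and the attacker can only call it. The hypothetical `collect_pairs` records question-and-answer pairs, and a least-squares fit (the hypothetical `train_student`) stands in for training the rival model on those pairs.

```python
import random
from typing import Callable

def teacher(x: float) -> float:
    """Hypothetical stand-in for the target model: callable, but a black box to the attacker."""
    return 3.0 * x + 2.0  # the hidden "capability"

def collect_pairs(n_queries: int, seed: int = 0) -> list[tuple[float, float]]:
    """Ask the black box many questions and record every (question, answer) pair."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_queries):
        x = rng.uniform(-10.0, 10.0)
        pairs.append((x, teacher(x)))
    return pairs

def train_student(pairs: list[tuple[float, float]]) -> Callable[[float], float]:
    """Fit y = a*x + b by ordinary least squares, using only the recorded pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b

pairs = collect_pairs(1_000)
student = train_student(pairs)
print(student(5.0), teacher(5.0))  # the copy answers like the original
```

The student never sees the teacher's internals, only its answers, yet it ends up reproducing them. Real distillation trains a neural network on text responses at vastly larger scale, but the shape is the same: ask, record, train.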

It is not even the first time. In early 2025 OpenAI accused DeepSeek of the same move: pulling outputs through OpenAI's API to train a cheaper model that scored close to OpenAI's own. This is becoming the normal way a frontier capability leaks. The thing you sell by answering gets copied by asking.

The product built by taking

Claude got built by taking, too.

To train the model, Anthropic fed it an enormous pile of other people's work, and last year it agreed to pay 1.5 billion dollars to settle with authors whose books it had pirated from shadow libraries to do it. That is the largest copyright settlement in US history. The judge drew a clear line: training on books Anthropic had legally bought was fair use; downloading and keeping pirated copies was not. The capability was assembled out of everyone's work, then sold back to the people it was assembled from.

To be clear about what I am saying: a company that built its product on the 'free' knowledge of generations, in order to monetize it, is now complaining that someone is stealing that exact knowledge. The irony is almost too on the nose. And let's be very clear that this is not an Anthropic-only issue. It is true of all the frontier models.

Do I still pay for the tools? Happily (OK, maybe not happily), and I will keep paying. But the irony stands: a company that built its product by taking is now writing to the Senate because someone took from it simply by talking to it. It is copies of copies all the way down, and every layer is annoyed at the one above it.

Why your business is the easy target

Step out of the AI fight, because you are not Anthropic and your rival is not Alibaba. The principle scales straight down to a ten-person company, and it gets worse on the way down, not better.

A frontier lab spent billions and years building Claude, and it still could not keep its most valuable capability from being copied by a determined customer with 25,000 logins. Your edge is not protected by billions of dollars of training. Most of it sits in plain sight. Everything you can explain clearly, someone can copy: your pricing logic, your onboarding flow, the positioning your whole homepage exists to teach a stranger in ten seconds, the playbook your best salesperson runs from memory. The clearer and more legible you make all of it, the faster it travels to whoever is paying attention.

This is the quiet tax on every "here is exactly how we do it" case study and every transparent pricing page. None of it is wrong to publish. But the more explainable your advantage is, the more copyable it is, and explainable advantages are most of what a marketing team produces.

So are you doomed?

The useful question, then, is not how to hide. You cannot, and hiding kills the marketing that grows you. The question is what is left after a competitor copies everything they can see.

A few things. The trust of a customer who has been burned before and will not switch over a price difference. The relationship where someone calls you first because they know you will pick up. The name a person reaches for without comparing, the way an agent takes a brand off the price-sorted list the moment a human asks for it by name. These survive copying because they do not live in anything queryable. No API returns the reason customers trust you.

There is one more, and most businesses have not built it yet: a verified, machine-readable identity. As more of the buying moves to agents, the thing that cannot be faked is proof that you are the business you claim to be, in a format software can check. A competitor can copy what you say. They have a much harder time copying a confirmed record that you are the one who said it, and that you are real. It is one of the few edges that gets stronger as copying gets easier, which is the argument for building it before you need it.

Name it in a line

None of this means you should stop publishing, explaining, teaching, or selling in the open. That is how marketing works. It means knowing which part is your moat and which part is free mentorship for the competition.

So here is one test, and I mean it as a real one. If tomorrow a competitor copied everything they could see about your business (the pricing, the process, the words on the site), what would you still have that they would not? If you can name it in a line, protect it with everything you have, and make it legible enough that a machine can confirm it. If you cannot name it in a line, it was never a moat. It was a head start, and the gap is closing faster than it used to.

Anthropic learned that the expensive way and ended up making its case to the Senate; OpenAI learned it before that. The least the rest of us can do is learn from their lesson.

[Infographic: "Distillation attack: how to copy a frontier AI. No break-in, just 28.8 million questions." Steps: 25,000 fake accounts made to look like ordinary users; 28.8M queries over 6 weeks, from Apr 22 to Jun 5, 2026, targeting writing code and agentic reasoning; every answer recorded as training data; a rival model trained on it, with no servers breached. Callout: the model they copied was itself built on other people's work, and Anthropic settled for $1.5B in 2025 over the books it pirated to train it. Footer: "What survives being copied?" detgaao.com/signal. Source: Anthropic letter to the US Senate, Jun 2026.]

Frequently asked questions

What is an AI distillation attack?: A distillation attack copies a model's capabilities without ever touching its code or weights. The attacker sends the model a huge volume of queries, records the answers, and trains a competing model on those question-and-answer pairs. Anthropic told the US Senate that a rival sent Claude more than 28.8 million such queries through roughly 25,000 fake accounts over about six weeks, targeting its strongest skills: writing code and reasoning like an agent.
Did Anthropic really pay 1.5 billion dollars over pirated books?: Yes. In 2025 Anthropic agreed to pay about 1.5 billion dollars to settle Bartz v. Anthropic, the largest copyright settlement in US history, over books it had downloaded from pirated shadow libraries to train Claude. A judge drew a line: training on books Anthropic had legally bought was fair use; downloading and keeping the pirated copies was not.
If everything explainable can be copied, how does a business protect its advantage?: Assume that anything you can explain clearly (your pricing, your process, your positioning) can be copied, the same way a frontier model's capabilities were copied by asking. The advantages that survive do not live in a queryable form: the trust of a customer who will not switch over a price difference, the name a person reaches for without comparing, and a verified identity that proves you are the source of your own claims. DETGAAO builds that last layer: a machine-readable, client-owned identity an AI agent can find, verify, and trust.
What makes a business trustworthy to AI agents that do the buying?: An AI agent does not read a business the way a person does. It looks for structured, confirmable proof of who the business is and what it offers, and it skips businesses it cannot verify. A competitor can copy what you say; a verified, machine-readable identity, the kind DETGAAO builds, lets an agent confirm that a business is real and is the source of its own claims before it connects or buys.
What is model distillation, and is it the same as the OpenAI–DeepSeek dispute?: Model distillation trains a smaller or cheaper model to imitate a larger one by learning from its outputs. It is a normal training technique, but it turns into a dispute when the outputs are taken without permission. In early 2025 OpenAI accused DeepSeek of pulling outputs through its API to train a cheaper competing model, the same mechanism Anthropic now pins on a rival: a frontier capability copied by asking, not by breaking in.

Want help thinking through what this changes for your marketing operations?

Start a conversation