DetGaao

Keep Your Tests Alive

Dropbox shut off paid search and mobile ads across the US for a month to find out what its advertising actually drives. The tidy number making the rounds misses the part that bites later. The real damage from a blackout test does not surface for months.

May 21, 2026 · 5 min read · Michael Fellner

Came across a post on LinkedIn late last night that reminded me of a large marketing mix disruption test we ran in one of my previous roles.

It started the way these things usually start, in a budget meeting, with an executive asking how certain we were that the ROI on our paid channels was "real." One question led to another, until: "Should we just turn it all off and see what happens?"

I have no idea whether the lead-up at Dropbox looked anything like ours, but they just ran a very similar experiment, and unlike us, they are sharing what they found. I cannot share mine, and even if I could, measurement itself has changed too much since then to compare cleanly.

In April 2025, Dropbox went dark on purpose. They shut off paid search and mobile app advertising across the United States for a full month, held matched international markets as a control, and measured what their ads were actually buying. Not what the dashboards claimed. What moved when the spend stopped.

What the study found

The version I saw on that post compresses the result into a single incrementality figure, collapsing the two channels into one. It makes for a stronger hook, even if it loses the part that matters. The actual Dropbox paper had no need to compete for likes, and it showed that the two channels behaved nothing alike.

Mobile advertising turned out to be worth far less than its click attribution claimed, so cutting it was the right call. Paid search looked durable, and at first glance it held up. But a large share of that apparent strength was demand rerouting into organic search. Turn off the paid result and people typed the brand into Google anyway, then clicked the free link instead. Same intent, different door.
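For readers who want the mechanics, here is a minimal sketch of how rerouting like this could be measured with a test market and a matched control. It is not the method from the Dropbox paper. The function name, the control-scaled counterfactual (a ratio form of difference-in-differences), and every number below are assumptions made up for illustration.

```python
def control_adjusted_effect(test_pre, test_post, control_pre, control_post):
    """Actual post-period value in the test market minus the value expected
    if the test market had followed the control market's trend."""
    expected_post = test_pre * (control_post / control_pre)
    return test_post - expected_post


# Hypothetical weekly signups. None of these numbers come from the Dropbox study.
paid_effect = control_adjusted_effect(
    test_pre=1000, test_post=0, control_pre=400, control_post=440
)
organic_effect = control_adjusted_effect(
    test_pre=2000, test_post=2860, control_pre=800, control_post=880
)

# Net change across both doors, and the share of lost paid signups
# that reappeared as organic signups.
net_effect = paid_effect + organic_effect
rerouted_share = organic_effect / -paid_effect

print(f"paid signups lost:      {-paid_effect:.0f}")
print(f"organic signups gained: {organic_effect:.0f}")
print(f"net change:             {net_effect:.0f}")
print(f"rerouted share:         {rerouted_share:.0%}")
```

The point of splitting the two effects is that the paid channel alone overstates the loss, while the net figure only describes the test window and says nothing about what happens afterward.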

Applying all of it, Dropbox moved about $25M of its FY25 performance budget out of the segments that were not pulling their weight, and modeled the full-year hit to new recurring revenue at roughly $8M. That is a substantial result, and a disciplined one. It is also where most people stop reading. I almost stopped writing. The problem is that the part worth knowing starts here.

A blackout can look free

When you cut a channel, demand does not vanish on day one. It reroutes. Organic catches some, direct catches some, and the topline barely flinches for a few weeks. A few weeks is exactly the window a 30-day test reports on. So the test comes back clean. The spreadsheet says you saved money. By that reading, the ROI was not real and the exec was right.

As I said up top, I have run, or been part of, these tests before. Some were probably close in size, some were smaller, and some were probably too small. Regardless, what I learned through all of them is that the damage does not always show up in the first 30 days.

On one of the larger tests I was part of, the decision to cut looked correct for months. Then around month six we noticed the growth categories starting to stall, and new SKUs were not getting the traction they normally would. The channel we had throttled was doing quiet work we never credited it for: introducing the brand and its new products to people who were not searching for us yet. None of that lands in a revenue delta measured over thirty days. It shows up two or three quarters later, in the part of the business that depends on discovery.

And it gets harder. By month six you have probably run three other large-scale tests. Who is to say one of those did not move the discovery numbers too?

The part nobody writes about

These tests are not for everyone. Running one takes nerve, and a culture that can absorb the result.

Going dark on your strongest channel means walking into a board meeting and saying something like: "We turned off half of our best-performing marketing on purpose, to find out whether it actually works. Good news, it works. Bad news, revenue is down significantly this month." Most executive teams do not have the appetite for that sentence, and it is not a character flaw. It is a real cost with a real owner.

Then there is the politics. When you cut a channel in half, the manager responsible for those millions in spend has just watched their budget, and their contribution to sales, cut with it. (Can you say bonus?) Nobody wants to be the line item that proves the spend was not incremental. Running the test fairly, reading it straight, and not quietly nudging it toward the answer that protects the org chart takes integrity from everyone holding a piece of it. That is not a measurement problem. It is a people problem, and it decides whether the data you get back is worth anything.

Stick with the analytics

None of this is an argument against testing. Incrementality work is some of the most useful measurement we have, and last-touch attribution in particular overstates what your ads are responsible for, often several times over. Dropbox was right to run the experiment and right to act on it. The discipline is in how long you keep watching.

If you run a mix disruption, hold the question open past the window that makes the test convenient. Do not just watch the channel you cut. Watch the categories and the new products it might have been feeding, and watch them for two or three quarters, not thirty days. Keep the analytics running well after the test is officially closed.
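One way to keep watching is to track a downstream metric against the same control market for months after the test closes. The sketch below is only an illustration under stated assumptions: the `MonthlyReading` type, both function names, the 10% threshold, and all the readings are hypothetical, and the metric could be anything discovery feeds, such as first purchases of new products.

```python
from dataclasses import dataclass


@dataclass
class MonthlyReading:
    month: int      # months since the spend was cut
    test: float     # metric in the market where spend was cut
    control: float  # same metric in the matched control market


def lagged_gaps(baseline_test, baseline_control, readings):
    """Relative gap between the test market and what the control market's
    trend implies, for each month after the cut."""
    ratio = baseline_test / baseline_control
    return [(r.month, r.test / (r.control * ratio) - 1.0) for r in readings]


def first_breach(gaps, threshold=-0.10):
    """First month whose gap is at or below the threshold, or None."""
    for month, gap in gaps:
        if gap <= threshold:
            return month
    return None


# Hypothetical readings, invented for illustration only.
readings = [
    MonthlyReading(month=1, test=198, control=100),
    MonthlyReading(month=3, test=196, control=102),
    MonthlyReading(month=6, test=170, control=105),
    MonthlyReading(month=9, test=150, control=108),
]
gaps = lagged_gaps(baseline_test=200, baseline_control=100, readings=readings)

for month, gap in gaps:
    print(f"month {month}: {gap:+.1%} vs control-adjusted baseline")
print("first breach at month:", first_breach(gaps))
```

In this made-up series the 30-day reading looks clean and the gap only crosses the threshold around month six. A gap like this does not prove the cut caused it, especially if other large tests ran in between, but it does tell you where to look.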

A channel that looks free to cut in May can quietly kill your customer acquisition in the fall. Also, the number that makes for a great social media hook is rarely the whole story, and the expensive part is often the one that takes six months to show up.
