Experimental designs are like asking a Genie for a wish. You’ll be sorry if you don’t word it carefully.
Let’s say we want to know whether cable ads targeting specific segments of the population or online pre-roll ads are more effective at moving voters’ preferences. To explore this hypothetical, our campaign takes a chunk of money, splits it in two, and tells the TV and online guys to spend it however they want.
A quant jock, or quantitative analyst, randomly assigns each of the target audiences into treatment and control groups. The quant jock then gives the treatment targets to the TV and online guys to hit with the ads. Those in the control groups are not hit with anything.
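Here's a minimal sketch of what that assignment step might look like in Python, assuming for the moment that voters are assigned individually; the voter IDs, split, and function name are all invented for illustration, not taken from any real campaign.

```python
import random

def assign_groups(voter_ids, treatment_share=0.5, seed=42):
    """Randomly split voter IDs into treatment and control groups.

    Purely illustrative: a real campaign would draw IDs from its voter
    file and log the assignment so results can be matched back later.
    """
    rng = random.Random(seed)            # fixed seed makes the split reproducible
    shuffled = list(voter_ids)
    rng.shuffle(shuffled)
    cutoff = int(len(shuffled) * treatment_share)
    return {
        "treatment": shuffled[:cutoff],  # these targets get hit with the ads
        "control": shuffled[cutoff:],    # these targets are left alone
    }

# Hypothetical usage: split 10,000 voter IDs evenly between the two groups.
groups = assign_groups(range(10_000))
print(len(groups["treatment"]), len(groups["control"]))  # 5000 5000
```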
So far, so good—it seems. The campaign pros control how the money is spent so they have the best chance to move voters. The scientists have control over the random assignment, so they know that the results will give the true impact of the TV and online ads. Everyone’s happy.
After a week, the campaign polls voters across the full universe to measure the change in vote preferences from the cable and online ads. It turns out that cable beat online. In fact, cable increased support for their candidate, and the online ads decreased it. Being data-driven, the campaign slashes the online budget and doubles the cable budget. Good call? Not at all. I didn’t tell you about two crucial aspects of the experiment’s design.
1) Both the cable TV and online guys had three 30-second ads, but ran them in different proportions. In fact, the online guys only ran ads one and two, while the cable guys ran all three. That’s strike one against the experiment. We don’t know if targeted cable did better because a) it’s just a better mode of delivery, b) ad number three is fantastic and responsible for all of the impact, or c) some combination of these.
We need to control for the message if we want to discover how effective particular modes of delivery are compared to each other. All we know here is that one mix of ads on targeted cable worked, and one mix of online ads didn’t. That’s not terribly useful for developing campaign strategy.
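One fix is to hold the message constant by design: cross each delivery mode with each of the three ads so every creative runs in the same proportion on cable and online. Here's a rough sketch of that kind of balanced assignment, again with invented names and assuming individual-level assignment.

```python
import itertools
import random

def balanced_assignment(voter_ids, modes=("cable", "online"),
                        ads=("ad1", "ad2", "ad3"), seed=7):
    """Assign each treated voter to one (mode, ad) cell so every ad runs in
    the same proportion on every delivery mode. Illustrative sketch only."""
    rng = random.Random(seed)
    cells = list(itertools.product(modes, ads))   # 2 modes x 3 ads = 6 cells
    shuffled = list(voter_ids)
    rng.shuffle(shuffled)
    # Deal shuffled voters into the cells round-robin so cell sizes stay balanced.
    return {voter: cells[i % len(cells)] for i, voter in enumerate(shuffled)}

# Hypothetical usage: 6,000 treated voters land in six equal cells of 1,000.
assignment = balanced_assignment(range(6_000))
```

With a design like that, a gap between cable and online can't be blamed on one channel quietly running a different mix of creatives.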
2) Voters were randomly assigned to the treatment groups by zip code (not individually), and different segments were targeted within these areas. That’s strike two against the experiment. On purely technical grounds, it’s fine: the random assignment was successful, and we obtained the true difference in vote preference between the control and treatment groups.
But practically this means that we can’t know why cable worked and online didn’t, even if all three ads had run in the same proportions to the same segments. Why? Because the type of voter who’s likely to see an ad on cable is different from the voter likely to see a pre-roll ad online. In fact, the online and cable guys purposefully targeted different segments. We don’t know which kinds of voters viewed which ad in any of the zip codes.
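For contrast with the individual-level sketch earlier, here's roughly what zip-code-level (cluster) assignment looks like; the zip codes are made up. Notice what the output doesn't contain: any record of which voters inside a treated zip saw an ad, or which ad they saw.

```python
import random

def assign_by_zip(zip_codes, seed=3):
    """Cluster assignment: whole zip codes go to treatment or control together.

    Illustrative sketch. Nothing here tracks individual voters, their
    segments, or which creative reached them inside a treated zip.
    """
    rng = random.Random(seed)
    zips = list(zip_codes)
    rng.shuffle(zips)
    half = len(zips) // 2
    return {"treatment": zips[:half], "control": zips[half:]}

# Hypothetical zip codes.
clusters = assign_by_zip(["30301", "30302", "30303", "30304", "30305", "30306"])
```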
Maybe targeted cable worked because the ads were effective with the older female voters who were more likely to see them in the first place. And maybe online, the ads were just as effective with those voters, but not with the younger male voters who were more likely to have seen them.
Even worse, maybe the first ad worked with half the voters online, caused backlash with the other half, and the second ad was just a dud. The same could be true for cable, but with a different population mix, we’d see a net positive movement for our candidate instead of a net negative.
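To make that concrete with a toy calculation (the effect sizes and audience shares are invented), the very same per-segment effects can net out negative or positive depending only on who happened to be watching:

```python
def net_effect(effect_by_segment, audience_mix):
    """Average the per-segment effects, weighted by each segment's share of the audience."""
    return sum(effect_by_segment[seg] * share for seg, share in audience_mix.items())

# Invented numbers: the ad moves older women +4 points and younger men -4 points.
effects = {"older_women": +4.0, "younger_men": -4.0}

print(round(net_effect(effects, {"older_women": 0.3, "younger_men": 0.7}), 1))  # -1.6: looks like a flop
print(round(net_effect(effects, {"older_women": 0.7, "younger_men": 0.3}), 1))  #  1.6: looks like a winner
```

Same ad, same per-segment effects, opposite headline results.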
We have absolutely no idea whether targeted cable ad-for-ad and voter-for-voter works better than online pre-roll. We have absolutely no idea which ad works best overall, or with particular voters. We have no idea how to shift more votes or how to do it more efficiently.
Here’s the bottom line: the campaign ran an experiment that’s correct on technicals but a disaster in practice.
Designing an experiment is a science and an art. It’s difficult, and there are always tradeoffs. But a good design will answer specific, important questions. In contrast, a bad design is simply a waste of time—no statistical Genie can answer what you failed to ask in the first place.
Adam B. Schaeffer, Ph.D., is the Director of Research and co-founder of Evolving Strategies, a data and analytics firm dedicated to understanding human behavior through the creative application of randomized-controlled experiments.