Election polling is central to allocating campaign resources and to effective fundraising. And probabilistic polling, where, in theory, every person in a target population has a known probability of being sampled and responding to a poll, is considered by many to be the only meaningful type of polling.
The dominance of probabilistic polling is apocryphally attributed to the Literary Digest failure of 1936, when the magazine sent out more than 10 million postcards to its subscribers and to automobile owners and got back over 2 million responses; it prominently published the raw tally of responses as a poll result predicting a Landon victory over Roosevelt.
Shocking to no one today (and, frankly, not to everyone at the time), targeting an unrepresentative sample, asking it to opt in to a poll, and publishing the raw results produced a wildly inaccurate forecast. During that same campaign, pollsters such as George Gallup, Archibald Crossley, and Elmo Roper used random draws from representative samples to predict the election outcome with reasonable accuracy.
So why revisit the basis of polling now? First, probabilistic polling is becoming far less random and representative, and thus less cost-effective and accurate. Second, in a forthcoming academic paper we show that, with proper statistical adjustment, non-representative and non-random polls can yield accurate forecasts at very low cost.
Probabilistic polling suffers from growing selection problems. As communication has shifted from landlines to cellphones to the Internet, constructing a sampling frame that covers the full target population has become increasingly difficult.
Just a few years ago a pollster could treat the landline phone book as a complete frame (recognizing that people without phones generally didn’t bother to vote); now the pollster must stitch together a mixture of modes that will still miss segments of likely voters.
Further, with caller ID and changing norms (not to mention survey saturation), participation rates are declining even among those who are reached. Thus, probabilistic pollsters are starting with an unrepresentative sample and then asking it to opt in to their poll. Sound familiar?
The good news is that once we accept that our raw data can be highly non-representative, we can make great progress by matching our sample to the population based on demographics.
We tested this adjustment on a seemingly absurdly non-representative poll conducted on the Xbox gaming platform during the 45 days leading up to the 2012 U.S. presidential election. Beyond being fully opt-in, the Xbox sample is heavily skewed on two key demographic dimensions: sex and age. But the data have two attributes that make successful adjustment possible: huge sample size and repeated responses.
We conducted 750,148 interviews with 345,858 unique respondents. Each interview included between three and five questions, one of which was always the ubiquitous voter-intention question for the 2012 presidential election. Further, every respondent provided nine key demographic characteristics before taking their first poll.
It is standard practice to weight individual responses (sometimes quite heavily) to make a poll representative of the target population. We do something more robust, but equally transparent. We categorize every respondent by a set of core demographics: age, sex, ethnicity, income, state, and party ID.
For each day’s data, we ran a multilevel regression to determine the probability that a person with any given set of demographics would express a preference for Obama, Romney, or another candidate. In other words, we take all of these core demographics and see how they correlate with poll responses on a given day; from that we can look at any demographic cell (for example, 18-29, female, Asian, middle income, residing in Ohio, Republican) and estimate the probability that a person in that cell would have supported each candidate on that day.
These estimates are strongly model-based; we certainly don’t have enough respondents to get good estimates from each cell alone. Every answer in our poll helps inform the estimate for every cell, whereas traditional polling silos each response.
But that’s OK: political scientists have many decades of experience constructing such regression models, and they’ve been validated in various ways over the years. It is by incorporating this analytical understanding that we are able to surpass traditional methods of survey analysis.
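To make the estimation step concrete, here is a minimal sketch in Python. It is not our actual model, which is a multilevel logistic regression; it illustrates the key idea of partial pooling with a simple sample-size-weighted shrinkage estimator. The data layout, column names, and shrinkage weight `k` are all assumptions made for illustration.

```python
# A minimal sketch of the cell-level estimation step, not the paper's actual
# model (which fits a multilevel logistic regression). It illustrates the
# core idea, partial pooling, with a simple shrinkage estimator. The data
# layout, column names, and shrinkage weight `k` are illustrative assumptions.
import pandas as pd

def pooled_cell_estimates(responses: pd.DataFrame, k: float = 50.0) -> pd.Series:
    """Estimate Obama support per demographic cell, shrinking sparse cells
    toward the overall mean (a crude stand-in for multilevel regression)."""
    cells = responses.groupby(["age", "sex", "race", "income", "state", "party"])
    n = cells["obama"].count()        # responses per cell
    p_cell = cells["obama"].mean()    # raw support per cell (noisy when n is small)
    p_overall = responses["obama"].mean()
    # Weight each cell's raw rate by its sample size: small cells are pulled
    # toward the overall rate, large cells mostly keep their own estimate.
    return (n * p_cell + k * p_overall) / (n + k)

# Toy usage for one day's responses, with support for Obama coded 1/0.
day = pd.DataFrame({
    "age":    ["18-29", "18-29", "65+"],
    "sex":    ["female", "female", "male"],
    "race":   ["asian", "asian", "white"],
    "income": ["middle", "middle", "middle"],
    "state":  ["OH", "OH", "TX"],
    "party":  ["R", "R", "R"],
    "obama":  [1, 0, 0],
})
print(pooled_cell_estimates(day))
```

Cells with many responses essentially keep their raw rate, while sparse cells borrow strength from the rest of the data. A full multilevel model does this borrowing more carefully, pooling estimates toward demographically similar cells rather than toward a single overall mean.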
Having produced estimates for each demographic cell in the population, we then poststratify those probabilities. That is, using exit-poll data from the previous presidential election, we first estimate what percentage of the target population lies in each cell, and then weight each cell’s estimate accordingly.
For a presidential election, modern computing power allows us to update this regularly for hundreds of thousands of different cells. We affectionately call this method Mister P (long for MRP, which is short for multilevel regression and poststratification).
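The poststratification step itself is little more than a weighted average. Below is a sketch under the same illustrative assumptions: `cell_support` holds the model’s estimated Obama support per cell, `cell_share` holds each cell’s estimated share of the electorate (which we derive from the previous election’s exit polls), and all the specific cells and numbers are hypothetical.

```python
# A minimal sketch of poststratification: a population-weighted average of
# the cell-level estimates, i.e., the sum over cells of share * support.
# All cells, shares, and support values below are hypothetical.
cell_support = {   # model estimate of Obama support in each cell
    ("18-29", "female", "asian", "middle", "OH", "R"): 0.34,
    ("65+",   "male",   "white", "middle", "TX", "R"): 0.08,
    ("30-44", "female", "black", "low",    "OH", "D"): 0.97,
}
cell_share = {     # each cell's share of the electorate (must sum to 1)
    ("18-29", "female", "asian", "middle", "OH", "R"): 0.20,
    ("65+",   "male",   "white", "middle", "TX", "R"): 0.50,
    ("30-44", "female", "black", "low",    "OH", "D"): 0.30,
}

estimate = sum(cell_share[c] * cell_support[c] for c in cell_support)
print(f"poststratified Obama share: {estimate:.3f}")
```

A state-level estimate works the same way, restricted to the cells in that state with the shares renormalized to sum to one.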
How did our method work in practice? Our daily state-by-state and national estimates matched the Pollster and RealClearPolitics rolling averages very closely, but without requiring the aggregation of dozens of professionally conducted surveys; we used just an opt-in poll from a video game console.
Further, our estimated vote shares by demographic combination were remarkably accurate. For example, we were off by just one percentage point on the share of women aged 65 and older who would vote for President Obama.
How is this possible when older respondents and women made up such a small share of our sample?
First, we actually had more people in this group than any other single poll did, a benefit of interviewing 20,000 respondents per day. Second, our method draws power from other demographics, such as race, income, and state, on which our sample was more representative.
In addition to being accurate, public-opinion measurement must also be timely and cost-effective in the era of the 24-hour news cycle. Though our data were collected on a proprietary platform, there are many venues where researchers can quickly and cheaply gather similar data from large non-representative samples. Moreover, the analysis produces forecasts that are both relevant and timely, since they can be updated faster and more frequently than standard probabilistic polls.
The greatest impact of non-representative polling will likely not be for presidential elections, but rather for smaller, local elections, where it is too costly to run traditional polling.
This type of analysis could provide regularly updated estimates for congressional or even state elections. Non-representative polls could also supplement traditional polls by offering preliminary results at shorter intervals, and at a fraction of the cost.
Finally, when there is a need to identify and track pivotal events that affect public opinion, non-representative polling (again coupled with aggressive model-based adjustment on demographics and partisanship) offers the possibility of cost-effective continuous data collection.
Traditional probabilistic polling will certainly remain an invaluable tool for the foreseeable future. But there have been a few technological advances over the last 75 years, in both data collection (the Internet) and analytics (the computer), that urge us to reconsider non-representative polling.
David Rothschild is an economist with Microsoft Research. The research paper that formed the basis for this piece was co-authored with Andrew Gelman, Sharad Goel, and Wei Wang.