Why We Didn’t Need a Representative Sample to Predict a Trump Victory

By Vuk Vukovic

06/07/2017 03:05 PM EDT

The result of the 2016 US presidential election caught a huge number of pundits and pollsters by surprise. The major poll-based forecasts, many prediction models, and the otherwise very precise prediction markets (even the super-forecaster crowd) all got it wrong. They assigned high probabilities to a Clinton victory, although some were more careful than others.

Our prediction survey, however, was spot on. We at Oraclum Intelligence Systems, a Cambridge-based data company, predicted a Trump victory, and we called all the major swing states in his favour: Pennsylvania (which not a single pollster gave to him), Florida, North Carolina, and Ohio. We correctly called Virginia, Nevada, Colorado, and New Mexico for Clinton, along with the usual red and blue states for each candidate.

We missed only three states: New Hampshire, Michigan, and Wisconsin. For Wisconsin we didn’t have enough survey respondents to make our own prediction, so we had to use the average of polls instead. That leaves two misses by our own method: Michigan, where our predictor gave Clinton a 0.5 point lead, and New Hampshire, where it gave Trump a 1 point lead.

Every other state, although close, we called correctly. In Florida, for example, we predicted 49.9% for Trump vs. 47.3% for Clinton; in the end it was 49.1% to 47.7%. In Pennsylvania we had 48.2% for Trump vs. 46.7% for Clinton (it was 48.8% to 47.6% in the end). In North Carolina our method said 51% for Trump vs. 43.5% for Clinton (Clinton got a bit more, 46.7%, but Trump was spot on at 50.5%). Our model even gave Clinton a higher chance of winning the overall vote share than the electoral vote, which also proved to be correct. Across the swing states, our method was on average correct to within a single percentage point. Read the full prediction here.

It was a big risk to ‘swim against the current’ with our prediction, particularly in the US, where the major predictors and pollsters had always been so good at making correct forecasts. But we were convinced that the method was correct, even though its results were, at first glance, very surprising.

How We Developed Our Prediction

We used a different type of survey – a user-generated prediction survey. The established poll-based forecasters all usually pick up the ‘low-hanging fruit’ polling data and run it through some elaborate model. We, on the other hand, needed to get actual people to come to our site and take the time to make a prediction for their state. So instead of just picking up raw data and twisting it as much as we can, we needed to build our own dataset. Our very small sample size (only 450 participants) indicates that our method does not require a representative sample to make a good prediction, nor is it sensitive to typical problems of online polls such as self-selection.

Even with a small and unrepresentative sample the method works. Why? Our survey asks respondents not only who they intend to vote for, but also who they think will win, by what margin, and who they believe other people think will win. It is essentially the wisdom of crowds concept, adjusted for groupthink.

The wisdom of crowds is not new; it has been tried before. But even pure wisdom of crowds is not enough to deliver a correct prediction, because people can fall victim to group bias if their only sources of information are polls and like-minded friends. We used social networks to overcome this effect. Using Facebook and Twitter, we were able to identify respondents who belong to groups where this bias is strong. People living in bubbles tend to see only one version of the truth: their own. This makes them likely to be bad forecasters. People living in more diverse groups, on the other hand, are exposed to both sides of the argument. This makes them likely to be much better forecasters, so we give their opinions more weight. By performing this network analysis of voter preferences we are able to eliminate groupthink bias from our forecasts, and with it the bias that comes from polling.
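The weighting idea can be illustrated with a minimal sketch. The formula actually used is not given here, so everything below is an assumption made for illustration: the `Respondent` record, the `homogeneity` score (the share of a respondent's network backing the same candidate), and the linear `diversity_weight` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    forecast: float     # predicted vote share for one candidate, in percent
    homogeneity: float  # hypothetical: share of their network backing the same side (0..1)


def diversity_weight(homogeneity: float) -> float:
    """Hypothetical weight: 1.0 for an evenly split network, 0.0 for a complete bubble."""
    return 1.0 - 2.0 * abs(homogeneity - 0.5)


def weighted_forecast(respondents: list[Respondent]) -> float:
    """Average the forecasts, counting respondents in diverse networks more heavily."""
    weights = [diversity_weight(r.homogeneity) for r in respondents]
    total = sum(weights)
    if total == 0:
        raise ValueError("every respondent sits in a complete bubble; no usable weight")
    return sum(w * r.forecast for w, r in zip(weights, respondents)) / total


crowd = [
    Respondent(forecast=52.0, homogeneity=0.50),  # mixed network, full weight
    Respondent(forecast=48.0, homogeneity=0.75),  # partly in a bubble, half weight
    Respondent(forecast=40.0, homogeneity=1.00),  # complete bubble, ignored
]
print(round(weighted_forecast(crowd), 2))
```

In this toy crowd the respondent in a complete bubble contributes nothing, so the result is pulled towards the forecasts of people who hear both sides.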

Our poll is an online poll, and all online polls have problems with sampling and self-selection of respondents. This makes them biased towards particular voter groups, such as young, better educated, and urban populations. Pollsters try to compensate for these biases by adjusting their results for various socio-demographic characteristics. However, the final result can still be dubious, as shown recently in Florida, when four different pollsters gave four different results based on the same dataset. Our solution to these traditional issues with online polls is the very idea of combining the wisdom of crowds with a network analysis to remove the selection bias. Asking respondents how the people around them think means that we are including a group of people instead of an individual. So all we have to do is correct for each group’s bias, which is somewhat easier than correcting for individual bias.

Finally, to arrive at our prediction, we ran 100,000 simulations to find the most probable outcome. It was the one in which Donald Trump takes it all (Florida, Pennsylvania, North Carolina, and Ohio) and wins the presidency through the electoral college.
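As a rough illustration of what such a simulation step can look like, here is a minimal Monte Carlo sketch. It is not our production model. The margins are the Trump-minus-Clinton differences implied by the vote shares quoted in this article for three states; the normally distributed error, its 3-point standard deviation, the independence between states, and the names `PREDICTED_MARGIN`, `ASSUMED_ERROR_SD`, and `most_probable_outcome` are all assumptions made for the example.

```python
import random
from collections import Counter

# Trump-minus-Clinton margins in percentage points, derived from the vote
# shares quoted in this article (e.g. Florida: 49.9 - 47.3 = 2.6).
PREDICTED_MARGIN = {
    "Florida": 2.6,
    "Pennsylvania": 1.5,
    "North Carolina": 7.5,
}
ASSUMED_ERROR_SD = 3.0  # assumption: forecast error in points, independent per state


def simulate_once(rng: random.Random) -> frozenset:
    """Return the set of states Trump carries in one simulated election."""
    return frozenset(
        state
        for state, margin in PREDICTED_MARGIN.items()
        if rng.gauss(margin, ASSUMED_ERROR_SD) > 0
    )


def most_probable_outcome(n_sims: int = 100_000, seed: int = 2016):
    """Run n_sims simulations and return the most frequent outcome and its share."""
    rng = random.Random(seed)
    counts = Counter(simulate_once(rng) for _ in range(n_sims))
    outcome, hits = counts.most_common(1)[0]
    return outcome, hits / n_sims


outcome, share = most_probable_outcome()
print(sorted(outcome), share)
```

Under these assumptions the most frequent outcome is a sweep of all three states. A fuller version would cover every state and sum electoral votes in each run, and would probably model errors that are correlated across states.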

Before the US election we tested the same method on the Brexit referendum, where it performed just as well. We tested six models, three of which showed Leave and three of which showed Remain. We were not concerned with being correct at the time; we just wanted to see which method was the best one. The one method that gave us 51.3% for Leave is the same one that predicted the victory of Donald Trump, and the victory of Emmanuel Macron in France earlier this year. The next stop is the German federal election in September 2017.

The Logic Behind It

The puzzling question is why this approach works at all. When people make choices in elections, they usually succumb to their standard ideological or otherwise embedded preferences. However, they also carry an internal signal which tells them how good a chance their preferred choice has. In other words, they think about how other people will vote. This is why people tend to vote strategically: they do not always pick their first choice, but opt for their second or third, just to prevent their least preferred option from winning.

Each individual therefore holds some prior knowledge as to who he or she thinks will win. This knowledge can be based on current polls, or drawn from the information held by their friends and by people they consider better informed about politics. This makes it possible to draw upon the wisdom of crowds by searching for informed individuals, thus bypassing the need to compile a representative sample.

However, what if the crowd is systematically biased? For example, many in the UK believed that the 2015 election would yield a hung parliament, yet it produced a Conservative majority. In such cases, information from the polls creates a distorted perception of reality which is fed back to the crowd, biasing its internal perception. To overcome this, we need to see how much individuals within the crowd diverge from the opinion polls, but also from their own networks of friends. This is the key to the success of our predictions.
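The two divergences can be made concrete with a small sketch. The field names, the `is_echo` helper, and the one-point tolerance are hypothetical choices for illustration, not a description of our actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    own_forecast: float      # respondent's predicted vote share, in percent
    friends_forecast: float  # what the respondent believes their friends expect, in percent


def divergences(answer: Answer, poll_average: float) -> tuple[float, float]:
    """Return (divergence from the polls, divergence from the friends' view)."""
    return (
        answer.own_forecast - poll_average,
        answer.own_forecast - answer.friends_forecast,
    )


def is_echo(answer: Answer, poll_average: float, tolerance: float = 1.0) -> bool:
    """Hypothetical flag: the forecast merely repeats both the polls and the friends."""
    from_polls, from_friends = divergences(answer, poll_average)
    return abs(from_polls) <= tolerance and abs(from_friends) <= tolerance


echo = Answer(own_forecast=46.0, friends_forecast=46.5)
independent = Answer(own_forecast=50.0, friends_forecast=45.0)
print(is_echo(echo, poll_average=46.2), is_echo(independent, poll_average=46.0))
```

A respondent flagged this way adds little information beyond what the polls already say, whereas one who diverges from both the polls and their own circle is contributing an independent signal.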

Vuk Vukovic is the director and co-founder of Oraclum Intelligence Systems, a Cambridge-based data company (together with Dr Dejan Vinkovic and Prof Dr Mile Sikic). He is also a PhD student in politics at the University of Oxford.
