It’s worth recalling today how forecasters in 2016 gave Hillary Clinton a 91-percent chance of winning — a prediction that gave Democrats a warm and fuzzy feeling right up to the eve of Election Day. While the high probability of winning didn’t put Clinton in the White House, the national polls were right in a sense: she did win the popular vote by a couple points as predicted.
But national polls aren’t a stand-in for the Electoral College vote, and many state polls clearly missed the mark. There were several reasons for this, including wide variation in methodology (specifically around education weighting), unexpected Trump voter turnout, and human error. It’s worth going through this history now because I don’t believe it’ll be repeated after the vote tally comes in for 2020.
Education Level
The education level of voters has always been a driving force in their presidential vote preference, and this became especially important in the 2016 election. From a polling perspective we have known that less-educated voters are regularly underrepresented in surveys. There had been a hesitancy among many pollsters, myself included, to weight by education (i.e. make a determination of what the education of the electorate “should be”) due mostly to the fact that this is an unknown quantity.
Some demographics are easy to determine to within a point or two of what has happened in past elections. Gender, age, race, party registration (in states where party registration is a thing), and certainly the areas voters live in are all easy to determine accurately and base a forward-looking judgment on, as these demographics are available in states’ voter files. This isn’t the case for education.
There’s census data to help pollsters get an estimate, but the exact number, or even a reasonable range, is very hard to determine because the education profile of the adult population doesn’t match that of the people who actually turn out to vote. For example, according to the Census Bureau, 29 percent of Michiganders over the age of 25 have a bachelor’s degree or higher. But according to CNN’s 2016 Presidential Election Exit Polls, 42 percent of Michigan voters had a college degree. Meanwhile in 2018, that figure was 35 percent. That leaves pollsters a wide range of ways to weight the data, and it only becomes trickier as you get into congressional and state legislative districts where exit poll data doesn’t exist.
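To make that range concrete, here is a minimal sketch of how the choice of education target changes the weights a pollster would apply. The three targets are the Michigan figures cited above. The raw sample's college share (55 percent) and the function name `education_weights` are illustrative assumptions, not figures or methods from any actual poll.

```python
# Hypothetical raw sample: 55% of respondents are college graduates.
# This share is an illustrative assumption, not a figure from the article.
SAMPLE_COLLEGE_SHARE = 0.55

# College-graduate shares cited in the text for Michigan.
TARGETS = {
    "Census, adults over 25": 0.29,
    "2018 figure": 0.35,
    "2016 exit poll": 0.42,
}


def education_weights(sample_share, target_share):
    """Return (college_weight, non_college_weight) that rescale the sample
    so that its college share matches the target share."""
    college_w = target_share / sample_share
    non_college_w = (1 - target_share) / (1 - sample_share)
    return college_w, non_college_w


for label, target in TARGETS.items():
    cw, nw = education_weights(SAMPLE_COLLEGE_SHARE, target)
    print(f"{label}: college x{cw:.2f}, non-college x{nw:.2f}")
```

Under these assumptions, each college graduate in the sample counts for roughly half a respondent with the census target but about three quarters of one with the 2016 exit poll target, which is why the choice of target matters so much.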
In earlier years the inability to weight by education and the known undercount of lower-educated voters tended not to have a huge effect on the overall results. Any undercount of lower-educated voters was typically made up by higher-educated voters, and since the two groups’ levels of support for Democratic candidates were similar, it didn’t tend to sway the results. In 2016, however, there was a major shift in how lower-educated voters were voting, and now the undercount had a meaningful effect on the results.
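The arithmetic behind this can be sketched in a few lines. Every number below is hypothetical: the electorate and sample college shares, and the levels of Democratic support in each group, are chosen only to illustrate the mechanism and are not estimates of any real election.

```python
def margin(college_share, dem_college, dem_non_college):
    """Democratic two-way margin, in points, for an electorate with the given
    college share and Democratic support within each education group."""
    dem = college_share * dem_college + (1 - college_share) * dem_non_college
    return (2 * dem - 1) * 100


# Hypothetical shares: the poll over-represents college graduates.
TRUE_COLLEGE = 0.35    # assumed share in the actual electorate
POLLED_COLLEGE = 0.50  # assumed share in the unweighted sample


def polling_error(dem_college, dem_non_college):
    """Unweighted poll margin minus the true margin, in points."""
    polled = margin(POLLED_COLLEGE, dem_college, dem_non_college)
    actual = margin(TRUE_COLLEGE, dem_college, dem_non_college)
    return polled - actual


# Earlier years (hypothetical): both groups back the Democrat at similar rates.
print(f"{polling_error(0.52, 0.51):.1f}")
# 2016-style split (hypothetical): support diverges sharply by education.
print(f"{polling_error(0.57, 0.43):.1f}")
```

With the same undercount in both scenarios, the error is a fraction of a point when the two groups vote alike and several points once they diverge. The size of the miss scales with the gap in support between the groups, not with the undercount alone.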
Unexpected Voters and Voting Patterns
Just as big as the undercounting of less-educated voters were the unexpected turnout patterns that occurred in states like Pennsylvania, Michigan, and Wisconsin. These changes in voter turnout from 2012 to the 2016 election played to President Trump’s advantage and also had an adverse effect on the polling results at the state level. This was a twofold challenge for pollsters.
On the one side were voters in Democratic strongholds who didn’t turn out for Clinton at the same level they did for President Barack Obama in 2012. These numbers have been talked about by many people, and the short version is that in places like Macomb County in Michigan and Dane County and Milwaukee in Wisconsin, far fewer voters turned out and cast ballots. In theory this is something pollsters could have, and should have, picked up. What was a complete surprise, and also nearly impossible to pick up in polling, was the number of voters who turned out in 2016 but hadn’t voted since the early 2000s or even the 1990s.
Based on current voter files there were 110,000 voters in Pennsylvania who voted in 2016 but hadn’t voted since 2004 or before. This number is 53,500 in Wisconsin, and in Michigan, it’s a little over 31,500. It’s hard to put much blame on pollsters for not getting these voters into their samples, since these 110,000 voters in Pennsylvania make up only about 1.3 percent of the total registered voter population in the Keystone State. Finding these voters and having enough of them in a survey to look at the data and understand that this phenomenon was likely to happen would have been cost-prohibitive.
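A rough back-of-the-envelope calculation shows why. The sketch below uses only the two Pennsylvania figures above; the sample sizes are hypothetical, and it assumes a simple random sample of registered voters with equal response rates, which is more generous than a real likely voter screen would be.

```python
import math

RETURNING_VOTERS_PA = 110_000  # from the text
SHARE_OF_REGISTERED = 0.013    # "about 1.3 percent", from the text

# Registered-voter total implied by the two figures above.
implied_registered = RETURNING_VOTERS_PA / SHARE_OF_REGISTERED


def expected_in_sample(sample_size, share=SHARE_OF_REGISTERED):
    """Expected number of long-absent voters in a simple random sample
    of registered voters (assumes equal response rates)."""
    return sample_size * share


def sample_needed(target_count, share=SHARE_OF_REGISTERED):
    """Sample size needed to expect at least target_count such voters."""
    return math.ceil(target_count / share)


print(f"Implied registered voters: {implied_registered:,.0f}")
print(f"Expected in a sample of 800: {expected_in_sample(800):.1f}")
print(f"Sample needed for 100 such interviews: {sample_needed(100):,}")
```

Under these assumptions, a typical statewide sample of several hundred respondents would contain only around ten of these voters, far too few to analyze, while reaching even 100 of them would take thousands of completed interviews.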
Human Error
This is the toughest pill to swallow. The numbers were there. We had data that pointed to what was happening, but we didn’t want to see it. This issue is twofold: 1) when the prevailing wisdom is saying one thing, it’s very difficult to stand up and say “I disagree.” This is especially true in the campaign world, where doing what’s expected and losing gets you a better return than sticking your neck out and being wrong. This is clearly something that needs to change, but that’s a whole other discussion. And 2) a candidate like Trump was unique, to say the least. It was hard for many to view it as plausible that he could win states like Wisconsin, where a Republican had not won since 1984.
This second part was one of the biggest problems in 2016: the humans looking at the data were projecting their own biases onto the results because, at least on the Democratic side, it was unfathomable that Trump could win.
Looking at 2020
When we take a look at 2020 election polling, we can see that the methodologies have been changing. Pollsters have had to adapt and find new, better ways to reach people due to low response rates to phone calls, and any pollster worth their salt is aware of the problems discussed above. So far we have seen better education weighting, adjustments based on 2016 Trump turnout, and better judgment on data with minimized human error.
A large focus for this election has been to ensure the polls have the proper education levels for the area that the survey was taken in. Changing or adding education weights will most likely be the most common change in methodology. By down-weighting college graduates, who tend to be overrepresented in raw samples, we can expect the underrepresented less-educated voters to be reflected more accurately. This doesn’t mean that the polling will be perfect. Indeed, there are a multitude of other factors that will affect the accuracy.
Another issue from 2016 that most likely won’t be an issue this election is unexpected Trump turnout. These voters were unanticipated in 2016 because they hadn’t voted in most of the previous elections. As a result, they weren’t included in likely voter samples, and their votes went unpredicted. Since they voted in 2016, they’ll now be included in the likely voter models. Could there be more voters like this out there? Sure, but the numbers will be small at best.
As for the human element, smart shops have adjusted the way they look at the data to ensure that the human element is minimized. Don’t go in trying to prove your worldview; analyze the data and determine what the numbers are saying.
We’ve been poring over numbers and haven’t seen any “shy Trump voters” this election. People who are voting for Trump aren’t hiding their preference from pollsters in a way that would exaggerate Biden’s lead. There are many things the polling industry has adjusted to allow for a more accurate predicted result, but there are always things we cannot factor in. We cannot factor in the voter suppression that’s happening in the country or how many mailed-in ballots won’t be counted.
Polling is not perfect — what is? But the industry has learned from 2016 and adjusted accordingly. This isn’t to say every poll and every pollster should be looked at as the same, but reputable firms and outlets have adjusted and we’re expecting to see accurate results based on what we can control.
Stefan Hankin is the founder of Lincoln Park Strategies