Forecasting the Election: Polling Models vs. Betting Markets
One Will Be Wrong; the Other Will Be Less Wrong
With less than a week to go before Election Day, the one trait both parties share is a desire for certainty about the outcome. The 2016 race proved that there are no sure things in politics. Low turnout, weak Democratic enthusiasm, and a proportionally high share of undecided voters turned an apparent Clinton lead into a narrow Trump victory. Fast forward to the eve of the final ballots being cast in 2020, and seemingly the only thing higher than Joe Biden's poll position is the share of the electorate that believes pre-election polls are meaningless. According to the FiveThirtyEight election forecast, Joe Biden currently has an 89% chance of winning the Electoral College vs. President Trump's 11%. Political betting markets, such as PredictIt, allow researchers to quantify just how much skepticism surrounds polling data. Implied betting odds suggest that Biden has roughly a 62% chance of victory compared to Trump's 38%, a difference of 27 percentage points between the two forecasts. Using history and theory as a guide, we try to make sense of these disparities and offer a sober take on which method better represents reality.
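For readers who want to see how "implied betting odds" can be derived, the following is a minimal sketch, not a description of how the figures above were produced. It assumes each contract pays $1 if its candidate wins, and it rescales the prices so they sum to one, which is one common convention. The prices shown are hypothetical and were chosen only so the result lands near the 62%/38% split cited above.

```python
def implied_probabilities(prices):
    """Rescale contract prices (dollars per $1 payout) so they sum to 1."""
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("prices must sum to a positive number")
    return {name: price / total for name, price in prices.items()}


# Hypothetical contract prices, for illustration only (not actual market quotes).
prices = {"Biden": 0.65, "Trump": 0.40}
probs = implied_probabilities(prices)

# Gap between the model forecast cited in the article (89%) and the
# market-implied probability, expressed in percentage points.
model_biden = 0.89
gap_points = (model_biden - probs["Biden"]) * 100

print({name: round(p, 3) for name, p in probs.items()})
print(round(gap_points, 1))
```

Raw prices on a market like this often sum to more than $1, which is why some form of rescaling is needed before treating them as probabilities.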
Across the fifty U.S. states, the District of Columbia, two congressional districts in Maine, and three congressional districts in Nebraska, a total of 56 simultaneous races combine to award electors and select the presidential winner. Weighting all 56 races equally, the FiveThirtyEight model gives Joe Biden a win probability that is, on average, 5.6 percentage points higher per contest than PredictIt's implied odds. The differences are larger in key battleground states. Topping the list is Pennsylvania, where FiveThirtyEight estimates that President Trump has just over a 14% chance of winning the state, 25 percentage points lower than the implied PredictIt betting odds of 39%. As of this writing, eleven states have estimated probability differences of more than 10 percentage points, and in all of them Trump's odds are higher on PredictIt than on FiveThirtyEight: Pennsylvania, Michigan, Wisconsin, Florida, Minnesota, Arizona, North Carolina, Nevada, Georgia, New Hampshire, and Ohio. Unsurprisingly, the list reads a lot like a final-week campaign itinerary for both candidates.
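The comparison described above (an equal-weighted average gap plus a list of races where the gap exceeds 10 percentage points) can be sketched as follows. Only the Pennsylvania figures come from this article, and they are rounded; the two placeholder races are hypothetical values included solely to make the example run, and the variable names are our own.

```python
# Trump win probability in percent: (FiveThirtyEight, PredictIt implied).
races = {
    "Pennsylvania": (14.0, 39.0),   # rounded figures cited in the article
    "Placeholder A": (50.0, 55.0),  # hypothetical, for illustration only
    "Placeholder B": (2.0, 8.0),    # hypothetical, for illustration only
}

# Gap in percentage points; positive means the market is more bullish on Trump.
gaps = {name: market - model for name, (model, market) in races.items()}

# Equal weighting: every race counts the same, regardless of electoral votes.
average_gap = sum(gaps.values()) / len(gaps)

# Races where the two forecasts differ by more than 10 points, largest first.
large_gaps = sorted(
    (name for name, gap in gaps.items() if abs(gap) > 10),
    key=lambda name: -abs(gaps[name]),
)

print(average_gap)
print(large_gaps)
```

Equal weighting treats a single Nebraska district the same as Pennsylvania, so the average says little about the Electoral College outcome; it is a rough summary of how far apart the two sets of odds sit.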
There are three conceivable explanations for why the two sets of probabilities diverge as much as they do in battleground states:
Sampling / Polling Error
FiveThirtyEight Modelling Error
PredictIt Pricing Inefficiencies / Investor Herding
While we won’t know which explanation is best until after November 3rd (or possibly later), looking back at how these platforms forecast the 2016 race may offer clues about their accuracy in the week ahead.
2016 In Review
In its final 2016 election forecast, the FiveThirtyEight model estimated that then-candidate Trump had a little less than a 29% chance of winning the Electoral College. For the national popular vote, the forecast had Clinton capturing 48.5% of the vote vs. 44.9% for Trump. Despite perceptions of inaccuracy, these estimates came very close to the final tally (well within a single standard deviation). In total, Hillary Clinton won 48.0% and Donald Trump won 45.9% of all votes cast, so both results landed within one percentage point of the FiveThirtyEight forecast.
PredictIt, on the other hand, had final implied electoral winning probabilities for Clinton and Trump at just above 79% and slightly below 21%, respectively. While both platforms had Trump as the clear underdog, FiveThirtyEight's model suggested the likelihood of an Election Day upset was higher than the betting odds would reflect.
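One way to put a number on that difference is a Brier score, the squared gap between a forecast probability and what actually happened (1 if the event occurred, 0 if not), where lower is better. This is a standard scoring rule that we are introducing here for illustration; it is not part of either platform's methodology. The probabilities below are the rounded 2016 figures quoted above (roughly 29% and 21% for a Trump win).

```python
def brier_score(forecast_prob, outcome):
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (forecast_prob - outcome) ** 2


trump_won = 1  # Trump won the Electoral College in 2016

# Rounded final 2016 Trump win probabilities quoted in the article.
fivethirtyeight_score = brier_score(0.29, trump_won)
predictit_score = brier_score(0.21, trump_won)

print(round(fivethirtyeight_score, 4))
print(round(predictit_score, 4))
```

A score computed from a single event is a very thin basis for ranking forecasters. A fairer comparison would average the score across many races, such as all of the state-level contests.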
FiveThirtyEight's model tended to better account for state-level surprises as well. In Pennsylvania and Wisconsin, FiveThirtyEight gave candidate Trump a win probability 2.3 and 3.0 percentage points higher than PredictIt did, respectively. PredictIt's implied betting odds did give Trump a slightly better chance in Michigan, at 2.7 percentage points above FiveThirtyEight's estimate. The state with the starkest difference was Florida. Betting odds put the probability of a Trump win in the Sunshine State at just below 33%. FiveThirtyEight's model instead had Florida in the tossup category, giving candidate Trump about a 45% chance of winning.
Though neither platform ever treated a 2016 Trump victory as the likely outcome, in both the state-level and national races the FiveThirtyEight model routinely did a better job of identifying competitive contests that pundits (and campaigns, for that matter) had otherwise taken for granted. Returning to the three possible explanations for the wide forecast gaps in battleground states, the relative outperformance of the FiveThirtyEight model in 2016 makes modeling error the less likely culprit. Polling data, by most accounts, missed the mark in 2016. It wasn't just that the polls were off; it's that they tended to be wrong in similar ways to one another. After four years of conducting statistical autopsies and recalibrating methodologies, pollsters claim to have made significant adjustments for 2020. Among these changes is the widespread adoption of weighting samples by education level. Whether these adjustments succeed in reducing correlated errors across polls and, by extension, in making pre-election polls more accurate remains to be seen.
Market herding, or the tendency for many investors to think in similar ways and distort prices, is likely a partial cause of the battleground forecast gaps. Four years ago, PredictIt markets overstated the likelihood of a Clinton victory by significant margins in Florida and at the national level. Further, market functionality relies on depth. A wide range of investors with varying opinions of fair value is a necessary precondition for efficient risk pricing. The small size of PredictIt's state-level election markets, which all tend to be less than $2.5 million in total market cap, makes them more vulnerable to material mispricing.
Come election night, it should become clear whether statistical or market-based prediction models had a better pulse on the realities of the 2020 electorate. We here at Chandan will follow up with an addendum to this piece in the days that follow, analyzing where the forecast models were most and least accurate. Until then, we leave you with the wisdom of George Box: "All models are wrong, but some are useful."