On October 18, Sam Wang thought it was over. So much so that he promised - on Twitter, for the world to see - that he'd eat a bug if Donald Trump earned more than 240 electoral votes.
On November 12, he was eating a bug on live television.
Wang, a top election forecaster and professor at Princeton University, had given Democratic nominee Hillary Clinton a 98% to 99% chance of winning in the days leading up to the election. Of course, it didn't quite end up that way.
"To state the obvious, this is not random sampling error because it was shared across all pollsters in the same direction. This is some kind of large systematic error, far larger than typically occurs in a presidential election year," Wang said.
For the second consecutive election cycle, the polling and prognostication industry is reckoning with how it got it wrong - and why we're talking about President-elect Donald Trump when virtually all available data pointed to President-elect Hillary Clinton.
National polls weren't too far off from the eventual result - after all, Clinton is on her way to a near 2-point victory in the popular vote.
But polls in key battleground states like Michigan, Wisconsin, and Pennsylvania - states Republicans hadn't won at the presidential level since the 1980s - belied the state of the race. Clinton had leads so healthy in Wisconsin, for instance, that she entered Election Day with a 6.5-point average advantage in the state.
She lost there by a point. She lost the trio of states. And she lost the presidency.
"Polls might not be capable of predicting elections," Patrick Murray, the head of Monmouth University's polling institute, told Business Insider.
Murray's final Pennsylvania poll showed Clinton with a 4-point lead and a 4.9-percentage-point margin of error - an error bar that still wasn't wide enough to capture the 1.2-point margin by which Trump would win the state.
"It's an imprecise science," he said. "It's a science with a margin."
Given the public's distrust of statistics and polling, heightened by Trump's months of railing against "dishonest" polls and media outlets, Murray said the election's result only adds to how "very hard" it is "to fight against that."
"Do we ever win back the public trust, or in fact, should you even try?" he said. "What did we miss? We know it's systematic. Can we correct for it?"
But Murray already had developed a theory for what happened: "Non-response among a major core of Trump voters."
What happened?
Murray's theory looked as if it had some legs.
For instance, in Pennsylvania and Michigan, Clinton's share of the final vote fell within 2 points of what the polling average had predicted for her. In Wisconsin, Clinton hit almost exactly the level of support the polls had forecast, outperforming the polling average by just 0.1 points.
But Trump outperformed his polling average by 4.5 points in Pennsylvania, 5.6 points in Michigan, and an unheard-of 7.6 points in Wisconsin. Underperformances by Libertarian nominee Gary Johnson and Green Party nominee Jill Stein may have contributed to that result, but they couldn't account for such a large discrepancy.
"It's not that they lied to pollsters, but they didn't even pick up the phone," Murray said.
In his final Pennsylvania poll, Murray said, he was accurate around the Philadelphia metro area through Scranton and in the Pittsburgh metro area. But he missed the central, rural part of the state by 20 points: He had polled a 10-point Trump lead while Trump ended up winning that area by 30 points.
"That's probably the problem across all swing states, and that's the first state that I looked at because it's the one I did most recently," he said. "We were getting metro areas correct, but in the non-metro areas, and this will be true in Wisconsin and Michigan as well, that, a certain type of Trump voter seemed more [unwilling] to talk to pollsters. And, it plays into this whole anti-establishment sense."
"Just a few points off ... is what will cause the entire narrative to be off," he continued. "This wasn't just, 'You got this state poll wrong or this poll didn't go this way.' This was a systematic miss of all the polls in the same direction. And I'm going to guess that on metro areas it was spot on.
"We were missing something in rural areas."
Methods don't matter
Murray said it "didn't matter" what methodology was being used to conduct the poll, whether it was random dialing, live telephone, robocalls, or online polling.
"The people who don't talk to us are significantly different than the people that do talk to us that they can throw a poll off by 4 or 5 points," he said. "And that is a serious methodological issue. Because you can't wiggle your way out of it."
"If this is true, it will be a real challenge for the polling industry to figure out how do we account for what seems to be an anti-establishment segment of the population who won't talk to pollsters," he continued. "And if politically, they skew one way or another, then we're really missing them."
Tom Jensen, the director of the Democratic-leaning firm Public Policy Polling, pinned some of the blame on the response rate for telephone polling. He told Business Insider there's "no doubt that it makes life harder" when response rates are lower than 5% and that weighting answers "to try and make up for the people we're missing" only goes "so far."
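The weighting Jensen describes can be sketched in a few lines. The demographic split and support numbers below are hypothetical; the point is that reweighting can only correct for groups that are overrepresented among respondents, not for voters who are missing entirely.

```python
# A minimal post-stratification sketch: weight each respondent group so the
# sample's demographic mix matches assumed population targets.
# All categories and numbers here are hypothetical.

population_share = {"college": 0.35, "non_college": 0.65}  # assumed electorate
sample_share = {"college": 0.50, "non_college": 0.50}      # who actually answered

# Weight = how much each respondent in a group should count
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Hypothetical Clinton support among the people who responded in each group
clinton = {"college": 0.55, "non_college": 0.45}

unweighted = sum(sample_share[g] * clinton[g] for g in clinton)
weighted = sum(sample_share[g] * weights[g] * clinton[g] for g in clinton)

print(f"Unweighted Clinton share: {unweighted:.1%}")  # 50.0%
print(f"Weighted Clinton share:   {weighted:.1%}")    # 48.5%

# The catch Jensen points to: this assumes non-responders in each group vote
# like responders in that group. If Trump-leaning voters within every cell
# simply never pick up the phone, no amount of reweighting recovers them.
```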
Speaking specifically about Michigan, a state where Jensen's final poll showed Clinton with a 5-point lead, the pollster said the race showed how "the dynamics" of polling individual states can shift from cycle to cycle.
"In 2012 it was one of the states where Democrats were underestimated the most," he said. "This year it was one of the states where they were overestimated the most."
Two things happened in the state, he believes: First, pollsters underestimated the drop in black-voter turnout around Detroit from President Obama's 2008 and 2012 elections. Second, the pool of white voters interviewed wasn't quite as Republican as the overall white electorate.
In Wayne County, home of Detroit, Clinton's margin of victory over Trump was 93,000 votes fewer than Obama's margin over Mitt Romney in 2012. That was more than enough to give Trump a thin advantage.
"I think across the board it's clear that pollsters had trouble sufficiently measuring the extent to which rural white voters were going to turn out and pretty universally support Republicans," he said.
"So there aren't easy answers," he continued. "It's easier to identity the what of went wrong with polls than the why."
Jensen said he repeated to anyone who'd ask throughout the general election that Clinton was up by 3 or 4 points, but that it was not "that unusual" for polls to be off by that amount.
"And if they're all off by that in the same direction Trump could win," he said. "Unfortunately, the warning came to fruition this time."
Steve Mitchell, whose Fox 2/Mitchell poll in Michigan found Clinton with an identical 5-point lead heading into Election Day, chalked up his miss to a very specific reason: human error.
Mitchell was one of a number of pollsters who had already reckoned with a surprise in the Wolverine State earlier in the cycle. Discussing how Michigan pollsters, himself included, missed Vermont Sen. Bernie Sanders' shocking victory in the state's Democratic primary, he told Business Insider in March that he'd "only been wrong like this twice in my life."
In the Clinton-Trump matchup, Mitchell said he changed his methodology right before his final poll, placing a greater weight on younger voters.
"As a result of that, I just misweighted the data and should have had it much closer," he told Business Insider. "It was a mistake that I made. I'm a pollster. I take full responsibility for the mistakes that I made. We had the trend, we had the trend before anybody else. I have a feeling if we continued to do it the same way we had, or had I weighted it in the same way I had weighted it prior to that, we would've had it."
Forecasting
To Wang, who runs the Princeton Election Consortium, the misses in Pennsylvania, Wisconsin, and Michigan were "similar" to results across the nation.
He found the median state-level presidential polling error to be 4 percentage points. Polls in close Senate races were off, too, favoring Democrats by a median of 6 percentage points.
"What happened? Don't know yet. With a close race, it's easy to pin the error on any number of causes. But all of those causes add up to a single number - the final result," he said.
Of course, if just 1% of Trump voters in the three highlighted states had voted for Clinton instead, "we would be having a very different conversation," he added.
Wang said a number of steps must be taken to attempt to correct the miss, including reviewing whether pollsters adequately captured hard-to-poll demographics - such as blue-collar white voters - and beginning to understand how to capture the leanings of undecided voters.
"Based on cognitive science, these voters might be mentally committed to a choice - they just aren't able to verbalize it," he said. "We humans are like this in all kinds of domains, from what to have for lunch to who to marry. Seems like indirect approaches like social media and web search data might be a new way to predict voter behavior."
When looking at the three "Rust Belt" states, Scott Tranter told Business Insider it's hard to know whether the polling there "really missed" or if there wasn't enough polling to catch a "fast moving" trend at the end.
Tranter is the co-founder of Optimus, an analytics firm that did work for Sen. Marco Rubio's 2016 presidential campaign and Wisconsin Sen. Ron Johnson's senatorial campaign - the latter of which ended in an upset win nearly as unexpected as Trump's.
In Pennsylvania, for example, Clinton held a lead of nearly 9 points in the RealClearPolitics average with less than a month to go until the election, only to watch the lead get cut all the way down to less than 2 points by Election Day.
Prediction models, which universally forecast the race for Clinton, didn't account for something simple that would likely have given Trump a much greater chance, Tranter said.
Each model had given Trump individual odds of winning each of the three states; multiplying those probabilities together made a three-state sweep look highly unlikely. But treating the states as independent, Tranter said, missed the fact that if Trump won just one of the three, he was far more likely to win the other two.
This should have, in turn, increased his odds of being able to pull off the sweep.
"Something that may have not been taken into account in public models is that if one state goes Trump it adjusts the probability that another state goes Trump," he said. "I wouldn't be surprised to find some correlation between Wisconsin going Trump thus increasing the odds Pennsylvania goes Trump."
"Correlation is not causation, but it can be explanatory and should be accounted for," he added.
The lessons
Polling "is not dead," says Matt Oczkowski of Cambridge Analytica, the analytics team that worked with Trump's campaign and the Republican National Committee during the presidential race.
"People who say polling is dead are foolish," he said. "Polling is not dead, it's necessary for what we do. We need it for research. It's the way polling is conducted that has to be changed. What we did goes to show that polling needs to be combined with data science to be successful."
Oczkowski said pollsters "making guesses on old intuitions and old methodologies isn't going to work anymore."
"Understanding who you are polling, understanding what the map looks like, what the electorate looks like is so important, and that's why so much of the public polling was off," he continued. "So proper investment in doing more polling but coupling that with a data science program is really going to be the future."
Problems at the state level this time around were no shock to Michael Ramlet, CEO of the online-polling firm Morning Consult. He said the results were a direct consequence of "old-school" polling methods.
"What happened is you've got a lot of the usual suspects like a lot of the universities, like Quinnipiac and others, that do a lot of state polling, and then you've got the Wall Street Journal, NBC/Marist poll," he said. "All of those polls usually have one thing in common - they're using a live-telephone methodology."
Morning Consult had conducted a study in December 2015 that produced evidence for a theory of "shy" Trump voters who experience social desirability bias during telephone interviews - that is, they don't want to admit they're voting for Trump. Ramlet felt that played out in the general election.
"I think that played into greater degree of effect in Midwestern states where, generally speaking, pretty educated populace, above average income nationally," he said. "And what we're seeing is there is probably some degree of that desirability issue combined with some outdated methodology. I don't think we're shocked that the state polls got it wildly wrong, just surprised at the widespread degree that they got it wrong, giving a false sense to the prognosticators."
"The biggest failure of the larger polling community," he said, "was that they were implying that this race was not as close as it was."
Maxwell Tani contributed to this report.