By Patrick Murray
Is public opinion polling accurate? The answer to that question is driven in large part by whether one specific subset of polling – preelection “horse race” polling – is perceived as accurate. On that front, the polling industry has hit some bumps recently, particularly with the 2020 election. An evaluation of preelection polls conducted by the American Association for Public Opinion Research (AAPOR) could not identify a clear culprit behind that year’s polling miss. The most likely explanation is differential nonresponse – i.e., that a cohort of Donald Trump supporters did not participate in the polls out of mistrust of political institutions.
In other words, the poll samples may have in fact looked “right” from the perspective of age, gender, race, and education, but within each group was a small overrepresentation of Democratic support because of nonresponse from Trump backers across these demographics. Keep in mind that these errors are small, and in fact would not be noticeable in most public polls. But when taken together in the context of the presidential election, these group errors resulted in a 4-point skew in the average polling sample.
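To make that mechanism concrete, the sketch below walks through a toy calculation. Every number in it – the group shares, the support levels, and the assumption that Trump supporters respond at 92% the rate of Biden supporters – is a hypothetical illustration, not a figure from the AAPOR report or from Monmouth’s data:

```python
# Hypothetical illustration: a sample that matches the population on a
# demographic split can still lean Democratic if Trump supporters within
# every group are slightly less likely to respond.

# Two illustrative groups: population share and true Democratic support in each.
population = {
    "college":     {"share": 0.40, "dem_support": 0.60},
    "non_college": {"share": 0.60, "dem_support": 0.45},
}

# Illustrative assumption: Trump supporters respond at 92% the rate of
# Biden supporters in every demographic group.
RELATIVE_GOP_RESPONSE = 0.92

true_dem = sum(g["share"] * g["dem_support"] for g in population.values())

sample_dem = 0.0
for g in population.values():
    dem_weight = g["dem_support"]                                # responding Dem supporters
    gop_weight = (1 - g["dem_support"]) * RELATIVE_GOP_RESPONSE  # slightly fewer responding GOP supporters
    sample_dem += g["share"] * (dem_weight / (dem_weight + gop_weight))

print(f"True two-party Dem share:     {true_dem:.1%}")    # 51.0%
print(f"Dem share among respondents:  {sample_dem:.1%}")  # ~53.0%
print(f"Resulting skew in the margin: {(sample_dem - true_dem) * 200:.1f} points")  # ~4 points
```

Note that in this toy example each group’s share of the sample still matches the population exactly, so weighting on age, gender, race, or education would not correct the skew.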
We examined the polling Monmouth conducted in six states prior to the 2020 election. We validated 2020 turnout for everyone selected for the sample – both those who participated in the survey and those who did not – in an attempt to find patterns that could explain the nonresponse problem. This included demographic and vote history data for tens of thousands of voters in each state. Even with this enhanced data set, we could not find a “silver bullet” in the voter files that distinguishes a cohort of likely Trump supporters who refused to participate in the polls. Our review is summarized in a presentation prepared for the 2022 AAPOR conference here.
Was this suspected nonresponse problem also behind the polling miss in the New Jersey gubernatorial election the following year? The short answer is no. It turns out – via post-election voter verification – that the underlying sample in the New Jersey polling was representative of the full voter population. The real culprit was an unusually large partisan skew in who actually turned out – a skew that was not captured by likely voter models. This skew was not present in the Virginia electorate that year, which is why the polls in that state were better barometers of the actual outcome there. In other words, the samples of registered voters in both states were accurately representative of all registered voters there. The problem in New Jersey was likely voter modeling. [This is also reviewed in our presentation.]
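A similar sketch illustrates the turnout problem: even a perfectly representative sample of registered voters will miss the result if the likely voter model assumes the wrong partisan mix of who actually shows up. Again, every number here is an illustrative assumption, not the actual New Jersey data:

```python
# Hypothetical illustration: a representative registered-voter sample combined
# with a likely voter model that misses a partisan turnout gap.

registered = {"dem": 0.54, "gop": 0.46}  # illustrative partisan split of registered voters

# Turnout rates assumed by the likely voter model vs. what actually happens.
modeled_turnout = {"dem": 0.65, "gop": 0.65}  # model expects no partisan turnout gap
actual_turnout  = {"dem": 0.62, "gop": 0.69}  # one party turns out at a higher rate

def dem_margin(turnout):
    """Democratic margin among those who vote, given per-party turnout rates."""
    dem = registered["dem"] * turnout["dem"]
    gop = registered["gop"] * turnout["gop"]
    return (dem - gop) / (dem + gop)

print(f"Margin implied by the likely voter model: D+{dem_margin(modeled_turnout) * 100:.1f}")  # D+8.0
print(f"Margin in the actual electorate:          D+{dem_margin(actual_turnout) * 100:.1f}")   # D+2.7
```

In this toy example the sample itself is not the problem at all; the entire miss comes from the turnout assumption built into the likely voter model.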
Prior to 2016, our first suspect in an election polling error would have been the likely voter model – which is exactly what we found to be the case here. But since 2016, we have to contend with the notion that every polling error is due to a systematic problem.
Thus, we are left with a dilemma. Polling is a social science methodology with an acknowledged potential for error. This means that polls can and do miss the mark, either individually or as a group, from time to time. The magnitude of those misses can be as much perceptual as statistical. For example, an election poll that has Candidate A ahead by 2 points in a race that Candidate B actually wins by 3 points is perceived by the public as more erroneous than a poll with Candidate A ahead by 10 points in a race she wins by 15 points – even though the absolute error is the same in both instances (5 points off the final margin).
Public pollsters, unlike campaign pollsters, have a mission to hold up a mirror to the public on important political, cultural, and behavioral issues of the day – think, for example, of the prevalence of and rationale for COVID vaccine opposition. While it is important to survey voters about elections, the “horse race” question is fraught with problems, and likely voter modeling pushes polling methodology to do something it is not designed to do – i.e., predict future behavior. It is a testament to the polling field that election polls tend to get things right more often than not.
The 2020 election aside, there is little evidence that public opinion polling is measurably less accurate today than it was a decade or a generation ago. The polling industry has been proactive, and largely successful, in addressing challenges as they arise (such as declining response rates, the growth in cell phone use, etc.). But in an era of distrust, we have to be even more vigilant and transparent.
In that spirit, Monmouth is making available the full datasets for its final state election polls in 2020 and 2021. The 2020 files include appended voter file information for both respondents and nonrespondents. As mentioned earlier, the Monmouth team did not identify any “silver bullet” corrective – either for the 2020 nonresponse error or the 2021 likely voter model error. We hope that by sharing these datasets, though, other researchers may be able to further our understanding of challenges in polling these elections.