
Are Nate Silver’s Pollster Ratings “Done Right”?

This originally appeared as a guest column on Pollster.com.

The motto of Nate Silver’s website, www.fivethirtyeight.com, is “Politics Done Right.” I’m not sure that his latest round of pollster ratings lives up to that moniker.

As most poll followers know, Nate shot to fame during the 2008 election, taking the statistical skills he developed to predict baseball outcomes and applying them to election forecasting. His approach was pretty accurate in that presidential race (although it’s worth noting that other poll aggregators were similarly accurate).

Nate recently released a new set of pollster ratings that has raised some concerns among the polling community.

First, there are some questions about the accuracy of the underlying data he uses. Nate claims to have culled his results from 10 different sources, but he seems not to have cross-checked those sources or searched original sources for verification.

I asked for Monmouth University’s poll data and found errors in the 17 poll entries he attributes to us – including six polls that were actually conducted by another pollster before we partnered with the New Jersey Gannett newspapers, one omitted poll that should have been included, two incorrect election results, and one incorrect candidate margin. [Nate emailed me that he will correct these errors in his update later this summer.]

Mark Blumenthal also noted errors and omissions in the data used to arrive at Research2000’s rating. I found evidence that suggests these errors may be fairly widespread.

In the case of prolific pollsters, like Research2000, these errors may not have a major impact on the ratings. But just one or two database errors could significantly affect the rating of a pollster with a relatively limited track record – and that describes most of the list: 157 of the 262 pollsters have fewer than 5 polls to their credit.
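To see why, consider the arithmetic with some made-up numbers. Here is a minimal sketch in Python (the error values are hypothetical, not drawn from Nate’s database):

    # How one mis-entered result distorts the average error for a low-volume pollster.
    correct_errors = [1.0, 2.0, 1.5]   # three polls, candidate-margin errors in points
    garbled_errors = [1.0, 2.0, 6.5]   # same polls, but one entry keyed in wrong

    print(sum(correct_errors) / len(correct_errors))   # 1.5
    print(sum(garbled_errors) / len(garbled_errors))   # about 3.2

With only three polls on record, a single bad entry more than doubles the pollster’s apparent error; a firm with a hundred polls would barely notice the same mistake.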

Some observers have called on Nate to demonstrate transparency in his own methods by releasing that database. Nate has refused to do this (with a dubious rationale that the information may be proprietary) – but he does now have a process in place for pollsters to verify their own data. [If you do, make sure to check the accuracy of the actual election results as well.]

I’d be interested to see how many other pollsters find errors in their data. But the issue that has really generated buzz in our field is Nate’s claim that pollsters who either were members of the National Council on Public Polls or had committed to the American Association for Public Opinion Research (AAPOR) Transparency Initiative by June 1, 2010, exhibit superior polling performance. For these pollsters, he awards a very sizable “transparency bonus” in his latest ratings.

One of the obvious problems with his use of the bonus is that the June 1 cut-off is arbitrary. Those pollsters who signed onto the initiative by June 1, 2010 were either involved in the planning or happened to attend the AAPOR national conference in May. A general call to support the initiative did not go out until June 7 – the day after Nate’s ratings were published.

Thus, the theoretical claim regarding a transparency bonus is at least partially dependent on there also being a relationship between pollster accuracy and AAPOR conference attendance. Others have remarked on the apparent arbitrariness of this “transparency bonus” cutoff date. Nate claims that regardless of how a pollster made it onto the list, there is statistical evidence that these pollsters are simply better at election forecasting. I don’t quite see it.

His methodology statement includes a regression analysis of pollster ratings that is presented as evidence for using the bonus.

The problem is that even in this equation, the transparency score just misses most researchers’ threshold for statistical significance (p<.05). More to the point, his model – using dummy variables to identify “transparent” pollsters, partisan pollsters, and internet pollsters – is incomplete. The adjusted R-square is .03. In other words, only 3% of the total variance in pollster raw scores (i.e. error) is explained by the model.
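For readers who want to see the shape of that kind of model, here is a rough sketch in Python. It is my own reconstruction of a regression of raw scores on three dummy variables – the data file and column names are stand-ins, not Nate’s actual database or code:

    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per pollster; 'rawscore' is the error measure, the rest are 0/1 flags.
    ratings = pd.read_csv("pollster_ratings.csv")   # hypothetical file

    model = smf.ols("rawscore ~ transparent + partisan + internet", data=ratings).fit()

    print(model.summary())       # p-values on each dummy variable
    print(model.rsquared_adj)    # the adjusted R-square reported above as .03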

Interestingly, of the three variables – transparency, partisan, and internet – only partisan polling shows a significant relationship. Yet he decided to calculate different benchmarks that reward transparent pollsters and penalize internet pollsters (even though the latter variable was based on only 4 cases and was not statistically significant). And oddly, he does not treat partisan pollsters any differently than other pollsters, even though this was the only variable with a significant relationship to raw score.

I decided to look at this another way, using a simple means analysis. The average error among all pollsters is +.54 (positive error is bad, negative is good). Among “transparent” pollsters it is -.63 (se=.23) and among other pollsters it is +.68 (se=.28).

But let’s isolate the more prolific pollsters, say the 63 organizations with at least 10 polls to their names who are included in Nate’s first chart. Among these pollsters, the 19 “transparent” ones have an average score of -.32 (se=.23) and the other 44 pollsters average +.03 (se=.17). The difference is not so stark now.

Firms with fewer than 10 polls to their credit have an average error score of -1.38 (se=.73) if they are “transparent” (all 8 of them) and a mean of +.83 (se=.28) if they are not. That’s a much larger difference.
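Anyone with the verified database could reproduce this kind of breakdown in a few lines. A sketch, again with stand-in column names of my own:

    import pandas as pd

    ratings = pd.read_csv("pollster_ratings.csv")   # hypothetical file, one row per pollster

    for label, subset in [
        ("all pollsters", ratings),
        ("10 or more polls", ratings[ratings["num_polls"] >= 10]),
        ("fewer than 10 polls", ratings[ratings["num_polls"] < 10]),
    ]:
        print(label)
        print(subset.groupby("transparent")["rawscore"].agg(["mean", "sem", "count"]), "\n")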

I also ran some ANOVA tests for the effect of the transparency variable on pollster raw scores at various levels of polling output (e.g. pollsters with more than 10 polls, pollsters with only 1 or 2 polls, etc.). The F values for these tests ranged from only 1.2 to 3.6, and none was significant at p<.05. In other words, there is more error variance within the two separate groups of transparent versus non-transparent pollsters than there is between the two groups.
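The test itself is straightforward. Here is a sketch of one of those comparisons in Python, using scipy’s one-way ANOVA – again with hypothetical column names rather than Nate’s actual data:

    from scipy.stats import f_oneway
    import pandas as pd

    ratings = pd.read_csv("pollster_ratings.csv")    # hypothetical file
    band = ratings[ratings["num_polls"] >= 10]       # one band of polling output

    f_stat, p_value = f_oneway(
        band.loc[band["transparent"] == 1, "rawscore"],
        band.loc[band["transparent"] == 0, "rawscore"],
    )
    print(f_stat, p_value)   # a non-significant F: within-group variance swamps the group difference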

I can only surmise that the barely significant relationship between the arbitrary transparency designation and polling accuracy is pointing to other, more significant factors, including pollster output.

Consider this – 70% of the “transparent” pollsters on Nate’s list have 10 or more polls to their credit, whereas only 19% of the “non-transparent” ones do. In other words, Nate’s “bonus” is actually a sizable penalty levied against the more prolific pollsters in the latter group. The “non-transparent” group includes a large number of organizations with only a handful of polls to their name – i.e. pollsters who are prone to greater error.
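That confound is easy to check with a simple cross-tabulation. A sketch, using the same stand-in columns as above:

    import pandas as pd

    ratings = pd.read_csv("pollster_ratings.csv")    # hypothetical file
    ratings["prolific"] = ratings["num_polls"] >= 10

    # Row percentages: share of prolific pollsters within each transparency group.
    print(pd.crosstab(ratings["transparent"], ratings["prolific"], normalize="index"))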

For comparison, I re-ran Nate’s PIE (Pollster Introduced Error) calculation using a level playing field for all 262 pollsters on the list. I set the error mean at +.50 (which is approximately the mean error among all pollsters).

Comparing the relative pollster ranking between the two lists produced some intriguing results. The vast majority of pollster ranks (175) did not change by more than 10 spots on the table. Another 67 had rank changes between 11 to 40 spots on the two lists; 11 shifted by 41 to 100 spots, and 9 pollsters gained more than 100 spots in the rankings because of the transparency bonus. Of this latter group, only 2 of the 9 had more than 15 polls recorded in the database.
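For those who want to replicate the comparison, the rank-shift tally amounts to ranking the pollsters under each version of PIE and bucketing the differences. A sketch, assuming the two PIE columns (hypothetical names) have already been computed:

    import pandas as pd

    ratings = pd.read_csv("pollster_ratings.csv")             # hypothetical file
    rank_published = ratings["pie_published"].rank(method="min")
    rank_flat = ratings["pie_flat_mean"].rank(method="min")   # PIE recomputed with a flat +.50 mean

    shift = (rank_published - rank_flat).abs()
    buckets = pd.cut(shift, bins=[-1, 10, 40, 100, 300],
                     labels=["0-10", "11-40", "41-100", "100+"])
    print(buckets.value_counts().sort_index())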

Nate says that the main purpose of his project is not to rate pollsters’ past performance but to determine probable accuracy going forward. But one wonders if he needs to go this particular route to get there. Other aggregators use less elaborate methods – including straightforward mean scores – and seem to be just as accurate.

His methodology statement is about 4,800 words (with 18 footnotes). It reminds me of a lot of the techies I have worked with over the years – the kind of person who will make three left turns to go right.

This time I think Nate may have taken one left turn too many. We’ll know in November.