One thing a week’s vacation from blogging helps you get perspective on is the Gallup tracking poll. On August 1, my last day at The Atlantic, it was time for panic because McCain had tied things up. Then Obama started to regain ground, going up to a four-point lead. Then the race tightened again, then Obama opened up a five-point lead, and now it’s tightening again, but with Obama back to a smallish lead, having beaten back the strong challenge McCain was mounting around August 1. In short, McCain’s “Celebrity” ad and drilling attacks were working well, but when the McCain campaign went after Obama on the tire gauge thing, he came up with effective countermeasures and regained his lead.
Maybe.
Or maybe none of that happened. As everyone knows, there’s sampling error associated with polling. As a result, if you poll 1,000 people on August 1 and then poll 1,000 different people on August 2, you shouldn’t be surprised to see the results differ by several percentage points even in the absence of any change in underlying public opinion. Beyond that, doing one poll per day throughout a long campaign means you’d expect to see one or two relatively rare outlier results per month even under conditions of total stasis. And as Alan Abramowitz points out, if you look at the daily results, this is actually what you see: incredible volatility, with Obama’s lead oscillating violently around an average of 3-4 points. Since it’s not plausible that the public mood is really swinging anywhere near as rapidly as a naive reading of the Gallup daily results would suggest, people could see that this is basically statistical noise in a stable race.
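To make the sampling-error point concrete, here is a minimal Python sketch that simulates a race in which nothing changes. The assumptions are illustrative, not Gallup’s or Abramowitz’s: a fixed true split of 46% Obama to 43% McCain (a 3-point lead), 1,000 independent respondents per night, and no weighting or non-response. Every wiggle in the output is pure sampling error.

```python
import math
import random
import statistics

def simulate_daily_leads(days, n, p_obama, p_mccain, seed=0):
    """Simulate `days` independent polls of `n` respondents each.

    True opinion is held fixed, so all day-to-day movement is sampling
    error. Returns Obama's lead in percentage points for each day.
    """
    rng = random.Random(seed)
    leads = []
    for _ in range(days):
        obama = mccain = 0
        for _ in range(n):
            u = rng.random()
            if u < p_obama:
                obama += 1
            elif u < p_obama + p_mccain:
                mccain += 1
            # anything else is undecided/other and counts for neither
        leads.append(100 * (obama - mccain) / n)
    return leads

def lead_sd(p_obama, p_mccain, n):
    """Theoretical standard deviation (in points) of the sampled lead."""
    var = (p_obama + p_mccain - (p_obama - p_mccain) ** 2) / n
    return 100 * math.sqrt(var)

leads = simulate_daily_leads(days=60, n=1000, p_obama=0.46, p_mccain=0.43)
sd = lead_sd(0.46, 0.43, 1000)
outliers = [x for x in leads if abs(x - 3) > 1.96 * sd]
print(f"theoretical SD of daily lead: {sd:.2f} points")
print(f"average lead: {statistics.mean(leads):.2f} points")
print(f"observed range: {min(leads):.1f} to {max(leads):.1f}")
print(f"days outside the 95% band: {len(outliers)} of {len(leads)}")
```

Under these assumptions the nightly lead has a standard deviation of roughly 3 points, so about 5% of nights (one or two a month) fall outside the 95% band even though opinion never moves.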
But Gallup doesn’t report its daily results; it reports a multi-day rolling average. Abramowitz notes that if you report a ten-day rolling average, you get a chart where nothing happens: Obama maintains a flat lead of 3-4 points. Again, a stable race. But if instead of doing either of those things you do what Gallup actually does and report a three-day rolling average, you get these pleasant-looking peaks and valleys in the race. The change over time here is large enough in magnitude (unlike on the ten-day chart) but also slow enough in pace (unlike on the one-day chart) to be plausibly interpreted as public opinion shifting in response to events. And since the human mind is designed to recognize patterns and construct narratives, and since it suits the interests of campaign journalists to write narratives, people interpret the peaks and valleys of the three-day average as real shifts in public opinion. I have no way of proving that it’s just statistical noise and nothing’s really happening. But the “nothing happening” narrative is completely consistent with the data, and it’s telling that the conventional narratives collapse when the data is presented in different ways, whereas the “noise” narrative is consistent with multiple ways of displaying the information.
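The three ways of displaying the data can be compared in a short simulation. This sketch assumes a perfectly stable race (a true 3-point lead) with normally distributed nightly noise of about 3 points, roughly the sampling error of a 1,000-person poll; it is simulated, not Gallup’s numbers.

```python
import random
import statistics

def rolling_mean(values, window):
    """Trailing rolling average; the first full window ends at index window-1."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

rng = random.Random(1)
# Assumption: a perfectly stable race with a true 3-point lead, plus
# normally distributed sampling noise with a 3-point SD per daily poll.
daily = [3 + rng.gauss(0, 3) for _ in range(60)]

for window in (1, 3, 10):
    series = rolling_mean(daily, window)
    print(f"{window:>2}-day average: SD {statistics.stdev(series):.2f}, "
          f"range {min(series):.1f} to {max(series):.1f}")
```

Averaging k independent nights shrinks the noise by roughly a factor of the square root of k, so on these assumptions the 3-day line still carries about 1.7 points of noise and the 10-day line about 0.95.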
August 11th, 2008 at 9:58 am
This is a good point. One thing I’ve heard is that we wouldn’t see these ‘trends’ if it were just statistical noise. But since it’s a running average, there’s a strong correlation between polls on subsequent days, hence the “trends.” One simple way to test this would be to look at the individual daily data points. Is there any observable statistical correlation between day-to-day polls in the data set?
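One way to run that check, sketched on simulated pure noise around a fixed lead (not real Gallup data): compute the lag-1 autocorrelation of the daily numbers and of their rolling averages. Independent daily noise should give a value near zero, while a k-day average of the same noise has a built-in autocorrelation of about (k-1)/k, simply because consecutive releases share k-1 days.

```python
import random

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a series."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

def rolling_mean(values, window):
    """Trailing rolling average over `window` days."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]

rng = random.Random(2)
# Assumption: pure noise around a fixed 3-point lead (no real movement).
# A long series is used so the estimates settle down.
daily = [3 + rng.gauss(0, 3) for _ in range(2000)]
print(f"daily polls:    {lag1_autocorr(daily):+.2f}")
print(f"3-day average:  {lag1_autocorr(rolling_mean(daily, 3)):+.2f}")
print(f"10-day average: {lag1_autocorr(rolling_mean(daily, 10)):+.2f}")
```

So smooth-looking “trends” in a rolling average are expected even when the daily data are uncorrelated; the informative test is whether the raw daily results are autocorrelated.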
August 11th, 2008 at 10:08 am
By the way, Gallup itself has suggested that only a sustained change in the average of its tracking poll over several days would be a strong indication of a true change in public sentiment. And so Gallup is implicitly agreeing with the idea that we need to be looking at something closer to Abramowitz’s ten-day rolling average for reliable indications of such changes.
August 11th, 2008 at 10:22 am
And when the talking heads extrapolate from the tracking polls, it’s always interpreted as a negative for Obama.
If McCain draws close, it’s because the negative ads are working; when the next poll shows McCain down again, they never say the ads are no longer working. The logic is only applied to confirm the prejudice that Obama is in trouble.
August 11th, 2008 at 10:29 am
Well, I should hope nothing would happen over a 10-day rolling average; 90% of your data would be a control. Not that I’m knocking Abramowitz or defending the idea of tracking polls, but that’s sort of like saying that if you don’t eat anything for a week, you’ll be really damn hungry.
August 11th, 2008 at 10:32 am
Is designed!!
August 11th, 2008 at 10:41 am
Brien,
I don’t follow. If public sentiment actually did shift by X points at some point, then that shift would eventually show up in a ten-day rolling average; it would just take a while for the necessary number of post-shift days to accumulate. In that sense those additional post-shift days wouldn’t be a control; they would be providing confirmation of a real shift.
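A noise-free sketch of this point, using a hypothetical race in which the lead jumps permanently from 2 to 6 points: after k post-shift days an N-day trailing average has moved k/N of the way, so a 10-day average shows the full shift after 10 days and a 3-day average after 3.

```python
def trailing_average(values, end, window):
    """Average of the `window` days ending at index `end` (inclusive)."""
    return sum(values[end - window + 1 : end + 1]) / window

# Hypothetical noise-free race: a 2-point lead for 20 days, then a real,
# permanent jump to a 6-point lead.
true_lead = [2.0] * 20 + [6.0] * 20
shift_day = 20  # 0-based index of the first post-shift day

for window in (3, 10):
    # k = number of post-shift days inside the window
    seen = [trailing_average(true_lead, shift_day + k - 1, window)
            for k in range(1, window + 1)]
    print(f"{window}-day average after 1..{window} post-shift days:",
          [round(x, 1) for x in seen])
```

The longer window is slower, not blind: once it is filled with post-shift days it reports the new level exactly.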
August 11th, 2008 at 10:59 am
Matt, if you want to do a quick-n-dirty test of whether it’s typical statistical noise, you could take the one-day polling data for either candidate over a period where you think their true popularity level was stable and run a statistical normality test on it, such as the Jarque-Bera test. If it comes out looking like a normal distribution, chances are that the true mean (i.e., the actual popularity of the candidate) was stable over the period and all the fluctuations were just statistical noise.
If the true mean had shifted at any time in the campaign, it would probably show up either as skew in the data (if the candidate had just become more or less popular over the period and spent more time at either the higher or the lower popularity level) or as thinner tails (if the popularity wandered up and down, the standard deviation of the overall distribution would appear large, but the tails would still be governed by the smaller standard deviation implied by the daily polls’ margin of error). Either of those deviations from normality would show up in a Jarque-Bera test.
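A minimal pure-Python sketch of that test, using the standard statistic JB = n/6 * (S^2 + (K - 3)^2 / 4) and its asymptotic chi-square (2 degrees of freedom) p-value, exp(-JB/2). The two scenarios are simulated and deliberately exaggerated; they illustrate the mechanics, not Gallup’s data.

```python
import math
import random

def jarque_bera(xs):
    """Return (JB statistic, asymptotic p-value) for a sample.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness and K the
    sample kurtosis. Under normality JB is asymptotically chi-square with
    2 degrees of freedom, whose survival function is exp(-x/2).
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)

rng = random.Random(3)
# Hypothetical scenario A: a stable 3-point lead plus sampling noise.
stable = [3 + rng.gauss(0, 3) for _ in range(120)]
# Hypothetical scenario B: the true lead flips between 0 and 12 points
# every 15 days (exaggerated so the thin-tailed mixture is visible).
shifting = [(0 if (day // 15) % 2 == 0 else 12) + rng.gauss(0, 3)
            for day in range(120)]

for name, data in (("stable", stable), ("shifting", shifting)):
    jb, p = jarque_bera(data)
    print(f"{name:>8}: JB = {jb:.2f}, p = {p:.3f}")
```

One caveat: the p-value relies on asymptotic theory, and the test has limited power with a few months of daily polls, so a modest shift in the true mean can easily pass as “normal.”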
August 11th, 2008 at 11:07 am
DTM,
Rolling averages essentially compare data over x number of days, with everything else in between used as a control. So if you have a 3-day rolling sample, you have 1 day as a variable and 2 days as a control. If you use a 10-day sample, you’ve got 9 days as a control and 1 day as a variable. So yeah, if you up the amount dedicated to the control group from 66% of the sample to 90% of the sample, I’d imagine variations in the data get a lot flatter.
August 11th, 2008 at 11:36 am
Given the extent to which polls serve as a bulwark for functional democracy and accountable elections, the increasing sensationalization of polls as a means to drive news ratings rather than to reliably monitor public opinion is a very, very disturbing trend.
Polls should be boring. No one should care what they say.
August 11th, 2008 at 12:43 pm
Brien, I’m having a hard time figuring out what you’re talking about. A moving average is a low-pass filter that removes day-to-day fluctuations.
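For reference, the low-pass behavior can be quantified with the standard frequency response of an N-day moving average: a cycle of f cycles per day is kept with amplitude |sin(pi*f*N) / (N*sin(pi*f))|. Treating poll movement as sinusoidal cycles is a simplifying assumption of this sketch.

```python
import math

def moving_average_gain(freq, window):
    """Amplitude gain of a `window`-day moving average for a cycle of
    `freq` cycles per day (0 <= freq <= 0.5)."""
    if freq == 0:
        return 1.0  # a constant level passes through unchanged
    return abs(math.sin(math.pi * freq * window) /
               (window * math.sin(math.pi * freq)))

for period in (60, 14, 7, 2):  # length of one full cycle, in days
    f = 1 / period
    print(f"{period:>2}-day cycle: 3-day MA keeps {moving_average_gain(f, 3):.2f}, "
          f"10-day MA keeps {moving_average_gain(f, 10):.2f}")
```

Slow swings lasting a couple of months pass through almost untouched, while night-to-night flip-flopping (a 2-day cycle) is cut to a third by a 3-day average and removed entirely by a 10-day average.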
And in this case, we clearly WANT to flatten out day-to-day variations. “Real” changes in preferences are just too small to be sure of over a day or three, and unlikely to happen that quickly anyway (especially in these doldrums, with no major scandals, VP announcements, etc.). The results from the sample back on August 3 are still almost as interesting as the brand-new ones from August 11.
And while we seem to lose the ability to compare day-to-day trends, given the sample size we didn’t really have that anyway. At least the 10-day average would make that transparent.
If it helps, don’t look at it as a rolling average at all. Think of it as a more accurate poll with 10x the sample size that takes ten days to complete.
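A small check of the “one big poll” framing, using made-up nightly counts: when each night has the same number of respondents, the average of ten nightly shares equals the share from the pooled 10,000 interviews, and the margin of error shrinks by a factor of the square root of 10. The equivalence assumes equal nightly sample sizes and that opinion held still over the ten days.

```python
import math

def moe(p, n, z=1.96):
    """95% margin of error, in points, for a single proportion p from n respondents."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Hypothetical Obama counts from ten nightly samples of 1,000 each.
daily_counts = [455, 470, 448, 462, 451, 466, 459, 447, 468, 460]
n_per_day = 1000

avg_of_days = sum(c / n_per_day for c in daily_counts) / len(daily_counts)
pooled = sum(daily_counts) / (n_per_day * len(daily_counts))
print(f"average of daily shares: {avg_of_days:.4f}")
print(f"pooled share:            {pooled:.4f}")
print(f"MoE, one night (n=1,000):   +/-{moe(pooled, 1000):.1f} points")
print(f"MoE, ten nights (n=10,000): +/-{moe(pooled, 10000):.1f} points")
```

For a share near 50%, that is roughly +/-3.1 points for one night versus about +/-1 point for the ten-night pool.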
August 11th, 2008 at 1:32 pm
Brien,
I am also struggling to understand your claim. I think you may be thinking exclusively about day-to-day comparisons, in which case it is true that for a rolling average of X days, from one day to the next the two averages will have X-1 days of polling in common. But I am thinking about longer term trends as well, and obviously the more days that pass between your two points of comparison, the fewer days in common, until finally there will be no days in common at all.
So, the upshot is that the more days you include in your rolling average, the slower the poll will react to any legitimate shift in public sentiment. But nonetheless if there is a legitimate shift, eventually that shift should be fully reflected in the rolling average … it will just take a while.
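The overlap arithmetic described here can be written out directly. This sketch assumes each release is a trailing N-day window with one poll per day.

```python
def shared_days(window, gap):
    """Days shared by two trailing `window`-day averages whose end dates
    are `gap` days apart."""
    return max(window - gap, 0)

for gap in (1, 3, 5, 10, 14):
    s = shared_days(10, gap)
    print(f"10-day averages {gap:>2} days apart share {s} days ({s / 10:.0%})")
```

Consecutive releases share nine days out of ten, but two 10-day averages ten or more days apart share nothing and are independent samples.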
August 11th, 2008 at 1:39 pm
I really wish they’d put error bars in these kinds of graphs and star the points that show significant differences; that would make them a lot more useful.
August 11th, 2008 at 2:23 pm
Right.
On top of that, it’s not clear to me how useful comparing polls a few days apart actually is, even if we had the data. If Gallup dialed 100,000 people a night, the daily polls would be highly accurate and we wouldn’t even really need the moving averages. It’d probably be generally more stable, while also showing (statistically significant) 1/2 point or larger overnight swings from (un)favorable news cycles, new ads, etc.
I guess that’d be handy data for campaign ad producers, and fantastic fodder for cable news anchors eager to add ever more breathless meta-layers of self feedback to the process, but it doesn’t really seem like it’d be that great for the rest of us.
In terms of actual voter preferences, the day-to-day swings are still just noise. What we’re really interested in are trends on the scale of months. Aggregating a couple of weeks of nightly polls from June and a couple of weeks from August is perfectly sufficient.
August 11th, 2008 at 3:08 pm
Each daily release indicates a MoE, meaning the result for each candidate is likely within that range. For Gallup it’s +/-2 points (a bit more precision would be nice, because I think the MoE is a bit short of 2%, probably 1.7 or 1.8%). In any case, if you want to know whether support has likely changed from point A to point B, you look at the difference. Since Gallup’s polls have a MoE of two points each day, you look for a change of more than 2% + 2%, or 4%.
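A note on the 2% + 2% rule: if the two polls being compared are independent (non-overlapping samples), their errors add in quadrature rather than linearly, so the margin of error on the change is about sqrt(2^2 + 2^2), or roughly 2.8 points. Adding the margins outright is a conservative rule of thumb. Consecutive releases of a rolling average share days and are not independent, so this sketch applies only to non-overlapping samples.

```python
import math

def moe_of_change(moe_a, moe_b):
    """Approximate 95% margin of error for the change between two
    independent polls, each with its own 95% margin of error."""
    return math.sqrt(moe_a ** 2 + moe_b ** 2)

print(f"{moe_of_change(2, 2):.2f} points")
```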
Occasionally a change of more than 4% in the sample can happen without a real change in support, and with daily results it’s easy to watch for a sustained increase or decrease in support.
We haven’t really seen that movement.
The problem with the 10-day rolling averages for Gallup is that when there is an actual change, it will take a while to show up in the results. If Gallup quadrupled the size of its poll, the MoE would drop from 2% to 1% (tripling it would only get to about 1.2%), but that costs more money.
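Margins of error scale with 1 over the square root of n, so halving one takes four times the sample. A quick sketch, assuming simple random sampling, a share near 50%, and a 95% level; real polls add design effects from weighting, which this ignores.

```python
import math

def sample_size_for_moe(moe_points, p=0.5, z=1.96):
    """Respondents needed for a given 95% margin of error (in points)."""
    moe = moe_points / 100
    n = (z / moe) ** 2 * p * (1 - p)
    return math.ceil(round(n, 9))  # round first to guard against float fuzz

def moe_after_scaling(moe_points, factor):
    """New margin of error if the sample is multiplied by `factor`."""
    return moe_points / math.sqrt(factor)

print(sample_size_for_moe(2), sample_size_for_moe(1))
print(f"tripled: {moe_after_scaling(2, 3):.2f}, quadrupled: {moe_after_scaling(2, 4):.2f}")
```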
The real interesting thing is the stability of the results. For all the ads, for all the negativity, for all the work of the campaigns, things haven’t really moved. Neither campaign has found a game changing narrative.
August 11th, 2008 at 3:19 pm
It’s not increasing the amount of data that creates a flatter trend, it’s the amount of data devoted to being a control. In any timeframe, 1 day is going to be your variable, and the rest is going to be your control. So in a 3-day average, you’ve got 2 days devoted to your control group, or 66%. In a 10-day average, 90% of the data is your control group. I’m not saying this makes the longer average flawed, I’m just pointing out that of course the longer average is going to be flatter regardless of how the actual data plays out, because only 10% of your data is devoted to the variable, as opposed to a full 1/3 of the data in a 3-day sample. Both of them fundamentally suck, and are basically designed simply to get publicity via the media, who’d much rather tell you about a poll than do actual reporting any day.
August 11th, 2008 at 3:30 pm
Brien,
I still don’t get your point.
Let’s take 20 days of polling. Now compare the 10-day rolling average after days 1-10 to the 10-day rolling average after days 11-20.
I’d suggest none of the days in this comparison are serving as a “control”, because there are no overlapping days between the two samples. But explain to me why you think that is wrong.
August 11th, 2008 at 4:06 pm
But then what you’re doing is just comparing 2 different 10 day averages, which isn’t what we’re talking about here. If you were going to convert 10 day averages into 20 day averages you’d just be comparing two days over approximately 3 weeks, and 95% of your data would be devoted to your control group. You’d hardly ever get ANY “daily” movement at all.
August 11th, 2008 at 4:10 pm
I should say that I’d agree with that way of looking at polls though, and I’d argue that’s the proper way to digest Gallup’s “daily” data if you’re inclined to do so. Look at the poll on a given day, and compare it to the previous week’s poll on that day.
What I’m saying, though, is simply that, as a mathematical function, 10-day rolling averages would always look flatter than 3-day rolling averages because there’s substantially more data allocated to your control group. I’m not trying to make any points about data or methodology really; I’m just noting that the “conclusion” to Abramowitz’s number crunching should have been obvious from the outset: the more days you include in a rolling average, the less pronounced fluctuations will appear, and vice versa, because you have a much, much tighter statistical control.
August 11th, 2008 at 4:23 pm
Brien has a very strange idea of statistics. When Gallup calls somebody and asks, “Are you more likely to vote for McCain or Obama?” the answer is one data point. It is not a control (what would it be a control against?). When Gallup makes 1000 of those calls, every answer is one data point. The point is that the sample of 1000 people is not the same as the ultimate sample, which may be 100 million voters. Every time you take a sample of 1000, your results are subject to random fluctuations: if you called Jones instead of Smith, you would have gotten one less vote for McCain and one more for Obama, and this shows up in the tally. If Gallup does the same experiment tomorrow, that is another collection of data points. Statisticians do various mathematical tests on those collections of data points in order to see if some pattern(s) can be found. If Gallup found that the percent supporting McCain went up one percentage point every day for 10 days, and if Gallup did proper statistical tests to evaluate the data (i.e., what percentage of answers is “neither” or “don’t know”), then Gallup could tell us that McCain had gained on Obama in a statistically reliable way. The problem, as Yglesias explains, is that in the day-to-day fluctuations, the random deviations are enough to provide entertainment value but don’t actually represent real changes in public views. The way we detect that fact is by looking at several days running and finding that they average out to nothing.
August 11th, 2008 at 4:38 pm
But every day’s data doesn’t get counted as a variable. If they did, you wouldn’t do rolling averages at all. But this sort of polling is impossible, because there’s too much volatility to the sample. Thus, we get rolling averages to account for that.
The way a rolling average works is rather simple: each day a certain amount of data is “updated” while the rest remains constant. So if it’s Thursday and I’m compiling a 3-day rolling average, what I do is replace Monday’s sample with Thursday’s sample (the variable) while the rest of the data remains unchanged (the control). So with 3 days’ worth of samples, the variable accounts for 1/3 of the data and the control set for 2/3, or twice as much. This limits the volatility that would arise from reporting unique data every day. But if we lengthen the amount of time we include in our average, then we increase the amount of data allocated to the control group, since basically all we do is increase the span of time between our comparison points. Barring some really major event creating a drastic swing in the results, increasing the control size will itself yield a flatter result.
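The replace-the-oldest-day update described here is how a trailing rolling average is usually computed. A minimal sketch with a hypothetical RollingAverage helper: once the window is full, each new day moves the average by (new day - dropped day) / N, which is the arithmetic behind a longer window moving less per day.

```python
from collections import deque

class RollingAverage:
    """Trailing average over the last `window` daily results (hypothetical helper)."""

    def __init__(self, window):
        self.window = window
        self.days = deque()
        self.total = 0.0

    def add(self, value):
        """Add today's result, dropping the oldest day once the window is
        full. Returns the updated average (over fewer days while warming up)."""
        self.days.append(value)
        self.total += value
        if len(self.days) > self.window:
            self.total -= self.days.popleft()
        return self.total / len(self.days)

three, ten = RollingAverage(3), RollingAverage(10)
daily = [3.0, 6.0, 0.0, 9.0, 2.0, 5.0]  # hypothetical daily leads
for value in daily:
    print(f"{value:4.1f} -> 3-day {three.add(value):5.2f}, 10-day {ten.add(value):5.2f}")
```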
Again, I’m not saying there’s anything wrong in Abramowitz’s math, just that he’s basically arguing that the sky is blue.
August 11th, 2008 at 5:18 pm
Brien,
You seem to be completely missing the point. Of course a 3-day rolling average is less volatile than a daily poll. And of course a 10-day rolling average is even less volatile.
It isn’t that a 3-day or 10-day average is proof of a lack of volatility in the campaign. It is that the reduced volatility that comes, as you point out, from the nature of averaging itself masks the noise in the poll.
It would be, as you say, arguing the sky is blue if so many people didn’t look at the 3-day rolling average and conclude that the sky isn’t blue.
August 11th, 2008 at 5:30 pm
Well, true, but I’m not arguing that. I’m not going to defend the benefits of tracking polls in any sense. They’re a concoction of polling agencies designed to do nothing more than catch the attention of cable news and get the polling agency’s name splashed on CNN all day long. They have no real statistical validity or practical value whatsoever.
In fact, my intention was to highlight a flaw with the basics of tracking polls that wasn’t wholly touched on, and that I think is much more fundamental. If that was unclear, I apologize for poorly articulating my statement.
August 11th, 2008 at 5:58 pm
There is a difference between saying “nothing important is happening” and “it is statistical noise.” I agree with the first claim and disagree with the second. The shift from Obama up by 2% to Obama up by 7% was statistically significant (enough that it remains significant even if one assumes considerable data mining behind the choice of end dates). Similarly (but a bit more so) for the decline from 7% to 0%, and still more for 9% to 0%. Also, Obama’s lead according to Rasmussen moved up and down roughly in sync with his lead according to Gallup (which would be a pure coincidence if all that was going on was statistical noise).
Both tracking polls have large samples. This matters for statistical noise. I think the only view consistent with the data is that a small fraction of people are changing their minds. Nothing much is going on. A reasonable assessment of McCain’s chance of winning isn’t changing (for one thing polls this far out have historically had about zero value in forecasting outcomes as has been widely reported). However, the changes aren’t just statistical noise. They are statistical noise plus small brief shifts in public opinion.
More generally, the “just statistical noise” hypothesis is relatively testable, and it is rejected by the data (including Rasmussen’s, at least for any plausible amount of data mining).
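The Rasmussen point suggests a concrete test: if both trackers were pure noise, their fluctuations should be uncorrelated, while a shared real movement shows up as positive correlation. The sketch below simulates both cases with invented parameters (a hypothetical slow swing of +/-2 points and 1.5 points of independent noise per pollster); it illustrates the logic and says nothing about the actual 2008 series.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(4)
days = 60
# Hypothetical real opinion: a slow +/-2-point swing around a 4-point lead.
real = [4 + 2 * math.sin(2 * math.pi * d / 30) for d in range(days)]

# Two pollsters measure the same public with independent sampling error.
noise_only_a = [4 + rng.gauss(0, 1.5) for _ in range(days)]
noise_only_b = [4 + rng.gauss(0, 1.5) for _ in range(days)]
signal_a = [r + rng.gauss(0, 1.5) for r in real]
signal_b = [r + rng.gauss(0, 1.5) for r in real]

print(f"pure noise:         r = {pearson(noise_only_a, noise_only_b):+.2f}")
print(f"noise + real swing: r = {pearson(signal_a, signal_b):+.2f}")
```

When applying this to published rolling averages, keep in mind that smoothing makes each series strongly autocorrelated, which widens the range of correlations that can arise by chance, so a naive significance test would overstate the evidence.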
August 11th, 2008 at 8:37 pm
more to the point, diogenes has pointed to this paper presented to the International Statistical Institute, which lays out why the media misinterpret close polling results as a “statistical dead heat.”
long story short, because of accepted statistical calculations, a +/- 4 point moe doesn’t mean there’s a 50% chance that the two are tied; statistically speaking it’s more like a 98% chance the leader is actually ahead vs. a 2% chance the other guy is actually ahead.
when the moe is reduced to 2%, that only reduces the actual probability to 84% vs. 16%.
so basically, the media have no idea what the study and theory of statistics have to do with actual statistics.
end result: obama is, and always has been, ahead. period.
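The comment doesn’t state the lead behind those percentages. One reading consistent with them (an interpretation, not something stated above): treat the margin of error as a 95% interval on the lead itself and assume a normal sampling distribution. Then a lead equal to the margin of error gives about a 97.5% chance the leader is truly ahead, and a lead half the margin of error gives about 84%.

```python
import math

def prob_leader_ahead(lead, moe_of_lead, z95=1.96):
    """Approximate probability the poll leader is truly ahead, assuming a
    normal sampling distribution and that `moe_of_lead` is the 95% margin
    of error on the lead itself."""
    se = moe_of_lead / z95
    return 0.5 * (1 + math.erf(lead / (se * math.sqrt(2))))

print(f"{prob_leader_ahead(4, 4):.3f}")  # lead equal to the MoE
print(f"{prob_leader_ahead(2, 4):.3f}")  # lead half the MoE
```

Note that a published MoE usually refers to each candidate’s share; with the two shares near 50/50, the margin of error on the lead is roughly twice that.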
August 11th, 2008 at 11:08 pm
“But then what you’re doing is just comparing 2 different 10 day averages, which isn’t what we’re talking about here.”
Who is this “we”? I understand that you are limiting your analysis to day-to-day comparisons, but my point (and the point others have implicitly been making) is there is no need to so limit yourself. And of course the only slightly more sophisticated version of this point doesn’t imply a strict non-overlap requirement. Rather, you are gradually getting more and more non-overlapping data as the days pass from your initial comparison point.
Finally, for what it is worth, Abramowitz constructed his chart using polling from June 5 to August 4, so that was more than long enough to allow comparisons between several entirely non-overlapping 10-day samples. And I think that was very much central to the point he was making, which was not just that there hadn’t been a lot of day-to-day variation in the 10-day rolling average, but also that there had not been much variation over the entire series.
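For a sense of scale, the June 5 to August 4 window, counted inclusively as calendar days (an assumption about how the chart was built), holds 61 days, enough for six complete non-overlapping 10-day blocks:

```python
from datetime import date, timedelta

start, end = date(2008, 6, 5), date(2008, 8, 4)
total_days = (end - start).days + 1  # inclusive of both endpoints

# Walk forward in disjoint 10-day blocks, keeping only complete ones.
blocks = []
day = start
while day + timedelta(days=9) <= end:
    blocks.append((day, day + timedelta(days=9)))
    day += timedelta(days=10)

print(total_days, len(blocks))
for first, last in blocks:
    print(first.isoformat(), "to", last.isoformat())
```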