{"id":2697,"date":"2010-07-13T17:30:00","date_gmt":"2010-07-13T21:30:00","guid":{"rendered":"http:\/\/www.calgarygrit.ca\/?p=2697"},"modified":"2010-07-13T17:30:00","modified_gmt":"2010-07-13T21:30:00","slug":"seat-projection-methodology","status":"publish","type":"post","link":"https:\/\/www.calgarygrit.ca\/?p=2697","title":{"rendered":"Seat Projection Methodology"},"content":{"rendered":"<p>Before I jump in, let me be clear on two points:<\/p>\n<p>1. This model has evolved over the past 3 years (<em>though Stockwell Day will deny it has<\/em>) and each step in it involved a lot of thought, testing, and consideration.<\/p>\n<p>2. I\u2019m completely 100% open to criticism and suggested improvements. I\u2019m sure it can be improved.<\/p>\n<p>That said, let\u2019s jump in.<\/p>\n<p><strong>THE PROJECTION FRAME<\/strong><\/p>\n<p>At the basic level, this is a uniform swing projection using regional data (Atlantic Canada, Quebec, Ontario, Prairies, Alberta, BC). If the Liberals go up by 5 points in Alberta, the model swings every riding up 5 points. If the NDP drops 8 points in Atlantic Canada, the model drops every riding down 8 points.<\/p>\n<p>I experimented with geometric swings (including the variation 538 used for the <a href=\"http:\/\/www.fivethirtyeight.com\/2010\/04\/how-our-uk-forecast-model-works.html\">UK election<\/a>) and mixed models, but they simply didn\u2019t work as well on any of the elections in my data set.<\/p>\n<p>The reason for this is simple \u2013 if the Liberals swing up 5 points in Ontario, the geometric swing is going to give them almost all of their newfound support in rural areas, since that\u2019s where there are the most votes available to swing to them. 
And that\u2019s <strong>not<\/strong> what usually happens in reality.<\/p>\n<p>Now, I\u2019ll concede in a WTF election like, say, <a href=\"http:\/\/en.wikipedia.org\/wiki\/Canadian_federal_election,_1993\">1993<\/a>, when the political landscape of the country is dramatically changed, uniform swing may not work. I wouldn\u2019t use this model to project the next Alberta election. But uniform swing is simply the best frame to build this model out of, especially since <u>its deficiencies are easy to correct at the simulation stage<\/u>.<\/p>\n<p><strong>THE BASE VOTE<\/strong><\/p>\n<p>Projection models generally use the most recent election as their base. The problem with this is that if a party has an unusually good or bad showing in a riding, <em>the data becomes skewed beyond repair<\/em>. It&#8217;s the same reason you use more than 1 year\u2019s worth of data to project hockey or baseball stats &#8211; even though Aaron Hill hit 36 home runs last year, it\u2019s foolish to expect him to repeat the feat.<\/p>\n<p>In the political arena, there are prominent examples of this that stand out \u2013 the Greens won\u2019t match their 2008 totals in <a href=\"http:\/\/www.cbc.ca\/news\/canadavotes\/riding\/013\/\">Central Nova<\/a> next election and, no matter how badly the campaign goes, the Liberals will exceed theirs. The NDP results from last election in <a href=\"http:\/\/www.cbc.ca\/news\/canadavotes\/riding\/294\/\">Saanich Gulf Islands<\/a> were obviously hurt by some <a href=\"http:\/\/www.cbc.ca\/news\/canadavotes\/story\/2008\/09\/23\/bc-julian-west-resigns.html\">naked truths<\/a> that (hopefully) won\u2019t be repeated in the next election.<\/p>\n<p>But even on a more subtle level, parties get good candidates, candidates run bad campaigns, local issues emerge. 
There needs to be a way to <u>smooth these events out<\/u>.<\/p>\n<p>To test this, I used a regression model to &#8220;predict&#8221; the 2008 vote in each riding, based on the 2006 and 2004 riding results and each riding\u2019s predicted vote based on demographics (<a href=\"http:\/\/calgarygrit.blogspot.com\/2010\/05\/fun-with-numbers-liberal-vote.html\"><em>click here for the low down on this<\/em><\/a>). In all cases, these numbers were adjusted using uniform swing.<\/p>\n<p>And the results were clear \u2013 <strong>the best model used all three predictors (2004, 2006, and the demographic regression)<\/strong>.<\/p>\n<p>So keeping the ratios the same, I\u2019m using the following weights to get the \u201cbase\u201d vote for each riding:<\/p>\n<p>2008 election: 38.7%<br \/>2006 election: 17.4%<br \/>2004 election: 7.8%<br \/>Demographic regression: 36.2%<\/p>\n<p><strong>OTHER FACTORS<\/strong><\/p>\n<p>Incumbency exists. It means something. It should be taken into account. I\u2019m not going to argue the point any further, because every study ever run on Canadian politics has come to this conclusion, as has <a href=\"http:\/\/calgarygrit.blogspot.com\/2009\/07\/numb3rs-incumbency.html\">my own research<\/a>. So, I\u2019m adjusting for incumbency based on the effect it had on the 2004, 2006, and 2008 elections.<\/p>\n<p>By-elections are a different beast. They\u2019re unpredictable, and it\u2019s hard to say how good a gauge they are of future results. <a href=\"http:\/\/calgarygrit.blogspot.com\/2009\/11\/by-elections-should-we-give-damn.html\">My research <\/a>on them has been limited, but the numbers tell me the best prediction model will weight the by-election at 44% of the base. And who am I to disagree with the numbers?<\/p>\n<p>The final bit of finessing I\u2019ve used relates to the polling data (which I\u2019ll talk about in a second &#8211; <em>patience&#8230;<\/em>). 
The Green Party has consistently underperformed their polling numbers at the provincial and federal level in Canadian elections. As a result, I\u2019ve scaled the Green polling numbers back to 78.55% of their value. Just make sure the angry hate mail is written on recycled paper before you send it.<\/p>\n<p>My spreadsheet is set up so that I can easily remove these correction factors or change their impact. But in each case, I\u2019ve given them the impact the data tells me to.<\/p>\n<p><strong>POLLING DATA<\/strong><\/p>\n<p>Right now, I\u2019m taking the most recent poll from each polling company and assigning a weight to it based on the sample size and the company\u2019s accuracy in provincial and federal elections over the past 5 years. Under this weighting system, the \u201cbest\u201d poll is worth about twice as much as the \u201cworst\u201d poll.<\/p>\n<p>This is the aspect of my projection model most likely to change in the coming months, and I\u2019m open to suggestions. Things to consider are:<\/p>\n<p>-Ekos releases <em>massive <\/em>amounts of data compared to other companies. They\u2019ll interview 7,000 people a month \u2013 but is it \u201cfair\u201d to give that data seven times the weight of an n=1,000 poll from another company?<\/p>\n<p>-Should weight be given based on the freshness of data? Is 3-week-old data worth as much as 3-day-old data? And if not, what\u2019s the half-life of polling data? Does this change during an election campaign?<\/p>\n<p>-Is it fair to judge the accuracy of a polling company on past election results? Right now, pollster accuracy is being based on 8-12 data points. Hardly a large sample.<\/p>\n<p><strong>ADDING VARIANCE TO THE SIMULATION MODEL<\/strong><\/p>\n<p>Up to this point, I\u2019ve described a very thorough uniform swing model. But a probabilistic model can do so much more. 
In most models, a seat the Liberals are projected to win by 1% counts just as much in their tally as a downtown Toronto seat they\u2019re projected to win by 40%. Yet in reality, if we project them to win a seat by 1%, it\u2019s basically a coin flip election \u2013 it could go either way.<\/p>\n<p>So we need to make the data messy. Unfortunately, I worry my explanation will be messy as well. But here goes.<\/p>\n<p>The first step is to find the regional support for each party in a given election simulation. This is done using the margin of error on the polling data. If I have 1000 interviews from Atlantic Canada, then the Atlantic Canada data carries a margin of error of +\/- 3.1%. So the numbers get simulated under a normal distribution accordingly. What that means is if the Liberals are polling at 35% in Atlantic Canada, in some of my sim elections they\u2019ll come in at 37% for the region. In others, 32%. Most of the time, they\u2019ll be close to 35% but we\u2019re talking about 10,000 simulations here, so in some of these \u201celections\u201d, they may very well get 31% or 40% in the region. That\u2019s just how margins of error and variance work.<\/p>\n<p>After that, we need to add some noise when transferring polling data from the regions down to the ridings. To do this, I looked at how regional shifts have carried through to the riding level in previous elections \u2013 for example, if the Liberals drop 8 points in Ontario, they won\u2019t drop 8 points in <em>every <\/em>riding. They\u2019ll fall by 4 in some and by 12 in others.<\/p>\n<p>So variance is added, keeping that overall regional polling number the same. Based on my research, the variance gets larger when the change gets larger (i.e. if a party goes up by a lot, their gains are a lot <em>less<\/em> uniform at the riding level) but even if a party\u2019s support is unchanged in a region, their support will still change at the <em>riding level<\/em>. 
In English, even if the Liberals are polling at the same level in Quebec now as they got last election, they\u2019ll go up in some ridings and down in others \u2013 but it will even out.<\/p>\n<p>I won\u2019t go into the exact mechanics of this, but I\u2019ve test driven this numerous times and the program produces riding variance at the same level it should, based on what\u2019s happened in the 3 previous elections.<\/p>\n<p>But there\u2019s one more level of variance this <em>doesn\u2019t<\/em> take into account: <u>The polls being wrong<\/u>. There <em>are <\/em>elections when pollsters overshoot the margin of error. We can have 10,000 interviews from 7 polling companies, and miss the bullseye by 3%. That\u2019s not a knock on the pollsters \u2013 some people don\u2019t vote, some people lie, and some just change their mind at the last minute. There\u2019s no use pretending otherwise. Think of the 2004 Canadian election when the electorate swung back to the Liberals on the last weekend.<\/p>\n<p>So, I\u2019ve gone back and looked at how much, <em>beyond normal sample size variance<\/em>, pollsters have missed the mark by (in Canadian provincial and federal elections). And I\u2019ve built this into the very first step of the model. So even if we have reams of data showing the Tories at 35% nationally, in some of my sims they\u2019ll \u201cactually\u201d be at 32%. In some, they may be at 37% or 38%. Again, the variance is added based on what we\u2019ve observed in recent Canadian elections.<\/p>\n<p><strong>RECAP &#8211; HOW IT WORKS<\/strong><\/p>\n<p>1. Public polling data is grouped together by region.<\/p>\n<p>2. In every simulation, the polling data is adjusted based on sample size variance and on how often polling companies just \u201cmiss the mark\u201d.<\/p>\n<p>3. From this, every riding is simulated based on how the numbers tend to transfer from the regional level down to the riding level.<\/p>\n<p>4. 
The riding simulations take into account past election results, demographics, incumbency, and by-elections.<\/p>\n<p>Using this, my laptop simulates 10,000 elections. From this, I can calculate odds of a given party winning the election or a given seat changing hands.<\/p>\n<p>As I said off the top, I\u2019m open to suggestions on changes \u2013 I\u2019m sure there are improvements that can be made.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Before I jump in, let me be clear on two points: 1. This model has evolved over the past 3 years (though Stockwell Day will deny it has) and each step in it involved a lot of thought, testing, and consideration. 2. I\u2019m completely 100% open to criticism and suggested improvements. I\u2019m sure it can [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.calgarygrit.ca\/index.php?rest_route=\/wp\/v2\/posts\/2697"}],"collection":[{"href":"https:\/\/www.calgarygrit.ca\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.calgarygrit.ca\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.calgarygrit.ca\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.calgarygrit.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2697"}],"version-history":[{"count":0,"href":"https:\/\/www.calgarygrit.ca\/index.php?rest_route=\/wp\/v2\/posts\/2697\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.calgarygrit.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2697"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.calgarygrit.ca\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2697"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.calgarygrit.ca\/index.php?rest_route=%
2Fv2%2Ftags&post=2697"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}