[lbo-talk] Fwd: using statistics to estimate the elections in Michigan and Florida that didn't happen

Doug Henwood dhenwood at panix.com
Thu May 22 10:53:26 PDT 2008


I forwarded Robert Naiman's query to the AAPOR list, and got this response.


> From: Marco Antonio Morales-Barba <marco.morales at NYU.EDU>
> Date: May 22, 2008 1:37:27 PM EDT
> To: AAPORNET at ASU.EDU
> Subject: Re: using statistics to estimate the elections in Michigan
> and Florida that didn't happen
> Reply-To: Marco Antonio Morales-Barba <marco.morales at NYU.EDU>
> List-Archive: <https://lists.asu.edu/cgi-bin/wa?LIST=AAPORNET>
>
> Some initial thoughts that could account for why no "serious"
> effort has been done on this matter:
>
> It is certainly true that running a regression is trivial, and the
> same is true for predicting values (and the uncertainty associated
> with these values). But the one thing to keep in mind is that these
> counterfactual scenarios are heavily dependent on the assumptions
> made when estimating models. Running the suggested regression with
> a time trend assumes, for instance, that the population is similar
> in all states, that the people who turn out to vote are no
> different across states, that all voters in all states behave in
> exactly the same manner over time, that there are no carry-over
> effects across primaries, and that there is some sort of increasing
> effect over time. Estimates produced with these assumptions would
> then be used to make predictions. Common sense tells me this would
> be extremely unrealistic.
>
> An alternative is to model all these factors, along with the
> effects of campaign expenditures, candidate appearances, grassroots
> operations, etc., to have a much better model to estimate and
> predict votes in Michigan and Florida. But we do not know much
> about how the behavior of candidates in these states (how much
> would Clinton or Obama spend in the state, what type of ads would
> they run, what strategic decisions would they make based on the
> primaries that would follow...) would affect voters (how much would
> ads affect turnout, how much would rallies motivate people to vote
> for one candidate or another, etc.), so the estimation would have a
> huge degree of uncertainty which needs to be accounted for. That
> said, the question is how informative predictions with high
> uncertainty are, particularly in elections that might be very
> close. In other words, what do we really learn when the std errors
> or confidence intervals of these estimates overlap?
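
To make the point about overlapping intervals concrete, here is a minimal sketch (not part of the original message) of a 95% prediction interval from a one-variable regression of vote share on days since Iowa. The numbers in `days` and `share` are made-up placeholders, not real primary results, and the function name `prediction_interval` is hypothetical. The value 2.447 is the standard two-sided 95% t critical value for 6 degrees of freedom (8 observations minus 2 estimated parameters).

```python
import math


def prediction_interval(x, y, x_new, t_crit):
    """Fit y = a + b*x by OLS and return (point, low, high) for a new
    observation at x_new, using the textbook prediction-interval formula."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)
    # Standard error for a single new observation (not for the mean).
    se_pred = math.sqrt(s2 * (1 + 1 / n + (x_new - x_bar) ** 2 / sxx))
    point = intercept + slope * x_new
    return point, point - t_crit * se_pred, point + t_crit * se_pred


# HYPOTHETICAL placeholder data: days since Iowa, and one candidate's
# share of the two-way vote in eight invented contests.
days = [0, 5, 16, 23, 33, 37, 47, 61]
share = [0.56, 0.48, 0.47, 0.68, 0.52, 0.61, 0.59, 0.46]
T_CRIT = 2.447  # t(0.975, df=6)

point, low, high = prediction_interval(days, share, 10, T_CRIT)
print(f"predicted share {point:.3f}, 95% interval ({low:.3f}, {high:.3f})")
```

With placeholder data this noisy, the interval for a hypothetical contest on day 10 is wide and straddles 50%, which is the situation described above: the point estimate alone cannot say who would have won.
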
>
> But if this were the chosen path, at least these four points
> should be addressed to get a somewhat plausible estimate:
>
> i) differences across states should be accounted for, including
> turnout (perhaps to be modeled as a two-stage process).
> ii) this would be a time-series problem, which complicates the
> estimation. (Sadly, the suggested time trend would not solve the
> problem.)
> iii) we have exit poll data, and census data at the state level.
> The model would need to be estimated as multilevel (in time-series),
> which can become even more complex.
> iv) the problems that endogenous decisions by candidates would pose
> for the consistency of estimates should be addressed.
>
> There are also other clever alternatives for estimating
> counterfactual scenarios using "synthetic" states. The idea is
> related to matching in the sense of creating a weighted average of
> all other units on relevant characteristics so as to simulate what
> would happen to the synthetic state absent some relevant condition.
> It's still a new technique, but the details can be found here:
>
> - Abadie, Alberto and Javier Gardeazabal. 2003. "The Economic Costs
> of Conflict: A Case Study of the Basque Country." American Economic
> Review 93(1):113-132.
> - Abadie, Alberto, Alexis Diamond and Jens Hainmueller. 2007.
> "Synthetic Control Methods for Comparative Case Studies: Estimating
> the Effect of California's Tobacco Control Program." Ms. Harvard
> University.
>
> But this is also dependent on assumptions about what
> characteristics of the state are relevant for the exercise.
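
A rough sketch (not from the original message) of the weighting idea: choose nonnegative weights that sum to one so that the weighted average of the other states' characteristics is as close as possible to the target state's. The characteristics and outcomes below are invented placeholders, and `synthetic_weights` is a hypothetical helper that uses a simple Frank-Wolfe iteration with exact line search. It is not the estimator in the papers cited above, which also weights the characteristics by how well they predict the outcome.

```python
def synthetic_weights(donors, target, iterations=2000):
    """donors: one list of characteristics per donor state.
    target: characteristics of the state to be synthesized.
    Returns nonnegative weights summing to 1 that minimize the squared
    distance between the weighted donor average and the target."""
    n = len(donors)
    k = len(target)
    w = [1.0 / n] * n  # start from equal weights
    for _ in range(iterations):
        fitted = [sum(w[j] * donors[j][i] for j in range(n)) for i in range(k)]
        resid = [fitted[i] - target[i] for i in range(k)]
        # Gradient of the squared distance with respect to each weight.
        grad = [2 * sum(donors[j][i] * resid[i] for i in range(k))
                for j in range(n)]
        best = min(range(n), key=lambda j: grad[j])
        # Move toward putting all weight on the most helpful donor.
        direction = [donors[best][i] - fitted[i] for i in range(k)]
        denom = sum(d * d for d in direction)
        if denom == 0:
            break
        gamma = -sum(r * d for r, d in zip(resid, direction)) / denom
        gamma = max(0.0, min(1.0, gamma))  # exact line search, clipped
        if gamma == 0.0:
            break  # no feasible direction improves the fit
        w = [(1 - gamma) * wj for wj in w]
        w[best] += gamma
    return w


# HYPOTHETICAL placeholder data: three characteristics per donor state
# (e.g. shares of some demographic groups) and one observed outcome each.
donor_traits = [
    [0.10, 0.25, 0.30],
    [0.30, 0.20, 0.25],
    [0.05, 0.30, 0.35],
    [0.20, 0.22, 0.28],
]
donor_outcomes = [0.48, 0.62, 0.45, 0.55]
target_traits = [0.15, 0.24, 0.29]

weights = synthetic_weights(donor_traits, target_traits)
synthetic_outcome = sum(w * o for w, o in zip(weights, donor_outcomes))
print([round(w, 3) for w in weights], round(synthetic_outcome, 3))
```

The choice of which characteristics go into `donor_traits` drives the weights, which is exactly the dependence on assumptions noted above.
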
>
> All this to say that even when the job can be done in an Excel
> spreadsheet by college undergrads, I'm not sure how much we could
> learn from such an exercise.
>
> Best,
>
> Marco Morales
> PhD student
> Wilf Family Department of Politics
> New York University
> 19 W 4th St, room 320
> New York, NY 10012
> +1 (212) 992-8690 (o)
> +1 (212) 995-4184 (f)
>
> marco.morales at nyu.edu
>
> ----- Original Message -----
> From: Doug Henwood <dhenwood at PANIX.COM>
> Date: Thursday, May 22, 2008 10:53 am
> Subject: using statistics to estimate the elections in Michigan and
> Florida that didn't happen
> To: AAPORNET at ASU.EDU
>
>
>> [Someone just posted this to the listserv I moderate. Any thoughts?]
>>
>> From: "Robert Naiman" <naiman at justforeignpolicy.org>
>> Date: May 22, 2008 10:38:40 AM EDT
>> Subject: [lbo-talk] using statistics to estimate the elections in
>> Michigan and Florida that didn't happen
>> Reply-To: lbo-talk at lbo-talk.org
>>
>> With all the ink that's been spilled over the question of how
>> Michigan and Florida will be represented at the Democratic
>> convention, and with a much-anticipated meeting of the Democratic
>> rules committee at the end of this month to consider this question,
>> I'm surprised that I haven't seen anywhere anyone trying to answer
>> the following basic question:
>>
>> If Michigan and Florida had voted when they were supposed to, and
>> if both candidates had campaigned there, what would have been the
>> result?
>>
>> Of course, the true answer to this question is an unknowable
>> counterfactual. But, as you all know, economists answer such
>> questions all the time, often on the basis of considerably less
>> data than exists in this case.
>>
>> Since January, 48 states and DC have voted, and we know the
>> result. We also have detailed demographic data on who voted, in
>> the form of exit polls, for example, here:
>>
>> http://www.cnn.com/ELECTION/2008/
>>
>> We should also include a right-hand variable for time since Iowa.
>>
>> So we could run a multiple linear regression of the results against
>> the demographic data and time.
>>
>> Let's say, as a first pass, that we do the following: ignore other
>> candidates, make the left-hand variable Obama's share of the
>> Obama/Clinton vote in terms of delegates, ignore the internal
>> dynamics of delegate apportionment within the state, and pretend
>> that the demographics of the Florida and Michigan votes matched
>> their demographics in the US Census (this last one should be
>> straightforward to correct by estimating the demographics of the
>> turnout first, but that would take another round of entering data
>> from the census for each state).
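
A minimal sketch (not part of the original message) of what this first pass could look like outside a spreadsheet: an ordinary least squares fit of a candidate's share on two demographic shares plus days since Iowa, followed by a prediction for a state described only by census-style demographics. Every number here is an invented placeholder rather than real exit-poll or census data, and the helper names `solve`, `ols` and `predict` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """rows: predictor values per state, without the intercept.
    Returns [intercept, b1, b2, ...] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)


def predict(beta, row):
    """Point prediction for one state's predictor values."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# HYPOTHETICAL placeholder data, one row per invented contest:
# (share of group 1 among voters, share of group 2, days since Iowa)
rows = [
    (0.04, 0.22, 0), (0.01, 0.25, 5), (0.55, 0.20, 23), (0.19, 0.30, 33),
    (0.08, 0.27, 33), (0.30, 0.24, 40), (0.18, 0.31, 61), (0.15, 0.33, 109),
]
shares = [0.56, 0.48, 0.68, 0.52, 0.47, 0.63, 0.45, 0.46]

beta = ols(rows, shares)
# A state that did not vote, described by census-style shares and the
# day on which it was supposed to vote (again, invented values).
missing_state = (0.14, 0.26, 12)
print(round(predict(beta, missing_state), 3))
```

This gives only a point prediction. It says nothing about how wide the uncertainty around that number is, and it bakes in the assumption that the same coefficients apply in every state.
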
>>
>> This would be a great exercise for a college statistics class that
>> e.g. uses Excel. It would take a little effort to enter the data into
>> the spreadsheet, but if a class were working on it, they could easily
>> divide up the task using e.g. a shared spreadsheet under Google Docs.
>> ___________________________________
>> http://mailman.lbo-talk.org/mailman/listinfo/lbo-talk
>>
>> ----------------------------------------------------
>> Archives: http://lists.asu.edu/archives/aapornet.html .
>> Unsubscribe? Send email to listserv at asu.edu with this text:
>> signoff aapornet
>> Please ask authors before quoting outside AAPORNET.


