2016 Presidential Election: Modeling North Carolina


Every four years, data scientists throughout the United States turn their attention to the presidential election and the task of correctly modeling the outcome. In 2008, Nate Silver set the standard by correctly predicting 49 of 50 states. He followed that performance up in 2012 by calling all 50 states correctly. However, even with these past results in mind, many people were skeptical about modeling the 2016 election due to the outspoken nature of, and controversy surrounding, the two major candidates. In hindsight, the skeptics were likely correct, considering the many models (including mine) that struggled on November 8th.


This project, completed as part of my coursework, aimed to predict the winner of the state of North Carolina, along with the corresponding vote percentages for each party.



The majority of the key data was made up of polling results taken from Electoral-Vote.com. I also used historical data on presidential election outcomes over the last 30 years.


As far as the model itself is concerned, it is composed of two parts: polling and regression. I decided to look strictly at the most recent polls, from credible sources only, at the time the model is run. These polling results were averaged to get an initial result. However, the polls seldom accounted for 100% of the total vote. This leftover group (the “undecided”) needed to be accounted for. I decided to do this with a non-linear regression model based on past election results.
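To make the two-part structure concrete, here is a minimal sketch in R of the polling step and the undecided allocation. It is not the project's actual code: the poll numbers are made up for illustration, the function name `allocate_undecided` is hypothetical, and `dem_split` is a placeholder standing in for whatever the regression on past elections would produce.

```r
# Illustrative poll numbers only; these are NOT the polls used in the project.
polls <- data.frame(
  dem = c(46, 47, 45),
  rep = c(45, 44, 46)
)

# Step 1: simple average of the most recent polls.
poll_avg <- colMeans(polls)

# Step 2: whatever share the polls leave unassigned is treated as "undecided".
undecided <- 100 - sum(poll_avg)

# Step 3: split the undecided share between the two parties.
# `dem_split` stands in for the regression output (assumption); 0.5 below is
# a placeholder, not a fitted value.
allocate_undecided <- function(poll_avg, undecided, dem_split) {
  c(dem = poll_avg[["dem"]] + undecided * dem_split,
    rep = poll_avg[["rep"]] + undecided * (1 - dem_split))
}

projection <- allocate_undecided(poll_avg, undecided, dem_split = 0.5)
print(projection)
```

By construction the two projected shares sum to 100, so the undecided group is fully assigned regardless of the split that the regression suggests.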

One cool thing about this model is that it is dynamic by nature. As election day draws closer and the polls become more accurate, the model's breakdown shifts to be more “polling-heavy”, much like FiveThirtyEight's model does.
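One way such a shift could be expressed is with a weight on the polls that grows as election day approaches. The sketch below is an assumption about how this might look, not the weighting actually used in the project: the linear schedule, the function names `poll_weight` and `blend`, and the `horizon` and `min_weight` defaults are all hypothetical.

```r
# Hypothetical schedule: the weight on polls rises linearly from `min_weight`
# at `horizon` days out to 1 on election day.
poll_weight <- function(days_to_election, horizon = 100, min_weight = 0.5) {
  d <- pmin(pmax(days_to_election, 0), horizon)  # clamp to [0, horizon]
  1 - (1 - min_weight) * d / horizon
}

# Blend a poll-based estimate with a regression-based estimate of vote share.
blend <- function(poll_est, reg_est, days_to_election) {
  w <- poll_weight(days_to_election)
  w * poll_est + (1 - w) * reg_est
}

# Illustrative inputs only: the same two estimates, blended at different dates.
print(blend(poll_est = 48, reg_est = 50, days_to_election = c(100, 50, 0)))
```

A linear schedule is the simplest choice; any monotone function of the days remaining would give the same qualitative behavior.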


The results, as of November 1st, were as shown in the plots below. Along with the expected vote share predictions, I also ran a normal distribution probabilistic model and calculated the chances of each candidate winning.
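For readers curious about the win-probability step, the idea can be sketched as follows, assuming the final margin between the two candidates is normally distributed around the projected margin. The function name `win_probability`, the input shares, and the standard deviation are illustrative assumptions, not values from the project.

```r
# Probability that each candidate wins, assuming the true margin
# (dem minus rep, in percentage points) is Normal(projected margin, margin_sd).
win_probability <- function(dem_share, rep_share, margin_sd) {
  margin <- dem_share - rep_share
  p_dem <- pnorm(margin / margin_sd)  # P(true margin > 0)
  c(dem = p_dem, rep = 1 - p_dem)
}

# Illustrative inputs only; margin_sd = 3 is a made-up uncertainty.
probs <- win_probability(dem_share = 50.5, rep_share = 49.5, margin_sd = 3)
print(round(probs, 3))
```

The wider the assumed uncertainty in the margin, the closer both probabilities move toward 50%, which is why a one-point projected lead translates into only a modest edge.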

The Next Step

Now that the election is over, I have had time to reflect on my model. Although it wasn’t necessarily “correct” in this election, I still believe the model is functional and, more importantly, very effective for learning purposes. Moving forward, I am committed to brainstorming better ways to model the presidential election, and I am eager to try my hand at predicting all 50 states more accurately come the 2020 election.



  • Language: R
  • Environment: RStudio
  • Date Completed: October 2016
  • Techniques Used: Statistical Methods, Data Munging, Probabilistic Modeling
  • See My Code: Github