Data Set: Pokemon Go

Nick Paterno January 30th, 2022

The data set is `pokemon_go`. This data set is available on openintro.org/data and in the `openintro` R package.

The Pokemon franchise began in 1996 with the release of Pokemon Red and Pokemon Blue on the Nintendo Game Boy. Since then, it has spawned thousands of hours of anime, an extremely popular trading card game, and over 20 video games, and it has become a billion-dollar franchise.

Pokemon Go, released in 2016, is a mobile game that allows players (or trainers) to collect and train Pokemon in the real world through augmented reality. A key part of Pokemon Go is using evolutions to get stronger Pokemon, and a deeper understanding of evolutions is key to being the greatest Pokemon Go player of all time. The `pokemon_go` data set covers 75 Pokemon evolutions spread across four species. A wide set of variables is provided, allowing a deeper dive into which characteristics are important in predicting a Pokemon's final combat power (CP). This is a great data set for exploring simple and multiple linear regression as well as exploratory data analysis and visualization.

Let's take a look at a basic linear regression comparing pre- and post-evolution CP.
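As a rough illustration of what fitting such a line involves, here is a minimal least-squares sketch in plain Python. The helper name `fit_line` and the numbers below are made up for illustration; they are not rows from `pokemon_go`, and this is not necessarily how the fit was produced for this write-up.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical pre- and post-evolution CP values (NOT taken from pokemon_go).
cp_before = [100, 200, 300, 400]
cp_after = [190, 410, 590, 810]

intercept, slope = fit_line(cp_before, cp_after)
print(f"cp_after ~ {intercept:.1f} + {slope:.2f} * cp_before")
```

A positive slope here corresponds to the kind of upward trend one would look for in a scatterplot of post-evolution CP against pre-evolution CP.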

At first glance there is an overall positive relationship. Looking more closely, the points appear to fall along a few distinct lines. Since the plot covers four species, it's reasonable to suspect that species plays a role. Let's see what happens when we color by species.
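Coloring by species amounts to asking whether each group follows its own line. One way to check this numerically is to fit a separate line per group and compare the slopes, sketched below under stated assumptions: the function names `fit_line` and `fit_by_group`, the group labels, and all values are hypothetical and do not come from `pokemon_go`.

```python
from collections import defaultdict
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_by_group(rows):
    """Fit one line per group.

    rows: iterable of (group, x, y) tuples.
    Returns a dict mapping group -> (intercept, slope).
    """
    grouped = defaultdict(lambda: ([], []))
    for group, x, y in rows:
        grouped[group][0].append(x)
        grouped[group][1].append(y)
    return {g: fit_line(xs, ys) for g, (xs, ys) in grouped.items()}


# Hypothetical (group, cp_before, cp_after) rows, NOT taken from pokemon_go.
rows = [
    ("species_a", 100, 110), ("species_a", 200, 210), ("species_a", 300, 310),
    ("species_b", 100, 250), ("species_b", 200, 500), ("species_b", 300, 750),
]

fits = fit_by_group(rows)
for group, (intercept, slope) in fits.items():
    print(f"{group}: cp_after ~ {intercept:.1f} + {slope:.2f} * cp_before")
```

Groups with similar slopes and intercepts would overlap in the colored plot, while a group with a negative slope would trend downward.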

Our instincts were good! There is a relationship for each species. However, a closer look reveals that two species appear to share the same pattern. Here it helps to know a bit more about Pokemon. One variable not in the data set is the Pokemon type; the two species that share the same linear pattern, Caterpie and Weedle, are both Bug Type Pokemon. This means we could be a bit more general and say there is likely a separate model for each Pokemon type. The Pidgey species is a bird, or Flying Type, Pokemon. It too has a strong linear relationship. But look at Eevee!

There is definitely more variation around the line we would fit for Eevee. Eevee is a Normal Type Pokemon. It is also a unique Pokemon because it has more than one evolved form. In total, there are eight "Eeveelutions", each one having a different type. At the time this data set was collected, there were three Eeveelutions in Pokemon Go: Vaporeon (Water Type), Flareon (Fire Type) and Jolteon (Electric Type). Let's see what happens when we color by Type instead of Species.

Success! We seem to have identified that a linear relationship does exist between a Pokemon's pre- and post-evolution combat power when grouped by type! Interestingly, the Electric Type is the only one with a negative relationship. This is a great place for further exploration or a possible student project: if we collect more data on additional species of all Pokemon Types, will we see the same positive pattern for most Types and a negative pattern for Electric Type Pokemon?