Data Set: Vehicle Data from the EPA (2021)

Megan Mocko February 20th, 2022

The data set is epa2021. This data set is available on openintro.org/data and in the `openintro` R package.

This week's data set is epa2021. The results are gathered each year for vehicles tested under the oversight of the Environmental Protection Agency. The data are collected at the National Vehicle and Fuel Emissions Laboratory in Ann Arbor, Michigan. The data set covers 12 manufacturers, including Volvo, General Motors, and Toyota. Most of the variables are categorical, but a few are quantitative, such as city_mpg and hwy_mpg. There are 28 variables in total.
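For a quick first look at the data in R, something like the sketch below may help. It assumes the `openintro` package is installed, and it assumes the manufacturer and mileage columns are named `mfr_name`, `city_mpg`, and `hwy_mpg`, as I understand them from the package; check `names(epa2021)` if your version differs.

```r
# Assumes install.packages("openintro") has already been run.
library(openintro)

# Load the data set from the package.
data("epa2021", package = "openintro")

# Number of rows (vehicles) and columns (variables).
dim(epa2021)

# Peek at the three columns used in the activity
# (column names are an assumption; verify with names(epa2021)).
str(epa2021[, c("mfr_name", "city_mpg", "hwy_mpg")])

# Five-number summary plus mean for the two quantitative variables.
summary(epa2021[, c("city_mpg", "hwy_mpg")])
```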

The data set is an excellent one to use for essential graphical summaries at the start of the semester. One way to present it to the class is to have the students answer the following two questions.

  1. What type of graph would you use to explore the center and variability of miles per gallon for the vehicles tested at the National Lab?
  2. What type of graph would you use to explore the relationship between the miles per gallon for vehicles in the city versus on the highway?

You can then have them work in groups or pairs to make the graphs. Here is a histogram of the miles per gallon in the city.

A histogram showing miles per gallon in the city. The plot is right-skewed. The maximum value is close to 60, but the majority of the data is around 21 miles per gallon. This data is from the epa2021 data set.
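If students want to build a histogram like this one in R, a minimal sketch might look like the following. It assumes the `openintro` and `ggplot2` packages are installed and that the city mileage column is named `city_mpg`. The bin width of 2 is an arbitrary choice for illustration, not necessarily the one used for the plot described above.

```r
library(openintro)
library(ggplot2)

# Histogram of city miles per gallon.
# binwidth = 2 is an illustrative choice; try other values with students.
city_hist <- ggplot(epa2021, aes(x = city_mpg)) +
  geom_histogram(binwidth = 2, color = "white") +
  labs(x = "City miles per gallon", y = "Number of vehicles")

# Printing the object draws the plot.
city_hist

# Numerical companions to the picture: center and spread.
median(epa2021$city_mpg, na.rm = TRUE)
IQR(epa2021$city_mpg, na.rm = TRUE)
```

Asking students to change the bin width is an easy way to start a conversation about how that choice affects the apparent shape.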

Here is a scatterplot for exploring the relationship between miles per gallon in the city and on the highway.

A scatterplot showing the miles per gallon in the city on the x-axis and the miles per gallon on the highway on the y-axis. The plot shows a generally positive trend: as the miles per gallon in the city increase, so do the miles per gallon on the highway. This data is from the epa2021 data set.
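A scatterplot along these lines can be sketched in R as shown below. It assumes the `openintro` and `ggplot2` packages are installed and that the columns are named `city_mpg` and `hwy_mpg`. The transparency setting is only there to reduce overplotting and is my own addition.

```r
library(openintro)
library(ggplot2)

# City mpg on the x-axis, highway mpg on the y-axis.
# alpha makes overlapping points easier to see (an illustrative choice).
mpg_scatter <- ggplot(epa2021, aes(x = city_mpg, y = hwy_mpg)) +
  geom_point(alpha = 0.4) +
  labs(x = "City miles per gallon", y = "Highway miles per gallon")

mpg_scatter

# Correlation as a numerical summary of the linear association,
# using only rows where both values are present.
mpg_cor <- cor(epa2021$city_mpg, epa2021$hwy_mpg, use = "complete.obs")
mpg_cor
```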

The Guidelines for Assessment and Instruction in Statistics Education (GAISE) include an emphasis to "Give students experience with multivariable thinking," so don't stop at two variables. Have the students think about the different car manufacturers. Ask them whether they think the cars made by the various manufacturers would show different relationships between miles per gallon in the city and on the highway.

This plot is a grid of scatterplots of city miles per gallon versus highway miles per gallon, with one panel per manufacturer. For most manufacturers, you see a positive, linear trend. However, for a few manufacturers, there are too few cars to determine a trend. This data is from the epa2021 data set.
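One way to produce a grid like this in R is with faceting, as in the sketch below. It assumes the `openintro` and `ggplot2` packages are installed and that the manufacturer column is named `mfr_name`. The count table at the end is my own addition, meant to help students spot which manufacturers have too few vehicles to judge a trend.

```r
library(openintro)
library(ggplot2)

# One scatterplot panel per manufacturer.
# mfr_name is assumed to be the manufacturer column; check names(epa2021).
mpg_by_mfr <- ggplot(epa2021, aes(x = city_mpg, y = hwy_mpg)) +
  geom_point(alpha = 0.4) +
  facet_wrap(~ mfr_name) +
  labs(x = "City miles per gallon", y = "Highway miles per gallon")

mpg_by_mfr

# Number of vehicles per manufacturer, smallest first,
# to see which panels have very little data.
n_per_mfr <- sort(table(epa2021$mfr_name))
n_per_mfr
```

By default `facet_wrap()` uses the same axis scales in every panel, which makes the manufacturers directly comparable.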

To finish the activity, you could ask them what surprised them or if this is what they expected to see.

Source: Fuel Economy Data from fueleconomy.gov. Retrieved 6 May 2021.