Data Set: CPU (central processing unit)

Nick Paterno November 8th, 2021

You may find this data set in the `openintro` R package or on this page.

Over the last decade, technology has become heavily integrated into our everyday lives. That integration has been made easier as technology has improved; computers are faster and more efficient than ever. One way speed is measured is the clock frequency of the CPU. Most - but not all - CPUs have a base clock and a boost clock. The base clock is the speed the processor runs at idle or for less demanding tasks like word processing. The boost clock is a higher frequency that the processor can run at for more intense tasks like video editing and gaming. In fact, many gamers will "overclock" their CPU to reach a speed higher than the boost clock! A technical question we can ask ourselves is why processors are getting faster. What specifically is causing the increase in speed?

Another feature of many modern CPUs is multithreading. In less technical terms, multithreading allows the computer to do multiple tasks at the same time. Intel refers to this as Hyper-Threading and AMD refers to it as Simultaneous Multithreading or SMT. The proportion of new CPUs released with multithreading is higher now than it was ten years ago!

Multithreading allows the computer to do multiple tasks at the same time. There are two bars for each year from 2010 to 2020. One bar per year shows the proportion of new processors with multithreading and one bar per year shows the proportion of new processors without multithreading. The proportion of new processors with multithreading has steadily increased.

Since processors have gotten faster over time and multithreading has become more common, maybe multithreading is the cause of the speed increase. We can look at side-by-side boxplots to get a rough idea.

There are two sets of boxplots. One set is for the base clock, or idle frequency, of a processor. The other is for the boost clock which is the frequency a processor hits during intense workflows. Each set has one boxplot for processors with multithreading and one for those without. In the base clock set, there is no discernable difference between the two types of processors. In the boost clock set, processors with multithreading are consistently faster.

Visually, there does not appear to be much of a difference for the base clock, but the five-number summary for the boost clock is consistently higher for multithreaded processors. We can run two-sample t-tests to confirm. As expected, there is not a statistically significant difference for the base clock.

The results of a two-sample t-test where the response variable is the base clock and the explanatory variable is how many threads a processor has, single threaded or multithreaded. The results show a p-value of approximately 0.27 and a 95 percent confidence interval that ranges from -0.0477 to 0.1713. The conclusion is that there is no statistically significant difference.

For the boost clock, there is!

The results of a two-sample t-test where the response variable is the boost clock and the explanatory variable is how many threads a processor has, single threaded or multithreaded. The results show a p-value of approximately one times ten to the negative six power and a 95 percent confidence interval that ranges from 0.1805 to 0.4173. The conclusion is that there is a statistically significant difference.
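For readers who want to see what is going on inside a two-sample t-test, here is a minimal sketch of the Welch (unequal variance) version. It is written in Python with only the standard library, purely for illustration. The two lists of boost clocks are made-up numbers, not values from the `cpu` data set, and the function name `welch_summary` is my own. To avoid needing a t-distribution table, the confidence interval uses a normal critical value, which is only a reasonable stand-in when the samples are large.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_summary(sample_a, sample_b, confidence=0.95):
    """Summarize a Welch two-sample comparison of means (a minus b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances

    diff = mean(sample_a) - mean(sample_b)
    se = sqrt(var_a / n_a + var_b / n_b)  # standard error of the difference
    t_stat = diff / se

    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a / n_a + var_b / n_b) ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )

    # Assumption: normal critical value used in place of the t quantile.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {"diff": diff, "t": t_stat, "df": df, "ci": (diff - z * se, diff + z * se)}


# Hypothetical boost clocks in GHz, NOT taken from the cpu data set.
multithreaded = [4.2, 4.4, 4.6, 4.7, 4.9, 5.0]
single_threaded = [3.9, 4.1, 4.2, 4.4, 4.5, 4.6]

result = welch_summary(multithreaded, single_threaded)
print(result)
```

If the interval in `result["ci"]` excludes zero, the difference would be called statistically significant at the chosen confidence level. With real data you would swap in a proper t quantile, which any statistics package provides.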

This is a great example of the difference between statistical significance and practical significance. We can say that yes, there is a statistically significant difference in the boost clock of processors with and without multithreading. However, the 95% confidence interval shows that this difference is between roughly 0.2 and 0.4 GHz. To the average computer user this difference will not be noticeable, i.e., it is not a practical difference.
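One way to make the distinction concrete is to compare each confidence interval against a cutoff for what a user would actually notice. The sketch below uses the two intervals reported above. The 0.5 GHz cutoff and the names `classify_difference` and `NOTICEABLE_GHZ` are assumptions of mine for illustration; a reasonable cutoff depends on the audience.

```python
def classify_difference(ci_low, ci_high, practical_threshold):
    """Label a confidence interval for a difference in means.

    Statistically significant: the interval excludes zero.
    Practically significant: every value in the interval is at least
    `practical_threshold` away from zero.
    """
    statistically_significant = ci_low > 0 or ci_high < 0
    practically_significant = (
        statistically_significant
        and min(abs(ci_low), abs(ci_high)) >= practical_threshold
    )
    return statistically_significant, practically_significant


# 95% confidence intervals in GHz, as reported in the text above.
base_clock_ci = (-0.0477, 0.1713)
boost_clock_ci = (0.1805, 0.4173)

# Hypothetical cutoff: assume a user only notices a gap of 0.5 GHz or more.
NOTICEABLE_GHZ = 0.5

print(classify_difference(*base_clock_ci, NOTICEABLE_GHZ))
print(classify_difference(*boost_clock_ci, NOTICEABLE_GHZ))
```

Under this assumed cutoff, the boost clock difference is statistically significant but not practically significant, which matches the conclusion above.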

It would seem we've closed the case. However, one reason I like to use this data set is the presence of a lurking variable! One way most technology gets faster and more efficient is by becoming smaller. The physical size of a CPU hasn't changed a lot in the last ten years. What has changed is the process node - the semiconductor manufacturing process. In general, a smaller process node produces smaller transistors. If transistors are smaller, then more of them can fit onto the same size CPU. Smaller transistors are faster and more efficient! The chart below shows that over time, the size of the process node has been decreasing and the boost clock has increased along with it, regardless of multithreading!

A scatter plot with time on the x-axis and average process node size, the manufacturing process for semiconductors, on the y-axis. There is one point per year for multithreaded processors and one point per year for single threaded processors. The size of the points is proportional to the average boost clock of processors released that year. Initially processors had larger processes and lower frequency. Newer processors have smaller processes and higher frequency.
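A quick way to check for a lurking variable like this is to compare the two groups within each level of that variable. The sketch below does that with a handful of made-up rows (not from the `cpu` data set), constructed so that newer, smaller process nodes have both higher boost clocks and more multithreaded chips. The function names are my own.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows, NOT from the cpu data set:
# (process node in nm, has multithreading, boost clock in GHz)
rows = [
    (32, True, 3.6), (32, False, 3.5), (32, True, 3.8), (32, False, 3.7),
    (14, True, 4.3), (14, False, 4.2), (14, True, 4.5), (14, False, 4.4),
    (7, True, 4.8), (7, True, 4.9), (7, True, 5.0), (7, False, 4.9),
]


def multithreading_gap(rows):
    """Mean boost clock with multithreading minus mean without."""
    with_mt = [boost for _, mt, boost in rows if mt]
    without_mt = [boost for _, mt, boost in rows if not mt]
    return mean(with_mt) - mean(without_mt)


def gap_by_node(rows):
    """Compute the multithreading gap separately for each process node."""
    by_node = defaultdict(list)
    for row in rows:
        by_node[row[0]].append(row)
    return {node: multithreading_gap(group) for node, group in by_node.items()}


overall_gap = multithreading_gap(rows)
within_node_gaps = gap_by_node(rows)

print(round(overall_gap, 3))
print({node: round(gap, 3) for node, gap in within_node_gaps.items()})
```

In this toy example the overall gap is larger than the gap inside any single process node, which is the signature of a lurking variable: part of the apparent multithreading effect is really a process node effect.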