The importance of monitoring your CPU: Meltdown and Spectre
At the end of the 20th century we witnessed something wonderful. Since 1977, Dr. Dieter Seitzer and his team at the University of Erlangen-Nuremberg had been thinking about digitizing sound, but there was a small problem: no hardware could encode it apart from the computers of the time, which were out of reach for most of the population. Ten years later they were joined by Mr. Karlheinz Brandenburg, from the Fraunhofer-Gesellschaft, who, along with a group of talented developers, was able to program the LC-ATC (“Low Complexity Adaptive Transform Coding”) algorithm. Because of the tremendous amount of computing time required, however, it could only be tested on a very limited amount of audio material.
But what does this have to do with monitoring CPUs, Meltdown and Spectre? Okay, calm down, now you will see. Computers were evolving, not only in their Central Processing Units (CPUs) but in all their components. In 1981 the Compact Disc (CD) was invented and there was innovation in every field. In 1989 MP3 was patented in Germany and in 1996 the Internet became widespread. The 1980s were a wonderful decade for computer science (you might have witnessed it), but the 90s were going to blow our minds. By 1997 processors were able to play back the MP3 music we had dreamed about for so long, and soon software was lagging behind the new processors. Manufacturers noticed one detail: systems were wasting clock cycles (computing power per second) even while we listened to Britney Spears or U2 from our 700-megabyte hard drives. In 1999 portable MP3 players were invented, and power and miniaturization began to be part of our lives.
Another great addition was the integration of mathematical coprocessors into the CPUs themselves. These had previously been chips with a separate socket on the motherboard, and they were (and still are) responsible for calculating (they do not make decisions or comparisons) faster and then delivering the results to the CPU. This meant computers with greater speed and efficiency, but we must point out that in 1994 Intel had problems with these coprocessors, which returned errors from the ten-thousandths place onwards. The media publicized the case heavily, and Intel even offered to replace the defective CPUs free of charge. This serves as a precedent for the issue that concerns us, both for specialized users and for everyone else.
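To get a feel for what an error “from the ten-thousandths place onwards” looks like, here is a minimal sketch of the division check that was widely circulated at the time. The two constants are the commonly quoted test values, and the function name `division_residual` is ours, chosen only for illustration. On a correct floating-point unit the residual is essentially zero; the flawed chips were widely reported to return roughly 1.3337 instead of 1.3338 for this quotient.

```python
# Division check commonly associated with the 1994 Pentium flaw.
# X and Y are the widely quoted test operands; the helper name is hypothetical.
X = 4195835.0
Y = 3145727.0


def division_residual(x: float, y: float) -> float:
    """Return x - (x / y) * y, which should be ~0 when division is accurate."""
    return x - (x / y) * y


quotient = X / Y
residual = division_residual(X, Y)

print(f"quotient = {quotient:.10f}")
print(f"residual = {residual}")
```

A monitoring script could run a check like this as a cheap sanity test, although it only covers this one historical defect.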
The arrival of the 21st century
After the so-called computer chaos of 2000, two companies controlled CPU production: AMD® and Intel®.
Both manufacturers had experience in manufacturing processors, so their chips were faster than the software that existed at the time. In order to take advantage of wasted CPU cycles, they sold them with the following integrated features:
- Speculative execution: essentially, when the processor reaches a conditional it does not wait for the condition to be evaluated. It quickly calculates the results ahead of time, then keeps the correct result and discards the incorrect one.
- Privilege levels: this has nothing to do with processor speed, but with reserving memory zones for the exclusive use of the kernels of current operating systems. This is all about security: user applications are closely monitored by the operating system.
- Out-of-order execution: although it was a milestone at the time, it is not especially relevant to this case.
- CPU cache: with the rise of fast but expensive memory chips, CPU manufacturers changed the architecture so that small amounts of this memory, arranged in layers between the CPU and RAM, were used (and still are) to store frequently used data and programs. This means greater speed but less safety.
- Parallelization of processes: having two or more processors and/or cores in a computer enhances all of the above at only a small additional cost per machine. After the appearance of the Pentium Pro model in 1995, all these features became a reality.
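The “greater speed but less safety” of the CPU cache is easier to see with a toy model. The following is a minimal sketch, not a description of any real processor: the class `ToyCache`, its capacity and its latency numbers (1 for a hit, 100 for a miss) are all assumptions made up for illustration.

```python
from collections import OrderedDict


class ToyCache:
    """A tiny least-recently-used cache sitting in front of slow 'RAM'."""

    def __init__(self, ram, capacity=4, hit_cost=1, miss_cost=100):
        self.ram = ram
        self.capacity = capacity
        self.hit_cost = hit_cost      # simulated latency, arbitrary units
        self.miss_cost = miss_cost    # simulated latency, arbitrary units
        self.lines = OrderedDict()    # address -> value, oldest entry first

    def read(self, address):
        """Return (value, cost), where cost is a simulated latency."""
        if address in self.lines:
            self.lines.move_to_end(address)   # mark as recently used
            return self.lines[address], self.hit_cost
        value = self.ram[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # evict the least recently used
        return value, self.miss_cost


ram = list(range(100, 116))   # 16 made-up memory cells
cache = ToyCache(ram)
first = cache.read(3)         # not cached yet: slow
second = cache.read(3)        # cached now: fast

print(first, second)
```

The second read is a hundred times cheaper than the first in this model. That difference is exactly what makes the cache useful, and it is also something a program can measure, which is where the “less safety” comes from.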
The second decade of the 21st century
In 2016 another problem was discovered in Intel processors, this time when searching for Mersenne primes (the study of prime numbers is important for cryptography and our privacy). It could be verified by running Prime95, a legendary piece of software also used to “stress” servers: computers were “freezing” upon reaching the exponent 14 942 209. Fortunately, Intel was able to distribute a BIOS update. But nothing would prepare us for January 2018, when Meltdown and Spectre were presented one after the other.
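For readers curious about what “searching for Mersenne primes” involves, the classic check is the Lucas-Lehmer test. The sketch below is a plain, unoptimized version written only for illustration; tools such as Prime95 rely on far faster arithmetic, and the function name `is_mersenne_prime` is our own.

```python
def is_mersenne_prime(p: int) -> bool:
    """Lucas-Lehmer test: is 2**p - 1 prime? Assumes p itself is prime."""
    if p == 2:
        return True               # 2**2 - 1 = 3 is prime
    m = (1 << p) - 1              # the Mersenne number 2**p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


# Small prime exponents only; an exponent like 14 942 209 would take far
# too long with this naive version.
candidates = (2, 3, 5, 7, 11, 13, 17, 19, 23, 31)
exponents = [p for p in candidates if is_mersenne_prime(p)]

print(exponents)
```

The loop is nothing but repeated squaring of very large numbers, which is why this kind of workload pushes a CPU so hard.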
January 4, 2018
This was the day when Intel officially admitted the Spectre and Meltdown flaws found by Google Project Zero, and the media began to report on them. On this blog we have published an article that serves as a guide to this type of information technology crisis, aimed at those responsible for managing it in a company.
How does Meltdown work?
- The attacking program requests access to an area of memory that is forbidden to it by default (1st query).
- Immediately afterwards it makes a second request that depends on the value it expects to obtain from the first query.
- Here the speculative execution that we described at the beginning of the article comes into action: the CPU resolves both queries immediately.
- The processor correctly detects that neither query is allowed (the 1st query reads a value outside the area permitted to the program, and the 2nd query depends on the value read by the 1st) and denies both results.
- Although both results are denied to the program, traces of the two queries remain in the processor's cache memory, since they were calculated before privileges were checked. As we told you, modern CPUs are so fast that they take advantage of all the available time in order to gain speed. AMD processors are not affected by Meltdown since they check privileges before speculative execution. Itanium® processors (all of them) and Atom® processors (prior to 2013) are the only ones from Intel® that are not affected.
- Now the attacking program makes a third request with the same scheme as the first, but this time to a valid memory area where it has read privileges: if the answer comes back immediately, it means the first two queries were executed and their data is in the cache. In other words, the third query resembles the first one, and the CPU “already knows” the answer because it holds it in cache memory, “resolved” beforehand.
- By repeating the attack again and again, the attacking program methodically and systematically builds a map of RAM against the cache memory: knowing which areas of RAM belong to each running program, it obtains a correlation with the cache.
- Once the map is assembled, the data collection stage begins: the attacker can single out a particular program and its “dumpster”.
- The term “dumpster” illustrates the operations that remain in the cache: once the data has been delivered to the program that legitimately requested it, the processor simply leaves it there.
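The steps above can be sketched as a small simulation. To be clear, this is not an exploit and it measures no real timings: `ToyCPU`, its latency constants and the sample secret are all invented for illustration, and the “cache” is just a Python set. It only shows the logic of the attack: a denied read still leaves a footprint, and a legal, timed read reveals which footprint it was.

```python
class ToyCPU:
    """Simulated CPU: a denied speculative read still leaves a cache footprint."""

    HIT, MISS = 1, 100   # simulated latencies in arbitrary units (assumption)

    def __init__(self, kernel_memory):
        self._kernel = kernel_memory   # memory forbidden to user programs
        self.cached_lines = set()      # which probe lines are currently cached

    def flush(self):
        """Empty the simulated cache before each attempt."""
        self.cached_lines.clear()

    def speculative_read(self, kernel_address):
        # 1st and 2nd queries run ahead of the privilege check: the secret
        # byte is read and used to touch one of 256 probe lines.
        secret_byte = self._kernel[kernel_address]
        self.cached_lines.add(secret_byte)
        # The check then fails: results are discarded, the cache is not.
        raise PermissionError("access denied")

    def timed_probe(self, line):
        # 3rd query: a legal read that is fast only if the line is cached.
        return self.HIT if line in self.cached_lines else self.MISS


def leak_byte(cpu, kernel_address):
    """Recover one forbidden byte purely from simulated access times."""
    cpu.flush()
    try:
        cpu.speculative_read(kernel_address)
    except PermissionError:
        pass   # the attacker never receives the value directly
    timings = [cpu.timed_probe(line) for line in range(256)]
    return timings.index(min(timings))   # the fastest line encodes the secret


def leak_bytes(cpu, length):
    """Repeat the attack address by address to build the 'map'."""
    return bytes(leak_byte(cpu, address) for address in range(length))


secret = b"hunter2"          # made-up kernel data
cpu = ToyCPU(secret)
recovered = leak_bytes(cpu, len(secret))

print(recovered)
```

Notice that `leak_byte` never looks at the kernel memory itself; it only compares access times, which is why a correct privilege check alone is not enough.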
How does Spectre work?
Spectre works in a very similar way, but it has two variants, one more difficult to execute than the other. It takes advantage of the branch predictor, a special case of speculative execution. Because of this, both variants also affect AMD processors: they rely on breaking the isolation between applications, and applications that follow good programming practices are, ironically, even more vulnerable to Spectre.
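To illustrate how a branch predictor can be steered, here is a minimal sketch of a textbook two-bit saturating counter. Real predictors are considerably more elaborate, so treat the class `TwoBitPredictor` and the training pattern as assumptions made for the example. The idea is that an attacker repeats a harmless case until the predictor expects it, then supplies the harmful one.

```python
class TwoBitPredictor:
    """Saturating counter: states 0-1 predict 'not taken', 2-3 predict 'taken'."""

    def __init__(self):
        self.state = 0

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        """Move one step towards the outcome that actually happened."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def run(predictor, outcomes):
    """Return a list of (predicted, actual) pairs, one per branch."""
    history = []
    for actual in outcomes:
        history.append((predictor.predict(), actual))
        predictor.update(actual)
    return history


predictor = TwoBitPredictor()
training = [True] * 5    # hypothetical: a bounds check passes five times
attack = [False]         # then the check should fail
history = run(predictor, training + attack)
mispredicted = history[-1]

print(mispredicted)
```

On the last branch the predictor still says “taken” although the real outcome is “not taken”. During that short window the processor speculatively runs code it should not, and, as with Meltdown, the cache keeps the traces.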
Meltdown and Spectre: their impact on the industry
Imagine a data centre with a thousand computers in which these features are restricted: we would effectively have 700 computers for the same workload, and to return to the previous level of productivity we would need to buy at least 300 more. Although it will take years to find a solution in the CPU architecture itself, there are many ideas for getting out of this quagmire.
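The data centre example can be worked through in a few lines. The 30% slowdown is an assumption derived from the 700-out-of-1000 figure, not a measured value, and the function name `machines_needed` is ours. The sketch also accounts for the fact that any newly bought machines would be slowed down too, which pushes the number above 300.

```python
def machines_needed(original_count: int, slowdown_percent: int) -> int:
    """Machines required after the slowdown to match the original throughput."""
    remaining = 100 - slowdown_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-original_count * 100 // remaining)


original = 1000
slowdown_percent = 30    # assumed, implied by "700 out of 1000"

effective = original * (100 - slowdown_percent) // 100
total = machines_needed(original, slowdown_percent)
extra = total - original

print(effective, total, extra)
```

Under this assumption the fleet would need to grow to about 1429 machines, that is, around 429 additional computers to get back to where it started.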
How can we be protected?
We can protect ourselves from Meltdown by updating the kernel of our operating systems. In the case of Ubuntu it can be updated without rebooting, but for other operating systems a reboot is inevitable. For a normal user this is not a problem, but for large companies, restarting their servers and monitoring the workload after “patching” can mean a great deal of time and money.
Unfortunately, for Spectre it will take time until we have good protection; as soon as a solution to this problem is found, we will write an article about it. Thank you very much for your attention, and don’t forget to leave your questions and comments!
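After patching, it helps to check what the kernel itself reports. On Linux, patched kernels expose a directory under `/sys/devices/system/cpu/vulnerabilities` with one small text file per issue (for example `meltdown`). The following is a minimal sketch of how a monitoring script might read it. The helper names `read_mitigation_status` and `unpatched` are hypothetical, and on systems without that directory the first function simply returns an empty dictionary.

```python
from pathlib import Path

# Directory exposed by Linux kernels that carry the 2018 mitigations.
SYSFS_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(directory=SYSFS_DIR):
    """Return {vulnerability_name: status_text}; empty if the directory is missing."""
    directory = Path(directory)
    if not directory.is_dir():
        return {}
    status = {}
    for entry in sorted(directory.iterdir()):
        if entry.is_file():
            try:
                status[entry.name] = entry.read_text().strip()
            except OSError:
                status[entry.name] = "unreadable"
    return status


def unpatched(status):
    """Names whose status text starts with 'Vulnerable'."""
    return [name for name, text in status.items() if text.startswith("Vulnerable")]


status = read_mitigation_status()
for name, text in status.items():
    print(f"{name}: {text}")
print("still vulnerable:", unpatched(status))
```

The output of a check like this can be fed into whatever monitoring tool you already use, alongside CPU load, so that you can compare the workload before and after the patch.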
Programmer since 1993 at KS7000.net.ve (since 2014, free software solutions for commercial pharmacies in Venezuela). He writes regularly for Pandora FMS and offers advice on the forum. He is also an enthusiastic contributor to Wikipedia and Wikidata.
He pumps iron at the gym and, when he can, goes cycling as well. Science fiction fan.