CPU trends

In the CPU benchmark post a lot of steps have already been shown, but some additional data had to be added, as the original data source was a bit outdated. And yes, once you start such a business ... we could not stop at the most wanted plots.

Checking and supplementing the database

First, to fill in the missing entries for recently released CPUs, I searched for better databases and, as I have mentioned, the one from NotebookCheck.net proved useful. I immediately verified some of the CPUs competing for the first place and found matching data. At the same time I started to hunt for outliers, like the one labeled with the arrow:


This turned out to be a real outlier, not a data error: it is simply a CPU that performs poorly in the x264 Pass 1 benchmark (video conversion).

Then I found a data conversion error on the original data source site, among the power consumption values:

A few AMD CPUs had extremely high power consumption values, which of course resulted from a wrong data conversion/extraction from some original source; this was checked against NotebookCheck.net:
Oh yes, the hyphen. This was surely misinterpreted by Excel, and nobody verified the data put on the website. Note: the higher number of each range was taken as the power value, since the best performance data shown in the other columns was surely reached in the highest power consumption state... anyhow, it does not matter much, except if you make a scatter plot like this:

Page 2 on the CPU benchmark Data Analysis dashboard
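Just to illustrate the hyphen problem: a minimal Python sketch (not part of the original workflow) of how such a hyphenated TDP range could be parsed before it ever reaches a spreadsheet, keeping the higher value of the range as described above:

```python
import re

def parse_tdp(raw: str) -> float:
    """Return the TDP in watts from a cell that may contain a range.

    Handles values like '15 W', '35-45' or '35 - 45 W' and keeps the
    higher number of a range, as described above.
    """
    numbers = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", raw)]
    if not numbers:
        raise ValueError(f"No numeric TDP found in {raw!r}")
    return max(numbers)

# Values that Excel would happily turn into dates or huge numbers
for cell in ["15 W", "35-45", "35 - 45 W"]:
    print(cell, "->", parse_tdp(cell))
```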
 
A secondary consideration was to select a CPU with low power consumption (partly to stay green, but mostly to have a long up-time on battery), which in the end was met by the Ryzen 5 4700U (15 W).

But back to the check-and-fix steps. To have the top CPUs in the list as well, not to be misled by old data, and to know the real standing of the CPUs competing for the first place in our decision-making process, I added some CPUs with the highest 3DMark benchmark results. To match our original data source with the NotebookCheck.net data, cleaning, conversion and extraction were done in a Google Sheet (using some automated cell functions, see the image):


Of course, Excel would have been just as good for this task; on the other hand, any scripting (language) would have been superfluous.
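Still, just for reference, the same matching could be sketched in a few lines of pandas. The file and column names below are made up for illustration and are not the author's actual sheet:

```python
import pandas as pd

# Hypothetical exports of the two sources (file and column names are assumed)
original = pd.read_csv("original_benchmarks.csv", sep=";")
notebookcheck = pd.read_csv("notebookcheck_top_3dmark.csv", sep=";")

# Normalise the CPU names so that e.g. 'Intel Core i7-9750H' matches on both sides
for df in (original, notebookcheck):
    df["cpu_key"] = (df["cpu_name"]
                     .str.strip()
                     .str.lower()
                     .str.replace(r"\s+", " ", regex=True))

# Keep every CPU of the original list and add the new top performers
merged = original.merge(notebookcheck, on="cpu_key", how="outer",
                        suffixes=("_orig", "_nbc"))
merged.to_csv("cpu_benchmarks_supplemented.csv", sep=";", index=False)
```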

Helpful colors ...

The basic scatter plot is not easy to read, especially as the results are plotted versus CPU ID (list number) here ...


... thus I tried to differentiate the two companies, AMD and Intel...


and then chose to use processor types to color the bubbles. Color coding is still not enough if all CPUs are present, but if you select some subset it turns out to be useful. See it interactively on my Google Data Studio CPU benchmark dashboard (page 2).
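In Data Studio the coloring is just a dimension setting; for readers who prefer code, roughly the same idea in a matplotlib sketch (file and column names are assumptions):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("cpu_benchmarks_supplemented.csv", sep=";")

# One colour per processor type (e.g. i3/i5/i7/i9, Ryzen 3/5/7/9)
for cpu_type, group in df.groupby("cpu_type"):
    plt.scatter(group.index, group["score_3dmark"], label=cpu_type, alpha=0.7)

plt.xlabel("CPU ID (list number)")
plt.ylabel("3DMark score")
plt.legend(title="Type", fontsize="small")
plt.show()
```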



This type-based coloring was chosen for almost all of the other panels as well.


CPU trends

... finally, about things that are widely known, but with some twist:


As already mentioned above, it is widely known that higher computational speed requires, and results in, higher power consumption (TDP). Here, again, the ratio of the 3DMark result to the Cinebench Multithreading result is plotted against the power consumed. It is visible that the CPU power consumption takes only a few discrete values, and if you pick one of them the CPUs are spread widely in the vertical direction, which of course means that the relation is not linear; rather, other factors determine the final benchmark results. We can also state that in this ratio-versus-power plot there is no specific relation between these factors, so low-power CPUs can be as good as power-hungry ones. Check that outlier at 100 W, with an average ratio value!
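For reference, the plot itself is easy to reproduce; a sketch with assumed file and column names (tdp_w, score_3dmark, score_cinebench_mt):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("cpu_benchmarks_supplemented.csv", sep=";")

# Ratio of the two benchmarks, plotted against the power consumption (TDP);
# note how the TDP values cluster at a few discrete levels
df["ratio"] = df["score_3dmark"] / df["score_cinebench_mt"]

plt.scatter(df["tdp_w"], df["ratio"], alpha=0.6)
plt.xlabel("TDP [W]")
plt.ylabel("3DMark / Cinebench MT ratio")
plt.show()
```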



Similarly, the number of cores in the processor results in widely spread benchmark values (here 3DMark) if you pick one core count (e.g. 4), so here again the core count is not the ultimate parameter that determines the result. The increasing tendency of performance with core count is clear (in all the other benchmarks as well, so those are not shown), so people are waiting for 10-12-16 core CPUs... but wait! Can it increase indefinitely? Not all properties can be increased without reaching a limit defined by nature or by the technical implementation:


So if you follow the bending curve of benchmark results with increasing L3 cache size, a limit around 32 MB is already visible, which is also a limit of profitability for the producers. But do not be afraid: first, the L3 cache is not the most important part regarding performance; secondly, they will, for sure, figure out something to keep following Moore's law.

It is the Cinebench Multithreading benchmark which (still) shows a linear tendency, suggesting that for video conversion the L3 cache size is relevant and will not be a limiting factor for a while. Good news for video content editors and creators.
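The "bending" can even be quantified with a simple saturating fit. This was not part of the original analysis, just a sketch assuming an l3_cache_mb column and a generic saturation curve:

```python
import pandas as pd
from scipy.optimize import curve_fit

df = pd.read_csv("cpu_benchmarks_supplemented.csv", sep=";").dropna(
    subset=["l3_cache_mb", "score_3dmark"])

def saturating(x, a, b):
    # Simple saturation curve: approaches the plateau 'a' for large caches,
    # reaching half of it at x = b
    return a * x / (b + x)

popt, _ = curve_fit(saturating, df["l3_cache_mb"], df["score_3dmark"],
                    p0=(df["score_3dmark"].max(), 8.0))
print(f"Estimated plateau: {popt[0]:.0f} 3DMark points, "
      f"half of it reached at {popt[1]:.1f} MB L3 cache")
```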


There is always a competition of overclocking CPUs until they almost burn down, just to prove that "my CPU can perform better than its datasheet says". So here are some plots of benchmark results versus minimum and maximum (turbo) frequency (similar for all types of benchmarks).


This is shown mainly because it looks great... and, on the other hand, you can see the trends clearly here.

For sure, PC and laptop lovers will be satisfied with the trends in the future, in spite of the fact that technology is again getting close to a new barrier. I remember that ~10 years ago the situation was the same and ... clever engineers solved the wire-size limitation of that era.






CPU benchmark - helping the decision

I was asked to help a friend choose a laptop (notebook), as his previous one was on its way to die.

REQUIREMENTS

...as a first idea, we determined to find the best solution on the market matching his needs, such as:

  • a strong CPU, at least an Intel Core i7 (at least 9th generation) or an AMD Ryzen 7 (at least 3xxx),
  • a strong GPU, a dedicated one with at least 2 GB of video RAM,
  • an SSD drive to make it fast, along with DDR4 RAM (size does not matter above 6 GB, as it can be increased later),
  • and other wishes which are not relevant from the point of view of this post. 
We all know that finding your favorite device is not an easy task, as you plan to keep it for several years and hope that it gives you more joy and success than irritation or disappointment.

the CONCEPT

Choosing the right one generally starts with an online search or with real window shopping. The latter is for those who want to feel (touch & watch) the variety of hardware, but this was not the case for us.
There are plenty of websites with widespread keyword-based search engines. If you are lucky, you know a website which only gathers information from other sites and helps customers in compiling the list... a vast list, which in the end is disturbing and only increases your doubt about whether you are making the right choice.
So we chose a pure and simple comparison of two components: CPU and GPU (of which I show the CPU part here).
Let's dive into the details of the CPUs.

BENCHMARK

There are plenty of benchmarks on the web to find the 'ultimate comparison' list of CPUs or GPUs (or complete laptops), but which ones can we trust? For how to use and understand benchmark values, Intel's hints may be useful.
He found a table of different benchmark results for some CPUs (an outdated copy of some other website, translated to Hungarian, such as this one based on NotebookCheck.net data) and asked me to analyse it. This table contains the following benchmarks:
  • 3DMark, which, among other options, is able to check CPU workload processing capabilities, testing single- and multithreading abilities.
  • Cinebench, which is based on the Cinema 4D suite; the benchmark has different options, such as 32-bit (not used here) and 64-bit tests, combined with Single Core and Multi Core (hyperthreading) test modes.
  • The x264 Pass benchmark, which measures CPU performance by how fast it encodes a 1080p video into the x264 HD video format.
From the viewpoint of the analysis:
  • for all of the above benchmarks, higher numbers indicate better performance,
  • a single benchmark can test some properties of the CPU but may not reveal all the key points,
  • there is a rumour that CPU makers specialise their CPUs to achieve the best results in certain test software, but they surely cannot optimise for all of them. That would be a large benefit for us users, but it is either impossible or ... would result in expensive hardware.
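One simple way around the single-benchmark problem is a normalized composite score. This was not part of the original analysis; the file and column names in the sketch below are assumptions:

```python
import pandas as pd

df = pd.read_csv("cpu_benchmarks.csv", sep=";")

benchmarks = ["score_3dmark", "score_cinebench_st",
              "score_cinebench_mt", "score_x264_pass1"]

# Scale every benchmark to 0..1 so that no single test dominates,
# then average them into one composite score (higher is better)
scaled = (df[benchmarks] - df[benchmarks].min()) / (
    df[benchmarks].max() - df[benchmarks].min())
df["composite_score"] = scaled.mean(axis=1)

print(df.sort_values("composite_score", ascending=False)
        [["cpu_name", "composite_score"]]
      .head(10))
```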
So the data analysis was started, going through the steps of...

ETL

... skip this section if you are not interested, and jump to the Results***
Well, this is not a real ETL (in this post; for a real ETL see here***), as the process here will be shown using:
  • Microsoft Excel
  • Google Sheets
  • Notepad++
  • Google Data Studio

  1. Data extraction and cleaning



The data of the table was copied into an Excel worksheet. But of course everything fell apart, as Excel assumes it can figure out what we want and what we mean by the pasted content (types, the Hungarian decimal comma separator, converting numbers to dates, ...), but otherwise there would be no need for ETL. :)
I tried other routes, such as Notepad++, but the data on the Hungarian site was not well formatted, so it turned out even worse. I continued with Excel and saved into CSV format.






As you can see, the data is not well organized and a lot of elements were interpreted in the wrong way.
I switched to Notepad++ to use the Replace option with regexps.

The cleaning:
  • the header was removed,
  • the field separator was changed to semicolon (so not a strict CSV file was made), to avoid clashing with the
  • decimal separator: dots were exchanged to commas (in safe mode, row by row): . → , to match my regional settings
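For reference, the same three cleaning steps expressed as a small Python sketch (the file names are assumptions, and the raw dump is assumed to be comma-separated with decimal dots):

```python
import re

# Assumed: the raw dump is comma-separated with decimal dots
with open("cpu_table_raw.csv", encoding="utf-8") as f:
    lines = f.read().splitlines()

cleaned = []
for line in lines[1:]:                             # 1. drop the header row
    line = line.replace(",", ";")                  # 2. semicolon as field separator
    line = re.sub(r"(?<=\d)\.(?=\d)", ",", line)   # 3. decimal dot -> comma
    cleaned.append(line)

with open("cpu_table_clean.csv", "w", encoding="utf-8") as f:
    f.write("\n".join(cleaned))
```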

detailed ETL steps (involving steps below)

Regrouping and extracting data

  1. The aim was to have numbers only, if possible without physical units. The following changes have been made (a scripted sketch of these splits is shown after this list):
    • unnecessary units and names were removed,
    • the CPU maker, type and model were split (space and '-' as separators) into 3 columns (first creating empty columns to the right, so as not to overwrite existing data),
    • the CPU model and modifier symbols were split,



    • the number of cores and threads were split,
    • the turbo min/max frequency values separated by a hyphen were split (data not used, just for the sake of ETL and real fanatics).
      The result:
    • suspicious numbers were verified and fixed (sometimes via an internet search)

  2. Preparation for Machine Learning

    Final conversion of all remaining text-based descriptors to numbers.
    The resulting data table was not used in the following steps, but was saved for the ML process in a separate CSV file (see another post).


  3. Data load
    Data was loaded into Excel (a new worksheet) and into Google Data Studio as well, for demonstration purposes.
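As promised above, a sketch of the splitting and encoding steps in Python/pandas; the file and column names (cpu_name, cores_threads, turbo_range, ...) are assumptions, not the actual spreadsheet headers:

```python
import pandas as pd

df = pd.read_csv("cpu_table_clean.csv", sep=";", decimal=",")

# Split strings like 'Intel Core i7-9750H' into maker / type / model
parts = df["cpu_name"].str.extract(
    r"^(?P<maker>\S+)\s+(?P<cpu_type>.+?)[\s-](?P<model>[^\s-]+)$")
df = pd.concat([df, parts], axis=1)

# Split '4/8'-style core/thread counts and '2300-4500'-style turbo ranges
df[["cores", "threads"]] = (
    df["cores_threads"].str.split("/", expand=True).astype(int))
df[["freq_min_mhz", "freq_max_mhz"]] = (
    df["turbo_range"].str.split("-", expand=True).astype(float))

# For the ML export: encode the remaining text descriptors as integers
for col in ["maker", "cpu_type"]:
    df[col + "_code"] = pd.factorize(df[col])[0]

df.to_csv("cpu_table_ml.csv", sep=";", decimal=",", index=False)
```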

Visualization

Neither of the tools below is perfect for the purpose, but both can be used to retrieve some information; of course, GDS is way better suited to this kind of analysis. Missing features of the tools are mentioned in the text.
  1. Excel for the basics

    Depending on the number of cores, the power consumption may vary a lot. This parameter also influences the performance; however, the best performance does not go along with the highest power consumption, see that real outlier at 100 W:



    To get faster CPUs, the base frequency and all related communication (bus) frequencies have been following an increasing tendency since ... ever. Of course it is not only the frequency that determines the final efficiency, but it is a highly relevant parameter (as a rule of thumb for any kind of electronic processing and communication). The benchmark values reached (3DMark in this case) are plotted against the maximum turbo frequencies (in MHz); where the turbo frequency was not defined, the CPU's generally specified base frequency was used:
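The turbo-frequency fallback itself is a one-liner; as a Python sketch (file and column names assumed):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("cpu_table_ml.csv", sep=";", decimal=",")

# X-axis: maximum turbo frequency; where it is missing, fall back to the
# CPU's base frequency (freq_base_mhz is an assumed column name)
df["x_freq_mhz"] = df["freq_max_mhz"].fillna(df["freq_base_mhz"])

plt.scatter(df["x_freq_mhz"], df["score_3dmark"], alpha=0.6)
plt.xlabel("Max. turbo (or base) frequency [MHz]")
plt.ylabel("3DMark score")
plt.show()
```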


    The best 3DMark result: Intel Core i9-9980HK, and the worst (among those with data given): Intel Core i5-4300Y ... of course these are not solid statements and do not really count, as the aim was something medium among the medium-level CPUs, for an affordable price. (This is not an Intel ... 9980HK advertisement! Just a statement based on the given data. For fanatics and the censorious, I have checked: the AMD Ryzen 9 3900X is better now, Sep 2020.)

    Excel is not well suited to making charts/plots with labeled data points, which would facilitate finding the best/worst/any item. There are some details (data series name, X and Y values), but that is not helpful in all cases and not interactive at all!

    (The word for data series in Hungarian, "adatsor", has been broken into two for some unknown reason; must be a bug in Excel :) )

    Excel can deal with simple data rows:

    Lots of conceptual plotting errors and useless/incomplete information in the pop-up panel.

    It is not what we want, but it draws our (data cleaner's) attention to missing (zero) values and general things to be aware of.



  2. Data Studio for the fine details
    In this case the data was prepared (structured) with the help of Excel and Notepad++, but of course new Parameters and Fields can easily be defined in Data Studio, such as a benchmark result weighted by the power consumption, which may be good feedback on how efficiently a CPU uses its power budget.
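In Data Studio this would be a calculated field; the equivalent computation as a Python sketch, again with assumed file and column names:

```python
import pandas as pd

df = pd.read_csv("cpu_table_ml.csv", sep=";", decimal=",")

# Benchmark result per watt: a rough performance-per-power indicator
df["score_per_watt"] = df["score_3dmark"] / df["tdp_w"]
print(df.sort_values("score_per_watt", ascending=False)
        [["cpu_name", "score_3dmark", "tdp_w", "score_per_watt"]]
      .head(10))
```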





Blog design

If you wish to display the Pages at the top, as a menu:







Save at the bottom, then Save once more at the bottom right.

To select which Pages to show (note: a changed blog color design may uncover hidden pages!):
https://support.google.com/blogger/answer/165955?hl=en
