I was asked to help a friend choose a laptop (notebook), as his previous one was about to die.
REQUIREMENTS
... were determined first, in order to find the best solution on the market matching his needs, such as:
- strong CPU, at least an Intel Core i7 (at least 9th generation) or an AMD Ryzen 7 (3xxx at least),
- strong GPU, a dedicated one with at least 2 GB video RAM,
- SSD drive to make it fast, along with DDR4 RAM (size does not matter above 6 GB, as it can be increased later),
- and other wishes which are not relevant from the point of view of this post.
We all know that finding your favorite device is not an easy task, as you plan to keep it for several years and hope that it brings you more joy and success than irritation or disappointment.
THE CONCEPT
Choosing the right one generally starts with an online search or with real window shopping. The latter is for those who want to feel (touch and see) the variety of hardware, but this was not the case for us.
There are plenty of websites with keyword-based search engines. If you are lucky, then you know a website which gathers information from other sites and helps customers find the list... a vast list, which in the end is confusing and only increases your doubt about whether you are making the right choice.
So we chose a pure and simple comparison of 2 components: CPU and GPU (of which I show the CPU part).
Let's dive into the details of the CPUs.
BENCHMARK
There are plenty of benchmarks on the web for finding the 'ultimate comparison' list of CPUs or GPUs (or complete laptops), but which of them can we trust? On how to use and understand benchmark values, Intel's hints may be useful.
He found a table of different benchmark results for some CPUs (an outdated copy of some other website, translated to Hungarian, such as this one based on NoteBookcheck.net data) and asked me to analyse it. This table contains the following benchmarks:
- 3DMark, which, besides other options, can check the CPU's workload processing capabilities, covering both single- and multithreading abilities.
- Cinebench, which is based on the Cinema 4D suite. The benchmark has different options, such as 32 bit (not used here) and 64 bit tests, combined with Single Core and Multi Core (hyperthreading) test modes.
- x264 Pass, which measures CPU performance by how fast it encodes a 1080p video with the x264 encoder.
From the viewpoint of the analysis:
- for all the benchmark results mentioned above, higher numbers indicate better performance,
- a single benchmark can test some properties of the CPU but may not reveal all key points,
- there is a rumour that CPU makers tune their CPUs to get the best results with particular test software, but they certainly cannot optimise for all of them. That would be a great benefit for us users, but it is either impossible or ... results in expensive hardware.
So the data analysis was started with the steps of...
ETL
... skip this section if you are not interested, see Results***
Well, this is not a real ETL (in this post; for a real ETL see here***), as the process will be shown here using:
- Microsoft Excel
- Google Sheets
- Notepad++
- Google Data Studio
- Data extraction and cleaning
The data of the table was copied to an Excel worksheet. But of course everything fell apart, as Excel assumes that it can figure out what we want and what we mean by the inserted content (type, the Hungarian decimal comma separator, converting numbers to dates, ...); but otherwise there would be no need for ETL. :)
I tried other ways, such as Notepad++, but the data on the Hungarian site was not well formatted, so it turned out even worse. I continued with Excel and saved the data in csv format.
I then switched to Notepad++ to use its Replace option with regexps.
The cleaning:
- header removed,
- the separator was set to semicolon (so the result is not a strict csv file), to avoid further errors with the decimal comma,
- decimal separator: dot replaced with comma (in safe mode, row by row): . → , to match my regional settings
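The same cleaning could also be scripted instead of done by hand. Below is a minimal Python sketch, assuming a semicolon-separated text with decimal commas as described above. The helper names (`parse_decimal`, `read_rows`) and the two sample rows are made up for illustration and are not taken from the original table. Note that the sketch converts the comma to a dot, the opposite of what I did for Excel, because Python expects a decimal dot.

```python
import csv
import io


def parse_decimal(text):
    """Convert '2,6' or '2.6' to a float; return None for empty or non-numeric cells."""
    cleaned = text.strip().replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


def read_rows(raw, delimiter=";"):
    """Split semicolon-separated text into rows of stripped cells, skipping empty lines."""
    reader = csv.reader(io.StringIO(raw), delimiter=delimiter)
    return [[cell.strip() for cell in row] for row in reader if row]


# Illustrative sample only (name; base frequency in GHz; power in W).
sample = "Intel Core i7-9750H;2,6;45\nAMD Ryzen 7 3700U;2,3;15\n"

rows = read_rows(sample)
parsed = [(name, parse_decimal(freq), parse_decimal(power)) for name, freq, power in rows]
print(parsed)
```

Cells that cannot be read as numbers come back as `None`, which makes the missing values easy to spot later on.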
detailed ETL steps (involving steps below)
Regrouping and extracting data
- The aim was to have numbers only where possible, without physical units. The following changes were made:
- removing units and names that are not required,
- splitting CPU make, type and model (space and '-' as separators) into 3 columns (first creating empty columns to the right, so as not to overwrite existing data),
- splitting the CPU model and modifying symbols,
- verifying and fixing suspicious numbers (sometimes by internet search).
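The splitting step can be sketched in a few lines of Python as well. This is only an illustration of the idea, not the Excel/Notepad++ procedure I actually used. The function names are hypothetical, and it assumes that the first token is the make, the last token is the model and everything in between is the type.

```python
import re


def split_cpu_name(name):
    """Split a CPU name on spaces and '-' into (make, type, model)."""
    tokens = [t for t in re.split(r"[ -]+", name.strip()) if t]
    if len(tokens) < 2:
        raise ValueError(f"cannot split CPU name: {name!r}")
    make, model = tokens[0], tokens[-1]
    cpu_type = " ".join(tokens[1:-1])  # may be empty for two-token names
    return make, cpu_type, model


def split_model(model):
    """Split a model such as '9980HK' into its leading number and the remaining suffix."""
    match = re.match(r"(\d+)(.*)", model)
    if match is None:
        return None, model  # no leading digits: keep the text as the suffix
    return int(match.group(1)), match.group(2)


print(split_cpu_name("Intel Core-i9 9980HK"))
print(split_model("1065G7"))
```

Names that do not follow the assumed pattern still need a manual check, just like the suspicious numbers.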
- Preparation for Machine Learning
Final conversion of all remaining text-based descriptors to numbers.
The resulting data table was not used in the following steps, but was saved in a separate csv file for the ML process (see another post).
- Data load
Data was loaded to Excel (new worksheet) and into Google Data Studio as well, for demonstration purposes.
Visualization
Neither of the tools below is perfect for the purpose, but both can be used to retrieve some information; of course GDS (Google Data Studio) is far better for this kind of analysis. Missing features of the tools are mentioned in the text.
- Excel for the basics
Depending on the number of cores, the power consumption may vary a lot. This parameter also determines the performance; however, the best performance does not go along with the highest power, see that real outlier at 100 W:
To get faster CPUs, the base frequency and all related communication (bus) frequencies have been following an increasing trend since ... ever. Of course it is not only the frequency that determines the final efficiency, but it is a highly relevant parameter (as a rule of thumb for any kind of electronic processing and communication). The benchmark values reached (3DMark in this case) are plotted against the maximum turbo frequencies (in MHz); where the turbo frequency was not defined, the CPU's general frequency was used:
The best 3DMark benchmark result: Intel Core-i9 9980HK, and the worst (among those with data given): Intel Core-i5 4300Y ... of course these are not solid statements and do not count, as the aim was something in the middle among the mid-level CPUs, for an affordable price. (This is not an Intel ... 9980HK advertisement, just a statement based on the given data! For the fanatics and the censorious, I have checked: AMD Ryzen 9 3900X is better now, Sep 2020.)
Excel is not well suited to making charts/plots with labeled data points, which would make it easier to find the best/worst/any item. There are details in the pop-up (data series name, X and Y value), but that is not helpful in all cases and not interactive at all!
(The Hungarian word for data series, "adatsor", has been broken in two for an unknown reason; it must be a bug in Excel. :) )
Excel can deal with simple data rows:
Lots of conceptual plotting errors, useless/incomplete information in the pop-up panel.
It is not what we want, but it draws our (the data cleaner's) attention to missing (zero) values and general things to be aware of.
- Data Studio for the fine details
In this case the data was prepared (structured) with the help of Excel and Notepad++, but of course new Parameters and Fields can easily be defined in Data Studio, such as benchmark results weighted by the power consumption, which may be a good feedback on
Finding the One
Excel can be forced to plot several data series of different CPU models in such a way that the CPU model is shown on mouse hover
... yes, it can be done, but of course Excel removes the CPU model names and consequently does not show them on the X axis (why or where should it show them if there are several data series at the same time?); rather, it plots the data in a row using the simple sequence number within the given data series. Consequently the start of the X axis becomes crowded;
... but you can customize colors, symbols, size... (as in the figure above). I still consider this awkward.
The best looking (scatter) plot shows the ratios of benchmarks (here the first two from the list above):
The top-left quarter represents CPUs where the major part of the benchmark result originates from single-threading abilities, while the bottom-right quarter holds a group of multithreading-optimized CPUs. There are a few dots in the top-right quarter, close to the crossing point of the mean lines (calculated without zeros!). Those processors are considered to be halfway between the two groups mentioned above... but wait! This is a ratio of one benchmark value (3DMark in this case) over another benchmark value (Cinebench single / multi), so if those CPUs are 'bad' in both, then what other hardware made the final 3DMark result what it finally became? At this point we were not interested in the answer; we went for multithreading CPUs. Please dive into the 3DMark benchmark description if you are interested in the precise way the results are calculated.
It would be ideal to have the CPUs in the bottom-left quarter, as that would mean that they are excellent both in single- and in multi-thread processing. There are a few, oh wait (!)
- again, no clue which ones those are (the Excel plot does not help, unless I check manually based on the values read from the pop-up label), besides
- if a large part of the CPUs appeared in that region, then the plot would only shift (toward zero) and it would look similar, just with other mean values... nasty statistics. :)
(By the way, a manual search found these to be, from the bottom up: AMD Ryzen 7 3700U; AMD Ryzen 5 2500U; AMD Ryzen 3 2200U; Intel Core-i7 3632QM, which have relatively low 3DMark results.)
It is clear that there are sharp lines of averages which definitely split the upper-lower and the left-right domains. This is not an evident consequence of their being averages, as the dots could have been spread over all 4 domains and still result in the same average values. This could derive from the limitations of CPU properties originating from the laws of physics and from the way 3DMark calculates the benchmark result from multiple test parts.
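Since the Excel plot does not tell which CPU sits in which quarter, the grouping can also be computed directly. Here is a minimal Python sketch. It assumes, based on the description above, that the X axis is 3DMark / Cinebench single and the Y axis is 3DMark / Cinebench multi, and that zero values mean missing data and are left out of the means. The field names and the sample numbers are invented for the example.

```python
from statistics import mean


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when either value is missing (0 or None)."""
    if not numerator or not denominator:
        return None
    return numerator / denominator


def classify(cpus):
    """Assign each CPU to a quarter relative to the mean lines of the two ratios."""
    points = []
    for cpu in cpus:
        x = ratio(cpu["mark3d"], cpu["cine_single"])  # assumed X axis
        y = ratio(cpu["mark3d"], cpu["cine_multi"])   # assumed Y axis
        if x is not None and y is not None:
            points.append((cpu["name"], x, y))
    if not points:
        return {}
    x_mean = mean(x for _, x, _ in points)
    y_mean = mean(y for _, _, y in points)
    quarters = {}
    for name, x, y in points:
        vertical = "top" if y > y_mean else "bottom"
        horizontal = "right" if x > x_mean else "left"
        quarters[name] = f"{vertical}-{horizontal}"
    return quarters


# Invented numbers, only to show the mechanics.
sample = [
    {"name": "A", "mark3d": 1000, "cine_single": 100, "cine_multi": 200},
    {"name": "B", "mark3d": 1000, "cine_single": 50, "cine_multi": 500},
    {"name": "C", "mark3d": 1000, "cine_single": 200, "cine_multi": 1000},
    {"name": "D", "mark3d": 0, "cine_single": 120, "cine_multi": 300},  # missing 3DMark value
]
print(classify(sample))
```

With a result like this, the bottom-left CPUs can be listed by name instead of being read one by one from the pop-up labels.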
On the top left there are panels with defined details, which can be used as controls (selectors) to play around with the CPU make/model/series/... On the bottom left the regular single-benchmark (3DMark in this case) results are shown for the best 10 CPUs (of which 1065G7 is highlighted as 8th), while at the very bottom there is an unusual top list of benchmark value over power consumption. In this list, too, the higher the better, and the winner is the above-mentioned 1065G7 model.
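The "benchmark value over power consumption" top list is a simple calculated field. A minimal Python sketch of the same idea follows. The function name and the sample values are hypothetical, and rows with a missing (zero) score or power value are skipped.

```python
def rank_by_score_per_watt(cpus, top=10):
    """Rank (name, score, watts) tuples by score / watts, best first."""
    scored = [(name, score / watts) for name, score, watts in cpus if score and watts]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


# Invented values, only to show the mechanics.
sample = [
    ("cpu_a", 9000, 45),
    ("cpu_b", 6000, 15),
    ("cpu_c", 0, 15),  # missing benchmark value, skipped
]
ranking = rank_by_score_per_watt(sample)
print(ranking)
```

A low-power CPU with a moderate score can end up on top of such a list, which is how a model that is only 8th by raw score can win here.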


At the top (right), the 4 scatter plots simply present ratios of different benchmark results against the list number (ID, which is a kind of chronological order as well). The ones presented are a selection of the possible ratios, combinations of these:
- 3DMark 06 CPU
- Cinebench R15 SingleCPU 64bit
- Cinebench R15 MultiCPU 64bit
- x264 Pass 1
- x264 Pass 2
If CPU types are used to color the data set, then it is easier to differentiate between the products:
Check it interactively here: Data analysis
- Top-left: 3DMark 06 value / Cinebench R15 SingleCPU 64bit value. This, like the following ones, is a complex parameter, representing "how the 3DMark result depends on the single-thread processing of the CPU". We can state that for newer CPUs (on the right side of the scatter plot) single-thread processing has less impact on the 3DMark result, so the ratio is increasing (slightly).
1065G7 is among the worst ones; however, it has a reduced number of threads compared to other 10th generation Intel CPUs, with the aim of low power consumption.
- Top-right: 3DMark 06 value / Cinebench R15 MultiCPU 64bit value. In contrast with the previous one, this ratio is decreasing, which means that in spite of the higher and higher 3DMark results, the determining component of this benchmark is multithreading.
Note: multithreading can be used optimally only with software written for parallel processing; otherwise, using single-threaded software is a kind of waste of achievable performance.
1065G7 performs as a bit of an outlier (in the wrong direction) among the freshly released CPUs.
- Bottom-left: 3DMark 06 value / x264 Pass 1 value and
Bottom-right: 3DMark 06 value / x264 Pass 2 value
Both of those show an average performance of video encoding regarding the
We also checked CPUs with very similar performance and found the AMD Ryzen 5 4700U (green filled spots):
In single-thread-based processes there is no relevant difference; in multithreading the difference is more evident, but (!) the prices of the CPUs, and especially the prices of the laptops equipped with these CPUs, differ so much that my friend started to reconsider his initial key points. It took a while to admit that this slight difference in performance would not affect his computer usage experience, and to accept that he would not have a Ryzen 7 (or Intel Core i7)... it can be survived.
This is the story of how data analysis helped to save approximately 10% of the price of a higher-quality and more expensive laptop, which in turn can be spent on accessories.
To make the choice, a few more data preparation steps were done, and there are further interesting things to discover on other plots (2nd page); please read about them in the CPU trends post.