General
Performance of multiple processors can be measured by using multiple programs or multiple threads in a single program. Examples of the former, using BusSpd2K, SS3DSoak and a graphics test, are below.
Some new multi-threaded benchmarks have been produced with versions to run via 32 bit and 64 bit Windows.
They can also be used to demonstrate Pentium 4 Hyper-Threading.
These tests are described in Win64.htm with the benchmarks being in DualCore.zip plus C/C++ and Assembler programs in NewSource.zip.
Other single CPU 64 bit compilations can be found in Win64.zip
and More64Bit.zip.
The output from the new MP programs is also shown below, under Maximum Speed, Whetstone MP, BusSpeed MP and RandMP.
BusSpd2K and IntBurn64
One of my older benchmarks, BusSpd2K in BusSpd2K.zip, and IntBurn64, a 64 bit version in More64Bit.zip, can be used to measure the performance of multiple processors via the Reliability Test, a useful additional feature. The test is run from a .BAT file, as in the example that follows. Each command specifies the KBytes of memory to use, the running time of each of the 12 tests and a separate log file for each copy of the program. The memory size can be adjusted to test L1 cache, L2 cache or RAM.
Start BusSpd2k Reliability, KB 8, Seconds 1, Log Log1.txt
Start BusSpd2k Reliability, KB 8, Seconds 1, Log Log2.txt
Start IntBurn64 Auto, KB 4, Secs 1, P1, Log testCPU1.txt
Start IntBurn64 Auto, KB 4, Secs 1, P2, Log testCPU2.txt
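Command lines like those above can be generated rather than typed by hand. The following Python sketch is only an illustration: the function name make_bat_lines is an assumption, the argument text simply copies the IntBurn64 example, and the limit of four copies assumes the P1 to P4 window positions mentioned later for the graphics tests.

```python
def make_bat_lines(copies, kb=4, secs=1):
    """Return one Start line per program copy, mirroring the IntBurn64 example.

    Each copy gets its own window position (P1, P2, ...) and its own log file,
    so results from concurrent runs do not overwrite each other.
    """
    if not 1 <= copies <= 4:
        raise ValueError("this sketch assumes window positions P1 to P4 only")
    return [
        f"Start IntBurn64 Auto, KB {kb}, Secs {secs}, P{n}, Log testCPU{n}.txt"
        for n in range(1, copies + 1)
    ]


if __name__ == "__main__":
    # Print the text of a two-copy .BAT file.
    print("\n".join(make_bat_lines(2)))
```

Changing kb moves the test between L1 cache, L2 cache and RAM, as described above.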
The 32 bit program runs tests with different data patterns using MMX instructions for the highest speed.
The 64 bit version uses the same data patterns with normal integer arithmetic and up to eight 64 bit registers.
Six tests write then read and six tests do multiple reads. An example log file output is given below. The start and end times can be used to verify that multiple programs ran at the same time and for the correct duration, 12 seconds in this case.
Further 64 bit version results are in BurnIn64.htm.
Reliability Test 8 KB, 1 seconds per test, Mon Aug 8 14:42:43 2005
Write/Read
1 11797 MB/sec Pattern FFFFFFFFFFFFFFFF Result OK 720062 passes
2 11767 MB/sec Pattern FFFFFFFFFFFFFFFF Result OK 718231 passes
3 11763 MB/sec Pattern 5A5A5A5A5A5A5A5A Result OK 717968 passes
4 11768 MB/sec Pattern 5555555555555555 Result OK 718283 passes
5 11785 MB/sec Pattern 3333333333333333 Result OK 719283 passes
6 11699 MB/sec Pattern 0F0F0F0F0F0F0F0F Result OK 714066 passes
Read
1 22860 MB/sec Pattern 0000000000000000 Result OK 2790600 passes
2 22870 MB/sec Pattern FFFFFFFFFFFFFFFF Result OK 2791800 passes
3 22890 MB/sec Pattern A5A5A5A5A5A5A5A5 Result OK 2794200 passes
4 22863 MB/sec Pattern 5555555555555555 Result OK 2790900 passes
5 22874 MB/sec Pattern 3333333333333333 Result OK 2792300 passes
6 22875 MB/sec Pattern F0F0F0F0F0F0F0F0 Result OK 2792400 passes
Reliability Test Ended Mon Aug 8 14:42:55 2005
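The timing check described above can be automated. This is a minimal Python sketch, assuming that the first and last lines of each log end with a date in the form shown in the example; the function names are hypothetical and the weekday field is ignored.

```python
from datetime import datetime

TIME_FORMAT = "%b %d %H:%M:%S %Y"  # e.g. "Aug 8 14:42:43 2005", weekday dropped


def parse_stamp(line):
    """Parse the date and time held in the last four fields of a log line."""
    fields = line.split()
    return datetime.strptime(" ".join(fields[-4:]), TIME_FORMAT)


def run_window(log_lines):
    """Return (start, end) taken from the first and last lines of one log."""
    return parse_stamp(log_lines[0]), parse_stamp(log_lines[-1])


def overlap_seconds(window_a, window_b):
    """Seconds during which both runs were active (0 if they never overlapped)."""
    latest_start = max(window_a[0], window_b[0])
    earliest_end = min(window_a[1], window_b[1])
    return max(0.0, (earliest_end - latest_start).total_seconds())


if __name__ == "__main__":
    log1 = [
        "Reliability Test 8 KB, 1 seconds per test, Mon Aug 8 14:42:43 2005",
        "Reliability Test Ended Mon Aug 8 14:42:55 2005",
    ]
    start, end = run_window(log1)
    print((end - start).total_seconds())  # duration of this run in seconds
```

With two logs, overlap_seconds close to the run duration indicates that both copies ran concurrently.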
MB/second results on two dual core CPUs are below. The Athlon 64 L1 cache speeds, when reading using MMX instructions, are faster than those of the Core 2 Duo. It seems that the code used also favours the Athlon 64 when using 64 bit integer instructions. The position is reversed for all other results, where most compare Core 2 Duo L2 cache speeds with those from Athlon 64 RAM. The shared Core 2 Duo L2 cache is surprisingly fast when used by two CPUs, except where both could use most of it (4000 KB, where throughput with two CPUs falls to about half that with one). Only the 16,000 KB measurements represent memory speeds on both systems. Here, the Athlon 64 RAM is slower but gains more throughput from using two CPUs.
             Core 2 Duo 2400 MHz Vista   Athlon 64 2210 MHz XP Pro
             32KB L1 4MB L2 800 MHz RAM  64KB L1 512KB L2 400 MHz RAM
Program         32 Bit        64 Bit        32 Bit        64 Bit
    KB CPUs Wrt/Rd   Read Wrt/Rd   Read Wrt/Rd   Read Wrt/Rd   Read
     4    1   3870  15794   4322  16206   8514  20913  12437  22257
          2   7287  31401   7737  32248  16926  41503  24684  44389
          %    188    199    179    199    199    198    198    199
    16    1   8051  16603   8499  16711  13670  22815  18559  23177
          2  15761  33014  16483  33114  27304  45290  36821  45996
          %    196    199    194    198    200    199    198    198
    64    1   8844  13035   9033  12995  15442  23002  18699  23028
          2  15945  24899  16185  24884  30833  45677  37533  45234
          %    180    191    179    191    200    199    201    196
   500    1   9715  13084   9911  13048   8545   9112   8104  10102
          2  17222  25111  17243  25033  17031  18023  16168  19957
          %    177    192    174    192    199    198    200    198
  1000    1   9756  13098   9737  13007   2125   2897   2072   3050
          2  17183  24670  17245  25035   2476   4736   2459   4917
          %    176    188    177    192    116    163    119    161
  2000    1   9567  12980   9664  12919   2101   2898   2074   3014
          2  15611  23144  15672  23399   2480   4629   2445   4904
          %    163    178    162    181    118    160    118    163
  4000    1   8350  11902   8955  12159   2098   2879   2045   3011
          2   4095   6720   4185   6657   2477   4693   2485   4873
          %     49     56     47     55    118    163    121    162
 16000    1   3466   5433   3370   5408   2086   2872   2055   3009
          2   3687   6066   3598   6019   2454   4706   2478   4838
          %    106    112    107    111    118    164    121    161
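The % rows in the table above appear to be the two CPU throughput expressed as a percentage of the single CPU figure, rounded to a whole number. A short Python sketch of that arithmetic follows; the function name is an assumption and the sample values are copied from the Core 2 Duo 32 bit Wrt/Rd column.

```python
def scaling_percent(one_cpu, two_cpus):
    """Throughput with two CPUs as a percentage of throughput with one."""
    return round(100.0 * two_cpus / one_cpu)


if __name__ == "__main__":
    # 4 KB: data fits in each L1 cache, so throughput nearly doubles.
    print(scaling_percent(3870, 7287))   # 188
    # 4000 KB: two copies no longer fit in the shared 4 MB L2 cache.
    print(scaling_percent(8350, 4095))   # 49
```

A value near 200 means both CPUs ran at full speed; a value below 100 means the two copies together were slower than one alone.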
SS3DSoak and SSEBurn64
These benchmarks have the same assembly code for burn-in tests using SSE and SSE2 floating point instructions. The former also has options to use 3DNow and is in SSE3DNow.zip, with the latter in More64Bit.zip. Testing options are CPU only, Cache (L1) and RAM (cache and RAM), using either SSE or SSE2. Speed is measured in Millions of Floating Point Instructions Per Second (MFLOPS) or MBytes per second (to obtain MFLOPS, divide MB/second by 4 for SSE and 8 for SSE2). See BurnIn32.htm and BurnIn64.htm for more details. Examples of BAT file commands used for two CPUs, and of a log file, are as follows.
Start SSEBurn64 SSE, CPU, Mins 5, auto, P1, Log Testx1.txt
Start SSEBurn64 SSE, CPU, Mins 5, auto, P2, Log Testx2.txt
Start SSEBurn64 SSE2, Cache, KB 4, Mins 5, auto, P1, Log Testx1.txt
Start SSEBurn64 SSE2, Cache, KB 4, Mins 5, auto, P2, Log Testx2.txt
Start SSEBurn64 SSE2, RAM, KB 128, Mins 5, auto, P1, Log Testx1.txt
Start SSEBurn64 SSE2, RAM, KB 128, Mins 5, auto, P2, Log Testx2.txt
SSE2 Cache Test at 5 minutes and 4 KB, Start at Sat Oct 27 16:54:13 2007
1.01 Minutes at 4785 MFLOPS, No Errors
2.00 Minutes at 4786 MFLOPS, No Errors
3.01 Minutes at 4786 MFLOPS, No Errors
4.00 Minutes at 4786 MFLOPS, No Errors
5.01 Minutes at 4787 MFLOPS, No Errors
Reliability Test Ended Sat Oct 27 16:59:14 2007
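The conversion between MB/second and MFLOPS quoted above (divide by 4 for SSE and by 8 for SSE2) can be written as a pair of helper functions. This Python sketch uses hypothetical names; the divisors are those given in the text and correspond to 4 byte single precision and 8 byte double precision numbers.

```python
BYTES_PER_NUMBER = {"SSE": 4, "SSE2": 8}  # divisors quoted in the text


def mbps_to_mflops(mb_per_sec, mode):
    """Convert a data rate in MB/second to MFLOPS for the given mode."""
    return mb_per_sec / BYTES_PER_NUMBER[mode]


def mflops_to_mbps(mflops, mode):
    """Convert MFLOPS back to the equivalent data rate in MB/second."""
    return mflops * BYTES_PER_NUMBER[mode]


if __name__ == "__main__":
    # The SSE2 cache log above reports about 4786 MFLOPS.
    print(mflops_to_mbps(4786, "SSE2"))  # 38288
```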
Results on the two dual core CPUs are below. This time, the Core 2 Duo is faster on all tests except the dual core 64 KB memory test. The impact of the 4 MB shared L2 cache is again apparent as are the throughput improvements with two CPUs using RAM.
                Core 2 Duo 2400 MHz Vista     Athlon 64 2210 MHz XP Pro
                32KB L1 4MB L2 800 MHz RAM    64KB L1 512KB L2 400 MHz RAM
Program         32 Bit    64 Bit    64 Bit    32 Bit    64 Bit    64 Bit
                   SSE       SSE      SSE2       SSE       SSE      SSE2
        CPUs    MFLOPS    MFLOPS    MFLOPS    MFLOPS    MFLOPS    MFLOPS
CPU        1     10162     10256      4549      5885      6094      3139
           2     19989     20113      9040     11733     12152      6274
           %       197       196       199       199       199       200
Cache      1      9549      9553      4775      7062      7062      3531
           2     18868     18891      9445     14081     14091      7044
           %       198       198       198       199       200       199
                MB/sec    MB/sec    MB/sec    MB/sec    MB/sec    MB/sec
Memory     1     37252     37352     37327     16833     16967     16969
4 KB       2     73587     73852     73790     33506     33858     33860
           %       198       198       198       199       200       200
Memory     1     17758     17850     17844     17544     17536     17540
64 KB      2     31789     31828     31815     34967     34902     35010
           %       179       178       178       199       199       200
Memory     1     17753     17730     17721      8852      8281      8780
256 KB     2     31837     31875     31878     17642     16530     17516
           %       179       180       180       199       200       199
Memory     1     15951     15714     15805      2962      2959      2951
4096 KB    2      8144      8109      8096      4720      4714      4696
           %        51        52        51       159       159       159
Memory     1      6014      6001      6024      2948      2945      2942
64 MB      2      7225      7249      7282      4664      4667      4662
           %       120       121       121       158       158       158
Direct3D and CPU Test
VideoD3D9 32 and 64 bit DirectX 9 graphics tests, in Video64.zip, can also be used in conjunction with the CPU burn-in tests, using the same window position format (P1 to P4). For further details see Direct3D Results.htm, 64 Bit Graphics Tests.htm and BurnIn64.htm. Example command lines and log file output are shown below.
Start SSEBurn64 SSE2, Cache, KB 4, Mins 10, auto, P3, Log T11.txt
Start VideoD3D9_64 Auto, Test 6, Width 640, Height 480, P2, Secs 600, Log T21.txt
Textured Objects at 640 x 480 x 32 bits
770.0 Frames Per Second over 60 seconds
779.2 Frames Per Second over 60 seconds
779.4 Frames Per Second over 60 seconds
779.3 Frames Per Second over 60 seconds
779.0 Frames Per Second over 60 seconds
779.3 Frames Per Second over 60 seconds
779.0 Frames Per Second over 60 seconds
774.0 Frames Per Second over 60 seconds
779.4 Frames Per Second over 60 seconds
777.8 Frames Per Second Overall
Results are below for the two dual core systems. Running the graphics test by itself shows that both CPUs can be used (CPU utilisation > 50%) at one extreme. At the other, the program can be limited by the graphics processor speed with CPU utilisation much lower. Running both programs together results in performance degradation of one or both. This was particularly bad on the Athlon 64 based PC. So further tests were run, dedicating CPUs separately via Task Manager Affinity settings, providing an improvement. For the two Core 2 Duo test runs, alternative CPUs and starting order were used.
Core 2 Duo 2400 MHz Vista            Athlon 64 2210 MHz XP Pro
GeForce 8600 GT                      Radeon X800 XL
             MFLOPS   FPS   % CPU                 MFLOPS   FPS   % CPU
                          Util x2                              Util x2
640 x 480      4373   779     100    640 x 480      2008   697      85
1280 x 900     4545   312      77    1280 x 1024    2720   350      92
Dedicated CPUs                       Dedicated CPUs
640 x 480      4445   727     100    640 x 480      3420   717     100
640 x 480      3632   865     100
Stand alone    4786            50    Stand alone    3527            50
640 x 480             926      65    640 x 480             763      60
1280 x 900            315      27    1280 x 1024           369      55
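The dedicated CPU runs above were set up through Task Manager, which presents affinity as one check box per CPU. Programming interfaces such as the Windows SetProcessAffinityMask function express the same choice as a bit mask, with bit n standing for CPU n. The Python sketch below only builds such a mask; the function name is an assumption and it does not change the affinity of any process.

```python
def affinity_mask(cpus):
    """Return a bit mask with bit n set for each CPU number n in cpus."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers start at 0")
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    print(hex(affinity_mask([0])))     # 0x1, first CPU only
    print(hex(affinity_mask([1])))     # 0x2, second CPU only
    print(hex(affinity_mask([0, 1])))  # 0x3, both CPUs of a dual core
```

Giving the CPU test one mask and the graphics test the other corresponds to the Dedicated CPUs rows in the table.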
Maximum Speed
Programs CPUIDMP64 and CPUIDMP in DualCore.zip are the same but compiled for 64 and 32 bits. They execute three passes of simple additions to registers attempting to demonstrate maximum CPU speeds. Firstly an integer and an SSE floating point test are run separately. They are then run as two threads of equal priority, where both should run at full speed with 2 CPUs.
This benchmark has a third section using 4 threads, where speed of the last three can vary quite a bit, but should show all at full speed on a quad core CPU. Multiple runs might identify overhead difference between Operating Systems.
Following are example 64 bit and 32 bit results on a 2400 MHz Core 2 Duo and 2210 MHz Athlon 64 X2.
Further results can be found in WhatCPU Results.htm
                       Core 2 Duo 32 bit   Core 2 Duo 64 bit   Athlon 64 X2 64 Bit
                       64-Bit Vista        64-Bit Vista        XP Pro x64
Separate Tests
32 bit SSE MFLOPS      9485  9581  9595    9582  9595  9600    4411  4411  4415
32 bit Integer MIPS    6502  6505  6509    6934  6936  6950    6068  6070  6070
Two Threads Equal Priority
32 bit SSE MFLOPS      9581  9562  9564    9501  9600  9600    4405  4409  4408
32 bit Integer MIPS    6802  7032  7036    7002  7006  7013    6067  6053  5992
Four Threads, First Normal Priority, Others Normal - 1
32 bit SSE MFLOPS      9172  9564  9565    9592  9575  9576    4401  4411  4410
32 bit Integer MIPS    3354  3174  3645    3447  3414  3329    2903  2053  2898
32 bit SSE MFLOPS      4706     0     0    4844     0     0       0  1433     0
32 bit Integer MIPS     290  3800  3338       0  3337  3366    3454  2227  3455
The benchmark also shows performance gains using Hyper-Threading. The following is for a 3 GHz Pentium 4E with HT:
P4E HT 3000 MHz
Speed adding to registers Pass 1 Pass 2 Pass 3
Separate Tests
32 bit SSE MFLOPS 5813 5756 5796
32 bit Integer MIPS 8076 8061 8072
Two Threads Equal Priority
32 bit SSE MFLOPS 5224 5216 5168
32 bit Integer MIPS 3241 3253 3278
Four Threads, First Normal Priority, Others Normal - 1
32 bit SSE MFLOPS 3032 3334 3260
32 bit Integer MIPS 461 755 531
32 bit SSE MFLOPS 2689 2357 2377
32 bit Integer MIPS 533 452 671
Whetstone MP
The Whetstone Benchmark has various routines that execute floating point and integer instructions. Programs Whets64MP and Whets32MP in DualCore.zip are the same but compiled for 64 and 32 bits. One copy of the benchmark is run in the main thread and another in a low priority second thread; with two CPUs, both should mainly run at the same speed. See Win64.htm for more information.
The following results are for a dual processor P4 Xeon at 3065 MHz and a 3 GHz Pentium 4E with Hyper-Threading. The latter produces rather unexpected performance gains. Also note that the lower priority thread appears to have been given an equal share of CPU time. The third set of results is for a Core 2 Duo at 2400 MHz and shows superior performance on this old code.
Further results can be found in Whetstone Results.htm
Dual P4 Xeon 3065 MHz
 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS             1      2      3   MOPS   MOPS   MOPS   MOPS   MOPS
   1200  11020   3215   1464   1462    807   81.7   38.0   1307   2488   3292
Thread 1                 732    725    404   41.2   19.4    649   1239   1629
Thread 2                 732    737    403   40.5   18.6    658   1249   1663

P4E HT 3000 MHz
 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS             1      2      3   MOPS   MOPS   MOPS   MOPS   MOPS
    967   8464   2605   1161   1187    656   60.1   32.5   1611   1385   2173
Thread 1                 580    580    330   30.4   16.4    808    734   1604
Thread 2                 581    607    326   29.7   16.1    803    651    569

Core 2 Duo 2400 MHz
 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS             1      2      3   MOPS   MOPS   MOPS   MOPS   MOPS
   1446  23473   4718   1702   1697   1046    113   57.9   3793   3622   7531
Thread 1                 840    836    525   57.2   29.2   1959   1777   6477
Thread 2                 862    861    522   56.0   28.8   1835   1845   1054
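In these tables the first results line appears to be the sum of the two thread lines, allowing for rounding, and the share taken by the low priority thread can be derived from the same figures. The Python sketch below checks this on values copied from the tables; the function names and the rounding tolerance are assumptions.

```python
def combined_matches(total, thread1, thread2, tolerance=1.0):
    """True if total equals thread1 + thread2 within a rounding tolerance."""
    return abs(total - (thread1 + thread2)) <= tolerance


def thread_share(thread_rate, other_rate):
    """Fraction of the combined rate contributed by one thread."""
    return thread_rate / (thread_rate + other_rate)


if __name__ == "__main__":
    # MFLOPS 1 column: Dual P4 Xeon, then P4E with Hyper-Threading.
    print(combined_matches(1464, 732, 732))
    print(combined_matches(1161, 580, 581))
    # Share of the low priority thread (Thread 2) on the P4E HT, MFLOPS 1.
    print(round(thread_share(581, 580), 2))
```

A share close to 0.5 is what the text means by the lower priority thread receiving an equal share of CPU time.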
BusSpeed MP
This is a variation of Performance Tests used in BusSpd2K covered above.
Two versions of this read only benchmark are available: one from a 64 bit compiler, using 64 bit integer words, and one from a 32 bit compiler, using 32 bit words. They are BusMP64 and BusMP in DualCore.zip.
BusSpd2K starts by reading words with 64 byte address increments, to indicate memory bus burst reading speed, then reduces the increment until finally all words are read sequentially. Speed is measured using data in caches and in RAM. In the MP case, address increments start at 32 words (128 or 256 bytes). Besides reading integer numbers, an extra test reads 128 bit SSE2 data. With two threads, each reads all the data from separate arrays, with the total number of passes the same as with one thread. The calculated speed is based on the last thread to finish. Nominal elapsed time for each test is 0.5 seconds.
Further details and other results can be found in:
BusSpd2K Results.htm.
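The reading pattern described above, with the address increment reduced from 32 words down to every word, can be sketched in Python as follows. This illustrates the access pattern only, not the benchmark code: the names are hypothetical, interpreted Python will be far slower than the compiled loops, and counting only the words actually read when calculating MB/second is an assumption.

```python
import time

INCREMENTS = (32, 16, 8, 4, 2, 1)  # words per step; 1 means "read all"


def strided_and(data, increment):
    """AND together every increment-th word; return (result, words_read)."""
    result = ~0  # all bits set, so the first AND leaves the first word unchanged
    words_read = 0
    for i in range(0, len(data), increment):
        result &= data[i]
        words_read += 1
    return result, words_read


def mb_per_second(data, increment, bytes_per_word=4):
    """Time one pass and convert the bytes read into MB/second."""
    start = time.perf_counter()
    _, words_read = strided_and(data, increment)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")  # timer too coarse for such a short pass
    return words_read * bytes_per_word / elapsed / 1e6


if __name__ == "__main__":
    words = [0xFFFFFFFF] * (6 * 1024 // 4)  # 6 KB of 32 bit words
    for inc in INCREMENTS:
        print(inc, round(mb_per_second(words, inc)))
```

With larger increments fewer words are read per pass, which is why results from RAM fall sharply in the Inc32wds column while cache based results change much less.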
The SSE2 test uses assembly code but all others are compiled. For the latter tests, the compiler produces a sequence of 64 instructions that load the data to one register using the required AND instructions. Via L1 cache, this could be expected to produce 1 result per CPU clock cycle or data transfer rate on a 2000 MHz CPU of 8000 MB/second at 32 bits and 16000 MB/second with the 64 bit version.
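The expected maximum quoted above is simple arithmetic: clock speed in MHz, times results per clock, times bytes per word. The Python sketch below reproduces the two figures from the text and applies the same sum to the 2400 MHz Core 2 Duo; the function name is an assumption.

```python
def expected_mb_per_sec(cpu_mhz, word_bits, results_per_clock=1.0):
    """Peak data rate if one word is loaded per clock cycle (by default)."""
    return cpu_mhz * results_per_clock * word_bits / 8


if __name__ == "__main__":
    print(expected_mb_per_sec(2000, 32))  # 8000.0
    print(expected_mb_per_sec(2000, 64))  # 16000.0
    # Core 2 Duo at 2400 MHz, for comparison with measured L1 cache results.
    print(expected_mb_per_sec(2400, 64))  # 19200.0
```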
Single CPU - The first example below is for the 64 bit version running on a single Core 2 Duo CPU (Affinity flag set). This shows that total throughput is virtually the same using one and two threads. Also, other measurements show that each thread obtains approximately equal amounts of time. The expected maximum speeds via L1 cache are demonstrated and this is doubled using 128 bit SSE registers.
32 Bit Versions - Single thread results, as expected, produce integer speeds of up to around half those obtained at 64 bits via caches (but see below) and similar performance from RAM. There are variations due to the smaller address increments. The results also identify different performance attributes between the Intel and AMD CPUs.
The Core 2 Duo PC has faster RAM than the Athlon 64 based system, larger and more efficient L2 cache (see 96 KB and 1536 KB) and can execute SSE2 instructions much faster. The Athlon 64 has a larger L1 cache (not demonstrated) and can produce similar execution speeds on integer calculations.
The results from 2 threads are no surprise, with Core 2 Duo cache speeds being virtually the same as those on a PC using Windows XP. Multi-threading clearly does not work very well with this sort of code. Performance improvements using two threads are between 1.1 and 1.8 times for Core 2 Duo L1 cache and 1.5 to 1.9 times for L2. Corresponding Athlon 64 ratios are 1.2 to 2.0 for L1 cache, but all L2 cache ratios are around 2. A surprise is that higher throughput can be obtained using shared memory.
64 Bit Version - Except with data from RAM, throughput from two threads using 64 bit integers is much worse than with the 32 bit version. Here, the Core 2 Duo performance gain (or loss) is 0.6 to 1.2 times from L1 cache and 1.3 to 1.5 from L2, with the Athlon 64 at 0.8 to 1.2 for L1 and 1.3 to 1.9 for L2.
Looking at single thread results shows that the CPU is struggling to achieve the same instruction execution speeds, for example, 2181 MIPS at 64 bits compared with 2316 MIPS at 32 bits. So this will not help.
In all cases of the SSE2 test, at least as far as L1 cache data is concerned, throughput from 2 threads still approaches twice that of 1 thread.
Assembly Code - The last integer test (Read All) was converted to assembly code using the same sequence of 64 AND instructions to one register, then to 2 and 4 registers in turn. Address indexing was somewhat different. Results are shown below.
The assembly code produced a mixed set of single thread results, some faster and some slower than the compiler generated code, but the two thread results were comparatively better. In the single register test with L1 cache data, the gain from two threads increased from 1.2 times via compiled code to 1.7 times via assembly instructions.
With two independent threads and two CPUs just reading and processing data in separate caches, it is not unreasonable to expect an almost doubling of throughput compared with a single CPU. This is true using 128 bit SSE2 code, with this benchmark, but not when loading data to integer registers, particularly in 64 bit mode. It does seem that Windows is interfering with the data flow.
BusSpeed MP MB/second
Core 2 Duo 2400 MHz, Vista and Athlon 64 X2 2210 MHz, XP Pro x64
6 KB = L1 cache, 384 KB = L2 cache, 1536 KB = L2 cache on Core 2 Duo but RAM on Athlon 64 (AMD), 131070 KB = RAM
Kbytes Inc32wds Inc16wds Inc8wds Inc4wds Inc2wds ReadAll 128bSSE2
Core 2 Duo 64 Bit - Using one CPU
1 Thread
6 14079 16065 16562 16952 17210 17268 36814
384 3943 4114 3985 6400 9506 13493 18726
1536 3880 3954 3935 6332 9248 13324 18479
131070 588 629 746 1536 2820 5091 5587
2 Threads
6 14155 16167 16460 17104 17211 17217 36797
384 3934 4110 3964 6363 9458 13242 18822
1536 3820 3985 3882 6252 9189 13106 18436
131070 565 625 743 1555 2825 5062 5614
Core 2 Duo 32 Bit
1 Thread
6 7366 8999 8984 9201 9258 9263 37310
96 2030 2014 3241 4522 6658 7935 19061
1536 1961 1985 3190 4408 6504 7842 18677
131070 315 380 782 1409 2574 4717 5711
2 Threads
6 7271 11341 13427 15096 16511 17116 66635
96 3129 2988 4904 7711 12073 15064 31494
1536 3099 2946 5121 7233 11414 14398 30395
131070 313 429 881 1778 3057 5733 7099
Athlon 64 32 bit
1 Thread
6 8131 8462 10388 10057 9882 9958 17391
96 735 657 1238 2368 4888 6366 8913
1536 359 311 564 886 1434 2788 2949
131070 349 314 559 876 1412 2746 2902
2 Threads
6 9234 10367 14787 16078 17366 18515 34466
96 1474 1311 2461 4685 9734 12659 17792
1536 318 327 665 1297 2368 4807 4723
131070 321 334 669 1292 2347 4723 4669
Core 2 Duo 64 Bit
1 Thread
6 14261 16380 16736 17330 17414 17449 37143
96 4097 4196 4027 6419 9545 13666 19065
1536 3928 3965 3968 6305 9313 13401 18599
131070 594 636 758 1576 2876 5124 5692
2 Threads
6 8106 12608 15512 17690 19165 20354 71247
96 5124 5840 5590 9943 13461 17703 31333
1536 5275 5766 5518 9369 12764 16723 29287
131070 601 627 834 1685 3166 5261 7142
Athlon 64 64 Bit
1 Thread
6 14563 15920 16078 17827 17772 17443 17358
96 1943 1947 1344 2406 4536 9654 8766
1536 642 717 586 988 1499 2897 2940
131070 639 698 592 983 1476 2860 2919
2 Threads
6 11780 12876 13348 18538 19771 21414 34617
96 3121 3045 2569 4674 8281 12510 17488
1536 557 630 642 1270 2226 4076 4704
131070 558 632 642 1263 2213 4063 4684
64 Bit Compiler, 64 Bit Integers, Read All
         Core 2 Duo                              Athlon 64
         Original Assembler                      Original Assembler
            1 Reg     1 Reg     2 Reg     4 Reg     1 Reg     1 Reg     2 Reg     4 Reg
1 Thread
      6     17449     18384     18662     17996     17443     16360     30651     29112
     96     13666     13221     13392     13378      9654      9446      9493      9487
   1536     13401     13227     13299     13129      2897      2911      2933      2930
 131070      5124      5111      5126      5052      2860      2843      2917      2900
2 Threads
      6     20354     30635     30727     30478     21414     28851     46191     44173
     96     17703     24168     23977     24092     12510     18794     18913     18788
   1536     16723     22714     22935     22431      4076      4790      4737      4802
 131070      5261      6097      6030      6136      4063      4727      4693      4751
RandMP
This is a variation of the benchmark in RandMem.zip. The program uses the same code for serial and random use via a complex indexing structure and comprises Read (RD) and Read/Write (RW) tests. They are run to use data from L1 cache, L2 cache and RAM, firstly as a single thread and secondly using two threads. Indexing overheads lead to slower speed than BusSpeed above (ReadAll).
Programs RandMP64 and RandMP32 in DualCore.zip are compiled to run via Win64 and Win32. See Win64.htm for more information. The following results are again for an Athlon 64 X2 Dual Core 4200+ at 2.21 GHz using Windows XP Pro x64 and a Core 2 Duo at 2.4 GHz using the 64 bit version of Windows Vista.
Further results can be found in Randmem Results.htm
Using one thread, AMD RW speed is slower than RD and speed reduces using larger data size with random access. The full benefit of two CPUs is demonstrated by the RD tests but RW functions can show overall lower throughput than using one thread, with L1 cached data.
This is probably due to changed data being written to shared main memory, even though the other CPU may not be using it.
Effectively, Serial RW speed of two CPUs can be half of that of a single CPU and less than a quarter with Random RW. The benefit of the shared Core 2 Duo L2 cache is demonstrated.
Modifying the benchmark, so that each thread accesses its own data array, enables RW cache tests to run at full speed on each CPU.
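The idea of using the same code for serial and random access, by reading the data through an index array, can be illustrated with a short Python sketch. The names are hypothetical, and the operations used here (a sum for RD, an increment for RW) are stand-ins rather than those of the actual benchmark.

```python
import random


def make_index(size, randomised, seed=1):
    """Return the visiting order for size words: 0..size-1, optionally shuffled."""
    index = list(range(size))
    if randomised:
        random.Random(seed).shuffle(index)
    return index


def read_pass(data, index):
    """RD test: read every word in the order given by the index array."""
    total = 0
    for i in index:
        total += data[i]
    return total


def read_write_pass(data, index, delta=1):
    """RW test: read each word, update it and write it back, in index order."""
    for i in index:
        data[i] = data[i] + delta


if __name__ == "__main__":
    data = list(range(1536))           # data array, one entry per word
    serial = make_index(len(data), randomised=False)
    rand = make_index(len(data), randomised=True)
    # Both orders visit every word once, so the totals agree.
    print(read_pass(data, serial) == read_pass(data, rand))
```

Giving each thread its own data list, rather than sharing one, corresponds to the modification described above that lets the RW cache tests run at full speed on each CPU.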
Athlon 64 X2 Dual Core 2210 MHz, 64 bit version, XP Pro x64
------------------ MBytes Per Second At --------------------
6 KB 24 KB 96 KB 384 KB 768 KB 1536 KB 12 MB 96 MB
1 Thread
Serial RD 8552 8518 5115 5132 2369 2353 2344 2305
Serial RW 4346 4340 2702 2697 1349 1352 1354 1351
Random RD 8176 8244 3733 1620 872 389 255 170
Random RW 4384 4332 2865 1483 563 236 161 136
2 Threads
Serial RD1 8374 8532 5064 5010 2075 2096 2021 2026
Serial RD2 8532 8394 5176 5108 2111 2062 2049 2054
Serial RW1 1090 1172 1110 1096 1041 867 864 866
Serial RW2 1083 1136 1089 1076 1049 866 855 824
Random RD1 8147 8024 3683 1638 485 193 126 100
Random RD2 8154 8158 3701 1637 485 195 125 101
Random RW1 494 489 448 406 352 152 86 75
Random RW2 495 490 449 406 343 152 87 75
For approximate speed in MIPS divide MBytes/Second by 3.2
Core 2 Duo 2400 MHz, 64 bit version, 64 Bit Vista
------------------ MBytes Per Second At --------------------
6 KB 24 KB 96 KB 384 KB 768 KB 1536 KB 12 MB 96 MB
1 Thread
Serial RD 8483 9431 7859 7781 7867 7916 4420 4484
Serial RW 8775 9319 7552 7548 7515 7301 2436 2389
Random RD 8633 9500 4214 3322 3202 2794 630 456
Random RW 8812 9101 3408 2742 2660 2365 405 283
2 Threads
Serial RD1 8699 9194 7623 7581 7571 7729 4069 4117
Serial RD2 8662 9289 7422 7415 7417 7249 4069 4117
Serial RW1 2061 2164 6754 6898 6779 6660 1661 1771
Serial RW2 2049 2168 6881 6728 6864 6787 1561 1771
Random RD1 8660 9267 3461 2675 2563 2278 447 444
Random RD2 8617 9443 3531 2724 2610 2317 453 444
Random RW1 738 775 1355 1937 1954 1859 343 285
Random RW2 735 774 1365 1963 1982 1882 343 285
Next are 32 bit version results for two Core 2 Duo PCs at 2400 MHz with 533 MHz Dual Channel DDR2 RAM and different chipsets, followed by those for Pentium D 2800 MHz (with 400 MHz Dual Channel DDR?).
Each of PD and C2D is faster on certain tests and effects of C2D larger caches are apparent.
The Core 2 Duo PC using the Intel 965 chipset is much faster on some RAM based results and some writing/reading cache tests.
Unlike the other dual CPUs, the random read/write tests on the Core 2 Duo show a higher total throughput with two threads when using L2 cache, a clear benefit of an L2 cache shared by the CPUs.
Core 2 Duo 2400 MHz nForce 570 chipset
------------------ MBytes Per Second At --------------------
6 KB 24 KB 96 KB 384 KB 768 KB 1536 KB 12 MB 96 MB
1 Thread
Serial RD 9487 9426 7919 7880 7931 7795 3344 3130
Serial RW 9515 9271 7594 7518 7562 7244 587 593
Random RD 9501 9448 4251 3317 3217 2781 174 121
Random RW 9254 9208 3383 2762 2649 2375 117 85
2 Threads
Serial RD1 9374 9485 7741 7796 7741 7775 3127 3001
Serial RD2 9477 9430 7413 7533 7438 7499 3083 2966
Serial RW1 678 679 6871 6944 6920 6854 451 566
Serial RW2 679 681 6916 6803 6875 6877 436 566
Random RD1 9373 9482 3502 2715 2597 2289 162 113
Random RD2 9379 9545 3509 2744 2615 2301 158 113
Random RW1 251 165 288 906 1576 1756 69 58
Random RW2 252 165 288 905 1573 1755 69 58
Core 2 Duo 2400 MHz Intel 965 chipset
------------------ MBytes Per Second At --------------------
6 KB 24 KB 96 KB 384 KB 768 KB 1536 KB 12 MB 96 MB
1 Thread
Serial RD 9524 9536 7954 7939 7957 7935 4119 4044
Serial RW 9541 9458 7601 7613 7606 7361 2046 2003
Random RD 9511 9562 4260 3350 3231 2822 548 393
Random RW 9386 9260 3424 2769 2678 2384 349 242
2 Threads
Serial RD1 9479 9519 7839 7824 7842 7797 4040 3996
Serial RD2 9513 9450 7527 7531 7519 7487 4040 3996
Serial RW1 1490 1694 6936 6962 6961 6886 1420 1388
Serial RW2 1487 1695 6939 6956 6949 6886 1420 1388
Random RD1 9484 9524 3538 2720 2618 2318 527 388
Random RD2 9514 9535 3569 2741 2632 2331 527 388
Random RW1 532 634 1101 1965 1988 1896 265 218
Random RW2 534 636 1104 1966 1983 1898 265 218
Pentium D 2800 MHz
------------------ MBytes Per Second At --------------------
6 KB 24 KB 96 KB 384 KB 768 KB 1536 KB 12 MB 96 MB
1 Thread
Serial RD 11614 7288 7192 7087 7054 2952 2853 2905
Serial RW 2279 1657 1377 1446 1439 1360 1359 1359
Random RD 11621 6010 3708 2419 1367 491 182 163
Random RW 3630 2566 1669 1467 1215 340 144 131
2 Threads
Serial RD1 11472 7189 7191 6924 6799 1918 1917 1907
Serial RD2 11533 7065 6948 6981 6701 1933 1967 1939
Serial RW1 715 863 762 808 829 987 946 939
Serial RW2 710 853 769 805 814 892 868 862
Random RD1 11467 5973 3649 2341 1332 245 91 81
Random RD2 11547 6042 3767 2300 1348 251 93 83
Random RW1 191 172 156 146 141 112 71 65
Random RW2 193 174 157 147 142 114 72 65
This benchmark again shows some performance gains and some losses using Hyper-Threading. The following is for a 3 GHz Pentium 4E with HT:
Pentium 4E HT 3000 MHz
------------------ MBytes Per Second At --------------------
6 KB 24 KB 96 KB 384 KB 768 KB 1536 KB 12 MB 96 MB
1 Thread
Serial RD 11217 7146 7030 6943 6911 2979 2982 2983
Serial RW 2255 1641 1429 1423 1425 1352 1351 1344
Random RD 11410 6017 3843 2391 1368 471 181 168
Random RW 3593 2649 1793 1465 1234 315 135 125
2 Threads
Serial RD1 5134 4143 4035 3950 3935 2583 1964 1862
Serial RD2 5095 4089 4009 3910 3890 2564 1930 1840
Serial RW1 1522 1262 1148 1188 1193 1036 1024 1001
Serial RW2 1500 1239 1133 1146 1184 1019 912 789
Random RD1 5125 3580 2594 733 564 279 96 95
Random RD2 5070 3544 2558 720 557 275 92 93
Random RW1 1709 1536 1367 714 560 205 74 80
Random RW2 1677 1501 1342 702 552 199 71 78
Updated October 2007
The new Internet Home for my PC Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection