
Dual Core CPU PC Benchmarks



General

Performance of multiple processors can be measured by using multiple programs or multiple threads in a single program. Examples of the former, using BusSpd2K, SS3DSoak and a graphics test, are below. Some new multi-threaded benchmarks have been produced, with versions to run via 32 bit and 64 bit Windows. They can also be used to demonstrate Pentium 4 Hyper-Threading. These tests are described in Win64.htm, with the benchmarks being in DualCore.zip and the C/C++ and Assembler programs in NewSource.zip. Other single CPU 64 bit compilations can be found in Win64.zip and More64Bit.zip. The output from the new MP programs is also shown below at:

Maximum Speed   Whetstone MP   BusSpeed MP   RandMP



BusSpd2K and IntBurn64

One of my older benchmarks, BusSpd2K in BusSpd2K.zip, and IntBurn64, a 64 bit version in More64Bit.zip, can be used to measure performance of multiple processors via the Reliability Test, a useful additional feature. To run the test, a .BAT file is used, with an example following. This specifies the KBytes of memory to use, the running time in seconds of each of the 12 tests and the different log files to be used. The memory size can be adjusted to test L1 caches, L2 caches or RAM.


 Start BusSpd2k Reliability, KB 8, Seconds 1, Log Log1.txt
 Start BusSpd2k Reliability, KB 8, Seconds 1, Log Log2.txt

 Start IntBurn64 Auto, KB 4, Secs 1, P1, Log testCPU1.txt 
 Start IntBurn64 Auto, KB 4, Secs 1, P2, Log testCPU2.txt
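
The command lines follow a regular pattern, so a .BAT file for several copies could be generated rather than typed. The following is a minimal sketch only, assuming the parameter layout shown above carries over unchanged to further copies. The helper name intburn_commands is hypothetical, and the limit of four copies is an assumption based on the window positions P1 to P4 mentioned later on this page.

```python
def intburn_commands(copies, kb=4, secs=1):
    """Build one 'Start IntBurn64' line per copy, mirroring the example above."""
    if not 1 <= copies <= 4:
        # Assumption: only window positions P1 to P4 are mentioned in the text.
        raise ValueError("copies must be between 1 and 4")
    return [
        f"Start IntBurn64 Auto, KB {kb}, Secs {secs}, P{n}, Log testCPU{n}.txt"
        for n in range(1, copies + 1)
    ]


# Print the two-CPU case, which reproduces the IntBurn64 lines above.
for line in intburn_commands(2):
    print(line)
```

Each copy is given its own window position and log file, so the logs can be compared afterwards.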



The 32 bit program runs tests with different data patterns, using MMX instructions for the highest speed. The 64 bit version uses the same data patterns with normal integer arithmetic and up to eight 64 bit registers. Six tests use write followed by read and six tests do multiple reads. An example log file output is given below. The start and end times can be used to verify that multiple programs have been running at the same time and for the correct duration, 12 seconds in this case. Further 64 bit version results are in BurnIn64.htm.

 
 Reliability Test 8 KB, 1 seconds per test, Mon Aug  8 14:42:43 2005
 
 Write/Read
  1  11797 MB/sec  Pattern FFFFFFFFFFFFFFFF 	 Result OK    720062 passes
  2  11767 MB/sec  Pattern FFFFFFFFFFFFFFFF 	 Result OK    718231 passes
  3  11763 MB/sec  Pattern 5A5A5A5A5A5A5A5A 	 Result OK    717968 passes
  4  11768 MB/sec  Pattern 5555555555555555 	 Result OK    718283 passes
  5  11785 MB/sec  Pattern 3333333333333333 	 Result OK    719283 passes
  6  11699 MB/sec  Pattern 0F0F0F0F0F0F0F0F 	 Result OK    714066 passes
 Read
  1  22860 MB/sec  Pattern 0000000000000000 	 Result OK   2790600 passes
  2  22870 MB/sec  Pattern FFFFFFFFFFFFFFFF 	 Result OK   2791800 passes
  3  22890 MB/sec  Pattern A5A5A5A5A5A5A5A5 	 Result OK   2794200 passes
  4  22863 MB/sec  Pattern 5555555555555555 	 Result OK   2790900 passes
  5  22874 MB/sec  Pattern 3333333333333333 	 Result OK   2792300 passes
  6  22875 MB/sec  Pattern F0F0F0F0F0F0F0F0 	 Result OK   2792400 passes

             Reliability Test Ended Mon Aug  8 14:42:55 2005
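
The check on start and end times can be automated. The following is a rough sketch, assuming the first and last log lines always end with a timestamp laid out as in the example above. The function names run_window and overlap_seconds are hypothetical and not part of the benchmark.

```python
from datetime import datetime

STAMP = "%a %b %d %H:%M:%S %Y"  # layout of "Mon Aug  8 14:42:43 2005"


def run_window(first_line, last_line):
    """Return (start, end) datetimes from the first and last log lines."""
    # Assumption: the timestamp is the last five space-separated fields.
    start = datetime.strptime(" ".join(first_line.split()[-5:]), STAMP)
    end = datetime.strptime(" ".join(last_line.split()[-5:]), STAMP)
    return start, end


def overlap_seconds(a, b):
    """Seconds during which two (start, end) windows were both running."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    return max(0.0, shared.total_seconds())


first = "Reliability Test 8 KB, 1 seconds per test, Mon Aug  8 14:42:43 2005"
last = "Reliability Test Ended Mon Aug  8 14:42:55 2005"

start, end = run_window(first, last)
duration = (end - start).total_seconds()
print(duration)
```

For the log shown, the duration is the 12 seconds expected from 12 tests of 1 second each. Passing the windows from two log files to overlap_seconds indicates how long both programs were running together.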



MB/second results on two dual core CPUs are below. Athlon 64 L1 cache speeds, on reading using MMX instructions, are faster than those of the Core 2 Duo. It seems that the code used also favours the Athlon 64 when using 64 bit integer instructions. The position is reversed for all other results, where most are comparing Core 2 Duo L2 cache speeds with those from Athlon 64 RAM. The shared Core 2 Duo L2 cache is surprisingly fast when being used by two CPUs, except where both could use most of it. Only the 16,000 KB measurements represent memory speeds on both systems. Here, RAM throughput on the Athlon 64 is slower, but its improvement when using two CPUs is better. The % lines show two CPU speed as a percentage of one CPU speed.


               Core 2 Duo 2400 MHz Vista       Athlon 64 2210 MHz XP Pro
              32KB L1 4MB L2 800 MHz RAM     64KB L1 512KB L2 400 MHz RAM

  Program        32 Bit          64 Bit          32 Bit          64 Bit

    KB  CPUs Wrt/Rd   Read   Wrt/Rd   Read   Wrt/Rd   Read   Wrt/Rd   Read

     4     1   3870  15794     4322  16206     8514  20913    12437  22257
           2   7287  31401     7737  32248    16926  41503    24684  44389
        %       188    199      179    199      199    198      198    199

    16     1   8051  16603     8499  16711    13670  22815    18559  23177
           2  15761  33014    16483  33114    27304  45290    36821  45996
        %       196    199      194    198      200    199      198    198

    64     1   8844  13035     9033  12995    15442  23002    18699  23028
           2  15945  24899    16185  24884    30833  45677    37533  45234
        %       180    191      179    191      200    199      201    196

   500     1   9715  13084     9911  13048     8545   9112     8104  10102
           2  17222  25111    17243  25033    17031  18023    16168  19957
        %       177    192      174    192      199    198      200    198

  1000     1   9756  13098     9737  13007     2125   2897     2072   3050
           2  17183  24670    17245  25035     2476   4736     2459   4917
        %       176    188      177    192      116    163      119    161

  2000     1   9567  12980     9664  12919     2101   2898     2074   3014
           2  15611  23144    15672  23399     2480   4629     2445   4904
        %       163    178      162    181      118    160      118    163

  4000     1   8350  11902     8955  12159     2098   2879     2045   3011
           2   4095   6720     4185   6657     2477   4693     2485   4873
        %        49     56       47     55      118    163      121    162

 16000     1   3466   5433     3370   5408     2086   2872     2055   3009
           2   3687   6066     3598   6019     2454   4706     2478   4838
        %       106    112      107    111      118    164      121    161
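
The % lines in the table can be reproduced from the two rows above them. This is a small sketch, assuming the two CPU row is the combined throughput of both programs. The function name two_cpu_percent is hypothetical. As the table entries are already rounded, a recalculated figure can differ from the printed one by one in the last digit.

```python
def two_cpu_percent(one_cpu_mb_per_sec, two_cpu_mb_per_sec):
    """Two CPU throughput as a percentage of one CPU throughput."""
    return round(100 * two_cpu_mb_per_sec / one_cpu_mb_per_sec)


# Core 2 Duo, 32 bit program, Wrt/Rd column from the table above.
print(two_cpu_percent(3870, 7287))   # 4 KB, data in L1 cache
print(two_cpu_percent(8350, 4095))   # 4000 KB, both programs competing for L2
print(two_cpu_percent(3466, 3687))   # 16000 KB, data in RAM
```

A value near 200 means both CPUs ran at full single CPU speed, and a value below 100 means two programs together were slower than one alone.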



SS3DSoak and SSEBurn64

These benchmarks have the same assembly code for burn-in tests using SSE and SSE2 floating point instructions. The former also has options to use 3DNow and is in SSE3DNow.zip, with the latter in More64Bit.zip. Testing options are CPU only, Cache (L1) and RAM (cache and RAM), using either SSE or SSE2. Speed is measured in Millions of Floating Point Operations Per Second (MFLOPS) or MBytes per second (for MFLOPS, divide MB/second by 4 for SSE and by 8 for SSE2). Examples of BAT file commands used for two CPUs, and of a log file, are as follows. See BurnIn32.htm and BurnIn64.htm for more details.


  Start SSEBurn64 SSE, CPU, Mins 5, auto, P1, Log Testx1.txt
  Start SSEBurn64 SSE, CPU, Mins 5, auto, P2, Log Testx2.txt

  Start SSEBurn64 SSE2, Cache, KB 4, Mins 5, auto, P1, Log Testx1.txt
  Start SSEBurn64 SSE2, Cache, KB 4, Mins 5, auto, P2, Log Testx2.txt

  Start SSEBurn64 SSE2, RAM, KB 128, Mins 5, auto, P1, Log Testx1.txt
  Start SSEBurn64 SSE2, RAM, KB 128, Mins 5, auto, P2, Log Testx2.txt


  SSE2 Cache Test at 5 minutes and 4 KB, Start at Sat Oct 27 16:54:13 2007

     1.01 Minutes at 4785 MFLOPS, No Errors
     2.00 Minutes at 4786 MFLOPS, No Errors
     3.01 Minutes at 4786 MFLOPS, No Errors
     4.00 Minutes at 4786 MFLOPS, No Errors
     5.01 Minutes at 4787 MFLOPS, No Errors

  Reliability Test Ended Sat Oct 27 16:59:14 2007



Results on the two dual core CPUs are below. This time, the Core 2 Duo is faster on all tests except the dual core 64 KB memory test. The impact of the 4 MB shared L2 cache is again apparent as are the throughput improvements with two CPUs using RAM.



               Core 2 Duo 2400 MHz Vista     Athlon 64 2210 MHz XP Pro
              32KB L1 4MB L2 800 MHz RAM    64KB L1 512KB L2 400 MHz RAM

  Program          32 Bit  64 Bit  64 Bit     32 Bit  64 Bit  64 Bit

                      SSE     SSE    SSE2        SSE     SSE    SSE2

           CPUs    MFLOPS  MFLOPS  MFLOPS     MFLOPS  MFLOPS  MFLOPS

  CPU        1      10162   10256    4549       5885    6094    3139
             2      19989   20113    9040      11733   12152    6274
           %          197     196     199        199     199     200

  Cache      1       9549    9553    4775       7062    7062    3531
             2      18868   18891    9445      14081   14091    7044
           %          198     198     198        199     200     199

                   MB/sec  MB/sec  MB/sec     MB/sec  MB/sec  MB/sec

  Memory     1      37252   37352   37327      16833   16967   16969
  4 KB       2      73587   73852   73790      33506   33858   33860
           %          198     198     198        199     200     200

  Memory     1      17758   17850   17844      17544   17536   17540
  64 KB      2      31789   31828   31815      34967   34902   35010
           %          179     178     178        199     199     200

  Memory     1      17753   17730   17721       8852    8281    8780
  256 KB     2      31837   31875   31878      17642   16530   17516
           %          179     180     180        199     200     199

  Memory     1      15951   15714   15805       2962    2959    2951
  4096 KB    2       8144    8109    8096       4720    4714    4696
           %           51      52      51        159     159     159

  Memory     1       6014    6001    6024       2948    2945    2942
  64 MB      2       7225    7249    7282       4664    4667    4662
           %          120     121     121        158     158     158
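
The memory test lines are in MB/second and can be converted to MFLOPS using the divisors given earlier (4 for SSE, 8 for SSE2). The following is a minimal sketch of that conversion. The names DIVISOR and mflops_from_mb_per_sec are hypothetical.

```python
# Divisors quoted in the text: MB/second divided by 4 for SSE, by 8 for SSE2.
DIVISOR = {"SSE": 4, "SSE2": 8}


def mflops_from_mb_per_sec(mb_per_sec, mode):
    """Convert a memory test MB/second figure to MFLOPS."""
    return mb_per_sec / DIVISOR[mode]


# Core 2 Duo, one CPU, 4 KB memory test from the table above.
print(mflops_from_mb_per_sec(37252, "SSE"))    # 32 bit SSE column
print(mflops_from_mb_per_sec(37327, "SSE2"))   # 64 bit SSE2 column
```

The SSE2 MB/second figures are close to the SSE ones, so the converted SSE2 MFLOPS come out at about half the SSE values.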
 


Direct3D and CPU Test

VideoD3D9 32 and 64 bit DirectX 9 graphics tests, in Video64.zip, can also be used in conjunction with the CPU burn-in tests, using the same window position format (P1 to P4). For further details see Direct3D Results.htm, 64 Bit Graphics Tests.htm and BurnIn64.htm. Example command lines and log file output are shown below.


  Start SSEBurn64 SSE2, Cache, KB 4, Mins 10, auto, P3, Log T11.txt
  Start VideoD3D9_64 Auto, Test 6, Width 640, Height 480, P2, Secs 600, Log T21.txt

 
  Textured Objects at 640 x 480 x 32 bits
  770.0 Frames Per Second over 60 seconds
  779.2 Frames Per Second over 60 seconds
  779.4 Frames Per Second over 60 seconds
  779.3 Frames Per Second over 60 seconds
  779.0 Frames Per Second over 60 seconds
  779.3 Frames Per Second over 60 seconds
  779.0 Frames Per Second over 60 seconds
  774.0 Frames Per Second over 60 seconds
  779.4 Frames Per Second over 60 seconds
  777.8 Frames Per Second Overall



Results are below for the two dual core systems. Running the graphics test by itself shows that both CPUs can be used (CPU utilisation > 50%) at one extreme. At the other, the program can be limited by the graphics processor speed with CPU utilisation much lower. Running both programs together results in performance degradation of one or both. This was particularly bad on the Athlon 64 based PC. So further tests were run, dedicating CPUs separately via Task Manager Affinity settings, providing an improvement. For the two Core 2 Duo test runs, alternative CPUs and starting order were used.



          Core 2 Duo 2400 MHz Vista              Athlon 64 2210 MHz XP Pro
                GeForce 8600 GT                        Radeon X800 XL

                 MFLOPS     FPS   % CPU                 MFLOPS     FPS   % CPU
                                 Util x2                                Util x2

      640 x 480    4373     779     100      640 x 480    2008     697      85
     1280 x 900    4545     312      77     1280 x 1024   2720     350      92

     Dedicated CPUs                         Dedicated CPUs
      640 x 480    4445     727     100      640 x 480    3420     717     100
      640 x 480    3632     865     100

     Stand alone   4786              50     Stand alone   3527              50
      640 x 480             926      65      640 x 480             763      60
     1280 x 900             315      27     1280 x 1024            369      55
 


Maximum Speed

Programs CPUIDMP64 and CPUIDMP in DualCore.zip are the same but compiled for 64 and 32 bits. They execute three passes of simple additions to registers, attempting to demonstrate maximum CPU speeds. Firstly, an integer test and an SSE floating point test are run separately. They are then run as two threads of equal priority, where both should run at full speed with two CPUs. A third section uses four threads, where the speed of the last three can vary quite a bit, but all should run at full speed on a quad core CPU. Multiple runs might identify differences in overhead between operating systems. Following are example 32 bit and 64 bit results on a 2400 MHz Core 2 Duo and 64 bit results on a 2210 MHz Athlon 64 X2. Further results can be found in WhatCPU Results.htm.


                         Core 2 Duo 32 bit   Core 2 Duo 64 bit    Athlon 64 X2 64 Bit
                            64-Bit Vista        64-Bit Vista          XP Pro x64
 Separate Tests
 32 bit SSE   MFLOPS      9485  9581  9595     9582  9595  9600     4411  4411  4415
 32 bit Integer MIPS      6502  6505  6509     6934  6936  6950     6068  6070  6070

 Two Threads Equal Priority
 32 bit SSE   MFLOPS      9581  9562  9564     9501  9600  9600     4405  4409  4408
 32 bit Integer MIPS      6802  7032  7036     7002  7006  7013     6067  6053  5992

 Four Threads, First Normal Priority, Others Normal - 1
 32 bit SSE   MFLOPS      9172  9564  9565     9592  9575  9576     4401  4411  4410
 32 bit Integer MIPS      3354  3174  3645     3447  3414  3329     2903  2053  2898
 32 bit SSE   MFLOPS      4706     0     0     4844     0     0        0  1433     0
 32 bit Integer MIPS       290  3800  3338        0  3337  3366     3454  2227  3455



The benchmark also shows performance gains using Hyper-Threading. The following is for a 3 GHz Pentium 4E with HT:


 P4E HT 3000 MHz

  Speed adding to registers   Pass 1   Pass 2   Pass 3

  Separate Tests
  32 bit SSE   MFLOPS          5813     5756     5796
  32 bit Integer MIPS          8076     8061     8072

  Two Threads Equal Priority
  32 bit SSE   MFLOPS          5224     5216     5168
  32 bit Integer MIPS          3241     3253     3278

  Four Threads, First Normal Priority, Others Normal - 1
  32 bit SSE   MFLOPS          3032     3334     3260
  32 bit Integer MIPS           461      755      531
  32 bit SSE   MFLOPS          2689     2357     2377
  32 bit Integer MIPS           533      452      671
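
One way of putting a number on the Hyper-Threading gain is to express each thread's speed as a fraction of its stand alone speed and add the fractions. This measure is an assumption made here for illustration and is not something the benchmark reports. The function name relative_throughput is hypothetical. A total of 1.0 would mean no gain over running the two tests one after the other.

```python
def relative_throughput(alone, together):
    """Sum of each thread's speed as a fraction of its stand alone speed."""
    return sum(t / a for a, t in zip(alone, together))


# Pass 1 figures from the P4E HT table above: (SSE MFLOPS, integer MIPS).
alone = (5813, 8076)        # Separate Tests
together = (5224, 3241)     # Two Threads Equal Priority

total = relative_throughput(alone, together)
print(round(total, 2))
```

On these figures the total comes out at roughly 1.3, with the SSE thread losing little speed and the integer thread running at about 40% of its stand alone rate.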



Whetstone MP

The Whetstone Benchmark has various routines that execute floating point and integer instructions. Programs Whets64MP and Whets32MP in DualCore.zip are the same but compiled for 64 and 32 bits. One copy of the benchmark is run in the main thread and another in a low priority second thread, and both should mainly run at the same speed with two CPUs. See Win64.htm for more information. Following are results for a dual processor P4 Xeon at 3065 MHz and a 3 GHz Pentium 4E with Hyper-Threading. The latter produces rather unexpected performance gains. Also note that the lower priority thread appears to have been given an equal share of CPU time. The third set of results is for a Core 2 Duo at 2400 MHz and shows superior performance on this old code. Further results can be found in Whetstone Results.htm.


 Dual P4 Xeon 3065 MHz

 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS            1      2      3    MOPS   MOPS   MOPS   MOPS   MOPS

   1200  11020   3215   1464   1462    807   81.7   38.0   1307   2488   3292
  Thread 1               732    725    404   41.2   19.4    649   1239   1629
  Thread 2               732    737    403   40.5   18.6    658   1249   1663


 P4E HT 3000 MHz

 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS            1      2      3    MOPS   MOPS   MOPS   MOPS   MOPS

    967   8464   2605   1161   1187    656   60.1   32.5   1611   1385   2173
  Thread 1               580    580    330   30.4   16.4    808    734   1604
  Thread 2               581    607    326   29.7   16.1    803    651    569


 Core 2 Duo 2400 MHz

 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS            1      2      3    MOPS   MOPS   MOPS   MOPS   MOPS

   1446  23473   4718   1702   1697   1046    113   57.9   3793   3622   7531
  Thread 1               840    836    525   57.2   29.2   1959   1777   6477
  Thread 2               862    861    522   56.0   28.8   1835   1845   1054



BusSpeed MP

This is a variation of the Performance Tests used in BusSpd2K, covered above. Two versions of this read only benchmark are available, one from a 64 bit compiler using 64 bit integer words and one from a 32 bit compiler using 32 bit words. They are BusMP64 and BusMP in DualCore.zip. BusSpd2K starts by reading words with 64 byte address increments, to indicate memory bus burst reading speed, then reduces the increment to finally read all words sequentially. Speed is measured using data in caches and in RAM. In the MP case, address increments start at 32 words (128 or 256 bytes). Besides reading integer numbers, an extra test reads 128 bit SSE2 data. With two threads, each reads all the data from separate arrays, with the total number of passes the same as with one thread. The calculated speed is based on the last thread to finish. Nominal elapsed time for each test is 0.5 seconds. Further details and other results can be found in BusSpd2K Results.htm.

The SSE2 test uses assembly code but all others are compiled. For the latter tests, the compiler produces a sequence of 64 instructions that load the data to one register using the required AND instructions. Via L1 cache, this could be expected to produce 1 result per CPU clock cycle or data transfer rate on a 2000 MHz CPU of 8000 MB/second at 32 bits and 16000 MB/second with the 64 bit version.
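
The expected maximum follows from one result per clock cycle multiplied by the word size in bytes. This is a minimal sketch of that arithmetic. The function name peak_mb_per_sec and its results_per_clock parameter are hypothetical, introduced only for this illustration.

```python
def peak_mb_per_sec(cpu_mhz, word_bits, results_per_clock=1):
    """Expected L1 cache data rate in MB/second for a given word size."""
    bytes_per_word = word_bits // 8
    return cpu_mhz * results_per_clock * bytes_per_word


print(peak_mb_per_sec(2000, 32))   # 32 bit version on a 2000 MHz CPU
print(peak_mb_per_sec(2000, 64))   # 64 bit version on a 2000 MHz CPU
```

The same calculation can be applied to other clock speeds for comparison with the measured L1 cache figures.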

Single CPU - The first example below is for the 64 bit version running on a single Core 2 Duo CPU (Affinity flag set). This shows that total throughput is virtually the same using one and two threads. Also, other measurements show that each thread obtains approximately equal amounts of time. The expected maximum speeds via L1 cache are demonstrated and this is doubled using 128 bit SSE registers.

32 Bit Versions - Single thread results, as expected, produce integer speeds of up to around half those obtained at 64 bits via caches (but see below) and similar performance from RAM. There are variations due to the smaller address increments. The results also identify different performance attributes between the Intel and AMD CPUs. The Core 2 Duo PC has faster RAM than the Athlon 64 based system, a larger and more efficient L2 cache (see 96 KB and 1536 KB) and can execute SSE2 instructions much faster. The Athlon 64 has a larger L1 cache (not demonstrated) and can produce similar execution speeds on integer calculations. The results from two threads are of no surprise, with Core 2 Duo cache speeds being virtually the same as those on a PC using Windows XP. Multi-threading clearly does not work very well with this sort of code. Performance improvements using two threads are between 1.1 and 1.8 times for Core 2 Duo L1 cache and 1.5 to 1.9 times for L2. Corresponding Athlon 64 ratios are 1.2 to 2.0 for L1 cache, but all L2 cache ratios are around 2. A surprise is that higher throughput can be obtained using shared memory.

64 Bit Version - Except with data from RAM, throughput from two threads using 64 bit integers is much worse than with the 32 bit version. Here, the Core 2 Duo performance gain (or loss) is 0.6 to 1.2 from L1 cache and 1.3 to 1.5 from L2, with the Athlon 64 at 0.8 to 1.2 for L1 and 1.3 to 1.9 for L2. Looking at single thread results shows that the CPU is struggling to achieve the same instruction execution speeds, for example, 2181 MIPS at 64 bits compared with 2316 MIPS at 32 bits. So this will not help. In all cases of the SSE2 test, at least as far as L1 cache data is concerned, throughput from two threads still approaches twice that of one thread.

Assembly Code - The last integer test (Read All) was converted to assembly code using the same sequence of 64 AND instructions to one register, followed by versions using 2 and 4 registers in turn. Address indexing was somewhat different. Results are shown below. The assembly code produced a mixed set of single thread results, some faster and some slower than the compiler generated code, but the two thread results were comparatively better. In the Core 2 Duo single register test with L1 cache data, the throughput gain from two threads increased from 1.2 times via compiled code to 1.7 times via assembly instructions.

With two independent threads and two CPUs just reading and processing data in separate caches, it is not unreasonable to expect almost a doubling of throughput compared with a single CPU. With this benchmark, this is true using 128 bit SSE2 code but not when loading data to integer registers, particularly in 64 bit mode. It does seem that Windows is interfering with the data flow.



                        BusSpeed MP MB/second

    Core 2 Duo 2400 MHz, Vista and Athlon 64 X2 2210 MHz, XP Pro x64

 6KB L1 cache, 384KB L2 cache, 1536KB L2 cache (C2D) or RAM (AMD), 131070KB RAM

   Kbytes Inc32wds Inc16wds  Inc8wds  Inc4wds  Inc2wds  ReadAll 128bSSE2

 Core 2 Duo 64 Bit - Using one CPU
 1 Thread
        6    14079    16065    16562    16952    17210    17268    36814
      384     3943     4114     3985     6400     9506    13493    18726
     1536     3880     3954     3935     6332     9248    13324    18479
   131070      588      629      746     1536     2820     5091     5587
 2 Threads
        6    14155    16167    16460    17104    17211    17217    36797
      384     3934     4110     3964     6363     9458    13242    18822
     1536     3820     3985     3882     6252     9189    13106    18436
   131070      565      625      743     1555     2825     5062     5614

 Core 2 Duo 32 Bit
 1 Thread
        6     7366     8999     8984     9201     9258     9263    37310
       96     2030     2014     3241     4522     6658     7935    19061
     1536     1961     1985     3190     4408     6504     7842    18677
   131070      315      380      782     1409     2574     4717     5711
 2 Threads
        6     7271    11341    13427    15096    16511    17116    66635
       96     3129     2988     4904     7711    12073    15064    31494
     1536     3099     2946     5121     7233    11414    14398    30395
   131070      313      429      881     1778     3057     5733     7099

 Athlon 64 32 bit
 1 Thread
        6     8131     8462    10388    10057     9882     9958    17391
       96      735      657     1238     2368     4888     6366     8913
     1536      359      311      564      886     1434     2788     2949
   131070      349      314      559      876     1412     2746     2902
 2 Threads
        6     9234    10367    14787    16078    17366    18515    34466
       96     1474     1311     2461     4685     9734    12659    17792
     1536      318      327      665     1297     2368     4807     4723
   131070      321      334      669     1292     2347     4723     4669

 Core 2 Duo 64 Bit
 1 Thread
        6    14261    16380    16736    17330    17414    17449    37143
       96     4097     4196     4027     6419     9545    13666    19065
     1536     3928     3965     3968     6305     9313    13401    18599
   131070      594      636      758     1576     2876     5124     5692
 2 Threads
        6     8106    12608    15512    17690    19165    20354    71247
       96     5124     5840     5590     9943    13461    17703    31333
     1536     5275     5766     5518     9369    12764    16723    29287
   131070      601      627      834     1685     3166     5261     7142

 Athlon 64 64 Bit
 1 Thread
        6    14563    15920    16078    17827    17772    17443    17358
       96     1943     1947     1344     2406     4536     9654     8766
     1536      642      717      586      988     1499     2897     2940
   131070      639      698      592      983     1476     2860     2919
 2 Threads
        6    11780    12876    13348    18538    19771    21414    34617
       96     3121     3045     2569     4674     8281    12510    17488
     1536      557      630      642     1270     2226     4076     4704
   131070      558      632      642     1263     2213     4063     4684


                64 Bit Compiler, 64 Bit Integers, Read All

           Core 2 Duo                      Athlon 64
           Original        Assembler       Original        Assembler
             1 Reg   1 Reg   2 Reg   4 Reg   1 Reg   1 Reg   2 Reg   4 Reg
 1 Thread
        6    17449   18384   18662   17996   17443   16360   30651   29112
       96    13666   13221   13392   13378    9654    9446    9493    9487
     1536    13401   13227   13299   13129    2897    2911    2933    2930
   131070     5124    5111    5126    5052    2860    2843    2917    2900
 2 Threads
        6    20354   30635   30727   30478   21414   28851   46191   44173
       96    17703   24168   23977   24092   12510   18794   18913   18788
     1536    16723   22714   22935   22431    4076    4790    4737    4802
   131070     5261    6097    6030    6136    4063    4727    4693    4751




RandMP

This is a variation of the benchmark in RandMem.zip. The program uses the same code for serial and random use via a complex indexing structure and comprises Read (RD) and Read/Write (RW) tests. They are run to use data from L1 cache, L2 cache and RAM, firstly as a single thread and secondly using two threads. Indexing overheads lead to slower speed than BusSpeed above (ReadAll). Programs RandMP64 and RandMP32 in DualCore.zip are compiled to run via Win64 and Win32. See Win64.htm for more information. Following are results, again for the Athlon 64 X2 Dual Core 4200+ at 2.21 GHz using Windows XP Pro x64 and the Core 2 Duo at 2.4 GHz using the 64 bit version of Windows Vista. Further results can be found in Randmem Results.htm.

Using one thread, AMD RW speed is slower than RD and speed reduces using larger data size with random access. The full benefit of two CPUs is demonstrated by the RD tests but RW functions can show overall lower throughput than using one thread, with L1 cached data. This is probably due to changed data being written to shared main memory, even though the other CPU may not be using it. Effectively, Serial RW speed of two CPUs can be half of that of a single CPU and less than a quarter with Random RW. The benefit of the shared Core 2 Duo L2 cache is demonstrated. Modifying the benchmark, so that each thread accesses its own data array, enables RW cache tests to run at full speed on each CPU.


 Athlon 64 X2 Dual Core 2210 MHz, 64 bit version, XP Pro x64

               ------------------ MBytes Per Second At -------------------- 
               6 KB   24 KB   96 KB  384 KB  768 KB 1536 KB   12 MB   96 MB 
 1 Thread
 Serial RD     8552    8518    5115    5132    2369    2353    2344    2305 
 Serial RW     4346    4340    2702    2697    1349    1352    1354    1351 
 Random RD     8176    8244    3733    1620     872     389     255     170 
 Random RW     4384    4332    2865    1483     563     236     161     136 

 2 Threads
 Serial RD1    8374    8532    5064    5010    2075    2096    2021    2026 
 Serial RD2    8532    8394    5176    5108    2111    2062    2049    2054 

 Serial RW1    1090    1172    1110    1096    1041     867     864     866 
 Serial RW2    1083    1136    1089    1076    1049     866     855     824 

 Random RD1    8147    8024    3683    1638     485     193     126     100 
 Random RD2    8154    8158    3701    1637     485     195     125     101 

 Random RW1     494     489     448     406     352     152      86      75 
 Random RW2     495     490     449     406     343     152      87      75 

           For approximate speed in MIPS divide MBytes/Second by 3.2


  Core 2 Duo 2400 MHz, 64 bit version, 64 Bit Vista

               ------------------ MBytes Per Second At --------------------
               6 KB   24 KB   96 KB  384 KB  768 KB 1536 KB   12 MB   96 MB
 1 Thread
 Serial RD     8483    9431    7859    7781    7867    7916    4420    4484
 Serial RW     8775    9319    7552    7548    7515    7301    2436    2389
 Random RD     8633    9500    4214    3322    3202    2794     630     456
 Random RW     8812    9101    3408    2742    2660    2365     405     283

 2 Threads
 Serial RD1    8699    9194    7623    7581    7571    7729    4069    4117
 Serial RD2    8662    9289    7422    7415    7417    7249    4069    4117

 Serial RW1    2061    2164    6754    6898    6779    6660    1661    1771
 Serial RW2    2049    2168    6881    6728    6864    6787    1561    1771

 Random RD1    8660    9267    3461    2675    2563    2278     447     444
 Random RD2    8617    9443    3531    2724    2610    2317     453     444

 Random RW1     738     775    1355    1937    1954    1859     343     285
 Random RW2     735     774    1365    1963    1982    1882     343     285
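
The comparison of two thread and single thread throughput can be checked by adding the two per-thread figures and dividing by the one thread result. This is a small sketch using the 6 KB column of the Athlon 64 X2 table above. The function name combined_ratio is hypothetical.

```python
def combined_ratio(one_thread, thread_1, thread_2):
    """Total two thread MB/second relative to the one thread MB/second."""
    return (thread_1 + thread_2) / one_thread


# Athlon 64 X2, 6 KB (L1 cache) column.
serial_rd = combined_ratio(8552, 8374, 8532)
serial_rw = combined_ratio(4346, 1090, 1083)
random_rw = combined_ratio(4384, 494, 495)

print(round(serial_rd, 2), round(serial_rw, 2), round(random_rw, 2))
```

On these figures the serial read ratio is close to 2, the serial read/write ratio is one half and the random read/write ratio is less than a quarter.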


Next are 32 bit version results for two Core 2 Duo PCs at 2400 MHz with 533 MHz Dual Channel DDR2 RAM and different chipsets, followed by those for a Pentium D at 2800 MHz (with 400 MHz Dual Channel DDR?). Each of the Pentium D and Core 2 Duo is faster on certain tests, and the effects of the larger Core 2 Duo caches are apparent. The Core 2 Duo PC using the Intel 965 chipset is much faster on some RAM based results and some writing/reading cache tests. Unlike the other dual CPUs, the random read/write tests on the Core 2 Duo show a higher total throughput with two threads when using L2 cache, a clear benefit of using an L2 cache shared by the CPUs.



 Core 2 Duo 2400 MHz nForce 570 chipset

               ------------------ MBytes Per Second At -------------------- 
               6 KB   24 KB   96 KB  384 KB  768 KB 1536 KB   12 MB   96 MB 
 1 Thread
 Serial RD     9487    9426    7919    7880    7931    7795    3344    3130 
 Serial RW     9515    9271    7594    7518    7562    7244     587     593 
 Random RD     9501    9448    4251    3317    3217    2781     174     121 
 Random RW     9254    9208    3383    2762    2649    2375     117      85 

 2 Threads
 Serial RD1    9374    9485    7741    7796    7741    7775    3127    3001 
 Serial RD2    9477    9430    7413    7533    7438    7499    3083    2966 

 Serial RW1     678     679    6871    6944    6920    6854     451     566 
 Serial RW2     679     681    6916    6803    6875    6877     436     566 

 Random RD1    9373    9482    3502    2715    2597    2289     162     113 
 Random RD2    9379    9545    3509    2744    2615    2301     158     113 

 Random RW1     251     165     288     906    1576    1756      69      58 
 Random RW2     252     165     288     905    1573    1755      69      58 


 Core 2 Duo 2400 MHz Intel 965 chipset

               ------------------ MBytes Per Second At --------------------
               6 KB   24 KB   96 KB  384 KB  768 KB 1536 KB   12 MB   96 MB
 1 Thread
 Serial RD     9524    9536    7954    7939    7957    7935    4119    4044
 Serial RW     9541    9458    7601    7613    7606    7361    2046    2003
 Random RD     9511    9562    4260    3350    3231    2822     548     393
 Random RW     9386    9260    3424    2769    2678    2384     349     242

 2 Threads
 Serial RD1    9479    9519    7839    7824    7842    7797    4040    3996
 Serial RD2    9513    9450    7527    7531    7519    7487    4040    3996

 Serial RW1    1490    1694    6936    6962    6961    6886    1420    1388
 Serial RW2    1487    1695    6939    6956    6949    6886    1420    1388

 Random RD1    9484    9524    3538    2720    2618    2318     527     388
 Random RD2    9514    9535    3569    2741    2632    2331     527     388

 Random RW1     532     634    1101    1965    1988    1896     265     218
 Random RW2     534     636    1104    1966    1983    1898     265     218


 Pentium D 2800 MHz 

               ------------------ MBytes Per Second At -------------------- 
               6 KB   24 KB   96 KB  384 KB  768 KB 1536 KB   12 MB   96 MB 
 1 Thread
 Serial RD    11614    7288    7192    7087    7054    2952    2853    2905 
 Serial RW     2279    1657    1377    1446    1439    1360    1359    1359 
 Random RD    11621    6010    3708    2419    1367     491     182     163 
 Random RW     3630    2566    1669    1467    1215     340     144     131 

 2 Threads
 Serial RD1   11472    7189    7191    6924    6799    1918    1917    1907 
 Serial RD2   11533    7065    6948    6981    6701    1933    1967    1939 

 Serial RW1     715     863     762     808     829     987     946     939 
 Serial RW2     710     853     769     805     814     892     868     862 

 Random RD1   11467    5973    3649    2341    1332     245      91      81 
 Random RD2   11547    6042    3767    2300    1348     251      93      83 

 Random RW1     191     172     156     146     141     112      71      65 
 Random RW2     193     174     157     147     142     114      72      65 


This benchmark again shows some performance gains and some losses using Hyper-Threading. The following is for a 3 GHz Pentium 4E with HT:



 Pentium 4E HT 3000 MHz

               ------------------ MBytes Per Second At -------------------- 
               6 KB   24 KB   96 KB  384 KB  768 KB 1536 KB   12 MB   96 MB 
 1 Thread
 Serial RD    11217    7146    7030    6943    6911    2979    2982    2983 
 Serial RW     2255    1641    1429    1423    1425    1352    1351    1344 
 Random RD    11410    6017    3843    2391    1368     471     181     168 
 Random RW     3593    2649    1793    1465    1234     315     135     125 

 2 Threads
 Serial RD1    5134    4143    4035    3950    3935    2583    1964    1862 
 Serial RD2    5095    4089    4009    3910    3890    2564    1930    1840 

 Serial RW1    1522    1262    1148    1188    1193    1036    1024    1001 
 Serial RW2    1500    1239    1133    1146    1184    1019     912     789 

 Random RD1    5125    3580    2594     733     564     279      96      95 
 Random RD2    5070    3544    2558     720     557     275      92      93 

 Random RW1    1709    1536    1367     714     560     205      74      80 
 Random RW2    1677    1501    1342     702     552     199      71      78 



Updated October 2007

The new Internet Home for my PC Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection