Title

Paging Issues With 64-Bit Vista and Windows 7


Summary

The following new PC with 4 GB RAM initially appeared to have only 3 GB. This was corrected by enabling Memory Remapping in BIOS.

Core 2 Duo 2400 MHz, Asus P5B motherboard, 800 MHz DDR2 RAM, Seagate ST3400633AS SATA-300 disk, 16 MB buffer, 7200 RPM, GeForce 8600 GT graphics, Windows Vista 64-Bit.

Testing at 3 GB indicated slow performance on a benchmark that requested little more than 1 GB. Earlier tests, using Windows XP Pro x64 on a PC with 1 GB RAM, produced worse than expected speeds with paging. This appears to be an issue with 64-Bit Windows relating to creation of bitmaps and fast BitBlt copying being available for use with larger images.

With 4 GB of RAM being available and usable via 64-Bit Vista (and 1 GB with XP x64), a benchmark was run to measure the impact of paging. The main observation is that the speed contrast due to paging, when too much RAM is requested, can be enormous, with paging much slower than using normal disk input/output. So, careful consideration of data size is needed when programming. Further measurements show that 64-Bit Vista can be significantly faster than Windows XP x64, as paging speeds are linked to random access and Vista can read up to 64 KB at a time, compared with a fixed 4 KB with XP.

Data that can be allocated for a single data array within the 2 GB User Virtual Space with 32 bit Windows was found to be 1.2 GB with XP and 1.5 GB using Windows 2000. Virtual Space for a 32 bit application is shown as 4 GB via 64 bit Windows but only 2 GB could be used. With 64 bit applications, 8192 GB is shown and arrays of up to 8 GB could be allocated using 64-Bit Vista (and 4 GB RAM) but less than 6 GB with XP Pro x64 (1 GB RAM).

Later tests on a new PC, with 8 GB RAM, showed that a single array of 14 GB could be allocated but not 15 GB. The PC comprised:

Phenom II X4 3000 MHz, Asus M4A785TD-V, 8 GB DC DDR3 RAM, Western Digital 5400 RPM Green SATA disk, 16 MB buffer, GeForce GTS 250 card and on-board ATI graphics, 64-Bit Windows 7.

Here, up to 7 GB of data remained in RAM, where it could be accessed at 4 GB/second, but performance was 400 times slower at 14 GB. Block sizes tended to be even larger using Windows 7 and a higher proportion of data remained in RAM. Disk data transfer speed was the highest but was offset by slower random access time at 5400 RPM.


BMPSpeed Benchmark

BMPSpeed Benchmark generates BMP files up to 512 MB. It measures the speed of saving, loading, scrolling, rotating and editing files of 0.5, 1, 2, 4 MB and so on upwards. Pre-compiled versions of the benchmarks can be found in BMPSpd.zip, which also contains the source code and more detailed explanations. Results for a wide range of systems are in BMPSpeed Results.htm. A 64 bit version is also available in Video64.zip with comparisons in 64 Bit Graphics Tests.htm. See also My Home Page for other PC benchmarks and results.

Extra copies of the images for editing result in memory demands of more than twice the largest image size, leading to possible paging to/from disk. Five tests are run at each size, run times being saved in log file BMPTime.txt.

1 - Enlarge with blur editing (copy with add/divide instructions) and display.
2 - Save enlargement to disk.
3 - Load from disk, format and display.
4 - Copy from memory scrolling.
5 - Make an extra copy rotating 90 degrees and display.

Data transfer speeds in MB/second are also recorded for Test 4 where displayed data might be from video RAM cache, main RAM or disk page file. The benchmark also produces real and virtual memory usage statistics.


Results With 3 GB


 BMP Benchmark Version 2.2x for 64 bit Windows Fri Jul 20 15:37:32 2007

           Copyright Roy Longbottom 1999 - 2006

   Input Enlarge    Save    Load  Scroll  Scroll  Rotate     Use
   Image Display         Display /Repeat Overall  90 deg    Fast
  Mbytes    Secs    Secs    Secs   msecs  MB/Sec    Secs  BitBlt

     0.5    0.05    0.01    0.05     0.1  4748.4    0.02      3
     1.0    0.05    0.02    0.08     0.3  4463.6    0.03      3
     2.0    0.07    0.02    0.11     1.1  2475.2    0.04      3
     4.0    0.09    0.03    0.19     2.4  1866.0    0.06      3
     8.0    0.13    0.08    0.31     2.9  1765.0    0.10      3
    16.0    0.20    0.24    0.48     2.7  1832.5    0.17      3
    32.0    0.26    0.52    0.78     2.9  1741.2    0.28      3
    64.0    0.39    1.08    1.38     2.9  1760.0    0.52      3
   128.0    0.68    2.37    2.63     2.9  1740.3    1.03      3
   256.0    1.35    4.62    5.38     3.1  1645.6    4.39      3
   512.0   27.91   13.05   10.59     3.2  1595.6   57.11      3

  CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000006F6
  Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz Measured 2402 MHz
  AMD64 processor architecture, 2 CPUs 
  Windows NT  Version 6.0, build 6000, 
  Memory Status Maximum Use
  Mbytes of physical memory    3006 ------##################
  Percent of memory in use     81
  Free physical memory Mbytes  567
  Mbytes of paging file        6215
  Free Mbytes of paging file   2967
  User Mbytes of virtual space 8388607
  Free user virtual Mbytes     8387500
  Screen setting 1280 x 1024 x 32 bits =  5.2 MB

                    End at Fri Jul 20 15:40:34 2007


   Example Results Using 4 GB RAM 

    32.0    0.23    0.54    0.78     3.1  1629.5    0.28      3
    64.0    0.36    1.12    1.36     2.9  1729.9    0.53      3
   128.0    0.68    2.66    2.62     2.9  1725.9    1.00      3
   256.0    1.20    5.02    5.30     3.0  1706.6    4.13      3
   512.0    2.32   10.84   12.47     3.1  1603.4    5.80      3




More Results

The displaying method comes from a 1997 Microsoft sample program, ShowDib. This uses CreateDIBitmap so that fast BitBlt copying can be used. In the past, the size that can be created for fast copying could vary, depending on the version of Windows and graphics driver. Most recent results via Windows XP showed a limit of 64 MB. In the case of my benchmark, when the DIB cannot be created, the slower StretchDIBits method is used to copy part of the image to the display. Although it should have been clear that CreateDIBitmap would use more memory, it was not obvious on older systems with limited and slower main RAM.

Tests show that the DIBs are at 32 bits, 33% larger than the original BMP data. So, a 512 MB image increases to 682 MB and the program can have two open. RAM space used is outside the user’s virtual space but can show up via free memory space (if large enough) and free paging file space.
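The size increase can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name dib_size_mb is hypothetical, and the 24 bits per pixel source format is an assumption inferred from the 33% figure, not something the benchmark log reports.

```python
def dib_size_mb(bmp_mb, src_bits=24, dib_bits=32):
    """Approximate size of the bitmap created for fast BitBlt copying.

    src_bits=24 is an assumption: the text only says the 32 bit version
    is 33% larger than the original BMP data, which implies 24 bits.
    """
    return bmp_mb * dib_bits / src_bits


one_copy = dib_size_mb(512)
print(int(one_copy))       # 682 MB for one 512 MB image
print(int(2 * one_copy))   # 1365 MB when the program has two open
```

With two such bitmaps open, the demand is well over 1.3 GB before counting the original image data, which fits the observation that a request for little more than 1 GB was slow on the PC with 3 GB.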

Below are Enlarge and Rotate speeds at 256 and 512 MB using 64-Bit Vista and XP Pro x64 with four versions of the benchmark: the original, the 64 bit version, a 32 bit version via a later MS compiler and a version that uses StretchDIBits for the larger images. Also shown are RAM, PageFile and User Virtual Space usage. Some Windows XP results with different RAM sizes are shown for comparison purposes.

  • 64-Bit Vista speeds are much better with 4 GB RAM available than with 3 GB
  • Speed and memory occupancy are similar with 64 bit, 32 bit and original benchmarks
  • RAM and PageFile use is increased when using CreateDIBitmap (for fast BitBlt copying) vs StretchDIBits
  • 64-Bit Windows can use CreateDIBitmap for larger images and this can lead to poor performance due to excessive paging
  • Enlarge/Rotate (no paging) speeds can be faster when StretchDIBits is used
  • 64-Bit Windows uses 50 to 60 MB more User Virtual Space than Windows XP

What is not shown is the reduction in scrolling speed using StretchDIBits which, on the Vista PC at 256 MB, fell from 1706 MB/second or 3.0 milliseconds per screen using BitBlt to 171 MB/second or 29.5 milliseconds with Stretch. The XP x64 PC results went from 4.3 to 32.9 milliseconds.

Latest results shown are for 64-Bit Windows 7 on a PC with 8 GB RAM. Here, real and virtual memory usage is similar to other 64 bit versions of Windows. Besides showing results for the main graphics card, others are provided for motherboard integrated graphics where measured performance is similar. For this PC and others, the latest compilers appear to generate rotation times taking around twice as long as they should but only with 256 MB images.



           RAM BMP Enlarge Rotate    Free    Free    Used    Used    Used    Used    Used
            GB  MB   Secs    Secs  MB RAM  MB RAM      MB  Pgfile  Pgfile      MB      MB
                                    Start     End     RAM   Start     End  Pgfile Virtual

 Phenom Win7 8 256   0.95    7.21
 64 bit        512   1.68    7.85    6493    4164    2329    1601    3954    2353    1102

 32 bit      8 256   1.29    7.23   
               512   2.38    8.04    6496    4164    2332    1596    3952    2356    1099

 Original    8 256   1.35    4.35
               512   2.49    8.68   >4095   >4095     N/A   <4096   <4096     N/A    1103

 Stretch     8 256   0.54    6.82
 64 bit        512   0.88    7.10    6310    5301    1009    1797    2824    1027    1102

 On board    8 256   0.95    7.29
 64 bit        512   1.74    7.89    6465    4158    2307    1593    3824    2331    1102
 ________________________________________________________________________________________

 C2D Vista   3 256   1.35    4.39
 64 bit        512  27.91   57.11             567                    3248            1107

             4 256   1.20    4.13
               512   2.32    5.80    3126     877    2249     959    3288    2329    1107

 32 bit      4 256   1.30    4.22
               512   2.53    5.91    3170     897    2273     957    3275    2318    1094

 Original    4 256   1.48    4.52
               512   2.80    8.15    3182     900    2282     N/A     N/A     N/A    1094

 Stretch     4 256   0.76    3.76
 64 bit        512   1.35    4.53    3169    2170     999     915    1936    1021    1107
 ________________________________________________________________________________________

 AMD XP x64  1 256 119.28   58.51
 64 bit        512 335.83  832.41     518     183     N/A     415    2734    2319    1081

 32 bit      1 256  71.92   88.30
               512 246.43  971.95     801     129     N/A     407    2736    2329    1076

 Original    1 256  47.39   99.84
               512 189.28 1061.02     524     192     N/A     411    2616    2205    1072

 Stretch     1 256   0.59    9.27
 64 bit        512   8.40  160.08     607      60     N/A     409    1439    1030    1081
 ________________________________________________________________________________________

 P4 XP     0.5 256  66.79  184.56
 Original      512 140.41  148.08     421      40     N/A      18    1037    1019    1047
 ________________________________________________________________________________________

 P4 XP       1 256   1.30    7.05
 Original      512   1.88   35.21             131                    1122            1036
 ________________________________________________________________________________________

 C2D XP      2 256   1.21    5.48
 Original      512   1.71    6.53             608                    1302            1054



4 GB Data

With 8192 GB of user virtual memory available using 64-Bit Windows, compared with 2 GB via 32-Bit versions, it is tempting to write programs with vast data arrays instead of bothering with frequent disk input and output. Some would claim that, when paging is necessary, it will be just as fast as normal disk data transfers.

I ran some tests using IntBurn64 in More64bit.zip and the 32 bit version or reliability test in BusSpd2k.zip. These are designed to run at the highest speed whilst checking for correct results at a chosen data size and minimum running time. There are six tests with write and read once, using different data patterns. This is followed by 6 tests with read only. Each of the latter is preceded by an untimed write/read and an extra read pass to calibrate the number of read passes needed for the chosen time. This is a significant overhead when one pass is used.
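To illustrate the idea of a timed write/read pass with result checking, here is a minimal Python sketch. It is not the code of IntBurn64 or BusSpd2k, which are compiled programs. The function name write_read_test and the use of Python's array module are my assumptions for illustration, and speeds from an interpreted sketch are not comparable with the benchmark figures. The speed counts both the data written and the data read.

```python
from array import array
import time


def write_read_test(size_kb, pattern):
    """Fill an array of 64 bit words with one pattern, then read it back.

    Hypothetical helper, not the benchmark's own routine.
    Returns (result_ok, mb_per_second).
    """
    n_words = size_kb * 1024 // 8
    start = time.perf_counter()
    data = array("Q", [pattern]) * n_words      # write pass
    ok = data.count(pattern) == n_words         # read pass with checking
    secs = time.perf_counter() - start
    mb_moved = 2 * size_kb / 1024               # one write plus one read
    return ok, (mb_moved / secs if secs > 0 else float("inf"))


for pattern in (0x0000000000000000, 0xFFFFFFFFFFFFFFFF, 0xA5A5A5A5A5A5A5A5):
    ok, speed = write_read_test(10000, pattern)
    print(f"{speed:8.0f} MB/sec  Pattern {pattern:016X}  Result {'OK' if ok else 'ERROR'}")
```

Raising size_kb beyond the free RAM is what pushes a test of this kind into paging, so small sizes should be tried first.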

Following is an example log file for the Core 2 Duo with 64-Bit Vista, running for the minimum time at 3860000 KB (3.68 GB) where Vista managed to find sufficient memory space for the last three reading tests at full speed. Maximum write/read speed, at lower memory demands, is around 3300 MB/second, with the first test usually at about 2200 MB/second. With the total running time being too long at 1 hour 24 minutes, I produced a version of the 64 bit benchmark that runs just one write/read test in order to measure paging speeds with data size up to 4 GB and higher.




         64 Bit Integer Reliability Test Version 1.0 for 64 bit OS

                   Copyright (C) Roy Longbottom 2006

  Batch Command KB 3860000 SECS 1 P1 LOG INT64RAM.TXT 

  Test 3860000 KB at 1 seconds per test, Start at Mon Aug 06 20:09:49 2007

 Write/Read
  1      52 MB/sec  Pattern 0000000000000000 	 Result OK         1 passes
  2      21 MB/sec  Pattern FFFFFFFFFFFFFFFF 	 Result OK         1 passes
  3      17 MB/sec  Pattern A5A5A5A5A5A5A5A5 	 Result OK         1 passes
  4      28 MB/sec  Pattern 5555555555555555 	 Result OK         1 passes
  5      24 MB/sec  Pattern 3333333333333333 	 Result OK         1 passes
  6      18 MB/sec  Pattern F0F0F0F0F0F0F0F0 	 Result OK         1 passes

 Read
  1      14 MB/sec  Pattern 0000000000000000 	 Result OK         1 passes
  2      23 MB/sec  Pattern FFFFFFFFFFFFFFFF 	 Result OK         1 passes
  3      21 MB/sec  Pattern A5A5A5A5A5A5A5A5 	 Result OK         1 passes
  4    5265 MB/sec  Pattern 5555555555555555 	 Result OK         2 passes
  5    5330 MB/sec  Pattern 3333333333333333 	 Result OK         2 passes
  6    5301 MB/sec  Pattern F0F0F0F0F0F0F0F0 	 Result OK         2 passes

             Reliability Test Ended Mon Aug 06 21:34:04 2007
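
The total running time and the data size in GB can be confirmed from the figures in the log. This is just a sketch of the arithmetic in Python, with the two time stamps typed in by hand from the Start and Ended lines.

```python
from datetime import datetime

# Time stamps copied from the log: Start at ... and Reliability Test Ended ...
start = datetime(2007, 8, 6, 20, 9, 49)
end = datetime(2007, 8, 6, 21, 34, 4)
elapsed = end - start
print(elapsed)                         # 1:24:15

# Data size from the batch command, converted from KB to GB (1024 x 1024 KB)
size_kb = 3860000
size_gb = round(size_kb / 1024 / 1024, 2)
print(size_gb)                         # 3.68
```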



Paging Test

As can be seen above, running all 12 tests to measure paging speeds with those memory demands took nearly 1 hour 25 minutes. The benchmarks have been modified to use a Paging parameter that runs just one write/read test (now in More64bit.zip and BusSpd2k.zip). The test can only be run from a BAT file with the following example parameters:

Start BusSpd2k Reliability, Paging, KB 100000, Log Paging.txt
Start IntBurn64 Auto, Paging, KB 100000, Log Paging.txt

Following are 32 bit and 64 bit results representing the situation where memory demands are slowly increased. Data transfer speed with paging depends on what has run before. For example, suddenly demanding 80% of memory capacity is likely to produce very slow speed.

For 32 bit Windows, the 2 GB virtual memory space is allocated to the application via a table of unmovable sequential addresses. This space also addresses the EXE file and some items for use by Windows. The table can become fragmented, further reducing space available for a single data array. The maximum that could be used was 1,200,000 KB with Windows XP and 1,500,000 KB using Windows 2000. Some time ago, the BMPSpeed benchmark (see above) was modified so that XP could run using 512 MB images, where memory demands included 2 x 512 MB, 256 MB and 128 MB. The 256 MB was dropped for the last test.

The tables also show normal disk writing/reading speeds. With 32 bit Windows and the two PCs with 512 MB RAM, data transfer rates with paging were relatively good using data size somewhat greater than RAM capacity. Worst case was 3 to 4 times slower than normal disk transfers and 40 to 65 times slower than with data in RAM.

With the 32 bit application running on 64 bit Windows, User Virtual Space is detected as 4 GB by the program. Maximum array size that could be allocated was 2,000,000 KB. At this size with 1 GB RAM, paging speed was 9 times slower than normal disk transfers and 340 times slower than memory based data. Speed had also reduced considerably with 1 GB data.

User Virtual Space is detected as 8192 GB by the 64 bit benchmark but maximum data array size was between 5,000,000 and 6,000,000 KB on the PC with 1 GB RAM and Windows XP x64, then 7,900,000 KB with Vista and 4 GB memory. Performance of the former was essentially the same as the 32 bit program. Vista paging speeds had a higher tendency to improve with a larger data array, with worst case 5.5 times slower than normal disk but still 340 times slower than with data in RAM.

Later, the 64 bit tests were run on a PC using a Phenom II with 8 GB RAM and Windows 7. In this case, maximum array size that could be allocated was 14 GB. As with the Vista based PC, allocation of paging file size was set as automatic. Data mainly remained in RAM up to a data size of 7 GB but data transfer speed was 400 times slower at 14 GB.
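
The "times slower" figures quoted above are simple ratios of MB/second values taken from the tables that follow. The Python sketch below reproduces them. The choice of which table rows to use as the RAM speed and worst case paging speed is mine, and the function name slowdowns is hypothetical.

```python
def slowdowns(ram_mb_s, disk_mb_s, paging_mb_s):
    """Return (times slower than RAM, times slower than normal disk W/R)."""
    return ram_mb_s / paging_mb_s, disk_mb_s / paging_mb_s


# (system, RAM MB/sec, Disk W/R MB/sec, slowest paging MB/sec) from the tables
rows = [
    ("Athlon XP, Windows 2000, 32 bit", 970, 50, 15),
    ("Pentium 4, XP, 32 bit", 532, 49, 13),
    ("Athlon 64, XP x64, 32 bit program", 2040, 55, 6),
    ("Core 2 Duo, 64-Bit Vista", 3393, 55, 10),
    # RAM speed taken from the 7,000,000 KB row, the last one held in RAM
    ("Phenom II, 64-Bit Windows 7", 4078, 92, 10),
]

for name, ram, disk, paging in rows:
    vs_ram, vs_disk = slowdowns(ram, disk, paging)
    print(f"{name:36s} {vs_ram:4.0f} x RAM  {vs_disk:4.1f} x disk")
```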



           32 Bit BusSpd2K          32 Bit BusSpd2K           32 Bit BusSpd2K

      CPU     Athlon XP                Pentium 4                 Athlon 64
      MHz       2088                     1900                       2210
   RAM MB        512                      512                       1024
  Windows       2000                       XP                      XP x64
 Disk W/R
   MB/sec         50                       49                        55

         KB    Secs  MB/sec         KB    Secs  MB/sec         KB    Secs  MB/sec

     100000             970     100000             532     100000            2040

     300000       1     932     300000       2     285     800000      66      25
     350000       1     929     350000      13      56     850000      31      56
     400000       6     127     400000      22      38     900000      61      30
     450000       8     117     450000      19      48     920000     118      16
     470000       8     118     470000      14      70     930000     112      17
     480000       8     123     480000      15      64     940000      92      21
     490000       9     116     490000      24      41     950000     114      17
     500000      13      80     500000      21      49     960000     123      16
     510000      15      68     510000      27      38     970000     124      16
     520000      16      65     520000      23      46     980000     125      16
     530000      19      58     530000      29      37     990000     135      15
     540000      21      53     540000      32      35    1000000     137      15
                                                          1100000     188      12
    1200000     154      16    1200000     189      13    1200000     223      11
    1300000                    1300000             N/A    1300000     380       7
    1500000     205      15                               1400000     358       8
    1600000             N/A
                                                          2000000     683       6
    N/A Cannot allocate memory                            2100000             N/A


           64 Bit IntBurn64         64 Bit IntBurn64         64 Bit IntBurn64

      CPU     Athlon 64               Core 2 Duo                 Phenom II
      MHz       2210                     2400                       3000
   RAM MB       1024                     4096                       8192
  Windows      XP x64               64-Bit Vista             64-Bit Windows 7
 Disk W/R
   MB/sec         55                       55                         92

         KB    Secs  MB/sec         KB    Secs  MB/sec         KB    Secs  MB/sec

     100000            2041     100000            3393     100000            5146

     800000       1    1976    2500000       2    2868    2000000       1    4900
     850000      23      77    3000000       2    2878    3000000       1    4658
     900000      58      32    3100000       2    2847    3500000       2    4651
     920000      61      31    3200000       2    2899    4000000       2    4488
     930000      91      21    3300000       3    2698    4500000       2    4489
     940000      96      20    3400000       3    2610    5000000       2    4477
     950000      93      21    3500000       7    1075    5500000       3    4166
     960000      89      22    3600000      10     750    6000000       3    4051
     970000     142      14    3700000      17     459    6500000       3    4036
     980000     125      16    3800000     107      73    7000000       4    4078
     990000     119      17    3900000     210      38    7500000      72     214
    1000000     128      16    4000000     146      56    7600000     170      91
    1100000     188      12                               7700000     168      94
    1200000     205      12    5000000    1024      10    7800000     230      69
    1300000     266      10    7000000     652      22    7900000     239      68
    1400000     358       8    7900000     770      21    8000000     227      72
                               8000000             N/A    9000000     697      26
    2000000     683       6                              10000000    1231      17
    2100000                    32 Bit  BusSpd2K          14000000    2742      10
    5000000    1707       6                              15000000             N/A
    6000000             N/A    2000000       2    2139  
                               2100000             N/A

    N/A Cannot allocate memory 




Paging Disk Activity

Tests were run with Performance Monitor logging of Physical Disk Write and Read Bytes and Bytes/Second. The graphs below are extrapolations of millions of bytes written and read over the 5 second monitoring periods. At least they confirm that the disks can run at 50 to 60 MB/second. CPU utilisation was also reported and was extremely low for most of the time. These tests were run cold, that is without gradual increase in memory demands.

Other calculations carried out were for average KB per transfer and this showed a significant difference between XP x64 and 64-Bit Vista. The former consistently produced (approximately) 4 KB per read and 64 KB per write. Vista was completely different, mainly averaging nearly 1000 KB per write for the main writing period. On loading the data there seem to be periods of 16, 32 and 64 KB per read down to 8 KB at the end. Peak reading speeds are as reflected in my DiskGraf Benchmark for block sizes 4 KB and 64 KB respectively.

The slow speeds will be caused by limited random access due to read following write or accessing fragmented data space. At the end of the Core 2 Duo test, when mainly reading is taking place, average time per read access is around 4 milliseconds, about half a disk revolution. With sequential data, speed would be much faster, either reading directly from the disk or via the disk’s buffer. Overall, it appears that Vista paging can be faster than Windows XP as it has the ability to read data at a larger page size.
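
The half revolution figure, and the effect of read block size when every read needs a random access, can be sketched in Python as follows. This is a simplified model of my own for illustration: it assumes one full access delay per read and ignores the data transfer time, so it is not a prediction of the measured speeds in the tables. The function names are hypothetical.

```python
def half_revolution_ms(rpm):
    """Average rotational delay in milliseconds: half of one revolution."""
    return 60_000 / rpm / 2


def access_limited_mb_s(block_kb, access_ms):
    """MB/second if each block read costs one random access (transfer time ignored)."""
    return (block_kb / 1024) / (access_ms / 1000)


print(round(half_revolution_ms(7200), 2))       # 4.17 ms, the 7200 RPM disk
print(round(half_revolution_ms(5400), 2))       # 5.56 ms, the 5400 RPM disk

print(round(access_limited_mb_s(4, 4.0), 1))    # 1.0 MB/sec with 4 KB reads
print(round(access_limited_mb_s(64, 4.0), 1))   # 15.6 MB/sec with 64 KB reads
```

In this model the speed scales directly with block size, so 64 KB reads are 16 times faster than 4 KB reads at the same access time.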

Note that performance log files need checking to see that all samples are recorded, as the clock and recording can stop with the unusually heavy load, and up to 5 missing minutes have been noted.

Windows 7 results also indicate large block size for writing, with reading mainly using average block sizes of 50 KB to 64 KB over the five second sampling periods. The disk used runs at 5400 RPM, which will produce longer random access times than on the other PCs. However, it has faster data transfer speed and, for the particular test, only pages out about half of the data.




Athlon 64, XP x64, 1 GB RAM, 819 MB data
Write/Read 205.2 seconds, 8.0 MB/second
Time at RAM speed would be < 1 second
Data 572 MB written, 571 MB read


Core 2 Duo, 64-Bit Vista, 4 GB RAM, 5120 MB data
Write/Read 499.0 seconds, 21 MB/second
Time at RAM speed would be < 3 seconds
Data 4372 MB written, 3804 MB read


  Phenom II Quad CPU 3.0 GHz
  8 GB DDR3 RAM, >90 MB/second disk 
  64-Bit Windows 7

  Data Volume           8192 MB
  Write/Read Time        252 seconds
  Speed (8192x2/252) -    65 MB/second
 
  Data written to disk  4529 MB
  Data read from disk   3802 MB

  Average write block    911 KB
  Average read block      54 KB

  Maximum writing speed   62 MB/second
  Maximum reading speed   85 MB/second
  Average disk read+write 33 MB/second

  Average CPU Utilisation 27% of 1 CPU
  First 15 seconds       100% of 1 CPU
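
The speeds in the three summaries above follow from the data volume and elapsed time, counting the data once for writing and once for reading, as in the 8192x2/252 calculation. A short Python sketch of the same arithmetic is given below. The function names are hypothetical and the figures are copied from the summaries.

```python
def paging_speed_mb_s(data_mb, secs):
    """Overall write/read speed: data is written once and read once."""
    return 2 * data_mb / secs


def disk_average_mb_s(written_mb, read_mb, secs):
    """Average disk traffic over the whole run."""
    return (written_mb + read_mb) / secs


# (system, data MB, seconds, MB written to disk, MB read from disk)
runs = [
    ("Athlon 64, XP x64", 819, 205.2, 572, 571),
    ("Core 2 Duo, 64-Bit Vista", 5120, 499.0, 4372, 3804),
    ("Phenom II, 64-Bit Windows 7", 8192, 252, 4529, 3802),
]

for name, data_mb, secs, written, read in runs:
    print(f"{name:28s} {paging_speed_mb_s(data_mb, secs):5.1f} MB/sec overall, "
          f"{disk_average_mb_s(written, read, secs):5.1f} MB/sec average disk read+write")
```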






Roy Longbottom January 2010

The new Internet Home for my PC Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection