Paging Issues With 64-Bit Vista and Windows 7
Summary
The following new PC with 4 GB RAM initially appeared to have only 3 GB. This was corrected by enabling Memory Remapping in BIOS.
Core 2 Duo 2400 MHz, Asus P5B motherboard, 800 MHz DDR2 RAM,
Seagate ST3400633AS SATA-300 disk, 16 MB buffer, 7200 RPM,
GeForce 8600 GT graphics, Windows Vista 64-Bit.
Testing at 3 GB indicated slow performance on a benchmark that requested little more than 1 GB.
Earlier tests, using Windows XP Pro x64 on a PC with 1 GB RAM, produced worse than expected speeds with paging. This appears to be an issue with 64-Bit Windows: bitmaps for fast BitBlt copying can be created for larger images than before, which increases memory demands.
With 4 GB of RAM available and usable via 64-Bit Vista (and 1 GB with XP x64), a benchmark was run to measure the impact of paging. The main observation is that, when too much RAM is requested, the slowdown due to paging can be enormous, with speeds much slower than normal disk input/output. So, careful consideration of data size is needed when programming.
Further measurements show that 64-Bit Vista can be significantly faster than Windows XP x64, as paging speed depends on random access time and Vista can read up to 64 KB at a time, compared with a fixed 4 KB with XP.
Data that can be allocated for a single data array within the 2 GB User Virtual Space with 32 bit Windows was found to be 1.2 GB with XP and 1.5 GB using Windows 2000. Virtual Space for a 32 bit application is shown as 4 GB via 64 bit Windows but only 2 GB could be used. With 64 bit applications, 8192 GB is shown and arrays of up to 8 GB could be allocated using 64-Bit Vista (and 4 GB RAM) but less than 6 GB with XP Pro x64 (1 GB RAM).
Later tests on a new PC, with 8 GB RAM, showed that a single array of 14 GB could be allocated but not 15 GB. The PC comprised:
Phenom II X4 3000 MHz, Asus M4A785TD-V, 8 GB DC DDR3 RAM
Western Digital 5400 RPM Green SATA disk, 16 MB buffer
GeForce GTS 250 card and on-board ATI graphics, 64-Bit Windows 7
Here, up to 7 GB of data remained in RAM, where it could be accessed at 4 GB/second, but performance was 400 times slower at 14 GB.
Block sizes tended to be even larger using Windows 7 and a higher proportion of data remained in RAM. Disk data transfer speed was the highest but was offset by slower random access time at 5400 RPM.
BMPSpeed Benchmark
BMPSpeed Benchmark generates BMP files up to 512 MB. It measures speed of saving, loading, scrolling, rotating and editing of 0.5, 1, 2, 4 etc. MB files upwards.
Pre-compiled versions of the benchmarks can be found in BMPSpd.zip, which also contains the source code and more detailed explanations. Results for a wide range of systems are in BMPSpeed Results.htm. A 64 bit version is also available in Video64.zip, with comparisons in 64 Bit Graphics Tests.htm. See also My Home Page for other PC benchmarks and results.
Extra copies of the images for editing result in memory demands of more than twice the largest image size, leading to possible paging to/from disk. Five tests are run at each size, run times being saved in log file BMPTime.txt.
1 - Enlarge with blur editing (copy with add/divide instructions) and display.
2 - Save enlargement to disk.
3 - Load from disk, format and display.
4 - Copy from memory scrolling.
5 - Make an extra copy rotating 90 degrees and display.
Data transfer speeds in MB/second are also recorded for Test 4 where
displayed data might be from video RAM cache, main RAM or disk page
file. The benchmark also produces real and virtual memory usage statistics.
Results With 3 GB
BMP Benchmark Version 2.2x for 64 bit Windows Fri Jul 20 15:37:32 2007
Copyright Roy Longbottom 1999 - 2006
  Input  Enlarge    Save     Load   Scroll   Scroll   Rotate     Use
  Image  Display          Display  /Repeat  Overall   90 deg    Fast
 Mbytes     Secs    Secs     Secs    msecs   MB/Sec     Secs  BitBlt
    0.5     0.05    0.01     0.05      0.1   4748.4     0.02       3
    1.0     0.05    0.02     0.08      0.3   4463.6     0.03       3
    2.0     0.07    0.02     0.11      1.1   2475.2     0.04       3
    4.0     0.09    0.03     0.19      2.4   1866.0     0.06       3
    8.0     0.13    0.08     0.31      2.9   1765.0     0.10       3
   16.0     0.20    0.24     0.48      2.7   1832.5     0.17       3
   32.0     0.26    0.52     0.78      2.9   1741.2     0.28       3
   64.0     0.39    1.08     1.38      2.9   1760.0     0.52       3
  128.0     0.68    2.37     2.63      2.9   1740.3     1.03       3
  256.0     1.35    4.62     5.38      3.1   1645.6     4.39       3
  512.0    27.91   13.05    10.59      3.2   1595.6    57.11       3
CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000006F6
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz Measured 2402 MHz
AMD64 processor architecture, 2 CPUs
Windows NT Version 6.0, build 6000,
Memory Status Maximum Use
Mbytes of physical memory 3006 ------##################
Percent of memory in use 81
Free physical memory Mbytes 567
Mbytes of paging file 6215
Free Mbytes of paging file 2967
User Mbytes of virtual space 8388607
Free user virtual Mbytes 8387500
Screen setting 1280 x 1024 x 32 bits = 5.2 MB
End at Fri Jul 20 15:40:34 2007
Example Results Using 4 GB RAM
   32.0     0.23    0.54     0.78      3.1   1629.5     0.28       3
   64.0     0.36    1.12     1.36      2.9   1729.9     0.53       3
  128.0     0.68    2.66     2.62      2.9   1725.9     1.00       3
  256.0     1.20    5.02     5.30      3.0   1706.6     4.13       3
  512.0     2.32   10.84    12.47      3.1   1603.4     5.80       3
More Results
The displaying method comes from a 1997 Microsoft sample program, ShowDib. This uses CreateDIBitmap so that fast BitBlt copying can be used. In the past, the size that can be created for fast copying could vary, depending on the version of Windows and graphics driver. Most recent results via Windows XP showed a limit of 64 MB.
In the case of my benchmark, when the DIB cannot be created, the slower StretchDIBits method is used to copy part of the image to the display. Although it should have been clear that CreateDIBitmap would use more memory, it was not obvious on older systems with limited and slower main RAM.
Tests show that the DIBs are at 32 bits, 33% larger than the original BMP data. So, a 512 MB image increases to 682 MB and the program can have two open. RAM space used is outside the user’s virtual space but can show up via free memory space (if large enough) and free paging file space.
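The size increase can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the original BMP data is 24 bits per pixel (implied by the 33% growth at 32 bits, but not stated here) and ignoring headers and row padding. The function name is illustrative only.

```python
def dib_size_mb(bmp_mb, src_bits=24, dib_bits=32):
    """Estimate the size of a DIB created from BMP pixel data.

    Assumption: the BMP holds 24 bits per pixel and the DIB 32 bits
    per pixel; headers and row padding are ignored.
    """
    return bmp_mb * dib_bits / src_bits


one_dib = dib_size_mb(512)
print(f"One DIB from a 512 MB image: {one_dib:.1f} MB")
print(f"Two DIBs open at once:       {2 * one_dib:.1f} MB")
```

With two DIBs open, the extra demand is well over 1 GB on top of the image data held in the program's own virtual space.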
Below are Enlarge and Rotate speeds at 256 and 512 MB using 64-Bit Vista and XP Pro x64 with four versions of the benchmark, the original, the 64 bit version, a 32 bit version via a later MS compiler and a version that uses StretchDIBits for the larger images. Also shown are RAM, PageFile and User Virtual Space usage. Some Windows XP results with different RAM size are shown for comparison purposes.
- 64-Bit Vista speeds are much better with 4 GB RAM than with 3 GB
- Speed and memory occupancy are similar with 64 bit, 32 bit and original benchmarks
- RAM and PageFile use is increased when using CreateDIBitmap (for fast BitBlt copying) vs StretchDIBits
- 64-Bit Windows can use CreateDIBitmap for larger images and this can lead to poor performance due to excessive paging
- Enlarge/Rotate (no paging) speeds can be faster when StretchDIBits is used
- 64-Bit Windows uses 50 to 60 MB more User Virtual Space than Windows XP
What is not shown is the reduction in scrolling speed using StretchDIBits. On the Vista PC at 256 MB, speed fell from 1706 MB/second, or 3.0 milliseconds per screen, using BitBlt to 171 MB/second, or 29.5 milliseconds, with StretchDIBits. The corresponding XP x64 PC results were 4.3 and 32.9 milliseconds.
Latest results shown are for 64-Bit Windows 7 on a PC with 8 GB RAM. Here, real and virtual memory usage is similar to other 64 bit versions of Windows.
Besides showing results for the main graphics card, others are provided for motherboard integrated graphics, where measured performance is similar. For this PC and others, code from the latest compilers appears to produce rotation times around twice as long as they should be, but only with 256 MB images.
              RAM  BMP  Enlarge   Rotate    Free    Free    Used    Used    Used    Used     Used
               GB   MB     Secs     Secs  MB RAM  MB RAM      MB  Pgfile  Pgfile      MB       MB
                                           Start     End     RAM   Start     End  Pgfile  Virtual
Phenom Win7     8  256     0.95     7.21
64 bit             512     1.68     7.85    6493    4164    2329    1601    3954    2353     1102
32 bit          8  256     1.29     7.23
                   512     2.38     8.04    6496    4164    2332    1596    3952    2356     1099
Original        8  256     1.35     4.35
                   512     2.49     8.68   >4095   >4095     N/A   <4096   <4096     N/A     1103
Stretch         8  256     0.54     6.82
64 bit             512     0.88     7.10    6310    5301    1009    1797    2824    1027     1102
On board        8  256     0.95     7.29
64 bit             512     1.74     7.89    6465    4158    2307    1593    3824    2331     1102
________________________________________________________________________________________
C2D Vista       3  256     1.35     4.39
64 bit             512    27.91    57.11             567                    3248             1107
                4  256     1.20     4.13
                   512     2.32     5.80    3126     877    2249     959    3288    2329     1107
32 bit          4  256     1.30     4.22
                   512     2.53     5.91    3170     897    2273     957    3275    2318     1094
Original        4  256     1.48     4.52
                   512     2.80     8.15    3182     900    2282     N/A     N/A     N/A     1094
Stretch         4  256     0.76     3.76
64 bit             512     1.35     4.53    3169    2170     999     915    1936    1021     1107
________________________________________________________________________________________
AMD XP x64      1  256   119.28    58.51
64 bit             512   335.83   832.41     518     183     N/A     415    2734    2319     1081
32 bit          1  256    71.92    88.30
                   512   246.43   971.95     801     129     N/A     407    2736    2329     1076
Original        1  256    47.39    99.84
                   512   189.28  1061.02     524     192     N/A     411    2616    2205     1072
Stretch         1  256     0.59     9.27
64 bit             512     8.40   160.08     607      60     N/A     409    1439    1030     1081
________________________________________________________________________________________
P4 XP         0.5  256    66.79   184.56
Original           512   140.41   148.08     421      40     N/A      18    1037    1019     1047
________________________________________________________________________________________
P4 XP           1  256     1.30     7.05
Original           512     1.88    35.21             131                    1122             1036
________________________________________________________________________________________
C2D XP          2  256     1.21     5.48
Original           512     1.71     6.53             608                    1302             1054
4 GB Data
With 8192 GB of user virtual memory available using 64-Bit Windows, compared with 2 GB via 32-Bit versions, it is tempting to write programs with vast data arrays instead of bothering with frequent disk input and output. Some would claim that, when paging is necessary, it will be just as fast as normal disk data transfers.
I ran some tests using IntBurn64 in More64bit.zip and the 32 bit version or reliability test in BusSpd2k.zip.
These are designed to run at the highest speed whilst checking for correct results at a chosen data size and minimum running time. There are six tests with write and read once, using different data patterns. This is followed by 6 tests with read only. Each of the latter is preceded by an untimed write/read and an extra read pass to calibrate the number of read passes needed for the chosen time. This is a significant overhead when one pass is used.
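The write, read and verify idea can be illustrated with a short Python sketch. This is not the IntBurn64 or BusSpd2k code: the function name, the single-byte pattern and the timing method are assumptions made for illustration, and interpreter overheads mean the absolute speeds are not comparable with the results reported here.

```python
import time


def write_read_test(size_kb, pattern=0xA5):
    """Write a byte pattern to a buffer, read it back and check it.

    Returns (MB per second, result_ok). Speed counts the data twice,
    once for the write pass and once for the read pass.
    """
    n_bytes = size_kb * 1024
    start = time.perf_counter()
    buf = bytearray([pattern]) * n_bytes      # write pass: fill the array
    ok = buf.count(pattern) == n_bytes        # read pass: every byte must match
    elapsed = time.perf_counter() - start
    mb_moved = 2 * n_bytes / (1024 * 1024)
    return mb_moved / max(elapsed, 1e-9), ok


if __name__ == "__main__":
    speed, ok = write_read_test(100000)       # hypothetical 100000 KB run
    print(f"{speed:.0f} MB/sec, Result {'OK' if ok else 'ERROR'}")
```

Once the requested size exceeds free RAM, a loop like this stops measuring memory speed and starts measuring the paging file.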
Following is an example log file for the Core 2 Duo with 64-Bit Vista, running for the minimum time at 3860000 KB (3.68 GB) where Vista managed to find sufficient memory space for the last three reading tests at full speed. Maximum write/read speed, at lower memory demands, is around 3300 MB/second, with the first test usually at about 2200 MB/second. With the total running time being too long at 1 hour 24 minutes, I produced a version of the 64 bit benchmark that runs just one write/read test in order to measure paging speeds with data size up to 4 GB and higher.
64 Bit Integer Reliability Test Version 1.0 for 64 bit OS
Copyright (C) Roy Longbottom 2006
Batch Command KB 3860000 SECS 1 P1 LOG INT64RAM.TXT
Test 3860000 KB at 1 seconds per test, Start at Mon Aug 06 20:09:49 2007
Write/Read
1 52 MB/sec Pattern 0000000000000000 Result OK 1 passes
2 21 MB/sec Pattern FFFFFFFFFFFFFFFF Result OK 1 passes
3 17 MB/sec Pattern A5A5A5A5A5A5A5A5 Result OK 1 passes
4 28 MB/sec Pattern 5555555555555555 Result OK 1 passes
5 24 MB/sec Pattern 3333333333333333 Result OK 1 passes
6 18 MB/sec Pattern F0F0F0F0F0F0F0F0 Result OK 1 passes
Read
1 14 MB/sec Pattern 0000000000000000 Result OK 1 passes
2 23 MB/sec Pattern FFFFFFFFFFFFFFFF Result OK 1 passes
3 21 MB/sec Pattern A5A5A5A5A5A5A5A5 Result OK 1 passes
4 5265 MB/sec Pattern 5555555555555555 Result OK 2 passes
5 5330 MB/sec Pattern 3333333333333333 Result OK 2 passes
6 5301 MB/sec Pattern F0F0F0F0F0F0F0F0 Result OK 2 passes
Reliability Test Ended Mon Aug 06 21:34:04 2007
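As a rough check on the log, the elapsed time and the data size in GB can be recomputed from the start and end stamps and the KB parameter. This sketch assumes both time stamps use the C-style format shown and an English locale. The variable names are illustrative.

```python
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y"   # e.g. "Mon Aug 06 20:09:49 2007"

start = datetime.strptime("Mon Aug 06 20:09:49 2007", FMT)
end = datetime.strptime("Mon Aug 06 21:34:04 2007", FMT)

total_seconds = int((end - start).total_seconds())
minutes, seconds = divmod(total_seconds, 60)

# Data size from the batch command, taking 1 GB as 1024 x 1024 KB
size_gb = 3860000 / (1024 * 1024)

print(f"Elapsed {minutes} minutes {seconds} seconds for 12 tests")
print(f"Data size {size_gb:.2f} GB")
```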
Paging Test
As can be seen above, running all 12 tests to measure paging speeds with those memory demands took nearly 1 hour 25 minutes. The benchmarks have been modified to use a Paging parameter that runs just one write/read test (now in More64bit.zip and BusSpd2k.zip). The test can only be run from a BAT file with the following example parameters:
Start BusSpd2k Reliability, Paging, KB 100000, Log Paging.txt
Start IntBurn64 Auto, Paging, KB 100000, Log Paging.txt
Following are 32 bit and 64 bit results representing the situation where memory demands are slowly increased. Data transfer speed with paging depends on what has run before. For example, suddenly demanding 80% of memory capacity is likely to produce very slow speed.
For 32 bit Windows, the 2 GB virtual memory space is allocated to the application via a table of unmovable sequential addresses. This space also addresses the EXE file and some items for use by Windows. The table can become fragmented, further reducing space available for a single data array. The maximum that could be used was 1,200,000 KB with Windows XP and 1,500,000 KB using Windows 2000. Some time ago, the BMPSpeed benchmark (see above) was modified so that XP could run using 512 MB images, where memory demands included 2 x 512 MB, 256 MB and 128 MB. The 256 MB was dropped for the last test.
The tables also show normal disk writing/reading speeds. With 32 bit Windows and the two PCs with 512 MB RAM, data transfer rates with paging were relatively good using data size somewhat greater than RAM capacity. Worst case was 3 to 4 times slower than normal disk transfers and 40 to 65 times slower than with data in RAM.
With the 32 bit application running on 64 bit Windows, User Virtual Space is detected as 4 GB by the program. Maximum array size that could be allocated was 2,000,000 KB. At this size with 1 GB RAM, paging speed was 9 times slower than normal disk transfers and 340 times slower than memory based data. Speed had also reduced considerably with 1 GB data.
User Virtual Space is detected as 8192 GB by the 64 bit benchmark, but maximum data array size was between 5,000,000 and 6,000,000 KB on the PC with 1 GB RAM and Windows XP x64, and 7,900,000 KB with Vista and 4 GB memory. Performance of the former was essentially the same as the 32 bit program. Vista paging speeds had a higher tendency to improve with a larger data array, with the worst case 5.5 times slower than normal disk but still 340 times slower than with data in RAM.
Later, the 64 bit tests were run on a PC using a Phenom II with 8 GB RAM and Windows 7. In this case, maximum array size that could be allocated was 14 GB. As with the Vista based PC, allocation of paging file size was set as automatic.
Data mainly remained in RAM up to a data size of 7 GB, but data transfer speed was 400 times slower at 14 GB.
                   32 Bit BusSpd2K           32 Bit BusSpd2K           32 Bit BusSpd2K
CPU                      Athlon XP                 Pentium 4                 Athlon 64
MHz                           2088                      1900                      2210
RAM MB                         512                       512                      1024
Windows                       2000                        XP                    XP x64
Disk W/R
MB/sec                          50                        49                        55
                  KB  Secs  MB/sec          KB  Secs  MB/sec          KB  Secs  MB/sec
              100000           970      100000           532      100000          2040
              300000     1     932      300000     2     285      800000    66      25
              350000     1     929      350000    13      56      850000    31      56
              400000     6     127      400000    22      38      900000    61      30
              450000     8     117      450000    19      48      920000   118      16
              470000     8     118      470000    14      70      930000   112      17
              480000     8     123      480000    15      64      940000    92      21
              490000     9     116      490000    24      41      950000   114      17
              500000    13      80      500000    21      49      960000   123      16
              510000    15      68      510000    27      38      970000   124      16
              520000    16      65      520000    23      46      980000   125      16
              530000    19      58      530000    29      37      990000   135      15
              540000    21      53      540000    32      35     1000000   137      15
                                                                 1100000   188      12
             1200000   154      16     1200000   189      13     1200000   223      11
             1300000                   1300000   N/A             1300000   380       7
             1500000   205      15                               1400000   358       8
             1600000   N/A
                                                                 2000000   683       6
                                                                 2100000   N/A
N/A Cannot allocate memory

                  64 Bit IntBurn64          64 Bit IntBurn64          64 Bit IntBurn64
CPU                      Athlon 64                Core 2 Duo                 Phenom II
MHz                           2210                      2400                      3000
RAM MB                        1024                      4096                      8192
Windows                     XP x64              64-Bit Vista          64-Bit Windows 7
Disk W/R
MB/sec                          55                        55                        92
                  KB  Secs  MB/sec          KB  Secs  MB/sec          KB  Secs  MB/sec
              100000          2041      100000          3393      100000          5146
              800000     1    1976     2500000     2    2868     2000000     1    4900
              850000    23      77     3000000     2    2878     3000000     1    4658
              900000    58      32     3100000     2    2847     3500000     2    4651
              920000    61      31     3200000     2    2899     4000000     2    4488
              930000    91      21     3300000     3    2698     4500000     2    4489
              940000    96      20     3400000     3    2610     5000000     2    4477
              950000    93      21     3500000     7    1075     5500000     3    4166
              960000    89      22     3600000    10     750     6000000     3    4051
              970000   142      14     3700000    17     459     6500000     3    4036
              980000   125      16     3800000   107      73     7000000     4    4078
              990000   119      17     3900000   210      38     7500000    72     214
             1000000   128      16     4000000   146      56     7600000   170      91
             1100000   188      12                               7700000   168      94
             1200000   205      12     5000000  1024      10     7800000   230      69
             1300000   266      10     7000000   652      22     7900000   239      68
             1400000   358       8     7900000   770      21     8000000   227      72
                                       8000000   N/A             9000000   697      26
             2000000   683       6                              10000000  1231      17
             2100000                         32 Bit BusSpd2K    14000000  2742      10
             5000000  1707       6                              15000000   N/A
             6000000   N/A             2000000     2    2139
                                       2100000   N/A
N/A Cannot allocate memory
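The slowdown factors quoted in the text can be recomputed from the table figures. The sketch below takes, for each PC, the normal disk speed, the speed with data in RAM and the slowest paging speed. The choice of rows and the helper name are illustrative, and the Windows 7 entry uses the 7,000,000 KB speed as its in-RAM reference to match the "7 GB" comparison.

```python
# (disk MB/sec, in-RAM MB/sec, slowest paging MB/sec), read from the tables
cases = {
    "Athlon XP, Windows 2000, 32 bit": (50, 970, 15),
    "Pentium 4, Windows XP, 32 bit": (49, 532, 13),
    "Athlon 64, XP x64, 32 bit program": (55, 2040, 6),
    "Core 2 Duo, 64-Bit Vista": (55, 3393, 10),
    "Phenom II, 64-Bit Windows 7": (92, 4078, 10),
}


def slowdown(disk, ram, paging):
    """Return (times slower than disk, times slower than RAM)."""
    return disk / paging, ram / paging


for name, (disk, ram, paging) in cases.items():
    vs_disk, vs_ram = slowdown(disk, ram, paging)
    print(f"{name}: {vs_disk:.1f} x disk, {vs_ram:.0f} x RAM")
```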
Paging Disk Activity
Tests were run with Performance Monitor logging of Physical Disk Write and Read Bytes and Bytes/Second. The graphs below are extrapolations of million bytes written and read over the 5 second monitoring periods. At least they confirm that the disks can run at 50 to 60 MB/second. CPU utilisation was also reported and was extremely low for most of the time.
These tests were run cold, that is without gradual increase in memory demands.
Other calculations carried out were for average KB per transfer and this showed a significant difference between XP x64 and 64-Bit Vista. The former consistently produced (approximately) 4 KB per read and 64 KB per write. Vista was completely different, mainly averaging nearly 1000 KB per write for the main writing period. On loading the data there seem to be periods of 16, 32 and 64 KB per read down to 8 KB at the end.
Peak reading speeds are as reflected in my DiskGraf Benchmark for block sizes 4 KB and 64 KB respectively.
The slow speeds will be caused by limited random access due to read following write or accessing fragmented data space. At the end of the Core 2 Duo test, when mainly reading is taking place, average time per read access is around 4 milliseconds, about half a disk revolution. With sequential data, speed would be much faster, either reading directly from the disk or via the disk’s buffer. Overall, it appears that Vista paging can be faster than Windows XP as it has the ability to read data at a larger page size.
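The half-revolution figure and the benefit of larger reads can be sketched as follows. This is a simplified illustration, not a model of the measured results: it assumes every read costs one random access of fixed duration and ignores seek and data transfer time. The function names are illustrative.

```python
def half_revolution_ms(rpm):
    """Average rotational latency: the time for half a disk revolution."""
    return 0.5 * 60_000 / rpm


def random_read_mb_per_sec(block_kb, access_ms):
    """Throughput if every block read costs one random access."""
    reads_per_sec = 1000 / access_ms
    return reads_per_sec * block_kb / 1024


for rpm in (7200, 5400):
    print(f"{rpm} RPM: {half_revolution_ms(rpm):.2f} ms per half revolution")

for block_kb in (4, 64):
    speed = random_read_mb_per_sec(block_kb, access_ms=4)
    print(f"{block_kb} KB reads at 4 ms per access: {speed:.1f} MB/sec")
```

Under these assumptions, throughput scales directly with block size, which is the sense in which 64 KB reads can outperform a fixed 4 KB.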
Note that performance log files need checking to see that all samples are recorded, as the clock and recording can stop under the unusually heavy load, and up to 5 missing minutes have been noted.
Windows 7 results also indicate large block size for writing, with reading mainly using average block sizes of 50 KB to 64 KB over the five second sampling periods. The disk used runs at 5400 RPM, which will produce longer random access times than on the other PCs. However, it has faster data transfer speed and, for the particular test, only pages out about half of the data.
Athlon 64, XP x64, 1 GB RAM, 819 MB data
Write/Read 205.2 seconds, 8.0 MB/second
Time at RAM speed would be < 1 second
Data 572 MB written, 571 MB read

Core 2 Duo, 64-Bit Vista, 4 GB RAM, 5120 MB data
Write/Read 499.0 seconds, 21 MB/second
Time at RAM speed would be < 3 seconds
Data 4372 MB written, 3804 MB read

Phenom II Quad CPU 3.0 GHz
8 GB DDR3 RAM, >90 MB/second disk
64-Bit Windows 7
Data Volume 8192 MB
Write/Read Time 252 seconds
Speed (8192x2/252) - 65 MB/second
Data written to disk 4529 MB
Data read from disk 3802 MB
Average write block 911 KB
Average read block 54 KB
Maximum writing speed 62 MB/second
Maximum reading speed 85 MB/second
Average disk read+write 33 MB/second
Average CPU Utilisation 27% of 1 CPU
First 15 seconds 100% of 1 CPU
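The derived figures in this summary follow from the data volume, run time and disk byte counts. A small sketch of the arithmetic, with illustrative variable names:

```python
data_mb = 8192        # data volume written then read by the program
elapsed_s = 252       # write/read time in seconds
written_mb = 4529     # data written to disk
read_mb = 3802        # data read from disk

program_speed = 2 * data_mb / elapsed_s          # write plus read, MB/second
disk_speed = (written_mb + read_mb) / elapsed_s  # average disk read+write
paged_out = written_mb / data_mb                 # share of data paged out

print(f"Program speed  {program_speed:.0f} MB/second")
print(f"Disk average   {disk_speed:.0f} MB/second")
print(f"Paged out      {paged_out:.0%} of the data")
```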
Roy Longbottom January 2010
The new Internet Home for my PC Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection