I have an AMD Athlon 64 system, and of course, my main preferred platform today is still Win32 on XP when I develop for server environments.
In the past, I always chose an Intel platform, because of driver madness and just the affinity the OS seems to have with Intel. But now, on my system, I never have problems. Great. But that's not my story; I just wanted a system that could do x64 as well as 32-bit. Unfortunately, I still cannot test IA64 systems.
Question: Is going 64-bit just hype? Hmm, if I were the CEO of my own CPU-baking company, I'd tell you, 'of course not!'.
As a programmer, I'd tell you I agree with him :). Why? Because of hard numbers!
And if you don't love hard numbers, you're certainly not like me...
Most tests involve graphics. Well, that's important as well, but what if we skip graphics? Does a 64-bit environment really beat a 32-bit one even when we don't do 64-bit math?
So here we go.
I have a testing system: an AMD Athlon 64 3200+ with 1 gigabyte of RAM, the fastest memory that exists for it. Don't be jealous, you'll soon get a better system from your mom!
Our test does the following: it opens a big file and encodes it to a base64 string. This is a good test, since both integer math and memory allocation play a role.
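To make the shape of the measurement concrete, here is a minimal Python sketch of the same idea: time one base64 pass over a block of bytes. It is not the COM component from this post; the function name `timed_base64` and the 1 MB stand-in payload are made up for illustration, and absolute timings will differ from the ones reported here.

```python
import base64
import time


def timed_base64(data):
    """Return (encoded_text, seconds) for one base64 pass over data."""
    start = time.perf_counter()
    # b64encode does the integer work; decode() allocates the final string.
    encoded = base64.b64encode(data).decode("ascii")
    elapsed = time.perf_counter() - start
    return encoded, elapsed


if __name__ == "__main__":
    # Hypothetical stand-in for the real input file: 1 MB of repeating bytes.
    payload = bytes(range(256)) * 4096
    text, seconds = timed_base64(payload)
    print(f"Encode: {seconds:.3f} s, {len(payload)} bytes -> {len(text)} chars")
```

Reading the file and encoding it are timed separately in the real test as well, so the disk does not pollute the encode number.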
The encoding is done through a tiny COM component that I quickly wrote for this purpose. It just uses the ATL framework's <atlenc.h>, which has full support for base64 encoding and decoding.
Then I have a VBScript tester. It has the following lines...
(it opens oembios.bin; I chose this file only because it had to be big, so take your pick to redo the test)
Dim obj, v, t
Set obj = CreateObject("NWC.Decode")
t = Timer
obj.readfile "c:\windows\system32\oembios.bin" ' 12.5 MB
WScript.Echo "FileRead: " & Timer - t
t = Timer
obj.ToBase64 v
WScript.Echo "Encode: " & Timer - t
On XP, the 32-bit OS, this takes a whopping 0.21 seconds.
FYI, on Windows 2000 it also takes 0.21 seconds. You see, in -this- particular test the newer OS shows no improvement in OLE Automation and COM performance.
When I compile the C++ COM object with all optimizations disabled, it takes 0.65 seconds. So you get an idea of the difference a smart compiler makes.
On Windows Server x64 Edition, the same script (it opens the same file) with the COM object compiled to x64 code takes just 0.11 seconds!
Of course, a good performance test measures one thing at a time; measuring a mix of operations garbles the results.
So I improved the COM object: it now uses two memory buffers (one for the base64 conversion, the other for the binary contents) and only resizes them if a bigger allocation is needed.
In addition, the function ToString(), which returned a variant, became a 'byref' method. If you do so, you can reuse and reallocate the string space (not many OLE Automation programmers are aware of this efficiency step).
This makes a difference, since our file was 12.5 MB in size. The encoded base64 Unicode string needs 35 MB of string storage, on top of 12.5 MB for the binary contents and a conversion buffer of 17 MB (because ATL assumes you use a non-wide string). It makes sense not to destroy that heap space and recreate it at each call.
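The arithmetic behind those buffer sizes can be sketched in a few lines of Python. This assumes plain base64 with no line breaks or terminators, so it gives a lower bound rather than the exact figures quoted above; `base64_sizes` is a made-up helper name.

```python
def base64_sizes(binary_bytes):
    """Return (narrow_bytes, wide_bytes) for base64 text without line breaks."""
    # Every 3 input bytes become 4 output characters; the tail is padded.
    chars = 4 * ((binary_bytes + 2) // 3)
    # A wide (UTF-16) string stores each of those characters in 2 bytes.
    return chars, chars * 2


MB = 1024 * 1024
for label, size in (("12.5 MB file", int(12.5 * MB)), ("10.5 MB file", int(10.5 * MB))):
    narrow, wide = base64_sizes(size)
    print(f"{label}: {narrow / MB:.1f} MB narrow, {wide / MB:.1f} MB wide")
```

Whatever line breaks the encoder inserts come on top of this, which is one possible reason the measured string lengths sit a little above the plain 4/3 rule.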
So the first time we encode, we measure memory allocations and math; the second time, all strings and allocations are reused, not (re)allocated!
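The "only resize if a bigger allocation is needed" idea looks roughly like this in Python. It is a sketch of the technique, not the ATL code of the component; the class name `GrowOnlyBuffer` and its `reallocations` counter are made up for illustration.

```python
class GrowOnlyBuffer:
    """Keeps one backing buffer and reallocates only when a request outgrows it."""

    def __init__(self):
        self._storage = bytearray()
        self.reallocations = 0  # counts how often we really had to allocate

    def acquire(self, size):
        """Return a writable view of exactly `size` bytes."""
        if size > len(self._storage):
            # Too small: allocate a new, bigger block (the expensive path).
            self._storage = bytearray(size)
            self.reallocations += 1
        # Big enough: hand out a slice of the existing block (the cheap path).
        return memoryview(self._storage)[:size]


if __name__ == "__main__":
    buf = GrowOnlyBuffer()
    for request in (12_500_000, 10_500_000, 12_500_000):
        buf.acquire(request)
        print(f"request {request}: reallocations so far = {buf.reallocations}")
```

With the three requests in the usage part, only the first one allocates, which is the same pattern as the three runs in the test: the later runs fit in what the first run already reserved.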
'Hard' numbers! (finally)
Here is a typical output of our script base64-encoding big files three times. The first time (test1, 12.5 MB), the heap cache is not yet effective; the second time (test2), I open a different file (now 10.5 MB); and the third time (test3), I reopen the first file.
| Step (times in seconds) | x64 environment | win32 environment |
|---|---|---|
| Readfile 1 (test1) (12.5 MB) | 0.011 | 0.044 |
| Encode | 0.097 | 0.203 (! See remarks *) |
| Final base64 string length | 35 MB | identical |
| Readfile 2 (test2) (10.5 MB) | 0.015 | 0.015 |
| Encode | 0.063 | 0.125 |
| Final base64 string length | 28 MB | identical |
| Readfile 1 (test3) (12.5 MB) | 0.015 | 0.016 |
| Encode | 0.109 | 0.125 |
| Final string length | 35 MB | |
Update: added a WOW64 test, i.e. a VBScript in 32-bit mode and a 32-bit COM server in emulation mode.
Readfile 1: 0.016
Encode: 0.09
ReadFile 2: 0.015
Encode: 0.047
Readfile 1: 0.016
Encode: 0.09
On an Intel Pentium 4 with HT at 2.7 GHz, the numbers hardly differ from the results on Windows x64.
Differences measured
File I/O read performance: hardly any difference
Memory allocation: roughly 100% faster on x64
32-bit (non-64-bit) integer math: roughly 25% faster on x64.
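For anyone who wants to redo the percentages, here is one way to turn two timings into a "percent faster" figure, again as a small Python sketch. The timings are the encode rows from the results table; which rows went into the rounded figures above is not spelled out, so treat the per-row output as a cross-check, not as a reproduction.

```python
def percent_faster(slow_seconds, fast_seconds):
    """How much faster the fast run is, as a percentage of the fast run's speed."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


# (win32 seconds, x64 seconds) for the three encode runs in the table.
encode_timings = {
    "test1": (0.203, 0.097),
    "test2": (0.125, 0.063),
    "test3": (0.125, 0.109),
}

for name, (win32, x64) in encode_timings.items():
    print(f"{name}: x64 is {percent_faster(win32, x64):.0f}% faster")
```

Note that "100% faster" in this sense means the run takes half the time.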
Conclusion:
The biggest difference in our silly simple test was how memory is dealt with. After all, the conclusion is fairly explainable. On the x64 system, disk I/O read performance is not measurably faster, since the real bottleneck here must be the hardware, not integer math.
Memory heap allocation speed: here we see a 100% performance boost. Or is our test on the AMD 64 running a 32-bit OS just 100% slower because of the 'AMD 32-bit CPU emulation mode'? I think the test that also ran on an Intel P4 system answers that question.