Sometimes, I am willing to disclose :) some secrets of real performance. Even in the .NET world, we can't avoid BSTR allocation, and in the unmanaged world, Automation and COM will cross our path once more.
In my honest opinion, many, many programmers in the unmanaged world make the mistake of not reallocating resources; instead, they just destroy the resource and allocate it again. MS tried to fix this 'bug' by caching allocations as much as possible. I find the decision to cache BSTR allocations a poor one. It must be one of the reasons that COM in a multitasked, multiprocessor (MPS) environment sometimes simply cannot scale!
I'll explain why. In a single-user environment, caching data for a thread is a good idea, since, say, MS Word and scripts in VBA and VBS might reuse data/allocations. But as soon as our ASP/COM server environment starts to do this and the code is reentrant, caching is useless, since a thread might not be allowed to reuse the cached zombie data that another thread allocated.
It could have been solved so easily! (Now, I might sound presumptuous saying that, I agree.) How? Just don't cache, but reallocate!
VB6 and Automation clients, for instance, use the BSTR datatype all over the place, and so does ATL when used in a COM environment. If you look at the compiled code that programmers deliver, it never reallocates (only the runtime does sometimes). So for instance myString = "Hello" followed by myString = "bye" could have been compiled internally to:
    BSTR myString = SysAllocString(L"Hello");
    SysReAllocString(&myString, L"bye");   // takes a BSTR*, so pass the address
And that's really all!
SysReAllocString and SysReAllocStringLen internally use CoTaskMemRealloc, and that function tries to resize the memory allocation in place; this in turn minimizes RAM synchronization between the CPUs of an MPS system.
So far, my theory. Am I just filling up your browser's disk cache and chit-chatting because I'm idle for an hour? No.
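To show the two calling patterns side by side without needing the Automation runtime, here is a minimal portable sketch. It is only an analogy: `std::realloc` stands in for `SysReAllocString`, and the `WideBuf` struct and the helper names are hypothetical, not part of any Windows API. Whether the block is really resized in place depends on the allocator, so this illustrates the pattern rather than a guaranteed speedup.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <string>

// Hypothetical stand-in for a BSTR: a heap buffer plus its length in characters.
struct WideBuf {
    wchar_t* data = nullptr;
    std::size_t len = 0;
};

// Pattern 1: allocate a fresh block, then destroy the old one.
bool assign_free_alloc(WideBuf& buf, const wchar_t* src) {
    assert(src != nullptr);
    const std::size_t n = std::wcslen(src);
    wchar_t* fresh = static_cast<wchar_t*>(std::malloc((n + 1) * sizeof(wchar_t)));
    if (fresh == nullptr) return false;
    std::memcpy(fresh, src, (n + 1) * sizeof(wchar_t));  // copies the terminator too
    std::free(buf.data);
    buf.data = fresh;
    buf.len = n;
    return true;
}

// Pattern 2: ask the allocator to resize the existing block.
// src must not point into buf.data, because the block may move.
bool assign_realloc(WideBuf& buf, const wchar_t* src) {
    assert(src != nullptr);
    const std::size_t n = std::wcslen(src);
    wchar_t* resized = static_cast<wchar_t*>(
        std::realloc(buf.data, (n + 1) * sizeof(wchar_t)));
    if (resized == nullptr) return false;  // the old block is still valid here
    std::memcpy(resized, src, (n + 1) * sizeof(wchar_t));
    buf.data = resized;
    buf.len = n;
    return true;
}

// Runs myString = "Hello" followed by myString = "bye" with the chosen
// pattern and returns the final contents (empty on allocation failure).
std::wstring hello_then_bye(bool use_realloc) {
    WideBuf s;
    auto assign = use_realloc ? assign_realloc : assign_free_alloc;
    std::wstring result;
    if (assign(s, L"Hello") && assign(s, L"bye"))
        result.assign(s.data, s.len);
    std::free(s.data);
    return result;
}
```

Calling `hello_then_bye(true)` exercises the reallocate path and `hello_then_bye(false)` the destroy-and-allocate path; both end up holding the same string, and only the way the memory is obtained differs.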
Let's just try this out!
I've rewritten CComBSTR (from the ATL namespace); you can find the original in the Platform SDK at \PlatSDK\Include\atl\atlbase.h (in case you have Visual Studio 2005, don't use this location, but use the most recent header files).
This silly little program does nothing but append (sort of) random wide strings to a BSTR allocation. Let's rewrite CComBSTR and measure it!
I assume that you can get the headers right for your setup to get the program below to compile and run.
    #include <windows.h>
    #include <tchar.h>
    #include <stdio.h>
    #include <atlbase.h>

    int
    _tmain()
    {
        CoInitialize(NULL);
        {
            CComBSTR appendPlay;
            DWORD timer = GetTickCount();
            // Ethan Winer, an Assembly coding specialist, once taught me that loops
            // counting down to zero are faster; this is still the case! Just a silly fact.
            for (int xy = 10000; xy > 0; xy--)
            {
                // Alternate between two literals so the appended data varies a little.
                PCWSTR zy = xy % 2 == 0 ? L"hiya" : L"bye";
                appendPlay.Append(zy);
            }
            // GetTickCount returns milliseconds as a DWORD, hence %lu.
            wprintf(L"speed %lu\n", GetTickCount() - timer);
        }
        CoUninitialize();
        return 0;
    }
Now run the code. On my AMD 3200+ system, this takes 578 milliseconds (GetTickCount counts in milliseconds). This is even with the OLE BSTR cache enabled! (When caching is disabled, this takes 520 milliseconds.)
Let's improve the Append part of CComBSTR (make sure you keep the original atlbase.h intact). In my case, I just renamed CComBSTR to CComBSTR2, copy-pasted all of it, and rewrote the slow parts.
The slow original code allocates a new string, copies everything over, and frees the old one on every call.
    // very slow original
    HRESULT __stdcall Append(LPCOLESTR lpsz, int nLen) throw()
    {
        if(lpsz == NULL)
        {
            if(nLen != 0)
                return E_INVALIDARG;
            else
                return S_OK;
        }
        int n1 = Length();
        if (n1+nLen < n1)
            return E_OUTOFMEMORY;
        BSTR b;
        b = ::SysAllocStringLen(NULL, n1+nLen);
        if (b == NULL)
            return E_OUTOFMEMORY;
        if(m_str != NULL)
            memcpy(b, m_str, n1*sizeof(OLECHAR));
        memcpy(b+n1, lpsz, nLen*sizeof(OLECHAR));
        b[n1+nLen] = NULL;
        SysFreeString(m_str);
        m_str = b;
        return S_OK;
    }
And here is the improved code. It has the Automation runtime resize the BSTR, while the string in most cases remains at the same memory address. This is how the original BSTR programmers designed it for performance, yet nobody is utilizing it! We, however, do use it, as you understand.
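    HRESULT __stdcall Append(LPCOLESTR lpsz, int nLen) throw()
    {
        // Nothing to append.
        if (lpsz == NULL || (m_str != NULL && nLen == 0))
            return S_OK;
        int n1 = Length();
        // Let the Automation runtime resize the BSTR instead of allocating a new one.
        // Note: lpsz must not point into m_str, since the resize may move the buffer.
        HRESULT hr = SetLength(n1 + nLen);
        if ( SUCCEEDED(hr) )
            memcpy(m_str+n1, lpsz, nLen*sizeof(OLECHAR));
        return hr;
    }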
We need to add the SetLength function, which, through the static helper _SetLength, is a thin wrapper for SysReAllocStringLen(..)
    // Sets the length to the specified value but does not clear the contents
    HRESULT __stdcall SetLength(unsigned int length) throw()
    {
        return _SetLength(&m_str, length);
    }
    static HRESULT __stdcall _SetLength(BSTR * str, unsigned int length) throw()
    {
        return ::SysReAllocStringLen(str, NULL, length) == FALSE ? E_OUTOFMEMORY : S_OK;
    }
I've included the full, 99% compatible CComBSTR2 replacement for you as a handy download, so bother about that later. :)
Now, on to the results please: how much faster does this code run?
Yes, it takes a whopping 15 milliseconds! Compare that with 578 milliseconds: that is roughly 38 times faster, an improvement of about 3800%.
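If you want to repeat the comparison without ATL at hand, the sketch below runs the same append loop in portable C++. The assumptions are mine: `std::realloc` and `std::malloc`/`std::free` stand in for the two BSTR strategies, `AppendResult` and `run_append` are hypothetical names, and `std::chrono::steady_clock` replaces GetTickCount, whose resolution is typically in the range of 10 to 16 milliseconds and therefore coarse for a run this short. The timings you get depend entirely on your C runtime's allocator, so do not expect the numbers above to reproduce.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <string>

// Hypothetical result type: the final string and the elapsed time.
struct AppendResult {
    std::wstring text;
    double millis = 0.0;
};

// Appends "hiya" or "bye" alternately, `iterations` times, using either
// realloc (resize the existing block) or malloc + copy + free.
AppendResult run_append(bool use_realloc, int iterations) {
    assert(iterations >= 0);
    wchar_t* buf = nullptr;
    std::size_t len = 0;

    const auto start = std::chrono::steady_clock::now();
    for (int i = iterations; i > 0; --i) {
        const wchar_t* piece = (i % 2 == 0) ? L"hiya" : L"bye";
        const std::size_t add = std::wcslen(piece);
        const std::size_t bytes = (len + add + 1) * sizeof(wchar_t);
        wchar_t* next = nullptr;
        if (use_realloc) {
            // Existing contents are preserved by realloc, even if the block moves.
            next = static_cast<wchar_t*>(std::realloc(buf, bytes));
            if (next == nullptr) break;  // buf is still valid and freed below
        } else {
            next = static_cast<wchar_t*>(std::malloc(bytes));
            if (next == nullptr) break;
            if (buf != nullptr) std::memcpy(next, buf, len * sizeof(wchar_t));
            std::free(buf);
        }
        std::memcpy(next + len, piece, add * sizeof(wchar_t));
        len += add;
        next[len] = L'\0';
        buf = next;
    }
    const auto stop = std::chrono::steady_clock::now();

    AppendResult r;
    if (buf != nullptr) r.text.assign(buf, len);
    r.millis = std::chrono::duration<double, std::milli>(stop - start).count();
    std::free(buf);
    return r;
}
```

Call `run_append(true, 10000)` and `run_append(false, 10000)` and compare the `millis` fields; the `text` fields should be identical, which is a cheap check that both strategies really did the same work.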
Now you might understand why .NET had to be invented by MS :) They figured that the maximum scalability limit was hit on real MPS systems, and that COM could never be scalable, just because the BSTR sucks in performance! And as you know by now, I'm on the conspiracy path here, and I'm lying :-).
Anyway, the conclusion is that current COM clients, applications, and servers could, if they wanted to, improve a lot for free by just removing the BSTR reallocation barrier and taking advantage of the maximum 'unmanaged code' speed possible on a Windows (r) system.
The non-conspiracy theory is that caching was introduced at a time when Microsoft did not play a big role in server environments and computers were relatively slow. A conclusion would be that if MS wanted to improve an old car (COM) for free, they'd just remove the caching and implement the idea that I've shown above to work very well. This would also be good for classic ASP pages, which are still very popular on the internet.
Just a final question: why bother, if you are an Automation developer creating services that depend heavily on BSTRs?
The answer is, I bothered a while ago, just because of some artistic feeling (good programmers are artists, not scientists :) ) that I could reduce the enormous BSTR stress in my product, Isp Session Manager. Of course, the biggest part of such a managed COM server (as COM+ was called in the past!) is talking to a DBMS. But after implementing the CComBSTR replacement, performance on an MPS server suddenly became very smooth: the pages-per-second throughput went up and showed a flat line (that's the wet dream of each webmaster). Before, without the CComBSTR replacement, the throughput was erratic. This again proved my point that not reallocating BSTRs causes a huge demand on RAM synchronization and limits scalability because of wrong usage of the COM runtime.
Here you have the CComBSTR whopper to download for yourself. Please do not forget to deploy it after download...