Advanced Automation with DLL Injection

Sometimes, you need to capture text from a window in a business application that doesn’t expose any controls through the Windows UI Automation or MSAA interfaces. You might be able to use OCR, depending on your toolkit, but what about when that fails?

We have two applications that both need the same data. The vendor of Application A provided us with a tool that could capture the screen of Application B, parse out the keywords it needed, and enter them automatically in Application A. This worked great for a while, but something mysteriously changed and the OCR started capturing data unreliably. Most of the keywords would be correct, but every so often it would misread part of an invoice number, sometimes so subtly that the end user didn’t notice.

This, of course, became a major headache, and so the hunt for a resolution began. After testing a few different options, we decided to learn some new technologies and roll our own replacement.

Hooking TextOuts

The screens of Application B, which had the data we needed to capture, were created with an in-house GUI framework that wasn’t enabled for Microsoft’s accessibility layer. This meant there weren’t any documented automation hooks to find the data we needed. But we knew it was possible, because we had another scripting solution that could fetch that data, so we decompiled that scripting software to look for clues as to how it managed the trick.

After searching function calls on Google for a few hours one evening, we figured out that the scripting software was hooking some low-level rendering functions that Application B used to render its text into a bitmap for display on the screen. We experimented a bit and found that we could do the same with EasyHook, a .NET library that does most of the hard work on the back end. Working from EasyHook’s remote injection tutorial, we were able to create the main process and a DLL to inject into Application B.

It took a couple of tries before we figured out the exact function used – DrawTextA – and set up an IPC client to send the captured text back to the main process.

Coordinates and Double Buffering

But of course it couldn’t be that easy: the coordinates our DrawTextA hook was sending back were all zeroes! It turns out that Application B was using double buffering: drawing its text with DrawTextA into a temporary, off-screen device context (DC), and then using BitBlt to copy the result to the screen. (This, it seems, is a common technique to prevent flickering.)

We needed the coordinates to tell where on the window each “textout” was being displayed (so we could find the fields we were looking for). After a bit more research, the solution turned out to be simple: we modified the injected DLL from the earlier tutorial to track the temporary DCs seen by our DrawTextA hook, and then hooked BitBlt to capture the target coordinates when those temporary DCs were copied to the screen. At that point we forwarded the text and the real coordinates to the main process.
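The bookkeeping behind that idea can be sketched roughly as follows. This is only a model of the logic, written in Python for brevity; the real hooks lived in an injected .NET DLL. The class and method names are hypothetical, and the parameters loosely mirror the arguments the Win32 DrawTextA and BitBlt calls receive.

```python
from typing import Dict, List, NamedTuple, Tuple


class TextOut(NamedTuple):
    text: str
    x: int  # position on the destination surface
    y: int


class DcTracker:
    """Pairs text drawn into off-screen DCs with the later BitBlt destination.

    Hypothetical model: real hook handlers would call these two methods.
    """

    def __init__(self) -> None:
        # DC handle -> text drawn into that DC, with its buffer-local position
        self.pending: Dict[int, List[Tuple[str, int, int]]] = {}
        self.resolved: List[TextOut] = []

    def on_draw_text(self, hdc: int, text: str, left: int, top: int) -> None:
        # Called from the DrawTextA hook: remember what was drawn and where.
        self.pending.setdefault(hdc, []).append((text, left, top))

    def on_bit_blt(self, dest_x: int, dest_y: int, src_hdc: int,
                   src_x: int = 0, src_y: int = 0) -> List[TextOut]:
        # Called from the BitBlt hook: if the source DC is one we have been
        # tracking, translate buffer-local positions to destination positions.
        found = [
            TextOut(text, dest_x + left - src_x, dest_y + top - src_y)
            for text, left, top in self.pending.pop(src_hdc, [])
        ]
        self.resolved.extend(found)
        return found


tracker = DcTracker()
tracker.on_draw_text(0x1001, "A-1001", left=0, top=0)  # made-up DC handle
print(tracker.on_bit_blt(dest_x=120, dest_y=48, src_hdc=0x1001))
```

A BitBlt whose source DC was never seen by the DrawTextA hook simply resolves nothing, so unrelated blits are ignored.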

Finding Targets

At this point the main process is receiving a list of strings that look something like this:

It’s a simple enough matter to parse them out with a regex. But how can we reliably identify the target fields we’re looking for?

We decided that tracking just the x-y coordinates would be a bit too fragile – if the window dimensions changed, the position of the fields might be adjusted automatically. Instead, we specified regexes to match field labels that are always in the same position relative to the target, and recorded the offset from each label to the textout we’re looking for. Now we can cycle through the list, find the label, and pick the most recent textout at that offset from it.
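A minimal sketch of that label-relative lookup is below, in Python rather than the tool’s own .NET code. The record shape, the function name, the pixel tolerance, and the sample label text are all assumptions made for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class TextOut(NamedTuple):
    text: str
    x: int
    y: int


def find_field(textouts: List[TextOut], label_pattern: str,
               dx: int, dy: int, tolerance: int = 2) -> Optional[str]:
    """Return the newest text drawn at (label position + offset), or None.

    `textouts` is assumed to be ordered oldest to newest.
    """
    label_re = re.compile(label_pattern)
    # Locate the most recent textout whose text matches the label regex.
    label = next((t for t in reversed(textouts) if label_re.search(t.text)), None)
    if label is None:
        return None
    target_x, target_y = label.x + dx, label.y + dy
    # Walk backwards so the most recent textout at the target position wins.
    for t in reversed(textouts):
        if abs(t.x - target_x) <= tolerance and abs(t.y - target_y) <= tolerance:
            return t.text
    return None


seen = [
    TextOut("Invoice No:", 10, 40),
    TextOut("A-1001", 110, 40),
    TextOut("A-1002", 110, 40),  # a later redraw of the same field
]
print(find_field(seen, r"^Invoice No", dx=100, dy=0))
```

Because the target is located relative to the label, the lookup keeps working if the whole form shifts, as long as the label and the field move together.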

Because we didn’t have a way to recognize when the screen refreshed, these textouts could pile up and increase the risk of picking up a stale value. We settled on a simple expedient: whenever we recognized that a certain required field on the screen was blank, we cleared the textout list. Once the field was filled in, the textout list would populate appropriately.
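That reset rule might look something like this sketch. How the blank field is detected is an assumption here: the predicate that identifies the required field is hypothetical, and a “blank” field is taken to mean a textout with empty or whitespace-only text.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, int, int]  # (text, x, y)


class TextOutLog:
    """Accumulates textouts and resets when the required field is drawn blank."""

    def __init__(self, is_required_field: Callable[[Record], bool]) -> None:
        self.records: List[Record] = []
        self._is_required_field = is_required_field

    def add(self, record: Record) -> None:
        text = record[0]
        if self._is_required_field(record) and not text.strip():
            # A blank required field is treated as the start of a fresh
            # screen, so everything collected so far is considered stale.
            self.records.clear()
            return
        self.records.append(record)


# Hypothetical rule: the required field is whatever is drawn at (110, 40).
log = TextOutLog(lambda r: (r[1], r[2]) == (110, 40))
log.add(("Invoice No:", 10, 40))
log.add(("A-1001", 110, 40))
log.add(("", 110, 40))        # field drawn blank: the list is cleared
log.add(("A-1002", 110, 40))
print(log.records)
```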

To keep the tool extensible, we recorded this configuration in an XML file.
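The original file isn’t reproduced here, but a configuration along these lines could be loaded as shown below. Every element name, attribute name, and value in this sketch is hypothetical; it only illustrates the idea of storing a label regex and an offset per target field.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical layout: one <field> per target, with a label regex and an offset.
CONFIG_XML = """
<fields>
  <field name="InvoiceNumber" label="^Invoice No" dx="100" dy="0" />
  <field name="VendorId" label="^Vendor" dx="100" dy="0" />
</fields>
"""


def load_fields(xml_text: str) -> list:
    """Parse the hypothetical config into a list of dicts with compiled regexes."""
    root = ET.fromstring(xml_text.strip())
    fields = []
    for node in root.findall("field"):
        fields.append({
            "name": node.get("name"),
            "label": re.compile(node.get("label")),
            "dx": int(node.get("dx")),
            "dy": int(node.get("dy")),
        })
    return fields


for field in load_fields(CONFIG_XML):
    print(field["name"], field["label"].pattern, field["dx"], field["dy"])
```

Keeping the labels and offsets in data rather than code means a new target field only needs a new entry in the file.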

Feeding Application A

So we’ve successfully extracted the target keywords from Application B. Time to figure out how to import them into Application A! This one did support Microsoft’s UI Automation APIs, mostly – but the specific fields we needed used a custom control that wasn’t enabled for them. Fooey.

Luckily, Application A was designed to be very extensible, so it offered a couple of different API options. After some experimenting, it became clear that the most seamless option was to run a VBScript inside Application A that had access to those controls. The only difficulty was figuring out how that script could connect to our main process, running in the background, to fetch the data.

Inter-Process Communication

For all the options .NET has to allow processes to communicate, none of them were exactly trivial. We initially explored using a COM interface, but couldn’t quite work out how to implement it on the background process. So we took a slightly more circuitous route.

We set up a memory-mapped file in the main process, which could be shared between processes. Here, we serialized the observed keywords (as defined in the config file, above). Although the main process supports monitoring multiple instances of Application B, we’re only interested in the active (and hence most recent) window, so this memory-mapped file always contains the latest set of observed keywords, recalculated after every update from the injected DLLs.
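The “latest value wins” pattern can be sketched like this. It uses Python’s mmap module over a temporary file rather than the .NET memory-mapped file the tool actually used, and the length-prefixed JSON layout, the function names, and the mapping size are assumptions made for illustration.

```python
import json
import mmap
import os
import struct
import tempfile

SIZE = 4096                    # assumed capacity of the shared region
HEADER = struct.Struct("<I")   # 4-byte little-endian payload length


def open_mapping(path: str, size: int = SIZE) -> mmap.mmap:
    """Map `path` into memory, creating a zero-filled file if it is missing."""
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(b"\0" * size)
    with open(path, "r+b") as f:
        return mmap.mmap(f.fileno(), size)


def write_keywords(mm: mmap.mmap, keywords: dict) -> None:
    """Overwrite the region with the latest keyword set."""
    payload = json.dumps(keywords).encode("utf-8")
    if HEADER.size + len(payload) > len(mm):
        raise ValueError("keywords do not fit in the mapping")
    mm[HEADER.size:HEADER.size + len(payload)] = payload
    mm[:HEADER.size] = HEADER.pack(len(payload))
    # Note: a real multi-process version would guard this with a named
    # mutex or similar; this sketch does no locking at all.


def read_keywords(mm: mmap.mmap) -> dict:
    """Return the most recently written keyword set ({} if nothing written)."""
    (length,) = HEADER.unpack(mm[:HEADER.size])
    if length == 0:
        return {}
    return json.loads(mm[HEADER.size:HEADER.size + length].decode("utf-8"))


path = os.path.join(tempfile.mkdtemp(), "keywords.bin")
writer = open_mapping(path)    # stands in for the main process
reader = open_mapping(path)    # stands in for the consumer
write_keywords(writer, {"InvoiceNumber": "A-1002"})
print(read_keywords(reader))
```

Since each write replaces the whole payload, a reader never needs history; it just takes whatever set is there when it looks.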

VBScript, unfortunately, is very limited and cannot access memory-mapped files. It does support COM objects, however, so we created a very minimal COM-enabled DLL for the sole purpose of interfacing with that memory-mapped file.

After a little finagling with the APIs, we got the VBScript working, and our prototype had a seamless one-click workflow that copied the fields perfectly. After some further testing and building an installer, we were ready for deployment!
