Naked Security

From the Labs: New developments in Microsoft Office malware

Malware that arrives inside innocent-looking documents has taken a new turn recently. Graham Chantry of SophosLabs investigates...

In September 2014, we wrote about a resurgence in VBA malware.

VBA stands for Visual Basic for Applications: it is a powerful and very widely-used programming tool that can be used right inside applications such as Microsoft Office.

That makes it common, and indeed perfectly usual, in legitimate files.

But, as we wrote last time:

Visual Basic code is easy to write, flexible and easy to refactor. Similar functionality can often be expressed in many different ways which gives malware authors more options for producing distinct, workable versions of their software than they have with exploits.

In short, what is good for the gander is equally good for the goose.

Indeed, over the past six months, malware that arrives as a VBA program inside an innocent-looking document has become an all-too-common occurrence in the threat landscape, and an essential weapon in spam campaigns.

Backward compatibility

Obviously, attackers who use VBA rely on their victims having some version of Office installed.

As you can see, SophosLabs statistics show that malware writers prefer Word and Excel to PowerPoint.

The likely reason is that malware delivered in spam very commonly pretends to be a courier delivery notice, an invoice or similar, and these are typically stored as Word documents or Excel spreadsheets.

But the crooks also greatly prefer the older “1997-2003” Office file format.

Files in the 1997-2003 format are stored in what Microsoft calls the Object Linking and Embedding (OLE) Compound File format, often just called OLE2 for short.

OLE2 uses a FAT-like structure to define various streams (which you can think of as files in a disk image) consisting of fixed-size blocks; these streams declare the structure and content of the document.
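To make the "fixed-size blocks" idea concrete, here is a minimal Python sketch that reads a few fields from the start of an OLE2 file. The field offsets are our reading of Microsoft's published Compound File Binary layout, not something taken from the sample discussed in this article, and the function name read_cfb_header is purely illustrative.

```python
import struct

# Signature found at the start of every OLE2 compound file.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def read_cfb_header(data: bytes) -> dict:
    """Pull a handful of header fields out of an OLE2 file (illustrative only)."""
    if len(data) < 52 or not data.startswith(OLE2_MAGIC):
        raise ValueError("not an OLE2 compound file")
    # Offsets 24..33: minor version, major version, byte order mark,
    # sector shift, mini sector shift (all little-endian 16-bit values).
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<HHHHH", data, 24
    )
    # Offsets 44..51: number of FAT sectors, first directory sector.
    fat_sectors, first_dir_sector = struct.unpack_from("<II", data, 44)
    return {
        "major_version": major,
        "sector_size": 1 << sector_shift,       # size of each fixed block
        "mini_sector_size": 1 << mini_shift,
        "fat_sectors": fat_sectors,
        "first_dir_sector": first_dir_sector,
    }
```

The sector size is stored as a power of two, which is why the sketch shifts rather than reading a byte count directly.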

The rest of the VBA malware we see is in the more recent “2007 and later” format.

These files are denoted with an -X appended to the file extension (e.g. DOCX instead of DOC, XLSX instead of XLS).

Dash-X files are stored in a format defined by a specification known as Office Open XML (OOXML).

Files of this type take the form of a ZIP archive containing a series of XML files that define the document’s content and presentation.

We can only guess why malware writers have been reluctant to commit to the 2007 format, but a good bet would be the increased likelihood of a successful infection.

Newer versions of Office can open both new and old file types, thanks to backward compatibility, but the old Office versions were never patched to let them handle the new formats.

Office XML

Interestingly, there is another, little-used file format that was introduced way back in Office 2003.

Files in this format consist of a standalone XML file, and they are sufficiently unusual that they don’t appear at all in the pie chart above.

To our surprise, however, we have recently seen a surge in brand new VBA malware packaged in this old and unusual format.

Once again, we have to guess why the crooks have decided to revive this format, which might simply be down to the fact it is little used, and thus not commonly associated with attacks.

Perhaps, also, malware authors hope that the rarity of XML-type files means that some security products are unable to deconstruct them properly.

→ Sophos products can decompose OLE2, XML and OOXML type files and extract their contents in a similar way. In other words, the same malware saved in three different formats will be detected identically.

Opening the container

The process of extracting a VBA program from an Office file depends on the container format that is used.
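As a rough illustration, the container type can usually be guessed from the first few bytes of the file. The Python sketch below is an assumption-laden triage helper, not how any particular product does it: the function name and return labels are made up for this example, a ZIP signature only tells you the file is a ZIP (not that it is valid OOXML), and the "mso-application" check relies on the processing instruction that Office 2003 XML files normally carry near the top.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file signature
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header


def sniff_container(data: bytes) -> str:
    """Guess the outer container format from the leading bytes."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"  # possibly OOXML; the ZIP contents decide
    # Skip an optional UTF-8 byte order mark and leading whitespace.
    head = data[:1024].lstrip(b"\xef\xbb\xbf \t\r\n")
    if head.startswith(b"<?xml"):
        if b"mso-application" in head:
            return "office-2003-xml"
        return "xml"
    return "unknown"
```

Note that the file extension plays no part in the guess, which matters because an attacker controls the name of the attachment.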

In “1997-2003” files, VBA code is stored in a number of streams which are enclosed within the same OLE2 container as the other document streams, such as the WordDocument stream which contains the document’s text.

Office 2007 files also store their VBA code as streams in an OLE2 file, but the other document data is detached into separate XML files in the main container file, which is in the ZIP format.

So the OLE2 container that holds the VBA code is simply a file named vbaProject.bin in amongst the XML files in the outer ZIP file.
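Because the outer file is an ordinary ZIP archive, spotting the embedded VBA container takes only a few lines. This Python sketch uses the standard zipfile module; the function name find_vba_parts is our own, and it only lists candidate entries rather than parsing the OLE2 data inside them.

```python
import io
import zipfile


def find_vba_parts(data: bytes) -> list:
    """Return the names of ZIP entries that look like a VBA project container."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]
```

An empty list means no VBA project was found under that name; a non-empty list gives the entry to extract and treat as an OLE2 file.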

And the Office 2003 XML format also uses a dedicated OLE2 container to store VBA code, with the structural difference that the data is compressed into MSO format (a proprietary Microsoft format also used for email attachments) and then text-encoded into Base64.

If we extract the Base64 data and decode it, we obtain the MSO file, indicated by the text “ActiveMime” at the start.

Unpacking the MSO file leaves us with an OLE2 container holding the VBA program.
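The two steps (Base64 decode, then unpack) can be sketched in Python as follows. This is a hedged approximation rather than a full MSO parser: it assumes the compressed part is a zlib stream, and because the exact offset of that stream is not described here, it simply scans for it and accepts the first candidate that inflates to something starting with the OLE2 signature. The function name unpack_mso is hypothetical.

```python
import base64
import zlib

ACTIVEMIME_MAGIC = b"ActiveMime"
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def unpack_mso(b64_text: str) -> bytes:
    """Decode Base64 text to an MSO blob and inflate the OLE2 container inside."""
    raw = base64.b64decode(b64_text)
    if not raw.startswith(ACTIVEMIME_MAGIC):
        raise ValueError("not an ActiveMime (MSO) blob")
    # Assumption: the payload is a zlib stream somewhere after the header.
    # zlib streams commonly begin with byte 0x78, so try each such offset.
    for offset in range(len(ACTIVEMIME_MAGIC), len(raw)):
        if raw[offset] != 0x78:
            continue
        try:
            inflated = zlib.decompressobj().decompress(raw[offset:])
        except zlib.error:
            continue  # not a valid stream at this offset, keep scanning
        if inflated.startswith(OLE2_MAGIC):
            return inflated
    raise ValueError("no OLE2 container found in MSO data")
```

Checking for the OLE2 signature after inflating guards against false positives from stray 0x78 bytes in the header.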

What next?

Using a recent malware example, we extracted the VBA code from its XML wrapper.

Here’s what we found:

At first glance the code might appear complex but it is actually very simple code that has been deliberately padded out in an attempt to disguise its true intentions.

This subroutine is the entry point of the VBA program, and the first points of interest are the seemingly nonsensical strings declared at the start of the file and what appears to be the same four lines of code repeated in groups of three.

We will look at the strings in depth later but first let’s look at the duplicated code (highlighted in red).

These four lines appear to have no effect on the final outcome of the subroutine.

The code declares a variable that is never usefully referenced, a for loop whose termination condition ensures that it is never executed, and a conditional if statement that is always false.

Code like this is often referred to as dead code; here it was probably created automatically by a code generation engine.
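For readers who have not met the idea before, here is a small Python analogue of the three padding tricks just described. It is not the malware's VBA code; the names and numbers are invented, and the only point is that the padded and clean versions behave identically.

```python
def with_junk(value: int) -> int:
    unused = 8127              # declared, but never usefully referenced
    for _ in range(9, 3):      # start is already past the end: body never runs
        value += unused
    if 4 > 11:                 # condition is always false
        value = 0
    return value * 2


def cleaned(value: int) -> int:
    # The same function with the dead code removed.
    return value * 2
```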

Removing this dead code leaves us with a much smaller, more readable subroutine, although it is still not clear what the code actually does:

A noticeable trait is the repeated function calls to “ho3NnG”.

Each call is accompanied by one of a number of hardcoded string constants declared at the top of the file.

Jumping to the “ho3NnG” function, contained in a separate code module, once again seems to plunge us into complexity.

But notice that there are numerous GoTo statements scattered amongst the function’s body:

Since these jumps are non-conditional, and there are no labels between each jump and its destination, the code sandwiched between them can never be triggered.

Code like this is known as unreachable code, and we can simply remove it from consideration.
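The rule above (an unconditional jump, no labels before its destination) is mechanical enough to automate. The Python sketch below works on VBA source held as a list of text lines; it is a simplified illustration with made-up names, it only recognises a bare "GoTo label" on its own line, and it leaves the label in place in case another jump also targets it.

```python
import re

GOTO_RE = re.compile(r"^\s*GoTo\s+(\w+)\s*$", re.IGNORECASE)
LABEL_RE = re.compile(r"^\s*(\w+):\s*$")


def strip_unreachable(lines):
    """Drop a GoTo and the code it skips when its target is the very next label."""
    kept = []
    i = 0
    while i < len(lines):
        jump = GOTO_RE.match(lines[i])
        if jump:
            # Find the next label line after the jump.
            j = i + 1
            while j < len(lines) and not LABEL_RE.match(lines[j]):
                j += 1
            if j < len(lines):
                label = LABEL_RE.match(lines[j]).group(1)
                if label.lower() == jump.group(1).lower():
                    # Nothing between i and j can run: resume at the label.
                    i = j
                    continue
        kept.append(lines[i])
        i += 1
    return kept
```

If the next label is not the jump's destination, the sketch leaves everything untouched, since code after an intervening label might be reachable from elsewhere.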

Without the unreachable code noise, and with a little bit of re-arrangement, we are left with a much simpler function:

The code above loops through the passed-in string and XORs each character with the decimal value 255. (This has the effect of flipping each bit in each byte.)

The result of each XOR is appended to a new string which is returned to the caller.
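The same unscrambling logic is easy to reproduce outside Office, which is handy when analysing a sample. This is a Python equivalent of the behaviour described above, not the original VBA; the function name xor_decode is ours.

```python
def xor_decode(scrambled: str, key: int = 0xFF) -> str:
    """XOR every character code with the key and join the results."""
    return "".join(chr(ord(ch) ^ key) for ch in scrambled)
```

Because XOR is its own inverse, calling the function a second time on its output gives back the original text, so the same routine serves for both scrambling and unscrambling.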

This sort of text-unscrambling function is very common in malware, because it is a simple way of disguising data such as filenames, messages and URLs that would otherwise be both obvious and suspicious.

We can now simply replace the original calls to “ho3NnG” with the unscrambled data that comes back each time.

Now it looks more like malware:

With the formatting cleaned up a little and the variables renamed, the true intentions of this file are clear.

Simply put, this code:

  • Makes an HTTP connection to port 8080 on the server 173.xx.xx.xx
  • Downloads a file on the server called abs5ajsu.exe.
  • Saves it in the TEMP folder as fdgffdgdfga.exe.
  • Runs it.

Why use a downloader?

The crooks could simply have embedded the content of abs5ajsu.exe as scrambled data in the VBA code, so that the malware would work even when offline.

But by using a downloader, they delay showing their hand until the last moment.

Only when the Office file is opened (rather than when it is received) do they reveal what malware they are actually using in the attack.

That gives them extra flexibility: they can change the malware at any time; adapt it depending on the geolocation of the victim; or even download clean files as decoys.

In this example, the malware that was downloaded next was a variant of Dridex, a banking Trojan derived from Cridex.

This particular sort of VBA downloader is commonly associated with Dridex payloads, accounting for around 70% of all VBA-based malware in the past three months.

What’s old is new again!

→ Sophos detects and blocks the malware described above as Troj/DocDl-GO (VBA downloader part) and Troj/Dridex-AZ (dropped malware part).


Congratulations Sophos, only 3 months late to this macro party.


This is not something we found and wrote about as breaking news, as the article makes clear. It is a discussion (“From the Labs,” as the headline says), based around a malware sample to whet the appetite of our technical readers, of trends in document-based malware over the last six months. (Does that make us three months *early* by your estimation?)

Dealing with the ever-changing malware threat isn’t a “party,” where we all have a big blowout on macro malware until 3am on Saturday morning, then sleep it off and go back to real life on Monday. It’s more of a lifestyle thing…


Thanks for an interesting article. My version of Word/Excel warns me on opening files containing macros so I guess I am safe. Is this true of the targeted products by default, or must such warnings be turned on manually? If so, perhaps explaining (or linking to) how to do this would be useful.


I am not an Office user (one of the few! Pages on OS X is enough for me :-) but IIRC macro execution in external files is off by default. Of course, some, if not many, users may have turned this setting off for convenience in the past…

Indeed, there is, or has been, macro malware that urges you to turn the feature off as a necessary part of viewing a document, even including helpful instructions in the malicious document, with an arrow pointing at the right button to click :-)

See, for example, this SophosLabs paper, which shows an example of this:


Slight nit-picking point: DOCX and XLSX files *cannot* contain VBA code. To send the malware in a 2007 format file, it would have to be saved as a DOCM / XLSM file.


Ah, but that’s largely a naming convention, right? Like DOT versus DOC for templates in the olden days?


Apparently not.

I’ve just tried in Word 2013, saving a DOCM with some simple VBA code, renaming it to DOCX, and trying to open it. Word displays an error:

“We’re sorry. We can’t open Doc1.docx because we found a problem with its contents. (No error detail available.)”

Looks like Microsoft have done the right thing.


That’s not how the malware is working. It’s saving in a zipped XML collection of files (a 2003 format), which apparently Microsoft still allows to run macros (if they’re properly formed).

Sounds like a bug in Office to me. It SHOULD work exactly as you say, but the hackers have apparently found a way around that limitation.


Fair point, but I was referring to Graham’s comment in the article about 2007-format files:

“The rest of the VBA malware we see is in the more recent “2007 and later” format. These files are denoted with an -X appended to the file extension (e.g. DOCX instead of DOC, XLSX instead of XLS).”

DOCX and XLSX files cannot contain VBA code.


Point taken. You are correct as written.

Other readers should take my post as more like an addendum than a correction, please.


Graham Chantry wrote “Newer versions of Office can open both new and old file types, thanks to backward compatibility, but the old Office versions were never patched to let them handle the new formats.”

This is incorrect. Microsoft provide(s) the converters for free and has done so since 2007. See and select “1”, the Compatibility Pack.

A much more likely reason the malware authors select the older versions of MS Office is that macro execution is ON by default in the old versions and OFF by default in the newer versions.

I’m wondering if this report was reviewed by someone familiar with MS Office prior to publication.


Good point about the availability of the converters.

I think it’s still fair to say that “old versions weren’t patched to support new formats.”

It’s certainly strictly opt-in for the conversions to work at all :-) But using the old format Just Works for everyone…


Two likely reasons for PowerPoint’s low popularity among malware authors:
1) As you suggest, it’s easier to craft a credible cover letter for a Word or Excel document.
2) PowerPoint VBA is very arcane–and there’s no “Record” option to build the basic structure for you. I needed to reformat some text in multiple text boxes on every page of PowerPoint Documents and also reformat text another way in one specific text box on every page. It took me three or four days to get it right. Doing the same in MS Word first, as a pilot project, took an hour.

