Maldocs payload extraction

By Filip Olszak
A beginner-friendly introduction to dynamically analyzing malicious Office documents and VBA macros for payload extraction.


In 2020, many behavioral analysis tools and public services make it easy to detonate and identify malicious Office documents at scale, highlighting the suspicious indicators for you. That said, as with any kind of malware, newer samples will go undetected, and manual analysis will be needed upon discovery to extract the final payload, understand its functionality, and properly scope the incident, while also generating the operational intel and IoCs that are invaluable during incident response.
In this short post I want to summarize a few tips and tricks that will help you get past the most common obfuscation and anti-analysis techniques to accomplish your core goals when reversing modern VBA-enabled malicious Office documents.

Why not analyze code statically?

You can, and to some extent you will need to. There is a massive set of tools created by @decalage2, @DidierStevens and others that will aid you in pretty much any step of static maldoc analysis. That said, reading through heavily obfuscated code, which is often purposefully littered with undocumented functions and misleading syntax, can be a time-consuming process. During incident response, when speed is key, this becomes an issue.
The most common approach malware researchers choose when triaging a new and unfamiliar sample is to start with behavioral/dynamic analysis in order to identify the key goals the code is trying to accomplish. In many cases, confirming whether a given sample is malicious is the only objective, and the task can be finished there. At this stage, we might also already have enough information to produce IoCs, classify the malware and build signatures. The gathered information will also be an excellent entry point for more involved tasks such as malware research, if needed.

Debugging with MS Office

The first and most typical method for macro-enabled specimens is to debug them with the VBA editor built into MS Office. This gives some assurance that the observed behavior will not deviate much from that on the victim machine. Nevertheless, we need to keep in mind that some techniques, such as VBA stomping (where the stored VBA source is altered while the compiled p-code is left intact), may prevent us from using it effectively.
(Most of the techniques described below require the MS Office suite to be installed. Using free alternatives like LibreOffice or OpenOffice can work in some cases, but is not advised, since malware likes to leverage various undocumented functionalities that are likely to be missing in non-Microsoft implementations of VBA.)
Since macro code will usually incorporate a ton of mostly dead functions to confuse analysts, it’s best to start debugging from a document life-cycle event function, such as Auto_Open() or Document_Open(), and follow the execution flow a user would have experienced upon opening the document and clicking “Enable Content”.
While stepping through the code (F8), we can peek at the current contents of variables by hovering over them, adding them to a watch list, or, perhaps most conveniently, using the “Locals” window, which lists all initialized local variables together with their current values.
This is a powerful technique that lets us resolve complex expressions of any type back into human-readable values.
[Figures: the Locals window; hovering over an initialized variable]
As we monitor these variables, we are likely to observe the macro deobfuscating itself: extracting base64/hex blobs from Form objects and embedded files, then constructing the final dropper payload or calling native API functions to inject into and migrate to a more stable host process.
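When a variable in the Locals window holds a hex or base64 blob that the macro has not decoded yet, it can be decoded outside of Office. The following is a minimal Python sketch, not part of the original sample; the function name decode_blob and the rule "pure hex digits of even length means hex" are assumptions made for illustration.

```python
import base64
import string

HEX_DIGITS = set(string.hexdigits)


def decode_blob(blob: str) -> bytes:
    """Decode a blob copied out of the debugger as hex or base64."""
    cleaned = "".join(blob.split())  # drop spaces and line breaks
    # Assumption: an even-length run of pure hex digits is hex, not base64.
    if cleaned and len(cleaned) % 2 == 0 and set(cleaned) <= HEX_DIGITS:
        return bytes.fromhex(cleaned)
    # validate=True rejects stray characters instead of silently skipping them.
    return base64.b64decode(cleaned, validate=True)


if __name__ == "__main__":
    print(decode_blob("4d 5a 90 00"))
    print(decode_blob("cG93ZXJzaGVsbA=="))
```

Real samples often add a layer on top (string reversal, character substitution, chunking across several Form fields), so treat this only as a first step.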


On the way, we might encounter anti-analysis tricks: pointless loops that force us to set breakpoints to get past them, misdirection, and plain dead code that populates variables which never get used or introduces short delays in execution to fool behavioral analysis.
[Figures: an unreferenced Dim; a pointless For loop]

Final payload

If you follow the execution flow, with a few breakpoints to jump past the loops, you will reach a point where the final payload is assembled and executed.
The most common execution techniques (as of 2020) use either Application.Shell or WMI methods to run command lines. The benefit of the latter is that it does not create a suspicious and rather easy to detect parent-child process relationship, since the new process is spawned by WmiPrvSE.exe rather than by the Office application.
Since VBA can use ActiveX objects and functions imported from DLLs (including kernel32 and ntdll), there is a variety of other execution techniques, similar to those leveraged by malicious executables. Most notably, the default Cobalt Strike maldoc uses WinAPIs to inject shellcode into a newly spawned rundll32 process.
In the case of our sample, we will eventually stumble upon the Win32_Process class name, and later a decoded PowerShell command line with a Start-BitsTransfer cmdlet being passed to the newly created object. This technique is unlikely to decrease in popularity in the coming years, as most cyber criminals are moving away from heavily logged PowerShell toward less transparent interfaces like WMI or direct API calls.
[Figures: the Start-BitsTransfer cmdlet; the Win32_Process WMI class]

Debugging with x64dbg

Letting macros do the deobfuscation for us is usually straightforward. But what if there are too many anti-debugging tricks involved? This is when we might want to move the debugging a level lower and intercept the final payload as it is being passed to an API.
Knowing that running malicious macros will most commonly result in the creation of a new process or some form of process injection, we can set breakpoints on the limited set of WinAPIs that can be used for this purpose, e.g. CreateProcess and CreateProcessAsUser.
To do this, we can attach our favorite debugger to EXCEL.EXE and set breakpoints on kernel32.CreateProcessW / CreateProcessA; we should hit them after running the macro. Once a breakpoint is reached, we can review the function arguments and effectively intercept the final dropper payload.
At this point we can investigate the parameters passed to the function. The Microsoft docs tell us that the command line of the new process is in the second argument, lpCommandLine.
With the Microsoft x64 calling convention, that means the RDX register, which indeed seems to contain our payload. (On x86 systems you would find all the parameters on the stack.)
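As a quick reference for where to look when the breakpoint hits, the mapping from argument position to register or stack slot can be sketched in Python. This is an illustrative helper, not debugger functionality: the name arg_location is hypothetical, it covers only integer and pointer arguments, and the offsets assume execution is paused on the very first instruction of the API (before any prologue has run), with stdcall assumed for x86.

```python
X64_REGISTER_ARGS = ("RCX", "RDX", "R8", "R9")


def arg_location(index: int, arch: str = "x64") -> str:
    """Location of the index-th (1-based) argument at function entry."""
    if index < 1:
        raise ValueError("argument index is 1-based")
    if arch == "x64":
        if index <= 4:
            return X64_REGISTER_ARGS[index - 1]
        # [RSP] holds the return address, followed by 0x20 bytes of
        # shadow space, so the fifth argument sits at [RSP+0x28].
        return f"[RSP+0x{8 * index:X}]"
    if arch == "x86":
        # stdcall: [ESP] holds the return address, arguments follow it.
        return f"[ESP+0x{4 * index:X}]"
    raise ValueError(f"unsupported architecture: {arch}")


if __name__ == "__main__":
    # lpCommandLine is the 2nd argument of CreateProcessW
    # and the 3rd argument of CreateProcessAsUserW.
    print(arg_location(2), arg_location(3), arg_location(2, "x86"))
```

The returned location holds a pointer, so the command line itself still has to be read by following that address in the memory dump.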
This maldoc uses inline VBScript passed to the Microsoft HTML Application Host (mshta) to execute remote code, a common and very old technique. This kind of process ancestry is fairly rare on clean systems and will be caught by any decent EDR software; as with PowerShell, adversaries are moving away from it in favor of stealthier methods.
One thing you might encounter is mshta used in conjunction with WMI. This is also easily detectable, but not well covered by many internal SIEMs and EPP software.
[Figure: dechained mshta + WMI]
If, during the early analysis stages, we notice strings indicating WMI abuse, we can attach an elevated debugger (running as NT AUTHORITY\SYSTEM) to the WmiPrvSE.exe process and dump the parameters of the CreateProcessAsUser function.
Dereferencing the memory pointer in the third parameter (R8, lpCommandLine) in the dump reveals a PowerShell payload, which can be copied and further analyzed in CyberChef.
The CreateProcess API is not used here because of the difference in security context between EXCEL.EXE and the WMI Provider Host.
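An intercepted PowerShell command line can also be unpacked with a few lines of Python instead of CyberChef. The sketch below assumes the payload uses the -EncodedCommand switch (base64 over UTF-16LE text), which is common but is not stated for this particular sample; the function name decode_encoded_command is hypothetical.

```python
import base64
from typing import Optional


def decode_encoded_command(command_line: str) -> Optional[str]:
    """Return the decoded -EncodedCommand script, or None if absent."""
    tokens = command_line.split()
    for flag, value in zip(tokens, tokens[1:]):
        if not flag.startswith(("-", "/")):
            continue
        name = flag[1:].lower()
        # PowerShell accepts shortened forms of -EncodedCommand
        # (-e, -enc, ...) as well as the alias -ec.
        if name and ("encodedcommand".startswith(name) or name == "ec"):
            raw = base64.b64decode(value.strip("\"'"))
            # Encoded commands are UTF-16LE text under the base64 layer.
            return raw.decode("utf-16-le")
    return None


if __name__ == "__main__":
    blob = base64.b64encode("Start-BitsTransfer".encode("utf-16-le")).decode()
    print(decode_encoded_command(f"powershell.exe -NoP -W Hidden -enc {blob}"))
```

If the function returns None, the command line is probably passed in clear text or obfuscated in another way (string concatenation, environment variables), and needs to be read manually.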

Leveraging AMSI sensor (ETS)

Microsoft’s Antimalware Scan Interface (AMSI) is natively hooked into the VBA7 DLL that Office uses to run macros. It is usually able to log things like deobfuscated WMI objects and WinAPI functions together with their parameters, right before they are passed for execution.
We can use the AMSI event tracing session log to dump the final macro payload with very little effort. To do this, we first enable AMSI ETS logging, which is most easily done with logman.
After we run our malicious macro, we can import the logs into Event Viewer and investigate the OFFICE_VBA entries. There will often be multiple events generated throughout the execution. Microsoft does not document this well, but experience shows that the bigger the contentsize of an event, the more complete it is.
With some luck, the contents will contain something close to a “clear-text” payload.
If it does not, we might want to look at other events logged by AMSI, such as the WMI events.
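The "largest contentsize wins" heuristic can be sketched in Python, assuming the events have already been exported from the trace into a list of dictionaries. The key names appname and contentsize mirror the fields mentioned above, but the export step and the exact record layout are assumptions, and largest_vba_event is a hypothetical helper name.

```python
from typing import Iterable, Mapping, Optional


def largest_vba_event(
    events: Iterable[Mapping[str, object]],
) -> Optional[Mapping[str, object]]:
    """Pick the OFFICE_VBA event with the biggest contentsize, if any."""
    vba_events = [e for e in events if e.get("appname") == "OFFICE_VBA"]
    # Exported values may be strings, so normalize to int before comparing.
    return max(vba_events, key=lambda e: int(e["contentsize"]), default=None)


if __name__ == "__main__":
    sample = [
        {"appname": "OFFICE_VBA", "contentsize": "120"},
        {"appname": "OFFICE_VBA", "contentsize": "4096"},
        {"appname": "OTHER_APP", "contentsize": "9000"},
    ]
    print(largest_vba_event(sample))
```

The biggest event is only the most promising starting point; the smaller ones can still hold earlier stages of the same payload.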

Shellcode (scdbg, x64dbg)

In most cases, the shellcode is injected into the Office process itself, executed as a new thread, and results in some form of further code execution (dropping malware or injecting into a running process).
VirtualAlloc is often substituted with HeapCreate and HeapAlloc, while CreateThread can be replaced with, for example, EnumResourceTypesW, whose callback pointer can be aimed at the shellcode for the same result.
While it's possible to inject in a stealthy manner, in most cases these techniques are likely to leave unbacked RWX memory pages in the target process.
For quick wins, we usually want to identify these anomalous pages, then extract the injected code and detonate it with a shellcode emulator like scdbg to understand it in terms of the WinAPIs it calls.
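Spotting unbacked RWX pages comes down to a simple filter over the process memory map. The Python sketch below assumes the regions have already been collected (for example, copied from a debugger's memory map) into records that mirror the State, Type and Protect fields of MEMORY_BASIC_INFORMATION; the Region class and the function name are hypothetical, while the constants are the documented Windows values.

```python
from dataclasses import dataclass
from typing import Iterable, List

MEM_COMMIT = 0x1000
MEM_PRIVATE = 0x20000  # not backed by an image or a mapped file
PAGE_EXECUTE_READWRITE = 0x40


@dataclass
class Region:
    base: int
    size: int
    state: int
    type: int
    protect: int


def unbacked_rwx_regions(regions: Iterable[Region]) -> List[Region]:
    """Keep committed, private regions whose base protection is RWX."""
    return [
        r
        for r in regions
        if r.state == MEM_COMMIT
        and r.type == MEM_PRIVATE
        # The low byte carries the base protection; higher bits are
        # modifiers such as PAGE_GUARD or PAGE_NOCACHE.
        and (r.protect & 0xFF) == PAGE_EXECUTE_READWRITE
    ]
```

Some legitimate components, such as JIT engines, also allocate private RWX memory, so a hit is a lead to dump and inspect rather than proof of injection.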
AMSI logs can also be helpful when analyzing where and how the payload is written; for example, they can capture shellcode being moved byte by byte with the RtlMoveMemory API.
The injected code is likely to be XOR’ed and preceded by a decoder stub. Unfortunately, it often also involves an “egg hunter”, so execution outside the Office process will not reveal anything meaningful. This is where ASM knowledge and IDA skills come in handy.
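For the simplest case of XOR encoding, a brute-force pass over a dumped buffer can save a trip to the disassembler. The Python sketch below assumes a single-byte key and a known plaintext marker (such as b"http") somewhere in the decoded data; both are assumptions, and real decoder stubs frequently use multi-byte or rolling keys that this will not recover. The function names are hypothetical.

```python
from typing import List, Tuple


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def brute_force_xor(data: bytes, marker: bytes) -> List[Tuple[int, bytes]]:
    """Try all 256 single-byte keys and keep results containing marker."""
    hits = []
    for key in range(256):
        candidate = xor_bytes(data, key)
        if marker in candidate:
            hits.append((key, candidate))
    return hits


if __name__ == "__main__":
    # Illustrative data only: a URL encoded with key 0x5A.
    encoded = xor_bytes(b"http://example.invalid/stage2", 0x5A)
    for key, decoded in brute_force_xor(encoded, b"http"):
        print(hex(key), decoded)
```

When no key produces a hit, reading the decoder stub itself is the reliable way to recover the scheme.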