This is not really about suppressing or bypassing event collection; it is more about understanding EDR architecture design flaws, lazy detection logic, and correlation, in order to minimize the chance of triggering alerts with events that are (at least partially) collected.
In the previous post we discussed how solutions that use reliable, kernel-based sources for remote memory allocation events can identify many of the in-the-wild injections with relative ease, regardless of the specific technique used, and without worrying that the event source is trivial to bypass from user mode. Most notably, Microsoft uses the Threat Intelligence ETW provider (TiEtw) for this, though there are vendors who do it better.
Today I wanted to share how easy it is to bypass any memory allocation-based logic. We will also bypass thread initialization alerting, which combined give us a technique undetectable by MDATP and many other EDRs out there, as of today.
It is important to expose detection gaps like this, not only to force security vendors to improve defenses, but primarily to build awareness around inherent limitations of these solutions and the need for in-house security R&D programs, or at least use of well-engineered managed detection services for more complete coverage.
Let's first take a look at what independent evaluations can tell us about process injections, and if there is even anything to bypass.
It's definitely good to know the product you're using is not able to flag Meterpreter's migrate command and process hollowing procedures from a 5+-year-old Carbanak malware available on GitHub, even with prior knowledge of what is going to be tested, and half a year to prepare if needed.
Other than that, the value of the last evaluation in the context of injections is very limited: we are not getting the full picture of how much each vendor invests in researching TTPs relevant now and in the future, or how robust the detection capabilities and data sources really are.
While some EDRs were not able to flag on the elementary techniques, many improved detection capabilities to the point that today, it is not uncommon for process injection to be considered OPSEC-expensive by red teams. Experienced operators tend to tailor detection bypasses per-solution, and in some environments, they choose to avoid injecting altogether, as the very limited set of APIs Windows exposes for memory and thread management are under close surveillance.
We are going to talk about bypassing the mature solutions today - for the ones with T1055 misses, just use APC injection and you'll probably be fine.
Let's first discuss all the detection opportunities for anomalous remote thread creation.
The API getting the most attention has to be kernel32!CreateRemoteThread, but we are really talking about ntdll!NtCreateThreadEx, or the kernel-mode target intercepted through kernel callbacks.
Here we have a basic detection for a specific Windows process - msbuild.exe creating a new thread in a remote process. Even though the criticality of a potential true positive would be quite high, after testing the rule author decided it is only suitable for low severity (probably due to FP-rate), which likely degrades the rule to an IR label/enrichment in most environments.
Such a simple detection rule is unlikely to be part of a mature EDR solution where customers expect to receive alerts for activities like this with high severity while keeping noise down to allow their analysts to review and classify the important stuff.
A more generic, custom MDATP thread creation rule based around the new FileProfile() enrichment function detects extremely rare files creating threads in remote processes. It is very useful to implement in-house, but still unlikely to be found in EDRs in such a simple form, as it would cause substantial amounts of false positives in certain environments, and could prove difficult to maintain.
As an example, Defender logs most remote thread creations as labeled events, but low file prevalence is not good enough of an indicator to trigger an alert, and there is more advanced logic in play - true for most decent EDRs.
CRT events logged by Defender
By "detections" and "alerts" I do not just mean labeled activity that can be found somewhere in the platform, but rather independent pieces of logic able to signal threats with high enough fidelity to generate user-facing security incidents with no additional activity tagged on the endpoint.
(I also assume the platform is not incredibly noisy, to the level of it being unusable)
This is important to remember as EDRs use various kinds of correlation to link otherwise undetected activities to existing incidents initiated by high fidelity alerts, or generate them based on some risk score analysis often affectionately called "AI", making it difficult to judge whether some particular TTP would be detected in isolation. Some types of correlation can be very complex and difficult for adversaries to guess, but due to the high costs associated with preserving active context and using it in detection, time-based correlation plays a role in most.
On-agent detections, activity, and software inventories are often not implemented, or are limited in scope, due to reverse engineering concerns or architectural difficulties.
We will exploit this fact later on when building our shellcode injector by introducing delays in execution as one way to avoid detection. The concept is not new and is commonly used in network attacks where IDS solutions tend to detect based on thresholds.
For the same reason, choosing your EDR vendor based on the numerical results of things like the MITRE evaluation and percentage of coverage is not a good idea. Among other issues, the test rounds are executed in an unrealistically short time window of around 30 minutes for the whole attack kill chain, which means that time-based correlation of labeled events from the host into a single alert is good enough to score 100% coverage.
High fidelity alerts
So we know that even though the number of functions to monitor is limited, the volume of legitimate events poses significant challenges for high fidelity detection, and forces defenders to narrow down what constitutes "suspicious", resulting in heavy filtering or log&ignore of many collected events.
For thread creation, the most common constraint is a thread starting process ≠ hosting process - so monitoring only remote thread creation, usually also limited to those with:
thread start address in an "unbacked" (non-image) MEM_COMMIT-type segment
the size of the segment being larger than X
and at scale this will still generate a very significant number of false positives, which may lead to further filtering, for example:
thread location (target) only in Windows built-in executables
only a subset of these
thread initiator (source) only in risky executables
low file prevalence
risky paths (%userprofile%, %temp% etc.)
not seen on the network/on the host
memory page contains suspicious stuff
Machine learning models are often employed to attempt to solve this issue - the exact assumptions will differ between vendors, but the idea is to tame thread creation volume. The less mature solutions in fact often rely on thread creation hooking/callbacks as the only source of data for injection detection.
While it is true that for the majority of injection techniques a new thread will be created in the target process at some point, how it's created is often unexpected and makes monitoring infeasible, so relying exclusively on ntdll!NtCreateThread(Ex) hooking or thread creation callbacks is nowadays an easily exploitable design flaw.
In the case of process hollowing or thread hijacking, our target thread has already been created legitimately by the Windows Loader or by the target application locally, and thus there is no thread creation event to detect on. This is one of the reasons Cobalt Strike's execute-assembly uses SetThreadContext instead of CRT injection on the sacrificial process.
Once defenders have the telemetry, at scale it's much easier to detect certain SetThreadContext anomalies than CRT injection, and today in many environments it generates high criticality alerts, rendering fork&run useless in stealthy offensive ops.
QueueUserAPC

Asynchronous Procedure Calls provide another avenue for avoiding thread creation. An APC can be queued for an existing thread, and executed once it enters an alertable state.
In recent years userland hooking evasion has been getting a lot of coverage, and Early Bird injection has popularized the use of APCs for that purpose. The idea is to queue an APC in a newly spawned, suspended process before the ntdll!LdrpInitializeProcess function has had a chance to run. That way our scheduled routine is executed before the hooking DLLs are loaded into the target process.
DripLoader is an evasive shellcode loader (injector) for bypassing event-based injection detection, without necessarily suppressing event collection.
The project is aiming to highlight limitations of event-driven injection identification, and show the need for more advanced memory scanning and smarter local agent inventories in EDR.
DripLoader evades EDRs by
using the most risky APIs possible like NtAllocateVirtualMemory and NtCreateThreadEx
blending in with call arguments to create events that vendors are forced to drop or log&ignore due to volume
avoiding multi-event correlation by introducing delays
To bypass any memory allocation based logic we will only commit page-granularity, PageSize-sized pages, which on Windows 10 with a modern processor means 4kB:
this constant found in the SYSTEM_INFO structure tells us the lowest possible size of a VM allocation
since most legitimate remote VM operations work on a single, or a few bytes, 4kB is by far the most prevalent allocation size (>95%), making it extremely challenging to detect on
To accomplish this we need to deal with some inconveniences
we need our shellcode in memory as a contiguous byte sequence, which means we cannot let kernel32!VirtualAllocEx choose the base, as it might reserve memory at an address where the other allocations will not fit
in Windows, any new VM allocation made with kernel32!VirtualAllocEx and similar is rounded up to AllocationGranularity which is another constant found in SYSTEM_INFO and is usually 64kB
for example, if we allocate 4kB of MEM_COMMIT | MEM_RESERVE memory at 0x40000000, the whole 0x40000000-0x40010000 (64kB) region will be unavailable for new allocations
Steps we take
pre-define a list of 64-bit base addresses and VirtualQueryEx the target process to find the first region able to fit our shellcode blob
The pages are also written to and individually reprotected with each run to avoid a large RegionSize of a target memory page in properties of logged VirtualProtectEx events. (TiEtw provides this, and hooks can too).
Creating the thread
Now that we have our shellcode in the remote process we need to initiate its execution.
To do this we will use the NtCreateThreadEx native API, which is the ntdll target of CRT, and hence very commonly called by legitimate software. To bypass any detections we will:
create the new thread from MEM_IMAGE base address
moreover, we use a known-good module loaded by the Windows Loader, ntdll.dll
the location will be patched with a far jmp to our shellcode base at the time of thread creation
Note that we do not need to actually run in a MEM_IMAGE segment, as we only care about the arguments logged in the TiEtw/hook event.
If our shellcode creates a new thread (which would happen for example when using an sRDI beacon.dll), the locally created thread won't be tagged on by most EDRs, but it will no longer have ntdll as its start address, which could get it detected by basic Endpoint Protection, and will get it detected by Get-InjectedThread.