# Reverse Engineering a Cobalt Strike Dropper With Binary Ninja

**[binary.ninja/2022/07/22/reverse-engineering-cobalt-strike.html](https://binary.ninja/2022/07/22/reverse-engineering-cobalt-strike.html)**

In this blog post, I will explain how I reverse engineered a Cobalt Strike dropper and obtained its payload. The payload is a custom executable file format based on the DLL format. The dropper decrypts, loads, and executes the payload. Initially, I thought the payload could not be a PE executable at all, but I gradually realized it was. Much of the effort was spent on fixing the file so it could be loaded by Binary Ninja for further analysis.

## First Impressions

A friend of mine shared with me [this sample](https://binary.ninja/blog/images/cobaltstrike/U7F16J72.zip) (zip password: infected). It is an x86 PE binary that is 284 kB in size. After loading it into Binary Ninja, I saw it was not packed or encrypted by any well-known packer or protector. However, only a few dozen functions were recognized, which is quite a small number relative to its size. This suggested the sample was packed by a custom packer/encryptor.

As is routine for malware analysis, I started by executing the sample in an online sandbox. In this case, I used [Triage](https://tria.ge/). The sample executed fine in the sandbox and was recognized as `cobaltstrike`.

Then, I uploaded the sample to [UnpacMe](https://www.unpac.me/) to see if it could be unpacked automatically. UnpacMe also processed the sample and recognized it as Cobalt Strike, but the unpacked artifact did not make any sense.

At this point, I realized I wasn’t going to get much further without analyzing the sample with Binary Ninja to see how it worked.

## Thread and Pipe

The sample seemed to be compiler-generated and not obfuscated, so I decided to mainly analyze the sample in HLIL. Viewing code in HLIL can often speed up analysis.
However, for handwritten or obfuscated code, I prefer to look at the disassembly, which offers a closer view of what is happening. Binary Ninja now supports split views, so we can conveniently view HLIL and disassembly side-by-side.

The `main` function is rather short. The first function call is part of the runtime and does some initialization, which we can ignore. The next function creates a new thread, which we will analyze later. Then it enters a loop that calls `Sleep(10000)` indefinitely.

As a note, the sample is stripped, so it does not contain any function or variable names (except the Windows API imports). All names in the following screenshots were recovered or created during reverse engineering.

The `create_thread` function is also not complex. It formats a string using values derived from `GetTickCount`, probably to make it random and avoid conflicts. This string is later used as the name of a pipe. Then it creates a new thread by calling `CreateThread`.

The `thread_proc` function pushes two arguments onto the stack and then calls `write_into_pipe`.

The `write_into_pipe` function creates a named pipe using the randomized string, connects to it, and writes the buffer into it. I quickly noticed that `size_of_data` is huge: `0x33400` bytes. Almost the entire sample is made up of this huge buffer. This suggested the buffer was encrypted or compressed, and that the dozens of functions we see merely restore it to its original content. Typically, at the end, execution is handed to the decrypted/decompressed buffer.

At this point, we are only seeing the data being written into the named pipe. We cannot see how it is being accessed.

## Decrypting the Buffer

After browsing the code, I realized that there was a function call at the end of `create_thread` that I had originally ignored.

This function first uses `malloc` to allocate a buffer of the same size as the data written into the named pipe.
It then loops and reads the content of the buffer. At the end, it decrypts the code and executes it.

The decryption function first calls `VirtualAlloc` to allocate a buffer and sets its permission to `PAGE_READWRITE`. Then, it XORs the content with a four-byte hard-coded key. The key is `72432a9c` in this case. Near the end of the function, it sets the permission of the buffer to `PAGE_EXECUTE_READ`. Finally, it creates another thread, which just jumps to its first argument. The address of the buffer is passed as the first argument, so this starts execution from the beginning of the buffer. The code could, of course, have used the address of the buffer as the entry point of the thread. However, that might cause anti-virus software to detect it, so it uses this small trick to disguise it.

So, in order to analyze the code of the payload, I needed to first decrypt the buffer by XORing it with the four-byte key. There are two ways to do this. The first is to select the buffer, right-click, and then click `Transform -> XOR`. This is not very convenient in this case, as the input buffer is huge and selecting it with a precise size is not easy. The second way is to use the Python API, which is what I did:

```
# Run in the Binary Ninja Python console, where `bv` is the current view
data = bv.read(0x403014, 0x33400)
xor = Transform['XOR']
# XOR the buffer with the four-byte key and write it back in place
output = xor.encode(data, {'key': b'\x72\x43\x2a\x9c'})
bv.write(0x403014, output)
```

Before I discuss analyzing the code in this buffer, there was a function that I initially did not quite understand. See the name I gave it, `preparation`? I guessed it was doing some final preparation before executing the buffer. The HLIL for the function was also not very easy to read. However, after switching to disassembly and reading the instructions one by one, there came an “A-ha!” moment.

This function first tests whether two signed DWORDs are positive. If both of them are larger than 0, they are treated as offsets into the buffer.
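
As a rough illustration of that check (not the sample's actual code), the sketch below reads two little-endian signed DWORDs and only treats them as offsets when both are positive. The helper name `read_patch_offsets` and the assumption that the two values are stored back to back are hypothetical; `0x7c71` and `0x7c78` are the offsets seen in this sample.

```python
import struct


def read_patch_offsets(raw: bytes):
    """Return (first, second) if both signed DWORDs are positive, else None.

    Hypothetical helper: assumes the two values are stored back to back
    as little-endian 32-bit integers at the start of `raw`.
    """
    # '<ii' = two little-endian *signed* 32-bit integers
    first, second = struct.unpack_from('<ii', raw, 0)
    if first > 0 and second > 0:
        return first, second
    return None


# Both values positive: they are accepted as offsets into the buffer
offsets = read_patch_offsets(struct.pack('<ii', 0x7c71, 0x7c78))

# 0xFFFFFFFF is -1 when read as a signed DWORD, so the pair is rejected
rejected = read_patch_offsets(struct.pack('<iI', 0x7c71, 0xFFFFFFFF))
```

Reading the values as signed is what makes the test meaningful: an unsigned `0xFFFFFFFF` would look like a very large offset, while the signed interpretation rejects it.
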
The code takes the addresses of the functions `GetModuleHandleA` and `GetProcAddress` and writes them at the given offsets. In other words, it does the following:

```
*(uint32_t*)(buffer + 0x7c71) = GetModuleHandleA;
*(uint32_t*)(buffer + 0x7c78) = GetProcAddress;
```

Why would the code write the addresses of these two functions into the middle of the buffer? Well, it is passing the function pointers into the payload so that the payload can use them. This is a clever trick because the author does not have to use other (more complex) techniques to obtain these values, while maintaining a low footprint in the eyes of AV software.

Viewing the original content at those offsets confirms my guess: the original values at the two offsets are `0x41414141` and `0x42424242`, which are obviously placeholder values. We can fix the values by writing the actual addresses of the two functions here. This can be done by hand, or using the following Python code:

```
addr = bv.get_symbols_by_name('GetModuleHandleA')[0].address
bv.write(0x403014 + 0x7c71, struct.pack('