Dissecting Smoke Loader | CERT Polska
https://www.cert.pl/en/news/single/dissecting-smoke-loader/

Smoke Loader (also known as Dofoil) is a relatively small, modular bot that is mainly used to drop various malware families. Even though it’s designed to drop other malware, it has some pretty hefty malware-like capabilities of its own. Despite being quite old, it’s still going strong, recently being distributed by RigEK and MalSpam campaigns. In this article we’ll see how Smoke Loader unpacks itself and interacts with the C2 server.

Smoke Loader first surfaced in June 2011, when it was advertised for sale on grabberz.com [1] and xaker.name [2] by a user called SmokeLdr.

[Figure: Smoke Loader being sold on grabberz.com]

What’s interesting is that Smoke Loader is sold only to Russian-language speakers [3]. Since all of its functionality is clearly described in the forum posts mentioned above (up to 2016), there is no point in listing it all here.

The sample we’ll be analysing is d32834d4b087ead2e7a2817db67ba8ca (VirusTotal: https://www.virustotal.com/en/file/20dce650c10545ae85005b3fe159df250c4f1275edfe4439e2d5a2d0515029de/analysis/1524764893/).

[Figure: Diagram presenting the unpacking timeline]

If you’re only interested in the final payload, you can take a quick glance at the diagram above and skip to the final layer.

Layer I
The first thing Smoke Loader hits us with is a simple PECompact2 or UPX compression. As with many executable compressors, both are pretty easy to decompress using publicly accessible software:

[Figure: PECompact being used to decompress the first layer]
[Figure: Decompressing the UPX-packed sample]

That wasn’t hard, let’s move on.
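Before reaching for an unpacker, it can help to confirm which compressor is present. The following is a minimal, standard-library-only sketch (not part of the original analysis) that reads the PE section table and flags the default UPX section names (UPX0, UPX1, ...). It is only a heuristic: it does not detect PECompact2, and a sample whose UPX sections were renamed will slip past it.

```python
import struct
from typing import List


def pe_section_names(data: bytes) -> List[str]:
    """Return the section names of a PE file using only the standard library."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF file header follows the 4-byte "PE\0\0" signature.
    (num_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (opt_header_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
    # The section table starts right after the optional header; entries are 40 bytes.
    table = e_lfanew + 24 + opt_header_size
    names = []
    for i in range(num_sections):
        raw = data[table + 40 * i: table + 40 * i + 8]  # 8-byte, NUL-padded name
        names.append(raw.rstrip(b"\0").decode("ascii", errors="replace"))
    return names


def looks_upx_packed(data: bytes) -> bool:
    """Heuristic: stock UPX names its sections UPX0, UPX1, ... (easily renamed)."""
    return any(name.startswith("UPX") for name in pe_section_names(data))
```

If the check fires, `upx -d <file>` is the usual way to restore the original executable.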
Layer II
[Figure: Entry function, which handles the debugging check and performs some useless API calls as a disguise]

Debugger checks
The PEB structure is checked for signs of a debugger.

Lots of garbage code
Almost every function is injected with pointless instructions in order to make the disassembly look more complicated than it really is.

[Figure: A part of the RC4 function, which contains a lot of useless code]

RC4-encrypted imports
In this stage, almost all import and library names are encrypted with RC4. The encrypted names are first placed on the stack and then decrypted using RC4 with a hardcoded key. Finally, the library name is passed to LoadLibraryA and the function name to GetProcAddress. A custom import table is populated this way and used later in the execution.

Unpacking
Finally, a new process is created and two calls to WriteProcessMemory are performed:

[Figure: The writes are pretty characteristic and can be easily noticed in the Cuckoo report]

One of them writes the MZ header and the other the rest of the binary. If we concatenate these two writes, we get the next layer.

Layer III
We’re welcomed with:

[Figure: The exported start address]

Well, that’s not good. What we see is the result of several obfuscation methods and tricks. We’ll look at each one and try to understand how it works.

Jump chains
Almost all early-executed functions use a chained-jumps obfuscation technique (https://thisissecurity.stormshield.com/2018/03/20/de-obfuscating-jump-chains-with-binary-ninja/). Instead of placing the instructions in a normal, linear manner, the instructions are scattered within the function, with jump instructions connecting consecutive ones.
[Figure: The control flow is all over the place]

If we were to write a script to follow the program’s flow and graph the instructions, we’d probably get something like this:

[Figure: Partially deobfuscated start function. The trace interleaves a handful of real instructions (call $+5 and pop ebx to compute the base address; push 30h, pop eax and mov eax, fs:[eax] to get the PEB; cmp dword ptr [eax+0A4h], 6 to test OSMajorVersion; movzx eax, byte ptr [eax+2] to read BeingDebugged) with a jmp or a jnz/jz pair after almost every one of them]

One can almost immediately see that the vast majority of instructions are used only to divert the natural program flow.

Defeating
Attempt I
We tried creating an idaapi script that looks through all instruction blocks within a function and tries to concatenate blocks that are connected to each other via a 1:1 jump (a jump from exactly one possible address to exactly one possible location).
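The original script only survives as a screenshot, so here is a minimal sketch of the same idea, written against a hypothetical toy instruction model (a dict of address -> (instruction text, branch target, next address)) instead of IDA’s API. It follows the chain from an entry point, drops jumps that merely transfer control, and keeps everything else. It already treats a jnz/jz pair sharing a target as an unconditional jump; genuine conditional branches are kept and only their fall-through path is followed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model, not IDA's API:
# address -> (instruction text, branch target or None, address of the next instruction)
Instr = Tuple[str, Optional[int], int]


def linearize(code: Dict[int, Instr], entry: int,
              max_steps: int = 100_000) -> List[Tuple[int, str]]:
    """Follow a jump chain from `entry` and return only the non-jump instructions."""
    result: List[Tuple[int, str]] = []
    seen = set()
    addr = entry
    for _ in range(max_steps):
        if addr in seen or addr not in code:
            break  # loop detected, or we left the known code
        seen.add(addr)
        text, target, nxt = code[addr]
        mnem = text.split()[0]
        if mnem == "jmp":
            if target is None:
                result.append((addr, text))  # indirect jump: cannot follow statically
                break
            addr = target  # unconditional jump: just follow it
            continue
        if mnem in ("jz", "jnz") and nxt in code:
            text2, target2, _ = code[nxt]
            # "jnz X" directly followed by "jz X" (or vice versa) always reaches X.
            if {mnem, text2.split()[0]} == {"jz", "jnz"} and target2 == target:
                addr = target
                continue
        result.append((addr, text))
        if mnem in ("ret", "retn"):
            break
        addr = nxt  # fall through to the next instruction
    return result
```

Feeding it real data would require extracting branch targets from the disassembler, which is left out here.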
The author had probably anticipated such an approach and implemented the jmp instructions as consecutive jnz and jz instructions to the same target. This doesn’t complicate our solution much, though.

[Figure: A very naive Python script implementing the approach described above]

If we run it on the start function and strip the jumps, we get:

[Figure: The start function after stripping the jumps]

A lot better! But we can do even better by letting IDA do most of the work for us.

Attempt II
The only thing we need to do in order to make IDA recognize these blocks as a valid function is to make sure that all of the jumps are marked as a definitive change of control flow. While jmp instructions are marked as such by default, the jz/jnz instructions need to be patched to jmp instructions:

[Figure: Notice the newly-created dotted line that denotes the end of function code]

This trick allows IDA to recognize function bodies and even attempt to decompile them:

[Figure: Decompiled start function after patching all jz/jnz instructions]

While (as almost always) the decompilation isn’t 100% correct, it gives us a good basic idea of what the function does. This function, for example, loads the PEB structure and then accesses the OSMajorVersion and BeingDebugged fields.

Debugging checks
In this layer, we’ve noticed two debugging checks, conveniently located right at the beginning of execution. While they are the same as in the previous stage, the approach differs slightly: the values read by the checks are used in calculating the addresses of the next functions.

[Figure: Reading the BeingDebugged field from PEB]
[Figure: Reading the NtGlobalFlag field from PEB]

The code calculates the next jump address based on the values of the BeingDebugged and NtGlobalFlag fields. If either one is not equal to 0, the execution jumps to a random invalid place in memory. Harsh. Patching the binary or changing the values mid-debugging still works, though.
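The Attempt II patch (turning each jnz/jz pair into an unconditional jmp) can also be sketched at the byte level. This minimal version assumes both halves of the pair are 2-byte short jumps (opcodes 0x75 and 0x74 with a signed 8-bit displacement) and rewrites only the first opcode to 0xEB (jmp short), so the target is unchanged. A raw byte scan like this can misfire on bytes that happen to sit inside other instructions; inside IDA one would rather walk decoded instructions and write the byte with ida_bytes.patch_byte.

```python
from typing import List

JZ_SHORT, JNZ_SHORT, JMP_SHORT = 0x74, 0x75, 0xEB


def _rel8(b: int) -> int:
    """Interpret a byte as a signed 8-bit displacement."""
    return b - 0x100 if b >= 0x80 else b


def patch_jcc_pairs(code: bytearray, base: int = 0) -> List[int]:
    """Turn 'jnz X; jz X' (or 'jz X; jnz X') short-jump pairs into 'jmp X; jz X'.

    Returns the virtual addresses (base + offset) of the patched opcodes.
    """
    patched: List[int] = []
    i = 0
    while i + 4 <= len(code):
        if {code[i], code[i + 2]} == {JZ_SHORT, JNZ_SHORT}:
            target1 = i + 2 + _rel8(code[i + 1])  # first jump: 2 bytes long
            target2 = i + 4 + _rel8(code[i + 3])  # second jump starts 2 bytes later
            if target1 == target2:
                code[i] = JMP_SHORT  # keep the displacement, so the target is the same
                patched.append(base + i)
                i += 4
                continue
        i += 1
    return patched
```

Once the pair is an explicit jmp, IDA treats it as an end of flow, which is exactly what lets it build the function bodies.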
Virtualization checks
The binary tries to get a handle to the “sbiedll” module (a library that Sandboxie loads into sandboxed processes) using GetModuleHandleA. If it succeeds, meaning the sample is running inside Sandboxie, the program exits.

The registry key System\CurrentControlSet\Services\Disk\Enum is also checked, and if the returned string contains any of the following substrings, the program exits:
qemu
virtio
vmware
vbox
xen

Function body encryption
The vast majority of functions are encrypted:

[Figure: A function that is partially encrypted]

After deobfuscation, the encryption function turns out to be pretty simple:

[Figure: Decompiled code decryption method]

It accepts an address and a number of bytes in the eax and ecx registers respectively, and XORs all bytes in that range with a hardcoded byte. What’s also interesting is that the binary tries to keep as little code unencrypted at a time as possible:

[Figure: Example of keeping the code encrypted]

We’re able to decrypt the chunks using an idaapi patching script:

[Figure: Simple idaapi script that XORs a given region with a byte]

Assembly tricks
This layer employs a few neat position-independent-code assembly tricks.

Assembly Trick I
call loc_4024A7: puts the address of the next instruction (in this case the string “kernel32”) onto the stack and jumps over the data to the code.
pop esi: puts the string’s address into the esi register.
cmp byte ptr [esi], 0: the pointer can now be used like a normal .rdata string.

Assembly Trick II
Instead of executing jmp eax, eax is first pushed onto the stack and then retn is executed.

Assembly Trick III
call $+5 jumps to the next instruction (the call instruction is 5 bytes long), but because it’s a call, it also pushes that address onto the stack. In this case it is used to calculate the program’s base address (0x004023AA - 0x23AA = 0x00400000).

Custom imports
This stage uses a custom import table built with a djb2 hash lookup (https://gist.github.com/lmas/664afa94f922c1e58d5c3d73aed98f3f).
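Two small helpers make this layer’s tricks concrete: the single-byte XOR applied to function bodies and the djb2 hash used for the import lookup. This is a minimal sketch, not the original idaapi script: it works on a plain bytearray, the key value is up to the caller (the sample’s hardcoded byte is not reproduced here), and it assumes the classic 32-bit djb2 over the raw ASCII export name, without case folding or a terminating NUL. To patch an IDB instead, the same loop could read and write bytes with ida_bytes.get_byte and ida_bytes.patch_byte.

```python
from typing import Dict, Iterable, List


def xor_region(buf: bytearray, start: int, size: int, key: int) -> None:
    """XOR `size` bytes of `buf` starting at `start` with a single-byte key, in place.

    Mirrors the routine that takes the address in eax and the count in ecx.
    XOR is its own inverse, so the same call decrypts and re-encrypts.
    """
    for i in range(start, start + size):
        buf[i] ^= key


def djb2(name: bytes) -> int:
    """Classic 32-bit djb2: start from 5381, then h = h * 33 + c for each byte."""
    h = 5381
    for c in name:
        h = (h * 33 + c) & 0xFFFFFFFF
    return h


def resolve_by_hash(export_names: Iterable[str], wanted: List[int]) -> Dict[int, str]:
    """Map each wanted hash to the export name that produces it (unmatched hashes are skipped)."""
    by_hash = {djb2(n.encode("ascii")): n for n in export_names}
    return {h: by_hash[h] for h in wanted if h in by_hash}
```

The malware itself stores function addresses rather than names; mapping hashes back to names is simply the analyst’s view of the same lookup.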
The stage first iterates over 4 hardcoded library names, loads each one using LdrLoadDll and stores the handle.

Next, it iterates over the 4 corresponding arrays of import hashes and looks for matching values. When a match is found, it grabs the function’s address from the library thunk and stores it in an API table that is kept on the stack.

[Figure: Hashes of functions to be imported]
[Figure: Constructed API function table]

Unpacking
Finally, the program uses RtlDecompressBuffer with COMPRESSION_FORMAT_LZNT1 to decompress the buffer and executes the final payload using PROPagate injection [4].

Layer IV (final)
String encryption
All strings are encrypted using RC4 with a hardcoded key:

[Figure: Function used to get a decrypted string from a specific index in the encrypted blob]
[Figure: Structure of the encrypted strings blob]

In this sample, the buffer decrypts to:

[Figure: Decrypted strings]

C2 URLs
C2 URLs are stored encrypted in the data section:

[Figure: Part of the data section that contains the encrypted URLs]

The encrypted URL structure can be represented as:

[Figure: Encrypted C2 URL structure]

The encryption method is a simple XOR routine, with the byte key being derived from a dword key:

[Figure: Decompiled function used to decrypt C2 URLs]

It can be rewritten in Python as:

[Figure: The C2 URL decryption routine rewritten in Python, with an output example]

Packet structure
[Figure: Decompiled function used to pack and send command packets]

It can be represented as a C structure:

[Figure: A struct representing the structure of the command packet]

Packet encryption is done using RC4 yet again.
It’s worth noting, however, that different keys are used for encrypting the outbound packets and decrypting the inbound ones:

[Figure: A part of the decompiled function responsible for encrypting packets before sending them to the C2]
[Figure: A part of the decompiled function responsible for decrypting packets before parsing them]

Program routine
The binary starts by building an Internet Explorer User-Agent string, using the IE version obtained from the svcVersion and Version values of the registry key Software\Microsoft\Internet Explorer. The obtained User-Agent is used in later HTTP requests.

Next, it repeatedly tries to connect to http://www.msftncsi.com/ncsi.txt until it gets a response; this way it makes sure that the machine is connected to the internet.

Finally, Smoke Loader begins its communication routine by sending a 10001 packet to the C&C. It gets a response with a list of plugins to be installed and the number of tasks to be fetched. The bot iterates over the task range and tries to get each task by sending a 10002 packet with the task number as an argument. The task payload is often not hosted on the C&C server but on a different host, in which case a Location header with the real binary URL is returned instead. Upon execution of the task, a 10003 packet is sent back with arg_1 equal to the task number and arg_2 equal to 1 if the task executed successfully.
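To make the message sequence and the two-key RC4 scheme concrete, here is a minimal replay of the exchange against a caller-supplied transport, for example a fake server in an analysis sandbox. Everything except the command numbers, the arg_1/arg_2 semantics and the use of RC4 with separate outbound and inbound keys is an assumption: the keys are placeholders, the packet is reduced to a hypothetical (command, arg_1, arg_2) triple instead of the real C structure, the task count is assumed to sit in the first dword of the 10001 reply, tasks are assumed to be numbered from 1, and arg_2 = 0 on failure is a guess. The same RC4 routine also covers the Layer II imports and the Layer IV strings.

```python
import struct
from typing import Callable, List


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4 (KSA + PRGA); encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


# Hypothetical placeholder keys: the sample uses two different hardcoded keys.
OUTBOUND_KEY = b"example-outbound-key"
INBOUND_KEY = b"example-inbound-key"

CMD_HELLO, CMD_GET_TASK, CMD_TASK_RESULT = 10001, 10002, 10003


def pack_command(cmd: int, arg_1: int = 0, arg_2: int = 0) -> bytes:
    """Hypothetical minimal packet; the real structure carries more fields."""
    return rc4(OUTBOUND_KEY, struct.pack("<HII", cmd, arg_1, arg_2))


def run_session(post: Callable[[bytes], bytes],
                run_task: Callable[[bytes], bool]) -> List[int]:
    """Replay 10001 -> 10002 -> 10003 and return the task numbers that succeeded."""
    hello = rc4(INBOUND_KEY, post(pack_command(CMD_HELLO)))
    # Assumption: task count in the first dword of the reply (plugin list ignored).
    (task_count,) = struct.unpack_from("<I", hello, 0)
    succeeded = []
    for task in range(1, task_count + 1):  # assumed 1-based task numbering
        payload = rc4(INBOUND_KEY, post(pack_command(CMD_GET_TASK, task)))
        ok = run_task(payload)
        # arg_1 = task number, arg_2 = 1 on success (0 on failure is assumed)
        post(pack_command(CMD_TASK_RESULT, task, 1 if ok else 0))
        if ok:
            succeeded.append(task)
    return succeeded
```

Keeping `post` abstract separates the protocol logic from HTTP; in the real traffic the C2 answers these POSTs with HTTP 404 responses that still carry data.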
[Figure: Graph representation of the communication between the bot and the C2]

General IOCs
The program copies itself to %APPDATA%\Microsoft\Windows\[a-z]{8}\[a-z]{8}.exe
The program creates a shortcut to itself in %APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\[a-z]{8}.lnk
Performs a System\CurrentControlSet\Services\Disk\Enum\0 registry query
GET requests to http://www.msftncsi.com/ncsi.txt
POST requests with HTTP 404 responses that include data

Example request and response:

[Figure: Example request and response]

Yara rule:
Collected IOCs
Malware configs:
Hashes:

References
[1] https://grabberz.com/showthread.php?t=29680
[2] https://web.archive.org/web/20160419010008/http://xaker.name/threads/22008/
[3] http://stopmalvertising.com/rootkits/analysis-of-smoke-loader.html
[4] http://www.hexacorn.com/blog/2017/10/26/propagate-a-new-code-injection-trick/
[5] https://blog.malwarebytes.com/threat-analysis/2016/08/smoke-loader-downloader-with-a-smokescreen-still-alive/