THE ART OF MALWARE C2 SCANNING - HOW TO REVERSE AND EMULATE PROTOCOL OBFUSCATED BY COMPILER
TAKAHIRO HARUYAMA
BINARLY

WHO AM I?
• Takahiro Haruyama (@cci_forensics)
• Principal Security Researcher at Binarly
• Previously Staff Threat Researcher at Carbon Black TAU
• Past Research
  • Scalable RE automation (e.g., hunting vulnerable drivers)
  • Anti-Forensics (e.g., firmware acquisition MitM attack)
  • Malware Analysis (e.g., Internet-wide C2 scanning)
https://twitter.com/cci_forensics
https://speakerdeck.com/takahiro_haruyama

AGENDA
• BACKGROUND
• PEELING HODUR: DEFEATING COMPILER-LEVEL OBFUSCATIONS
• HODUR PROTOCOL REVERSING
• HODUR PROTOCOL EMULATION
• WRAP-UP

BACKGROUND

WHY MALWARE C2 SCANNING?
• IP reputation is not effective for catching fresh C2s
• Internet-wide C2 scanning is beneficial from both detection and threat intel perspectives

HOW MALWARE C2 SCANNING?
• Protocol reversing
  • Identify
    • Data format
    • Encoding/encryption algorithm
• Protocol emulation
  • Develop PoC scanner
  • Validate request/response with fake/real C2

CASE: PLUGX
• Long used, but still many variants in the wild
• Most variants have almost the same C2 protocol except the packet encoding algorithm
• The “Hodur” variants (aka MiniPlug) were obfuscated with multiple methods likely applied at compile time
• EclecticIQ and Check Point reported the latest variants last year, but no one had described the updated C2 protocol details
• I focus on the Hodur de-obfuscations, then briefly explain the protocol reversing and emulation
https://www.welivesecurity.com/2022/03/23/mustang-panda-hodur-old-tricks-new-korplug-variant/
https://jsac.jpcert.or.jp/archive/2023/pdf/JSAC2023_2_LT4.pdf
https://blog.eclecticiq.com/mustang-panda-apt-group-uses-european-commission-themed-lure-to-deliver-plugx-malware
https://research.checkpoint.com/2023/chinese-threat-actors-targeting-europe-in-smugx-campaign/

PEELING HODUR: DEFEATING COMPILER-LEVEL OBFUSCATIONS

CONTROL FLOW FLATTENING
DEFEATING COMPILER-LEVEL
OBFUSCATIONS

WHAT’S CONTROL FLOW FLATTENING?
• Control flow flattening (CFF) transforms a program's control flow to make it much harder to understand, while preserving the original functionality
http://tigress.cs.arizona.edu/transformPage/docs/flatten/index.html
(Figure labels: First Block(s), Control Flow Dispatcher(s), Flattened Blocks)

HOW CFF WORKS
• Control flow dispatchers decide which block to execute next based on a state variable
• The state variable is updated in first/flattened blocks

CONTROL FLOW UNFLATTENING: BASIC STRATEGY
1. Identify control flow dispatchers and state variables
2. Trace back the state variable values from the end of flattened blocks
3. Associate the values with the block IDs
4. Re-order the code flow based on the associations
• I use IDA Pro microcode for the unflattening task
  • Intermediate representation used by Hex-Rays decompiler
  • We can implement the algorithm in the optblock_t callback
https://i.blackhat.com/us-18/Thu-August-9/us-18-Guilfanov-Decompiler-Internals-Microcode-wp.pdf

CONTROL FLOW UNFLATTENING: IDA MICROCODE TOOL HISTORY
• HexRaysDeob (2018)
  • The first implementation breaking CFF
  • Ported to IDAPython by Hex-Rays (2019)
  • Tested on only one binary, so several modified versions were implemented later
    • APT10 ANEL (2019), Emotet (2022)
• D-810 (2020)
  • Effective for not only OLLVM but also Tigress Flatten
  • Works reliably with different binaries
https://github.com/RolfRolles/HexRaysDeob
https://github.com/idapython/pyhexraysdeob
https://blogs.vmware.com/security/2019/02/defeating-compiler-level-obfuscations-used-in-apt10-malware.html
https://news.sophos.com/en-us/2022/05/04/attacking-emotets-control-flow-flattening/
https://gitlab.com/eshard/d810
https://github.com/obfuscator-llvm/obfuscator/wiki
https://tigress.wtf/flatten.html

D-810 ISSUES
• D-810 worked for most functions of the Hodur samples, but some key functions related to the C2 protocol were still flattened
  • Additional CFF settings?
• Two issues
  1. The control flow dispatcher detection failed
  2. The block state variable tracking failed

ISSUE1: CONTROL FLOW DISPATCHER DETECTION FAILURE
• The dispatcher detection algorithm misses dispatchers whose predecessors are conditional jumps on the state variable
• The genmc plugin was useful for troubleshooting
(Figure labels: dispatcher, predecessor)
https://github.com/patois/genmc

ISSUE1: FIX
• I added another dispatcher detection algorithm
  • The algorithm simply guesses the dispatcher block by picking the block with the largest number of predecessors
  • The dispatcher is then validated based on the entropy value of the state variable (only effective for OLLVM)

ISSUE2: BLOCK STATE VARIABLE TRACKING FAILURE
• The state variable tracking fails if the value is assigned in the first blocks
• D-810 only traces in the flattened blocks and doesn’t recognize that the dispatcher has been reached -> loop :(
(Figure labels: Tracking fails, The value is assigned)
D810.emulator - WARNING - Can't evaluate instruction: ..Variable '%var_depend_on_a10_1.4{24}' is not defined
D810.tracker - DEBUG - Computing: ['ebx.4'] for path [8, 22, 44, 45, 46, 47, 48, 49, 50, 8, 9, 35, 36, 109, 110, 111, 112]

ISSUE2: FIX
• The added code detects dispatchers during tracking and resumes the tracking from the end of the first blocks
• The unflattening performance is also improved

MIXED BOOLEAN ARITHMETIC EXPRESSIONS
DEFEATING COMPILER-LEVEL OBFUSCATIONS

• Mixed Boolean Arithmetic (MBA) obfuscation rewrites a simple expression into a complex but semantically equivalent form
(Figure caption: The same encoded string is decoded in different expressions)

SIMPLIFYING MBA EXPRESSIONS
1. Find an obfuscation pattern and hypothesize a simplified form
2. Validate the hypothesis by equivalence checking
  • e.g., using Z3 or Arybo
3. Replace the pattern with the simplified one
$ iarybo 8
In [1]: ~(x ^ ~y) == x ^ y
Out[1]: True
$ ipython
In [1]: import z3
In [2]: x, y = z3.BitVecs("x y", 8)
In [3]: s = z3.SolverFor("QF_BV")
In [4]: s.add((~(x ^ ~y)) != (x ^ y))
In [5]: s.check()
Out[5]: unsat
https://github.com/Z3Prover/z3
https://github.com/quarkslab/arybo

SIMPLIFICATION ON IDA + D-810
• D-810 uses a custom AstNode class to represent an (abstract) microcode instruction
• I could easily define several new replacement patterns
• genmc is useful to show microcode instruction structures
https://github.com/patois/genmc

LIMITATION
• More functions, more complicated patterns :(
• It was difficult to defeat all MBA expressions perfectly
• I only handled interesting patterns, especially those related to the string decoding used by the samples

POLYMORPHIC STACK STRINGS
DEFEATING COMPILER-LEVEL OBFUSCATIONS

STACK STRINGS
• All strings are constructed and decoded in the stack area
• After defeating CFF and MBA expressions, the decoding algorithm was identified
  • enc[i] ^= (i + Const) ^ Const
  • The constant value is different per function

COPYING THE ENCODED STRING BYTES INTO STACK
• Sometimes the Hex-Rays decompiler partially recognizes the copy or only shows the assignments
• For static decoding, we need to
  • Construct the bytes from the assigned variables
  • Detect the length and constant value used in the decoding algorithm
(Figure labels: Length and constant value, Combination of global variable and hard-coded bytes)

VARIOUS ACCESS PATTERNS
(Figure labels: Referencing another variable (enc is decoded), Additional XORs before decoding)
• Defeating MBA expressions is not perfect
• I decided to take an emulation approach

EMULATION ISSUE IN GENERAL
• The Unicorn-based flare-emu library provides users with a flexible interface for scripting emulation tasks on IDA
• The iterateAllPaths API emulates all basic block paths in a function
  • Looked to be useful to de-obfuscate stack strings (e.g., ironstrings)
  • This API emulates only once per basic block
  • I modified the code to reproduce xor loops detected by CAPA
https://github.com/mandiant/flare-emu
https://github.com/unicorn-engine/unicorn
https://github.com/mandiant/flare-ida/tree/master/python/flare/ironstrings
https://github.com/mandiant/capa

EMULATION ISSUE IN THIS SAMPLE
• The flare-emu API takes only one path in CFF functions
  • The code simply tracks basic block successors
  • The search ends when revisiting the CFF dispatchers
• Microcode-based solutions
  • Emulate x86 code in an unflattened microcode block order
  • Extend D-810 microcode emulation functionality
  • I tried both a little bit, but I realized that they are not straightforward :(

SOLUTION
• I utilized another flare-emu API (emulateRange) that emulates the code as is, without changing the code flow
  • Some quick hacks added to flare-emu (e.g., LoadLibrary/GetProcAddress hook, infinite loop detection, etc.)
  • The created script worked for 58% of the tested functions
• I also implemented a script based on the IDA debug hook class (DBG_Hooks) to handle the failed functions
• Not elegant, but the combination covers most strings quickly

SOLUTION (CONT.)
• Both scripts recover argument strings on call instructions in emulation/debugging
  • Information such as the calling convention and argument types is taken through the Hex-Rays decompiler APIs
• The sample dynamically resolves all API addresses except GetProcAddress after decoding the API name strings
  • When an address assignment is detected, the script applies the API function type to the local variable pointer
  • GetTypeSignature() written by Rolf Rolles
https://github.com/RolfRolles/Miscellaneous/blob/d0e6c9a1fccb34bcefed19929a44540693a46f43/PrintTypeSignature.py
(Figure labels: Set type to the local variable by ida_hexrays.modify_user_lvars(); Set type to the operand of the call instruction by ida_nalt.set_op_tinfo())

SOLUTION (CONT.)
• The scripts still don’t cover all strings
• A semi-automatic script handles minor cases individually
  • flare-emu emulateSelection + static decoding

IDA_CALLSTRINGS SCRIPTS
Used Library and API           | Static decoding        | Flare-emu iterateAllPaths                  | Flare-emu emulateRange          | Flare-emu emulateSelection | IDA DBG_Hooks
Automated?                     | Yes                    | Yes                                        | Yes                             | No                         | Yes
Effective for another malware? | No                     | Yes                                        | Yes                             | No                         | Yes
Effective in CFF funcs?        | Yes                    | No                                         | Yes                             | -                          | Yes
API func type set?             | No                     | Yes                                        | Yes                             | No                         | Yes
Limitation                     | Strings used by memcpy | Modifications needed to flare-emu and CAPA | Not all execution paths covered | Manual selection required  | Strings used during debugging

HODUR PROTOCOL REVERSING

PROTOCOL OVERVIEW
• The latest Hodur samples only support HTTP/HTTPS
• Two header values (Sec-Dest/Sec-Site) are used to authenticate clients
• GET request for the initial handshake
  • An RC4 key is returned
• Periodic POST requests to receive C2 commands after the handshake
  • The request/response data are encrypted with the key

AUTHENTICATION HEADERS
• Sec-Dest: %2.2X%ws (e.g., “7BnqmmCg”)
  • A random byte (0x64-0x99)
    • 0x64 + 0-0x35 by QueryPerformanceCounter
  • Six random characters
    • The checksum depends on the method
    • GET = 99, POST = 88
• Sec-Site: %2.2X%2.2X%ws (e.g., “896B2AC144C9E2E09836”)
  • Two random bytes (0x64-0x99)
  • 8-byte victim ID generated by time-related APIs
In [2]: sum(b for b in b'nqmmCg') & 0xff
Out[2]: 99

INITIAL HANDSHAKE
• GET request with the authentication headers
• An RC4 key is returned if the header values are valid
  • If not valid, no content is returned
• The Hodur sample code checks if the Content-Type is application/octet-stream
• The Content-Length was unknown at static analysis but revealed during the scanner development

AFTER HANDSHAKE
• The sample receives a C2 command by POST requests
  • The POST request and response data are encrypted using RC4
• The POST data header is the same as in the PlugX variants, but the head key is not used
• The C2 response body also has the same header

POST DATA PAYLOAD

HODUR SCANNER DEVELOPMENT

FAKE C2 SERVER FOR VALIDATION
• Developed a fake C2 server to validate the request data of the PoC scanner and other recent samples
  • fakenet (IP diverter) + Python HTTPS server
[*] Validating Sec-Dest..
[+] Prefix number 0x95 is valid
[+] The hash of the random bytes b'xbsYpB' matches 88
[*] Validating Sec-Site..
[+] Prefix numbers 0x7f/0x8e is valid
[+] victim_id='F4EB6EF3A8882016'
..
[+] The decrypted POST data is saved as dec_post_data.bin
[*] Responding with PlugX custom header data.. (C2 command = 0x7002)
(Figure caption: POST request validation)
https://github.com/mandiant/flare-fakenet-ng

HUNTING RECENT SAMPLES
• VT-retrohunted using yara_fn
{ 55 8B EC 6A ?? 68 ?? ?? ?? ?? 64 A1 ?? ?? ?? ?? 50 81 EC ?? ?? ?? ?? 53 56 57 A1 ?? ?? ?? ?? 33 C5 50 8D 45 ?? 64 A3 ?? ?? ?? ?? 89 65 ?? 8B 45 ?? 50 8D 8D ?? ?? ?? ?? E8 }
(Figure labels: o_imm, fixup, o_mem, o_displ, o_near)
https://github.com/TakahiroHaruyama/ida_haru/blob/master/fn_fuzzy/yara_fn_7x.py

HUNTING RECENT SAMPLES (CONT.)
• One of the rules hit the latest sample in December last year
  • CFF was not applied to the sample
  • The C2 included in the sample was active :)
  • I could check the Content-Length and the format of the GET response
https://www.virustotal.com/gui/file/510b4c53dc6f5260d15824a97bff2f5def3f01c24cb621058177df7a22faaaf7/detection

APPROACH BASED ON VALIDATION
• All recent samples had exactly the same C2 protocol encryption and data format
  • Every sample’s C2 protocol/port is HTTPS/443
• No need to send the POST request after the handshake
  • The C2 likely responds without content until commands are specified by operators
• I started to implement a scanner that just checks the difference between GET requests with/without the authentication headers

TLS HANDSHAKE ISSUE
• OpenSSL caused an internal error during the TLS handshake
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS header, Unknown (21):
* TLSv1.2 (OUT), TLS alert, internal error (592):
* error:0800006A:elliptic curve routines::point at infinity
* Closing connection 0
curl: (35) error:0800006A:elliptic curve routines::point at infinity

TLS HANDSHAKE ISSUE (CONT.)
• I tested major open source TLS clients
  • Only LibreSSL (pylibtls) worked for the TLS handshake
TLS client                | Tested version       | Worked?
OpenSSL                   | 1.1.1k, 3.0.2, 3.2.0 | No
Mbed TLS (python-mbedtls) | 2.28.6               | No
wolfSSL (wolfssl-py)      | 5.6.0                | No
LibreSSL (pylibtls)       | 3.8.2                | Yes

DETECTION BY THIRD PARTY SCANS
• Shodan hasn't been able to recognize the port since at least last December
• Censys can detect the port, but the protocol is UNKNOWN (not HTTPS)

INTERNET-WIDE SCANNING WORKFLOW
• Automate with Python (use asynchronous I/O for OpenSSL/JARM scans)
• Exclude as many hosts as possible before the pylibtls scan
• ZMap
  • Get the list of hosts open at TCP/443
• OpenSSL
  • Try TLS handshake
  • Cause an internal error?
• JARM
  • Match the JARM fingerprint value of the Hodur C2?
• pylibtls
  • GET request with/without auth headers
  • Get an RC4 key-like string only when sending with the headers?
https://github.com/salesforce/jarm

RESULT
• Two C2 servers were found late last December
  • 149[.]104.12.64 and 45[.]83.236.105
• Two months later, Trend Micro mentioned the C2s in a blog post
  • But they are still active
https://www.trendmicro.com/en_us/research/24/b/earth-preta-campaign-targets-asia-doplugs.html

DEMO
Authentication headers generated. checksum='ygflkF', victim_id='70FA7450D3323310'
45.83.236.105: OpenSSL internal error. Calculating JARM..
149.104.12.64: OpenSSL internal error. Calculating JARM..
45.83.236.105: The JARM value matched with Hodur C2
45.83.236.105:443: No content when sending a query without auth headers
45.83.236.105:443: RC4 key "add4a9424879F0bd6eaal094779F889eb" returned when sending a query with auth headers
45.83.236.105, active, RC4 key = a44a9424879F0bd6eaal094779F889eb
149.104.12.64: The JARM value matched with Hodur C2
149.104.12.64:443: No content when sending a query without auth headers
149.104.12.64:443: RC4 key "adda9424879FObd6eaal094779F889eb" returned when sending a query with auth headers
149.104.12.64, active, RC4 key = a44a9424879F0bd6eaal094779F889eb
new servers found (2 in total)

WRAP-UP
• Defeating compiler-level obfuscations is easier than before
  • 2-3 months for APT10 ANEL -> 3-4 weeks for Hodur
• We still need to improve or create tools when RE requires de-obfuscating code precisely
  • Code will be available online after the conference
• The developed scanner keeps tracking the malware C2s on the Internet
  • We can respond proactively using the intel
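
EXAMPLE: SEC-DEST HEADER (SKETCH)
The Sec-Dest format from the AUTHENTICATION HEADERS slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, Python's random module stands in for the QueryPerformanceCounter-based value, and rejection sampling over ASCII letters is one simple way to hit the checksum, not necessarily what the sample or the scanner does.

```python
import random
import string

# Checksum targets per HTTP method, as listed on the AUTHENTICATION HEADERS slide.
CHECKSUMS = {"GET": 99, "POST": 88}


def checksum(token: bytes) -> int:
    """Sum of the token bytes truncated to 8 bits (same as the slide's one-liner)."""
    return sum(token) & 0xFF


def make_token(method: str, rng: random.Random) -> str:
    """Draw six random letters until the checksum matches the method.

    Assumption: rejection sampling over ASCII letters, chosen for brevity.
    """
    target = CHECKSUMS[method]
    while True:
        token = "".join(rng.choice(string.ascii_letters) for _ in range(6))
        if checksum(token.encode("ascii")) == target:
            return token


def make_sec_dest(method: str, rng: random.Random) -> str:
    """Format Sec-Dest as %2.2X%ws: one prefix byte in 0x64-0x99, then the token."""
    prefix = 0x64 + rng.randint(0, 0x35)  # stand-in for the QueryPerformanceCounter value
    return f"{prefix:02X}{make_token(method, rng)}"


def validate_sec_dest(value: str, method: str) -> bool:
    """Check a Sec-Dest value the way the fake C2 log describes it."""
    if len(value) != 8 or not value.isascii():
        return False
    try:
        prefix = int(value[:2], 16)
    except ValueError:
        return False
    in_range = 0x64 <= prefix <= 0x99
    return in_range and checksum(value[2:].encode("ascii")) == CHECKSUMS[method]
```

For instance, validate_sec_dest("7BnqmmCg", "GET") accepts the example value from the slide, while the same value is rejected for POST because its checksum is 99 rather than 88.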
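
EXAMPLE: STACK STRING DECODING (SKETCH)
The decoding expression from the STACK STRINGS slide, enc[i] ^= (i + Const) ^ Const, can be written as a small Python helper for static decoding once the bytes and the constant are recovered. The function name is hypothetical, and masking the key to 8 bits is an assumption about the operand width.

```python
def decode_stack_string(enc: bytes, const: int) -> bytes:
    """Apply enc[i] ^= (i + const) ^ const to every byte.

    Assumption: the operation works on 8-bit values, so the key byte is
    masked with 0xFF before the XOR.
    """
    return bytes(b ^ (((i + const) ^ const) & 0xFF) for i, b in enumerate(enc))


if __name__ == "__main__":
    # XOR with a position-dependent key is its own inverse, so the same
    # helper also produces test input from a known plaintext.
    const = 0x5A  # hypothetical per-function constant
    encoded = decode_stack_string(b"GetProcAddress", const)
    print(encoded.hex())
    print(decode_stack_string(encoded, const))
```

Because the constant differs per function, a static decoder still has to recover const and the length for each call site, which is the part that motivated the emulation approach.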
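
EXAMPLE: GUESSING THE DISPATCHER BLOCK (SKETCH)
The heuristic from the ISSUE1: FIX slide (pick the block with the largest number of predecessors) can be illustrated on a plain successor map. This is not the D-810 patch and uses no IDA microcode API; the CFG representation, the function name, and the tie-break on the lowest block ID are assumptions made for the example. The entropy-based validation of the state variable is not shown.

```python
from collections import Counter
from typing import Dict, List


def guess_dispatcher(successors: Dict[int, List[int]]) -> int:
    """Return the block ID that has the most distinct predecessors.

    `successors` maps a block ID to the IDs of the blocks it can jump to.
    Ties are broken in favor of the lowest block ID (an arbitrary choice).
    """
    pred_count: Counter = Counter()
    for block, succs in successors.items():
        for succ in set(succs):  # count each predecessor once per target
            pred_count[succ] += 1
    if not pred_count:
        raise ValueError("no edges in the control flow graph")
    return max(pred_count, key=lambda b: (pred_count[b], -b))


if __name__ == "__main__":
    # Toy flattened function: block 0 is the first block, block 1 the
    # dispatcher, blocks 2-4 the flattened blocks, block 5 the exit.
    toy_cfg = {0: [1], 1: [2, 3, 4], 2: [1], 3: [1], 4: [5], 5: []}
    print(guess_dispatcher(toy_cfg))
```

In a flattened function most flattened blocks jump back to the dispatcher, which is why the predecessor count is a usable first guess before any validation.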