## THE ART OF MALWARE C2 SCANNING - HOW TO REVERSE AND EMULATE PROTOCOL OBFUSCATED BY COMPILER

TAKAHIRO HARUYAMA
BINARLY

-----

## WHO AM I?

- Takahiro Haruyama (@cci_forensics)
  - Principal Security Researcher at Binarly
  - Previously Staff Threat Researcher at Carbon Black TAU
- Past Research
  - Scalable RE automation (e.g., hunting vulnerable drivers)
  - Anti-Forensics (e.g., firmware acquisition MitM attack)
  - Malware Analysis (e.g., Internet-wide C2 scanning)

-----

## AGENDA

- BACKGROUND
- PEELING HODUR: DEFEATING COMPILER-LEVEL OBFUSCATIONS
- HODUR PROTOCOL REVERSING
- HODUR PROTOCOL EMULATION
- WRAP-UP

-----

# BACKGROUND

-----

## WHY MALWARE C2 SCANNING?

- IP reputation is not effective for catching fresh C2s
- Internet-wide C2 scanning is beneficial from both detection and threat intel perspectives

-----

## HOW MALWARE C2 SCANNING?

-----

## CASE: PLUGX

- Long used, but still many variants in the wild
  - Most variants have almost the same C2 protocol except for the packet encoding algorithm
- The “Hodur” variants (aka MiniPlug) were obfuscated with multiple methods likely applied at compile time
  - EclecticIQ and Check Point reported the latest variants last year, but no one had described the updated C2 protocol details
- I focus on the Hodur de-obfuscations, then briefly explain the protocol reversing and emulation

-----

# PEELING HODUR: DEFEATING COMPILER-LEVEL OBFUSCATIONS

-----

# CONTROL FLOW FLATTENING

DEFEATING COMPILER-LEVEL OBFUSCATIONS

-----

## WHAT’S CONTROL FLOW FLATTENING?
- Control flow flattening (CFF) transforms a program's control flow to make it much harder to understand, while preserving the original functionality

[Figure: a flattened function consisting of First Block(s), Control Flow Dispatcher(s), and Flattened Blocks]

-----

## HOW CFF WORKS

- Control flow dispatchers decide which block to execute next based on a state variable
- The state variable is updated in the first/flattened blocks

-----

## CONTROL FLOW UNFLATTENING: BASIC STRATEGY

1. Identify control flow dispatchers and state variables
2. Trace back the state variable values from the end of flattened blocks
3. Associate the values with the block IDs
4. Re-order the code flow based on the associations

- I use IDA Pro microcode for the unflattening task
  - Intermediate representation used by the Hex-Rays decompiler
  - We can implement the algorithm in the optblock_t callback

-----

## CONTROL FLOW UNFLATTENING: IDA MICROCODE TOOL HISTORY

- HexRaysDeob (2018)
  - The first implementation breaking CFF
  - Ported to IDAPython by Hex-Rays (2019)
  - Tested on only one binary, so modified versions were implemented for other malware
    - APT10 ANEL (2019), Emotet (2022)
- D-810 (2020)
  - Effective for not only OLLVM but also Tigress Flatten
  - Works reliably with different binaries

-----

## D-810 ISSUES

- D-810 worked for most functions of the Hodur samples, but some key functions related to the C2 protocol were still flattened
  - Additional CFF settings?
- Two issues
  1. The control flow dispatcher detections failed
  2. The block state variable tracking failed

-----

## ISSUE1: CONTROL FLOW DISPATCHER DETECTION FAILURE

- The dispatcher detection algorithm misses dispatchers whose predecessors are conditional jumps on the state variable
- The genmc plugin was useful for troubleshooting

-----

## ISSUE1: FIX

- I added another dispatcher detection algorithm
  - The algorithm simply guesses a dispatcher block based on the biggest number of predecessors
  - The dispatcher is then validated based on the entropy value of the state variable (only effective for OLLVM)

-----

## ISSUE2: BLOCK STATE VARIABLE TRACKING FAILURE

- The state variable tracking fails if the value is assigned in the first blocks
  - D-810 only traces within the flattened blocks and doesn’t recognize that the dispatcher has been reached -> loop :(

[Figure: microcode in which the state variable value is assigned in a first block; D-810 logs a WARNING and the tracking fails]

-----

## ISSUE2: FIX

- The added code detects dispatchers during tracking and resumes the tracking from the end of the first blocks
  - The unflattening performance is also improved

-----

# MIXED BOOLEAN ARITHMETIC EXPRESSIONS

DEFEATING COMPILER-LEVEL OBFUSCATIONS

-----

- Mixed Boolean Arithmetic (MBA) expressions transform a simple expression into a complex but semantically equivalent form

[Figure: the same encoded string is decoded in different expressions]

-----

## SIMPLIFYING MBA EXPRESSIONS

1. Find an obfuscation pattern and hypothesize a simplification
2. Validate the hypothesis by equivalence checking
   - e.g., using Z3 or Arybo
3. Replace the pattern with the simplified one

    $ ipython
    In [1]: import z3
    In [2]: x, y = z3.BitVecs("x y", 8)
    In [3]: s = z3.SolverFor("QF_BV")
    In [4]: s.add((~(x ^ ~y)) != (x ^ y))
    In [5]: s.check()
    Out[5]: unsat

-----

## SIMPLIFICATION ON IDA + D-810

- D-810 uses a custom AstNode class to represent an (abstract) microcode instruction
  - I could easily define several new replacement patterns
  - genmc is useful to show microcode instruction structures

-----

## LIMITATION

- More functions, more complicated patterns :(
  - It was difficult to defeat all MBA expressions perfectly
- I only handled interesting patterns, especially those related to the string decoding used by the samples

-----

# POLYMORPHIC STACK STRINGS

DEFEATING COMPILER-LEVEL OBFUSCATIONS

-----

## STACK STRINGS

- All strings are constructed and decoded in the stack area
- After defeating CFF and MBA expressions, the decoding algorithm was identified
  - enc[i] ^= (i + Const) ^ Const
  - The constant value is different per function

-----

## COPYING THE ENCODED STRING BYTES INTO STACK

- Sometimes the Hex-Rays decompiler partially recognizes the copy or only shows the assignments
- For static decoding, we need to
  - Construct the bytes from the assigned variables
  - Detect the length and constant value used in the decoding algorithm

[Figure: decompiled code in which the encoded bytes are a combination of a global variable and hard-coded bytes]

-----

## VARIOUS ACCESS PATTERNS

[Figure: decompiled code showing two patterns: referencing another variable (enc is decoded), and additional XORs before decoding]

-----

## EMULATION ISSUE IN GENERAL

- The Unicorn-based flare-emu library provides users with a flexible interface for scripting emulation tasks on IDA
- The iterateAllPaths API emulates all basic block paths in a function
  - It looked useful for de-obfuscating stack strings (e.g., ironstrings)
  - However, this API emulates each basic block only once
  - I modified the code to reproduce xor loops detected by CAPA

-----

## EMULATION ISSUE IN THIS SAMPLE

- The flare-emu API takes only one path in CFF functions
  - The code simply tracks basic block successors
  - The search ends when revisiting the CFF dispatchers
- Microcode-based solutions
  - Emulate x86 code in an unflattened microcode block order
  - Extend D-810 microcode emulation functionality
- I tried both a little bit, but I realized that they are not straightforward :(

-----

## SOLUTION

- I utilized another flare-emu API (emulateRange) that emulates the code as is, without changing the code flow
  - Some quick hacks added to flare-emu (e.g., LoadLibrary/GetProcAddress hook, infinite loop detection, etc.)
  - The created script worked for 58% of the tested functions
- I also implemented a script based on the IDA debug hook class (DBG_Hooks) to handle the failed functions
  - Not elegant, but the combination covers most strings quickly

-----

## SOLUTION (CONT.)
- Both scripts recover argument strings on call instructions in emulation/debugging
  - Information such as the calling convention and argument types is obtained through the Hex-Rays decompiler APIs
- The sample dynamically resolves all API addresses except GetProcAddress after decoding the API name strings
  - When an address assignment is detected, the script applies the API function type to the local variable pointer
  - GetTypeSignature() written by Rolf Rolles

[Figure: the type is set to the operand of the call instruction by ida_nalt.set_op_tinfo()]

-----

## SOLUTION (CONT.)

- The scripts still don’t cover all strings
- A semi-automatic script handles minor cases individually
  - flare-emu emulateSelection + static decoding

-----

## IDA_CALLSTRINGS SCRIPTS

|Used Library and API|Static decoding|Flare-emu iterateAllPaths|Flare-emu emulateRange|Flare-emu emulateSelection|IDA DBG_Hooks|
|---|---|---|---|---|---|
|Automated?|Yes|Yes|Yes|No|Yes|
|Effective for another malware?|No|Yes|Yes|No|Yes|
|Effective in CFF funcs?|Yes|No|Yes|-|Yes|
|API func type set?|No|Yes|Yes|No|Yes|
|Limitation|Strings used by memcpy|Modifications needed to flare-emu and|All execution paths not covered|Manual selection required|Strings used during debugging|

-----

# HODUR PROTOCOL REVERSING

-----

## PROTOCOL OVERVIEW

- The latest Hodur samples only support HTTP/HTTPS
- Two header values (Sec-Dest/Sec-Site) are used to authenticate clients
- GET request for the initial handshake
  - An RC4 key is returned
- Periodic POST requests to receive C2 commands after the handshake
  - The request/response data are encrypted with the key

-----

## AUTHENTICATION HEADERS

- Sec-Dest: %2.2X%ws (e.g., “7BnqmmCg”)
  - A random byte (0x64-0x99)
    - 0x64 + 0-0x35 by QueryPerformanceCounter
  - Six random characters
    - The checksum depends on the method
    - GET = 99, POST = 88
- Sec-Site: %2.2X%2.2X%ws (e.g., “896B2AC144C9E2E09836”)
  - Two random bytes (0x64-0x99)
  - 8-byte victim ID generated by time-related APIs

    In [2]: sum(b for b in b'nqmmCg') & 0xff
    Out[2]: 99

-----

## INITIAL HANDSHAKE

- GET request with the authentication headers
- An RC4 key is returned if the header values are valid
  - If not valid, no content is returned
  - The Hodur sample code checks if the Content-Type is application/octet-stream
  - The Content-Length was unknown at static analysis but was revealed during the scanner development

-----

## AFTER HANDSHAKE

- The sample receives a C2 command by POST requests
- The POST request and response data are encrypted using RC4
  - The POST data header is the same as the PlugX variants, but the head key is not used
  - The C2 response body also has the same header

-----

## POST DATA PAYLOAD

-----

# HODUR SCANNER DEVELOPMENT

-----

## FAKE C2 SERVER FOR VALIDATION

- Developed a fake C2 server to validate the request data of the PoC scanner and other recent samples
  - fakenet (IP diverter) + Python HTTPS server

    [*] Validating Sec-Dest..
    [+] Prefix number 0x95 is valid
    [+] The hash of the random bytes b'xbsYpB' matches 88
    [*] Validating Sec-Site..
    [+] Prefix numbers 0x7f/0x8e is valid
    [+] victim_id='F4EB6EF3A8882016'
    ..
    [+] The decrypted POST data is saved as dec_post_data.bin
    [*] Responding with PlugX custom header data.. (C2 command = 0x7002)

-----

## HUNTING RECENT SAMPLES

- VT-retrohunted using yara_fn

[Figure: disassembly annotated with the operand types o_imm, fixup, o_mem, o_displ, and o_near]

    { 55 8B EC 6A ?? 68 ?? ?? ?? ?? 64 A1 ?? ?? ?? ?? 50 81 EC ?? ?? ?? ?? 53 56 57 A1 ?? ?? ?? ?? 33 C5 50 8D 45 ?? 64 A3 ?? ?? ?? ?? 89 65 ??

-----

## HUNTING RECENT SAMPLES (CONT.)
- One of the rules hit the latest sample in December last year
  - CFF was not applied to the sample
- The C2 included in the sample was active :)
  - I could check the Content-Length and the format of the GET response

-----

## APPROACH BASED ON VALIDATION

- All recent samples had exactly the same C2 protocol encryption and data format
  - Every sample’s C2 protocol/port is HTTPS/443
- No need to send the POST request after the handshake
  - The C2 likely responds without content until commands are specified by operators
- I started to implement a scanner that just checks the difference between GET requests with/without the authentication headers

-----

## TLS HANDSHAKE ISSUE

- OpenSSL caused an internal error during the TLS handshake

    * TLSv1.0 (OUT), TLS header, Certificate Status (22):
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.2 (IN), TLS header, Certificate Status (22):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.2 (IN), TLS handshake, Certificate (11):
    * TLSv1.2 (IN), TLS handshake, Server key exchange (12):
    * TLSv1.2 (IN), TLS handshake, Server finished (14):
    * TLSv1.2 (OUT), TLS header, Unknown (21):
    * TLSv1.2 (OUT), TLS alert, internal error (592):
    * error:0800006A:elliptic curve routines::point at infinity
    * Closing connection 0
    curl: (35) error:0800006A:elliptic curve routines::point at infinity

-----

## TLS HANDSHAKE ISSUE (CONT.)

- I tested major open source TLS clients
  - Only LibreSSL (pylibtls) worked for the TLS handshake

|TLS library|OpenSSL|Mbed TLS (python-mbedtls)|wolfSSL (wolfssl-py)|LibreSSL (pylibtls)|
|---|---|---|---|---|
|Tested version|1.1.1k, 3.0.2, 3.2.0|2.28.6|5.6.0|3.8.2|
|Worked?|No|No|No|Yes|

-----

## DETECTION BY THIRD PARTY SCANS

- Shodan hasn't been able to recognize the port since at least last December
- Censys can detect the port, but the protocol is UNKNOWN (not HTTPS)

-----

## INTERNET-WIDE SCANNING WORKFLOW

- Automate with Python (use asynchronous I/O for OpenSSL/JARM scans)
- Exclude as many hosts as possible before the pylibtls scan

1. Get the list of hosts open at TCP/443
2. Try a TLS handshake
   - Does it cause an internal error?
3. Match the JARM fingerprint value of the Hodur C2?
4. GET request with/without auth headers
   - Is an RC4 key-like string returned only when sending with the headers?

-----

## RESULT

- Two C2 servers were found late last December
  - 149[.]104.12.64 and 45[.]83.236.105
- Two months later, Trend Micro referred to the C2s in its blog
  - But they are still active

-----

## DEMO

-----

# WRAP-UP

-----

## WRAP-UP

- Defeating compiler-level obfuscations is easier than before
  - 2-3 months for APT10 ANEL -> 3-4 weeks for Hodur
  - We still need to improve or create tools when RE requires de-obfuscating code precisely
  - Code will be available online after the conference
- The developed scanner keeps tracking the malware C2s on the Internet
  - We can respond proactively using the intel

-----
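The "ISSUE1: FIX" slide describes guessing the dispatcher as the block with the biggest number of predecessors. A minimal sketch of that heuristic on a toy control flow graph follows. The graph representation (a dict of block ID to successor IDs) and the function name `guess_dispatcher` are assumptions for illustration, not D-810 or IDA microcode APIs, and the entropy-based validation step is not modeled.

```python
from collections import Counter
from typing import Dict, List


def guess_dispatcher(successors: Dict[int, List[int]]) -> int:
    """Return the block ID that has the most distinct predecessors.

    `successors` maps a block ID to the IDs of the blocks it can jump to.
    This is a toy stand-in for a microcode block array (assumption).
    """
    pred_count: Counter = Counter()
    for block, succs in successors.items():
        for succ in set(succs):  # count each predecessor edge once
            pred_count[succ] += 1
    # Flattened blocks jump back to the dispatcher, so it collects the most edges.
    return max(pred_count, key=pred_count.get)


# Toy flattened function: block 0 is the first block, block 1 the dispatcher,
# blocks 2-4 are flattened blocks, block 5 is the exit.
toy_cfg = {0: [1], 1: [2, 3, 4], 2: [1], 3: [1], 4: [5], 5: []}
dispatcher = guess_dispatcher(toy_cfg)
```

In a real function the guess can be wrong (for example, a shared error handler may also have many predecessors), which is why the slide adds a validation step afterwards.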
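The "SIMPLIFYING MBA EXPRESSIONS" slide validates a hypothesis with Z3. When the operands are only 8 bits wide, the same hypothesis can also be checked without a solver by trying every input pair. The sketch below uses that exhaustive approach, which is a swapped-in technique and not the method shown on the slide; the helper name `equivalent_8bit` is hypothetical.

```python
from typing import Callable


def equivalent_8bit(f: Callable[[int, int], int], g: Callable[[int, int], int]) -> bool:
    """Exhaustively compare two 2-operand expressions over all 8-bit inputs."""
    return all(
        (f(x, y) & 0xFF) == (g(x, y) & 0xFF)  # mask to emulate 8-bit registers
        for x in range(256)
        for y in range(256)
    )


# Hypothesis from the slide: ~(x ^ ~y) simplifies to x ^ y
hypothesis_holds = equivalent_8bit(lambda x, y: ~(x ^ ~y), lambda x, y: x ^ y)

# A deliberately wrong hypothesis, to show that the check can fail
wrong_holds = equivalent_8bit(lambda x, y: ~(x ^ y), lambda x, y: x ^ y)
```

The exhaustive check only scales to small bit widths (65,536 pairs here); for 32-bit or 64-bit operands a solver such as Z3 remains the practical choice.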
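The decoding algorithm on the "STACK STRINGS" slide, `enc[i] ^= (i + Const) ^ Const`, can be sketched as a small static decoder. The sketch assumes byte-wide arithmetic (the key is masked to 8 bits) and that the encoded bytes and the per-function constant have already been recovered; the function name and the sample constant 0x5A are made up for illustration.

```python
def decode_stack_string(enc: bytes, const: int) -> bytes:
    """Apply enc[i] ^= (i + const) ^ const to every byte.

    Assumption: the operation is performed on single bytes, so the
    per-index key is truncated to 8 bits.
    """
    return bytes(b ^ (((i + const) ^ const) & 0xFF) for i, b in enumerate(enc))


# XOR is its own inverse, so the same routine also encodes.
plain = b"GetProcAddress"
encoded = decode_stack_string(plain, 0x5A)    # 0x5A is an arbitrary example constant
decoded = decode_stack_string(encoded, 0x5A)
```

Because the constant differs per function, a static decoder still has to locate the constant and the string length in each function before calling such a routine.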
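The header formats on the "AUTHENTICATION HEADERS" slide can be sketched as a generator and a validator, similar in spirit to what a scanner and a fake C2 server need. The formats, the prefix range, and the checksum values come from the slide. The character set (ASCII letters), the rejection-sampling loop, the uppercase hex rendering of the victim ID, and all function names are assumptions and do not reflect how the sample itself produces the values.

```python
import random
import string

PREFIX_MIN, PREFIX_MAX = 0x64, 0x99      # prefix byte range from the slide
CHECKSUMS = {"GET": 99, "POST": 88}      # per-method checksum from the slide


def checksum(text: str) -> int:
    """Sum of the bytes truncated to 8 bits, as in the slide's ipython example."""
    return sum(text.encode("ascii", "replace")) & 0xFF


def make_sec_dest(method: str, rng: random.Random) -> str:
    """Build a Sec-Dest value: %2.2X prefix byte followed by six characters."""
    target = CHECKSUMS[method]
    while True:  # rejection sampling (assumption, not the sample's algorithm)
        suffix = "".join(rng.choice(string.ascii_letters) for _ in range(6))
        if checksum(suffix) == target:
            break
    return f"{rng.randint(PREFIX_MIN, PREFIX_MAX):02X}{suffix}"


def make_sec_site(victim_id: bytes, rng: random.Random) -> str:
    """Build a Sec-Site value: two %2.2X prefix bytes plus the 8-byte victim ID."""
    if len(victim_id) != 8:
        raise ValueError("victim ID must be 8 bytes")
    first, second = (rng.randint(PREFIX_MIN, PREFIX_MAX) for _ in range(2))
    return f"{first:02X}{second:02X}{victim_id.hex().upper()}"


def check_sec_dest(value: str, method: str) -> bool:
    """Validate the prefix range and the method-dependent checksum."""
    if len(value) != 8:
        return False
    try:
        prefix = int(value[:2], 16)
    except ValueError:
        return False
    return PREFIX_MIN <= prefix <= PREFIX_MAX and checksum(value[2:]) == CHECKSUMS[method]


rng = random.Random(0)
sec_dest = make_sec_dest("GET", rng)
sec_site = make_sec_site(bytes.fromhex("2AC144C9E2E09836"), rng)
```

The two example values from the slides, "7BnqmmCg" (GET) and the fake server log's prefix 0x95 with b'xbsYpB' (POST), both pass `check_sec_dest` for their respective methods.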
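The protocol slides state that the POST request and response data are encrypted with the RC4 key returned in the handshake. A minimal pure-Python RC4 sketch is shown below for decrypting captured data in a fake C2 server or scanner. It assumes the sample uses stock RC4 with no modifications, which the slides do not state explicitly; the key in the usage lines is a public test vector, not a Hodur key.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: key scheduling followed by keystream XOR."""
    if not key:
        raise ValueError("key must not be empty")
    # Key-scheduling algorithm (KSA)
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]
    # Pseudo-random generation algorithm (PRGA)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + state[i]) & 0xFF
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) & 0xFF])
    return bytes(out)


# Well-known public test vector: key "Key", plaintext "Plaintext"
ciphertext = rc4(b"Key", b"Plaintext")
roundtrip = rc4(b"Key", ciphertext)  # RC4 is symmetric: decrypt == encrypt
```

Since encryption and decryption are the same operation, one function is enough for both the request and the response direction.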