RAMBO: Run-time packer Analysis with Multiple Branch Observation Xabier Ugarte-Pedrero, Davide Balzarotti, Igor Santos, Pablo G. Bringas University of Deusto, Cisco Talos, Eurecom 13th Conference on Detection of Intrusions and Malware & Vulnerability Assessment Outline •  Why multi-path exploration for packers? •  Approach –  Domain-specific optimizations –  Heuristics •  Evaluation •  Discuss the results Run-time packers.. . •  Widely used by malware authors to obfuscate/ protect their code •  2 main goals –  Hide the original code from static analysis –  Implement anti-analysis methods •  Anti-debug •  Anti-dump •  VM / Sandbox / Tool detection •  Making both automated and manual analysis more difficult Shifting-decode-frames •  Also known as “partial code revelation” •  Takes advantage of the limitation of dynamic analysis –  Single path! •  Decrypt code/data on-demand •  Prevent “run and dump” •  Used by certain “advanced” protectors (i.e. Armadillo) •  Presented in academic literature (Bilge et. al.) –  Compile time function based protection 001010100 111001010 101011010 101110010 110101010 111010101 101101100 001010100 111001010 101011010 101110010 110101010 111010101 101101100 001010100 111001010 101011010 101110010 110101010 111010101 001010100 7 111001010 101011010 101110010 110101010 111010101 101101100 001010100 111001010 101011010 101110010 110101010 111010101 101101100 001010100 111001010 101011010 101110010 110101010 111010101 001010100 7 PUSH ebp 111001010 f) MOV ebp, esp 101011010 + (4 app eax, edx 101110010 POP ebp 110101010 | CALL 0x00401037 111010101 111010101 101101100 101101100 001010100 001010100 111001010 111001010 101011010 101011010 101110010 101110010 110101010 110101010 111010101 111010101 101101100 101101100 001010100 001010100 111001010 111001010 101011010 101011010 101110010 101110010 110101010 110101010 111010101 111010101 001010100 7 PUSH ebp 111001010 f) MOV ebp, esp 101011010 + (4 app eax, edx 101110010 POP ebp 110101010 | CALL 0x00401037 111010101 111010101 101101100 101101100 001010100 001010100 111001010 111001010 101011010 101011010 101110010 101110010 110101010 110101010 111010101 111010101 101101100 101101100 001010100 001010100 111001010 111001010 101011010 101011010 101110010 101110010 110101010 110101010 111010101 111010101 001010100 7] PUSH ebp 111001010 f) MOV ebp, esp 101011010 + (4 app eax, edx 101110010 POP ebp 110101010 | CALL 0x00401037 | 111010101 111010101 101101100 101101100 001010100 001010100 111001010 111001010 7] 101011010 101011010 101110010 101110010 110101010 110101010 111010101 111010101 101101100 101101100 001010100 001010100 111001010 111001010 101011010 101011010 101110010 101110010 110101010 110101010 111010101 111010101 001010100 7 111001010 101011010 101110010 110101010 | 111010101 101101100 001010100 111001010 101011010 101110010 110101010 111010101 101101100 001010100 111001010 101011010 101110010 110101010 111010101 PUSH ebp MOV ebp, esp ADD eax, edx POP ebp CALL 0x00401037 | 111010101 101101100 001010100 111001010 7] 101011010 101110010 110101010 111010101 101101100 — 001010100 111001010 101011010 101110010 110101010 111010101 - 001010100 111001010 101011010 101110010 | 110101010 111010101 101101100 001010100 T PUSH ebp MOV ebp, esp _| ADD eax, edx POP ebp PUSH ebp L_L.CALL 0x00401000 001010100 111001010 101011010 101110010 110101010 111010101 001010100 7 111001010 101011010 101110010 110101010 | 111010101 101101100 001010100 111001010 101011010 101110010 110101010 111010101 101101100 001010100 111001010 101011010 101110010 110101010 111010101 PUSH ebp MOV ebp, esp ADD eax, edx POP ebp CALL 0x00401037 | 111010101 101101100 001010100 111001010 7] 101011010 101110010 110101010 111010101 101101100 — 001010100 111001010 101011010 101110010 110101010 111010101 - 001010100 111001010 101011010 101110010 | 110101010 111010101 101101100 001010100 T PUSH ebp MOV ebp, esp _| ADD eax, edx POP ebp PUSH ebp L_L.CALL 0x00401000 001010100 111001010 101011010 101110010 110101010 111010101 001010100 7 111001010 101011010 101110010 110101010 _ 111010101 101101100 001010100 111001010 101011010 101110010 110101010 111010101 101101100 001010100 111001010 101011010 101110010 110101010 111010101 PUSH ebp 001010100 MOV ebp, esp 111001010 ADD eax, edx 101011010 POP ebp 101110010 _ CALL _oxoo4o1037 |__| | 110101010 111010101 111010101 ! 101101100 101101100 | 001010300, _______ 001010100 ; 111001010 = PUSH ebp 101011010 = MOV ebp, esp 101110010 |L _| ADD eax, edx 110101010 POP ebp 111010101 PUSH ebp 101101100_ = = 101101100 J __ _ _ _ LCALL ox00401000 001010100 001010100 | 111001010 111001010 , 101011010 101011010 101110010 101110010 | 110101010 110101010 ! 111010101 111010101 | Multi-path exploration.. . •  Computationally complex –  Specially with obfuscated (even self- modifying) code •  Does not scale to real-world, large, complex malware Multi-path exploration.. . •  Computationally complex –  Specially with obfuscated (even self- modifying) code •  Does not scale to real-world, large, complex malware Can we apply optimizations to multi-path exploration for this specific use case? Multi-path exploration •  Computationally complex –  Specially with obfuscated (even self- modifying) code •  Does not scale to real-world, large, complex malware Can we apply heuristics to multi-path exploration for unpacking this type of packers? Some intuitions.. . •  We do NOT need to explore every single path in the binary, just enough paths to uncover all the interesting regions. •  We do NOT need to understand which are the conditions to reach each path (unlike other use-cases, such as vulnerability analysis. •  We do NOT need to maintain the environment / system perfectly consistent. We just need to make sure that the execution is stable enough to uncover the protected regions. Multi-path exploration •  Baseline implementation –  Based on the concepts presented by Moser et al. •  Bitblaze platform –  Dynamic taint analysis (Temu) •  Taint result of function calls: –  Network/file/argument/time related –  Symbolic analysis (Vine) •  Based on Weakest precondition & queries to STP •  Concrete address for indirect memory accesses –  System-level snapshots •  Heavier, but we avoid dealing with system level inconsistencies: handles, open files, sockets... Optimizations #1 Partial symbolic execution Only execute certain regions of interest #2 Inconsistent multi-path exploration Ignore path constraints if solver cannot provide a solution Give priority to paths that can be solved consistently #3 Sacrifice global consistency Maintain consistency only for the regions of interest Optimizations #4 Discard long traces #5 Bypass blocking API calls #6 String comparisons Our model avoids exploring string comparison API calls We taint the output whenever input arguments are tainted This relaxes the constraints, allowing certain inconsistencies The general goal is to simplify symbolic processing General workflow Approach: 1.  Extract unpacked memory regions (frames) –  Generically detect the frames & dump at the appropriate point •  Prev. work: Deep Packer Inspection 2.  Process extracted code (disassemble, compute CFG) 3.  Find interesting points in the code (specific instructions) 4.  Compute which paths lead to these points 5.  Prioritize these paths during multi-path exploration General workflow Approach: 1.  Extract unpacked memory regions (frames) –  Generically detect the frames & dump at the appropriate point •  Prev. work: Deep Packer Inspection 2.  Process extracted code (disassemble, compute CFG) 3.  Find interesting points in the code (specific instructions) 4.  Compute which paths lead to these points 5.  Prioritize these paths during multi-path exploration Heuristic Decide which paths should be expanded first •  Several paths can trigger the execution of a region •  We can skip paths that can only lead to regions already unpacked Heuristic Steer the execution to the interesting points: •  JMP & CALL instructions –  that we have not executed in any run, but: –  If they lead to a region that has not been unpacked yet •  CJMP instructions leading to protected regions –  That have not been executed (but were unpacked) –  If we have only explored one of their paths •  Direct memory access (address not unpacked yet) •  Indirect calls (explore all the paths to these points) •  Immediate values that fall in the range of a protected memory region (may represent a memory access) Heuristic Also need to consider inter-procedural CFG: •  Explore all the paths that lead to a function, if it contains “points of interest”. Path selection during MPE: •  Breadth First Search –  Incrementally expand all the paths in the tree –  Prioritize other paths over loops •  Prioritize branches with the lowest number of expansions •  Prioritize paths that can be forced consistently over inconsistent ones Heuristic Last resort: path bruteforcing •  Set maximum number of expansions for each branch. •  When this limit is reached for all the tainted branches: –  Force the alternative path of non-tainted branches (INCONSISTENT!) •  Introduces inconsistencies, but can be useful to: –  Bypass loops or control structures with very complex internal logic depending on input •  E.g.: Parsers –  In some cases, we just need to jump to some point in the code to trigger its unpacking. Evaluation Case study #1: Backpack + Kaiten IRC Bot •  Compile-time packer proposed by Bilge et al. •  Function based granularity •  Kaiten: IRC bot that connects a channel and receives commands Iteration 0 Iteration 1 Iteration 2 No Heur. Functions unpacked 5/31 11/31 27/31 8/31 Interesting points - 52 96 - Cjmps - 36 110 - Snapshots - 167 544 6015 Tainted-consistent cjmps - 161 525 5888 Tainted-inconsistent cjmps - 6 19 127 Untainted cjmps - 0 40 - Long traces discarded - 6 0 - Time 5m 24m 1.2h 8h Evaluation Case study #2: Armadillo •  Page based granularity (based on memory protection) •  Protected 2 bots: SDBot, SpyBot. SDBOT It. 0 It. 1 It. 2 It. 3 No Heur. Functions unpacked 2/7 4/7 6/7 7/7 4/7 Interesting points - 3 2 7 - Cjmps - 65 162 264 - Snapshots - 14 366 367 3974 Tainted-consistent cjmps - 13 295 296 3660 Tainted-inconsistent cjmps - 1 71 71 314 Untainted cjmps - 0 1 1 - Long traces discarded - 1 14 14 - Time 30m 2.2h 2.8h 3.2h 8h SPYBOT Iteration 0 Iteration 1 Iteration 2 No Heur. Functions unpacked 3/9 8/9 9/9 6/9 Interesting points - 26 1 - Cjmps - 163 214 - Snapshots - 113 153 4466 Tainted-consistent cjmps - 17 31 4096 Tainted-inconsistent cjmps - 96 122 370 Untainted cjmps - 17 34 - Long traces discarded - 9 34 - Time 30m 3h 2.75h 8h Conclusions •  Plain vanilla multi-path exploration was not able to recover the code in a reasonable time (even with partial/ inconsistent exploration) •  With heuristic: –  Almost 100% recovery of code / data –  Significant reduction of time / resources when applying heuristics Discussion •  Strong limitations for sample selection –  For backpack, we needed linux-based source code. –  We needed sufficiently complex samples: •  For Armadillo, several pages of code. •  Complex parsing routines or logic. –  We needed non-packed samples. •  Otherwise, the packer would reveal all the original code at once. –  Simple malware families execute most the code in a single run (we needed bots). Discussion •  Technical complexity of protectors may affect multi-path exploration –  Calling convention violation –  Alternative methods to redirect control flow (push + ret, indirect calls, SEH/VEH based…) –  Resource exhaustion (intentionally introduce complexity to exhaust time-consuming analysis engines such as emulators) –  Nanomites (substitute branches by interrupts, compute the branch in a separate region of code or process) Questions! talosintelligence.com blog.talosintel.com @talossecurity