{
	"id": "ac71dbbd-0a48-4c33-aea1-6f8ab0c463e3",
	"created_at": "2026-04-06T00:20:03.426837Z",
	"updated_at": "2026-04-10T03:21:56.052915Z",
	"deleted_at": null,
	"sha1_hash": "ecea631dfb8ea903ad803a3756b270bbdc603f5f",
	"title": "Automatically Unpacking IcedID Stage 1 with angr",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 667022,
	"plain_text": "Automatically Unpacking IcedID Stage 1 with angr\r\nPublished: 2022-05-30 · Archived: 2026-04-05 13:32:49 UTC\r\nIt started with 0verfl0w posting a small challenge on the Zero 2 Automated discord server asking to automatically\r\nextract the configuration of an unpacked IcedID sample\r\n(0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7).\r\nUnpacking the sample was part of the exercise but could be done manually as a one shot, however the more I\r\nlooked into the stager, the more i thought an automated unpacker would be a fun thing to do.\r\nI’ll skip over some details of the stager (like API hashing and injection) to focus only on the unpacking part.\r\nTL;DR: full code is available here: https://github.com/matthw/icedid_stage1_unpack.\r\nEDIT: it unpacks samples packed with SPLCrypt, including BazarLoader.\r\n1. Structure and Flow\r\nThe packed data are really easy to identify: there’s a huge hex string in the data section, and by hex string i a mean\r\nlittleral string of [0-9a-f] characters.\r\n[...]\r\n000af670 33 62 36 64 34 39 61 61 36 35 33 36 31 34 64 65 |3b6d49aa653614de|\r\n000af680 33 31 62 32 66 64 37 31 65 64 38 66 61 30 37 63 |31b2fd71ed8fa07c|\r\n000af690 63 34 30 39 64 64 34 38 61 65 36 35 38 39 31 61 |c409dd48ae65891a|\r\n000af6a0 63 36 33 61 30 39 39 36 31 38 61 63 38 35 30 33 |c63a099618ac8503|\r\n000af6b0 62 34 32 37 39 31 36 63 66 36 31 66 31 31 33 30 |b427916cf61f1130|\r\n000af6c0 37 66 35 39 30 35 33 31 65 37 37 39 35 34 31 33 |7f590531e7795413|\r\n000af6d0 63 64 31 62 32 30 00 00 00 00 00 00 00 00 00 00 |cd1b20..........|\r\nThe unpacking process is as follow:\r\n ┌────────────┐\r\n │ │\r\n │ Hex Decode │\r\n │ │\r\n └─────┬──────┘\r\n │\r\n │\r\n ┌─────▼──────┐\r\n │ │\r\n ┌─────┤ RC4 │\r\n │ │ │\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 1 of 24\n\n│ └─────┬──────┘\r\n │ │\r\n │ │\r\n┌──────▼─────┐ │\r\n│ │ │\r\n│ XOR │ │\r\n│ (optionnal)│ │\r\n│ │ 
│\r\n└─────┬──────┘ │\r\n │ │\r\n │ │\r\n │ ┌─────▼──────┐\r\n │ │ │\r\n └──────► QuickLZ │\r\n │(decompress)│\r\n │ │\r\n └─────┬──────┘\r\n │\r\n │\r\n ┌─────▼──────┐\r\n │ │\r\n │ Split │\r\n │ │\r\n └────────────┘\r\nThe control flow at assembly level is very obfuscated, so the decompiler comes handy even if it doesn’t produces\r\nperfect results.\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 2 of 24\n\n1.1. Hex decode\r\nThe first step is to decode the hex string:\r\n for (i = 0; i \u003c length; i = i + 2) {\r\n chr = hexencoded_data[i];\r\n next_chr = hexencoded_data[i + 1];\r\n \r\n v1 = is_valid_hex_chr(chr);\r\n \r\n /* not an hex digit, ciao */\r\n if ((v1 == 0) || (v1 = is_valid_hex_chr(next_chr), v1 == 0)) {\r\n memset((ulonglong)destination,0,0x10,uVar2,chr,length);\r\n get_TEB();\r\n (*RtlFreeHeap)();\r\n return 0;\r\n }\r\n /\r\n /* convert 1st ascii chr to hex value, ex: 'a' -\u003e 0xa */\r\n v1 = hex_digit_to_int(chr);\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 3 of 24\n\n/* 0xa -\u003e 0xa0 */\r\n high4 = (byte)(v1 \u003c\u003c 4);\r\n /* convert 2nd ascii chr to hex value '8' -\u003e 0x8 */\r\n low4 = hex_digit_to_int(next_chr);\r\n /* make it a byte: 0xa0 | 0x8 == 0xa8 */\r\n *(byte *)(*destination + (i \u003e\u003e 1)) = high4 | (byte)low4;\r\n }\r\nthis is a plain equivalent to python’s bytes.fromhex(...)\r\n1.2. RC4\r\nThe RC4 routine is easily identified:\r\nThe parameters 4 and 5 are respectively a pointer to the key and the length of the key (which is always 4\r\napparently).\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 4 of 24\n\n1.3. 
XOR\r\nThe XOR was not present in all samples i checked, but when applied, it reuses the RC4 key.\r\nIt looks intimidating but in reality it can be translated to:\r\n for x in range(len(data) - 1):\r\n data[x] = ((data[x] ^ key[x % len(key)]) - data[x + 1]) \u0026 0xff\r\n1.3. QuickLZ\r\nThe QuickLZ part was harder to identify. On the first sample I analyzed, there was no compression applied, so at\r\nthis point the decrypted data looked OK\r\nI could find a valid PE file inside the decrypted data:\r\n00000db0 7c 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 ||MZ.............|\r\n00000dc0 00 b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 |.........@......|\r\n00000dd0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|\r\n00000de0 00 00 00 00 00 00 00 00 00 00 00 00 00 d0 00 00 |................|\r\n00000df0 00 0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21 54 |.........!..L.!T|\r\n00000e00 68 69 73 20 70 72 6f 67 72 61 6d 20 63 61 6e 6e |his program cann|\r\n00000e10 6f 74 20 62 65 20 72 75 6e 20 69 6e 20 44 4f 53 |ot be run in DOS|\r\n00000e20 20 6d 6f 64 65 2e 0d 0d 0a 24 00 00 00 00 00 00 | mode....$......|\r\n00000e30 00 21 c9 10 93 65 a8 7e c0 65 a8 7e c0 65 a8 7e |.!...e.~.e.~.e.~|\r\nbut i still noticed some kind of header at the very beginning of the extracted data, and that the dword starting at\r\noffset 1 was actually the size of the data blob\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 5 of 24\n\nI thought it was some kind of internal structure I could just ignore, until I started having issues with some samples\r\nwhere the embedded PE file seemed corrupt:\r\n00000c40 3e eb df 8f 46 03 4d 5a 90 00 03 f3 03 00 80 46 |\u003e...F.MZ.......F|\r\n00000c50 8a 10 ff ff cd 17 20 c6 00 06 93 fb 01 00 0b f2 |...... 
.........|\r\n00000c60 03 0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21 54 |.........!..L.!T|\r\n00000c70 68 69 73 20 70 72 00 00 00 80 6f 67 72 61 6d 20 |his pr....ogram |\r\n00000c80 63 61 6e 6e 6f 74 20 62 65 20 72 75 6e 20 69 6e |cannot be run in|\r\n00000c90 20 44 4f 53 20 6d 6f 64 65 20 40 10 c4 2e 0d 0d | DOS mode @.....|\r\n00000ca0 0a 24 12 11 21 c9 10 93 65 a8 7e c0 16 01 42 6e |.$..!...e.~...Bn|\r\n00000cb0 05 c0 67 20 16 ca 7f c1 6e 0a 05 7f c0 4f 20 08 |..g ....n....O .|\r\n00000cc0 0a 07 88 83 cc 7a 0a 04 83 cc 7e c1 64 0a 02 7c |.....z....~.d..||\r\n00000cd0 0a 02 52 69 63 68 06 0e 16 25 8b 5e 03 64 86 07 |..Rich...%.^.d..|\r\nAfter some time staring at the code, it turned out that it’s using QuickLZ.\r\nThe main pointers were:\r\nthe header format, as described here:\r\n┌─────╥──╥──╥──╥──╥──╥──╥──╥──┐\r\n│Flags║ Comp size ║ Dec size │\r\n└─────╨──╨──╨──╨──╨──╨──╨──╨──┘\r\nfinding code like this\r\nlooking very similar to https://github.com/sergey-dryabzhinsky/python-quicklz/blob/master/quicklz.c#L630\r\nLuckily for us there are Python bindings for QuickLZ, which work just fine: https://pypi.org/project/pyquicklz/.\r\n1.4 Split\r\nThe decrypted data blob can be split at every occurrence of the |SPL| marker; the string is built on the stack:\r\nI kind of skipped the details in my analysis because I do not need them for now.\r\n2. 
Automating\r\nRight now we have everything we need for unpacking:\r\nthe data blob is easy to grab from the data section, with a regex for example\r\nRC4 is vanilla\r\nthe XOR is easy to implement\r\nQuickLZ has python bindings\r\nThe only thing we need to be able to recover is the RC4/XOR key, and that’s where the fun begins.\r\nThe key is not stored as data, instead it’s computed in the code and stored on the stack:\r\nso in this case the key is:\r\n\u003e\u003e\u003e p32(0x11c7425e + 0x68)\r\nb'\\xc6B\\xc7\\x11'\r\n2.1. Failed Approach\r\nmy first approach was to match the bytes using a YARA rule like:\r\n rule key {\r\n strings:\r\n // C74424 34 5E42C711 | mov dword ptr ss:[rsp+34],11C7425E\r\n // 834424 34 68 | add dword ptr ss:[rsp+34],68\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 9 of 24\n\n$instr = { C7 44 24 ?? ?? ?? ?? ?? 8? ?? 24 ?? ?? }\r\n condition:\r\n $instr\r\n }\r\nemulate all the matches with unicorn, fetch the result from the stack and try all values as keys.\r\nThe key grabbing looked like this and worked on some samples:\r\ndef emulate(code):\r\n \"\"\" emulate the potential key instruction and return\r\n whatever 4 byte value is on the stack (or None)\r\n \"\"\"\r\n ADDR_TEXT = 0x1000000\r\n ADDR_STACK = 0x7000000\r\n mu = Uc(UC_ARCH_X86, UC_MODE_64)\r\n mu.mem_map(ADDR_TEXT, 0x1000)\r\n mu.mem_map(ADDR_STACK, 0x1000)\r\n # copy code\r\n mu.mem_write(ADDR_TEXT, code)\r\n # init rsp\r\n mu.reg_write(UC_X86_REG_RSP, ADDR_STACK)\r\n # emulate\r\n try:\r\n mu.emu_start(ADDR_TEXT, ADDR_TEXT + len(code))\r\n except unicorn.UcError:\r\n pass\r\n # read stack\r\n stack = mu.mem_read(ADDR_STACK, 0x100)\r\n # assume there's no null byte\r\n for v in [stack[i:i+4] for i in range(0, len(stack), 4)]:\r\n if u32(v) != 0:\r\n return bytes(v)\r\n return None\r\ndef find_keys(pe):\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 10 of 24\n\n''' find potential 
instructions setting the key\r\n '''\r\n # find .text\r\n data = get_section(pe, '.text')\r\n rule = yara.compile(source=\"\"\"\r\n rule key {\r\n strings:\r\n // C74424 34 5E42C711 | mov dword ptr ss:[rsp+34],11C7425E\r\n // 834424 34 68 | add dword ptr ss:[rsp+34],68\r\n $instr = { C7 44 24 ?? ?? ?? ?? ?? 8? ?? 24 ?? ?? }\r\n condition:\r\n $instr\r\n }\"\"\")\r\n yara_matches = rule.match(data=data)\r\n if not len(yara_matches):\r\n return []\r\n # potential code snippet setting the key\r\n key_code = []\r\n for offset, _, _ in yara_matches[0].strings:\r\n string = data[offset:offset+16]\r\n if string[3] == string[11]:\r\n #print(string)\r\n key_code.append(string)\r\n potential_keys = []\r\n for code in key_code:\r\n print(\"--- emulating:\")\r\n #disasm(code)\r\n key = emulate(code)\r\n # assume no null byte in key\r\n if key is not None and not b'\\x00' in key:\r\n potential_keys.append(key)\r\n return potential_keys\r\n ./unpack2.py 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x20], 4\r\n0x1008: add dword ptr [rsp + 0x20], 0\r\n0x100d: jmp 0x1067\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 11 of 24\n\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x34], 0x11c7425e\r\n0x1008: add dword ptr [rsp + 0x34], 0x68\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x14], 0xa6\r\n0x1008: add dword ptr [rsp + 0x14], 0x5a\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x20], 1\r\n0x1008: add dword ptr [rsp + 0x20], 0\r\n0x100d: jmp 0x101f\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x60], 0\r\n0x1008: add dword ptr [rsp + 0x60], 2\r\n0x100d: jmp 0xfda\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x50], 2\r\n0x1008: add dword ptr [rsp + 0x50], 2\r\n0x100d: cmp bl, bl\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x70], 0\r\n0x1008: add dword ptr [rsp + 0x70], 3\r\n0x100d: cmp bp, bp\r\n--- emulating:\r\n0x1000: mov dword ptr 
[rsp + 0x28], 0x800000e0\r\n0x1008: sub dword ptr [rsp + 0x28], 0xe0\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x28], 0x800000e0\r\n0x1008: sub dword ptr [rsp + 0x28], 0xe0\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x44], 0\r\n0x1008: add dword ptr [rsp + 0x44], 3\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x24], 4\r\n0x1008: add dword ptr [rsp + 0x24], 0\r\n0x100d: jmp 0x1073\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x28], 0x800000e0\r\n0x1008: sub dword ptr [rsp + 0x28], 0xe0\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x24], 3\r\n0x1008: add dword ptr [rsp + 0x24], 1\r\n0x100d: jmp 0x1086\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x24], 1\r\n0x1008: add dword ptr [rsp + 0x24], 0\r\n0x100d: jmp 0x100f\r\n--- emulating:\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 12 of 24\n\n0x1000: mov dword ptr [rsp + 0x28], 1\r\n0x1008: add dword ptr [rsp + 0x28], 0\r\n0x100d: jmp 0x1050\r\n0x100f: nop\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x28], 3\r\n0x1008: add dword ptr [rsp + 0x28], 1\r\n0x100d: jmp 0xff6\r\n--- emulating:\r\n0x1000: mov dword ptr [rsp + 0x28], 0x7ffffa1\r\n0x1008: add dword ptr [rsp + 0x28], 0x5f\r\n0x100d: cmp di, di\r\nfound 1 potential keys: [b'\\xc6B\\xc7\\x11']\r\ngot 0x4c04b data blob\r\ndecrypted data\r\n- dump 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.dump\r\nfound 5 elements\r\n- dumped 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.0\r\n- dumped 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.1\r\n- dumped 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.2\r\n- dumped 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.3\r\n- dumped 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.4\r\nuntil the flow obfuscation of some samples brought even more fun to the party by 
putting a jump right in the\r\nmiddle of the key setting:\r\nrendering my initial and pretty naive approach useless.\r\n2.2. angr\r\nI only played with angr before to solve crackmes and CTF challenges, and I wanted to do something else with it to\r\npractice (because let’s face it, i’m pretty bad with angr).\r\nTo quote their website:\r\nangr is an open-source binary analysis platform for Python.\r\nIt combines both static and dynamic symbolic (\"concolic\") analysis, providing tools to solve a variety of tasks.\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 13 of 24\n\nMy new goal was:\r\nIdentify the RC4 function using some heuristics\r\nWalk the control-flow graph (CFG) up to find where it’s called from\r\nExecute the function(s) calling the RC4 function up to the actual call\r\nGet the key\r\n ┌────────────────────────────────────────\r\n │ FUNCTION WHATEVER │\r\n │ │\r\n ├────────────────────────────────────────\r\n │ ├────┐ 0x0332 (function sta\r\n ┌───────►│ 0x0332 INSTR1 │ │ to\r\nstep4: find function │ │ 0x0334 INSTR2 │ │ 0x0456 (call rc4)\r\n start addr │ │ etc... │ │\r\n 0x332 │ │ ... │ │\r\n └────────┤ ... │ │\r\n │ ... │ │\r\n ┌───────►│ 0x0456 CALL POTENTIAL_RC4 │ ◄──┘\r\n │ │ ... │ step6: dump the key parameter\r\n │ │ ... │\r\n │ │ ... │\r\nstep3: find XREF │ │ │\r\n for 0x1234 │ │ │\r\n -\u003e 0x0456 │ │ │\r\n │ └────────────────────────────────────────\r\n │\r\n │\r\n │\r\n │ ┌────────────────────────────────────────\r\n │ │ FUNCTION POTENTIAL_RC4 │\r\n └────────┤ │\r\n ├────────────────────────────────────────\r\n │ │\r\n ┌──────► │ 0x1234 INSTR1 │\r\n │ │ 0x1236 INSTR2 │\r\nstep2: find function │ │ etc... 
│\r\n start addr │ │ │ step1: find offset with\r\n 0x1234 │ │ ─────┐ ◄──────────────────────────────\r\n │ │ 0x1430 │ xor eax, ecx │\r\n └────────┤ 0x1432 │ movsxd rcx, dword [var_18h_2] │\r\n │ ──────┘ │\r\n │ │\r\n │ │\r\n └────────────────────────────────────────\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 14 of 24\n\nLuckily angr can provide a CFG and has a sense of “function”.\r\nWe can define a project and get the CFG:\r\n self.prj = angr.Project(filename, load_options={'auto_load_libs': False})\r\n self.cfg = self.prj.analyses.CFGFast()\r\n2.2.1. Finding the RC4 Function\r\nI used some loosy heuristic to find the RC4 but it seems to work well:\r\n # find .text\r\n section = get_section(self.pe, '.text')\r\n data = section.get_data()\r\n # oddly enough this seems to match the rc4 function\r\n # fairly accurately\r\n # like\r\n # 0x1800027c2 33c1 xor eax, ecx\r\n # 0x1800027c4 48634c2418 movsxd rcx, dword [var_18h_2]\r\n # or\r\n # 0x180004242 0fb68c0cd0000000 movzx ecx, byte [rsp + rcx + 0xd0]\r\n # 0x18000424a 33c1 xor eax, ecx\r\n # 0x18000424c e974feffff jmp 0x1800040c5 ; fcn.180003bbf+0x506\r\n rule = yara.compile(source=\"\"\"\r\n rule rc4 {\r\n strings:\r\n //$s1 = { 33 c1 }\r\n $s2 = { 33 c1 48 63 4c 24 ?? 
}\r\n $s3 = { 33 c1 (e9 | 3a) }\r\n condition:\r\n $s2 or $s3\r\n }\"\"\")\r\n # get matching offsets\r\n yara_matches = rule.match(data=data)\r\nThen we just need to fix the offsets - which are relative to the start of .text , so they match the virtual address:\r\n offsets = []\r\n for offset, _, _ in yara_matches[0].strings:\r\n # offset are relative to .text, rebase them\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 15 of 24\n\noff = self.pe.OPTIONAL_HEADER.ImageBase + section.VirtualAddress + offset\r\n offsets.append(off)\r\nUsing these offsets, we can find the start of the function they live in:\r\n for offset in offsets:\r\n # find function containg the offset\r\n func = self.find_func_addr(offset)\r\n if func is None:\r\n print(\"skip 0x%x: not part of a func...\"%offset)\r\n continue\r\n if not len(func.predecessors):\r\n print(\"skip 0x%x: no predecessor...\"%offset)\r\n continue\r\n if len(func.predecessors) \u003e 2:\r\n print(\"skip 0x%s: too many predecessors (%d)\"%(func.addr, len(func.predecessors)))\r\n continue\r\n print(\"found potential rc4 code: 0x%x\"%func.addr)\r\nWe discard an offset if:\r\nit does not belong to a function\r\nit belongs to a function with no predecessors (meaning it’s not called)\r\nit belongs to a function with more than 2 predecessors (called from more than 2 different places)\r\nthe RC4 function should only be called from one place - 2 is being conservative\r\nNow that we have the start address of a potential RC4 function, we need to:\r\nfind the XREF (the CALL rc4 )\r\nfind the calling function start address\r\nWhich is just repeating what we just did:\r\n # list of (start_addr, stop_addr) to emulate\r\n explorer = []\r\n for pred in func.predecessors:\r\n caller = self.find_func_addr(pred.addr)\r\n # skip some cases where start_addr == stop_addr\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 16 of 24\n\nif caller is not None 
and caller.addr != pred.addr:\r\n explorer.append((caller.addr, pred.addr))\r\n print(\" * found caller (0x%x -\u003e 0x%x)\"%(caller.addr, pred.addr))\r\nAt this stage, we should have a list in the form of:\r\nexplorer = [\r\n (addr_start_func1, addr_call_rc4_in_func1),\r\n (addr_start_func2, addr_call_rc4_in_func2),\r\n ...\r\n]\r\nWe can just emulate from the function start address to the address of the call to the RC4 function and dump the key\r\nfrom register r9 (according to the Microsoft x64 calling convention, r9 holds the 4th parameter - the pointer to the key in\r\nour case).\r\n2.2.2. Emulation\r\nThe emulation goes as follows:\r\ncreate an initial state simulating a function call at our start address\r\nwe use the CALLLESS option to skip over function calls as they are not related to the key\r\ncomputation\r\nstate = self.prj.factory.call_state(addr=start_addr)\r\nstate.options.add(angr.options.CALLLESS)\r\ncreate a Simulation Manager and then step until one of the states reaches our destination address\r\nsimgr = self.prj.factory.simulation_manager(state)\r\nwhile True:\r\n simgr.step()\r\n for state in simgr.active:\r\n key = self.check_state(state, stop_addr)\r\n if key is not None:\r\n return key\r\nWith a few extra conditions to avoid looping when there’s no active path left or when the path gets too complex\r\n(arbitrary pick), it looks like this:\r\n def emulate(self, start_addr, stop_addr, max_iter=3000):\r\n \"\"\" symbolic execution from start_addr to stop_addr.\r\n max_iter is the maximum number of steps (basic blocks)\r\n return None if failed, or [r9]\r\n \"\"\"\r\n print(\"emulating from 0x%x to 0x%x (max iter = %s)\"%(start_addr, stop_addr, max_iter))\r\n state = self.prj.factory.call_state(addr=start_addr)\r\n # no function call\r\n state.options.add(angr.options.CALLLESS)\r\n simgr = self.prj.factory.simulation_manager(state)\r\n while True:\r\n # advance all states by 
one basic block\r\n simgr.step()\r\n max_iter -= 1\r\n # very arbitrary picks\r\n # we shouldnt run into too complex paths\r\n if not max_iter or len(simgr.active) \u003e 10 or not len(simgr.active):\r\n return None\r\n # check each active\r\n for state in simgr.active:\r\n key = self.check_state(state, stop_addr)\r\n if key is not None:\r\n return key\r\n return None\r\nThe check_state function will check if the current state address is the destination address, and if so, will\r\ndereference the value of the R9 register (4th parameter) and read a DWORD in there.\r\nI assume the key is always 4 bytes long, however should it not be the case, its length can be read from the stack\r\n(5th function parameter).\r\n def check_state(self, state, stop_addr):\r\n \"\"\" check if a state reached the expected address\r\n hook potential calls with unconstrained destinations\r\n returns the key if arrived at destination\r\n \"\"\"\r\n # final destination\r\n #if state.addr in range(stop_addr, stop_addr+8):\r\n if state.addr == stop_addr:\r\n # dereference r9 register and read a DWORD\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 18 of 24\n\n# we assume the key is 4 bytes, we could read its size off the stack\r\n return p32(state.solver.eval(state.mem[state.regs.r9].uint32_t.resolved))\r\nThere’s an extra twist in the check_state function.\r\nThe sample uses API hashing to resolve api proc addresses, and we explicitly told angr to skip function calls\r\n(CALLLESS). 
The effect of the CALLLESS flag is that the return value of all function calls will be unconstrained\r\n(symbolic).\r\nSo what should have been:\r\naddress = get_proc_address(0x12345678) // returns 0x18032323\r\n(*0x18032323)(arg1, arg2)\r\nbecomes:\r\naddress = get_proc_address(0x12345678) // returns an unconstrained symbolic value\r\n(*????????)(arg1, arg2)\r\nand basically angr stops because there are too many possible paths (for some reason, even with the CALLLESS flag).\r\ncall to resolved function (call to a stack address):\r\nMy workaround was to hook every CALL instruction whose target is a temp value, by:\r\nchecking if the node successor is reached via a call:\r\nif block.vex.jumpkind == 'Ijk_Call':\r\nthen checking if the call uses a temp value:\r\nif block.vex.next.tag == 'Iex_RdTmp':\r\nlooping over the block instructions to find the actual call and hook it with something of ours:\r\nfor insn in self.prj.factory.block(block.addr).capstone.insns:\r\n if insn.mnemonic == 'call':\r\n if insn.address not in self.hooks:\r\n print(\"hooking addr=0x%x size=%s\"%(insn.address, insn.size))\r\n self.prj.hook(insn.address, hook_api_hash, length=insn.size)\r\n # in order to avoid hooking twice // angr would warn anyway\r\n self.hooks.append(insn.address)\r\nThe hook is very simple and looks like this:\r\ndef hook_api_hash(state):\r\n \"\"\" hook register calls with this\r\n \"\"\"\r\n # symbolize return value\r\n state.regs.rax = claripy.BVS('ret', 64)\r\nThe full check_state function looks like this:\r\n def check_state(self, state, stop_addr):\r\n \"\"\" check if a state reached the expected address\r\n hook potential calls with unconstrained destinations\r\n returns the key if arrived at destination\r\n \"\"\"\r\n # final destination\r\n #if state.addr in range(stop_addr, stop_addr+8):\r\n if state.addr == stop_addr:\r\n # dereference r9 register and read a DWORD\r\n # we 
assume the key is 4 bytes, we could read its size off the stack\r\n return p32(state.solver.eval(state.mem[state.regs.r9].uint32_t.resolved))\r\n #\r\n # hook registers calls (api hashing)\r\n # we want to hook all \"call $tmp\", otherwise angr gets lost\r\n # even with angr.options.CALLLESS\r\n #\r\n try:\r\n block = state.block()\r\n except angr.errors.SimEngineError:\r\n return None\r\n # verify that the block ends with a call\r\n if block.vex.jumpkind == 'Ijk_Call':\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 20 of 24\n\n# the next block is based on tmp value\r\n if block.vex.next.tag == 'Iex_RdTmp':\r\n # iterates over block instructions to find the call addr and size\r\n for insn in self.prj.factory.block(block.addr).capstone.insns:\r\n if insn.mnemonic == 'call':\r\n if insn.address not in self.hooks:\r\n print(\"hooking addr=0x%x size=%s\"%(insn.address, insn.size))\r\n self.prj.hook(insn.address, hook_api_hash, length=insn.size)\r\n # in order to avoid hook twice // angr would warn anyway\r\n self.hooks.append(insn.address)\r\n return None\r\nin the end we, we can just loop over the (start_addr, stop_addr) tupples, to get a list of potential RC4 keys:\r\n # emulate all potential calls\r\n potential_keys = []\r\n for start, stop in explorer:\r\n # emulate\r\n key = self.emulate(start, stop)\r\n if key:\r\n potential_keys.append(key)\r\n return potential_keys\r\n2.3. Decrypting\r\nnow that we constructed a list of potential keys, we can just try them all. 
Using the QuickLZ header, we can tell\r\nthat we found the correct one by matching the size of the data with what’s in the header:\r\ndef try_to_decrypt(data, potential_keys):\r\n ''' try all keys with xor and without\r\n it seems the xor is not always applied\r\n '''\r\n for key in potential_keys:\r\n for apply_xor in [True, False]:\r\n print(\"trying key %r / xor=%r\"%(key, apply_xor))\r\n dec = decrypt(data, key, apply_xor)\r\n if dec is not None:\r\n return dec\r\n return None\r\ndef decrypt(data, key, apply_xor):\r\n ''' decrypt + decompress data\r\n '''\r\n # RC4 decrypt\r\n cipher = ARC4(key)\r\n dec = bytearray(cipher.decrypt(data))\r\n # dexor\r\n if apply_xor:\r\n for x in range(len(dec) - 1):\r\n dec[x] = ((dec[x] ^ key[x % len(key)]) - dec[x + 1]) \u0026 0xff\r\n # Quick check we got valid data\r\n # ref: quicklz format: https://github.com/ReSpeak/quicklz/blob/master/Format.md\r\n # DWORD at decrypted data+1 should be the length\r\n if u32(dec[1:5]) == len(data):\r\n return quicklz.decompress(bytes(dec))\r\nThe XOR pass doesn’t seem to always be applied, so we try with and without.\r\nExtracting the C2 address and campaign ID from the unpacked PE is pretty straightforward.\r\nWe just need to XOR two 32-byte data blobs (from the .d section) with each other:\r\ndef extract_c2(filename):\r\n pe = pefile.PE(filename)\r\n data = get_section(pe, \".d\").get_data()\r\n key = data[:0x20]\r\n conf = data[0x40:0x40+0x20]\r\n data = xor(key, conf)\r\n camp = u32(data[:4])\r\n c2 = data[4:].split(b'\\x00')[0]\r\n return {'campaign_id': camp, 'c2': c2}\r\n4. 
Showcase\r\n% ./icedid_stage1_unpack.py 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin\r\ngot data blob: 0x4c04b bytes\r\nfound potential rc4 code: 0x1800026b3\r\n * found caller (0x180001b77 -\u003e 0x180001bc0)\r\nemulating from 0x180001b77 to 0x180001bc0 (max iter = 3000)\r\nfound 1 potential keys: [b'\\xc6B\\xc7\\x11']\r\ntrying key b'\\xc6B\\xc7\\x11' / xor=True\r\ndecrypted data: 0x4c042 bytes\r\nfound 5 elements\r\n- dumped 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.0\r\n- dumped 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.1\r\n- dumped 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.2\r\n looks like a PE... {'campaign_id': 109932505, 'c2': b'ilekvoyn.com'}\r\n- dumped 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.3\r\n- dumped 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.4\r\n% ./icedid_stage1_unpack.py samples/17aeebe6c1098a312074b0fdeae6f97339f2d64d66a2b07496bfc1373694a4e3.bin\r\ngot data blob: 0x3820 bytes\r\nfound potential rc4 code: 0x180003fc1\r\n * found caller (0x1800011c3 -\u003e 0x180001507)\r\nemulating from 0x1800011c3 to 0x180001507 (max iter = 3000)\r\nfound 1 potential keys: [b'k\\xfe\\xfa\\x8b']\r\ntrying key b'k\\xfe\\xfa\\x8b' / xor=True\r\ntrying key b'k\\xfe\\xfa\\x8b' / xor=False\r\ndecrypted data: 0x5714 bytes\r\nfound 4 elements\r\n- dumped samples/17aeebe6c1098a312074b0fdeae6f97339f2d64d66a2b07496bfc1373694a4e3.bin.extracted.0\r\n- dumped samples/17aeebe6c1098a312074b0fdeae6f97339f2d64d66a2b07496bfc1373694a4e3.bin.extracted.1\r\n- dumped samples/17aeebe6c1098a312074b0fdeae6f97339f2d64d66a2b07496bfc1373694a4e3.bin.extracted.2\r\n looks like a PE... 
{'campaign_id': 429479428, 'c2': b'arelyevennot.top'}\r\n- dumped samples/17aeebe6c1098a312074b0fdeae6f97339f2d64d66a2b07496bfc1373694a4e3.bin.extracted.3\r\n% ./icedid_stage1_unpack.py samples/12a692718d21b8dc3a8d5a2715688f533f1a978ee825163d41de11847039393d.bin\r\ngot data blob: 0x16064 bytes\r\nskip 0x6442458550: too many predecessors (4)\r\nfound potential rc4 code: 0x180003bbf\r\n * found caller (0x1800016bf -\u003e 0x180001980)\r\nemulating from 0x1800016bf to 0x180001980 (max iter = 3000)\r\nhooking addr=0x18000184b size=7\r\nhttps://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/\r\nPage 23 of 24\n\nhooking addr=0x180001c19 size=7\r\nhooking addr=0x180001c09 size=7\r\nhooking addr=0x180001bdc size=7\r\nfound 1 potential keys: [b',u\\xe2I']\r\ntrying key b',u\\xe2I' / xor=True\r\ndecrypted data: 0x179f7 bytes\r\nfound 5 elements\r\n- dumped samples/12a692718d21b8dc3a8d5a2715688f533f1a978ee825163d41de11847039393d.bin.extracted.0\r\n- dumped samples/12a692718d21b8dc3a8d5a2715688f533f1a978ee825163d41de11847039393d.bin.extracted.1\r\n- dumped samples/12a692718d21b8dc3a8d5a2715688f533f1a978ee825163d41de11847039393d.bin.extracted.2\r\n looks like a PE... 
{'campaign_id': 3068011852, 'c2': b'yolneanz.com'}\r\n- dumped samples/12a692718d21b8dc3a8d5a2715688f533f1a978ee825163d41de11847039393d.bin.extracted.3\r\n- dumped samples/12a692718d21b8dc3a8d5a2715688f533f1a978ee825163d41de11847039393d.bin.extracted.4\r\nThe extracted data blobs are:\r\n2 shellcodes\r\n1 DLL\r\n1 or 2 images:\r\n% file 0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.*\r\n0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.0: data\r\n0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.1: data\r\n0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.2: PE32+ executable (DLL) (GUI) x\r\n0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.3: JPEG image data, JFIF standard\r\n0581f0bf260a11a5662d58b99a82ec756c9365613833bce8f102ec1235a7d4f7.bin.extracted.4: JPEG image data, Exif Standard\r\n5. Conclusion\r\nWhile it is certainly not the most optimal method to unpack the samples, it was a fun exercise to do.\r\nThe full code is available here: https://github.com/matthw/icedid_stage1_unpack.\r\nSource: https://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"references": [
		"https://matth.dmz42.org/posts/2022/automatically-unpacking-icedid-stage1-with-angr/"
	],
	"report_names": [
		"automatically-unpacking-icedid-stage1-with-angr"
	],
	"threat_actors": [],
	"ts_created_at": 1775434803,
	"ts_updated_at": 1775791316,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/ecea631dfb8ea903ad803a3756b270bbdc603f5f.pdf",
		"text": "https://archive.orkl.eu/ecea631dfb8ea903ad803a3756b270bbdc603f5f.txt",
		"img": "https://archive.orkl.eu/ecea631dfb8ea903ad803a3756b270bbdc603f5f.jpg"
	}
}