{
	"id": "60d5fc50-3d07-444d-8fd2-dd95e59dcfa6",
	"created_at": "2026-04-06T00:14:52.16437Z",
	"updated_at": "2026-04-10T03:38:09.908416Z",
	"deleted_at": null,
	"sha1_hash": "71b645bc5d5437335e22afca89d64ab03e1ca0f8",
	"title": "FinSpy VM Unpacking Tutorial Part 3: Devirtualization. Phase #2: First Attempt at Devirtualization — Möbius Strip Reverse Engineering",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 100968,
	"plain_text": "FinSpy VM Unpacking Tutorial Part 3: Devirtualization. Phase #2:\r\nFirst Attempt at Devirtualization — Möbius Strip Reverse\r\nEngineering\r\nBy Rolf Rolles\r\nPublished: 2018-02-21 · Archived: 2026-04-05 13:31:20 UTC\r\n[Note: if you've been linked here without context, the introduction to Part #3 describing its four phases can be\r\nfound here.]\r\n1. Introduction\r\nIn the previous Part #3, Phase #1, we inspected our FinSpy VM bytecode disassembly listing and discovered that\r\nthe Group #2 instructions were used for obfuscation. After discovering obfuscation patterns and replacing them\r\nwith simpler sequences, we obtained a simpler FinSpy VM bytecode program without any Group #2 instructions.\r\nAfter some further small improvements to the FinSpy VM bytecode disassembly process, we resume with a\r\nFinSpy VM bytecode listing that is suitable for devirtualization.\r\nThis Phase #2 discusses our first attempt at devirtualization. In fact, given the work done heretofore, the\r\ndevirtualization code comes to about 100 lines of thoroughly-commented Python code. We then inspect our\r\ndevirtualization, and discover several remaining issues that require correction before our task is complete. This\r\nphase fixes one of them, and the subsequent Part #3, Phase #3 examines the genesis of the remaining issues before\r\nfixing them. Part #3, Phase #4 makes a second (and eventually successful) approach toward devirtualizing FinSpy\r\nVM bytecode programs.\r\n2. Whole-Program Devirtualization, First Attempt\r\nAfter the previous Part #3, Phase #1 substitutions, our remaining FinSpy VM bytecode contains only Group #3\r\nVM instructions (containing raw x86 machine code) and Group #1 VM instructions (all 16 varieties of x86\r\nconditional branch instructions, and the now-correctly-identified unconditional JMP VM instructions). The\r\nfollowing snippet is fairly representative of our FinSpy VM bytecode program at present. 
It contains only \"Raw\r\nX86\" instructions, \"X86CALLOUT\", \"X86JUMPOUT\", and conditional/unconditional branch instructions -- the\r\nonly categories of remaining VM instructions after the previous phase. It basically looks like x86 already, except\r\nthe control flow instructions are FinSpy VM instructions instead of x86 instructions. \r\n0x008e50: X86 mov ebx, edi\r\n0x008e80: JMP VM[0x008aa8]\r\n0x008e98: X86 push 2E50340Bh\r\n0x008ec8: X86 push dword ptr [420358h]\r\n0x008f28: X86CALLOUT 0x408360\r\n0x008f40: X86 test eax, eax\r\nhttps://www.msreverseengineering.com/blog/2018/2/21/devirtualizing-finspy-phase-2-first-attempt-at-devirtualization\r\nPage 1 of 12\n\n0x008f58: JZ VM[0x009108] (fallthrough VM[0x008f70])\r\n0x008f70: X86 lea ecx, dword ptr [ebp+0FFFFFFFCh]\r\n0x008fd0: X86 push ecx\r\n0x009000: X86 push 0FFFFFFFFh\r\n0x009030: X86JUMPOUT jmp eax\r\n0x009048: X86 test eax, eax\r\n0x009060: JZ VM[0x009108] (fallthrough VM[0x009078])\r\n0x009078: X86 cmp dword ptr [ebp+0FFFFFFFCh], 0h\r\n0x009090: JZ VM[0x009108] (fallthrough VM[0x0090a8])\r\n0x0090a8: X86 xor eax, eax\r\n0x0090c0: X86 inc eax\r\n0x0090d8: X86 leave\r\n0x0090f0: X86 ret\r\n0x009108: X86 xor eax, eax\r\n0x009120: X86 leave\r\n0x009138: X86 ret\r\n2.1 Instruction-By-Instruction Devirtualization\r\nAt this point -- very quickly and easily into the analysis of the FinSpy VM bytecode program -- I felt comfortable\r\nwriting a tool to reconstruct an x86 machine code program from the VM bytecode program. (Perhaps I was too\r\ncomfortable, considering that a lot of work remained after my first attempt.) All of the remaining instructions\r\nseemed amenable to one-by-one translation back into x86. Namely, I wrote a simple loop to iterate over each VM\r\ninstruction and create an x86 machine code array for the devirtualized program. At this point, there are four cases\r\nfor the remaining VM instructions: the three varieties of Group #3 instructions, and the Group #1 branch\r\ninstructions. 
We show simplified code for the devirtualizer, and then discuss the cases for the instruction types in\r\nmore detail.\r\n# Given: insns, a list of FinSpy VM instructions\r\n# Generate and return an array of x86 machine code bytes for the VM program\r\ndef RebuildX86(insns):\r\n    # Array of x86 machine code, built instruction-by-instruction\r\n    mcArr = []\r\n    # Bookkeeping: which VM position/key corresponds to which\r\n    # position within the x86 machine code array mcArr above\r\n    locsDict = dict()\r\n    keysDict = dict()\r\n    # List of fixups for branch instructions\r\n    locFixups = []\r\n    # Iterate through the FinSpy VM instructions\r\n    for i in insns:\r\n        # currLen is the current position within mcArr\r\n        currLen = len(mcArr)\r\n        # Bookkeeping: memorize the x86 position for the\r\n        # VM instruction's VM position and VM key\r\n        locsDict[i.Pos] = currLen\r\n        keysDict[i.Key] = currLen\r\n        # Is it \"Raw X86\" or \"X86JUMPOUT\"? 
Just emit the\r\n        # raw x86 machine code if so\r\n        if isinstance(i,RawX86StraightLine):\r\n            mcArr.extend(i.Remainder[0:i.DataLen])\r\n        elif isinstance(i,RawX86Jumpout):\r\n            mcArr.extend(i.Instruction[0:i.DataLen])\r\n        # Is it a branch instruction?\r\n        elif isinstance(i, ConditionalBranch):\r\n            # Get the name of the branch\r\n            jccName = INSN_NAME_DICT[i.Opcode]\r\n            # Is this an unconditional jump?\r\n            if jccName == \"JMP\":\r\n                # Emit 0xE9 (x86 JMP disp32)\r\n                mcArr.append(0xE9)\r\n                # Opcode is 1 byte\r\n                dispPos = 1\r\n            # Otherwise, it's a conditional jump\r\n            else:\r\n                # Conditional jumps begin with 0x0F\r\n                mcArr.append(0x0F)\r\n                # Second byte is specific to the condition code\r\n                mcArr.append(JCC_TO_OPCODE_DICT[jccName])\r\n                # Opcode is 2 bytes\r\n                dispPos = 2\r\n            # Emit the displacement DWORD (0 for now)\r\n            mcArr.extend([0x00, 0x00, 0x00, 0x00])\r\n            # Emit a fixup: the JMP displacement targets\r\n            # the VM location specified by i.VMTarget\r\n            locFixups.append((i.Pos,dispPos,i.VMTarget))\r\n        # Is it X86CALLOUT?\r\n        elif isinstance(i,RawX86Callout):\r\n            # We aren't handling this just yet.\r\n            # Emit E8 00 00 00 00 (CALL $+5)\r\n            # Revisit later\r\n            mcArr.append(0xE8)\r\n            mcArr.extend([0x00, 0x00, 0x00, 0x00])\r\nNow we discuss the cases above:\r\nThe instruction is Group #3 \"Raw X86\", which already contains raw x86 machine code in an array inside\r\nof the VM instruction object. In this case, we simply append the raw x86 machine code to the array.\r\nThe instruction is Group #3 \"Call Indirect\" (symbolized as X86JUMPOUT in the FinSpy VM disassembly listing).\r\nAs this instruction also contains raw x86 machine code, the\r\ninitial idea and approach was to simply append it to the array identically to the previous case. (I later\r\ndiscovered that this case required more care, though it ended up not being very difficult.)\r\nThe instruction is a Group #1 conditional or unconditional jump. 
We discuss these in a subsequent\r\nsubsection.\r\nThe instruction is \"Call Direct\" (symbolized as X86CALLOUT in the FinSpy VM disassembly listing). We\r\ndiscuss these in Section 2.4.\r\n2.2 Bookkeeping Cross-References\r\nWhile translating the instructions as described above, we update two dictionaries to keep track of the position of\r\neach VM instruction within the x86 machine code array. Recall from the introduction in Part #3, Phase #1 that\r\neach VM instruction has two uniquely identifying characteristics: 1) its offset within the VM bytecode array, a\r\nmultiple of 0x18 (the length of a VM instruction), and 2) a \"key\" DWORD used to locate instructions. The two\r\ndictionaries, called locsDict and keysDict respectively, map each instruction's position, and each instruction's key,\r\nto the offset within the x86 machine code array where the devirtualized instruction resides. Reproducing the\r\npertinent snippets from the simplified code above:\r\n# Given: insns, a list of FinSpy VM instructions\r\n# Generate and return an array of x86 machine code bytes for the VM program\r\ndef RebuildX86(insns):\r\n    # Array of x86 machine code, built instruction-by-instruction\r\n    mcArr = []\r\n    # Bookkeeping: which VM location/key corresponds to which\r\n    # position within the x86 machine code array mcArr above\r\n    locsDict = dict()\r\n    keysDict = dict()\r\n    # Iterate through the instructions\r\n    for i in insns:\r\n        # currLen is the current position in mcArr\r\n        currLen = len(mcArr)\r\n        # Bookkeeping: memorize the x86 position for the\r\n        # VM instruction's VM position and VM key\r\n        locsDict[i.Pos] = currLen\r\n        keysDict[i.Key] = currLen\r\n        # ... 
code to devirtualize was shown above ...\r\nFor example, the first VM instruction (the one at position 0x0 within the VM instruction array) is emitted at\r\nposition 0 within the x86 machine code blob. Thus, we update the dictionary locsDict with the binding\r\nlocsDict[0x0] = 0x0 (i.e., VM bytecode position 0x0 -\u003e x86 machine code position 0x0). The first instruction has\r\nkey 0x5A145D, so we update the dictionary keysDict with the binding: keysDict[0x5A145D] = 0x0.\r\nThe second VM instruction shall be devirtualized after the first instruction. The devirtualized first instruction was\r\nthree bytes in length, thus the second VM instruction begins at position 3 within the x86 machine code array.\r\nSince the second VM bytecode instruction corresponds to position 0x78 within the VM bytecode array (after the\r\nsimplifications from the previous Part #3, Phase #1), we update our dictionary with the information:\r\nlocsDict[0x78] = 3 (i.e., VM bytecode position 0x78 -\u003e x86 machine code instruction position 0x3). The second\r\ninstruction has key 0x5A1461, so we update the dictionary keysDict with the binding: keysDict[0x5A1461] =\r\n0x3. \r\nThis process continues sequentially for every instruction that we translate. We start a counter at 0x0, add the\r\nlength of the devirtualized x86 machine code after each step, and at each iteration we associate the current value\r\nof the counter with the instruction's key and position before generating x86 machine code.\r\n2.3 De-Virtualizing Branch Instructions\r\nWe have deferred discussing conditional branches; now we shall take the subject up again. \r\nx86 branch instructions are encoded based on the distance between the address of the branch instruction and the\r\naddress of the target. The addresses themselves of the source and destination locations aren't important, only the\r\ndifference between them. 
The opcode of each of the 16 x86 conditional branch types is 0F 8x for some value of x;\r\nthe opcode for an unconditional branch is E9. A jump whose target lies 0x1234 bytes after the end of the JMP\r\ninstruction is encoded as E9 34 12 00 00 (the displacement is stored as a little-endian DWORD). Likewise, an x86 \"JZ\" instruction whose\r\ndestination lies 0x1234 bytes after the end of the JZ instruction is encoded as 0F 84 34 12 00 00.\r\nSince we are devirtualizing VM instructions one-by-one in forward order, if the branch target is in the reverse\r\ndirection, that means we've already devirtualized the destination, and hence we could immediately write the\r\nproper displacement. However, if the branch target is in the forwards direction, that means the destination lies at\r\nan address of a VM instruction that we haven't translated yet, and so we don't know the address of the destination,\r\nand so we can't immediately generate the x86 branch instruction. This conundrum -- common in the world of\r\nlinkers -- necessitates a two-phase approach.\r\n1. Phase #1: During devirtualization, the first thing to do is to emit the opcode for the jump (recalling the\r\nprevious examples, e.g., E9 for an unconditional JMP, or 0F 84 for a conditional JZ). This is easy; we just\r\nlook up the proper opcode for a given jump type within a dictionary that maps VM branch instruction types\r\nto x86 opcodes. After the x86 opcode for the branch type, we write a displacement DWORD of 0x0.\r\nCrucially, we also generate a \"fixup\": we add an entry to a list that records where the displacement DWORD\r\nlives (the VM position of the branch instruction, plus the offset of the displacement within the emitted x86\r\ninstruction), as well as the VM bytecode position to\r\nwhich the jump must eventually point. These fixups are processed by phase two. (Note that the code for\r\nphase #1 was shown in Section 2.1.)\r\n2. 
Phase #2: After devirtualizing the entire FinSpy VM bytecode program, the \"locsDict\" dictionary described\r\nin the previous section on bookkeeping now has information about where each VM instruction lies within\r\nthe devirtualized x86 machine code array. We also have a list of fixups, telling us which locations in the\r\nx86 machine code program correspond to the dummy 0x0 DWORDs within the branch instructions that we\r\ngenerated in the previous phase, which now need to be fixed up to point to the correct locations. Each such\r\nfixup also tells us the position of the VM bytecode instruction (within the VM bytecode array) to which the\r\nbranch should point. Thus, we simply look up the destination's devirtualized x86 machine code position\r\nwithin locsDict, and compute the difference between the position after the jump displacement DWORD\r\nand the position of the destination. We replace the dummy 0x0 DWORD with the correct value. Voila, the\r\nbranches now have the correct destinations.\r\nHere is the code for phase #2, which executes after the main loop in \"RebuildX86\" has emitted the devirtualized\r\nx86 machine code for all VM instructions:\r\n# Fixups contain:\r\n# * srcBegin: VM position of the branch instruction (mapped to its\r\n#   devirtualized position via locsDict)\r\n# * srcFixup: distance into devirtualized branch instruction\r\n#   where displacement DWORD is located\r\n# * dst: the position within the VM program where the\r\n#   branch destination is located\r\nfor srcBegin, srcFixup, dst in locFixups:\r\n    # Find the machine code address for the source\r\n    mcSrc = locsDict[srcBegin]\r\n    # Find the machine code address for the destination\r\n    mcDst = locsDict[dst]\r\n    # Set the displacement DWORD within x86 branch instruction\r\n    StoreDword(mcArr, mcSrc+srcFixup, mcDst-(mcSrc+srcFixup+4))\r\n(Pedantic optional note: x86 also offers short forms of the conditional and unconditional branch instructions: if the\r\ndisplacement of the destination fits within a single signed byte, there are shorter 
instructions that can be used to\r\nencode the branch. E.g., EB 50 is the encoding for unconditionally jumping to 0x50 bytes after the end of the source instruction,\r\nand 75 F0 performs a JNZ to 0x10 bytes before the end of the source instruction (the displacement byte F0 is\r\nsign-extended to 0xFFFFFFF0, i.e., -0x10). Because using the short forms\r\nrequires a more sophisticated analysis, I chose to ignore the short forms of the branch instructions (and use only\r\nthe long forms) when performing devirtualization. The only side effect of this is that some branch instructions in\r\nthe devirtualized program are longer than necessary -- but in terms of their semantic effects, the long and short\r\nbranches function identically. If you wanted to be less lazy about this, and emit small jumps when applicable, you\r\nshould read this paper or the source code to an x86 assembler.)\r\n2.4 Devirtualizing X86CALLOUT Instructions, Take One\r\nThe last variety of VM instruction that we need to handle is the X86CALLOUT instruction. These instructions\r\nspecify an RVA within the .text section at which the targeted x86 function begins. x86 CALL instructions are\r\nencoded identically to x86 branch instructions -- following the x86 CALL opcode E8, there is a DWORD\r\nspecifying the distance from the end of the x86 CALL instruction to the target address. Thus, like we just\r\ndiscussed for devirtualizing branch instructions, we need to know the source and destination addresses for the\r\nCALL to generate the proper displacement for the x86 CALL instruction. \r\nThe targets of the virtualized branch instructions are specified as locations within the VM program. I.e., none of\r\nthe jumps exit the VM. Thus, our task in fixing the branch instructions was simply to locate the distance between\r\nthe branch instruction in the devirtualization and its target. 
Since x86 branch instructions are relatively-addressed,\r\nwe do not need to take into account where within the original binary we will eventually reinsert the devirtualized\r\ncode -- just the distance between the source and target address is enough information.\r\nOn the other hand, the targets of all X86CALLOUT instructions are outside of the VM. Like with jumps, we'll\r\nneed to know the distance between the source and target addresses. Unlike with jumps, for the devirtualized\r\nCALL instructions, we do need to know where within the original binary our devirtualized code shall be located in\r\norder to compute the CALL displacement DWORD correctly.\r\nSince I hadn't decided where I was going to store the devirtualized code, and yet I wanted to see if my\r\ndevirtualization process was otherwise working, I decided to postpone resolving this issue. I decided for the time\r\nbeing to just generate dummy x86 instructions to devirtualize X86CALLOUT instructions. I.e., for each \"Call\r\nDirect\" VM instruction, at this stage, I chose to emit E8 00 00 00 00 (x86 CALL $+5) for its machine code.\r\nClearly this is an incorrect translation -- to reiterate, this approach is just a placeholder for now -- and the\r\ndevirtualized program will not work at this stage (and we won't see proper call destinations in the devirtualized\r\nprogram), but doing this allowed me to proceed without having to decide immediately where to put the\r\ndevirtualized code. \r\n(Note that when it came time to fix this issue, I actually discovered a second and much more severe set of issues\r\nwith devirtualizing X86CALLOUT instructions, which ended up consuming roughly half of my total time spent in\r\nthe devirtualization phase. We will return to those issues when they arise naturally.)\r\n3. Moment of Truth, and Stock-Taking\r\nAt last, we have our first devirtualized version of the FinSpy VM bytecode program for this sample. 
If you would\r\nlike to see the results for yourself, you can load into IDA the binary for the first devirtualization. Now for the real\r\ntest: did it work? How close to being done are we?\r\nI took the output .bin file and loaded it into IDA. It looked pretty good! Most of it looks like something that was\r\ngenerated by a compiler. After the thrill of initial success wore off, I began looking for mistakes, things that hadn't\r\nbeen properly translated, or opportunities to improve the output. After this investigation, I'll go back to the\r\ndrawing board and see what I can do about fixing them. Then I'll do the same thing again: look at the second\r\niteration of the output, find more issues, fix them, and repeat. Hopefully, this process will eventually terminate. (It\r\ndid.)\r\n3.1 Problem #1: Call Targets are Missing\r\nThis problem was anticipated. As discussed in the last section, since I wasn't sure just yet how to handle the\r\nX86CALLOUT targets, I deliberately emitted broken x86 CALL instructions (namely E8 00 00 00 00, CALL\r\n$+5), with the plan to fix them later. 
Not surprisingly, those instructions were broken in the devirtualized output:\r\nseg000:000005D6 BE 08 02 00 00  mov  esi, 208h\r\nseg000:000005DB 56  push  esi\r\nseg000:000005DC 8D 85 C0 FB FF FF  lea  eax, [ebp-440h]\r\nseg000:000005E2 57  push  edi\r\nseg000:000005E3 50  push  eax\r\nseg000:000005E4 E8 00 00 00 00  call  $+5 ; \u003c- call target missing\r\nseg000:000005E9 56  push  esi\r\nseg000:000005EA 8D 85 C8 FD FF FF  lea  eax, [ebp-238h]\r\nseg000:000005F0 57  push  edi\r\nseg000:000005F1 50  push  eax\r\nseg000:000005F2 E8 00 00 00 00  call  $+5 ; \u003c- call target missing\r\nseg000:000005F7 56  push  esi\r\nseg000:000005F8 8D 85 A8 F5 FF FF  lea  eax, [ebp-0A58h]\r\nseg000:000005FE 57  push  edi\r\nseg000:000005FF 50  push  eax\r\nseg000:00000600 E8 00 00 00 00  call  $+5 ; \u003c- call target missing\r\nseg000:00000605 83 C4 24  add  esp, 24h\r\nTherefore I don't know where any of the calls are going, and so there are no function call cross-references.\r\nObviously we will need to stop kicking the can down the road at some point and properly attend to this issue.\r\n3.2 Observation #2: What are These Indirect Jump Instructions?\r\nAnother thing I noticed was an odd juxtaposition of indirect jump instructions where the surrounding context\r\nindicated that an indirect call would have been more sensible. For example:\r\nseg000:00000B51  push  dword ptr [ebp-0Ch]\r\nseg000:00000B54  mov  eax, ds:41FF2Ch\r\nseg000:00000B59  jmp  dword ptr [eax+14h]\r\nseg000:00000B5C  mov  eax, ds:41FF38h\r\nseg000:00000B61  push  esi\r\nseg000:00000B62  jmp  dword ptr [eax+90h]\r\nseg000:00000B68  mov  eax, ds:41FF38h\r\nseg000:00000B6D  jmp  dword ptr [eax+74h]\r\nWe see arguments being pushed on the stack, as though in preparation for a call. Then, instead of a call, we see a\r\njump. 
Then, the assembly language indicates that another call should be taking place, but instead there's another\r\nindirect jump. There are no intervening cross-references jumping to the locations after the indirect jumps. What's\r\nup with that?\r\n3.3 Problem #3: None of the Functions Have Prologues\r\nAlthough a lot of the functions looked exactly like x86 machine code emitted by Microsoft Visual Studio to me,\r\nsomething was off. IDA tipped me off to the problem, since every function had a red message indicating a\r\nproblem with the function's stack frame. Here is a screenshot of IDA, showing the colored message (and with the\r\nstack pointer shown for illustration):\r\nMost of the code looks good, but the \"leave\" instruction is supposed to pop the EBP register off the stack -- and\r\nyet the prologue does not push the EBP register onto the stack. Also, the first instruction makes reference to\r\n\"[EBP+8]\", which assumes that EBP has previously been established as the frame pointer for this function -- and\r\nyet, being the first instruction in the function, clearly the devirtualized code has not yet established EBP as the\r\nframe pointer. Which memory location is actually being accessed by this instruction?\r\nIn other functions, we see the epilogues popping registers off the stack -- but inspecting the beginnings of those\r\nfunctions, we see that those registers were not previously pushed onto the stack. Here's an example from the\r\nbeginning of a function:\r\nseg000:000004FB  mov  ebx, [ebp+8] ; Overwrite EBX\r\nseg000:000004FE  mov  edi, [ebx+3Ch] ; Overwrite EDI\r\nseg000:00000501  add  edi, ebx\r\nseg000:00000503  mov  eax, [edi+0A0h]\r\nseg000:00000509  test  eax, eax  ; Early return check\r\nseg000:0000050B  jz  return_ebx  ; JZ taken =\u003e fail, return\r\n; ... 
more function code ...\r\nseg000:000005A3 return_ebx:\r\nseg000:000005A3  pop  edi  ; Restore EDI (NOT PREVIOUSLY SAVED)\r\nseg000:000005A4  mov  eax, ebx\r\nseg000:000005A6  pop  ebx  ; Restore EBX (NOT PREVIOUSLY SAVED)\r\nseg000:000005A7  leave  ; Restore EBP (NOT PREVIOUSLY SAVED)\r\nseg000:000005A8  retn  8  ; Return\r\nWe're going to need to know why these functions seemingly are missing instructions at their beginnings.\r\nI can see why IDA is complaining about these functions -- despite being mostly coherent x86 implementations of\r\nC functions, they are clearly slightly broken. But how are they broken, and how can I fix them?\r\nAt this point I remembered something about the original binary. Back in part one, we analyzed a few VM\r\nentrypoints, and noticed that they began with normal-looking function prologues, before a segue into gibberish\r\ninstructions, before finally pushing a VM key on the stack and transferring control to the VM entrypoint. 
To wit,\r\nhere is the x86 code corresponding to the virtualized function from the screenshot above:\r\n.text:00401340  push  ebp  ; Ordinary prologue\r\n.text:00401341  mov  ebp, esp  ; Ordinary prologue\r\n.text:00401343  push  esi  ; Save obfuscation register #1\r\n.text:00401344  push  edx  ; Save obfuscation register #2\r\n.text:00401345  mov  edx, offset word_403DFE ; Junk obfuscation\r\n.text:0040134A  xor  esi, esp  ; Junk obfuscation\r\n.text:0040134C  shl  esi, cl  ; Junk obfuscation\r\n.text:0040134E  shr  esi, 1  ; Junk obfuscation\r\n.text:00401350  shl  esi, cl  ; Junk obfuscation\r\n.text:00401352  pop  edx  ; Restore obfuscation register #2\r\n.text:00401353  pop  esi  ; Restore obfuscation register #1\r\n.text:00401354  push  5A145Dh  ; Push VM instruction key\r\n.text:00401359  push  edx  ; Obfuscated JMP\r\n.text:0040135A  xor  edx, edx  ; Obfuscated JMP\r\n.text:0040135C  pop  edx  ; Obfuscated JMP\r\n.text:0040135D  jz  GLOBAL__Dispatcher  ; Enter VM\r\nThat's where the missing function prologues went -- they are still at the locations at which the original\r\nfunctions resided within the binary. The prologues have not been virtualized. Upon calling an x86 function that\r\nhas been virtualized, the function prologue will execute in x86 as usual, before the virtualized body of the function\r\nis executed within the VM. So in order to properly devirtualize the functions within the VM bytecode, we will\r\nneed to extract their prologues from the x86, and during devirtualization, prepend those prologue bytes before the\r\ndevirtualization of the function bodies. \r\n4. Fixing the Indirect Jumps\r\nNow that we've identified a few issues with the devirtualized code, our next task is to fix them. 
We'll start with the\r\nlowest-hanging of the fruit, the weird indirect jumps, reproduced here for coherence:\r\nseg000:00000B51  push  dword ptr [ebp-0Ch]\r\nseg000:00000B54  mov  eax, ds:41FF2Ch\r\nseg000:00000B59  jmp  dword ptr [eax+14h]\r\nseg000:00000B5C  mov  eax, ds:41FF38h\r\nseg000:00000B61  push  esi\r\nseg000:00000B62  jmp  dword ptr [eax+90h]\r\nseg000:00000B68  mov  eax, ds:41FF38h\r\nseg000:00000B6D  jmp  dword ptr [eax+74h]\r\nThe x86 JMP instructions were copied verbatim from machine code stored within X86JUMPOUT instructions.\r\nSo, in a sense, the JMP instructions are \"correct\". Nevertheless, this disassembly listing looks wrong. The indirect\r\njump does not push a return address, so the code at the destination doesn't know where to resume executing.\r\nConsequently, the instruction following one of the indirect jumps will never execute unless its address is\r\nreferenced somewhere else in the devirtualized code -- but that didn't seem to be the case, and it would have been\r\nweird if it was the case, since none of the situations in which a compiler would have emitted an address contained\r\nwithin the body of a function (a switch case, or an exception handler) seemed to apply. Another possibility would\r\nhave been tail calls to function pointers, but the context of the indirect jumps did not indicate that a tail call was\r\nabout to take place.\r\nIt struck me that these indirect jumps probably ought to be indirect calls instead. To investigate, I looked again at\r\nthe disassembly of the VM instruction handler for the \"Indirect Call\" instructions. The \"Indirect Call\" FinSpy VM\r\ninstruction is one of the ones that generates code dynamically while executing. 
Reproduced from part two, here's\r\nthe code that the FinSpy VM generates dynamically when interpreting an \"Indirect Call\" instruction:\r\npopf  ; Restore flags from VMContext\r\npopa  ; Restore registers from VMContext\r\npush offset @RESUME  ; PUSH RETURN ADDRESS (@RESUME)\r\n[X86 machine code for indirect jump, copied from VM instruction]\r\n@RESUME: ; \u003c- RETURN LOCATION\r\npush offset NextVMInstruction ; Resume at next VM insn\r\npusha  ; Save registers\r\npushf  ; Save flags\r\npush offset VMContext\r\npop ebx  ; EBX := VMContext *\r\npush offset fpVMReEntryReturn\r\nret  ; Re-enter VM\r\nSo indeed, before executing the indirect jump instruction from the x86 machine code stored within the instruction,\r\nthe FinSpy VM \"Indirect Call\" handler pushes a return address on the stack. The return address points to the\r\ndynamically-generated code at the @RESUME label in the snippet above, which then continues execution at the\r\nnext VM instruction following the \"Indirect Call\" instruction. That's why the devirtualized code isn't pushing a\r\nreturn address -- the VM takes care of that.\r\nI expect that the FinSpy VM authors have code to translate x86 indirect call instructions into x86 indirect jump\r\ninstructions. Thus I will need to do the reverse process: I will need to take the raw machine code contained in the\r\n\"Indirect Call\" VM instructions, and convert the x86 indirect jump instructions into call instructions instead. 
I\r\nwrote the following function, and added a call to it in the constructor for the Python object representing an\r\n\"Indirect Call\" VM instruction:\r\n# This function disassembles the raw machine code for indirect jump instructions,\r\n# changes the instructions therein to indirect call instructions, re-assembles\r\n# them, and returns the new machine code with its textual disassembly.\r\n#\r\n# Input: bytes, an array of machine code for an indirect jump.\r\n# Output: a tuple (string: disassembly, bytes: machine code for indirect call)\r\ndef ChangeJumpToCall(bytes):\r\n    # Decode the x86 machine code\r\n    i2container = X86Decoder(StreamObj(bytes)).Decode(0)\r\n    # Fetch the instruction from the decoded bundle\r\n    insn = i2container.instr\r\n    # Ensure it's an indirect jump!\r\n    assert(insn.mnem == XM.Jmp)\r\n    # Change the mnemonic to call\r\n    insn.mnem = XM.Call\r\n    # Return new textual disassembly and machine code\r\n    return (str(insn),EncodeInstruction(insn))\r\nIn reality, using my disassembler library was overkill here. There are only a few ways to encode indirect calls and\r\nindirect jumps in x86 machine code; simple pattern-replacement could have identified and re-written them very\r\neasily. But despite being overkill, there are no technical drawbacks with the solution based on rewriting and\r\nreassembling the instructions.\r\nThe devirtualized binary after the above changes can be found here.\r\n5. Conclusion\r\nWe began this Phase #2 with a deobfuscated FinSpy VM bytecode disassembly listing. From there, we formulated\r\na strategy to devirtualize the program instruction-by-instruction. This was mostly straightforward, except we\r\ndeliberately did not devirtualize the X86CALLOUT instructions from Group #3. After devirtualization, we\r\ndiscovered several remaining issues, all relating to functions and function calls. 
This phase ended by examining\r\nand fixing one of those issues.\r\nIn the next Part #3, Phase #3, we will examine the source of the function-related issues in our current\r\ndevirtualization. In so doing, we will learn that our devirtualization process needs some specific information\r\nabout the X86CALLOUT VM instructions and virtualized functions. Part #3, Phase #3 will then write scripts to\r\ncollect this information. The final Part #3,\r\nPhase #4 will then incorporate this information into the devirtualization process, and then discover and correct a\r\nfew remaining issues before producing the final devirtualized listing for the FinSpy VM bytecode in our running\r\nexample.\r\nSource: https://www.msreverseengineering.com/blog/2018/2/21/devirtualizing-finspy-phase-2-first-attempt-at-devirtualization\r\n\ndispPos = 2\n# Emit the displacement DWORD (0 for now)\nmcArr.extend([0x00, 0x00, 0x00, 0x00])\n# Emit a fixup: the JMP displacement targets\n# the VM location specified by i.VMTarget\nlocFixups.append((i.Pos,dispPos,i.VMTarget))",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"references": [
		"https://www.msreverseengineering.com/blog/2018/2/21/devirtualizing-finspy-phase-2-first-attempt-at-devirtualization"
	],
	"report_names": [
		"devirtualizing-finspy-phase-2-first-attempt-at-devirtualization"
	],
	"threat_actors": [
		{
			"id": "cf7fc640-acfe-41c4-9f3d-5515d53a3ffb",
			"created_at": "2023-01-06T13:46:38.228042Z",
			"updated_at": "2026-04-10T02:00:02.883048Z",
			"deleted_at": null,
			"main_name": "APT1",
			"aliases": [
				"PLA Unit 61398",
				"Comment Crew",
				"Byzantine Candor",
				"Comment Group",
				"GIF89a",
				"Group 3",
				"TG-8223",
				"Brown Fox",
				"ShadyRAT",
				"G0006",
				"COMMENT PANDA"
			],
			"source_name": "MISPGALAXY:APT1",
			"tools": [],
			"source_id": "MISPGALAXY",
			"reports": null
		},
		{
			"id": "3aaf0755-5c9b-4612-9f0e-e266ef1bdb4b",
			"created_at": "2022-10-25T16:07:23.480196Z",
			"updated_at": "2026-04-10T02:00:04.626125Z",
			"deleted_at": null,
			"main_name": "Comment Crew",
			"aliases": [
				"APT 1",
				"BrownFox",
				"Byzantine Candor",
				"Byzantine Hades",
				"Comment Crew",
				"Comment Panda",
				"G0006",
				"GIF89a",
				"Group 3",
				"Operation Oceansalt",
				"Operation Seasalt",
				"Operation Siesta",
				"Shanghai Group",
				"TG-8223"
			],
			"source_name": "ETDA:Comment Crew",
			"tools": [
				"Auriga",
				"Cachedump",
				"Chymine",
				"CookieBag",
				"Darkmoon",
				"GDOCUPLOAD",
				"GLOOXMAIL",
				"GREENCAT",
				"Gen:Trojan.Heur.PT",
				"GetMail",
				"Hackfase",
				"Hacksfase",
				"Helauto",
				"Kurton",
				"LETSGO",
				"LIGHTBOLT",
				"LIGHTDART",
				"LOLBAS",
				"LOLBins",
				"LONGRUN",
				"Living off the Land",
				"Lslsass",
				"MAPIget",
				"ManItsMe",
				"Mimikatz",
				"MiniASP",
				"Oceansalt",
				"Pass-The-Hash Toolkit",
				"Poison Ivy",
				"ProcDump",
				"Riodrv",
				"SPIVY",
				"Seasalt",
				"ShadyRAT",
				"StarsyPound",
				"TROJAN.COOKIES",
				"TROJAN.FOXY",
				"TabMsgSQL",
				"Tarsip",
				"Trojan.GTALK",
				"WebC2",
				"WebC2-AdSpace",
				"WebC2-Ausov",
				"WebC2-Bolid",
				"WebC2-Cson",
				"WebC2-DIV",
				"WebC2-GreenCat",
				"WebC2-Head",
				"WebC2-Kt3",
				"WebC2-Qbp",
				"WebC2-Rave",
				"WebC2-Table",
				"WebC2-UGX",
				"WebC2-Yahoo",
				"Wordpress Bruteforcer",
				"bangat",
				"gsecdump",
				"pivy",
				"poisonivy",
				"pwdump",
				"zxdosml"
			],
			"source_id": "ETDA",
			"reports": null
		}
	],
	"ts_created_at": 1775434492,
	"ts_updated_at": 1775792289,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/71b645bc5d5437335e22afca89d64ab03e1ca0f8.pdf",
		"text": "https://archive.orkl.eu/71b645bc5d5437335e22afca89d64ab03e1ca0f8.txt",
		"img": "https://archive.orkl.eu/71b645bc5d5437335e22afca89d64ab03e1ca0f8.jpg"
	}
}