{
	"id": "67894507-7824-4408-8d81-930c6c3fbc2d",
	"created_at": "2026-04-06T00:16:16.129293Z",
	"updated_at": "2026-04-10T03:20:58.856754Z",
	"deleted_at": null,
	"sha1_hash": "d8f99075f078a54d5dac19dfd958dcde0af1509f",
	"title": "FinSpy VM Unpacking Tutorial Part 3: Devirtualization. Phase #4: Second Attempt at Devirtualization — Möbius Strip Reverse Engineering",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 111250,
	"plain_text": "FinSpy VM Unpacking Tutorial Part 3: Devirtualization. Phase #4:\r\nSecond Attempt at Devirtualization — Möbius Strip Reverse\r\nEngineering\r\nBy Rolf Rolles\r\nPublished: 2018-02-21 · Archived: 2026-04-05 18:58:50 UTC\r\n[Note: if you've been linked here without context, the introduction to Part #3 describing its four phases can be\r\nfound here.]\r\n1. Introduction\r\nIn Part #3, Phase #1, we deobfuscated the FinSpy VM bytecode program by removing the Group #2 instructions.\r\nIn Part #3, Phase #2, we made a first attempt to devirtualize the FinSpy VM bytecode program back into x86\r\ncode. This was mostly successful, except for a few issues pertaining to functions and function calls, which we\r\nexamined in Part #3, Phase #3.\r\nNow we are ready to take a second stab at devirtualizing our FinSpy VM sample. We need to incorporate the\r\ninformation from Part #3, Phase #3 into our devirtualization of X86CALLOUT instructions. After having done so,\r\nwe will take a second look at the devirtualized program to see whether any issues remain. After addressing one\r\nmore major observation and a small one, our devirtualization will be complete.\r\n2. Devirtualization, Take Two\r\nWe are finally ready to choose an address in the original FinSpy sample at which to insert the devirtualized code,\r\ndevirtualize the FinSpy VM program, and copy the devirtualized machine code into the original binary. I chose the\r\naddress 0x500000, for no particular reason other than that it was after any of the existing sections in the binary.\r\nIf everything we've done so far has worked correctly, now we have all of the information we need to generate\r\nproper functions in our devirtualized program. We have a set containing the non-virtualized functions called by the\r\nFinSpy VM program. 
For virtualized function targets, we have a list containing tuples of the function addresses,\r\nthe VM instruction key corresponding to the first virtualized instruction in the function, and a list of prologue\r\nbytes to prepend before the devirtualization of the first virtualized instruction. \r\nWe derive two dictionaries from the virtualized function information. \r\n1. The dictionary named X86_VMENTRY_TO_KEY_DICT maps an X86CALLOUT target to the VM\r\ninstruction key corresponding to the beginning of the virtualized function body. \r\n2. The dictionary named KEY_TO_PROLOGUE_BYTES_DICT maps the VM instruction key to the copied\r\nx86 prologue machine code bytes for the function beginning at the VM instruction with that key.\r\nNow we make two changes to our instruction-by-instruction devirtualization process:\r\nhttps://www.msreverseengineering.com/blog/2018/2/21/devirtualizing-finspy-phase-4-second-attempt-at-devirtualization\r\nPage 1 of 15\n\nIn the loop that iterates over all VM bytecode instructions and produces the devirtualized output, consult\r\nKEY_TO_PROLOGUE_BYTES_DICT to see if the instruction corresponds to the beginning of a\r\nvirtualized function. If so, insert the prologue bytes before devirtualizing the instruction.\r\nWhen devirtualizing X86CALLOUT instructions, look up the address of the target in the\r\nNOT_VIRTUALIZED set. \r\nIf the target is in the set, then nothing special needs to be done to devirtualize the X86CALLOUT\r\ninstruction; emit an x86 CALL instruction with a dummy displacement DWORD of 0x0, and a\r\nfixup to later replace the 0x0 value with the proper distance from the source to the target (similarly\r\nto how we devirtualized VM jump instructions).\r\nIf the target is not in the set, then we need to generate an x86 CALL instruction to the devirtualized\r\naddress of the target's VM instruction key. Emit a dummy x86 CALL instruction as before. 
Also\r\ngenerate a fixup specifying the offset of the dummy 0x0 displacement DWORD in the x86 CALL\r\ninstruction, and the target of the X86CALLOUT instruction.\r\nAfter the instruction-by-instruction devirtualization process, we need to process the fixups generated for the two\r\nvarieties of X86CALLOUT instructions mentioned above (i.e., based on whether the destination is virtualized or\r\nnot).\r\nHere is partial code from the second attempt at devirtualization.\r\n# This is the same devirtualization function from before, but\r\n# modified to devirtualize X86CALLOUT instructions and insert\r\n# function prologues where applicable.\r\n# It has a new argument: \"newImageBase\", the location in the\r\n# FinSpy-virtualized binary at which we emit our devirtualized\r\n# code.\r\ndef RebuildX86(insns, newImageBase):\r\n    # These are the same as before:\r\n    mcArr = [] # Machine code array into which we generate code\r\n    locsDict = dict() # VM location -\u003e x86 position dictionary\r\n    keysDict = dict() # VM key -\u003e x86 position dictionary\r\n    locFixups = [] # List of fixup locations for jumps\r\n    # New: fixup locations for calls to virtualized functions\r\n    keyFixups = []\r\n    # New: fixup locations for calls to non-virtualized functions\r\n    binaryRelativeFixups = []\r\n    # Same as before: iterate over all instructions\r\n    for i in insns:\r\n        # Same as before: memorize VM position/key to x86 mapping\r\n        currLen = len(mcArr)\r\n        locsDict[i.Pos] = currLen\r\n        keysDict[i.Key] = currLen\r\n        # New: length of prologue instructions inserted before\r\n        # devirtualized FinSpy VM instruction. 
Only obtains a\r\n        # non-zero value if this instruction corresponds to the\r\n        # beginning of a virtualized function.\r\n        prologueLen = 0\r\n        # New: is this VM instruction the beginning of a\r\n        # virtualized function?\r\n        if i.Key in KEY_TO_PROLOGUE_BYTES_DICT:\r\n            # Get the prologue bytes that should be inserted\r\n            # before this VM instruction.\r\n            prologueBytes = KEY_TO_PROLOGUE_BYTES_DICT[i.Key]\r\n            # Increase the length of the instruction.\r\n            prologueLen += len(prologueBytes)\r\n            # Copy the raw x86 machine code for the prologue\r\n            # into the mcArr array before devirtualizing the\r\n            # instruction.\r\n            mcArr.extend(prologueBytes)\r\n        # Now devirtualize the instruction. Handling of\r\n        # \"Raw X86\", \"Indirect Call\", and jumps is identical\r\n        # to before, so the code is not duplicated here.\r\n        # Is this an \"X86CALLOUT\" (\"Direct Call\")?\r\n        if isinstance(i,RawX86Callout):\r\n            # New: emit 0xE8 (x86 CALL disp32)\r\n            mcArr.append(0xE8)\r\n            # Was the target a non-virtualized function?\r\n            if i.X86Target in NOT_VIRTUALIZED:\r\n                # Emit a fixup to the raw target\r\n                binaryRelativeFixups.append((i.Pos,prologueLen+1,i.X86Target))\r\n            # Otherwise, the target was virtualized\r\n            else:\r\n                # Emit a fixup to the devirtualized function body\r\n                # specified by the key of the destination\r\n                keyFixups.append((i.Pos,prologueLen+1,i.X86Target))\r\n            # Write the dummy destination DWORD in the x86 CALL\r\n            # instruction that we just generated. This will be\r\n            # fixed-up later.\r\n            mcArr.extend([0x00, 0x00, 0x00, 0x00])\r\nThe Python code above generates additional fixup information for devirtualized X86CALLOUT instructions. 
The\r\ntwo cases of the destination being virtualized or not are handled similarly, though they are placed in two different\r\nlists (\"keyFixups\" for virtualized targets, and \"binaryRelativeFixups\" for non-virtualized targets). After the main\r\ndevirtualization loop shown above, we must process the fixups just generated, the same way we did for the jump\r\ninstructions. The process of applying the fixups is nearly identical to what we did for jump instructions, except\r\nthat for virtualized targets, we need to determine the VM instruction key corresponding to the x86 address of the\r\nX86CALLOUT target. Here is the code for fixing up calls to virtualized functions:\r\n# Fixups contain:\r\n# * srcBegin: beginning of devirtualized CALL instruction\r\n# * srcFixup: distance into devirtualized CALL instruction\r\n#   where displacement DWORD is located\r\n# * dst: the X86CALLOUT target address\r\nfor srcBegin, srcFixup, dst in keyFixups:\r\n    # Find the machine code address for the source\r\n    mcSrc = locsDict[srcBegin]\r\n    # Lookup the x86 address of the target in the information\r\n    # we extracted for virtualized functions. 
Extract the key\r\n    # given the function's starting address.\r\n    klDst = X86_VMENTRY_TO_KEY_DICT[dst]\r\n    # Find the machine code address for the destination\r\n    mcDst = keysDict[klDst]\r\n    # Set the displacement DWORD within x86 CALL instruction\r\n    StoreDword(mcArr, mcSrc+srcFixup, mcDst-(mcSrc+srcFixup+4))\r\nNext, and more simply, here is the code for fixing up calls to non-virtualized functions:\r\n# Same comments as above\r\nfor srcBegin, srcFixup, dst in binaryRelativeFixups:\r\n    # Find the machine code address for the source\r\n    mcSrc = locsDict[srcBegin]\r\n    # Compute the distance between the end of the x86\r\n    # CALL instruction (at the address at which it will\r\n    # be stored when inserted back into the binary) and\r\n    # the raw x86 address of the X86CALLOUT target\r\n    fixup = dst-(newImageBase+mcSrc+srcFixup+4)\r\n    # Set the displacement DWORD within x86 CALL instruction\r\n    StoreDword(mcArr, mcSrc+srcFixup, fixup)\r\n3. Inspecting the Devirtualization\r\nNow we are in a similar place to where we were after our initial devirtualization attempt in Part #3, Phase #2; let's\r\nlook at the devirtualized code in IDA and see if anything jumps out as being obviously incorrect.\r\nIDA's navigation bar shows a few things.\r\nThe first third of the binary -- in the transparently-colored regions -- contains data defined as arrays. IDA\r\nhas not identified code in these regions.\r\nThe red-colored areas have properly been identified as code, but don't have any incoming references,\r\nso they have not been defined as functions.\r\nThese two issues are related: if the regions currently marked as data are actually code, and if they make function\r\ncalls to the code in red, then perhaps IDA can tell us that the red regions do have incoming references and should\r\nbe defined as functions. 
I selected the entire text section, undefined it by pressing 'U', and then selected it again\r\nand pressed 'C' to turn it into code. The result was much more pleasing:\r\nNow the whole devirtualized blob is defined as code. There is still an obvious cluster of functions that don't have\r\nincoming references.\r\nNext we list the remaining issues that appear when inspecting the new devirtualization.\r\n3.1 Some Functions Don't Have Incoming References\r\nAs we just saw from the navigation bar, there is a cluster of functions with no incoming references. Furthermore,\r\ninspecting these functions shows that they all lack prologues, as we noticed originally for all functions in our\r\nfirst devirtualization. If we turn them into functions, IDA makes its objections well-known with its black-on-red\r\ntext:\r\nSo apparently, our prologue extraction scripts have missed these functions. We'll have to figure out why.\r\n3.2 Many Call Instructions Have Invalid Destinations\r\nIDA's \"Problems\" window (View-\u003eOpen Subviews-\u003eProblems) helpfully points us to another category of errors.\r\nMany function calls have unresolved addresses, which IDA highlights as black-on-red text.\r\nThese issues have an innocuous explanation. In this Phase #4, we chose the address\r\n0x500000 as the base address at which to install the devirtualized code within the original binary. The x86 CALL\r\ninstructions targeting non-virtualized functions are thus computed relative to an address in that region of the\r\nbinary. Since we are currently inspecting the .bin file on its own, its base address is 0x0, and not 0x500000 like it\r\nwill be when we insert the devirtualized code into the original binary. 
The x86 CALL displacements are indeed nonsensical at\r\nthe moment, but we'll double check on them after we've inserted the devirtualized code back into the binary.\r\n3.3 One Call in Particular is Weird\r\nAll of the x86 CALL instructions described in the previous issue have displacements that begin with the nibbles\r\n0xFFF....., indicating that the destination of those CALL instructions lies at an address located physically before\r\nthe CALL instruction. However, one x86 CALL instruction at the beginning of a devirtualized function has a\r\npositive displacement, also colored black-on-red:\r\nseg000:00000E70 sub_E70 proc near\r\nseg000:00000E70  push  0B6Ch\r\nseg000:00000E75  push  40B770h\r\nseg000:00000E7A  call  near ptr 6D85h ; \u003c- bogus destination\r\nI looked at the corresponding function in the original binary from which this prologue had been copied, and the\r\nsituation became clear.\r\n.text:00404F77  push  0B6Ch\r\n.text:00404F7C  push  offset stru_40B770\r\n.text:00404F81  call  __SEH_prolog\r\n.text:00404F86  xor  esi, esi\r\n.text:00404F88  mov  [ebp-20h], esi\r\n.text:00404F8B  push  edi  ; Save obfuscation register #1\r\n.text:00404F8C  push  ebp  ; Save obfuscation register #1\r\n.text:00404F8D  mov  ebp, offset word_411A6E ; Junk obfuscation\r\n.text:00404F92  shrd  edi, esi, 0Eh  ; Junk obfuscation\r\nThe prologue for the pre-virtualized function installed an exception handler by calling __SEH_prolog. Our\r\nprologue extraction script simply copied the raw bytes for the prologue. 
Since x86 CALL instructions are encoded\r\nrelative to their source and destination addresses, we can't simply copy a CALL instruction somewhere else\r\nwithout updating the destination DWORD; if we don't, the destination will be incorrect.\r\nSince this issue appeared only once, instead of re-architecting the prologue extraction functionality to deal with\r\nthis special case, I decided to just manually byte-patch my devirtualized code once I've copied it into the original\r\nbinary. If I wanted to write a more fully-automated FinSpy devirtualization tool, I would tackle this issue more\r\njudiciously.\r\n3.4 What are These Function Pointers?\r\nThe second devirtualization contains many pointers that reference hard-coded addresses within the original\r\nFinSpy binary from which we extracted the VM bytecode. For example, the following snippet references a\r\nfunction pointer and an address in the .text section:\r\nseg000:00004BB9 push 0\r\nseg000:00004BBE push 0\r\nseg000:00004BC3  push  400h\r\nseg000:00004BC8  push  0FFFFh\r\nseg000:00004BCD  call  dword ptr ds:401088h\r\nseg000:00004BD3  mov  eax, ds:41FF38h\r\nseg000:00004BD8  push  0\r\nseg000:00004BDD  call  dword ptr [eax+38h]\r\nSince we are examining the devirtualized code in isolation from the original binary, IDA cannot currently provide\r\nus meaningful information about the addresses in question. We can check the addresses in the FinSpy sample IDB\r\nto see if they make any sense; for example, here's the address referenced by the CALL: \r\n.idata:00401088  ; LRESULT __stdcall SendMessageW(HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lPara\r\n.idata:00401088  extrn SendMessageW:dword\r\nThings look good; we see four arguments pushed in the code above, and the function pointer references a function\r\nwith four arguments. 
Once we've inserted our devirtualization back into the original binary, IDA will resolve the\r\nreferences seamlessly, and allow us to make full use of its normal facilities for cross-referencing, naming, type\r\ninference, and parameter tracking.\r\nI also noticed that some of the instructions made reference to items within the original binary's .text section. \r\nseg000:0000333E  mov  dword ptr [esi+4], 4055D5h\r\nseg000:00003345  mov  dword ptr [esi], 40581Eh\r\n; ... later ...\r\nseg000:000034C4  mov  dword ptr [esi+4], 40593Ch\r\nseg000:000034CB  mov  dword ptr [esi], 4055FEh\r\n; ... later ...\r\nseg000:000034DC  mov  dword ptr [esi+4], 40593Ch\r\nseg000:000034E3  mov  dword ptr [esi], 405972h\r\n; ... more like the above ...\r\nLooking at these addresses in the original binary, I found that they corresponded to virtualized functions in the\r\n.text section. For example, here are the contents of the first pointer from the snippet above -- 0x4055D5 -- within\r\nthe original binary:\r\n.text:004055D5  mov  edi, edi  ; Original prologue\r\n.text:004055D7  push  ebp  ; Original prologue\r\n.text:004055D8 mov ebp, esp ; Original prologue\r\n.text:004055DA push ebp ; Push obfuscation register #1\r\n.text:004055DB  push  esi  ; Push obfuscation register #2\r\n.text:004055DC  mov  esi, offset word_41CCBA ; Junk obfuscation\r\n.text:004055E1  mov  ebp, 7C9E085h  ; Junk obfuscation\r\n.text:004055E6  bswap  ebp  ; Junk obfuscation\r\n.text:004055E8  pop  esi  ; Pop obfuscation register #2\r\n.text:004055E9  pop  ebp  ; Pop obfuscation register #1\r\n.text:004055EA  push  5A329Bh  ; Push VM instruction entry key\r\n.text:004055EF  push  ecx  ; Obfuscated JMP\r\n.text:004055F0  sub  ecx, ecx  ; Obfuscated JMP\r\n.text:004055F2  pop  ecx  ; Obfuscated JMP\r\n.text:004055F3  jz  GLOBAL__Dispatcher  ; Enter FinSpy VM\r\nAnd it turns out 
that the VM key pushed by this sequence, namely 0x5A329B, references one of the functions in\r\nthe devirtualized binary which otherwise did not have an incoming reference. Great! We would like to extract the\r\naddresses of the pointed-to functions so that we can process them with the scripts we developed in Part #3, Phase\r\n#3 in order to extract their prologues. We'd also like to alter the raw x86 instructions that reference the function\r\npointers to make them point to their devirtualized targets within the devirtualized blob instead.\r\n4. Next Step: Function Pointers\r\nAt this point, only two issues remain. First, we noticed that some devirtualized functions still don't have\r\nprologues. The explanation for this behavior must be that the addresses of their virtualized function stubs were\r\nnever passed to the scripts. If we had provided those virtualized functions' addresses, our scripts would have\r\nfound something for their prologues, even if it was an incorrect prologue. Yet the scripts found nothing.\r\nSecondly, we noticed that the devirtualized code contained function pointers referencing the addresses of the\r\nprologue-lacking functions from the previous paragraph. We would like to replace the raw function pointers\r\nwithin the x86 instructions with the addresses of the corresponding functions in the devirtualized code.\r\n4.1 Extracting Function Pointers from the VM Bytecode Disassembly\r\nIt seems like the first step in resolving both issues is to locate the function pointers within the FinSpy VM. 
I took a\r\nlook at the raw FinSpy VM instructions from the snippet above with the function pointer references.\r\nHere's the first VM instruction:\r\n0x030630: X86 mov dword ptr [esi+4h], 4055D5h\r\nHere are the raw bytes that encode that VM instruction:\r\nseg000:00030630 dd 5A5C54h ; \u003c- VM instruction key\r\nseg000:00030634 db 1Bh  ; \u003c- Opcode: raw x86\r\nseg000:00030635 db  7  ; \u003c- Length of x86 instruction: 7\r\nseg000:00030636 db 3 ; \u003c- Fixup offset: #3\r\nseg000:00030637 db 0 ; \u003c- Unused\r\nseg000:00030638 db 0C7h  ; \u003c- x86: mov dword ptr [esi+4], 55D5h\r\nseg000:00030639 db 46h \r\nseg000:0003063A db  4\r\nseg000:0003063B db 0D5h  ; \u003c- 3: offset of 0x55D5 in x86 instruction\r\nseg000:0003063C db 55h\r\nseg000:0003063D db  0\r\nseg000:0003063E db  0\r\nThe important thing to notice is that the x86 machine code contained within this instruction disassembles to:\r\nmov dword ptr [esi+4], 55D5h\r\nWhereas the x86 instruction shown in the VM bytecode disassembly listing is, instead:\r\nmov dword ptr [esi+4h], 4055D5h\r\nThe difference is in the DWORD value (55D5h in the raw VM bytecode versus 4055D5h in the VM bytecode\r\ndisassembly). \r\nThe reason for this difference lies in the line labeled \"Fixup offset: #3\". You may recall from part two that all\r\nFinSpy VM instructions have two byte fields at offsets +6 and +7 into the VM instruction structure that were\r\nnamed \"RVAPosition1\" and \"RVAPosition2\". To quote the description of those fields from part two:\r\n\"Some instructions specify locations within the x86 binary. Since the binary's base address may change when\r\nloaded as a module by the operating system, these locations may need to be recomputed to reflect the new base\r\naddress. 
FinSpy VM side-steps this issue by specifying the locations within the binary as relative virtual addresses\r\n(RVAs), and then adding the base address to the RVA to obtain the actual virtual addresses within the executing\r\nmodule. If either of [RVAPosition1 or RVAPosition2] is not zero, the FinSpy VM treats it as an index into the\r\ninstruction's data area at which 32-bit RVAs are stored, and fixes the RVA up by adding the base address to the\r\nRVA.\"\r\nIn a bit of unplanned, happy serendipity, when I was writing my FinSpy VM bytecode disassembler, I made a\r\nPython class called \"GenericInsn\" that served as the base class for all other Python representations of FinSpy VM\r\ninstruction types. Its Init() method is called in the constructor for every VM instruction type. And in particular,\r\nInit() includes the following code:\r\nif self.Op1Fixup:\r\n    ApplyFixup(self.Remainder, self.Op1Fixup \u0026 0x7F, self.Pos)\r\nif self.Op2Fixup:\r\n    ApplyFixup(self.Remainder, self.Op2Fixup \u0026 0x7F, self.Pos)\r\nThus, we are in the fortunate position where FinSpy VM helpfully tags all pointers to items in the original binary\r\nby setting these RVAPosition1 and RVAPosition2 fields. And furthermore, our existing function \"ApplyFixup\"\r\nalready receives all of these values when we disassemble a FinSpy VM bytecode program. Thus, all we need to do\r\nto extract the function pointers is to include some logic inside of ApplyFixup that detects when one of these\r\nembedded RVAs refers to a function pointer, and if it does, to store the virtual address of the function pointer into\r\na global set. 
The logic I used to determine function pointers was simply checking whether the virtual address was\r\nbetween the beginning of the first function in the .text section, and the last address in the .text section.\r\nTo wit, I changed my implementation of ApplyFixup as follows:\r\n# New: constants describing the first function in the\r\n# .text section and the end of the .text section.\r\nTEXT_FUNCTION_BEGIN = 0x401340\r\nTEXT_FUNCTION_END = 0x41EFC6\r\n# New: a global dictionary whose keys are fixed-up\r\n# virtual addresses, and whose values are lists of\r\n# VM instruction positions whose bodies reference\r\n# those virtual addresses.\r\nfrom collections import defaultdict\r\nALL_FIXED_UP_DWORDS = defaultdict(list)\r\n# Existing ApplyFixup function\r\ndef ApplyFixup(arr, FixupPos, InsnPos):\r\n    # New: Python scoping statement\r\n    global ALL_FIXED_UP_DWORDS\r\n    # Existing ApplyFixup logic\r\n    OriginalDword = ExtractDword(arr, FixupPos)\r\n    FixedDword = OriginalDword + IMAGEBASE_FIXUP\r\n    StoreDword(arr, FixupPos, FixedDword)\r\n    # New: if the fixed-up DWORD is in the .text\r\n    # section, save it in ALL_FIXED_UP_DWORDS and\r\n    # add the VM instruction position (InsnPos) to\r\n    # the list of positions referencing that DWORD.\r\n    if FixedDword \u003e= TEXT_FUNCTION_BEGIN and FixedDword \u003c= TEXT_FUNCTION_END:\r\n        ALL_FIXED_UP_DWORDS[FixedDword].append(InsnPos)\r\n4.2 Extracting Prologues from Virtualized Function Pointers\r\nNext, I also wanted to treat these virtual addresses as though they were the beginnings of virtualized functions, so\r\nthat my existing machinery for extracting function prologues would incorporate them. In Part #3, Phase #3, I had\r\nwritten a function called \"ExtractCalloutTargets\" that scanned the FinSpy VM instructions looking for\r\nX86CALLOUT instructions and extracted their target addresses. 
This was then passed to the function prologue\r\nextraction scripts to collect the data that was used in this Phase #4 to devirtualize X86CALLOUT instructions and\r\ninsert the function prologues from virtualized functions into the devirtualization.\r\nIt seemed natural to modify ExtractCalloutTargets to incorporate the virtual addresses we collected in the previous\r\nsubsection. To wit, I modified that function as such:\r\n# Existing ExtractCalloutTargets function\r\ndef ExtractCalloutTargets(insns, vmEntrypoint):\r\n    # New: Python scoping statement\r\n    global ALL_FIXED_UP_DWORDS\r\n    # New: initialize the set to the function pointer\r\n    # addresses collected in ApplyFixup\r\n    calloutTargets = set(ALL_FIXED_UP_DWORDS.keys())\r\n    # Existing: add vmEntrypoint to set\r\n    calloutTargets.add(vmEntrypoint)\r\n    # Existing: extract X86CALLOUT targets\r\n    for i in insns:\r\n        if isinstance(i,RawX86Callout):\r\n            if i.X86Target not in NOT_VIRTUALIZED:\r\n                calloutTargets.add(i.X86Target)\r\n    # Existing: return list of targets\r\n    return list(calloutTargets)\r\nNow I ran the function prologue extraction scripts from Part #3, Phase #3 again, to re-generate the virtualized\r\nfunction prologue and VM instruction entry keys for the function pointers in addition to the existing data for the\r\nX86CALLOUT targets. I then pasted the output data back into the second devirtualization program we wrote in\r\nthis Phase #4 (remembering to copy in the data I'd generated manually for those virtualized functions without junk\r\nobfuscation sequences), and ran the devirtualization again. 
This time, the unreferenced functions had proper\r\nprologues, though they were still unreferenced.\r\n4.3 Fixing the Function Pointers in the Devirtualized x86 Code\r\nThe last remaining issue is that the devirtualized x86 instructions which reference the function pointers still use\r\nthe addresses of the virtualized functions in the .text section, whereas we want to modify them to point to their\r\ndevirtualized equivalents instead. This is implemented in the RebuildX86 devirtualization function after the\r\nmachine code array for the devirtualized program has been fully generated.\r\nFortunately for us, we already know which VM instructions reference the function pointers -- we collected that\r\ninformation when we modified ApplyFixup() to locate and log virtualized function pointers. Not only did we log\r\nthe virtual addresses of the purported function pointers, but we also logged a list of VM instruction positions\r\nreferencing each such function pointer in the ALL_FIXED_UP_DWORDS dictionary.\r\n4.3.1 A Slight Complication\r\nA slight complication led me to a solution that perhaps could have been more elegant. Namely, we collect the\r\npositions of the VM instructions referencing function pointers within ApplyFixup() at the time that we\r\ndisassemble the VM bytecode program. However, the simplifications in Part #3, Phase #1 can potentially merge\r\ninstructions together when collapsing patterns of VM instructions into smaller sequences. Therefore, it might be\r\nthe case that the VM instruction positions that we have collected no longer refer to valid locations in the VM\r\nprogram after the simplifications have been applied. 
However, we'd still expect the function pointers to appear in\r\nthe machine code for the VM instructions into which the removed instruction was merged.\r\nTo work around this issue, I made use of the locsDict dictionary that we generated through devirtualization.\r\nNamely, that dictionary recorded the offset within the devirtualized x86 blob of each VM instruction processed in\r\nthe main devirtualization loop. We find the offset within the devirtualized x86 machine code array of the prior VM\r\ninstruction with an entry within locsDict, and we find the devirtualized offset of the next VM instruction with an\r\nentry within locsDict. This gives us a range of bytes to search in the devirtualized machine code looking for the\r\nbyte pattern corresponding to the function pointer for the virtualized function. Once found, we can replace the raw\r\nbytes with the address of the devirtualized function body for that virtualized function.\r\n4.3.2 Locating the Function Pointers in the Devirtualized Blob\r\nHere is the code for locating function pointers as just described; if it is still unclear, read the prose remaining in\r\nthis subsection.\r\n# dword: the virtual address of a virtualized function\r\n# posList: the list of VM instruction positions\r\n# referencing the value of dword\r\nfor dword, posList in ALL_FIXED_UP_DWORDS.items():\r\n    # For each position referencing dword:\r\n    for pos in posList:\r\n        # Set the low and high offset within the\r\n        # devirtualized blob to None\r\n        lowPos,highPos = None,None\r\n        # posSearchLow is the backwards iterator\r\n        posSearchLow = pos\r\n        # Continue while we haven't located a prior\r\n        # instruction with a devirtualization offset.\r\n        # Compare against None: offset 0 is a valid result.\r\n        while lowPos is None:\r\n            # Does posSearchLow correspond to a\r\n            # devirtualized instruction? 
I.e., not\r\n            # something eliminated by a pattern\r\n            # substitution.\r\n            if posSearchLow in locsDict:\r\n                # Yes: get the position and quit\r\n                lowPos = locsDict[posSearchLow]\r\n            else:\r\n                # No: move to the previous instruction\r\n                posSearchLow -= INSN_DESC_SIZE\r\n        # Now search for the next higher VM position\r\n        # with a devirtualization offset\r\n        posSearchHigh = pos+INSN_DESC_SIZE\r\n        # Continue while we haven't located a later\r\n        # instruction with a devirtualization offset\r\n        while highPos is None:\r\n            # Does posSearchHigh correspond to a\r\n            # devirtualized instruction? I.e., not\r\n            # something eliminated by a pattern\r\n            # substitution.\r\n            if posSearchHigh in locsDict:\r\n                # Yes: get the position and quit\r\n                highPos = locsDict[posSearchHigh]\r\n            else:\r\n                # No: move to the next instruction\r\n                posSearchHigh += INSN_DESC_SIZE\r\nFor each instruction position X that references one of the pointers to virtualized functions, I locate the last VM\r\ninstruction at or before X in the locsDict dictionary. This is implemented as a loop that tries to find X in locsDict. If\r\nlocsDict[X] exists, we save that value -- the offset of the corresponding devirtualized instruction within the\r\ndevirtualized blob. If locsDict[X] does not exist, then the VM instruction must have been removed by one of the\r\npattern simplifications, so we move on to the prior VM instruction by subtracting the size of an instruction -- 0x18\r\n-- from X. We repeat until we find an X that has been devirtualized; if X becomes 0x0, then we reach the\r\nbeginning of the VM instructions, i.e., devirtualized offset 0x0.\r\nWe do much the same thing to find the next VM instruction with a corresponding devirtualized offset: add the size\r\nof a VM instruction -- 0x18 -- to X and look it up in locsDict. If it's not a member of locsDict, add 0x18 and try\r\nagain. 
If X ever exceeds the last legal VM location, set the offset to the end of the machine code\r\narray. Once we've found the next VM instruction's devirtualized position, we record it and stop searching.\r\n4.3.3 Rewriting the Function Pointers\r\nOnce the code just shown has run, we have a suitable range within the x86 machine code array\r\nthat ought to contain the raw bytes corresponding to the virtual address of a virtualized function referenced via\r\npointer. Next we simply byte-search this portion of the array looking for that address, and once found, replace the\r\naddress with that of the corresponding devirtualized function body. There is nothing especially complicated about\r\nthis; we simply consult the book-keeping metadata that we gathered through devirtualization to locate the\r\ndevirtualized offset of the virtualized function pointer, add the offset of the new image base at which we are\r\ninserting the devirtualized blob within the binary, and store the DWORD value at the found position within the\r\nmachine code array.\r\n4.3.4 Done\r\nWith the code just described in place, the pointers to virtualized functions now point to their\r\ndevirtualized counterparts. Here again are two references to virtualized functions before the modifications just\r\ndescribed:\r\nseg000:00003555  mov  dword ptr [esi+4], 4055D5h\r\nseg000:0000355C  mov  dword ptr [esi], 40581Eh\r\nAnd, the same code after modification:\r\nseg000:00003555  mov  dword ptr [esi+4], 50154Ah\r\nseg000:0000355C  mov  dword ptr [esi], 5017B3h\r\n5. 
Inserting the Devirtualized Code Back Into the Binary\r\nThe last step before we can analyze the FinSpy dropper is to re-insert our devirtualized blob back into the binary.\r\nWe have already chosen an address for it: 0x500000, which was important in generating the devirtualized code.\r\nAt this point I struggled to load the devirtualized code with IDA's File-\u003eLoad File-\u003eAdditional binary file...\r\nand Edit-\u003eSegments-\u003eCreate Segment menu selections. Although both of these methods allowed me to load the\r\nraw devirtualized machine code into the database, I experienced weird issues with both methods. Namely, the\r\ncross-section data references and/or function cross-references were broken. IDA might display the correct\r\naddresses for data items, and allow you to follow cross-references by pressing \"enter\" over an address, but it\r\nwould not show symbolic names or add cross-references. For example, we might see something like this:\r\n.data:00504C3C call  dword ptr ds:401088h\r\nRather than what we see when things are working properly:\r\n.data:00504C3C call  ds:SendMessageW\r\nI tried screwing with every option in the two dialogs mentioned, especially the segment attributes (\"CODE\"\r\ninstead of \"DATA\"). For some attempts, the code references worked properly but the data references didn't; and\r\nfor other attempts, the opposite was true. More often neither would work. Igor Skochinsky from Hex-Rays has\r\nalways been very helpful to me, but this time he was away from his keyboard and did not hear my cries of anguish\r\nuntil it was too late. 
(Thanks anyway, Igor.)\r\nThat being the case, although it wasn't my first choice, I ended up enlarging the .data section via Edit-\u003eSegments-\r\n\u003eEdit Segment and then loading the binary contents with a one-liner in IDC:\r\nloadfile(fopen(\"mc-take2.bin\", \"rb\"), 0, 0x500000, 26840);\r\nAnd this time, everything worked. You can see the IDB with the devirtualized code here.\r\nFrom there I reverse engineered the devirtualized FinSpy dropper program. Whereas I was not impressed with the\r\nFinSpy VM, the dropper was more sophisticated than I was expecting. You can see a mostly-complete analysis in\r\nthe linked IDB; start reading from address 0x50207E, the address of WinMain() within the devirtualized code. I've\r\ntried to comment most of the assembly language, but a lot of the action is inside of Hex-Rays (i.e., look at those\r\nfunctions inside of Hex-Rays to see a lot of comments that aren't in the assembly language view).\r\n6. Conclusion \r\nFinSpy VM is weak. It's closer in difficulty to a crackme than to a commercial-grade VM. (I suppose it is slightly\r\nmore polished than your average VM crackme.) Having a \"Raw x86\" instruction in the FinSpy VM instruction set,\r\nwhere those instructions make up about 50% of the VM bytecode program, makes devirtualization trivial. \r\nI almost didn't publish this series because I personally didn't find anything interesting about the FinSpy VM. But,\r\nhopefully, through all the tedium I've managed to capture the trials and tribulations of writing a deobfuscator from\r\nscratch. If you're still reading at this point, hopefully you found something interesting about it, and hopefully this\r\nslog wasn't all for nothing. 
Hopefully next time you come across a FinSpy-protected sample, this series will help\r\nyou make short work of it.\r\nSource: https://www.msreverseengineering.com/blog/2018/2/21/devirtualizing-finspy-phase-4-second-attempt-at-devirtualization",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"references": [
		"https://www.msreverseengineering.com/blog/2018/2/21/devirtualizing-finspy-phase-4-second-attempt-at-devirtualization"
	],
	"report_names": [
		"devirtualizing-finspy-phase-4-second-attempt-at-devirtualization"
	],
	"threat_actors": [],
	"ts_created_at": 1775434576,
	"ts_updated_at": 1775791258,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/d8f99075f078a54d5dac19dfd958dcde0af1509f.pdf",
		"text": "https://archive.orkl.eu/d8f99075f078a54d5dac19dfd958dcde0af1509f.txt",
		"img": "https://archive.orkl.eu/d8f99075f078a54d5dac19dfd958dcde0af1509f.jpg"
	}
}