{
	"id": "d2b6cd1a-8bcb-4017-bff9-34634372726a",
	"created_at": "2026-04-06T00:09:18.31293Z",
	"updated_at": "2026-04-10T13:11:25.377464Z",
	"deleted_at": null,
	"sha1_hash": "643776e0a555c13d0860a32816921de8024c1ec3",
	"title": "FinSpy VM Unpacking Tutorial Part 3: Devirtualization. Phase #1: Deobfuscating FinSpy VM Bytecode Programs — Möbius Strip Reverse Engineering",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 119207,
	"plain_text": "FinSpy VM Unpacking Tutorial Part 3: Devirtualization. Phase #1:\r\nDeobfuscating FinSpy VM Bytecode Programs — Möbius Strip\r\nReverse Engineering\r\nBy Rolf Rolles\r\nPublished: 2018-02-21 · Archived: 2026-04-05 15:57:24 UTC\r\n[Note: if you've been linked here without context, the introduction to Part #3 describing its four phases can be\r\nfound here.]\r\n1. Introduction\r\nIn part one of this series, we analyzed the obfuscation on the x86 implementation of the FinSpy VM, and wrote a\r\ntool to deobfuscate it to allow easier analysis. In the second part of this series, we analyzed the VM instruction set,\r\nwrote a disassembler that works for the running example, and obtained the VM bytecode listing. Now we are left\r\nwith devirtualization: we want to regenerate a facsimile of the original x86 program prior to virtualization.\r\nThis task winds up being fairly lengthy, so we have divided the approach into four phases, which corresponds to\r\nthe work I did in devirtualizing FinSpy in the same order in which I did it. (That being said, don't mistake\r\n\"lengthy\" for \"difficult\".) We begin Phase #1 by inspecting the FinSpy VM bytecode program, discovering\r\nobfuscation involving the \"Group #2\" instructions, and removing it via pattern-substitution.\r\n2. Devirtualizing FinSpy: Initial Observations and Strategy\r\nAt this point in the process, I had my first listing of the VM bytecode program in disassembled form. I began by\r\ninspecting the VM bytecode program and identifying ways to simplify the bytecode. This ultimately resulted in\r\nthe wholesale elimination of the Group #2 instructions.\r\nYou may wish to inspect the initial VM bytecode disassembly, which we shall progressively refine throughout this\r\nPhase #1. 
Along the way, after discovering and applying simplifications, we will gradually obtain new VM\r\nbytecode disassembly listings, which will be linked throughout.\r\n2.1 Recap of FinSpy VM Instruction Set and Bytecode Programs\r\nTo summarize part two, the FinSpy VM uses a fixed-length instruction encoding, with each VM instruction\r\nrepresented by a structure that is 0x18 bytes in length. \r\nEach instruction within a FinSpy VM program has two uniquely identifying characteristics associated with it. For\r\none, we can refer to the raw position of the VM instruction within the array of VM bytecode instructions. So for\r\nexample, the first instruction is located at position 0x0. Since each instruction is 0x18 bytes in length, the second\r\nhttps://www.msreverseengineering.com/blog/2018/2/21/wsbjxrs1jjw7qi4trk9t3qy6hr7dye\r\nPage 1 of 15\n\ninstruction is located at position 0x18, and generally instruction #N (counting from zero) is located at position 0x18*N within the\r\nbytecode array.\r\nThe second identifying characteristic for an instruction is its \"key\", a 32-bit value (the first DWORD in an\r\nencoded FinSpy VM instruction) that can be used to locate a particular VM instruction. Specifically, before\r\nentering the VM, the x86 code that executes before transferring control to the FinSpy VM interpreter first pushes a\r\nkey DWORD onto the stack. Upon entering the VM, the FinSpy VM initialization code loads the key DWORD\r\npushed by the x86 portion, searches through the VM bytecode array to locate the VM instruction with that key,\r\nand then begins interpreting VM instructions starting from that location. (It turns out that FinSpy VM instruction\r\nkeys cause some complications in devirtualization, as we shall see later.)\r\nMost FinSpy VM instructions are assumed to execute sequentially. 
That is, after one VM instruction completes,\r\nthe x86 code for the instruction handler adds 0x18 to the current VM EIP, to advance it to the next VM instruction.\r\nVM control flow instructions may behave differently, e.g., the conditional branch instructions are relatively-addressed like their x86 counterparts (if the branch is taken, the VM instruction adds a displacement, a multiple of\r\n0x18, to VM EIP; if the branch is not taken, execution continues at the VM instruction 0x18 bytes subsequent to\r\nthe current instruction). The FinSpy VM also has an instruction for direct calls, and another one for indirect calls.\r\nThese call instructions behave like you'd expect; they push a return address onto the stack to re-enter the VM after\r\nthe called function returns.\r\nThe FinSpy VM's instruction set consists of three instruction groups:\r\nGroup #1: Conditional and unconditional jumps, (namely, JMP and all 16 of x86's conditional branches\r\nsuch as JZ, JNS, and JP).\r\nGroup #2: VM instructions that access the FinSpy VM's single dedicated register.\r\nGroup #3: VM instructions that contain raw x86 instructions inside of them.\r\n2.2 Preliminary Thoughts on Devirtualization\r\nAfter analyzing the VM's instruction set and perusing the VM bytecode program, it seemed like the group #3 VM\r\ninstructions -- those with embedded x86 machine code blobs -- would be easy to convert back into x86. There are\r\nthree VM instructions in this group: \"Raw X86\", \"Call Direct\", and \"Call Indirect\". (In practice, the latter two VM\r\ninstructions wound up presenting more complications than anticipated, with direct calls being the most difficult\r\naspect of the VM to devirtualize.)\r\nGroup #1, the conditional and unconditional branch instructions, seemed similarly easy to translate back to x86,\r\nsince their implementations are virtually identical to how x86 conditional branches operate internally. 
Since\r\nconditional branches use relative displacements to determine the address of the \"taken\" branch, the only challenge\r\nis to determine the relative locations of each devirtualized instruction. Thus, once we know exactly how far the\r\nbranch instruction is from its destination, we can simply compute the displacement. (Indeed, in practice, this was\r\neasy.)\r\nGroup #2, the set of VM instructions that access the FinSpy VM's single dedicated register, was the wild-card.\r\nAlthough from analyzing the FinSpy VM harness I knew the raw functionality of these instructions, I did not\r\nknow before analyzing the VM program how these instructions would be used, nor how to devirtualize them.\r\nFor ease of reference, those instructions are summarized below from their more detailed expositions in part two:\r\nmov scratch, 0  [Operand: none]\r\nmov scratch, imm32  [Operand: imm32]\r\nshl scratch, imm32  [Operand: imm32]\r\nadd scratch, imm32  [Operand: imm32]\r\nmov scratch, savedReg32  [Operand: index of savedReg32]\r\nadd scratch, savedReg32  [Operand: index of savedReg32]\r\nmov savedReg32, scratch  [Operand: index of savedReg32]\r\nmov dword ptr [scratch], savedReg32 [Operand: index of savedReg32]\r\nmov scratch, dword ptr [scratch] [Operand: none]\r\nmov dword ptr [scratch], imm32  [Operand: imm32]\r\nmov dword ptr [imm32], scratch  [Operand: imm32]\r\npush scratch  [Operand: none]\r\nGiven that the other two groups seem straightforward, and the heretofore-unknown nature of Group #2, this seems\r\nlike a good first task to devirtualize our FinSpy VM program: analyze the usage patterns of group #2 instructions,\r\nand formulate a strategy for converting them back to x86 machine code. (This turned out to be very easy and\r\nintuitive in practice, with no real complications to speak of.)\r\n3. 
Step #1: Tweak the Output to Print the x86 Disassembly\r\nUpon perusing the output of my FinSpy VM disassembler, my first observation was that my task would benefit\r\nfrom knowing the textual disassembly for the \"Raw x86\" instructions, which were being rendered as raw machine\r\ncode bytes. For example, the first few instructions in the VM bytecode disassembly were:\r\n0x000000: MOV SCRATCH, 0\r\n0x000018: ADD SCRATCH, EBP\r\n0x000030: ADD SCRATCH, 0x000008\r\n0x000048: MOV SCRATCH, DWORD PTR [SCRATCH]\r\n0x000060: MOV EAX, SCRATCH\r\n0x000078: X86 [199, 0, 215, 1, 0, 0] ; \u003c- x86 machine code\r\n0x000090: X86 [184, 0, 240, 65, 0] ; \u003c- would prefer x86 assembly\r\n0x0000a8: X86 [201]\r\n0x0000c0: X86 [194, 4, 0]\r\nMy proximate goal was to figure out how the Group #2 instructions (those involving the SCRATCH register, the\r\nfirst five in the listing above) were used within the VM program. For context, it would be nice to see what the X86\r\ninstructions on the four subsequent lines were doing. I.e., it would be convenient to have a textual representation\r\nfor the x86 machine code, instead of the above display as lists of bytes. \r\nFor this task, I was fortunate to have already written an x86 disassembler and assembler library in Python. (I have\r\npreviously released this code to the public as part of the course sample for my SMT-Based Program Analysis\r\ntraining class.) For the two group #3 VM instructions with embedded x86 machine code -- \"Raw X86\" and \"Call\r\nIndirect\" -- I modified my FinSpy VM disassembler to invoke the x86 disassembler functionality when printing\r\nthem. This was trivial to accomplish in the __str__() method for these Python classes, for example:\r\n d = X86Decoder(self.Remainder[0:self.DataLen])\r\n i2container = d.Decode(0)\r\nAfter doing this, the VM bytecode disassembly listing was easier to follow. 
An example is shown below; the full\r\ndisassembly listing can be found here.\r\n0x000000: MOV SCRATCH, 0\r\n0x000018: ADD SCRATCH, EBP\r\n0x000030: ADD SCRATCH, 0x000008\r\n0x000048: MOV SCRATCH, DWORD PTR [SCRATCH]\r\n0x000060: MOV EAX, SCRATCH\r\n0x000078: X86 mov dword ptr [eax], 1D7h ; \u003c- new: x86 assembly, not machine code\r\n0x000090: X86 mov eax, 41F000h\r\n0x0000a8: X86 leave\r\n0x0000c0: X86 ret 4h\r\nAfter some quick perusal, the x86 assembly language mostly looks like ordinary C code compiled with MSVC,\r\nthough some functions use instructions like PUSHA and POPA that are not ordinarily generated by normal\r\ncompilers, and hence were probably written in assembly language prior to virtualization.\r\n4. Step #2: Pattern-Based Simplification for Scratch Register Access\r\nHaving a clearer FinSpy VM bytecode disassembly listing, I returned to trying to figure out what the Group #2\r\n\"dedicated register\" instructions were accomplishing. It's easy enough to locate these instructions; all of them use\r\nthe scratch register, which is denoted \"SCRATCH\" in the VM bytecode disassembly, so you can simply search for\r\n\"SCRATCH\" to see where these instructions are used. (Or just look at the disassembly listing -- group #2\r\ninstructions comprise a significant fraction of the VM instructions for our sample.)\r\nHere I went a bit too fast and would have benefited from being a bit more deliberate. Inspecting the beginning of\r\nthe VM bytecode program, I noticed a few different patterns involving loading x86 register values into the scratch\r\nregister. For example, repeating the first two lines from the example in the last section, we see:\r\n0x000000: MOV SCRATCH, 0\r\n0x000018: ADD SCRATCH, EBP\r\nThis code first sets the scratch register to zero, and then adds the value of EBP. Effectively, this two-instruction\r\nsequence sets the scratch register to EBP. 
However, elsewhere there were different patterns of VM instructions\r\nused to accomplish the same goal. Here was the second pattern:\r\n0x03c270: MOV SCRATCH, 0\r\n0x03c288: MOV SCRATCH, EDX\r\nAnd still elsewhere in the FinSpy VM bytecode disassembly, we see more natural one-instruction sequences to\r\nload register values into the scratch register, for example:\r\n0x001b48: MOV SCRATCH, ECX\r\nFaced with these observations, I mistakenly identified this as a form of obfuscation: it seemed as though there\r\nwere several ways to load the value of an x86 register into the scratch register, and that FinSpy was choosing\r\nbetween them arbitrarily in order to introduce some randomness into the VM bytecode program. Such randomness\r\ncould make pattern-recognition more laborious. It might end up that I would need to write multiple patterns to\r\nmatch these sequences, thereby multiplying the number of patterns I needed to write involving x86 register loads\r\nby the number of patterns that the FinSpy VM employed for loading an x86 register into the SCRATCH register.\r\nI figured that I could save work for myself down the road if I coalesced the two-instruction sequences down into a\r\ncanonical representation of loading a register into the scratch register, namely, one-instruction sequences as in the third\r\nexample. (I later realized that this isn't really a form of obfuscation -- the FinSpy VM developers were just being\r\nlazy.)\r\nTo be specific, I wanted to write a search-and-replace rule that would take VM instruction sequences of the form:\r\nMOV SCRATCH, 0  | MOV SCRATCH, 0 \r\nADD SCRATCH, REG32 | MOV SCRATCH, REG32\r\nAnd replace them with one-instruction sequences of the form:\r\nMOV SCRATCH, REG32\r\nIt was easy enough to accomplish the pattern recognition and replacement using Python's type introspection\r\nfunctionality. 
Since my Python FinSpy VM disassembler from part two used Python class types for these\r\nindividual VM bytecode instruction types, I could simply invoke the \"isinstance\" function to find instances of\r\nthese patterns in the VM bytecode disassembly listing. The code for this can be found in the function\r\n\"FirstSimplify\" (search the code for that name to find the exact point of definition).\r\nFor instance, here is the code to identify the \"MOV SCRATCH, 0\" VM instruction. It encompasses two different\r\npossibilities: the dedicated VM instruction that moves 0 into SCRATCH, and the VM instruction that moves an\r\narbitrary DWORD value into SCRATCH, when the DWORD is zero.\r\ndef IsMovScratch0(insn):\r\n    if isinstance(insn,MovScratch0):\r\n        return True\r\n    if isinstance(insn,MovScratchImm32):\r\n        return insn.Imm32 == 0\r\n    return False\r\nThe complete code for recognition and replacement is shown in the following sequence. Please take the time to\r\nexamine it, as all of the pattern-matching and replacement in the rest of this section have similar implementations,\r\nand won't be discussed in much detail for the sake of brevity.\r\n# If the first VM instruction is \"MOV SCRATCH, 0\"...\r\nif IsMovScratch0(i1):\r\n    # Check to see if the second instruction is \"MOV SCRATCH, REG32\". If\r\n    # this function returns None, it wasn't. If it returns something other\r\n    # than None, then it returns the register number.\r\n    mr = IsMovScratchReg(i2)\r\n    # Otherwise, check to see if the instruction was \"ADD SCRATCH, REG32\",\r\n    # and get the register number if it was (or None if it wasn't).\r\n    if mr is None:\r\n        mr = IsAddScratchReg(i2)\r\n    # Did one of the two patterns match?\r\n    if mr is not None:\r\n        # Yes: make a new VM instruction, namely \"MOV SCRATCH, REG32\" to\r\n        # replace the two instruction sequence we just matched. 
Use the same\r\n        # file offset position from the first instruction in the sequence.\r\n        newInsn = MovScratchDisp8([0]*INSN_DESC_SIZE, i1.Pos)\r\n        # Save the register number into the new instruction.\r\n        newInsn.Disp8 = mr\r\n        # Use the same VM instruction key as for the first instruction.\r\n        newInsn.Key = i1.Key\r\n        # Add the new VM instruction to the output list.\r\n        newInsnArr.append(newInsn)\r\nIf one of the two patterns matches, the code above generates a new VM instruction to replace the existing two. Since\r\neach VM instruction is uniquely associated with a position and key, we decided to copy those attributes from the\r\nfirst VM instruction within the matched pattern, and to delete the second VM instruction outright. (I worried that I\r\nmight encounter situations where other VM instructions would reference the second and now-deleted VM\r\ninstruction by its key or EIP, and in fact I had to account for this in Part #3, Phase #4.)\r\nThe new VM bytecode disassembly listing can be found here.\r\n(Note that I originally wrote these pattern replacements in OCaml, since ML-family languages are intentionally\r\ndesigned with robust facilities for these sorts of tasks. My first implementation had the Python program print out\r\nan OCaml representation of the FinSpy VM program, which an OCaml program then simplified via pattern-substitutions. 
However, given that only a few such pattern-replacements were necessary to deobfuscate FinSpy\r\nVM bytecode programs, I decided that the cognitive and programmatic overhead of serializing between the\r\nOCaml and Python programs was unnecessarily complicated, and so I re-wrote the pattern replacement code in\r\npure Python, despite the tedium.)\r\n4.1 The \"MOV SCRATCH, REG32\" / \"PUSH SCRATCH\" Pattern\r\nNext, I continued to inspect the VM bytecode disassembly listing and look for further patterns involving the\r\nSCRATCH register. I found my next candidate shortly after the previous one. In particular:\r\n0x000120: MOV SCRATCH, ECX\r\n0x000138: PUSH SCRATCH\r\nThis VM instruction sequence obviously replaces the x86 instruction \"push ecx\". As with the previous step, I\r\ndecided to codify this into a pattern simplification. Any time we see the VM instruction sequence:\r\nMOV SCRATCH, REG32\r\nPUSH SCRATCH\r\nWe want to replace that with a single VM instruction representing \"push reg32\". Mechanically, this works nearly\r\nidentically to the previous step; we use Python's isinstance() function to find occurrences of this VM instruction\r\npattern. The complete code can be found in the \"SecondSimplify\" function in the Python source code (search for\r\nthe function name). For ease of reference, here is how we identify \"MOV SCRATCH, REG32\" instructions:\r\ndef IsMovScratchReg(insn):\r\n    if isinstance(insn,MovScratchDisp32):\r\n        return insn.Disp32\r\n    if isinstance(insn,MovScratchDisp8):\r\n        return insn.Disp8\r\n    return None\r\nTo replace these two-instruction sequences, we generate a FinSpy VM \"Raw x86\" instruction which contains the\r\nx86 machine code for the x86 PUSH operation. 
I use my x86 library to create the Python object for the\r\nreplacement x86 instruction \"push reg32\": in this example, the Python object for \"push ecx\" can be constructed as\r\nX86.Instruction([], XM.Push, X86.Gd(mr,True)).\r\nAfter generating the replacement x86 instruction object, I wrote a function to generate a FinSpy VM instruction\r\ncontaining the raw machine code. The function \"MakeRawX86\" from the simplification source code (search for\r\nthe function name) is reproduced here for ease of presentation:\r\ndef MakeRawX86(Pos, Key, x86Insn):\r\n    # Create a FinSpy VM \"Raw X86\" instruction with dummy\r\n    # content at the correct position (specified by Pos)\r\n    newInsn = RawX86StraightLine([0]*INSN_DESC_SIZE, Pos)\r\n    # Set the key to be the key from the first of the two\r\n    # instructions of the two-instruction matched sequence\r\n    newInsn.Key = Key\r\n    # Encode the x86 instruction into machine code, store\r\n    # the bytes in the FinSpy VM instruction\r\n    newInsn.Remainder = EncodeInstruction(x86Insn)\r\n    # Cache the length of the x86 instruction's machine code\r\n    newInsn.DataLen = len(newInsn.Remainder)\r\n    # Cache the textual disassembly for that instruction\r\n    newInsn.X86 = str(x86Insn)\r\n    # Return the FinSpy VM instruction just constructed\r\n    return newInsn\r\nThe FinSpy VM \"Raw X86\" instruction object returned by the function above is used as the replacement for the\r\ntwo-VM instruction sequence \"MOV SCRATCH, REG32 / PUSH SCRATCH\". 
As the code shows, its two\r\nuniquely-identifying characteristics (VM instruction position and VM instruction key) are duplicated from the first\r\nof the two VM instructions in the pattern.\r\nAfter applying this substitution, the new VM bytecode disassembly listing can be found here.\r\n4.2 The \"MOV SCRATCH, REG32\" / \"MOV REG32, SCRATCH\" Pattern\r\nShortly thereafter was another, similar VM instruction pattern:\r\n0x000228: MOV SCRATCH, ESP\r\n0x000240: MOV ESI, SCRATCH\r\nClearly, this sequence virtualizes the x86 instruction \"mov esi, esp\". More generally, when we see two adjacent\r\nVM instructions of the form:\r\nMOV SCRATCH, REG32_1\r\nMOV REG32_2, SCRATCH\r\nWe can replace this with an x86 instruction \"mov reg32_2, reg32_1\". As in the previous case, after using Python\r\nisinstance() checks to locate instances of this pattern, we generate a Python object representing the x86 instruction\r\n\"mov esi, esp\", and call the MakeRawX86 function (detailed in the previous section) to generate a new \"Raw x86\"\r\nVM instruction to replace the two VM instructions comprising the pattern instance. This process can be seen\r\ncompletely in the function \"ThirdSimplify\" from the simplification code (search for the function name).\r\nAfter applying this substitution, the new VM bytecode disassembly listing can be found here.\r\n4.3 The \"MOV SCRATCH, 0\" / \"ADD SCRATCH, IMM32\" Pattern\r\nContinuing to look for patterns in the VM bytecode program, I saw the following:\r\n0x0037b0: MOV SCRATCH, 0\r\n0x0037c8: ADD SCRATCH, 0x420344\r\nThis sequence can obviously be replaced with the single-instruction sequence \"MOV SCRATCH, 0x420344\". The\r\nfunction \"FourthSimplify\" in the simplification code (search for the function name) implements the detection and\r\nsubstitution. 
Again, it's trivial to write this pattern-recognition and replacement code, though unlike the\r\npreviously-described pattern-substitutions, the replacement code is a Group #2 \"SCRATCH register\" FinSpy VM\r\ninstruction rather than a Group #3 \"Raw x86\" instruction, since the output involves the scratch register.\r\nAfter applying this substitution, the new VM bytecode disassembly listing can be found here.\r\n4.4 The \"MOV SCRATCH, IMM32\" / \"PUSH SCRATCH\" Pattern\r\nMore perusal of the VM bytecode disassembly turned up the following sequence:\r\n0x006900: MOV SCRATCH, 0x000003\r\n0x006918: PUSH SCRATCH\r\nAs with previous examples, we can replace this two-instruction sequence with the x86 instruction \"push 3\", by\r\nusing a \"Raw x86\" VM instruction. The function \"FifthSimplify\" implements this -- it identifies \"MOV\r\nSCRATCH, IMM32 / PUSH SCRATCH\" sequences, and replaces them with a \"Raw x86\" FinSpy VM instruction\r\nobject containing the machine code for an x86 \"push imm32\" instruction, again using identical techniques to those\r\ndiscussed previously.\r\nAfter applying this substitution, the new VM bytecode disassembly listing can be found here.\r\n4.5 Memory Address Patterns\r\nAfter performing the aforementioned replacements, we start to see a lot less variety in the remaining Group #2\r\ninstructions, and their further purpose quickly became obvious. 
Namely, each remaining cluster of VM bytecode\r\ninstructions using the SCRATCH register consisted of two consecutive components: 1) a sequence of FinSpy VM\r\ninstructions to load a memory address into the SCRATCH register, using what I've called a \"memory address\r\npattern\"; followed by 2) a sequence of FinSpy VM instructions using the memory address just generated,\r\nfor a purpose such as reading from the address or writing to it, using what I've termed a \"memory access pattern\".\r\nWe will discuss both such patterns in this section and the subsequent one.\r\nFor an example of both a memory address pattern and a memory access pattern, here is the very beginning of the\r\nVM bytecode program after all the prior replacements have executed:\r\n0x000000: MOV SCRATCH, EBP  ; Memory !address!\r\n0x000030: ADD SCRATCH, 0x000008  ; Memory !address!\r\n0x000048: MOV SCRATCH, DWORD PTR [SCRATCH] ; Memory !access!\r\n0x000060: MOV EAX, SCRATCH  ; Memory !access!\r\nThe first two VM instructions set the SCRATCH register to EBP+0x8. These two instructions comprise the\r\n\"memory address pattern\".\r\nThe last two VM instructions read a DWORD from the memory address contained in the SCRATCH register, and\r\nstore the result into EAX. These two instructions comprise the \"memory access pattern\".\r\nClearly, these four VM instructions can be replaced by the single x86 instruction \"mov eax, [ebp+8]\" -- which\r\nlooks very much like something we'd expect to see near the beginning of a function (suitably, given that the four\r\nVM instructions just shown are the first VM instructions in the bytecode program, and are situated toward the\r\nbeginning of an x86 function). 
\r\nFor a more complicated example:\r\n0x03c270: MOV SCRATCH, EDX  ; Memory !address!\r\n0x03c2a0: SHL SCRATCH, 0x000003 ; Memory !address!\r\n0x03c2b8: ADD SCRATCH, EAX  ; Memory !address!\r\n0x03c2d0: ADD SCRATCH, 0x000010 ; Memory !address!\r\n0x03c2e8: MOV EAX, SCRATCH  ; Memory !access!\r\nLet's trace the value that is written to EAX at the end. The first two VM instructions shift EDX left by 3; the third\r\nVM instruction adds EAX, and the fourth VM instruction then adds 0x10. Thus, the expression written to EAX is\r\n\"EDX\u003c\u003c3 + EAX + 0x10\". EDX\u003c\u003c3 is the same thing as EDX*8, so our expression is \"EDX*8 + EAX + 0x10\".\r\nThe format of the memory expression should look familiar -- it can be encoded as the legal x86 ModRM/32\r\nmemory expression \"[EAX+EDX*8+0x10]\". These first four VM instructions comprise a \"memory address\r\npattern\". The fifth and final VM instruction is the \"memory access pattern\" instance -- rather than reading from or\r\nwriting to this memory location, we simply store the memory address itself into EAX. Thus this VM instruction\r\nsequence performs the equivalent of the x86 instruction \"lea eax, [eax+edx*8+10h]\".\r\nIndeed, all of the memory address patterns appear to be creating memory addresses that would be legal to encode\r\nusing X86 ModRM memory expressions. X86 ModRM memory expressions contain one or more of the following\r\nelements added together:\r\nA 32-bit base register\r\nA 32-bit index register, optionally multiplied by 2, 4, or 8\r\nA 32-bit displacement (an immediate DWORD)\r\nInspecting the VM bytecode program shows that all memory address sequences have an identical layout:\r\nIf an index register is being used, the first VM instruction of the sequence sets the SCRATCH register to\r\nthe value of the index register. 
\r\nIf the index register is multiplied by a scale factor, the next VM instruction is SHL SCRATCH, IMM32,\r\nwhere IMM32 is 1, 2, or 3 (to multiply the index register by *2, *4, or *8, respectively).\r\nIf a base register is used, the next instruction in the sequence adds the base register.\r\nIf a 32-bit displacement is used, the next VM instruction is an ADD SCRATCH, IMM32 instruction.\r\nAll of these elements are optional, though at least one must be present. So for example, if the memory address is a\r\nraw 32-bit value (e.g. dword ptr [401234h]), only a MOV SCRATCH, IMM32 VM instruction will be present (the\r\nresult of the earlier \"MOV SCRATCH, 0\" / \"ADD SCRATCH, IMM32\" simplification). If\r\nthe memory expression consists of only a register (e.g., dword ptr [eax]), then only the MOV SCRATCH, REG32\r\nVM instruction will be present. Memory expressions that use more than one element will be virtualized by\r\ncombining the VM instructions for the elements present.\r\n4.6 Memory Access Patterns\r\nAfter a memory address pattern just described creates a memory expression in the SCRATCH register, the next\r\none or two VM instructions make up the memory access pattern, and dictate how the memory address is used\r\nthrough how they access the SCRATCH register. By examining the VM bytecode program, I found that there were\r\nfour distinct VM instruction sequences used for memory access patterns.\r\n4.6.1 Memory Access Case #1: Memory Read, Store Into Register\r\nThe first case reads from the memory address, and stores the result in a register:\r\n; ... memory address pattern before this ...\r\n0x000048: MOV SCRATCH, DWORD PTR [SCRATCH]\r\n0x000060: MOV EAX, SCRATCH\r\nThis corresponds to \"mov eax, dword ptr [memExpression]\", with the details of \"memExpression\" being dictated\r\nby the memory address pattern.\r\n4.6.2 Memory Access Case #2: Memory Read, Push Result\r\nThe second case reads from the memory address, and then pushes the result on the stack:\r\n; ... 
memory address pattern before this ...\r\n0x004950: MOV SCRATCH, DWORD PTR [SCRATCH]\r\n0x004968: PUSH SCRATCH\r\nThis corresponds to \"push dword ptr [memExpression]\", with the details of \"memExpression\" being dictated by\r\nthe memory address pattern.\r\n4.6.3 Memory Access Case #3: Memory Write, Value Taken from Register\r\nThe third case stores a register into the memory location calculated by the memory address sequence:\r\n; ... memory address pattern before this ...\r\n0x000ae0: MOV DWORD PTR [SCRATCH], EAX\r\nThis corresponds to \"mov dword ptr [memExpression], eax\", with the details of \"memExpression\" being dictated\r\nby the memory address pattern.\r\n4.6.4 Memory Access Case #4: Store Address into Register\r\nThe fourth case does not dereference the memory location, but rather saves the result into a register:\r\n; ... memory address pattern before this ...\r\n0x000e28: MOV EAX, SCRATCH\r\nThis corresponds to \"lea eax, dword ptr [memExpression]\", with the details of \"memExpression\" being dictated by\r\nthe memory address pattern.\r\n4.7 All Together: De-Virtualizing Memory References\r\nAfter having fully analyzed the remaining references to the SCRATCH register as described above, I wrote code\r\nto detect the memory address patterns and the following memory access patterns, analyze them, and convert them\r\nback into x86 machine code. The function \"SixthSimplify\" in the simplification code recognizes memory address\r\npatterns, inspects the following instructions to determine the memory access patterns, and combines these two\r\npieces of information to determine which x86 instruction the two memory address/access sequences have\r\nreplaced. 
Next, the code reconstructs an x86 instruction -- as a FinSpy VM \"Raw X86\" instruction -- with which it\r\nfinally replaces the VM instructions comprising the memory address and memory access sequences. As before, the\r\nreplacement \"Raw x86\" FinSpy VM instruction uses the key and position attributes from the first VM instruction\r\nas the unique attributes for the replacement instruction.\r\nFirst, \"SixthSimplify\" invokes the function \"DecodeAddressSequence\". That function determines if a given\r\nposition within a list of FinSpy VM instructions begins a memory address sequence. If not, the function returns\r\nNone. If so, DecodeAddressSequence extracts details from the memory address sequence -- the base register, the\r\nindex register and optional scale factor, and the 32-bit displacement -- and uses them to create a Mem32 object (the\r\nclass used by my Python x86 library to represent 32-bit memory expressions), which it then returns along with the\r\nnumber of VM instructions comprising the memory address pattern. That code is shown below. \r\n# Given:\r\n# * A list of FinSpy VM instructions in insnArr\r\n# * An index within that list in idx\r\n#\r\n# Determine if insnArr[idx] contains a memory address sequence.\r\n# If not, return None. 
If so, create an x86 memory expression\r\n# operand using my x86 library, and return it along with the\r\n# number of instructions in the address sequence.\r\ndef DecodeAddressSequence(insnArr, idx):\r\n    # Save position of index within insnArr\r\n    oldIdx = idx\r\n    # The first VM instruction in the sequence is usually \"MOV SCRATCH, REG32\".\r\n    r1 = IsMovScratchReg(insnArr[idx])\r\n    # Was it?\r\n    if r1 is not None:\r\n        # Yes, it was, so increment the current index\r\n        idx += 1\r\n        # Is the next VM instruction \"SHL SCRATCH, [1/2/3]\"?\r\n        if isinstance(insnArr[idx],ShlScratchImm32):\r\n            # Yes, copy the scale factor\r\n            scaleFac = insnArr[idx].Imm32\r\n            assert(scaleFac == 1 or scaleFac == 2 or scaleFac == 3)\r\n            # Increment the current index\r\n            idx += 1\r\n        # Otherwise, there is no scale factor\r\n        else:\r\n            scaleFac = None\r\n        # Is the next VM instruction \"ADD SCRATCH, REG32\"?\r\n        r2 = IsAddScratchReg(insnArr[idx])\r\n        if r2 is not None:\r\n            # Yes, increment the current index\r\n            idx += 1\r\n        # Is the next VM instruction \"ADD SCRATCH, IMM32\"?\r\n        disp32 = IsAddScratchImm32(insnArr[idx])\r\n        if disp32 is not None:\r\n            # Yes, increment the current index\r\n            idx += 1\r\n        # Make a memory expression from the parts, and return the length\r\n        # of the memory address sequence\r\n        return (idx-oldIdx, MakeMemExpr(r2, r1, scaleFac, disp32))\r\n    # The second possibility is that the memory expression is a raw address.\r\n    imm = IsMovScratchImm32(insnArr[idx])\r\n    # Was it?\r\n    if imm is not None:\r\n        # Yes: make a memory expression from the address, and return the\r\n        # length of the memory address sequence (namely, 1).\r\n        return (1, MakeMemExpr(None,None,None,imm))\r\n    # If we are here, neither of the memory address patterns matched, so\r\n    # signal match failure.\r\n    return None\r\nNext, a similar function called \"DecodeAccessSequence\" uses pattern-matching to determine 
which of the four\r\nmemory access patterns -- mov reg32, [memExpr] / push [memExpr] / mov [memExpr], reg32 / lea reg32,\r\n[memExpr] -- follows the address sequence. The code is similar to the code just shown for decoding memory\r\naddress sequences, and so is not duplicated here. See the source code for the complete details.\r\nFinally, after recognizing a memory address sequence followed by a memory access sequence, we use the\r\nmachinery we previously developed for regenerating x86 machine code -- namely, the function \"MakeRawX86\" --\r\nto create a FinSpy VM instruction containing the machine code for the raw x86 instruction. We use this to replace\r\nthe memory address and access sequences just recognized.\r\nAfter running this simplification pass, and all prior ones described for Group #2 instructions, over the VM\r\nbytecode program, all Group #2 VM instructions disappear, and consequently the SCRATCH register disappears\r\nfrom the VM bytecode disassembly listing. The final disassembly listing of the simplified FinSpy VM program\r\ncan be seen in its entirety here.\r\n5. Step #3: Correcting a Small Error\r\nHaving dispensed with the Group #2 VM instructions, Groups #1 and #3 remained. Before addressing them,\r\nhowever, there was one VM instruction that didn't fit into either one of those groups. Namely, in producing my\r\ninitial VM bytecode disassembly listing, I had incorrectly identified the FinSpy VM unconditional jump\r\ninstruction as being an instruction to cause a deliberate crash. I disassembled this as \"CRASH\". 
My FinSpy VM\r\nbytecode disassembly listing was confusing because of how frequently the CRASH instruction occurred:\r\n0x005bb0: JNZ VM[0x005c88] (fallthrough VM[0x005bc8])\r\n0x005bc8: X86 xor eax, eax\r\n0x005be0: CRASH\r\n0x005bf8: X86 mov eax, dword ptr [esi+4h]\r\n0x005c70: CRASH\r\n0x005c88: X86 push ebx\r\nI discussed this issue at length in part two -- the implementation for unconditional jumps was considerably\r\ndifferent from conditional jumps, and involved dynamic code generation, which was otherwise used only for\r\nGroup #3 instructions. Closer analysis revealed that the true purpose of this instruction was as an unconditional\r\njump and not a deliberate crash. Upon learning this, I updated my disassembler component to rename the\r\n\"CRASH\" instruction to \"JMP\", and to print the branch targets of these newly-renamed JMP instructions. The\r\nresulting VM bytecode disassembly listing was clearer:\r\n0x005bb0: JNZ VM[0x005c88] (fallthrough VM[0x005bc8])\r\n0x005bc8: X86 xor eax, eax\r\n0x005be0: JMP VM[0x008df0]\r\n0x005bf8: X86 mov eax, dword ptr [esi+4h]\r\n0x005c70: JMP VM[0x008df0]\r\n0x005c88: X86 push ebx\r\nThe entire VM bytecode disassembly listing after this modification can be found here.\r\n6. Conclusion\r\nIn this Phase #1, we inspected our sample's initial FinSpy VM bytecode disassembly listing. We improved the\r\nlisting by including x86 disassembly for the x86 machine code embedded in the Group #3 instructions. We\r\ninspected how the VM bytecode program used Group #2 instructions. A half-dozen simple pattern-replacement\r\nschemas were sufficient to entirely remove the Group #2 instructions. Afterwards, we corrected the output by\r\nobserving that the \"CRASH\" instructions were really unconditional jumps.\r\nIn the next installment, Part #3, Phase #2, we are ready to attempt our first devirtualization for the FinSpy VM bytecode\r\nprogram. 
Our initial attempts wind up being insufficient. We study the deficiencies in Part #3, Phase #3, and\r\nremedy them in our second, final, devirtualization attempt in Part #3, Phase #4.\r\nSource: https://www.msreverseengineering.com/blog/2018/2/21/wsbjxrs1jjw7qi4trk9t3qy6hr7dye",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"origins": [
		"web"
	],
	"references": [
		"https://www.msreverseengineering.com/blog/2018/2/21/wsbjxrs1jjw7qi4trk9t3qy6hr7dye"
	],
	"report_names": [
		"wsbjxrs1jjw7qi4trk9t3qy6hr7dye"
	],
	"threat_actors": [
		{
			"id": "cf7fc640-acfe-41c4-9f3d-5515d53a3ffb",
			"created_at": "2023-01-06T13:46:38.228042Z",
			"updated_at": "2026-04-10T02:00:02.883048Z",
			"deleted_at": null,
			"main_name": "APT1",
			"aliases": [
				"PLA Unit 61398",
				"Comment Crew",
				"Byzantine Candor",
				"Comment Group",
				"GIF89a",
				"Group 3",
				"TG-8223",
				"Brown Fox",
				"ShadyRAT",
				"G0006",
				"COMMENT PANDA"
			],
			"source_name": "MISPGALAXY:APT1",
			"tools": [],
			"source_id": "MISPGALAXY",
			"reports": null
		},
		{
			"id": "3aaf0755-5c9b-4612-9f0e-e266ef1bdb4b",
			"created_at": "2022-10-25T16:07:23.480196Z",
			"updated_at": "2026-04-10T02:00:04.626125Z",
			"deleted_at": null,
			"main_name": "Comment Crew",
			"aliases": [
				"APT 1",
				"BrownFox",
				"Byzantine Candor",
				"Byzantine Hades",
				"Comment Crew",
				"Comment Panda",
				"G0006",
				"GIF89a",
				"Group 3",
				"Operation Oceansalt",
				"Operation Seasalt",
				"Operation Siesta",
				"Shanghai Group",
				"TG-8223"
			],
			"source_name": "ETDA:Comment Crew",
			"tools": [
				"Auriga",
				"Cachedump",
				"Chymine",
				"CookieBag",
				"Darkmoon",
				"GDOCUPLOAD",
				"GLOOXMAIL",
				"GREENCAT",
				"Gen:Trojan.Heur.PT",
				"GetMail",
				"Hackfase",
				"Hacksfase",
				"Helauto",
				"Kurton",
				"LETSGO",
				"LIGHTBOLT",
				"LIGHTDART",
				"LOLBAS",
				"LOLBins",
				"LONGRUN",
				"Living off the Land",
				"Lslsass",
				"MAPIget",
				"ManItsMe",
				"Mimikatz",
				"MiniASP",
				"Oceansalt",
				"Pass-The-Hash Toolkit",
				"Poison Ivy",
				"ProcDump",
				"Riodrv",
				"SPIVY",
				"Seasalt",
				"ShadyRAT",
				"StarsyPound",
				"TROJAN.COOKIES",
				"TROJAN.FOXY",
				"TabMsgSQL",
				"Tarsip",
				"Trojan.GTALK",
				"WebC2",
				"WebC2-AdSpace",
				"WebC2-Ausov",
				"WebC2-Bolid",
				"WebC2-Cson",
				"WebC2-DIV",
				"WebC2-GreenCat",
				"WebC2-Head",
				"WebC2-Kt3",
				"WebC2-Qbp",
				"WebC2-Rave",
				"WebC2-Table",
				"WebC2-UGX",
				"WebC2-Yahoo",
				"Wordpress Bruteforcer",
				"bangat",
				"gsecdump",
				"pivy",
				"poisonivy",
				"pwdump",
				"zxdosml"
			],
			"source_id": "ETDA",
			"reports": null
		}
	],
	"ts_created_at": 1775434158,
	"ts_updated_at": 1775826685,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/643776e0a555c13d0860a32816921de8024c1ec3.pdf",
		"text": "https://archive.orkl.eu/643776e0a555c13d0860a32816921de8024c1ec3.txt",
		"img": "https://archive.orkl.eu/643776e0a555c13d0860a32816921de8024c1ec3.jpg"
	}
}