# Under the hood of Wslink’s multilayered virtual machine

#### Author: Vladislav Hrčka

 March 2022


-----

## CONTENTS

Executive summary .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   5

General structure of virtual machines .   .   .   .   .   .   .   .   .   .   .   .   .   . 5

Documented techniques for deobfuscation of virtual machines  .   .   .   .   . 11

The Miasm framework  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   12

Wslink’s virtual machine entry – vm_entry .   .   .   .   .   .   .   .   .   .   .   .   .  13

Packer .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   14

Junk code .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   16

Virtual machine initialization  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 17

vm_pre_init() functions  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  17

vm_init() function .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  18

Virtual instructions .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 20

Virtual instructions of the second virtual machine  .   .   .   .   .   .   .   .   .  24

The first virtual instruction  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  24
The second virtual instruction .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  25
The third virtual instruction .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  25
The fourth virtual instruction  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  26
Automating analysis of the virtual instructions  .   .   .   .   .   .   .   .   .   27

Getting back to the first virtual machine  .   .   .   .   .   .   .   .   .   .   .   .  29

The initial virtual instruction  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  30

Intro .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  31
Outro .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 32
Analysis of the virtual context  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  33
Behavior .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 34

Initially executed virtual instructions .   .   .   .   .   .   .   .   .   .   .   .   .   . 34

The first executed virtual instruction .   .   .   .   .   .   .   .   .   .   .   .   .  34
The second executed virtual instruction .   .   .   .   .   .   .   .   .   .   .   .  36
The third executed virtual instruction .   .   .   .   .   .   .   .   .   .   .   .   . 36
The fourth executed virtual instruction  .   .   .   .   .   .   .   .   .   .   .   .  38

Automating analysis of the first virtual machine  .   .   .   .   .   .   .   .   .   .   .  39

Processing the initial bytecode block .   .   .   .   .   .   .   .   .   .   .   .   .   40

Opaque predicates  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  41
Overview  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  41

Description of our final VM analyzer code  .   .   .   .   .   .   .   .   .   .   .   .   .  45

Class Wslink .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  45

Class VirtualContext .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  45

Method get_next_instr() .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  45
Method get_irb_symbs() .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  45
Method get_updated_internal_context() .   .   .   .   .   .   .   .   .   .   .   45
Method get_state_hash()  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 45

Class MySymbolicExecutionEngine  .   .   .   .   .   .   .   .   .   .   .   .   .   .  45

Class SymbolicCFG .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 46

Class Node  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  46

Future work .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  46

Conclusion .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  47


-----

## LIST OF TABLES

Table 1. Miasm’s IR semantics  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  13

Table 2.  Statistics of the initial virtual instruction .   .   .   .   .   .   .   .   .   30

Table 3.  Statistics of the first executed virtual instruction  .   .   .   .   .   .   . 36

Table 4.  Statistics of the second executed virtual instruction  .   .   .   .   .   . 36

Table 5.  Statistics of the third executed virtual instruction  .   .   .   .   .   .   37

Table 6.  Statistics of the fourth executed virtual instruction .   .   .   .   .   .  38

Table 7.  Statistics of the first processed bytecode block .   .   .   .   .   .   .  44

## LIST OF FIGURES

Figure 1. Illustration of bytecode, where all opcodes

and operands are virtual  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   6

Figure 2. Illustration of the relationship between bytecode

and the VM’s interpreter  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   . 7

Figure 3. Illustration of Switch Dispatch, where R0 is a virtual register  .   .   . 8

Figure 4. Illustration of Direct Threading .   .   .   .   .   .   .   .   .   .   .   .   .  9

Figure 5. Overview of the virtualization process  .   .   .   .   .   .   .   .   .   .   10

Figure 6. Entry point to the virtual machine .   .   .   .   .   .   .   .   .   .   .   .  13

Figure 7. A part of vm_entry of the virtual machine

decompiled with Ghidra  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   14

Figure 8. Function used to unpack NsPack in ClamAV .   .   .   .   .   .   .   .   . 15

Figure 9. A block of code in Miasm’s symbolic execution (left) and

a part of the same block in IDA’s decompiler (right) .   .   .   .   .   .  16

Figure 10. Miasm’s symbolic execution of a vm_pre_init()

showing parameters supplied to vm_init()  .   .   .   .   .   .   .   .   17

Figure 11. Busy-waiting for interpreter in vm_init() .   .   .   .   .   .   .   .   .  18

Figure 12. vm_init() summary  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  19

Figure 13. Virtual instruction table  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .  20

Figure 14. The first virtual instruction in the table  .   .   .   .   .   .   .   .   .   .  20

Figure 15. One of the vm_pre_init() functions .   .   .   .   .   .   .   .   .   .   . 21

Figure 16. Miasm’s symbolic execution of the first

virtual instruction (function at 0x0FF2DB) .   .   .   .   .   .   .   .   .  21

Figure 17. Miasm’s symbolic execution of the first block of vm2_init()  .   .   22

Figure 18. Miasm’s symbolic execution of the first block of vm_init() .   .   .  23

Figure 19. A part of the second virtual instruction table  .   .   .   .   .   .   .   .  23

Figure 20. Virtual instructions in the structure of the virtual machines .   .   .  24

Figure 21. The initial virtual instruction of the second VM .   .   .   .   .   .   .   . 25


-----

Figure 22. Bytecode of the virtual instruction .   .   .   .   .   .   .   .   .   .   .   .  25

Figure 23. Destination address and memory modified

by the second virtual instruction  .   .   .   .   .   .   .   .   .   .   .   .   25

Figure 24. Destination address and memory modified

by the third virtual instruction  .   .   .   .   .   .   .   .   .   .   .   .   .  25

Figure 25. The conditional branch and delta of the

stack pointer of the fourth virtual instruction  .   .   .   .   .   .   .   . 26

Figure 26. Destination address and memory modified by

the fourth virtual instruction .   .   .   .   .   .   .   .   .   .   .   .   .   . 26

Figure 27. Call graph generated from memory assignments and the VPC .   .  28

Figure 28. Virtual instructions in the structure of the virtual machines .   .   .  29

Figure 29. Control flow graph of the initial virtual instruction  .   .   .   .   .   .  30

Figure 30. Beginning of the intro finishing context switch of the second VM  .   .  31

Figure 31. Virtual registers of the second machine being mapped back

to the native ones at the end of the virtual instruction .   .   .   .   .  32

Figure 32. Last few virtual instructions executed before

mapping the virtual registers back to the native ones  .   .   .   .   .  33

Figure 33. Zeroing out an internal register .   .   .   .   .   .   .   .   .   .   .   .   .  35

Figure 34. Storing the stack pointer in an internal register .   .   .   .   .   .   .   37

Figure 35. Part of the fourth virtual instruction .   .   .   .   .   .   .   .   .   .   .  38

Figure 36. Part of the fourth virtual instruction performing a pop-like operation .   38

Figure 37. Part of the fourth virtual instruction performing a PUSH operation .   . 39

Figure 38. The first processed bytecode block .  .  .  .  .  .  .  .  .  .  .  . 40

Figure 39. Expressions that can be further simplified .   .   .   .   .   .   .   .   .   41

Figure 40. Code of the processed bytecode  .   .   .   .   .   .   .   .   .   .   .   .   . 42

Figure 41. Data accessed by the code – ServiceStatus .   .   .   .   .   .   .   .   .   . 42

Figure 42. Function whose pointer is used in the code .   .   .   .   .   .   .   .   .  43

Figure 43. Destination address of the bytecode .   .   .   .   .   .   .   .   .   .   .  43

Figure 44. Setting return address of the API call .   .   .   .   .   .   .   .   .   .   .  43

Figure 45. Body of the code’s loop  .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   44

Figure 46. The same part of code in the non-obfuscated binary .   .   .   .   .   44


-----

## EXECUTIVE SUMMARY

ESET researchers recently described Wslink, a unique and previously undocumented malicious loader

that runs as a server and that features a virtual-machine-based obfuscator. There are no code,

functionality or operational similarities that suggest this is likely to be a tool from a known threat actor;

[the complete analysis of the malware can be found here.](https://www.welivesecurity.com/2021/10/27/wslink-unique-undocumented-malicious-loader-runs-server/)

In this white paper we describe the structure of the virtual machine used in samples of Wslink and

suggest a possible approach to see through the obfuscation techniques used in the analyzed samples.

We demonstrate our approach on chunks of code of the protected sample. We were not motivated to

fully deobfuscate the code, because we discovered a non-obfuscated sample.

Obfuscation techniques are a kind of software protection intended to make code hard to understand

and hence conceal its objectives; obfuscating virtual machine techniques have become widely misused

for illicit purposes such as obfuscation of malware samples as they hinder both analysis and detection.

The ability to analyze malicious code and subsequently improve our detection capabilities is behind our

motivation to overcome these techniques.

Virtualized Wslink samples do not contain any clear artifacts, such as specific section names, that easily

link it to a known virtualization obfuscator. During our research, we were able to successfully design

and implement a semiautomatic solution capable of significantly facilitating analysis of the underlying

program’s code. The virtual machine introduced a diverse arsenal of obfuscation techniques, which we

were able to overcome to reveal a part of the deobfuscated malicious code that we describe in this

document. In the last sections of this analysis, we present parts of the code we developed to facilitate

our research.

This white paper also provides an overview of the internal structure of virtual machines in general, and

introduces some important terms and frameworks used in our detailed analysis of the Wslink virtual

machine.

In the past we described the structure of a custom virtual machine, along with our techniques to

devirtualize the machine. That virtual machine contained an interesting anti-disassembly trick,

[previously utilized by FinFisher – spyware with extensive spying capabilities, such as live surveillance](https://www.welivesecurity.com/2017/09/21/new-finfisher-surveillance-campaigns/)

through webcams and microphones, keylogging, and exfiltration of files. We additionally presented an

[approach for its deobfuscation. You can find more information about that case in this earlier white paper.](https://www.welivesecurity.com/wp-content/uploads/2018/01/WP-FinFisher.pdf)

## OVERVIEW OF VIRTUAL MACHINE STRUCTURES

Before diving into the analysis of Wslink’s virtual machine (VM), we provide an overview of the internal

structure of virtual machines in general, describe known approaches to deal with such obfuscation and

introduce some important terms and frameworks used in our detailed analysis of the Wslink VM.

### General structure of virtual machines

Virtual machines can be divided into two main categories:

1. System virtual machines – support execution of complete operating systems (e.g., various VMWare

products, VirtualBox)

2. Process virtual machines – execute individual programs in an OS-independent environment (e.g., Java, the

.NET Common Language Runtime)

Here, we are interested only in the second category – process virtual machines – and we will briefly

describe certain parts of their internal anatomy necessary to understand the rest of this paper.


-----

Process virtual machines run as normal applications on their host OSes, and in turn run programs whose

code is stored as OS-independent bytecode (Figure 1) that represents a series of instructions – an

[application – of a virtual ISA (instruction set architecture).](https://www.arm.com/glossary/isa)

Figure 1. Illustration of bytecode, where all opcodes and operands are virtual

One can also think about bytecode as a sort of intermediate representation (IR); an abstract

representation of code consisting of a specific instruction set that resembles assembly more than a

high-level language. It is also known as intermediate language.

The use of IR is convenient in terms of code reusability – when one needs to add support for a new

architecture or CPU instruction set, it is easier to convert it to the IR instead of writing all the required

algorithms again. Another benefit is that it can simplify the application of some optimization algorithms.

One can generally translate both high- and low-level languages into an IR. Translation of a higher-level

language is known as “lowering”, and similarly translation of a lower-level one, “lifting”.

The following example lifts an assembly block bb0 into a block with the pseudo-IR code irb0. All

assembly instructions are translated into a set of IR operations and individual operations in sets do not

affect each other, where ZF stands for zero flag and CF for carry flag:
```
bb0:
 MOV R8, 0x05
 SUB AX, DX
 XCHG ECX, EDX
irb0:
 R8 = 0x05
 EAX[:0x10] = EAX[:0x10] – EDX[:0x10]
 ZF = EAX[:0x10] – EDX[:0x10] == 0x00
 CF = EAX[:0x10] < EDX[:0x10]
 ...
 ECX = EDX
 EDX = ECX

```
Modern process VMs usually provide a compiler that can lower code written in a high-level language -
one that is easy to understand and comfortable to use – into the respective bytecode.

A VM’s ISA generally defines the supported instructions, data types and registers, among other things,

that naturally must be implemented by a virtual ISA as well.

Instructions consist of the following parts:

- opcodes – operation codes that specify an instruction

- operands – parameters of the instructions


-----

ISAs often use two well-known virtual registers:

- virtual program counter (VPC) – a pointer to the current position in the bytecode

- virtual stack pointer – a pointer to pre-allocated virtual stack space used internally by the VM

The virtual stack pointer does not have to be present in all VMs; it is common only in a certain type of

[VM – stack-based ones.](https://andreabergia.com/stack-based-virtual-machines/)

We will refer to the instructions and their respective parts of a virtual ISA simply as virtual instructions,

virtual opcodes, and virtual operands. We sometimes omit the explicit use of “virtual” when it is obvious

that we are talking about the virtual representation.

An OS-dependent (Figure 2) executable file – interpreter – processes the supplied bytecode and

sequentially interprets the underlying virtual instructions thus executing the virtualized program.

Figure 2. Illustration of the relationship between bytecode and the VM’s interpreter

Transfer of control from one virtual instruction to the next during interpretation needs to be performed

[by every VM. This process is generally known as dispatching. There are several documented dispatch](http://www.cs.toronto.edu/~matz/dissertation/matzDissertation-latex2html/node6.html)

techniques such as:


-----

- Switch Dispatch – the simplest dispatch mechanism where virtual instructions are defined as case

clauses and a virtual opcode is used as the test expression (Figure 3)

- Direct Call Threading – virtual instructions are defined as functions and virtual opcodes contain

addresses of these functions

- Direct Threading – virtual instructions are defined as functions again; however, in comparison

to Direct Call Threading, addresses of the functions are stored in a table and virtual opcodes

represent offsets to this table. Each function should indirectly call the following one according to the

specification (Figure 4)

The body of a virtual opcode in the interpreter’s code is usually called a virtual handler because it

defines the behavior of the opcode and handles it when the virtual program counter points to a location

in the bytecode that contains a virtual instruction with that opcode.

[By context, regarding VMs, we mean a sort of virtual process context: each time a process is removed](https://tldp.org/LDP/LG/issue23/flower/context.html)

from access to the processor during process switching, sufficient information on its current operating

state – its context – must be stored such that when it is again scheduled to run on the processor, it can

resume its operation from an identical position.

Figure 3. Illustration of Switch Dispatch, where R0 is a virtual register


-----

Figure 4. Illustration of Direct Threading

Obfuscation techniques are a kind of software protection intended to make code hard to understand

and hence conceal its objectives. Such techniques were initially developed to protect the intellectual

property of legitimate software, i.e., to hamper reverse engineering.

Virtual machines used as obfuscation engines are based on process virtual machines, as described

above. The primary difference is that they are not intended to run cross-platform applications and they

usually take machine code compiled or assembled for a known ISA, disassemble it and translate that to

their own virtual ISA. It is also usually the case that the VM environment and the virtualized application

code are contained in one application, whereas traditional process VMs usually consist of a process that

runs as a standalone application that loads separate, virtualized applications

The strength of this obfuscation technique resides in the fact that the ISA of the VM is unknown to

any prospective reverse engineer – a thorough analysis of the VM, which can be very time-consuming,

is required to understand the meaning of the virtual instructions and other structures of the VM.

Further, if performance is not an issue, the VM’s ISA can be designed to be arbitrarily complex,

slowing its execution of virtualized applications, but making reverse engineering even more complex.

Understanding of the VM is necessary for decoding the bytecode and making the virtualized code

understandable.

Context has a bit of a different meaning in regard to obfuscating virtual machines: each time we want

to switch from the native to virtual ISA or vice-versa, sufficient information – context – on the current

operating state must be stored so that when the lSA has to be switched back, execution can resume

with only the relevant data and registers modified.

Additionally, obfuscating VMs usually virtualize only certain “interesting” functions – native context is

mapped to the virtual one and bytecode, representing the respective function, is chosen beforehand.

The built-in interpreter is invoked afterwards (Figure 5). Beginnings of the original functions contain

code that prepares and executes the interpreter – entry of the VM (vm_entry); the rest of their code is

omitted in Figure 5.

Interpreter, bytecode, and virtual ISA code with data of obfuscating VMs are often all stored in a

dedicated section of the executable binary, along with the rest of the partially virtualized program.


-----

Figure 5 shows the way a function, Function 1, in the original application targeting a common ISA can

be virtualized for an obfuscating VM’s ISA. It needs to be converted into bytecode, for example using a
```
generate_bytecode method. Its body is afterwards overwritten by a call into vm_entry and zeroes.

```
The vm_entry function chooses the respective bytecode, for example, based on the calling function’s

address, then conducts a context switch, and next interprets the bytecode. Finally, it returns to the code

where the virtualized function, Function 1, would return.

Figure 5. Overview of the virtualization process

In VMs hosted on x86 architectures, such context switches usually consist of a series of PUSH and POP

instructions. For example:
```
PUSH EAX
PUSH EBX
PUSH ECX
...
MOV ECX, context_addr
POP DWORD PTR [ECX]
POP DWORD PTR [ECX + 4]
POP DWORD PTR [ECX + 8]
...

```
When the bytecode is fully processed, virtual context is mapped back to native context and execution

continues in the non-virtualized code; however, another virtualized function could be executed in the

same manner, right away.

Note that several context switches can occur in one virtualized function, for example when a native

instruction from the original ISA could not be translated to virtual instructions or an unknown function

from the native API needs to be executed.


-----

### Documented techniques for deobfuscation of virtual machines

Obfuscating VM techniques have become widely misused for illicit purposes such as obfuscation of

malware samples as they hinder both analysis and detection. Hence there is motivation to overcome

these obfuscation techniques so as to facilitate analysis of such malicious code and to achieve overall

improvement of detection methods.

But first, we want to clarify several terms that are used in this and following sections and might not be

known to all readers.

Symbolic execution is a code analysis technique, where specific variables are represented with symbolic

values instead of concrete data. Arbitrary operations with these symbolic values produce symbolic

expressions. It is usually applied on the code’s IR and the symbolic expressions include flags.

One can visualize the symbolic expressions like mathematical formulas as can be seen in the following

example, where irb1 contains a block of pseudo-IR:
```
irb1:
  R13 = R13 + 0x027D3930
  RBX = RCX + 0x05
  R13 = R13 + -RSI
  R13 = R13 + RBX
irb1_symb:
  RBX = RCX + 0x05
  R13 = R13 + RCX + 0x05 + -RSI + 0x027D3930
  ZF = R13 + RCX + 0x05 + -RSI + 0x027D3930 == 0x00
  ...

```
The state of symbolically executed code consists of:

- Values of all variables

- Program counter

- Accumulated constraints that the program’s inputs need to satisfy to reach the associated location from the

entry point

Accumulated constraints can be understood as a theory in logic. In order to find concrete values of the

initial variables with symbolic values – inputs – we need to find a satisfying model, which can be done

with an

_[SMT (satisfiability modulo theories) solver.](https://link.springer.com/content/pdf/10.1007/978-3-540-78800-3_24.pdf)_

Path coverage is another code analysis technique that determines all possible paths in a piece of code. It

is usually implemented using symbolic execution instructed to explore all reachable paths – reachability

of newly discovered paths is verified by an SMT solver and already known paths are marked to prevent

infinite loops.

[Microsoft describes](https://www.microsoft.com/en-us/research/project/program-synthesis/) program synthesis as “the task of automatically discovering an executable piece of

code given user intent expressed using various forms of constraints such as input-output examples,

demonstrations, natural language, etc.”.

Several techniques to deal with VM-based obfuscation have been proposed in the past. Here we briefly

walk through them and discuss their advantages and disadvantages.


-----

[Rolf Rolles described several standard steps to manually recover the original code, where the drawback is](https://www.usenix.org/legacy/event/woot09/tech/full_papers/rolles.pdf)

time-complexity:

1. Reverse engineer and understand structures of the VM

2. Detect entries into the VM

3. Develop a disassembler for the instruction set by identifying the purpose of individual virtual opcodes or

matching them against already known ones

4. Disassemble the bytecode and convert it into intermediate representation – the semantics of some

instructions might be hard to comprehend in basic blocks without further translation (e.g., stack-based VMs

would contain a lot of confusing PUSH and POP machinations”)

5. Apply compiler optimizations to get rid of additional obfuscation techniques

6. Generate the deobfuscated code

He additionally suggested the use of pure symbolic execution on the virtual opcodes in the fourth step

to obtain a representation, where each opcode is a mathematical function that is a map from its input

space into itself. The pure symbolic execution technique was later independently implemented in a

[Miasm blogpost.](https://miasm.re/blog/2016/09/03/zeusvm_analysis.html)

[Jonathan Salwan, Sébastien Bardin, and Marie-Laure Potet proposed a fully automatic approach to](http://shell-storm.org/talks/DIMVA2018-deobfuscation-salwan-bardin-potet.pdf)

overcome obfuscating VM protection on samples with a finite number of executable paths. The

approach consists of the following steps:

1. Identification of the sample’s inputs

2. Isolation of pertinent instructions dependent on the identified inputs on an execution trace

3. Performance of a path coverage analysis to reach new paths – traces

4. Reconstruction of the original program from the resulting traces – they are combined and compiler

optimizations partially recover the control flow graph

[Tim Blazytko, Moritz Contag, Cornelius Aschermann, and Thorsten Hol produced a semiautomatic](https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-blazytko.pdf)

approach, based on program synthesis, that uses instruction traces as a black-box oracle to produce

random input and output pairs. The I/O pairs are subsequently used to learn the code’s underlying

semantics with the synthesizer.

These pairs and semantics are generated for the virtual opcodes that must be identified beforehand –

the VM needs to be partially reverse engineered to locate its components.

The approach does not seem to be applicable to some complex (particularly obfuscating) VMs due to

its time complexity, as it reportedly took almost three hours to process 36 virtual opcodes of a VM –

duplication of handlers, which is a simple and common obfuscation technique, would be a huge issue.

### The Miasm framework

_[Miasm is a free and open-source reverse-engineering framework that aims to analyze, modify and](https://github.com/cea-sec/miasm/issues/441)_

generate binary programs. It has a number of useful features that we use throughout our analysis:

- Opening, modifying and generating binary files – PE and ELF

- Assembling and disassembling of various architectures such as x86, ARM, MIPS…

- Representing assembly semantics using intermediate representation

- Simplification rules for automatic deobfuscation

- Symbolic execution engine

- ...

There are several frameworks for reverse-engineering that provide the features that we needed; we

decided to use Miasm in this project simply because it is actively maintained, and we are already familiar

and satisfied with it.


-----

The features that we want to use are covered in the example section of its GitHub repository description

[and its documentation.](https://github.com/cea-sec/miasm/tree/master/doc)

We encourage the reader to get familiar at least with semantics of its IR that are summarized in Table 1,

since they are going to be used repeatedly.

**Element** **Example**

ExprId `EAX`

ExprAssign `A=B`

ExprInt `0x18`

ExprLoc `location_1`

ExprCond `A ? B : C`

ExprMem `@16[ESI]`

ExprOp `A + B`

ExprSlice `AH = EAX[8:16]`

ExprCompose `{EAX 0 32, 0x0 32 64}`

Table 1. Miasm’s IR semantics

The destination address of a symbolic execution performed over a block of code is saved in the

respective program counter such as RIP and additionally in a special variable IRDst.

Note that during Miasm’s symbolic execution: initial values of registers, which are treated as variables,

are symbolic and their format is <register name>_init. Simplification rules are applied automatically

to the symbolic expressions. For example, the symbolic expression RAX = RCX + 0x2 + 0x3 is

automatically simplified into RAX = RCX_init + 0x5.

## WSLINK’S VIRTUAL MACHINE ENTRY – VM_ENTRY

Let’s get to the analysis of Wslink’s VM now. There are several function calls that enter the VM, all of

which are followed by some gibberish data that IDA attempts to disassemble – the data most likely just

overwrites the function’s original code before virtualization (Figure 6).

Figure 6. Entry point to the virtual machine

|Element|Example|
|---|---|
|ExprId|EAX|
|ExprAssign|A=B|
|ExprInt|0x18|
|ExprLoc|location_1|
|ExprCond|A ? B : C|
|ExprMem|@16[ESI]|
|ExprOp|A + B|
|ExprSlice|AH = EAX[8:16]|
|ExprCompose|{EAX 0 32, 0x0 32 64}|


-----

The vm_entry of the VM:

- calculates the actual base address by subtracting the expected relative virtual address from the actual

virtual address of a place in the code

- unpacks code and data related to the VM on the first run; it uses the calculated base address to

determine the location of the packed VM and destination of the unpacked data

- executes an initialization function – one of the vm_pre_init() functions to be described is based on

the caller’s relative address that is mapped to the respective vm_pre_init()

## PACKER

[Wslink’s VM is packed with NsPack to reduce the size of the huge executable file; additional obfuscation](http://www.heaventools.com/pe-explorer-nspack-unpacker.htm)

is probably just a side effect. Similarities between Wslink’s unpacking code and ClamAV’s unspack()

function are clearly visible (Figure 7 and Figure 8). Note that Ghidra has optimized out calculation of the

base address.

Figure 7. A part of vm_entry of the virtual machine decompiled with Ghidra


-----

Figure 8. Function used to unpack NsPack in ClamAV

The vm_pre_init_dispatch_table in Figure 7 is the structure that maps callers’ addresses of the vm_
```
entry to the respective vm_pre_init() functions that are to be described.

```

-----

## JUNK CODE

Each part of the unpacked VM is obfuscated with lots of junk code – unnecessary additional instructions

significantly decreasing readability of the code. It often uses instruction pairs with opposite effects.

Neither the IDA nor the Ghidra decompiler is able to deal with such obfuscation; however, Miasm’s

symbolic execution was able to make the code easily readable (Figure 9).

Figure 9. A block of code in Miasm’s symbolic execution (left) and a part of the same block in IDA’s decompiler (right)


-----

## VIRTUAL MACHINE INITIALIZATION

Initialization of the VM consists of several steps, such as saving values of the native registers on the

stack and later moving them to the virtual context, relocation of its internal structures, or preparation of

bytecode. We cover these steps more thoroughly in the following subsections.

### vm_pre_init() functions
```
vm_pre_init() functions are meant only to prepare parameters for another stage of initialization

```
(Figure 10). These functions call a single vm_init() function (explained in the next section) with

specific parameters. The supplied parameters are:

- CPU flags, RFLAGS, which are stored on the stack with a PUSHF instruction at the beginning of each

function

- hardcoded offset to a virtual instruction table that represents the first virtual instruction to be

executed (its opcode)

- hardcoded address of the bytecode to be interpreted

Figure 10. Miasm’s symbolic execution of a vm_pre_init() showing parameters supplied to vm_init()


-----

### vm_init() function
```
vm_init() pushes all the native registers and the supplied CPU flags from parameters (context) onto

```
the stack; one can actually see it in Figure 9. The native context will later be moved to the virtual one

that, in addition, holds several internal registers.

One of the internal registers determines whether another instance of the VM is already running – there

is only one global virtual context and only one instance of the VM can run at a time. Figure 11 shows

the part of the code busy-waiting for the virtual register, where RBP contains the address of the virtual

context and RBX the offset of the virtual register – the internal register is stored in [RBX + RBP].

The entire function is summarized in Figure 12.

Figure 11. Busy-waiting for interpreter in vm_init()

The bytecode’s address, supplied in the parameters, is added to the virtual context along with the

address of the virtual instruction table, which is hardcoded. Both have a dedicated virtual register.

The VM calculates the base address again in the same way as was described for vm_entry; in addition,

it stores the address in another internal register that is used later, should an API be called. Then the base

address is used to relocate the instruction table, its entries, and the bytecode’s address.

The calculated base address is simply added to all the function addresses if they have not already been

relocated.


-----

Figure 12. vm_init() summary


-----

## VIRTUAL INSTRUCTIONS

There are only 45 instructions in the virtual instruction table (Figure 13).

Figure 13. Virtual instruction table

Let us look at the first one in the table. Initially, we need to relocate it; our dump of the VM starts

at address 0x00 and it is expected to be at base + 0x0F33F5, so the target address is 0x1EC74E –
```
0x0F33F5, which is 0x0F9359 (Figure 14).

```
Figure 14. The first virtual instruction in the table

The JMP in Figure 14 leads us to a function at 0x0FF2DB whose behavior is remarkably similar to
```
vm_pre_init() (Figure 15 and Figure 16 for comparison). The function appears to be pushing another

```
bytecode address, the opcode of the initial virtual instruction, and CPU flags.


-----

Figure 15. One of the vm_pre_init() functions

Figure 16. Miasm’s symbolic execution of the first virtual instruction (function at 0x0FF2DB)

Inspecting the function at 0x0F7FFF (Figure 17), into which our virtual instruction jumps, reveals that it

appears to be another vm_init() (Figure 18). When we compare it to the previous one, we can see that

their behaviors are, indeed, the same. We will refer to these functions simply as vm2_pre_init() and
```
vm2_init().

```

-----

Figure 17. Miasm’s symbolic execution of the first block of vm2_init()


-----

Figure 18. Miasm’s symbolic execution of the first block of vm_init()

Inspection of the other instructions revealed that they all execute this second VM with different vm2_
```
pre_init() functions – this clearly shows that there are two layers of VMs.

```
Virtual instructions of the first VM execute vm2_pre_init() directly without any dispatch table based

on the caller’s address. The number of virtual instructions in the second VM is significantly higher – 1071

(Figure 19).

Figure 19. A part of the second virtual instruction table


-----

### Virtual instructions of the second virtual machine

We start by looking at the first few executed virtual instructions to observe the behavior of the second

VM and then try to process the rest of them in a partially automated way.

The diagram in Figure 20 highlights with blue, where the virtual instructions of the second VM are in the

structure of the VMs.

Figure 20. Virtual instructions in the structure of the virtual machines

#### The first virtual instruction

The first virtual instruction is, exceptionally, not obfuscated, as can be seen in Figure 21. Finally, we can

see some operations in the virtual context.

By inspecting the modified memory and calculated destination address of the instruction, it is clear that

the instruction does three things:

1. Zeroes out a virtual 32-bit register at offset 0xB5 in the virtual context (highlighted in gray in Figure 21),

which is stored in the RBP register.

2. A virtual 64-bit register at offset 0x28 is increased by 0x04: it is the pointer to the bytecode – virtual

program counter. The size of the virtual instruction is hence four bytes (highlighted in red in Figure 21).

3. The next virtual instruction is prepared to be executed, the offset to the virtual instruction table – virtual

opcode – is fetched from the virtual program counter. The virtual instruction table is at offset 0xA4

(highlighted in green in Figure 21). This means that the VM uses the Direct Threading Dispatch technique.


-----

Figure 21. The initial virtual instruction of the second VM

Note that the size of the next instruction’s opcode is only two bytes and the remaining word is left

unused. We can see that it is just a zero when we look at virtual operands (Figure 22). Sizes of the other

instructions differ – it is not just padding that preserves the same size for all instructions.

Figure 22. Bytecode of the virtual instruction

#### The second virtual instruction

The second virtual instruction does not do anything special; it just zeroes out several virtual registers

and jumps to the next instruction (Figure 23).

Figure 23. Destination address and memory modified by the second virtual instruction

#### The third virtual instruction

The third virtual instruction stores the address of the stack pointer in a virtual register (Figure 24); the

offset of the register is determined by one of the operands, and its offset is 0x0141 in our case.

Figure 24. Destination address and memory modified by the third virtual instruction


-----

#### The fourth virtual instruction

The fourth instruction contains two immediately visible anomalies in comparison with previous

instructions – the stack pointer’s delta is lower at the end of the function and it contains a conditional

branch (Figure 25).

Figure 25. The conditional branch and delta of the stack pointer of the fourth virtual instruction

Symbolic execution of the first block reveals that a value is popped from the stack into a virtual register

(Figure 26), which makes sense as the values of the native registers remain on the stack after being

saved there by vm2_init(). They are now being moved to the virtual context – the context switch is

partially performed by a number of virtual instructions, each of which pops one value off the stack into a

different register.

Figure 26. Destination address and memory modified by the fourth virtual instruction

The virtual register, where the value of the native register is to be saved, is determined by an operand

and two other virtual registers at offsets 0x0B and 0x70. However, their initial value is already known:

they were set to zero by the second virtual instruction (Figure 23), which means that we can calculate

the offset of the register and simplify the expressions – they are used just to obfuscate the code.


-----

Rolling decryption

Analysis of other virtual instructions confirmed that the virtual registers at offsets 0x0B and 0x70 are

[meant just to encode operands. This technique is called rolling decryption and it is known to be used](https://back.engineering/17/05/2021/#rolling-decryption)

[by the VMProtect obfuscator. However, it is the only overlap with that obfuscator and we are highly](https://vmpsoft.com/)

confident that this VM is different.

The obfuscation technique is certainly one of the reasons for the enormous number of virtual

instructions – use of the technique requires duplication of individual instructions since each uses a

different key to decode the operands.

Simplification

The expressions can be simplified to the following when we apply the known values of the virtual

registers:
```
IRDst = (-@16[@64[RBP_init + 0x28] + 0x4] ^ 0x3038 == @16[@64[RBP_init + 0x28] +
0x6])?(0x7FEC91ABD1C,0x7FEC91ABCF6)
@64[RBP_init + {-@16[@64[RBP_init + 0x28] + 0x4] ^ 0x3038, 0, 16, 0x0, 16, 64}] =
@64[RSP_init]

```
Now let us take a look at the expression in the conditional block:
```
@64[RBP_init + {@16[@64[RBP_init + 0x28] + 0x6], 0, 16, 0x0, 16, 64}] = @64[RBP_
init + {@16[@64[RBP_init + 0x28] + 0x6], 0, 16, 0x0, 16, 64}] + 0x8

```
We can now see that the virtual instruction is definitely POP – it moves a value off the top of the stack

to a virtual register, whose offset is still obfuscated with a simple XOR; it additionally increases the stack

pointer when the destination register is not the stack pointer.

As values in the bytecode are known too, we can apply them and simplify the instruction even further

into the following final unconditional expressions:
```
IRDst = @64[@64[RBP_init + 0xA4] + 0x5A8]
@64[RBP_init + 0x28] = @64[RBP_init + 0x28] + 0x8
@64[RBP_init + 0x141] = @64[RBP_init + 0x141] + 0x8
@64[RBP_init + 0x12A] = @64[RSP_init]

#### Automating analysis of the virtual instructions

```
As doing this for more than 1000 instructions would be very time consuming, we wrote a Python script

with Miasm that collects this information for us so we can get a better overview of what is going on.

We are particularly interested in modified memory and destination addresses.

Just as in the fourth virtual instruction, we will treat certain virtual registers as concrete values to

retrieve clear expressions. These registers are dedicated to the rolling decryption and perform memory

accesses that are relative to the bytecode pointer, e.g. [<obf_reg_1>] = [<bytecode_ptr> +
```
0x05] ^ 0xABCD.

```
Subsequently we concretize the pointer to the virtual instruction table too and, by the end of the virtual

instruction: calculate addresses of the next ones, clear the symbolic state, and start with the following

virtual instructions.

We additionally save aside memory assignments that are not related to the internal registers of the VM

and gradually build a graph based on the virtual program counter (Figure 27).


-----

Figure 27. Call graph generated from memory assignments and the VPC

We stop when we cannot unambiguously determine the next virtual instructions to be executed; one

can automatically process most of the virtual instructions in this way.

Note that instructions featuring complex loops cannot be processed with certainty and need to be

addressed individually due to the path explosion problem of symbolic execution, which is described

[for example in the paper Demand-Driven Compositional Symbolic Execution: “Systematically executing](https://www.microsoft.com/en-us/research/wp-content/uploads/2008/01/fulltext.pdf)

symbolically all feasible program paths does not scale to large programs. Indeed, the number of feasible

paths can be exponential in the program size, or even infinite in presence of loops with unbounded

number of iterations.”


-----

### Getting back to the first virtual machine

Before diving into the virtual instructions of the first VM, let us recap where we currently are. We have

just described a way to semiautomate processing of the bytecode belonging to the second VM (blue

in Figure 28) that interprets virtual instructions of the first VM (red in Figure 28). Now we move on to

inspect instructions of the first VM with this approach.

Figure 28. Virtual instructions in the structure of the virtual machines


-----

### The initial virtual instruction

In this section we describe the results of processing of the initial virtual instruction of the first VM in the

semiautomatic manner that was described in the previous section.

We performed all the processing on a virtual machine with i7-4770 CPU and 4GB of memory. Statistics

in Table 2 have been extracted from the processing of the initial virtual instruction.

Size of the bytecode block in bytes 1,145

Total number of processed virtual instructions 109

Total number of underlying native instructions 17,406

Total number of resulting IR instructions (including IRDsts) 307

Execution time in seconds 10.6509

Table 2. Statistics of the initial virtual instruction

The resulting control flow graph built out of the semantics extracted from the virtual instructions of

the second VM’s bytecode that interprets the initial virtual instruction from the first VM can be seen in

Figure 29. We can divide the series into a few parts.

Figure 29. Control flow graph of the initial virtual instruction

|Size of the bytecode block in bytes|1,145|
|---|---|
|Total number of processed virtual instructions|109|
|Total number of underlying native instructions|17,406|
|Total number of resulting IR instructions (including IRDsts)|307|
|Execution time in seconds|10.6509|


-----

#### Intro

As expected, the graph starts with a series of POP instructions that move values of the native registers

saved beforehand in vm2_init() to the virtual ones (Figure 30). To determine positions of the native

registers on the stack, we could symbolically evaluate the first block of vm2_init() and map the virtual

registers to their native versions, which would make the code easier to read, but that is not important

now.

Remember that the virtual register at offset 0x1E contains the stack pointer, and that a POP instruction

moves a value off the top of the stack and usually increases the stack pointer.

Figure 30. Beginning of the intro finishing context switch of the second VM


-----

#### Outro

To map the virtual registers back to the native ones, the second VM pushes them all onto the stack and

then subsequently pops them off one by one to the native ones. Note that we set up an exclusion in our

algorithm and disabled optimizations to show assignments to registers in the last virtual instruction

(Figure 31).

Figure 31. Virtual registers of the second machine being mapped back to the native ones at the end of the virtual
instruction


-----

#### Analysis of the virtual context

In this section we analyze the behavior of the first VM based on the results of the The first virtual

_instruction section._

Figure 32 shows:

- virtual registers being pushed onto the stack at the beginning of the outro (red)

- partially the way the next virtual instruction is prepared to be executed (green)

- the virtual program counter being increased (blue)

In particular, the virtual program counter is represented by @64[@64[RBP_init + 0x38] + 0x2C],

where the register at @64[RBP_init + 0x38] holds the address of the virtual context. We can see that

size of the initial virtual instruction was 8 bytes, since the virtual program counter is increased by 8 in

the line highlighted with blue in Figure 32.

Figure 32. Last few virtual instructions executed before mapping the virtual registers back to the native ones

As one can see in Figure 31 (IRDst = @64[RBP_init + 0x74]), the virtual register at offset 0x74

determines IRDst – the address of the next instruction. If we follow the virtual register @64[RBP_init
```
+ 0x74] in Figure 32, we can see that it appears to be preparing to execute the next virtual instruction

```
[similarly to the second VM. Its code slice is the following series of expressions:](https://en.wikipedia.org/wiki/Program_slicing)
```
@64[RBP_init + 0x30] = @64[@64[RBP_init + 0x38] + 0x2C]
@64[RBP_init + 0x30] = @64[RBP_init + 0x30] + 0x2
@64[RBP_init + 0x30] = {@16[@64[RBP_init + 0x30]] 0 16, 0x0 16 64}

```

-----

```
@32[RBP_init + 0x30] = @32[RBP_init + 0x30] + 0x8E839329
@64[RBP_init + 0x30] = @64[RBP_init + 0x30] & 0xFFFF
@64[RBP_init + 0x30] = @64[RBP_init + 0x30] << 0x3
@64[RBP_init + 0xDE] = @64[@64[RBP_init + 0x38] + 0xEE]
@64[RBP_init + 0xDE] = @64[RBP_init + 0x30] + @64[RBP_init + 0xDE]
@64[RBP_init + 0x74] = @64[@64[RBP_init + 0xDE]]

```
The entire slice of @64[RBP_init + 0x30] is meant just to acquire the offset of the next virtual

instruction (opcode): it gets the virtual instruction’s offset from the bytecode whose pointer is stored

in the @64[@64[RBP_init + 0x38] + 0x2C] register, and the offset is subsequently increased by
```
0x8E839329… which could have been omitted and serves solely to obscure the virtual instruction.

```
The virtual register @64[@64[RBP_init + 0x38] + 0xEE] contains the address of the virtual

instruction table. Now it is clear that the first VM is obfuscated using known values from the bytecode

too and that the code indeed executes a next virtual instruction as well – it definitely uses Direct

Threading.

One can additionally see that @64[RBP_init + 0x50] stores the RFLAGS in Figure 32.

#### Behavior

The virtual instruction behaves similarly to the virtual instructions from the second VM – offsets of the

virtual registers to be used are fetched from the virtual instruction’s operands.

Subsequently a virtual register’s value is moved to a memory address stored in another one: [<virt_
```
reg_1>] = <virt_reg_2>. The target register is then either increased or decreased by 8: <virt_
reg_1> = <virt_reg_1> +- 8. This is most likely a PUSH instruction prepared also for environments

```
where the stack grows upwards.

### Initially executed virtual instructions

We will have a look at a few other virtual instructions to confirm our findings and the correctness of

methods for analysis of the first VM. Specifically, the virtual instructions that are initially executed as we

can compare the first VM’s initial behavior to the second VM’s.

#### The first executed virtual instruction

We can see in the highlighted line of Figure 33 that the first executed instruction of the first VM behaves

indeed just like the one in the second VM – it just zeroes out an internal register and prepares another

virtual instruction to be executed.


-----

Figure 33. Zeroing out an internal register


-----

Statistics in Table 3 have been extracted from the processing of the first executed virtual instruction.

Size of the bytecode block in bytes 548

Total number of processed virtual instructions 62

Total number of underlying native instructions 9,444

Total number of resulting IR instructions (including IRDsts) 195

Execution time in seconds 6.4810

Table 3. Statistics of the first executed virtual instruction

#### The second executed virtual instruction

The second virtual instruction just zeroes out several internal registers, which are most likely about to

be used for obfuscation, as in the second VM.

Statistics in Table 4 have been extracted from the processing of the second executed virtual instruction.

Size of the bytecode block in bytes 755

Total number of processed virtual instructions 83

Total number of underlying native instructions 13,740

Total number of resulting IR instructions (including IRDsts) 259

Execution time in seconds 7.7718

Table 4. Statistics of the second executed virtual instruction

#### The third executed virtual instruction

The third virtual instruction behaves just like the third one of the second VM too – it stores the stack

pointer (highlighted in Figure 34). The addition of 0x98 is present due to applied optimizations which

took into account the previously executed POP instructions in the Intro section.

|Size of the bytecode block in bytes|548|
|---|---|
|Total number of processed virtual instructions|62|
|Total number of underlying native instructions|9,444|
|Total number of resulting IR instructions (including IRDsts)|195|
|Execution time in seconds|6.4810|

|Size of the bytecode block in bytes|755|
|---|---|
|Total number of processed virtual instructions|83|
|Total number of underlying native instructions|13,740|
|Total number of resulting IR instructions (including IRDsts)|259|
|Execution time in seconds|7.7718|


-----

Figure 34. Storing the stack pointer in an internal register

Statistics in Table 5 have been extracted from the processing of the third executed virtual instruction.

Size of the bytecode block in bytes 586

Total number of processed virtual instructions 66

Total number of underlying native instructions 10,263

Total number of resulting IR instructions (including IRDsts) 207

Execution time in seconds 6.8428

Table 5. Statistics of the third executed virtual instruction

|Size of the bytecode block in bytes|586|
|---|---|
|Total number of processed virtual instructions|66|
|Total number of underlying native instructions|10,263|
|Total number of resulting IR instructions (including IRDsts)|207|
|Execution time in seconds|6.8428|


-----

#### The fourth executed virtual instruction

We naturally expect this instruction to be a POP as in the second VM; however, it is hard to confirm

statically as the already described obfuscation techniques make it too hard to understand. One can see

part of the virtual instruction in Figure 35.

Statistics in Table 6 have been extracted from the processing of the fourth executed virtual instruction.

Size of the bytecode block in bytes 4,883

Total number of processed virtual instructions 425

Total number of underlying native instructions 71,192

Total number of resulting IR instructions (including IRDsts) 1,038

Execution time in seconds 28.1638

Table 6. Statistics of the fourth executed virtual instruction

Figure 35. Part of the fourth virtual instruction

When we look closely at certain parts of Figure 35, it appears to be able to behave as a POP instruction.

The part of the virtual instruction in Figure 36 clearly behaves just like the fourth one of the second VM

– it moves a value off the top of the stack, and if the target register is different from the stack pointer,

the stack pointer is increased.

Figure 36. Part of the fourth virtual instruction performing a pop-like operation

|Size of the bytecode block in bytes|4,883|
|---|---|
|Total number of processed virtual instructions|425|
|Total number of underlying native instructions|71,192|
|Total number of resulting IR instructions (including IRDsts)|1,038|
|Execution time in seconds|28.1638|


-----

Instruction merging

However, the instruction also seems to be capable of performing a PUSH and other operations as well,

based on the operands (Figure 37), which means that it consists of several other instructions merged

[into one, which is a kind of obfuscation technique. It most likely merged several instructions with different](https://tigress.wtf/merge.html)

rolling keys into one.

Figure 37. Part of the fourth virtual instruction performing a PUSH operation

## AUTOMATING ANALYSIS OF THE FIRST VIRTUAL MACHINE

Now that we know what the internal structure of the first VM is like, we can process the VM as the

second one since analyzing all the virtual instructions would be complicated due to the additional

obfuscation techniques – we can again effectively eliminate them with symbolic execution.

We definitely need to concretize the virtual instruction table and internal registers dedicated for

obfuscation as in the previous one, which is not complicated. The question is: What do we do with the

second VM?

There is a pretty simple solution – instead of preserving the entire context of the second VM and

working with both at once, we can simply concretize the entire second VM as we know what memory

ranges belong to the VMs.

We will also ignore all memory assignments to the second VM’s context and not preserve any

information about its structure. This will enable us to focus only on the first one and build the same

graph as before.

We could also preserve the obfuscated IR of all the virtual instructions of the first VM and use them

instead – it would save a significant amount of time during the processing since we would not

repeatedly disassemble, translate and deobfuscate the second VM for each opcode in the bytecode

blocks of the first VM. However, we want to show that it is possible to process both layers at once.


-----

### Processing the initial bytecode block

We processed the very first bytecode block as was described in the previous section. The resulting code

still appears to be too complex since we expected a series of POPs, the deobfuscated code and then a

series of PUSHes and finally mapping back to the native registers. However, there are additional, multiple

branches. One can see part of the code in Figure 38.

Figure 38. The first processed bytecode block


-----

#### Opaque predicates

Looking at the code more closely, we notice two types of expressions that can be further simplified.

The first is the value of RBP_init, which is the address of the virtual context and it is known (Figure 39).

Figure 39. Expressions that can be further simplified

Both paths that follow the initial block in Figure 39 contain the same code, hence this is not the same case

as with the POP virtual instruction, where it was important to know what the target register was because

it determined the subsequent behavior of the virtual instruction. These checks are, on the other hand,

[unimportant and we can just get rid of them – they can be considered as a sort of opaque predicate.](https://tigress.wtf/addOpaque.html)

Note that the branch of the POP virtual instruction was now optimized out automatically since offsets of

the registers were present in the bytecode and directly known.

Finally, these were the last obfuscation techniques, and we can look at the simplified code.

#### Overview

We are finally greeted with a familiar, even pleasant, view in Figure 40 – as expected the code begins

with a series of POPs (red) and ends with a series of PUSHes (green) that represent parts of the context

switches.

Another interesting detail is that the VM uses a special internal register to store the destination address

– the final jump is not visible, but the code jumps to @64[RBP_init + 0x133]. As was mentioned

earlier, the VM also stores the base address of its code section; this is stored in virtual register @64[RBP_
```
init + 0x80] in our case.

```
One can see that the code in Figure 40 also accesses certain data using the base address, specifically

at offset 0x0E3808 (blue). After looking up the address, we found that it belongs to a ServiceStatus

structure (Figure 41).


-----

Figure 40. Code of the processed bytecode

Figure 41. Data accessed by the code – ServiceStatus


-----

It additionally sets a register before recovering the native state to a data address at offset 0x2FB0

(yellow). The address contains a non-obfuscated function shown in Figure 42.

Figure 42. Function whose pointer is used in the code

Let us now focus on the destination address (gray) – it is set to <base address> + 0x8C038. Looking

up that address in the sample, we see it belongs to the Windows API RegisterServiceCtrlHandlerW,

which makes sense as the application is a service (Figure 43).

Figure 43. Destination address of the bytecode

The question is now, what is the return address of the API call. When we look at the end of the code, we

see that it sets the return address – the highlighted assignment in Figure 44 appears to be 0x88 bytes

above the stack pointer, but we need to keep in mind that we started below the stack pointer because

we did not perform the initial context pushing from vm_init() and in reality, it is the return address.

The return address is set to another vm_pre_init().

Figure 44. Setting return address of the API call

The last part of the code that needs to be analyzed is the body of the loop (Figure 45). It is pretty simple

– it zeroes out a memory range. If we look back at Figure 40 and look up what is in @64[RBP_init +
```
0x74], we see that it is set to the address of the ServiceStatus structure (blue) – this piece of code

```
zeroes out the structure. Meanwhile, @64[RBP_init + 0x4F] (pink in Figure 40) initially contains the

constant 0x1C – size of the structure – and @64[RBP_init + 0xCC], the CPU flags.


-----

Figure 45. Body of the code’s loop

Now we look at the discovered non-obfuscated sample and compare it against our findings. We can

confirm that we deobfuscated the first bytecode block successfully (Figure 46).

Figure 46. The same part of code in the non-obfuscated binary

Statistics in Table 7 have been extracted from the processing of the first bytecode block.

Size of the bytecode block in bytes 695

Total number of processed virtual instructions 62

Total number of underlying native instructions 3,536,427

Total number of resulting IR instructions (including IRDsts) 192

Execution time in seconds 382.5678

Table 7. Statistics of the first processed bytecode block

|Size of the bytecode block in bytes|695|
|---|---|
|Total number of processed virtual instructions|62|
|Total number of underlying native instructions|3,536,427|
|Total number of resulting IR instructions (including IRDsts)|192|
|Execution time in seconds|382.5678|


-----

## DESCRIPTION OF OUR FINAL VM ANALYZER CODE

Our final analyzer code consists of several classes that interact together, as described in the following

[sections. The full code listing is available in our GitHub repository. The classes follow the high-level](https://github.com/eset/wslink-vm-analyzer)

descriptions from the previous Automating analysis sections.

### Class Wslink
```
Wslink is a mediator that handles interaction of the remaining classes, its constructor processes the

```
supplied memory dump, and its method process() accepts the value of the virtual program counter

– pointer to the bytecode – with the opcode of the initial instruction. The bytecode is subsequently

processed using classes VirtualContext, SymbolicCFG and MySymbolicExecutionEngine; the

[resulting control flow graph is written into a DOT file vma.dot.](https://graphviz.org/doc/info/lang.html)

Parts of the VM, such as address of the instruction table or offsets of the virtual registers for

obfuscation, should be overwritten to provide specific values for individual VMs.

### Class VirtualContext

This class represents the virtual context – it contains most notably the initial values of the virtual

registers for obfuscation, virtual program counter, and the address of the instruction table.

It also provides several methods for working with the context described in the following sections.

#### Method get_next_instr()

The method get_next_instr() applies the address of the instruction table to the destination address

to simplify the corresponding expression and attempts to unambiguously determine the address of the

next virtual instruction to be executed.

#### Method get_irb_symbs()

This method simply acquires the expressions that should be preserved in the nodes of the resulting

control flow graph.

#### Method get_updated_internal_context()

The method get_updated_internal_context() updates values of the internal registers that need

to be preserved between virtual instructions, such as the virtual program counter or the obfuscation

registers.

#### Method get_state_hash()

This method calculates a hash for virtual instructions – the hash is used to specify the actual position

in the bytecode to reconstruct the control flow graph without duplicate nodes or paths and to avoid

infinite loops in cycles. It is calculated just from the virtual program counter.

### Class MySymbolicExecutionEngine

This class overrides the method mem_read() of Miasm’s class SymbolicExecutionEngine primarily

to transform memory accesses relative to the virtual program counter and the virtual instruction table

into concrete values. It is additionally meant to make the second VM completely concrete when we are

processing the first one.


-----

### Class SymbolicCFG

This class is meant to construct the resulting control flow graph. It uses class Node to process individual

virtual instructions, to acquire the expressions that need to be preserved, and to determine addresses of

the next virtual instructions.

Each Node is tied to a hash generated by get_state_hash() (as described above) and the address,
```
StateID, of the block of code that is being processed. This means that virtual instructions containing

```
unbounded loops cannot currently be processed correctly because when we connect a state to an

already processed one, it will not take into account the changes introduced in the body of the loop.

### Class Node

This class simply represents a node in the resulting control flow graph – it most notably contains the

values of the obfuscation registers and virtual program counter that are together called init_symbols.

These are the values required to determine the addresses of the next virtual instructions.

It provides a method process_addr() that can get the following Nodes classes that have not yet

been processed and return them along with the expressions that should be preserved in a data-class
```
ContextResult.

```
The new Nodes are created based on the supplied addresses using method _get_next(), which

accepts several arguments. The arguments can instruct the function to slightly modify the resulting
```
Node – make a copy of the actual symbolic state when there is a branch, or update init_symbols for a

```
new virtual instruction.

## FUTURE WORK

Once we discovered a non-obfuscated sample, we were not motivated to completely deobfuscate the

rest of the code.

Our next steps would consist of:

1. Getting rid of the intro and outro and mapping the virtual registers directly to the native ones.

2. Automatically processing the subsequent bytecode blocks and extending the graph with resulting code

listings to get an overview of the whole function.

3. Optionally addressing individual instructions with unbounded loops that cannot be fully processed using

[symbolic execution (e.g., instructions like DEC_RC4 mentioned in Miasm’s blog) and manually creating](https://miasm.re/blog/2016/09/03/zeusvm_analysis.html)

[their IR to be added to the graph. We could also experiment with some enhancements of symbolic](https://is.muni.cz/th/t52nv/trtik_phdThesis.pdf)

execution that attempt to mitigate the issue.

4. Optionally matching resulting IR expressions against known IR expressions of assembly instructions to

recover assembly code.


-----

## CONCLUSION

We have described internals of an advanced multilayered virtual machine featured in Wslink and

successfully designed and implemented a semiautomatic solution capable of significantly facilitating

analysis of the program’s code. This virtual machine introduced several other obfuscation techniques

such as junk code, encoding of virtual operands, duplication of virtual opcodes, opaque predicates,

merging of virtual instructions and a nested virtual machine to further obstruct reverse engineering of

the code that it protects, yet we successfully overcame them all.

To deal with the obfuscation we modified a known technique that extracts the semantics of the virtual

opcodes using symbolic execution with simplifying rules. Additionally, we made concrete the internal

virtual registers for obfuscation along with memory accesses relative to the virtual program counter

to automatically apply known values and deobfuscate semantics of the virtual instructions – this

additionally broke down boundaries between individual virtual instructions. Boundaries are necessary to

prevent path explosion of the symbolic execution; we would lose track of the virtual program counter –

our position in the interpreted code – without them.

We defined new boundaries by symbolizing the address of the virtual instruction table, since it is

required to get the next instruction, and concretized it only when we needed to move to the following

virtual instructions. We subsequently constructed a control flow graph of the original code in an

intermediate representation from one of the bytecode blocks based on the virtual program counter, and

extracted deobfuscated semantics of individual virtual instructions. We finally extended the approach to

process both virtual machines at once by entirely concretizing the nested one.


-----