# Automatic Reverse Engineering of Script Engine Binaries for Building Script API Tracers

## TOSHINORI USUI, NTT Secure Platform Laboratories/Institute of Industrial Science, The University of Tokyo, Japan

## YUTO OTSUKI, TOMONORI IKUSE, YUHEI KAWAKOYA, MAKOTO IWAMURA, and JUN MIYOSHI, NTT Secure Platform Laboratories, Japan

## KANTA MATSUURA, Institute of Industrial Science, The University of Tokyo, Japan

Script languages are designed to be easy to use and to have low learning costs. These features give attackers a choice of script languages for developing their malicious scripts. This diversity of choice on the attacker side unexpectedly imposes a significant cost on the preparation of analysis tools on the defense side: we have to prepare for multiple script languages to analyze malicious scripts written in them. We call this unbalanced cost for script languages the _asymmetry problem._ To solve this problem, we propose a method for automatically detecting the hook and tap points in a script engine binary that are essential for building a script Application Programming Interface (API) tracer. Our method allows us to reduce the cost of reverse engineering a script engine binary, which is the largest portion of the development of a script API tracer, and to build a script API tracer for a script language with minimal manual intervention. This advantage results in solving the asymmetry problem. The experimental results showed that our method generated script API tracers for the three script languages popular among attackers: Visual Basic for Applications (VBA), Microsoft Visual Basic Scripting Edition (VBScript), and PowerShell. The results also demonstrated that these script API tracers successfully analyzed real-world malicious scripts.
CCS Concepts: • Security and privacy → **Malware and its mitigation; Software reverse engineering;** • Software and its engineering → **Simulator / interpreter;** • Computing methodologies → _Optimization algorithms;_

Additional Key Words and Phrases: Malicious script, dynamic analysis, reverse engineering, function enhancement

**ACM Reference format:**
Toshinori Usui, Yuto Otsuki, Tomonori Ikuse, Yuhei Kawakoya, Makoto Iwamura, Jun Miyoshi, and Kanta Matsuura. 2021. Automatic Reverse Engineering of Script Engine Binaries for Building Script API Tracers. Digit. Threat.: Res. Pract. 2, 1, Article 5 (January 2021), 31 pages. [https://doi.org/10.1145/3416126](https://doi.org/10.1145/3416126)

Presently, Y. Otsuki is with NTT Security (Japan) KK, Japan. This work was partially supported by JSPS KAKENHI Grant Number JP17KT0081.

Authors' addresses: T. Usui, NTT Secure Platform Laboratories/Institute of Industrial Science, The University of Tokyo, 3-9-11 Midoricho, Musashino-shi, Tokyo, Japan, 180-8585; email: toshinori.usui.rt@hco.ntt.co.jp; Y. Otsuki, T. Ikuse, Y. Kawakoya, M. Iwamura, and J. Miyoshi, NTT Secure Platform Laboratories, Japan; emails: yuuto.ootsuki.uh@hco.ntt.co.jp, tomonori.ikuse.ez@hco.ntt.co.jp, yuuhei.kawakoya.sy@hco.ntt.co.jp, makoto.iwamura.sw@hco.ntt.co.jp, jun.miyoshi.fu@hco.ntt.co.jp; K. Matsuura, Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo, Japan, 153-8505; email: kanta@iis.u-tokyo.ac.jp.

[This work is licensed under a Creative Commons Attribution-Share Alike International 4.0 License.](https://creativecommons.org/licenses/by-sa/4.0/)

© 2021 Copyright held by the owner/author(s). 2576-5337/2021/01-ART5 [https://doi.org/10.1145/3416126](https://doi.org/10.1145/3416126)
### 1 INTRODUCTION

The diversity of script languages creates a blind spot in which malicious scripts can hide from analysis and detection. Attackers can flexibly choose a script language to develop one module of their malicious scripts and switch to another language for developing the next module. However, we (the security side) are not always well-prepared for every script language, since developing analysis tools for even a single script language incurs a certain cost. We call this gap in costs between attackers and defenders the asymmetry problem. This asymmetry problem gives attackers an advantage in evading the security of their target systems. That is, an attacker can develop malicious scripts in a script language for which a target organization may not be well-prepared and attack the system without detection.

One approach to solving this asymmetry problem is to focus on system-level monitoring, such as of Windows Application Programming Interfaces (APIs) or system calls. If we set hooks for monitoring at the system level, we can universally monitor the behavior of malicious scripts no matter what script languages they are written in. As long as malicious scripts run on a Windows platform, they have to more or less depend on Windows APIs or system calls to perform certain actions. If we set hooks on each API and monitor the invocations of those APIs from malicious scripts, we can probably comprehend the behavior of these scripts. However, this system-level monitoring approach is not sufficient from the viewpoint of analysis efficiency because some script API calls, such as string or object operations, do not reach any system APIs. That is, we do not always capture the complete behavior of malicious scripts running on the platform. This lack of coverage results in a partial understanding of malicious scripts and leads to underestimating their threat.
Another approach to malicious script analysis is to focus on a specific language and embed monitoring mechanisms into the runtime environment of the script. This approach resolves the semantic gap problem mentioned above but requires deep domain knowledge to develop a monitoring tool. For example, we have to know both the specifications of a script language and the internal architecture of the script engine to develop a dynamic analysis tool for the script. In addition, this approach supports only a single target script language. That is, we need to develop an analysis tool for each script language separately.

In summary, we (the security side) need an approach that is universally applicable to any script language and fine-grained enough for analyzing the detailed behavior of a malicious script. However, previous studies satisfied only one of these requirements at a time. To mitigate the gap between attackers and defenders, we propose a method of generating script API tracers with a small amount of human intervention. The basic idea of our method is to eliminate knowledge of script engine internals from the requirements for developing analysis tools for a script language. Instead, we complement this knowledge with several test programs written in the script language (test scripts) and run them on the script engine for differential execution analysis [8, 57] to identify the local functions corresponding to the script APIs, which are usually acquired through manual analysis of the script engine. Roughly speaking, our method allows us to replace knowledge of script engine internals with knowledge of the script language's specifications, which is needed for writing test scripts. Our method is composed of five steps: execution trace logging, hook point detection, tap point detection, hook and tap point verification, and script API tracer generation.
The most important function of our method is detecting points called hook points, at which the method inserts hooks to append code to script engines for script analysis, as well as points called tap points, which are memory regions logged by the code for analysis. Our method first acquires branch traces by executing manually crafted scripts called test scripts, each of which only calls a specific script API of the analysis target. Our method then obtains the hook and tap points that correspond to the target script API by analyzing the obtained branch trace with the differential execution analysis-based hook point detection method. By inserting hooks into the hook points that dump the memory of the tap points to logs, our method generates a script API tracer.

Note that we define a script API as a callable functionality provided by a script engine. For example, each builtin function and statement of Visual Basic for Applications (VBA) and Microsoft Visual Basic Scripting Edition (VBScript), such as CreateObject and Eval, and each commandlet (Cmdlet) of PowerShell, such as Invoke-Expression, is a script API. A challenge in this research was efficiently finding the local function that corresponds to the target script API among the large number of local functions in a script engine binary. We addressed this challenge by emphasizing the local function corresponding to the target script API as the difference between the branch traces of two scripts that call the target script API a different number of times. To achieve this differentiation, we modified the Smith-Waterman algorithm [44], borrowed from bioinformatics, which finds a similar common subsequence in two or more sequences, to fit it to this problem.
Our method does not allow us to directly fulfill the requirement of universal applicability. However, we believe that our method allows us to reduce the cost of developing an analysis tool for each script language. Therefore, we can lower the bar for preparing analysis tools for any script language. We implemented a prototype system that uses our method, called STAGER (a script analyzer generator based on engine reversing), for evaluating the method. We conducted experiments on STAGER with VBA, VBScript, and PowerShell. The experimental results indicate that our method can precisely detect hook and tap points and generate script API tracers that can output analysis logs containing script semantics. The hook and tap points are detected within a few tens of seconds. Using the STAGER-generated script API tracers, we analyzed real-world malicious scripts obtained from VirusTotal [1], a malware sharing service for research. The output logs showed that the script API tracers could effectively analyze malicious scripts in a short time. Our method enables the generation of a script API tracer for proprietary script languages for which existing methods cannot construct analysis tools. It can therefore contribute to providing better protection against malicious scripts. Our contributions are as follows.

—We are the first to propose a method that generates a script API tracer by analyzing script engine binaries.
—We confirmed through experiments that our method can accurately detect hook and tap points within a realistic time. In addition, our method requires only tens of seconds of human intervention for analyzing a script API.
—We showed that the script API tracers generated with our method can provide information useful for analysts by analyzing malicious scripts in the wild.

This article is an extended version of our previous work [48].
### 2 BACKGROUND AND MOTIVATION

### 2.1 Motivating Example

Our running example is a malicious script collected from VirusTotal and its analysis logs acquired using several different script analysis tools. Note that the script analysis tools in this section include all tools that can extract the behavior of scripts, regardless of whether they were explicitly designed to analyze scripts. Therefore, system API tracers are included among the script analysis tools in the subsequent sections.

Fig. 1. Obfuscated malicious script and its analysis logs acquired from several different script analysis tools.

Figure 1 shows a malicious script and the acquired analysis logs corresponding to it. The upper left, Figure 1(a), shows an excerpt of this malicious script, which has more than 1,000 lines of code. As shown in the figure, the malicious script is heavily obfuscated; thus, static analysis is difficult. The upper right, Figure 1(b), shows the deobfuscated script obtained from manual analysis. Since analysts can easily comprehend the behavior of the malicious script from it, it would be ideal as the analysis log. However, manually analyzing such a malicious script is tedious and time consuming and is sometimes nearly impossible depending on the heaviness of the obfuscation. The lower left, Figure 1(c), shows an excerpt of the system API trace log obtained by attaching a system API tracer called API Monitor [5] to the script engine process. This log contains a large number of system API calls that are both relevant and irrelevant to the malicious script. The irrelevant calls are made by the script engine itself. Some system API calls that are relevant to remote procedure calls (e.g., Component Object Model (COM) and Windows Management Instrumentation (WMI)) by the malicious script, and the ones that are only handled inside the script engine (e.g., eval), do not appear in the log.
These problems prevent analysts from comprehending the behavior of the malicious script; therefore, a system API tracer is not appropriate for analyzing malicious scripts. The lower right, Figure 1(d), shows the script API trace log that we aim to create with our method. This log has semantics similar to the one from manual analysis, through which analysts can comprehend the script's behavior. Therefore, script API tracers are essential for malicious script analysis. However, building such a script API tracer is difficult, as discussed in detail in Section 2.3.3. Thus, our goal is to propose a method for easily and systematically building script API tracers that can acquire such logs.

### 2.2 Requirements of Script Analysis Tool

We clarify the three requirements that script analysis tools should fulfill from the perspective of malicious script analysis.

_(1) Universal applicability._ Attackers use various script languages to create their malicious scripts. Hence, methods for constructing script analysis tools (hereafter, construction methods) should be applicable to various languages with diverse language specifications.

_(2) Preservability of script semantics._ When analyzing scripts, the more script semantics the output logs lose, the less information analysts can obtain from the logs. Therefore, construction methods should preserve script semantics to provide better information for analysis.

_(3) Binary applicability._ When constructing script analysis tools for script engines that are proprietary software (we call them proprietary script engines), their source code is not available. Because attackers often use such proprietary script languages, it is necessary for construction methods to be applicable to binaries.
We also discuss what form of logs script analysis tools should output. As mentioned in requirement (2), the logs should preserve script semantics. That is, logs from which the script APIs and the arguments that the target script used can be reconstructed are desirable. For example, when a script executes CreateObject("WScript.Shell"), the corresponding analysis log should contain the script API CreateObject and its argument WScript.Shell. A script API tracer generated with our method outputs such logs.

### 2.3 Design and Problem of Script Analysis Tool

_2.3.1 Script-level Monitoring._

_Design._ Script-level monitoring inserts hooks directly into the target script. Since malicious scripts are generally obfuscated, it is difficult to find appropriate hook points inside scripts that can output insightful information for analysts. Therefore, hooks are inserted using a hook point-free method, i.e., by overriding specific script APIs. Listing 1 shows a code snippet that achieves script-level monitoring of the script API eval in JavaScript. In this code, a hook is inserted by overriding the eval function (line 2), which inserts the code for analysis that outputs its argument as a log (line 3).

_Problem._ There are two problems with script-level monitoring: applicability and stealthiness. Since this design requires overriding script APIs, it is only applicable to script languages that allow overriding of builtin functions. Therefore, it does not fulfill the requirement of universal applicability mentioned in Section 2.2. This design is not sufficiently practical for malicious script analysis because few script languages support such a language feature.

Listing 1. Example of script-level monitoring implementation.

_2.3.2 System-level Monitoring._

_Design._ System-level monitoring inserts hooks into system APIs and/or system calls to monitor their invocation. It then analyzes scripts by executing the target script while observing the script engine process.
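The override in Listing 1 has a direct analogue in other dynamic languages whose builtins are writable. The following is a hedged Python sketch of the same idea (our illustration, not code from the paper): wrapping the builtin eval so that every call is logged before being delegated to the original.

```python
import builtins

# Keep a reference to the original implementation before overriding it.
_original_eval = builtins.eval

def monitored_eval(expr, *args, **kwargs):
    # Code for analysis: output the argument as a log, then delegate.
    print(f"[script-level hook] eval({expr!r})")
    return _original_eval(expr, *args, **kwargs)

# Overriding the builtin inserts the hook for every subsequent eval call.
builtins.eval = monitored_eval
```

As with the JavaScript version, this works only in languages that allow builtins to be overridden, which is the applicability limitation noted in the Problem paragraph above.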
_Problem._ System-level monitoring suffers from a semantic gap due to the distance between the hook points in the system and the target scripts. There are two specific problems caused by the semantic gap: the avalanche effect and semantic loss. The avalanche effect is a problem that makes an observation capture a large amount of noise, which occurs when one or more layers exist between an observation target and an observation point. Ralf et al. [23] referred to the avalanche effect caused by the existence of the COM layer, and we found that the script engine layer also causes the avalanche effect. The main concern with semantic loss is that it decreases the information useful for analysts. For example, a script API Document.Cookie.Set, which has the semantics of setting cookies in the script layer, loses some of its semantics in the system API layer because it is observed as a mere WriteFile. For these reasons, system-level monitoring does not fulfill the requirement of preservability of script semantics mentioned in Section 2.2.

Table 1. Summary of Requirements Fulfillment with Each Design

| Design | (1) Universal | (2) Semantics | (3) Binary |
|---|---|---|---|
| Script-level | ✗ | ✓ | ✓ |
| System-level | ✓ | ✗ | ✓ |
| Script engine-level | ✓ | ✓ | ✗ |
| Proposed | ✓ | ✓ | ✓ |

_2.3.3 Script Engine-level Monitoring._

_Design._ Script engine-level monitoring inserts hooks into specific functionalities in script engines. Because inserting hooks into script engines requires a deep understanding of their implementation, there are few ways to obtain such knowledge. One is analyzing script engines by reading their source code or reverse-engineering their binaries. Another is building an emulator to obtain a fully understood implementation of the target script engine. Unlike script-level monitoring, script engine-level monitoring is independent of language specifications. It also does not cause a semantic gap, unlike system-level monitoring.
_Problem._ The problem with this design is the difficulty of implementation. Although this design may be easily achieved if a script engine provides interfaces for analysis, such as the Antimalware Scan Interface (AMSI) [34], this is a limited example. In general, a developer of analysis tools with this design has to discover appropriate hook and tap points for inserting hooks into the target script engine binary. For open source script engines, we can find hook and tap points by analyzing the source code. However, only a limited number of script languages have script engines whose source code is available. In addition, even source code analysis requires a certain workload. Moreover, obtaining the hook and tap points of proprietary script engines requires reverse-engineering, and there is no automatic method for this. Manual analysis, in turn, requires skilled reverse-engineers and an unrealistic amount of human effort. Therefore, this design does not fulfill the requirement of binary applicability mentioned in Section 2.2.

### 2.4 Approach and Assumption

Table 1 summarizes how each design fulfills the requirements mentioned in Section 2.2. As mentioned in the previous section, neither script-level nor system-level monitoring can fulfill all the requirements. It is also, in principle, difficult for them to fulfill the requirements through improvement. The problem with the binary applicability of script engine-level monitoring would be solved if automatic reverse-engineering of script engines were enabled. Therefore, our approach is to automatically obtain the information required for hooking by analyzing script engine binaries, which makes our method applicable to binaries. When analyzing script engine binaries, we assume knowledge of the language specifications of the target script. This knowledge is used for writing the test scripts that are input to script engines during analysis. We do not assume knowledge of the internal implementation of the target script engines.
Therefore, no prior reverse-engineering of the target script engines is required.

### 2.5 Formal Problem Definition

A script engine binary B is modeled as a tuple (M, C), where M is a set of memory blocks associated with B and C is a set of code blocks that implement B. Let a, ... ∈ A be the script APIs of the observation targets, ia, ... ∈ IA ⊂ C be their corresponding implementations, and ra, ... ∈ RA ⊂ M be the arguments of the script APIs. The problem is then finding IA and RA from C, M, and A, which is, in general, difficult. Therefore, our goal is to provide a map f : M × C × A → IA × RA.

Fig. 2. Overview of our method.

### 3 METHOD

### 3.1 Overview

Figure 2 shows an overview of our method. The main purpose of our method is to automatically detect hook and tap points by analyzing script engine binaries. The method uses test scripts that are input to the target script engine and executed during dynamic analysis of the engine. These test scripts are manually written before using our method. As mentioned above, our method is composed of five steps: execution trace logging, hook point detection, tap point detection, hook and tap point verification, and script API tracer generation. The execution trace logging step first acquires execution traces by monitoring the script engine executing the test scripts. The hook point detection step extracts hook point candidates by applying our modified Smith-Waterman algorithm to the execution traces obtained in the previous step. After the hook point candidates are obtained, the tap point detection step extracts tap points and confirms the hook points.
The verification step tests the detected hook and tap points to avoid false positives in the script API trace logs. Using the obtained hook and tap points, the final step inserts hooks into the target script engine and outputs it as a script API tracer. We define hook and tap points as follows.

—A hook point is the entry of a local function that corresponds to the target script API in a script engine.
—A tap point is any argument of the local function at which the hook point is set.

These definitions are reasonable for well-designed script engines. It is normal for such engines to implement each script API in a corresponding local function for better cohesion and coupling. In such an implementation, the arguments of a script API call would ordinarily be passed via the arguments of the local function. Note that obfuscations, such as control-flow flattening and unreasonable function inlining, are unusual among our analysis targets since they are not malicious binaries.

In our method, we let the hook points that correspond to the target script APIs A be ha0, ha1, ... ∈ HA and the tap points be ta0,0, ta0,1, ... ∈ TA, where the indices of each element indicate the script API and the position of its argument. Therefore, f : M × C × A → IA × RA ⇒ f : M × C × A → HA × TA. Also, we let the set of test scripts be s0, ... ∈ S and the corresponding execution traces be es0, ... ∈ ES.

Fig. 3. Hook and tap points in generic design of script engine.

We locate the hook and tap points in a generic script engine for a better understanding of what our method analyzes. Figure 3 depicts the generic design of script engines and the hook and tap points in their virtual machine (VM). Recent script engines generally use a VM that executes bytecode for script interpretation.
The input script is translated into the bytecode through the analysis phase, which is responsible for lexical, syntactic, and semantic analysis, and the code generation phase, which is responsible for code optimization and generation. The VM executes the VM instructions in the bytecode, which are implemented as VM instruction handlers, by using a decoder and dispatcher. The script APIs, which are generally implemented as functions, are called by these instructions. The hook points are placed at the entries of the functions, and the tap points at the memory corresponding to the arguments of the hooked functions. Some studies [9, 18, 25] identified VM instruction handlers; however, to the best of our knowledge, no studies have been conducted regarding the identification of script APIs and their arguments.

### 3.2 Preliminary: Test Script Preparation

Test scripts used with our method have to fulfill the following four requirements.

(1) A test script executes the target script API with no error.
(2) A test script only has behavior relating to the target script API. It is, however, allowed to execute script APIs essential for executing the target script API. For example, if the target script API is Invoke (i.e., COM method invocation), CreateObject is essentially required.
(3) Two test scripts are required to analyze one target script API. One calls the target script API only once, and the other calls it N times. Note that N is a predefined parameter.
(4) The arguments of the target script API are arbitrarily defined as long as the script API is not skipped when it is executed multiple times. For example, executing CreateObject multiple times with the same argument may be skipped because copying the existing object instead of creating a new object is a better approach.

A test script works as a specifier of the target script API that our method analyzes. Therefore, it contains only the target script API.
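As a concrete illustration of these requirements, a pair of test scripts for the VBScript script API CreateObject might look as follows. This is our reconstruction based on the description in this section; the authors' versions are given in Listings 2 and 3.

```vbscript
' Test script 1: calls the target script API exactly once (requirement 3).
Set obj1 = CreateObject("WScript.Shell")

' Test script 2 (a separate file): calls the target script API three times,
' each with a different argument so that no call is skipped (requirement 4).
Set obj1 = CreateObject("WScript.Shell")
Set obj2 = CreateObject("MSXML.XMLHTTP")
Set obj3 = CreateObject("ADODB.Stream")
```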
For example, when one wants to analyze the local functions regarding the script API CreateObject and obtain the corresponding hook point, the test script only contains a call of CreateObject, as in Listing 2.

Listings 2 and 3 show an example of test scripts for the script API CreateObject in VBScript. As shown in the scripts, they fulfill the four requirements of test scripts. They call the target script API CreateObject with no error (requirement 1) and only have behavior relating to it (requirement 2). They are a pair of test scripts in which one calls the target script API only once and the other calls it three times (requirement 3). The different arguments WScript.Shell, MSXML.XMLHTTP, and ADODB.Stream are chosen for each call of the target script API so that the calls are not skipped even when the script API is called multiple times (requirement 4). These test scripts have to be manually prepared before the analysis. Writing test scripts requires knowledge of the language specifications of the target script language, which does not conflict with the assumption given in Section 2.4. The amount of human effort required for preparing test scripts is evaluated in Section 5.8. Since this preparation (manually) converts the target script APIs into the corresponding test scripts, it provides a map g : A → SA.

Listing 2. Example of test script for CreateObject in VBScript that calls it once.

Listing 3. Example of test script for CreateObject in VBScript that calls it three times.

### 3.3 Execution Trace Logging

This step acquires the execution traces that correspond to the test scripts for the target script APIs by executing and monitoring the script engine binary. Therefore, it provides a map h : M × C × SA → ESA.
An execution trace in our method consists of an API trace and a branch trace. The API trace contains the system APIs and their arguments called during the execution. This trace is acquired by inserting code that outputs logs via API hooks and executing the test scripts. The branch trace logs the types of the executed branch instructions and their source and destination addresses. This is achieved by instruction hooks, which insert code for log output at each branch instruction. This step logs only call, ret, and indirect jmp instructions because these types of branch instructions generally relate to script API calls.

### 3.4 Hook Point Detection

The hook point detection step uses a dynamic analysis technique called differential execution analysis. This technique first acquires multiple execution traces by changing their execution conditions and then analyzes their differences. The concept of this step is illustrated in Figure 4. It is assumed that an execution trace with one script API call differs from another with multiple calls only in the limited part of the trace related to the called script API. Since we use a branch trace in this step, its analysis granularity is code block-level. Therefore, this step is effective even for script APIs that do not call system APIs. For example, Eval in VBScript, which only interacts with the script engine, does not need to call system APIs. Script APIs regarding COM method invocation also do not call system APIs. Therefore, system-level monitoring, which uses system API calls as a clue, cannot observe the behavior of these script APIs. However, our method is effective even for these script APIs since this step is independent of system API calls.

Fig. 4. Concept of hook point detection by differential execution analysis.
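As a toy sketch of this differential idea (hypothetical traces of our own construction; real entries are branch records, and exact matching is replaced by the fuzzy alignment described below), a block that appears once in the one-call trace and N times in the N-call trace is a hook point candidate:

```python
def count_block(trace, block):
    """Count contiguous occurrences of a candidate block in a branch trace."""
    n = len(block)
    return sum(1 for i in range(len(trace) - n + 1) if trace[i:i + n] == block)

# Toy traces: S/E stand for engine startup/shutdown code, M for margin code
# between calls, and A, B, C for code blocks of the target script API.
trace_once = ["S", "A", "B", "C", "E"]
trace_thrice = ["S", "A", "B", "C", "M", "A", "B", "C", "M", "A", "B", "C", "E"]
candidate = ["A", "B", "C"]  # occurs once in the first trace, three times in the second
```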
This step uses multiple test scripts, i.e., one that calls the target script API once and the other(s) that call it multiple times, as described in Section 3.2. This step differentiates the execution traces acquired with these test scripts and finds the parts of the traces related to the target script API that appear in the difference. This differentiation is done by finding common subsequences with high similarity in multiple branch traces. Note that this common subsequence is defined as a subset of the branch traces that appears once in the trace of the test script that calls the target script API once and appears N times in the trace of the one that calls it N times. To extract these common subsequences, our method uses a modified version of the Smith-Waterman algorithm borrowed from bioinformatics. The Smith-Waterman algorithm performs local sequence alignment, which extracts a subsequence with high similarity from two or more sequences. However, it does not take into account the number of times a common subsequence appears; therefore, we modified it to take this into account.

We first explain the original Smith-Waterman algorithm and then introduce our modified version. The Smith-Waterman algorithm is a sequence alignment algorithm based on dynamic programming (DP) that can detect a subsequence of the highest similarity appearing in two or more sequences. This algorithm uses a table called a DP table. In a DP table, one sequence is located at the table head, the other is located at the table side, and each cell contains a match score. A match score F(i, j) of cell (i, j) is calculated based on Equation (1), where i is the index of rows and j is the index of columns:

F(i, j) = max{ 0, F(i−1, j−1) + s(i, j), F(i−1, j) + d, F(i, j−1) + d }, (1)

where

s(i, j) = 2 (match), −2 (unmatch), (2)

d = −1. (3)

Our modified algorithm is the same as the original up to filling all the cells of the DP table.
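For reference, the original algorithm's DP fill with the scores above (match +2, mismatch −2, gap penalty d = −1) can be sketched in Python; this is a textbook implementation for illustration, not the paper's code:

```python
def smith_waterman(seq1, seq2, match=2, mismatch=-2, gap=-1):
    """Fill the DP table of Equation (1); return the table, the maximum
    score, and the cell holding it (the backtracking start point)."""
    n, m = len(seq1), len(seq2)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s,  # diagonal: match/mismatch
                          F[i - 1][j] + gap,    # vertical: gap
                          F[i][j - 1] + gap)    # horizontal: gap
            if F[i][j] > best:
                best, best_cell = F[i][j], (i, j)
    return F, best, best_cell
```

On Figure 5-style sequences such as SABCE and SABCMABCMABCE, the maximum score is 8, reached after the four matches S, A, B, C; backtracking from that cell while scores remain positive recovers the aligned subsequence.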
We provide an example of a DP table in Figure 5 for further explanation. A sequence of A, B, and C in this figure indicates one of the gray boxes in Figure 4. The letter S indicates the white box that appears at the start of the execution trace, whereas E indicates the white box at the end. The letter M denotes the white boxes that appear between the gray boxes as margins. Although these elements actually consist of multiple lines of branch-trace logs, they are compressed as A, B, and so on, for simplification.

Fig. 5. Modified Smith-Waterman algorithm.

The original Smith-Waterman algorithm only finds the common subsequence of the highest similarity (S, A, B, and C with the dotted line in Figure 5) by backtracking from the cell with the maximum score (the cell with score 8 in Figure 5). After finding one such sequence, it exits the exploration. After this procedure, the modified Smith-Waterman algorithm performs further exploration. Algorithm 1 shows our modified Smith-Waterman algorithm. It repeatedly extracts subsequences of high similarity from the rows that are the same as those of the common subsequence extracted with the original algorithm (i.e., the dashed rounded rectangle in Figure 5). This is done by finding the local maximum value in those rows and backtracking from it. The modified algorithm repeats this procedure N times to extract N common subsequences (the three dotted circles in Figure 5). If the similarity among the subsequences exceeds a predefined threshold, the algorithm detects the branches constructing the subsequence as hook point candidates. Otherwise, it examines the cell with the next highest score.
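The DP fill of Equation (1) and the backtracking from the maximum-score cell can be sketched in Python as follows. This is a minimal illustration over the compressed blocks of Figure 5, not the authors' implementation; the modified multi-occurrence exploration and the similarity check are omitted:

```python
# Smith-Waterman local alignment over compressed branch-trace blocks.
# Scoring follows Equations (1)-(3): match +2, mismatch -2, gap d = -1.
MATCH, MISMATCH, GAP = 2, -2, -1

def fill_dp_table(seq1, seq2):
    """Fill the DP table F according to Equation (1)."""
    F = [[0] * (len(seq2) + 1) for _ in range(len(seq1) + 1)]
    for i in range(1, len(seq1) + 1):
        for j in range(1, len(seq2) + 1):
            s = MATCH if seq1[i - 1] == seq2[j - 1] else MISMATCH
            F[i][j] = max(0, F[i - 1][j - 1] + s,
                          F[i - 1][j] + GAP, F[i][j - 1] + GAP)
    return F

def backtrack(F, seq1, seq2, i, j):
    """Walk back from cell (i, j) until the score drops to zero."""
    subseq = []
    while i > 0 and j > 0 and F[i][j] > 0:
        s = MATCH if seq1[i - 1] == seq2[j - 1] else MISMATCH
        if F[i][j] == F[i - 1][j - 1] + s:      # diagonal: consume both
            subseq.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + GAP:      # gap in seq2
            i -= 1
        else:                                   # gap in seq1
            j -= 1
    return subseq[::-1]

# One call of the target script API vs. three calls (cf. Figures 4 and 5).
seq1 = ["S", "A", "B", "C", "E"]
seq2 = ["S", "A", "B", "C", "M", "A", "B", "C", "M", "A", "B", "C", "E"]
F = fill_dp_table(seq1, seq2)
best = max(max(row) for row in F)               # highest match score: 8
i, j = next((i, j) for i in range(len(F)) for j in range(len(F[0]))
            if F[i][j] == best)                 # first cell with that score
# Backtracking from it recovers the S, A, B, C subsequence of Figure 5.
```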
This step provides a map k : E_SA1 × E_SAN → HA, where SA1 ⊂ SA indicates the test scripts that call the target script API once and SAN indicates those that call it multiple times.

**ALGORITHM 1: Modified Smith-Waterman algorithm**
**Require:** seq1, seq2, N, threshold
**Ensure:** result_seqs
dptbl ⇐ DPTable(seq1, seq2).fillCell()
i ⇐ 1
**repeat**
  result_seqs ⇐ []
  max_cell ⇐ dptbl.searchNthMaxCell(i)
  max_seq ⇐ dptbl.backtrackFrom(max_cell)
  result_seqs.append(max_seq)
  rows ⇐ dptbl.getSameRows(max_seq)
  j ⇐ 1
  **for** n = 1 to N **do**
    **repeat**
      max_cell ⇐ dptbl.searchNthMaxCellInRows(j, rows)
      max_seq ⇐ dptbl.backtrackFrom(max_cell)
      j ⇐ j + 1
    **until** isNotSubseq(max_seq, result_seqs)
    result_seqs.append(max_seq)
  **end for**
  min_similarity ⇐ 1.0
  **for** seq1 ∈ result_seqs **do**
    **for** seq2 ∈ result_seqs **do**
      similarity ⇐ calcSimilarity(seq1, seq2)
      **if** similarity < min_similarity **then**
        min_similarity ⇐ similarity
      **end if**
    **end for**
  **end for**
  i ⇐ i + 1
**until** min_similarity > threshold

### 3.5 Tap Point Detection

The tap point detection step plays two important roles. The first is to select the final hook points from the hook point candidates obtained in the previous step. The second is to find the memory regions that should be dumped into logs. Such memory regions have two patterns: arguments and return values of script APIs. This step provides a map l : M × C × SA × HA → TA.

_3.5.1 Argument._ This step adopts a value-based approach that finds matched values between the test script and the memory region of the script engine process. If an argument value of the script APIs in the test scripts also appears in a specific memory region, the location of the memory region is identified as a tap point.
Tap point detection for arguments of script APIs is carried out by exploring the arguments of the local functions detected as hook point candidates. To do this, this step acquires the execution trace again with hooks inserted into the hook point candidates obtained in the previous step. The arguments of the hook point candidates are available by referring to the memory locations determined by the calling convention. Since the type information (e.g., integer, string, or structure) of each argument is not available, further exploration requires heuristics. Figure 6 illustrates the exploration heuristics used in this step. First, if an argument of a hook point candidate cannot be dereferenced as a pointer (i.e., the address is not mapped), this step regards it as a value of a primitive type. Otherwise, this step regards it as a pointer value and dereferences it. When an argument is regarded as a value, this step attempts to match it as each of the various known types, including known structures. In addition, this step also regards a pointer as possibly pointing to a structure with a predefined size and alignment in order to explore unknown user-defined structures.

Fig. 6. Concept of tap point detection.

As a result of this exploration, if the arguments in the test script are observed as the arguments at a hook point candidate, this step regards the candidate as legitimate and determines the memory region of the argument to be a tap point. This exploration can be improved if type information is available.
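The pointer-versus-value heuristic can be sketched as follows. Here `is_mapped` and `read_memory` are hypothetical process-inspection primitives, and the type list and structure size are illustrative assumptions, not the authors' implementation:

```python
# Hedged sketch of the argument-exploration heuristic (cf. Figure 6):
# each argument slot is first tried as a primitive value; if it looks
# like a mapped pointer, it is dereferenced and the pointed-to bytes
# are matched against the known argument values from the test script.
import struct

def interpret_as_known_types(raw: bytes):
    """Try a few known interpretations of a raw byte buffer."""
    views = {}
    if len(raw) >= 4:
        views["int32"] = struct.unpack_from("<i", raw)[0]
    # UTF-16LE approximates how VBScript/VBA (BSTR) store strings.
    try:
        views["utf16"] = raw.decode("utf-16-le").rstrip("\x00")
    except UnicodeDecodeError:
        pass
    return views

def explore_argument(slot: bytes, test_values, is_mapped, read_memory,
                     struct_size=0x40):
    """Return True if any interpretation matches a test-script value."""
    for v in interpret_as_known_types(slot).values():
        if v in test_values:
            return True                      # primitive value matched
    ptr = struct.unpack_from("<I", slot)[0] if len(slot) >= 4 else 0
    if is_mapped(ptr):
        # Treat the target as an unknown structure of bounded size and
        # scan its fields at aligned offsets.
        buf = read_memory(ptr, struct_size)
        for off in range(0, struct_size - 3, 4):
            for v in interpret_as_known_types(buf[off:]).values():
                if v in test_values:
                    return True              # dereferenced value matched
    return False
```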
Therefore, this step may explore the memory regions more precisely by applying research on reverse-engineering type information, such as Laika [10], Type Inference on Executables (TIE) [29], Howard [43], Reverse Engineering Work for Automatic Revelation of Data Structures (REWARDS) [31], and Argos [56], or on predicting type information, such as Debin [20] and TypeMiner [33].

_3.5.2 Return Value._ There are two problems with tap point detection for return values of script APIs. The first is that return values in test scripts tend to have low controllability. As mentioned in Section 3.5, tap point detection uses matching between the values in a test script and those in the script engine. If a value in a test script is hardly controllable (e.g., it will always be 0 or 1), matching it is more difficult than matching controllable values. The second problem is the gap between a script and the script engine: due to this gap, how a variable is managed may differ between them, making the return values in scripts and the actual values in script engines different. For example, an object created in the script engine by an object creation function may be returned to the script as an integer that indicates an index into an object management table.

We use value-based detection in a manner similar to tap point detection for arguments. The difference is the entry point of the exploration. Since return values of script APIs may be passed through both the return value and the output arguments of the corresponding function in the script engine, the proposed method begins its exploration from both. If the return value in the test script does not appear in the script engine, the proposed method tentatively regards the return value of the hook point function as that of the script API.

### 3.6 Hook and Tap Point Verification

After hook and tap point detection, verifying their effectiveness is an important step.
We define false positives (FPs) and false negatives (FNs) in the context of script API tracing with respect to hook and tap points as follows. FPs are log lines of script APIs that are NOT actually called by the target script. FNs are script APIs that are actually called by the target script but missing from the log lines.

Figure 7 shows an example case that produces an FP. In this figure, the hook and tap points for script API A are set at the function dispatch and its argument, which are actually shared between script API A and script API B. The hook with these points can log script API A calls; however, a call of script API B is also logged incorrectly as if script API A were called. Therefore, this hook is inappropriate since it produces FPs.

Fig. 7. False positive case.

This problem is caused by the fact that the proposition "hook and tap points are appropriate → a correct script API log for a test script is available" is true, whereas its converse is not. For many hook and tap points and test scripts, the converse also holds; however, a counterexample, as shown in Figure 7, exists. Since our method implicitly depends on the converse, this is a pitfall that causes FPs on rare occasions. To avoid this, this step verifies the hook and tap points selected for a script API and reselects others from the candidates if FPs are produced during verification. The verification uses multiple scripts, called verification scripts, that call the target script API. The only requirement on these scripts is that they contain a call of the target script API whose arguments are comprehensible.
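The shared-dispatch pitfall of Figure 7 can be modeled with a toy example (the names are hypothetical, not engine code):

```python
# Toy model of the false-positive case in Figure 7: a hook labeled for
# script API A is placed on a dispatch routine that A and B share, so
# every call through the routine is logged as a call of script API A.
log = []

def dispatch(api_name):
    log.append("script API A")   # hook attributes every call to A
    # ... the engine would dispatch to the handler of api_name here ...

dispatch("A")   # genuine call of script API A: correct log line
dispatch("B")   # call of script API B: logged as A, i.e., an FP
```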
Therefore, since verification scripts do not have to fulfill complex requirements like test scripts do, they can be collected automatically from websites such as the official documents of the target script language and software development platforms like GitHub [17]. Note that since the verification depends on the collected verification scripts, it reduces FPs on a best-effort basis. This step first extracts the script API calls and their arguments from the collected scripts. Since benign scripts collected from the Internet are generally not obfuscated, this extraction is done with no difficulty by static analysis. This step then executes the scripts with the generated script API tracer to obtain analysis logs. If a difference is observed between the script API calls extracted from the verification scripts and those in the analysis logs, the verification fails and other hook and tap point candidates are reselected. Through this step, our method can empirically select the hook and tap points that produce fewer FPs.

### 3.7 Script API Tracer Generation

We use the hook and tap points obtained in the above steps to append script API tracing capability to the target script engines. By using the maps h, g, k, and l provided in the above sections, together with the inputs of our method B (i.e., (M, C)) and A, the method can construct the map of the goal f : M × C × SA → HA × TA. Therefore, this step can use the hook and tap points that correspond to the target script APIs obtained in the above steps. Our method hooks the local functions that correspond to the hook points and inserts analysis code. Note that a hook point indicates the entry of a local function that is related to a script API, as mentioned in Section 3.1. The analysis code dumps the memory at the tap points, with the appropriate type, into the analysis log. This code insertion is achieved using generic binary instrumentation techniques.
Although the execution trace logging step uses instruction-level hooking, the script API tracer generation step generates script API tracers using function-level hooking. The former requires instruction-level hooking to exhaustively capture all branches executed in the script engine binaries. However, as the definitions of hook and tap points in Section 3.1 indicate, they are located at a function entry and its arguments; the latter step is therefore done with function-level hooking only.

### 4 IMPLEMENTATION

To evaluate our method, we implemented it in a prototype system called STAGER, a script analyzer generator based on engine reversing. STAGER uses Intel Pin [32] to insert instruction-level hooks into the target script engine for acquiring execution traces. Intel Pin is a dynamic binary instrumentation framework that uses dynamic binary translation with a VM.

STAGER enumerates symbols of the system libraries in the target script engine process and inserts hooks into them for obtaining called system APIs and their arguments. It also hooks an instruction ins executed in the target script engine process when one of the following conditions is true.

— INS_IsIndirectBranchOrCall(ins) && INS_IsBranch(ins)
— INS_IsCall(ins)
— INS_IsRet(ins)

As mentioned in Section 3, our method hooks the detected hook and tap points with function-level hooking. Although Intel Pin also provides a function-level hooking feature with dynamic binary translation, it generally has heavier overhead than inline hooking. Therefore, STAGER uses Detours [39], which provides inline hooking, for generating script API tracers. Detours is a dynamic binary instrumentation framework that enables inline hooking of functions.
Although its main target is hooking Windows APIs, it is also applicable to hooking local functions with known addresses and arguments. Our script API tracer is implemented as a dynamic link library (DLL), which is preloaded into the process of the target script engine. It reads a configuration file in which the hook and tap points are written and inserts inline hooks for them into the script engine with Detours. It is universally applicable to various script engines by using the corresponding configuration files. Since STAGER automatically detects the hook and tap points and outputs them to the configuration file, a script API tracer is easily generated for any script engine that STAGER has analyzed.

### 5 EVALUATION

We conducted experiments on STAGER to answer the following research questions (RQs).

— **RQ1: What is the accuracy of hook and tap point detection with STAGER?**
— **RQ2: How much performance overhead does STAGER introduce to generate a script API tracer?**
— **RQ3: Is the STAGER-generated tracer applicable to malicious scripts in the wild?**
— **RQ4: How many FPs and FNs does the script API tracer generated with STAGER (STAGER-generated tracer) produce?**
— **RQ5: How well does the STAGER-generated tracer work compared with existing analysis tools?**
— **RQ6: How much overhead do the STAGER-generated tracers produce?**
— **RQ7: How much human effort is required to prepare test scripts?**

### 5.1 Experimental Setup

Table 2 summarizes the experimental setup. We set up this environment as a VM with one virtual Central Processing Unit (CPU) assigned to it.

Although STAGER is more beneficial for proprietary script engines, we applied it to both open source and proprietary script engines. Open source engines are used because we can easily confirm the correctness of the

Table 2.
Experimental Environment

|OS|Windows 7 32-bit|
|---|---|
|CPU|Intel Core i7-6600U CPU @ 2.60GHz|
|RAM|2GB|
|VBA|VBE7.dll (Version 7.1.10.48)|
|VBScript|vbscript.dll (Version 5.8.9600.18698)|
|VBScript|vbscript.dll (ReactOS 0.4.9)|
|PowerShell|PowerShell 6.0.3|

hook and tap points. Note that the source code is used only for confirming the results; STAGER did not use it for its analysis. Therefore, the analysis with STAGER is done in the same manner as for proprietary script engines. In addition, proprietary engines are used to confirm the effectiveness of STAGER for real-world proprietary engines.

For the open source script engines, we used VBScript implemented in the ReactOS project [38] and PowerShell Core [47], an open source version of the PowerShell implementation. We selected these script engines because both are open source implementations of proprietary script engines and their languages are frequently used by attackers for writing malicious scripts. For VBScript of ReactOS, we extracted vbscript.dll from ReactOS and transplanted it into the Windows installation of the experimental VM environment because Intel Pin, used by STAGER, does not work properly on ReactOS. For the proprietary script engines, we used Microsoft VBScript and VBA implemented in Microsoft Office. These script engines were also selected because they are widely used by attackers. When we analyze the script engine of VBA, we first execute Microsoft Office and observe its process during the execution of the attached script.

### 5.2 Detection Accuracy

To answer RQ1, we evaluated the accuracy of the hook and tap point detection steps. We detected hook and tap points of VBA, VBScript, and PowerShell using STAGER. We selected script APIs that are widely used by malicious scripts as the targets of hook and tap point detection. VBA and VBScript were designed to use COM objects for interacting with the OS, instead of interacting with it directly.
Therefore, malicious scripts using VBA and VBScript use script APIs related to COM object handling. In addition, VBA has other useful script APIs, and VBScript has reflection script APIs such as Eval and Execute, which are used for obfuscation. PowerShell has script APIs called Cmdlets that provide various functionalities including OS interaction. We selected Cmdlets for object creation, file operation, process execution, internet access, reflection, and so on, which are often used by malicious scripts.

We set 0.8 as the threshold of the similarity of subsequences used for differential execution analysis-based hook point detection. This threshold was defined on the basis of manual analysis of the DP tables in a preliminary experiment conducted separately from this one. Because the DP tables had a similar pattern, we found this threshold could be used globally.

Table 3 shows the results of the experiments. The Original Points column shows the number of branches obtained from the branch traces. The Hook Point Candidates column shows the number of hook point candidates filtered by hook point detection. The Hook and Tap Point Detection column has ✓ if the final hook and tap points were obtained. The Log Availability column has ✓ if the obtained hook and tap points output the correct log corresponding to the known scripts.

For VBA and VBScript, STAGER accurately detected all hook and tap points that can output logs showing the script APIs and their arguments. Despite the large number of obtained branches, STAGER precisely filtered out the branches that are irrelevant to the target script APIs. This showed that STAGER is applicable to real-world proprietary script engines to generate the corresponding script API tracers.
Table 3. Result of Hook and Tap Point Detection

|Script|Script API|Original Points|Hook Point Candidates|Hook and Tap Point Detection|Log Availability|
|---|---|---|---|---|---|
|VBA|CreateObject|93,000,090|53|✓|✓|
|VBA|Invoke (COM)|101,993,701|98|✓|✓|
|VBA|Declare|94,281,492|34|✓|✓|
|VBA|Open|85,641,170|42|✓|✓|
|VBA|Print|90,024,821|29|✓|✓|
|VBScript|CreateObject|390,836|48|✓|✓|
|VBScript|Invoke (COM)|1,148,225|92|✓|✓|
|VBScript|Eval|369,070|121|✓|✓|
|VBScript|Execute|371,040|134|✓|✓|
|VBScript (ReactOS)|CreateObject|89,213|32|✓|✓|
|VBScript (ReactOS)|Invoke (COM)|128,511|43|✓|✓|
|VBScript (ReactOS)|Eval|-|-|Not applicable|Not applicable|
|VBScript (ReactOS)|Execute|-|-|Not applicable|Not applicable|
|PowerShell|New-Object|210,852|54|✓|✓|
|PowerShell|Import-Module|185,192|48|✓|✓|
|PowerShell|New-Item (File)|198,327|93|✓|✓|
|PowerShell|Set-Content (File)|200,822|54|✓|✓|
|PowerShell|Start-Process|152,841|119|✓|✓|
|PowerShell|Invoke-WebRequest|315,380|98|✓|✓|
|PowerShell|Invoke-Expression|271,054|82|✓|✓|

STAGER could also detect CreateObject and Invoke on VBScript of ReactOS. However, it was not applicable for detecting the hook points of Eval and Execute because the VBScript in ReactOS provides only mocks of them with no actual implementation. We checked the source code to confirm the locations of the detected hook points. The hook for CreateObject was inserted into the local function create_object, which indeed creates objects, and the hook for Invoke into the local function disp_call, which is responsible for invoking the IDispatch::Invoke COM interface.

As shown in Table 3, STAGER also detected proper hook and tap points for PowerShell. A notable difference between the script engine of PowerShell and the others is the existence of an additional layer: a common language infrastructure (CLI). PowerShell uses the CLI of the Microsoft .NET Framework, an additional layer between the OS and the script engine.
Since STAGER properly found the hook and tap points of PowerShell with bytecode analysis, we confirmed that it works even for script engines with an additional layer such as a CLI. Overall, STAGER properly detected all hook and tap points in the VBA, VBScript, VBScript (ReactOS), and PowerShell script engines except Eval and Execute of VBScript (ReactOS), which are not implemented.

### 5.3 Performance

To answer RQ2, we evaluated the performance of STAGER by measuring the execution duration of each of its steps. Figure 8 shows the results. Note that the execution time in this figure does not include the time for preparing test scripts because they are created manually before execution.

Fig. 8. Execution duration of our method.

Execution trace logging and tap point detection each required about 10 seconds due to the overhead of execution and log output with Intel Pin. Backtrace performed only a small exploration of the execution trace and therefore took little time. On the other hand, differential execution analysis took about 5 seconds.
The computational complexity of the Smith-Waterman algorithm is O(MN), where M and N are the lengths of the two sequences. Thus, the longer the execution trace becomes, the longer the execution duration will be. Overall, hook and tap point detection for one script API took about 30 seconds. The total number of script APIs in a script language, for example in VBScript, is less than one hundred according to the language specification, and the script APIs of interest for malicious script analysis are limited. Therefore, the proposed method can quickly analyze script engines and generate a script API tracer, which is sufficient for practical use.

### 5.4 Analysis of Real-world Malicious Scripts

To answer RQ3, we applied the script API tracers generated by STAGER to analyzing malicious scripts in the wild. We collected 205 samples of malicious scripts that were uploaded to VirusTotal [1] between January and July 2017. We then analyzed them using the script API tracers. We found that the script API tracers could properly extract the script APIs and their arguments called by the malicious scripts. We investigated the URLs obtained as arguments of script APIs; all were identified as malicious (positives > 1). We also investigated the file streams in the script API arguments, which turned out to be malware such as Dridex. We thereby confirmed that the script API tracers generated by STAGER are applicable to real-world malicious scripts.

We selected four samples and their analysis logs as case studies. The first is a VBA injector, the second is a VBScript downloader, the third is a PowerShell fileless malware module, and the last is an evasive malicious script.

_5.4.1 Case Study 1: VBA Injector._ Figure 9 shows the analysis log of a VBA injector generated by a script API tracer. This malicious script uses the Declare statement, which loads a library and resolves a procedure in it, to call Windows APIs.
It first creates a process of rundll32.exe in a suspended state. It then allocates 0x31c bytes of memory with write and execute permission and writes code of that size to the memory byte by byte. Finally, a remote thread that executes the written code in the process is created. As shown in the figure, the script API tracer generated a log that contains only the APIs called from the input script through the Declare statement, whereas the system API tracer in Figure 1 generated one containing APIs called from both the input script and the script engine. This can significantly help analysts comprehend the behavior of malicious scripts.

Fig. 9. Analysis log of VBA injector acquired with STAGER-generated script API tracer.

_5.4.2 Case Study 2: VBScript Downloader._ Figure 10 shows the analysis log of a VBScript downloader generated by a script API tracer. Although this malicious script has 1,500+ lines of obfuscated code, the log consists of only 16 lines, which are responsible for the main behavior of downloading. Section (1) in the figure shows the part of the log in which the malicious script accessed a URL. Section (2) shows that the script saved the HTTP response to a specific file in the Temp folder. The saved buffer is also visible as a byte array of 0x3c 0x68 .... Section (3) shows that the saved file was executed through cmd.exe. As shown in this figure, the script API tracers generated by STAGER successfully extracted important indicators of compromise (IOCs) such as URLs, binaries, file paths, and executed commands. Note that the log fulfills the requirement of the preservability of semantics mentioned in Section 2.2.

Fig. 10. Analysis log of VBScript downloader acquired with STAGER-generated script API tracer.
_5.4.3 Case Study 3: PowerShell Fileless Malware._ Figure 11 shows an excerpt of the analysis log of a module used by PowerShell fileless malware. This module seems to retrieve additional PowerShell modules from the C&C server and execute them. Section (1) in the figure shows the spawn of a new PowerShell process with commands used for Web access; we can see the executed command in deobfuscated form. Section (2) shows the simple downloading of the additional code using a system Web proxy. Section (3) shows the execution of the retrieved additional PowerShell code with the reflection function Invoke-Expression. As in Case Study 1, we can understand what code is dynamically evaluated by reflection functions. This will help malware analysts understand the behavior of malicious scripts.

Fig. 11. Analysis log of PowerShell fileless malware acquired with STAGER-generated script API tracer.

_5.4.4 Case Study 4: Evasive Malicious Script._ Although STAGER-generated script API tracers have no anti-evasion feature, they can still help analysts understand the root cause of evasion. To demonstrate this, we chose an evasive malicious script obtained from VirusTotal and analyzed it with a STAGER-generated script API tracer. Figure 12 shows the analysis log of the evasive sample in VBA. Due to the evasion, the only behavior captured by the tracer was sending a ping to a host and obtaining its status code through winmgmts, i.e., Windows Management Instrumentation (WMI). However, the analyst can still obtain a clue that the status code may be relevant to the evasion mechanism. As Yokoyama et al. [54] suggested, evasive malware (including malicious scripts) has to obtain information about the execution environment (in this case, the status code) to determine whether to run or evade.

Fig. 12. Analysis log of evasive malicious script acquired with STAGER-generated script API tracer.
For malicious scripts, obtaining such information generally requires script API invocation. Therefore, the tracer can help analysts reveal the evasive mechanisms of malicious scripts.

### 5.5 False Positives and False Negatives

To answer RQ4, we tested the number of FPs and FNs produced by the hook and tap points of the STAGER-generated script API tracers by analyzing known malicious scripts. This evaluation is necessarily partial, as exhaustively evaluating the number of FPs and FNs is difficult. As defined in Section 3.6, FPs indicate log lines of script APIs that are NOT actually called by the target script, and FNs indicate script APIs that are actually called by the target script but missing from the log lines. The script API tracers used for this experiment have tracing capability for the script APIs shown in Table 3. We used five samples whose called script APIs are known from manual analysis. The results indicated that the hook and tap points produced neither FPs nor FNs.

Table 4. Comparison with Existing Tracers

|Tracer|Observed behaviors|Log lines|Failure rate|
|---|---|---|---|
|API Monitor|0.25|10,000+|0|
|ViperMonkey|0.8|16|0.6|
|STAGER-generated|1|20|0|

### 5.6 Comparison with Existing Tracer

To answer RQ5, we compared STAGER-generated script API tracers with two existing tracers: API Monitor [5] and ViperMonkey [28]. API Monitor is a system API tracer based on system-level monitoring. We enabled all system API hooks of API Monitor and made it observe the target script engine process. ViperMonkey is a script API tracer for VBA based on script-level monitoring using a VBA emulator.
To evaluate them under the same conditions, we gathered malicious VBA scripts, since ViperMonkey is a tracer of the script APIs of VBA, and generated a script API tracer for VBA with STAGER (STAGER-generated tracer). We randomly chose five samples from the dataset and manually analyzed them to create the ground truth. The evaluation was conducted from three viewpoints: amount of properly observed behavior, average number of log lines, and analysis failure rate. Table 4 shows the results of the experiment. Note that the results in the observed behaviors and log lines columns of ViperMonkey were calculated only with the samples that were analyzed successfully.

API Monitor could observe only a small amount of behavior because some behavior, such as COM method invocation and reflection, cannot be directly observed through system APIs. In addition, it produced a large number of log lines that are irrelevant to the behavior of the samples because it cannot focus only on their behavior. The log lines include the behavior derived from the script engine as well as that derived from the samples. In other words, the avalanche effect mentioned in Section 2.3.2 occurred. ViperMonkey failed to analyze three samples due to the insufficient implementation of its VBA emulator; when it failed to parse a sample, it terminated execution with an error. ViperMonkey also missed some behavior because of its lack of hooked script APIs. The STAGER-generated tracer did not fail to analyze any sample. This is because it uses the real script engine of VBA, and its instrumentation does not ruin the functionality of the engine. It could observe the entire behavior with few lines of logs that properly focused on the script APIs of the samples.

### 5.7 Performance of Generated Script API Tracer

To answer RQ6, we evaluated the performance of the STAGER-generated script API tracers. We measured the execution duration of the script API tracers while analyzing the test and malicious scripts.
In addition, we measured that of vanilla script engines for comparison. We measured the execution duration from the process start of the script engine until its end. Since VBA malicious scripts do not terminate the process even after script execution, we inserted code that explicitly exits the process. Figure 13 shows the results of these measurements. The analysis with the STAGER-generated script API tracers took 1.51, 0.62, and 1.27 seconds per file (sec/file) on average for VBA, VBScript, and PowerShell malicious scripts, respectively. Overall, the analysis takes about 1.2 sec/file on average. Therefore, the STAGER-generated script API tracers can analyze about 72,000 files per day per VM instance. Note that the time required for reverting the VM was not taken into account. The STAGER-generated script API tracers have only about 10% overhead compared with the vanilla script engines. This result is natural because the STAGER-generated script API tracers require additional time only when the script APIs are called, which incurs only a small overhead of memory and file input/output (I/O) operations for logging. This shows that the STAGER-generated script API tracers can execute malicious scripts almost as quickly as vanilla script engines, which in turn indicates that they are quick dynamic analysis tools.

Fig. 13. Execution duration of STAGER-generated script API tracers and vanilla script engines.

Table 5. Lines of Code (LOC) of Test Scripts

|Script languages|Average LOC|
|---|---|
|VBA|3.8|
|VBScript|2.75|
|PowerShell|2|

### 5.8 Human Effort

To answer RQ7, we conducted an experiment to evaluate the amount of human effort required to prepare test scripts.
We evaluated this from two perspectives: lines of code (LOC) of the test scripts and the time required to create them. We gathered 10 people (eight graduate students, one technical staff member, and one visiting researcher) belonging to the computer science department as the participants of this experiment. We then explained the concept and requirements of the test scripts described in Section 3.2 to them. We asked them to write valid test scripts while measuring the required time. The list of script APIs to be written in the test scripts was provided to them in advance. The list, which is composed of script APIs frequently used by malicious scripts, is identical to the one used for the evaluation of the detection accuracy in Section 5.2. Many participants did not have experience writing in the script languages of VBA, VBScript, and PowerShell. Therefore, we asked them to spend some time learning the language specifications since we assume that test script writers have knowledge of the target language. Note that we confirmed with STAGER that all the created test scripts discussed below are valid. Table 5 shows the average LOC of the created test scripts for each language. The LOC of the test scripts for each language are within the range of 2 to 3.8. This indicates that the test scripts our method uses are quite simple. Figure 14 shows the average time required for creating test scripts for each language. The average required time per script API was 36.6 seconds for VBScript, 42.6 seconds for VBA, and 42.6 seconds for PowerShell. The average time for all languages was about 59.5 seconds. These results indicate that writing valid test scripts takes little time for programmers who have knowledge of the target script language. Therefore, the amount of human
effort required for using STAGER is much less than that of manual reverse-engineering of script engines since manual reverse-engineering requires weeks or months of analysis time.

Fig. 14. Required time for test script preparation.

### 6 DISCUSSION

### 6.1 Limitations

We discuss four cases in which our method cannot detect hook and tap points. The first is the case in which the target script API does not have arguments to which we can set arbitrary values. Since tap point detection uses argument matching, which is based on setting unique argument values, this detection fails in principle if such matching is not available. The second is the case in which the target script API contains only a small amount of program code. In this case, hook point detection by differential execution analysis might not be applicable because the difference is not well observed. However, since such simple script APIs can hardly achieve significant functionality, they would not be interesting targets for malware analysts. The third is the case in which the script engine is heavily obfuscated for software protection. For example, when the control flow graph is flattened to implement the script engine with one function, the proposed method cannot accurately detect hook points. Nevertheless, such obfuscated implementations are rarely seen in recent script engines, to the best of our knowledge. The last is the case of script APIs that produce false positives and are rarely used in real-world scripts. As described in Section 3.6, verification scripts are required to reduce the false positives. However, if the script APIs are rarely used, collecting the verification scripts from the Internet is difficult.
Since the verification is performed on a best-effort basis and depends on the collected verification scripts, such script APIs remain a limitation of our method.

### 6.2 Just-In-Time Compilation

Many existing script engines have Just-In-Time (JIT) compilation functionality that translates repeatedly executed bytecode into native code to accelerate its execution. We investigated the JIT compilation mechanisms of existing script engines to understand how JIT compilation affects the hook and tap points of script APIs. The mechanisms indicate that the existence of script API inlining is key. We thus discuss both patterns below: JIT compilation with and without inlining of script APIs. Figure 15 shows a generic mechanism of JIT compilation without inlining. As shown in the figure, this mechanism translates only the bytecode of VM instructions into native code. In this case, the native code continues to call the script APIs implemented in the script engine. Therefore, the script API hooks properly work without changing the hook and tap points even after JIT compilation.

Fig. 15. Generic mechanism of JIT compilation without inlining.

Fig. 16. Generic mechanism of JIT compilation with inlining.

Figure 16 shows a generic mechanism of JIT compilation with inlining. This mechanism inlines the called script APIs into the native code generated by JIT compilation. During JIT compilation, the code implementing the called script APIs is copied into the native code. When the inlined script APIs are executed, the script APIs in the script engine, at which the script API hooks are set, are not called. Therefore, hooking the hook and tap points generated with our method cannot acquire script API trace logs. This problem can be solved by a slight modification that tracks the copying of script APIs and propagates the corresponding script API hooks.
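The hook propagation just described can be sketched as follows. This is a minimal illustration under assumed data structures (a map of hooked functions and a list of observed JIT copy events), not the actual implementation of the generated tracers:

```python
def propagate_hooks(hooks, copy_events):
    """Propagate script API hooks into JIT-inlined copies of their code.

    hooks: {function_start_address: hook_offset_within_function}
    copy_events: (src, dst, size) tuples observed when the JIT compiler
    copies a script API body into freshly generated native code.
    Returns the hook addresses to set inside the JIT-generated code.
    """
    new_hooks = []
    for src, dst, size in copy_events:
        for func_start, offset in hooks.items():
            hooked_addr = func_start + offset
            # If the copied region covers the hooked instruction, place the
            # same hook at the corresponding offset within the copy.
            if src <= hooked_addr < src + size:
                new_hooks.append(dst + (hooked_addr - src))
    return new_hooks
```

In a real tracer, the copy events would come from instrumenting the JIT compiler's code-emission routine, for example with a dynamic binary instrumentation framework such as Intel Pin.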
Overall, our method is not affected by JIT compilation, or even if it is affected, we can handle it with a slight modification to the implementation of the generated script API tracers. Therefore, JIT compilation is not a limitation of our method.

### 6.3 Human-assisted Analysis

Although our method introduces automatic detection of hook and tap points, its analysis can also benefit from human assistance. In particular, human-assisted analysis is beneficial in cases in which tap point detection does not work in principle. For example, human assistance can eliminate the first limitation discussed in the previous section. Our method identifies tap points by matching values in test scripts with function arguments in script engines without taking into account any semantics of the values. However, manual analysis can take the semantics of values into account. Therefore, it is possible to discover tap points using such semantic information even when value matching is not available. In addition, since manual analysis by humans can provide better type information of variables by analyzing how the variables are used, the exploration for tap point detection becomes more accurate with human assistance. Note that the burden of manual analysis with our method is much less than that of complete manual analysis. This is because the number of functions that should be analyzed becomes much smaller with hook point detection, as described in Table 3. Without hook point detection, a reverse-engineer has to analyze thousands of functions to obtain tap points, whereas only tens of functions need to be analyzed when hook point detection is performed.

### 7 RELATED WORK

### 7.1 Script Analysis Tools

There is a large amount of research on constructing script analysis tools.
There are multiple script analysis tools that adopt script-level monitoring. The tool jAEk [36] hooks JavaScript APIs by overriding built-in functions. It inserts hooks on the open/send methods of XMLHttpRequest objects and methods of HTMLElement.prototype to obtain URLs accessed by Ajax communication. Practical script analysis tools such as Revelo [24], box-js [7], jsunpack-n [19], and JSDetox [46] also use script-level monitoring. These tools offer strong script behavior analysis capability for JavaScript. However, they do not fulfill the requirements mentioned in Section 2.2 because they deeply depend on the language specifications of JavaScript. There are also script analysis tools based on script engine-level monitoring. Sulo [21, 22] is an instrumentation framework for ActionScript of Adobe Flash using Intel Pin. It is based on the analysis of the source code of the ActionScript Virtual Machine (AVM). JSand [2] hooks built-in methods of JavaScript by implementing a specific emulator. FlashDetect [49] modifies an open source script engine of Flash for its hooks. These are examples of script engine-level monitoring. ViperMonkey [28] is an emulator of VBA that can output logs of notable script APIs. For system-level monitoring, many binary analysis tools that can hook system APIs and/or system calls, such as API Chaser [26], Alkanet [35], Ether [12], Nitro [37], CXPInspector [51], IntroLib [11], and Drakvuf [30], are available. However, none of these tools can fulfill the requirements introduced in Section 2.2.

### 7.2 Script Engine Enhancement

Chef [4, 6] is a symbolic execution engine for script languages. It uses a real script engine for building a symbolic execution engine. It achieves symbolic execution of the target scripts by symbolically executing the script engine binaries with a specific path exploration strategy.
The design is similar to that of STAGER in that it reuses the target script engine for building a script analysis tool by instrumentation. On the other hand, the approach and goal of Chef are different from those of STAGER. Its approach is based on manual source code analysis, whereas ours is based on binary analysis. In addition, the goal of Chef is building symbolic execution engines, whereas ours is building script API tracers.

### 7.3 Virtual Machine Introspection

Several techniques have been developed for mitigating the semantic gap between guest OSes and the VM monitor (VMM). Their goal is to observe the behavior within the guest OSes through the VM by this mitigation, which is called VM introspection (VMI). Virtuoso [14] automatically creates VM introspection tools that can produce, from outside the VM, the same results as a reference tool executed in the VM. Virtuoso first acquires execution traces by executing the reference tool in the VM. This step is referred to as training. It then extracts the program slice required for creating the tool. This method is similar to ours in that it extracts the required information by analyzing previously acquired execution traces. It differs from ours in its application target as well as the algorithm it uses to extract information from execution traces. VM-Space Traveler (VMST) [16] is a system that can automatically bridge the semantic gap for generating VMI tools. It achieved fully automated VMI tool generation, whereas Virtuoso, one of the state-of-the-art studies at that time, was not fully automated. Its key idea is to redirect the code and data executed on the introspection-target machine to another machine prepared for VMI to obtain the execution results. This idea depends on the key insight that the executed code for the same program is usually identical even across different machines.
To do this, VMST identifies the context of system call execution and the data redirectable to the machine for VMI. Tappan Zee (North) Bridge [13], or TZB, discovers tap points effective for VM introspection. It monitors the memory accesses of software inside a VM with various inputs for learning. It then finds tap points by identifying the memory locations where the input values appear. It is used to monitor the tap points in real time from outside the VM to achieve effective VM introspection. Hybrid-Bridge [41] is a system that uses decoupled execution and training memoization for efficient redirection-based VMI. Decoupled execution is a technique to decouple heavy-weight online analysis that uses software-based virtualization from light-weight hardware-based virtualization. It uses two execution components: Slow-Bridge and Fast-Bridge. Slow-Bridge extracts metadata using online data redirection like VMST on a VM with heavy-weight software-based virtualization for training and memoizes the trained metadata (called training memoization). Fast-Bridge uses the metadata for VMI on a VM with light-weight hardware-based virtualization. Only when the metadata is incomplete does the execution on Fast-Bridge fall back to Slow-Bridge. AutoTap [55] automatically discovers tap points inside an OS kernel for monitoring various types of accesses to kernel objects, such as creation, read, write, and deletion. It dynamically tracks kernel objects and their propagation starting from their creation while resolving the execution context, the types of the arguments, and the access types. It then dumps this metadata into a log file. After the tracking, it analyzes the log file to discover the tap points of interest for introspection. Overall, the goal of the studies above, mitigating the semantic gap around the VM, is similar to ours.
In addition, the approaches of some of these studies to find tap points are similar to ours; however, their targets (i.e., OS kernels and VMMs) and algorithms are different from ours.

### 7.4 Reverse Engineering of Virtual Machines

Since our method analyzes the VMs of script engines for obtaining hook and tap points, we present existing research regarding reverse engineering of VMs. Although no VM analysis study has targeted script engine VMs, studies have been conducted regarding software protection and malware analysis. Sharif et al. [42] proposed a method of automatically reverse engineering VMs used by malware for obfuscation. They used data flow analysis to identify bytecode syntax and semantics as well as the fundamental characteristics of VMs. Since the script engines our method analyzes are generally based on such VMs, their goal of automatically analyzing the VMs is similar to ours. However, their analysis target is different from ours. Their method identifies information about VMs and bytecode, whereas our method detects the local functions that correspond to script APIs.

Rolles [40] provided a method of circumventing the virtualization-obfuscation used by malware with a running example of the common software protector VMProtect [45]. The method generates optimized x86 code that is equivalent to the bytecode by reverse-engineering the VM, producing a disassembler for VM instructions, and optimizing with an intermediate representation (IR). This study showed that protection by virtualization-obfuscation can be evaded by such analysis. However, it implicitly assumed manual analysis, and its automation was not considered in that article. Coogan et al. [9] proposed an approach to identify the bytecode instructions responsible for invoking system calls.
Since system calls are strongly relevant to malware behavior, their goal was to approximate the behavior by the set of identified bytecode instructions involved in the invocation of the system calls. Their goal, focus, and approach differed from ours mainly in the following three points. First, their goal was approximating the behavior of malware obfuscated by VMs, whereas ours is mitigating semantic gaps between script APIs and system APIs or system calls. Second, their focus was only on the bytecode instructions relevant to the invocation of system calls, whereas ours is on all script APIs regardless of the existence of system calls. Finally, their approach strongly relied on the invoked system calls and arguments, whereas ours relies only on the branch instructions logged with test scripts. Kinder et al. extended static analysis to make it applicable to programs protected by virtualization-obfuscation. Conventional static analysis performs abstract interpretation whose states are location-sensitive (i.e., sensitive only to the program counter (PC)). Their method, called VPC-sensitive static analysis, makes the states sensitive to both the PC and the virtual program counter (VPC) and enables proper analysis of VMs, whereas the conventional analysis suffers from over-approximation of states. Although their static analysis method is different from our dynamic analysis one, applying it in combination with ours might be beneficial. VMAttack [25] deobfuscates virtualization-obfuscated binaries based on automated static and dynamic analysis. Its goal is to simplify the execution traces acquired from the target binaries. It first locates VM instruction handlers by dynamic program slicing and clustering; it then maps bytecode instructions to the corresponding native assembly instructions by analyzing the switch-case structure of the VM. The disassembled bytecode is optimized through a stack-based IR (SBIR), and only the important instructions are presented to reverse-engineers as simplified code.
Nightingale [18] translates virtualization-obfuscated code into host code such as x86 via dynamic analysis. It locates the dispatcher and handlers of VM instructions by clustering the acquired execution traces. This approach is similar to ours in that the aim is to recognize specific functions implemented in a VM (i.e., VM instruction handlers in Nightingale and script APIs in our method). However, it differs from ours in two respects. First, it discovers VM instruction handlers, while ours finds local functions corresponding to script APIs. Second, it only recognizes the handlers, while ours clarifies which function corresponds to which script API. VMHunt [53] is the first deobfuscation tool that handles partially virtualized binaries. It first detects the boundaries between virtualized snippets and native snippets by finding context-switch instructions in the acquired execution trace and identifies VM instructions by clustering. It then extracts the virtualized kernels, which have global behavior that affects code beyond the boundaries, and symbolically executes them with multiple granularities for reverse engineering. The analysis of partially virtualized binaries is significantly important for analyzing real-world malware. However, since such binaries are rarely seen among script engines, their motivation differs from ours. Overall, most existing studies on reverse-engineering VMs focused on the virtualization-obfuscation mainly used by malware. Virtualization-obfuscators only translate instructions of the original binaries into VM instructions and rarely provide APIs to the binaries. Therefore, none of the existing studies focused on API function identification, while many were conducted to recognize VM instructions. In addition, the bytecode of script engine VMs is arbitrarily operable by changing input scripts, while that of virtualization-obfuscated binaries is not.
To the best of our knowledge, our research is the first that proposes a reverse-engineering method taking such an operable case into account.

### 7.5 Differential Execution Analysis

Carmony et al. [8] proposed a method that uses differential analysis of multiple execution and memory traces for identifying tap points of Adobe Acrobat Reader. The traces are logged while PDFs with JavaScript, well-formed PDFs, and malformed PDFs are input to the reader. Based on the differential analysis of the traces, the method identifies tap points that enable the extraction of JavaScript as well as those that represent the termination and errors of input file processing. Zhu et al. [57] used differential execution analysis to identify the blocking conditions used by anti-adblockers. They accessed websites and logged the traces of JavaScript execution with and without an adblocker. They then analyzed the traces to discover branch divergences caused by the adblocker and identified the branch conditions that cause the divergences. Although they used differential execution analysis as our method does, their focuses (Adobe Acrobat Reader and JavaScript in websites) were different from ours (i.e., script engines). In addition, our differentiation algorithm (i.e., the modified Smith-Waterman algorithm) is different from those used in the above studies because their target problems were also different from ours (i.e., identification of commonly appearing sequences).

### 7.6 Feature Location

Feature location techniques, which are studied in software engineering, aim to locate the module implementing a specific software feature. Although their target (i.e., source code) is different from ours (i.e., binaries), some studies use differential analysis of execution traces as ours does. Wilde et al.
[50] proposed a method called software reconnaissance, which locates software features by comparing execution traces obtained while the feature of interest is active and while it is inactive. Wong et al. [52] presented an approach that compares execution slices instead of execution traces. Because the slices include data related to a feature of interest, their approach takes data flow into account in addition to control flow. Eisenbarth et al. [15] presented an approach that addresses the difficulty of defining a condition that activates exactly one feature. Their approach uses dynamic analysis of binaries combined with formal static analysis of the program dependency graph and source code. Koschke et al. [27] extended that work to handle statement-level analysis instead of method-level analysis. Asadi et al. [3] proposed a method that adopts natural language processing techniques to analyze source code and the comments in it, in addition to the analysis of execution traces. Since the underlying motivation of understanding programs and the basic approach of comparing multiple execution traces are common to their studies and ours, our method can be regarded as feature location whose target is a binary.

### 8 CONCLUSION

In this article, we focused on the problems of current dynamic script analysis tools and proposed a method for automatically generating script API tracers by automatically analyzing the binaries of script engines. The method detects appropriate hook and tap points in script engines through dynamic analysis using test scripts. Through the experiments with a prototype system implemented with our method, we confirmed that the method can properly append script behavior analysis capability to script engines for generating script API tracers. Our case studies also showed that the generated script API tracers can analyze malicious scripts in the wild.
Appending more effective script analysis capabilities is left for future work.

### ACKNOWLEDGMENTS

The authors would like to thank Tomoya Matsumoto, Yuki Kimura, and the members of Matsuura Laboratory for their kind support as the participants in the experiment. We also thank the anonymous reviewers for their insightful comments.

### REFERENCES

[1] VirusTotal. [n.d.]. Retrieved March 9, 2017 from https://www.virustotal.com/.
[2] Pieter Agten, Steven Van Acker, Yoran Brondsema, Phu H. Phung, Lieven Desmet, and Frank Piessens. 2012. JSand: Complete client-side sandboxing of third-party JavaScript without browser modifications. In Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC'12). ACM, 1–10.
[3] Fatemeh Asadi, Massimiliano Di Penta, Giuliano Antoniol, and Yann-Gaël Guéhéneuc. 2010. A heuristic-based approach to identify concepts in execution traces. In Proceedings of the 14th European Conference on Software Maintenance and Reengineering (CSMR'10). IEEE, 31–40.
[4] The Dependable Systems Lab at EPFL in Lausanne. [n.d.]. Chef. Retrieved January 1, 2018 from https://github.com/S2E/s2e-old/tree/chef.
[5] Rohitab Batra. [n.d.]. API Monitor. Retrieved February 15, 2019 from http://www.rohitab.com/apimonitor.
[6] Stefan Bucur, Johannes Kinder, and George Candea. 2014. Prototyping symbolic execution engines for interpreted languages. In ACM SIGPLAN Notices, Vol. 49. ACM, 239–254.
[7] CapacitorSet. [n.d.]. box.js.
Retrieved February 15, 2019 from https://github.com/CapacitorSet/box-js.
[8] Curtis Carmony, Xunchao Hu, Heng Yin, Abhishek Vasisht Bhaskar, and Mu Zhang. 2016. Extract me if you can: Abusing PDF parsers in malware detectors. In Proceedings of the 23rd Annual Network and Distributed System Security Symposium (NDSS'16). Internet Society, 1–15.
[9] Kevin Coogan, Gen Lu, and Saumya Debray. 2011. Deobfuscation of virtualization-obfuscated software: A semantics-based approach. In Proceedings of the 18th ACM Conference on Computer and Communications Security (CCS'11). ACM, 275–284.
[10] Anthony Cozzie, Frank Stratton, Hui Xue, and Samuel T. King. 2008. Digging for data structures. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI'08), Vol. 8. 255–266.
[11] Zhui Deng, Dongyan Xu, Xiangyu Zhang, and Xuxian Jiang. 2012. IntroLib: Efficient and transparent library call introspection for malware forensics. Digital Investigation 9 (2012), S13–S23.
[12] Artem Dinaburg, Paul Royal, Monirul Sharif, and Wenke Lee. 2008. Ether: Malware analysis via hardware virtualization extensions. In Proceedings of the 15th ACM Conference on Computer and Communications Security (CCS'08). ACM, 51–62.
[13] Brendan Dolan-Gavitt, Tim Leek, Josh Hodosh, and Wenke Lee. 2013. Tappan Zee (north) bridge: Mining memory accesses for introspection. In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security (CCS'13). ACM, 839–850.
[14] Brendan Dolan-Gavitt, Tim Leek, Michael Zhivich, Jonathon Giffin, and Wenke Lee. 2011. Virtuoso: Narrowing the semantic gap in virtual machine introspection. In Proceedings of the IEEE Symposium on Security and Privacy 2011 (SP'11). IEEE, 297–312.
[15] Thomas Eisenbarth, Rainer Koschke, and Daniel Simon. 2003. Locating features in source code. IEEE Transactions on Software Engineering 29, 3 (2003), 210–224.
[16] Yangchun Fu and Zhiqiang Lin. 2012.
Space traveling across VM: Automatically bridging the semantic gap in virtual machine introspection via online kernel data redirection. In Proceedings of the 33rd IEEE Symposium on Security and Privacy (SP'12). IEEE, 586–600.
[17] GitHub, Inc. [n.d.]. GitHub. Retrieved May 14, 2020 from https://github.com/.
[18] Xie Haijiang, Zhang Yuanyuan, Li Juanru, and Gu Dawu. 2017. Nightingale: Translating embedded VM code in x86 binary executables. In Proceedings of the 20th International Conference on Information Security (ISC'17). Springer, 387–404.
[19] Blake Hartstein. [n.d.]. jsunpack-n. Retrieved February 15, 2019 from https://github.com/urule99/jsunpack-n.
[20] Jingxuan He, Pesho Ivanov, Petar Tsankov, Veselin Raychev, and Martin Vechev. 2018. Debin: Predicting debug information in stripped binaries. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS'18). ACM, 1667–1680.
[21] Timo Hirvonen. [n.d.]. Sulo. Retrieved February 15, 2019 from https://github.com/F-Secure/Sulo.
[22] Timo Hirvonen. 2014. Dynamic Flash instrumentation for fun and profit. Blackhat USA Briefings 2014. Retrieved February 15, 2019 from https://www.blackhat.com/docs/us-14/materials/us-14-Hirvonen-Dynamic-Flash-Instrumentation-For-Fun-And-Profit.pdf.
[23] Ralf Hund. 2016. The beast within—Evading dynamic malware analysis using Microsoft COM. Blackhat USA Briefings 2016.
[24] KahuSecurity. [n.d.]. Revelo Javascript Deobfuscator. Retrieved February 15, 2019 from http://www.kahusecurity.com/posts/revelo_javascript_deobfuscator.html.
[25] Anatoli Kalysch, Johannes Götzfried, and Tilo Müller. 2017.
VMAttack: Deobfuscating virtualization-based packed binaries. In Proceedings of the 12th International Conference on Availability, Reliability and Security (ARES'17). 1–10.
[26] Yuhei Kawakoya, Makoto Iwamura, Eitaro Shioji, and Takeo Hariu. 2013. API Chaser: Anti-analysis resistant malware analyzer. In Proceedings of the 16th International Symposium on Research in Attacks, Intrusions and Defenses (RAID'13). Springer, 123–143.
[27] Rainer Koschke and Jochen Quante. 2005. On dynamic feature location. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering (ASE'05). 86–95.
[28] Philippe Lagadec. [n.d.]. ViperMonkey. Retrieved September 20, 2019 from https://github.com/decalage2/ViperMonkey.
[29] JongHyup Lee, Thanassis Avgerinos, and David Brumley. 2011. TIE: Principled reverse engineering of types in binary programs. In Proceedings of the 18th Annual Network and Distributed System Security Symposium (NDSS'11). Internet Society, 1–18.
[30] Tamas K. Lengyel, Steve Maresca, Bryan D. Payne, George D. Webster, Sebastian Vogl, and Aggelos Kiayias. 2014. Scalability, fidelity and stealth in the DRAKVUF dynamic malware analysis system. In Proceedings of the 30th Annual Computer Security Applications Conference (ACSAC'14). ACM, 386–395.
[31] Zhiqiang Lin, Xiangyu Zhang, and Dongyan Xu. 2010. Automatic reverse engineering of data structures from binary execution. In Proceedings of the 17th Annual Network and Distributed System Security Symposium (NDSS'10). Internet Society, 1–18.
[32] Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. 2005. Pin: Building customized program analysis tools with dynamic instrumentation. In ACM SIGPLAN Notices, Vol. 40. ACM, 190–200.
[33] Alwin Maier, Hugo Gascon, Christian Wressnegger, and Konrad Rieck. 2019. TypeMiner: Recovering types in binary programs using machine learning. In Proceedings of the 16th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA'19). Springer, 288–308.
[34] Microsoft. [n.d.]. Antimalware Scan Interface. Retrieved August 16, 2018 from https://docs.microsoft.com/en-us/windows/desktop/amsi/antimalware-scan-interface-portal.
[35] Yuto Otsuki, Eiji Takimoto, Shoichi Saito, Eric W. Cooper, and Koichi Mouri. 2015. Identifying system calls invoked by malware using branch trace facilities. In Proceedings of the International MultiConference of Engineers and Computer Scientists (IMECS'15). Newswood Limited.
[36] Giancarlo Pellegrino, Constantin Tschürtz, Eric Bodden, and Christian Rossow. 2015. jäk: Using dynamic analysis to crawl and test modern web applications. In Proceedings of the 18th International Symposium on Research in Attacks, Intrusions and Defenses (RAID'15). Springer, 295–316.
[37] Jonas Pfoh, Christian Schneider, and Claudia Eckert. 2011. Nitro: Hardware-based system call tracing for virtual machines. In Proceedings of the 6th International Workshop on Security (IWSEC'11). Springer, 96–112.
[38] ReactOS Project. [n.d.]. ReactOS. Retrieved August 16, 2018 from https://www.reactos.org/.
[39] Microsoft Research. [n.d.]. Detours. Retrieved April 8, 2020 from https://github.com/microsoft/Detours.
[40] Rolf Rolles. 2009. Unpacking virtualization obfuscators. In Proceedings of the 3rd USENIX Workshop on Offensive Technologies (WOOT'09). USENIX.
[41] Alireza Saberi, Yangchun Fu, and Zhiqiang Lin. 2014. Hybrid-Bridge: Efficiently bridging the semantic gap in virtual machine introspection via decoupled execution and training memoization. In Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS'14). Internet Society.
[42] Monirul Sharif, Andrea Lanzi, Jonathon Giffin, and Wenke Lee. 2009. Automatic reverse engineering of malware emulators. In Proceedings of the 30th IEEE Symposium on Security and Privacy (SP'09). IEEE, 94–109.
[43] Asia Slowinska, Traian Stancescu, and Herbert Bos. 2011. Howard: A dynamic excavator for reverse engineering data structures. In Proceedings of the 18th Annual Network and Distributed System Security Symposium (NDSS'11). Internet Society, 1–20.
[44] Temple F. Smith, Michael S. Waterman, et al. 1981. Identification of common molecular subsequences. Journal of Molecular Biology 147, 1 (1981), 195–197.
[45] VMProtect Software. [n.d.]. VMProtect. Retrieved April 27, 2020 from https://vmpsoft.com/.
[46] T. Sven. [n.d.]. JSDetox. Retrieved September 20, 2019 from http://relentless-coding.org/projects/jsdetox/.
[47] PowerShell Team. [n.d.]. PowerShell. Retrieved August 16, 2018 from https://github.com/powershell.
[48] Toshinori Usui, Yuto Otsuki, Yuhei Kawakoya, Makoto Iwamura, Jun Miyoshi, and Kanta Matsuura. 2019. My script engines know what you did in the dark: Converting engines into script API tracers. In Proceedings of the 35th Annual Computer Security Applications Conference (ACSAC'19). ACSA, 466–477.
[49] Timon Van Overveldt, Christopher Kruegel, and Giovanni Vigna. 2012. FlashDetect: ActionScript 3 malware detection. In Proceedings of the 15th International Symposium on Research in Attacks, Intrusions and Defenses (RAID'12). Springer, 274–293.
[50] Norman Wilde and Michael C. Scully. 1995. Software reconnaissance: Mapping program features to code. Journal of Software Maintenance: Research and Practice 7, 1 (1995), 49–62.
[51] Carsten Willems, Ralf Hund, and Thorsten Holz. 2013. CXPInspector: Hypervisor-based, hardware-assisted system monitoring. Technical Report TR-HGI-2012-002 (2013), 24.
[52] W. Eric Wong, Swapna S. Gokhale, Joseph R. Horgan, and Kishor S. Trivedi. 1999. Locating program features using execution slices. In Proceedings of the 1999 IEEE Symposium on Application-Specific Systems and Software Engineering and Technology (ASSET'99). IEEE, 194–203.
[53] Dongpeng Xu, Jiang Ming, Yu Fu, and Dinghao Wu. 2018. VMHunt: A verifiable approach to partially-virtualized binary code simplification. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS'18). ACM, 442–458.
[54] Akira Yokoyama, Kou Ishii, Rui Tanabe, Yinmin Papa, Katsunari Yoshioka, Tsutomu Matsumoto, Takahiro Kasama, Daisuke Inoue, Michael Brengel, Michael Backes, et al. 2016. SandPrint: Fingerprinting malware sandboxes to provide intelligence for sandbox evasion. In Proceedings of the 19th International Symposium on Research in Attacks, Intrusions, and Defenses (RAID'16). Springer, 165–187.
[55] Junyuan Zeng, Yangchun Fu, and Zhiqiang Lin. 2016. Automatic uncovering of tap points from kernel executions. In Proceedings of the 19th International Symposium on Research in Attacks, Intrusions and Defenses (RAID'16). Springer, 49–70.
[56] Junyuan Zeng and Zhiqiang Lin. 2015. Towards automatic inference of kernel object semantics from binary code. In Proceedings of the 18th International Symposium on Research in Attacks, Intrusions and Defenses (RAID'15). Springer, 538–561.
[57] Shitong Zhu, Xunchao Hu, Zhiyun Qian, Zubair Shafiq, and Heng Yin. 2018. Measuring and disrupting anti-adblockers using differential execution analysis. In Proceedings of the 25th Annual Network and Distributed System Security Symposium (NDSS'18). Internet Society.

Received May 2020; accepted August 2020