{
	"id": "6a5f1a9c-c595-4a6f-bffd-778115d687ec",
	"created_at": "2026-04-06T00:20:08.005257Z",
	"updated_at": "2026-04-10T13:11:48.54422Z",
	"deleted_at": null,
	"sha1_hash": "d756a710363dff192c32481266645104a903305c",
	"title": "Hex-Rays Microcode API vs. Obfuscating Compiler – Hex Rays",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 2528500,
	"plain_text": "Hex-Rays Microcode API vs. Obfuscating Compiler – Hex Rays\r\nBy Julien De Bona\r\nPublished: 2018-09-18 · Archived: 2026-04-05 18:01:41 UTC\r\nThis is a guest entry written by Rolf Rolles from Mobius Strip Reverse Engineering. His views and opinions\r\nare his own, and not those of Hex-Rays. Any technical or maintenance issues regarding the code herein\r\nshould be directed to him.\r\nIn this entry, we’ll investigate an in-the-wild malware sample that was compiled by an obfuscating compiler to\r\nhinder analysis. We begin by examining its obfuscation techniques and formulating strategies for removing them.\r\nFollowing a brief detour into the Hex-Rays CTREE API, we find that the newly-released microcode API is more\r\npowerful and flexible for our task. We give an overview of the microcode API, and then we write a Hex-Rays\r\nplugin to automatically remove the obfuscation and present the user with a clean decompilation.\r\nThe plugin is open source and weighs in at roughly 4KLOC of heavily-commented C++. Additionally, we are also\r\nreleasing a helpful plugin for aspiring microcode plugin developers called the Microcode Explorer, which will\r\nalso be distributed with the Hex-Rays SDK in subsequent releases. In brief, for the sample we’ll explore in this\r\nentry, its assembly language code looks like this:\r\nhttp://www.hexblog.com/?p=1248\r\nPage 1 of 43\n\nThat function’s Hex-Rays decompilation looks like this:\r\nhttp://www.hexblog.com/?p=1248\r\nPage 2 of 43\n\nOnce our deobfuscation plugin is installed, it will automatically rewrite the decompilation to look like this:\r\nhttp://www.hexblog.com/?p=1248\r\nPage 3 of 43\n\nInitial Investigation\r\nThe sample we’ll be examining was given to me by a student in my SMT-based binary analysis class. The binary\r\nlooks clean at first. 
IDA’s navigation bar doesn’t immediately indicate tell-tale signs of obfuscation:\r\nThe binary is statically linked with the ordinary Microsoft Visual C runtime, indicating that it was compiled with\r\nVisual Studio:\r\nAnd finally, the binary has a RICH header, indicating that it was linked with the Microsoft Linker:\r\nThus far, the binary seems normal. However, nearly any function’s assembly and decompilation listings\r\nimmediately tell a different tale, as shown in the figures at the top of this entry. We can see constants with high\r\nentropy, redundant computations that an ordinary compiler optimization would have removed, and an unusual\r\ncontrol flow structure.\r\nPattern-Based Obfuscation\r\nIn the decompilation listing, we see repeated patterns:\r\nThe underlined terms are identical. With a little thought, we can determine that the underlined sequence always\r\nevaluates to 0 at run-time, because:\r\nx is either even or odd, and x-1 has the opposite parity\r\nAn even number times an odd number is always even\r\nEven numbers have their lowest bit clear\r\nThus, AND by 1 produces the value 0\r\nThat the same pattern appears repeatedly is an indication that the obfuscating compiler has a repertoire of patterns\r\nthat it introduces into the code prior to compilation.\r\nOpaque Predicates\r\nAnother note about the previous figure is that the topmost occurrence of the x*(x-1) \u0026 1 pattern is inside of an\r\nif-statement with an AND-compound conditional. Given that this expression always evaluates to zero, the\r\nAND-compound will fail and the body of the if-statement will never execute. This is a form of obfuscation known\r\nas opaque predicates: conditional branches that in fact are not conditional, but can only evaluate one way or the\r\nother at runtime.\r\nControl-Flow Flattening\r\nThe obfuscated functions exhibit unusual control flow. 
Each contains a switch statement in a loop (though the\r\n“switch statement” is compiled via binary search instead of with a table). This is evidence of a well-known form\r\nof obfuscation called “control flow flattening”. In brief, it works as follows:\r\n1. Assign a number to each basic block.\r\n2. The obfuscator introduces a block number variable, indicating which block should execute.\r\n3. Each block, instead of transferring control to a successor with a branch instruction as usual, updates the\r\nblock number variable to its chosen successor.\r\n4. The ordinary control flow is replaced with a switch statement over the block number variable, wrapped\r\ninside of a loop.\r\nThe following animation illustrates the control-flow flattening process:\r\nHere’s the assembly language implementation of the control flow flattening switch for a small function.\r\nOn the first line, var_1C (the block number variable mentioned above) is initialized to some random-looking number. Immediately following that is a series of comparisons of var_1C against other random-looking\r\nnumbers. (var_1C is copied into var_20, and var_20 is used for comparisons after the first.) The targets of\r\nthese equality comparisons are the original function’s basic blocks. Each one updates var_1C to indicate which\r\nblock should execute next, before branching back to the code just shown, which will then perform the equality\r\ncomparisons and select the corresponding block to execute. 
For blocks with one successor, the obfuscator simply\r\nassigns a constant value to var_1C, as in the following figure.\r\nFor blocks with two possible successors (such as if-statements), the obfuscator introduces x86 CMOV instructions\r\nto set var_1C to one of two possible values, as shown below:\r\nGraphically, each function looks like this:\r\nIn the figure above, the red and orange nodes are the switch-as-binary-search implementation. The blue nodes are\r\nthe original basic blocks from the function (subject to further obfuscation). The purple node at the bottom is the\r\nloop back to the beginning of the switch-as-binary-search construct (the red node).\r\nOdd Stack Manipulations\r\nFinally, we can also see that the obfuscator manipulates the stack pointer in unusual ways. Particularly, it uses\r\n__alloca_probe to reserve stack space for function arguments and local variables, where a normal compiler\r\nwould, respectively, use the push instruction and reserve space for all local variables at once in the prologue.\r\nIDA has built-in heuristics to determine the numeric argument to __alloca_probe and track the effects of these\r\ncalls upon the stack pointer. However, the output of the obfuscator leaves IDA unable to determine the numeric\r\nargument, so IDA cannot properly track the stack pointer.\r\nAside: Where did this Binary Come From?\r\nI am not entirely sure how this binary was produced. Obfuscator-LLVM also uses pattern-based obfuscation and\r\ncontrol flow flattening, but Obfuscator-LLVM has different patterns than this sample, and there are some\r\nsuperficial differences with how control flow flattening is implemented. Also, Obfuscator-LLVM does not\r\ngenerate opaque predicates, nor the alloca-related obfuscation. 
And, needless to say, the fact that the binary\r\nincludes the Microsoft CRT and a RICH header is also puzzling. If you have any further information about this\r\nbinary, please contact me.\r\nUpdate: following discussions on Twitter with an Obfuscator-LLVM developer and another knowledgeable\r\nindividual, it turns out that the obfuscating compiler in question is in fact Obfuscator-LLVM, which has been integrated with the\r\nMicrosoft Visual Studio toolchain. The paragraph above falsely stated that Obfuscator-LLVM used different\r\npatterns and did not insert opaque predicates. The author regrets these errors. In theory, the plugin we develop in\r\nthis entry might work for other binaries produced by the same compilation process, or even for Obfuscator-LLVM\r\nin general, but this theory has not been tested and no guarantees are offered.\r\nPlan of Attack\r\nNow that we’ve seen the obfuscation techniques, let’s break them.\r\nA maxim I’ve learned doing deobfuscation is that the best results come from working at the same level of\r\nabstraction that the obfuscator used. For obfuscators that work on the assembly-language level, historically my\r\nbest results have come from using techniques that represent the obfuscated code in terms of assembly language. For\r\nobfuscators that work at the source- or compiler internal-level, my best results have come from using a\r\ndecompiled representation. So, for this obfuscator, a Hex-Rays plugin seemed among our best options.\r\nThe investigation above illuminated four obfuscation techniques for us to contend with:\r\nPattern-based obfuscation\r\nOpaque predicates\r\nAlloca-related stack manipulation\r\nControl flow flattening\r\nThe first two techniques are implemented via pattern substitutions inside of the obfuscating compiler. 
Pattern-based deobfuscation techniques, for all their downsides, tend to work well when the obfuscator itself employed a\r\nrepertoire of patterns (especially a limited one), as seems to be the case here. So, we will attack these via\r\npattern matching and replacement.\r\nThe alloca-related stack manipulation is the simplest technique to bypass. The obfuscator’s non-standard\r\nconstructs have thwarted IDA’s ordinary analysis surrounding calls to __alloca_probe, and hence the\r\nobfuscation prevented IDA from properly accounting for the stack differentials induced by these calls. To break\r\nthis, we will let Hex-Rays do most of the work for us. For every function that calls __alloca_probe, we will use\r\nthe API to decompile it, and then at every call site to __alloca_probe, we will extract the numeric value of its\r\nsole argument. Finally, we will use this information to create proper stack displacements within the disassembly\r\nlisting. The code for this is very straightforward.\r\nAs for control flow flattening, this is the most complicated of the transformations above. We’ll get back to it later.\r\nFirst Approach: Using the CTREE API\r\nI began my deobfuscation by examining the decompilation of the obfuscated functions and cataloging the\r\nobfuscated patterns therein. The following is a partial listing:\r\nThough I later switched to the Hex-Rays microcode API, I started with the CTREE API, the one that has been\r\navailable since the first releases of the Hex-Rays SDK. It is overall simpler than the microcode API, and has\r\nIDAPython bindings where the microcode API currently does not.\r\nThe CTREE API provides a data structure representation of the decompiled code, from which the decompilation\r\nlisting that is presented to the user is generated. Thus, there is a direct, one-to-one correspondence between the\r\ndecompilation listing and the CTREE representation. 
For example, an if-statement in the decompilation listing\r\ncorresponds to a CTREE data structure of type cif_t, which contains a pointer to a CTREE data structure of\r\ntype cexpr_t representing the if-statement’s conditional expression, as well as a pointer to a CTREE data\r\nstructure of type cinsn_t representing the body of the if-statement.\r\nWe will need to know how our patterns are represented in terms of CTREE data structures. To assist us, the VDS5\r\nsample plugin from the Hex-Rays SDK helpfully displays the graph of a function’s CTREE data structures. (The\r\nthird-party plugin HexRaysCodeXplorer implements this functionality in terms of IDA’s built-in graphing\r\ncapabilities, whereas the VDS5 sample uses the external WinGraph viewer.) The following figure shows\r\ndecompilation output (in the top left) and its corresponding CTREE representation in graphical form. Hopefully,\r\nthe parallels between them are clear.\r\nTo implement our pattern-based deobfuscation rules, we simply need to write functions to locate instances within\r\nthe function’s CTREE of the data types associated with the obfuscated patterns, and replace them with CTREE\r\nversions of their deobfuscated equivalents. For example, to match the (x-1) * x \u0026 1 pattern we saw before, we\r\ndetermine the CTREE representation and write an if-statement that matches it, as follows:\r\n(In practice, these rules should be written more generically when possible. For instance, multiplication and bitwise AND are\r\ncommutative; the pattern matching code should be able to account for this, and match terms with the operands\r\nswapped. 
Also, see the open-source project HRAST for an IDAPython framework that offers a less cumbersome\r\napproach to pattern-matching and replacement.)\r\nThe only point of subtlety in replacing obfuscated CTREE elements with deobfuscated equivalents is that each\r\nCTREE expression has associated type information, and we must carefully ensure that our replacements are of the\r\nproper type. The easiest solution is simply to copy the type information from the CTREE expression we’re\r\nreplacing.\r\nFirst Major CTREE Issue: Compiler Optimizations\r\nCataloging the patterns and writing match and replace functions for them was straightforward. However, after\r\nhaving done so, the decompilation showed obvious opportunities for improvement by application of standard\r\ncompiler optimizations, as shown in the following animation.\r\nThis perplexed me at first. I knew that Hex-Rays already implemented these compiler optimizations, so I was\r\nconfused that they weren’t being applied in this situation. Igor Skochinsky suggested that, while Hex-Rays does\r\nindeed implement these optimizations, they take place during the microcode phase of decompilation, and that\r\nthese optimizations don’t happen anymore once the CTREE representation has been generated. Thus, I would\r\neither have to port my plugin to the microcode world, or write these optimizations myself on the CTREE level. I\r\nset the issue aside for the time being and continued with the other parts of the project.\r\nControl Flow Unflattening via the CTREE API\r\nNext, I began working on the control flow unflattening portion. I envisioned this taking place in three stages. My\r\nfinal solution included none of these steps, so I won’t devote a lot of print space to my early plan. But, I’ll discuss\r\nthe original idea, and the issues that led me to my final solution.\r\n1. 
Starting from the switch-as-binary-search implementation, rebuild an actual switch statement (rather\r\nthan a mess of nested if and goto statements).\r\n2. Examine how each switch case updates the block number variable to recover the original control flow\r\ngraph. I.e., each update to the block number variable corresponds to an edge from one block to its\r\nnumbered target.\r\n3. Given the control flow graph, reconstruct high-level control flow structures such as loops, if/else\r\nstatements, break, continue, return, and so on.\r\nI began by writing a CTREE-based component to reconstruct switch statements from obfuscated functions. The\r\nbasic idea, inspired by the assembly language implementation, is to identify the variable that represents the\r\nblock number to execute, find equality comparisons of this variable against constant numbers, and extract these\r\nnumbers (these are the case labels) as well as the address of the code that executes if the comparison matches (these\r\nare the bodies of the case statements).\r\nThis proved more difficult than I expected. Although the assembly language implementations had a predictable\r\nstructure, Hex-Rays had applied transformations to the high-level control flow which made it difficult to extract\r\nthe information I was after, as we can see in the following figure.\r\nWe see above the introduction of a strange while loop in the inner switch, and the final if-statement has\r\nbeen inverted to a != conditional, where a == conditional might seem a more logical translation of\r\nthe assembly code. The example above doesn’t show it, but sometimes Hex-Rays rebuilds small switch\r\nstatements that cover portions of the larger switch. 
Thus, our switch reconstruction logic must take into\r\naccount that these transformations might have taken place.\r\nFor ordinary decompilation tasks, these transformations would have been valuable improvements to the output;\r\nbut in my unusual situation, they meant my switch recovery algorithm was basically fighting against these\r\ntransformations. My first attempt at rebuilding switches had a lot of cumbersome corner cases, and overall did not\r\nwork very well.\r\nControl Flow Reconstruction\r\nStill, I pressed on. I started thinking about how to rebuild high-level control flow structure (if statements,\r\nwhile loops, returns, etc.) from the recovered control flow graph. While it seemed like a fun challenge, I\r\nquickly realized that Hex-Rays obviously already includes this functionality. Could I re-use Hex-Rays’ existing\r\nalgorithms to do that?\r\nAnother conversation with Igor led to a similar answer as before: in order to take advantage of Hex-Rays’ built-in\r\ncontrol flow structuring algorithms, I would need to operate at the microcode level instead of the CTREE level. At\r\nthis point, all of my issues seemed to be pointing me toward the newly-available microcode API. I bit the bullet\r\nand started over with the project using the microcode API.\r\nOverview of the Hex-Rays Microcode API\r\nMy first order of business was to read the SDK’s hexrays.hpp, which now includes the microcode API. I’ll\r\nsummarize some of my findings here; I have provided some more, optional information in an appendix.\r\nAt Igor’s suggestion, I compiled the VDS9 plugin included with the Hex-Rays SDK. 
This plugin demonstrates\r\nhow to generate microcode for a given function (using the gen_microcode() API) and print it to the output\r\nwindow (using mbl_array_t::print()).\r\nMicrocode API Data Structures\r\nFor my purposes, the most important things to understand about the microcode API were four key data structures:\r\n1. minsn_t, microcode instructions.\r\n2. mop_t, operands for microcode instructions.\r\n3. mbl_array_t, which contains the graph for the microcode function.\r\n4. mblock_t, the basic blocks within the microcode graph, which contain the instructions, and the edges\r\nbetween the blocks.\r\nFor the first two points, Ilfak has given an overview presentation about the microcode instruction set. For the\r\nlast two points, he has published a blog entry showing graphically how all of these data structures relate to one\r\nanother. Aspiring microcode API plugin developers would do well to read those entries; the latter includes many\r\nnice figures such as this one:\r\nMicrocode Maturity\r\nAs Hex-Rays internally optimizes and transforms the microcode, the microcode moves through so-called “maturity phases”,\r\nindicated by an enumerated element of type mba_maturity_t. For example, immediately after generation, the\r\nmicrocode is said to be at maturity MMAT_GENERATED. After local optimizations have been performed, the\r\nmicrocode moves to maturity MMAT_LOCOPT. After function calls have been analyzed (such as deciding which\r\npushes onto the stack correspond to which called function), the microcode moves to maturity MMAT_CALLS. 
When\r\ngenerating microcode via the gen_microcode() API, the user can specify the desired maturity level to which the\r\nmicrocode should be optimized.\r\nThe Microcode Explorer Plugin\r\nExamining the microcode at various levels of maturity is an informative and impressive undertaking that I\r\nrecommend for all would-be microcode API plugin developers. It sheds light on which transformations take place\r\nin which order, and the textual output is easy to comprehend. At the start of this project, I spent a good bit of time\r\nreading through microcode dumps at various levels of maturity.\r\nThough the microcode dump is very nice and easy to read, it does not show the low-level details of\r\nhow the microcode instructions and operands are represented, which is critical information for writing\r\nmicrocode plugins. As such, to understand the low-level representation, I wrote functions to dump minsn_t\r\ninstructions and mop_t operands in textual form.\r\nFor the benefit of would-be microcode plugin developers, I created a plugin I call the Microcode Explorer. With\r\nyour cursor within a function, run the plugin. It will ask you to select a decompiler maturity level:\r\nOnce the user makes a selection, the plugin shows a custom viewer in IDA with the microcode dump at the\r\nselected maturity level.\r\nThe microcode dump is mostly non-interactive, but it does offer the user two additional features. First, pressing\r\nG in the custom viewer will display a graph of the entire microcode representation. For example:\r\nSecond, the Microcode Explorer can display the graph for a selected microinstruction and its operands, akin to the\r\nVDS5 plugin we saw earlier which displayed a graph of a function’s CTREE representation. 
Simply position your\r\ncursor on any line in the viewer and press the I key.\r\nThe appendix discusses the microcode instruction set in more detail, and I recommend that aspiring microcode\r\nAPI plugin developers read it.\r\nPattern Deobfuscation with the Microcode API\r\nOnce I had a basic handle on the microcode API instruction set, I began by porting my CTREE-level pattern\r\nmatching and replacement code to the microcode API. This was more laborious due to the more elaborate nature\r\nof the microcode API, and the fact that I had to write it in C++ instead of Python. All in all, the porting process was\r\nmostly straightforward. The code can be found here, and here’s an example of a pattern match and replacement.\r\nAlso, I needed to know how to integrate my pattern replacement with the rest of Hex-Rays’ decompiler\r\ninfrastructure. It was easy enough to write and test my pattern replacement code against the data returned by the\r\ngen_microcode() API, but doing so has no effect on the decompilation listing that the user ultimately sees (since\r\nthe decompiler calls gen_microcode() internally, and we don’t have access to the mbl_array_t that it\r\ngenerates).\r\nThe VDS10 SDK sample illustrates how to integrate pattern-replacement into the Hex-Rays infrastructure. In\r\nparticular, the SDK defines an “instruction optimizer” data type called optinsn_t. The virtual method\r\noptinsn_t::func() is given a microinstruction as input. That method must inspect the provided microinstruction\r\nand try to optimize it, returning a non-zero value if it can. 
Once the user installs their instruction optimizer with\r\nthe SDK function install_optinsn_handler(), their custom optimizer will be called periodically by the Hex-Rays decompiler kernel, thus achieving integration that ultimately affects the user’s view of the decompilation\r\nlisting.\r\nYou may recall that a major impetus for moving the pattern-matching to the microcode world was that, after the\r\nreplacements had been performed, Hex-Rays would have an opportunity to improve the code further via standard compiler\r\noptimizations. We showed what we expected the result of such optimizations would be, but no optimizations had\r\nbeen applied when we wrote our pattern-replacement with the CTREE API. By moving to the microcode world,\r\nnow we do get the compiler optimizations we desire.\r\nAfter installing our pattern-replacement hook, here’s the decompilation listing for the compiler optimization\r\nanimation shown earlier:\r\nThat’s exactly the result we had been expecting. Great! I didn’t have to code those optimizations myself after all.\r\nAside: Tricky Issues with Pattern Replacement in the Microcode World\r\nWhen we wrote our CTREE pattern matching and replacement code, we targeted a specific CTREE maturity\r\nlevel, which led to predictable CTREE data structures implementing the patterns. In the microcode world, as\r\ndiscussed more in the appendix, the microcode implementation changes dramatically as it matures. Furthermore,\r\nour instruction optimizer callback gets called all throughout the maturity lifecycle. 
Some of our patterns won’t yet\r\nbe ready to match at earlier maturity phases; we’ll have to write our patterns targeting the lowest maturity level at\r\nwhich we can reasonably match them.\r\nWhile porting my CTREE pattern replacement code to the microcode world, at first I also adopted my strategy\r\nfrom the CTREE world of generating my pattern replacement objects from scratch, and inserting them into the\r\nmicrocode atop the terms I wanted to replace. However, I experienced a lot of difficulty in doing so. Since I was\r\nnew to the microcode API, I did not have a clear mental picture of what Hex-Rays internally expected of my\r\nmicrocode objects, which led to mistakes (internal errors and a few crashes). I quickly switched strategies such\r\nthat my replacements would modify the existing microinstruction and microoperand objects, rather than\r\ngenerating my own, which reduced my burden of generating correct minsn_t and mop_t objects (since this\r\nstrategy allowed me to start from valid objects).\r\nControl Flow Unflattening, Overview\r\nTo recap, control flow flattening eliminates direct block-to-block control flow transfers. The flattening process\r\nintroduced a “block number variable” which determines the block that should execute at each step of the\r\nfunction’s execution. Each flattened function’s control flow structure has been changed into a switch over the\r\nblock number variable, which ultimately shepherds execution to the correct block. Every block must update the\r\nblock number variable to indicate the block that should execute next after the current one (where conditional\r\nbranches are implemented via conditional move instructions, updating the block number variable to the block\r\nnumber of either the taken branch, or of the non-taken branch).\r\nThe control flow unflattening process is conceptually simple. 
Our task is to rebuild the direct block-to-block control flows, and in so doing, eliminate the control flow switch mechanism. Implementation-wise,\r\nunflattening is integrated with the Hex-Rays decompiler kernel in a similar fashion to how we integrated pattern-matching. Specifically, we register an optblock_t callback object with Hex-Rays, such that our unflattener will\r\nbe automatically invoked by the Hex-Rays kernel, providing a fully automated experience for the user.\r\nA later section will discuss the implementation in more depth.\r\nIn the following subsections, we’ll show an overview of the process pictorially. Just three steps are all we need to\r\nremove the control flow flattening. Once we rebuild the original control flow transfers, all of Hex-Rays’ existing\r\nmachinery for control flow restructuring will do the rest of the work for us. This was perhaps my favorite result\r\nfrom this project; all I had to do was re-insert proper control flow transfers, and Hex-Rays did everything else for\r\nme automatically.\r\nStep #1: Determine Flattened Block Number to Hex-Rays Block Number Mapping\r\nOur first task is to determine which flattened block number corresponds to which Hex-Rays mblock_t. The\r\nfollowing figure is the microcode-level representation for a small function’s control flow switch:\r\nHex-Rays is currently calling the block number variable ST14_4.4. If that variable matches 0xCBAD6A23, the\r\njz instruction on block @2 transfers control to block @6. Similarly, 0x25F52EB5 corresponds to block @9, and\r\n0x31B8F0BC corresponds to block @10. The information just described is the mapping between flattened block\r\nnumbers and Hex-Rays block numbers. 
(Of course, our plugin will need to extract it automatically.)\r\nStep #2: Determine Each Flattened Block’s Successors\r\nNext, for each flattened block, we need to determine the flattened block numbers to which it might transfer\r\ncontrol. Flattened blocks may have one successor if their original control flow was unconditional, or two potential\r\nsuccessors if their original control flow was conditional. First, here’s the microcode from block @9, which has\r\none successor. (Line 9.3 has been truncated because it was long and its details are immaterial.)\r\nWe can see on line 9.4 that this block updates the block number variable to 0xCBAD6A23, before executing a\r\ngoto back to the control flow switch (on the Hex-Rays block numbered @2). From what we learned in step #1,\r\nwe know that, because the block number variable is set to this value, the next trip through the control flow switch will\r\nexecute the Hex-Rays mblock_t numbered @6.\r\nThe second case is when a block has two possible successors, as does Hex-Rays block @6 in the following figure.\r\nLine 8.0 updates the block number variable with the value of eax, before line 8.1 executes a goto back to the\r\ncontrol flow switch at Hex-Rays block @2. If the jz instruction on line 6.4 is taken, then eax will have the\r\nvalue 0x31B8F0BC (obtained on line 6.1). If the jz instruction is not taken, then eax will contain the value\r\n0x25F52EB5 from the assignment on line 7.0. 
Consulting the information we obtained in step #1, we see that this block will\r\ntransfer control to Hex-Rays block @10 or @9, respectively, during the next trip through the control flow switch.\r\nStep #3: Insert Control Transfers Directly from Source Blocks to Destinations\r\nFinally, now that we know the Hex-Rays mblock_t numbers to which each flattened block shall pass control, we\r\ncan modify the control flow instructions in the microcode to point directly to their successors, rather than going\r\nthrough the control flow switch. If we do this for all flattened blocks, then the control flow switch will no longer\r\nbe reachable, and we can delete it, leaving only the function’s original, unflattened control flow. Continuing the\r\nexample from above, in the analysis in step #2, we determined that Hex-Rays block @9 ultimately transferred\r\ncontrol to Hex-Rays block @6. Block @9 ended with a goto statement back to the control flow switch located\r\non block @2. We simply modify the target of the existing goto statement to point to block @6 instead of block\r\n@2, as in the following figure. (Note that we also deleted the assignment to the block number variable, since it’s\r\nno longer necessary.)\r\nThe case where a block has two potential successors is slightly more complicated, but the basic idea is the same:\r\nredirecting the existing control transfers, which lead back to the control flow switch, to point directly at the targeted Hex-Rays blocks.\r\nHere’s Hex-Rays block @6 again, with two possible successors.\r\nTo unflatten this, we will:\r\n1. Copy the instructions from block @8 onto the end of block @7.\r\n2. Change the goto instruction on block @7 (which was just copied from block @8) to point to block @9\r\n(since we learned in step #1 that 0x25F52EB5 corresponds to block @9).\r\n3. 
Update the goto target on block @8 to block @10 (since we learned in step #1 that 0x31B8F0BC\r\ncorresponds to block @10).\r\nWe can also eliminate the update to the block number variable on line 8.0, and the assignments to eax on lines\r\n6.1 and 7.0.\r\nThat’s it! As we make these changes for every basic block targeted by the control flow switch, the control flow\r\nswitch dispatcher will lose all of its incoming references, at which point we can prune it from the Hex-Rays\r\nmicrocode graph, and then the flattening will be gone for good.\r\nControl Flow Unflattening, In More Detail\r\nAs always, the real world is messier than curated examples. The remainder of this section details the practical\r\nengineering considerations that go into implementing unflattening as a fully-automated procedure.\r\nHeuristically Identifying Flattened Functions\r\nIt turns out that a few non-library functions within the binary were not flattened. I had enough work to do simply\r\nmaking my unflattening code work for flattened functions, such that I did not need the added hassle of tracking\r\ndown issues stemming from spurious attempts to unflatten non-flattened functions.\r\nThus, I devised a heuristic for determining whether or not a given function was flattened. I basically just asked\r\nmyself which identifying characteristics the flattened functions have. I looked at the microcode for a control flow\r\nswitch:\r\nTwo points came to mind:\r\n1. The functions compare one variable — the block number variable — against numeric constants in jz and\r\njg instructions.\r\n2. Those numeric constants are highly entropic, appearing to have been pseudorandomly generated.\r\nWith that characterization, the algorithm for heuristically determining whether a function was flattened practically\r\nwrote itself.\r\n1. Iterate through all microinstructions within a function. 
For this, the SDK handily provides the\r\nmbl_array_t::for_all_topinsns function, to be used with a class called minsn_visitor_t.\r\n2. For every jz and jg instruction that compares a variable to a number, record that information in a list.\r\n3. After iteration, choose the variable that had been compared against the largest number of constants.\r\n4. Perform an entropy check on the constants. In particular, count the number of bits set and divide by the\r\ntotal number of bits. If roughly 50% of the bits were set, decide that the function has been flattened.\r\nYou can see the implementation in the code — specifically the JZInfo::ShouldBlacklist() method.\r\nSimplify the Graph Structure\r\nThe flattened functions sometimes have jumps leading directly to other jumps, or sometimes the microcode\r\ntranslator inserts goto instructions that target other goto instructions. For example, in the following figure,\r\nblock 4 contains a single goto instruction to block 8, which in turn has a goto instruction to block 15.\r\nThese complicate our later book-keeping, so I decided to eliminate goto-to-goto transfers. I.e., if block @X\r\nends with a goto @N instruction, and block @N contains a single goto @M instruction, update the goto\r\n@N to goto @M. In fact, we apply this process recursively; if block @M contained a single goto @P, then we\r\nwould update goto @N to goto @P, and so on for any number of chained gotos.\r\nThe Hex-Rays SDK sample VDS11 does what was just described in the last paragraph. My code is similar, but a\r\nbit more general, and therefore a bit more complicated. 
It also handles the case where a block falls through to a\r\nblock with a single goto — in this case, it inserts a new goto onto the end of the leading block, with the same\r\ndestination as the original goto instruction in the trailing block.\r\nExtract Block Number Information\r\nIn step #1 of the unflattening procedure described previously, we need to know:\r\nWhich variable contains the block number\r\nWhich block number corresponds to which Hex-Rays microcode block\r\nWhen heuristically determining whether a function appears to have been flattened, we already found the variable\r\nwith the most conditional comparisons, and the numbers it was compared against. Are we done? No — because as\r\nusual, there are complications. Many of the flattened functions use two variables, not one, for block number-related purposes. For those that use two, the function’s basic blocks update a different variable than the one that is\r\ncompared by the control flow switch construct. I call this the block update variable, and I renamed\r\nthe other one the block comparison variable. Toward the beginning of the control flow\r\nswitch, the value of the block update variable is copied into the block comparison variable, after which all\r\nsubsequent comparisons reference the block comparison variable. For example, see the following figure:\r\nIn the above, block @1 is the function’s prologue. The control flow switch begins on block @2. Notice that block\r\n@1 assigns a numeric value to a variable called ST18_4.4. Note that the first comparison in the control flow\r\nswitch, on line 2.3, compares against this variable. Note also that line 2.1 copies that variable into another variable\r\ncalled ST14_4.4, which is then used for the subsequent comparisons (as on line 3.1, and all control flow switch\r\ncomparisons thereafter). 
Then, the function’s flattened blocks update the variable ST18_4:\r\n(Confusingly, the function’s flattened blocks update both variables — however, only the assignment to the block\r\nupdate variable ST18_4.4 is used. The block comparison variable, ST14_4.4, is redefined on line 2.1 above\r\nbefore its value is used.)\r\nSo, we actually have three tasks:\r\n1. Determine which variable is the block comparison variable (which we already have from the entropy\r\ncheck).\r\n2. Determine if there is a block update variable, and if so, which variable it is.\r\n3. Extract the numeric constants from the jz comparisons against the block comparison variable to\r\ndetermine the flattened block number to Hex-Rays mblock_t number mapping.\r\nI quickly examined all of the flattened functions to see if I could find a pattern as to how to locate the block update\r\nvariable. It was simple enough: for any variable assigned a numeric constant value in the first block, see if it is\r\nlater copied into the block comparison variable. There should be only one of these. It was easy to code using\r\nsimilar techniques to the entropy check, and it worked reliably.\r\nThe code for reconstructing the flattened block number to Hex-Rays block number mapping is nearly identical to the code used for\r\nheuristically identifying flattened functions, and so we don’t need to say anything in particular about it.\r\nUnflattening\r\nFrom the above, we now know which variable is the block update variable (or block comparison variable, if there\r\nis none). We also know which flattened block number corresponds to which Hex-Rays mblock_t number. For\r\nevery flattened block, we need to determine the number to which it sets the block update variable. We walk\r\nbackwards, from the end of the flattened block region, looking for assignments to the block update variable. If we\r\nfind an assignment from another variable, we recursively begin tracking the other variable. 
If we find a number,\r\nwe’re done.\r\nAs described previously, flattened blocks come in two cases:\r\n1. The flattened block always sets the block update variable to a single value (corresponding to an\r\nunconditional branch).\r\n2. The flattened block uses an x86 CMOV instruction to set the block update variable to one of two possible\r\nvalues (corresponding to a conditional branch).\r\nIn the first case, our job is simply to find one number. For example, the following flattened block falls into case #1\r\nfrom above:\r\nIn this case, the block update variable is ST14_4.4. Our task is to find the numeric assignment on line 9.4. In\r\nconcert with the flattened block number to Hex-Rays mblock_t number mapping we extracted in the previous\r\nstep, we can now change the goto on the final line to the proper Hex-Rays mblock_t number.\r\nThe following flattened block falls into the second case:\r\nOur job is to determine that ST14_4.4 might be updated to either 0xCBAD6A23 or 0x25F52EB5 on lines 6.0 and\r\n7.0, respectively.\r\nComplication: Flattened Blocks Might Contain Many Hex-Rays Blocks\r\nThis part of the project forced me to contend with a number of complications, some of which aren’t shown by the\r\nexamples above.\r\nOne complication is that a flattened block may be implemented by more than one Hex-Rays mblock_t as in the\r\nfirst case above, or more than three Hex-Rays mblock_t objects in the second case above. In particular, Hex-Rays splits basic blocks on function call boundaries — so there may be any number of Hex-Rays mblock_t\r\nobjects for a single flattened block. Since we need to work backwards from the end of a flattened region, how do\r\nwe know where the end of the region is? 
I solved this problem by computing the function’s dominator tree and\r\nfinding the block dominated by the flattened block header that branches back to the control flow switch.\r\nComplication: Data-Flow Tracking\r\nFinding the numeric values assigned to the block update variable ranges from trivial to “mathematically hard”. I\r\nwound up cheating in the mathematically hard cases.\r\nSometimes Hex-Rays’ constant propagation algorithms make our lives easy by creating a microinstruction that\r\ndirectly moves a numeric constant into the block update variable. A slightly less simple, but still easy, case is when\r\nthe assignment to the block update variable involves a number being copied between a few registers or stack\r\nvariables along the way. As long as there aren’t any errant memory writes to clobber saved values on the stack, it’s\r\neasy enough to follow the chain of mov instructions backwards to the original constant value.\r\nTo handle both of these cases, I wrote a function that starts at the bottom of a block and searches for assignments\r\nto the block number variable in the backwards direction. For assignments from other variables, it resumes\r\nsearching for assignments to those variables. Once it finally finds a numeric assignment, it succeeds.\r\nHowever, there is a harder case for which the above algorithm will not work. In particular, it will not work when\r\nthe flattened blocks perform memory writes through pointers, for which Hex-Rays cannot determine legal pointer\r\nvalue sets. Hex-Rays, quite reasonably, cannot and does not perform constant propagation across memory values\r\nif there are unknown writes to memory in the meantime. Such transformations would break the decompilation\r\nlisting and cause the analyst not to trust the tool. 
And yet, this part of the project presents us with the very problem\r\nof constant propagation across unknown memory writes.\r\nHere’s an example of the hard case manifesting itself. At the beginning of a flattened block, we see the two\r\ndestination block numbers being written into registers, and then saved to stack variables.\r\nLater on, the flattened block has several memory writes through pointers.\r\nFinally, at the end of the block, the destination block numbers — which were spilled to stack variables at the\r\nbeginning of the flattened block — are then loaded from their stack slots, and used in a conditional block number\r\nupdate.\r\nThe problem this presents us is that we need, or Hex-Rays needs, to formally prove that the memory writes in the\r\nmiddle did not overwrite the saved block update numbers. In general, pointer aliasing is an undecidable problem,\r\nmeaning it is impossible to write an algorithm to solve every instance of it. So instead, I cheated. When my\r\nnumeric definition scanner encounters an instruction whose memory side effects cannot be bounded, I go to the\r\nbeginning of the flattened block region and scan forwards looking for numeric assignments to the last variables I\r\nwas tracking before encountering an unbounded memory reference. I.e., in the three assembly snippets above, I\r\njump to the first one and find the numeric assignments to var_B4 and var_BC. This is a hack; it’s unsafe, and\r\ncould very well break. But, it happens to work for every function in this sample, and will likely work for every\r\nsample compiled by this obfuscating compiler.\r\nAppendix: More about the Microcode API\r\nWhat follows are some topics about the Microcode API that I thought were important enough to write up, but I did\r\nnot want them to alter the narrative flow. 
Perhaps you can put off reading this appendix until you get around to\r\nwriting your first microcode plugin.\r\nThe Microcode Verifier\r\nChances are good that if you’re going to use the microcode API, you will be modifying the microcode\r\nobjects described in the previous section. This is murky territory for third-party plugin developers, especially\r\nthose of us who are new to the microcode API, since modifying the microcode objects in an illegal fashion can\r\nlead to crashes or internal errors.\r\nTo aid plugin developers in diagnosing and debugging issues stemming from illegal modifications, the microcode\r\nAPI offers “verification”, which is accessible in the API through a method called mbl_array_t::verify(). (The\r\nother objects also support verification, but their individual verify() methods are not currently exposed through\r\nthe API.) Basically, mbl_array_t::verify() applies a comprehensive set of test suites to the microcode objects\r\n(such as mblock_t, minsn_t, and mop_t).\r\nFor one example of verification, Hex-Rays has a set of assumptions about the legal operand types for its\r\nmicroinstructions. The m_add instruction must have at least two operands, and those operands must be the same\r\nsize. m_add can optionally store the result in a “destination” operand; if this is the case, certain destination types\r\nare illegal (e.g., in C, it does not make any sense to have a number on the left-hand side of an assignment\r\nstatement, as in 1 = x + y;. The analogous concept in the microcode world, storing the result of an addition into\r\na number, also does not make sense and should be rejected as illegal.)\r\nThe source code for the verify() methods is included in the Hex-Rays SDK under verifier\\verify.cpp.\r\n(There is an analogous version for the CTREE API under verifier\\cverify.cpp.) When the verifier detects an\r\nillegal condition, it raises a numbered “internal error” within IDA, as in the following screenshot. 
The plugin\r\ndeveloper can search for this number within the verifier source code to determine the source of the error.\r\nThe verifier source code is, in my opinion, the best and most important source of documentation about Hex-Rays’\r\ninternal expectations. It touches on many different parts of the microcode API, and provides examples of how to\r\ncall certain API functions that may not be covered by the other example plugins in the SDK. Wading through\r\ninternal errors, tracking them down in the verifier, and learning Hex-Rays’ expectations about the microcode\r\nobjects (as well as how it verifies them) is a rite of passage for any would-be microcode API plugin developer.\r\nIntermediate Representations and the Microcode Instruction Set\r\nIf you’ve ever studied compilers, you are surely familiar with the notion of an intermediate representation. The\r\nminsn_t and mop_t data types, taken together, are the intermediate representation used in the microcode phase of\r\nthe Hex-Rays decompiler.\r\nIf you’ve studied compilers at an advanced level, you might be familiar with the idea that compilers frequently use\r\nmore than one intermediate representation. For example, Muchnick describes a compiler archetype using three\r\nintermediate representations, which he respectively calls HIR (“high-level” intermediate representation), MIR\r\n(“mid-level”), and LIR (“low-level”). HIR resembles a high-level language such as C, which supports nested\r\nexpressions. I.e., in C, one may perform multiple operations in a single statement, such as a = ((b + c) * d) /\r\ne. On the other hand, low-level languages such as LIR or assembly generally can only perform one operation per\r\nstatement; to represent the same code in a low-level language, we would need at least three statements (ADD,\r\nMUL, and DIV). 
LIR is basically a “pseudo-assembly language”.\r\nSo then, given that the Hex-Rays microcode API has only one intermediate representation, which type is it — is it\r\ncloser to HIR, or is it closer to LIR? The answer is, it uses a clever design to simulate both HIR and LIR! As the\r\nmicrocode matures, it is gradually transformed from a LIR-like representation, with only one operation per\r\nstatement, to a HIR-like representation, with arbitrarily many operations per statement. Let’s take a closer look\r\nwith the Microcode Explorer.\r\nWhen first generating the microcode (i.e., microcode maturity level MMAT_GENERATED), we can see that the\r\nmicrocode looks a lot like an assembly language. Notice that each microinstruction has two or three operands\r\napiece, and each operand is something like a number, register name, or name of a global variable. I.e., this is what\r\nwe would call LIR in a compiler back-end.\r\nShortly thereafter in the maturity pipeline, in the MMAT_LOCOPT phase, we can see that the microcode\r\nrepresentation for the same code in the same function is already quite different. In the figure below, many of the\r\nlines in the bottom half have complex expressions inside them, instead of the simple operands we saw just\r\npreviously. I.e., we are no longer dealing with LIR.\r\nFinally, at the highest level of microcode maturity, MMAT_LVARS, the same code has shrunk down to three lines,\r\nwith the final one being so long that I had to truncate it to fit it reasonably into the picture:\r\nMicroinstructions and Microoperands\r\nThat’s a pretty impressive trick — supporting multiple varieties of compiler IRs with a single set of data types.\r\nHow did they do it? 
Let’s look more carefully at the internal representations of microinstructions and\r\nmicrooperands to figure it out.\r\nRespectively, microinstructions and microoperands are implemented via the minsn_t and mop_t classes. Here\r\nagain is the graph representation for a microinstruction:\r\nIn the figure above, the top-level microcode instruction is shown in the topmost node. It is represented by an\r\ninstruction of type m_and, which in this case uses three comma-separated operands, of type mop_d (result of\r\nanother instruction), mop_n (a number), and mop_r (destination is a register). The mop_d operand is a\r\ncompound instruction with two expressions joined together with a bitwise OR — thus, it corresponds to a\r\nmicroinstruction of type m_or, whose operands themselves are respectively the result of bitwise AND and\r\nbitwise XOR operations, and as such, these operands are of type mop_d, instructions respectively of type m_and\r\nand m_xor. The inputs to the AND and XOR operators are all stack variables, i.e., microoperands of type\r\nmop_S.\r\nNow we can see how the microcode API supports such dramatic differences in microcode representation using the\r\nsame underlying data structures. Specifically, the example above makes use of the mop_d microoperand type,\r\nwhich refers to the result of another microinstruction. I.e., microinstructions contain microoperands, and\r\nmicrooperands can contain microinstructions (which then contain other microoperands, which may recursively\r\ncontain other microinstructions, etc). This technique allows the same data structures to represent both HIR- and\r\nLIR-like representations. The initial microcode generation phase does not generate mop_d operands. 
Subsequent\r\nmaturity transformations introduce them in order to build a higher-level representation.\r\nThe proper name for this language design technique is mutual recursion: where one category of a grammar refers\r\nto another category, and the second refers back to the first. I found this design technique very elegant and clever.\r\nApart from using different data structures at each level of representation, I can’t think of any cleaner ways to\r\naccommodate multi-level representations. That said, this type of programming is mostly common only among\r\npeople with serious professional experience with programming language theory and compiler internals. Ordinary\r\ndevelopers would do well to study some programming language theory if they want to make good use of the\r\nmicrocode API.\r\nSource: http://www.hexblog.com/?p=1248",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia",
		"ETDA"
	],
	"origins": [
		"web"
	],
	"references": [
		"http://www.hexblog.com/?p=1248"
	],
	"report_names": [
		"?p=1248"
	],
	"threat_actors": [],
	"ts_created_at": 1775434808,
	"ts_updated_at": 1775826708,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/d756a710363dff192c32481266645104a903305c.pdf",
		"text": "https://archive.orkl.eu/d756a710363dff192c32481266645104a903305c.txt",
		"img": "https://archive.orkl.eu/d756a710363dff192c32481266645104a903305c.jpg"
	}
}