{
	"id": "0d0d118f-10d0-465c-8ec3-b4fb3f952849",
	"created_at": "2026-04-06T00:22:05.855649Z",
	"updated_at": "2026-04-10T13:11:44.888679Z",
	"deleted_at": null,
	"sha1_hash": "4a0a71af45657599716252251a03b3e7008bceeb",
	"title": "Deobfuscation of Lumma Stealer",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 1336802,
	"plain_text": "Deobfuscation of Lumma Stealer\r\nPublished: 2024-12-14 · Archived: 2026-04-05 14:31:45 UTC\r\nIntroduction\r\nLumma Stealer is an infostealer that has been around for several years now, and consistently tops statistics on sites like\r\nMalwareBazaar as one of the most commonly distributed malware families. When it first released, Lumma Stealer had little\r\nto no obfuscation at all. Eventually, it incorporated things like control flow flattening, opaque predicates and more recently\r\naround the beginning of 2024 began using control flow indirection. I set about developing a Hex-Rays plugin to deobfuscate\r\nthe sample from the first link prior to the news about control flow indirection being added. However, this methodology\r\nshould still be applicable to the newer version as long as the control flow indirection is removed first. In this writeup, I\r\nwill go through the different challenges I experienced during this project and how I overcame them.\r\nInitial Analysis of the Obfuscation\r\nUpon opening the WinMain() function in IDA, we can immediately see that control flow flattening has been applied. This\r\nis one of the simplest instances in this particular binary:\r\nAll that is required to solve this function is more or less what I did in my old Agent Tesla post, which is to find which blocks\r\ncorrespond to the mapping numbers and patch the flattened blocks to jump to their proper destinations. Here, we either have\r\njz or jnz instructions which check the dispatcher variable ( eax register) value. If it matches and the opcode is jz ,\r\nthe jump is taken. Otherwise, if it’s jnz the fallthrough branch is taken. This is how it would look in it’s unflattened\r\nform:\r\n1\r\n2\r\n3\r\n4\r\n5\r\n6\r\n7\r\nint __stdcall __noreturn WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd)\r\n{\r\n mw_integrity_checks();\r\n mw_set_cnc();\r\n mw_main_routine();\r\n ExitProcess(0);\r\n}\r\nThe next function looks much more complex. Not only is it much longer at 400 lines, but something weird is going on -\r\nthere is a pattern that is repeated constantly in the function. It seems to involve a bunch of math operations, ends with an\r\nif statement which multiples a variable (in this case v9 ) by a random 16-bit constant, then subtracts 1 and checks if it\r\nequals v9 . The dispatcher value changes depending on if this is true or not.\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 1 of 19\n\nAfter looking closely, we can determine that these are opaque predicates.\r\nWhy? Because the above would require v9 to be a fraction to be true, and these are integers. Therefore, the true branch of\r\nthe jz is NEVER taken, and leads to junk code. That means v3 will always be set to 0xCA7D9612 . There are\r\nadditionally blocks that use jnz instead, and in those cases the opposite is true where the true branch is always taken.\r\nBut we aren’t out of the woods yet with this function. There is yet another problem:\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 2 of 19\n\nA large amount of the flattened blocks don’t go directly back to the dispatcher. They go to what I call an ‘optimization\r\nblock’. What is happening here is that since all of the affected blocks have their variable result stored in the eax register\r\nand require it to be moved to the stack variable var_1074 , the compiler created one block and pointed them all to it instead\r\nof having a copy of that same ‘optimization’ block for every affected flattened block.\r\nI noted down the aforementioned issues preventing me from unflattening for later, and began thinking of where to begin\r\ndevelopment of a plugin that would deobfuscate the binary.\r\nChoosing a Starting Point\r\nI had never worked with the Hex-Rays API at all before this, but I was aware of two projects that utilized it to deobfuscate\r\nflattened code. The first one is the original HexRaysDeob plugin written in C++ by Rolf Rolles of which every other Hex-Rays unflattener plugin has been based off of. If you have not read his writeup, I would suggest to do so as I consider it\r\nrequired reading. The second is D810 by Boris Batteux, which is written in Python and supports plugins/profiles in itself. I\r\nalso highly recommend reading this writeup as well. However, not only am I much more comfortable using C/C++ over\r\nPython, but I wanted to work directly with the microcode itself and not have to in addition learn the inner workings of D-810, so I opted to update the former. In the end, various challenges I encountered would result in me more or less completely\r\ngutting the original plugin code.\r\nThe Journey Begins\r\nDuring the development of the plugin, I didn’t necessarily go in the exact order as described below. I tried removing the\r\nopaque predicates before completely finishing the unflattening removal, which turned out to be a mistake. I also bounced\r\naround many functions, sometimes not finishing one before I began another and going back. However, to make the flow of\r\nthis writeup easier to follow I will describe the process of the flattening removal function-by-function and then move on to\r\nthe opaque predicates.\r\nThe original plugin works by finding the most frequent occurence of a comparison between a certain register or stack\r\nvariable, which will be referred to as the dispatcher variable . Each of these comparison block s’ numeric value that is\r\nbeing compared is then saved into a map with its corresponding child block. Next, the code detects mov instructions that\r\nmove a numeric value into that variable, and patches those at the end to be jumps to the corresponding comparison block’s\r\nchild block. Unmodified, it only searched for comparisons using the jz opcode. The first change I made to the plugin was\r\nabout as easy as it gets: I added handling for the jnz cases as well. This change resulted in the unflattening of the\r\nWinMain() function.\r\n1\r\n2\r\n3\r\n4\r\nint visit_minsn(void)\r\n{\r\n// We're looking for jz instructions that compare a number ...\r\nif ((curins-\u003eopcode != m_jz \u0026\u0026 curins-\u003eopcode != m_jnz) || curins-\u003er.t != mop_n)\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 3 of 19\n\n5\r\n6\r\n7\r\n8\r\n9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\nreturn 0;\r\n// ... against our comparison variable ...\r\nif (!equal_mops_ignore_size(*m_CompareVar, curins-\u003el))\r\n{\r\n// ... or, if it's the dispatch block, possibly the assignment variable ...\r\nif (blk-\u003eserial != m_DispatchBlockNo || !equal_mops_ignore_size(*m_AssignVar, curins-\u003el))\r\nreturn 0;\r\n}\r\nint blockNo;\r\nswitch (curins-\u003eopcode)\r\n{\r\ncase m_jz:\r\n// ... and the destination of the jz must be a block\r\nif (curins-\u003ed.t != mop_b)\r\nreturn 0;\r\nblockNo = curins-\u003ed.b;\r\nbreak;\r\ncase m_jnz:\r\nblockNo = blk-\u003esucc(0);\r\nbreak;\r\n}\r\n .......\r\n}\r\nNext, I moved on to mw_integrity_checks() . Ignoring the opaque predicates, the major problem here is the optimization\r\nblocks . As described before, they do not create a direct path to the dispatcher since there are a massive amount of flattened\r\nblocks pointing to the opt block. My initial solution was based on the assumption that all of these optimization blocks only\r\ncontained the instruction which moved the numeric value from whatever variable it was being stored in into the correct\r\ndispatcher variable. I treated each optimization block as its own dispatcher, noted down the variable it was using and iterated\r\nits predecessors, pointing them to the correct blocks.\r\n1\r\n2\r\n3\r\n4\r\n5\r\n6\r\n7\r\n8\r\n9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\nstruct opt_block_info\r\n{\r\nint block_num;\r\nmop_t op; // The operand that is being mov'd into first before the dispatcher variable\r\n};\r\nfor (auto opt_block : cfi.opt_blocks)\r\n{\r\n// For optimization\r\nfor (auto opt_pred : mba-\u003eget_mblock(opt_block.blockNum)-\u003epredset)\r\n{\r\nunflatten(opt_pred, opt_block.op, ...);\r\n}\r\n...\r\n}\r\nThe function then could be decompiled in unflattened form without error. However, the assumption I made turned out to be\r\nincorrect. In less common cases, other functions in the binary had other code besides the single mov in the optimization\r\nblock. By doing what I did, I was losing vital code for the function. Not knowing this at the time, I continued on.\r\nOptimization Woes\r\nThe next couple functions mw_display_crypt_warning() and mw_send_empty_get_req() seemed to unflatten fine with the\r\ncode I had. That means the next function on the list was the third call in the WinMain() , mw_main_routine() . Upon\r\ndecompiling the function, I was greeted with over 3000 lines of obfuscated code.\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 4 of 19\n\nAdditionally, when I tried to decompile the function with my plugin enabled it failed to detect the dispatcher variable being\r\nused. I started debugging to find the problem, and the first thing I did was check the assembly view in IDA to see if there\r\nwas anything peculiar.\r\nLooking at the first block of the function, we can see that the stack base ebp is being moved into the register esi . Then, a\r\nvariable at base of stack + 4 is being written to. The difference is, in our previous functions, ebp was used directly.\r\nHere it is not. I assumed this wouldn’t be a problem because Hex-Rays optimization should recognize that esi + 4 is a\r\nstack variable. This turned out to be incorrect. Rolf’s original project includes a microcode explorer (like the unflattening\r\nplugin, the first of its kind) which can be used to view the microcode as a graph. I opened the microcode explorer and\r\nbrowsed the microcode maturity level I was operating at, MMAT_LOCOPT .\r\nThat is when the issue started to become a bit more clear:\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 5 of 19\n\nThe mov instructions that accessed the stack through esi were not being recognized as stack variables by Hex-Rays.\r\nForward propagation had not been performed, so the instructions were either ldx or stx instead of mov . This was a\r\nmajor problem, because my code was looking specifically for mov instructions. I thought of a few ways to fix this problem.\r\nThe first was to implement a second handling which would look for ldx and stx instructions in the case the optimization\r\nwas not applied. I did not like this idea at all, as it would bloat the code significantly. Another idea I had was to see if I could\r\noperate at a later maturity level like MMAT_CALLS . After looking into this idea, I noticed it introduced another optimization-related problem.\r\n1\r\n2\r\n3\r\n4\r\n5\r\n6\r\n7\r\n8\r\n9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n; 1WAY-BLOCK 1 INBOUNDS: 0 OUTBOUNDS: 2 [START=42A9CE END=42AA08] MINREFS: STK=4C/ARG=150, MAXBSP: 3C\r\n; USE: sp+1C.4,(GLBLOW,GLBHIGH)\r\n; DEF: eax.4,esi.4,sp+14.4,sp+20.8,(cf.1,zf.1,sf.1,of.1,pf.1,edx.4,ecx.4,fps.2,fl.1,c0.1,c2.1,c3.1,df.1,if.1,GLBLOW,sp+C.4,G\r\n; DNU: eax.4,esi.4,sp+14.4,sp+20.8\r\nmov call $GetSystemMetrics\u003cstd:\"int nIndex\" #0.4\u003e.4, %var_2C.4{1} ; 42A9DF u=(GLBLOW,GLBHIGH) d=sp+20.4,(cf.1,zf.1,sf.1,o\r\nmov call $GetSystemMetrics\u003cstd:\"int nIndex\" #1.4\u003e.4, %cy.4{2} ; 42A9E7 u=(GLBLOW,GLBHIGH) d=sp+24.4,(cf.1,zf.1,sf.1,of.1,\r\nmov #0xB503DEBC.4, %var_38.4 ; 42A9EB u= d=sp+14.4\r\nmov #0xB503DEBC.4, eax.4 ; 42A9F3 u= d=eax.4\r\nmov %var_30.4{3}, esi.4{3} ; 42A9F8 u=sp+1C.4 d=esi.4\r\n; 2WAY-BLOCK 2 INBOUNDS: 1 6 9 13 16 18 24 25 27 OUTBOUNDS: 3 10 [START=42AA08 END=42AA0F] MINREFS: STK=2C/ARG=150, MAXBSP:\r\n; USE: eax.4\r\n; VALRANGES: eax.4:(==46BC6480|==7B6E7A9B|==97CD0040|==99ED4E33|==B503DEBC), %0x14.4:==B503DEBC\r\njg eax.4, #0xC4B3E928.4, @10 ; 42AA0D u=eax.4\r\n; 2WAY-BLOCK 3 INBOUNDS: 2 OUTBOUNDS: 4 14 [START=42AA0F END=42AA16] MINREFS: STK=2C/ARG=150, MAXBSP: 10\r\n; USE: eax.4\r\n; VALRANGES: eax.4:(==97CD0040|==99ED4E33|==B503DEBC), %0x14.4:==B503DEBC\r\njle eax.4, #0xAAEC8E2C.4, @14 ; 42AA14 u=eax.4\r\n; 1WAY-BLOCK 4 INBOUNDS: 3 OUTBOUNDS: 5 [START=42AA16 END=42AA21] MINREFS: STK=2C/ARG=150, MAXBSP: 10\r\n; VALRANGES: eax.4:==B503DEBC, %0x14.4:==B503DEBC\r\n; 1WAY-BLOCK 5 INBOUNDS: 4 OUTBOUNDS: 27 [START=42AA21 END=42AA2C] MINREFS: STK=2C/ARG=150, MAXBSP: 10\r\n; VALRANGES: eax.4:==B503DEBC, %0x14.4:==B503DEBC\r\ngoto @27 ; 42AA26 u=\r\n; 2WAY-BLOCK 6 OUTBOUNDS: 7 2 [START=42AA2C END=42AA33] MINREFS: STK=2C/ARG=150, MAXBSP: 10\r\n; USE: eax.4\r\njnz eax.4, #0xC0031B52.4, @2 ; 42AA31 u=eax.4\r\n; 2WAY-BLOCK 7 INBOUNDS: 6 OUTBOUNDS: 8 9 [START=42AA33 END=42AA4B] MINREFS: STK=2C/ARG=150, MAXBSP: 10\r\n; USE: esi.4,(GLBLOW,sp+2C..,GLBHIGH)\r\n; DEF: eax.4,esi.4,(cf.1,zf.1,sf.1,of.1,pf.1,edx.4,ecx.4,fps.2,fl.1,c0.1,c2.1,c3.1,df.1,if.1,GLBLOW,sp+2C..,GLBHIGH)\r\n; DNU: eax.4\r\nsub (esi.4 \u003e\u003el #0xF.1), #0x72.4, esi.4{4} ; 42AA36 u=esi.4 d=esi.4\r\ncall $sub_42A9CE \u003ccdecl:\u003e.0 ; 42AA39 u=(GLBLOW,sp+2C..,GLBHIGH) d=(cf.1,zf.1,sf.1,of.1,pf.1,rax.8,ecx.4,fps.2,fl.1,c0.1,c\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 6 of 19\n\n38\r\n39\r\nmov #0xAAEC8E2D.4, eax.4 ; 42AA3E u= d=eax.4\r\njae esi.4{4}, #0x3800.4, @9 ; 42AA49 u=esi.4\r\nBlock 4 is now completely empty and block 5 has been changed to a goto. This is how it looked before in MMAT_LOCOPT\r\n1\r\n2\r\n3\r\n4\r\n5\r\n6\r\n7\r\n8\r\n9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n; 2WAY-BLOCK 6 INBOUNDS: 5 OUTBOUNDS: 7 23 [START=42AA16 END=42AA21] MINREFS: STK=0/ARG=50, MAXBSP: 10\r\n; USE: eax.4\r\n; DEF: cf.1,zf.1,sf.1,of.1,pf.1\r\n; DNU: cf.1,zf.1,sf.1,of.1,pf.1\r\nsetb eax.4, #0xAAEC8E2D.4, cf.1 ; 42AA16 u=eax.4 d=cf.1\r\nseto eax.4, #0xAAEC8E2D.4, of.1 ; 42AA16 u=eax.4 d=of.1\r\nsetz (eax.4+#0x551371D3.4), #0.4, zf.1 ; 42AA16 u=eax.4 d=zf.1\r\nsetp (eax.4+#0x551371D3.4), #0.4, pf.1 ; 42AA16 u=eax.4 d=pf.1\r\nsets (eax.4+#0x551371D3.4), sf.1 ; 42AA16 u=eax.4 d=sf.1\r\njz eax.4, #0xAAEC8E2D.4, @23 ; 42AA1B u=eax.4\r\n; 2WAY-BLOCK 7 INBOUNDS: 6 OUTBOUNDS: 8 34 [START=42AA21 END=42AA2C] MINREFS: STK=0/ARG=50, MAXBSP: 10\r\n; USE: eax.4\r\n; DEF: cf.1,zf.1,sf.1,of.1,pf.1\r\n; DNU: cf.1,zf.1,sf.1,of.1,pf.1\r\nsetb eax.4, #0xB503DEBC.4, cf.1 ; 42AA21 u=eax.4 d=cf.1\r\nseto eax.4, #0xB503DEBC.4, of.1 ; 42AA21 u=eax.4 d=of.1\r\nsetz (eax.4+#0x4AFC2144.4), #0.4, zf.1 ; 42AA21 u=eax.4 d=zf.1\r\nsetp (eax.4+#0x4AFC2144.4), #0.4, pf.1 ; 42AA21 u=eax.4 d=pf.1\r\nsets (eax.4+#0x4AFC2144.4), sf.1 ; 42AA21 u=eax.4 d=sf.1\r\njz eax.4, #0xB503DEBC.4, @34 ; 42AA26 u=eax.4\r\n; 2WAY-BLOCK 8 INBOUNDS: 7 OUTBOUNDS: 9 4 [START=42AA2C END=42AA33] MINREFS: STK=0/ARG=50, MAXBSP: 10\r\n; USE: eax.4\r\n; DEF: cf.1,zf.1,sf.1,of.1,pf.1\r\n; DNU: cf.1,zf.1,sf.1,of.1,pf.1\r\nsetb eax.4, #0xC0031B52.4, cf.1 ; 42AA2C u=eax.4 d=cf.1\r\nseto eax.4, #0xC0031B52.4, of.1 ; 42AA2C u=eax.4 d=of.1\r\nsetz (eax.4+#0x3FFCE4AE.4), #0.4, zf.1 ; 42AA2C u=eax.4 d=zf.1\r\nsetp (eax.4+#0x3FFCE4AE.4), #0.4, pf.1 ; 42AA2C u=eax.4 d=pf.1\r\nsets (eax.4+#0x3FFCE4AE.4), sf.1 ; 42AA2C u=eax.4 d=sf.1\r\njnz eax.4, #0xC0031B52.4, @4 ; 42AA31 u=eax.4\r\nI decided to consult Rolf Rolles himself to find out what was going on. He explained that this is the result of an optimization\r\ntechnique called value range optimization, and it is possible to disable this technique in the Hex-Rays decompiler settings.\r\nThis meant that my options were either to implement bloated code to handle ldx / stx instructions, or operate in\r\nMMAT_CALLS with the requirement that the user of my plugin start screwing around with the decompiler settings every time\r\nthey want to use it. Both of those options sounded pretty awful, so a third option was devised: Continue to operate in\r\nMMAT_LOCOPT , but implement code which fixes all the ldx / stx instruction to be mov with the proper stack variable.\r\nThe way I implemented this was to first see if forward propagation was not performed by checking if ldx/stx instructions\r\nare being used to access the dispatcher variable. In that case, we note down the register that the stack base was moved into\r\nand iterate all instructions which read or write from there, change the opcode to mov and finally create a new\r\nstkvar_ref_t .\r\nImplementing this was initially confusing, because I noticed that my microcode dump from the code was different than the\r\ndump in the microcode explorer:\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 7 of 19\n\nAccording to Rolf, the reason for this was\r\nWhen you register a block optimizer and check for MMAT_LOCOPT, you’re not getting called exactly at\r\nMMAT_LOCOPT, you’re getting called somewhere afterwards, between then and the next maturity level\r\nMMAT_CALLS. and you might get called more than once, so you might get called in the middle of this analysis\r\nthat is transforming the stx and ldx into mov instructions.\r\nThis means that the microcode explorer and the code, despite being written to dump at the same stage, did not actually do\r\nso. The microcode explorer dump was at a slightly earlier stage of MMAT_LOCOPT that I was unable to operate at, hence why\r\nthe code on the left has forwarded-propagated the mov in the first block (but not the second). After this incident, I mostly\r\nused microcode explorer strictly for initial analysis and relied on the dumps from the block optimizer callback for\r\ndebugging.\r\nThe good news was that the idea worked. After trying again to decompile with the plugin enabled, all of the ldx / stx\r\ninstructions were fixed and the dispatcher variable was found. There was another issue, however, and this one turned out to\r\nthe most difficult and time-consuming for me to fix. I will use separate smaller function which has the same problem to\r\nexplain.\r\nComplex Branches\r\nThis is how the function looks after my final deobfuscator code has been ran:\r\nThe above block of code calls GetAdaptersInfo() and then proceeds to check if the function returned a buffer overflow\r\nerror or returned an empty size and exits if either condition is satisified. Now let’s look at the control flow graph of the\r\ngenerated assembly:\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 8 of 19\n\nSince block #4 can have two possible values for eax ( 0xC92049E7 or 0xCDFAE9E1 ), that means there is no direct path to\r\nthe dispatcher! This example is also a small one. There are cases where there are multiple checks being performed in the\r\nif block which can translate to even bigger chains of blocks such as this function:\r\nThat translates to the following assembly:\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 9 of 19\n\nI call these cases complex branches , and I believe they’re mentioned in the D-810 writeup. So, how did I fix this problem\r\nas you can see in the screenshots of the decompiled code? Since the problem is due to multiple incoming blocks causing\r\nthere to be no direct path to the dispatcher, I decided that all blocks on the path from cluster head to the dispatcher must\r\nonly have a single predecessor. By this time, I had also noticed the problem from earlier regarding the optimization\r\nblocks which was also due to not having a direct dispatcher path. I was about to kill two birds with one stone.\r\nTo accomplish my goal of ensuring each block only has a single predecessor, I first iterate all of the dispatcher block’s\r\npredecessors. Then, if the dispatcher block predecessor we are looking at has more than two of it’s own predecessors (I’m\r\ngoing to refer to these as sub-preds), I iterate those until we reach number of sub-preds - 2 . The reason I stop at count -\r\n2 is because if the predecessor happens to have a fallthrough block, it will always be the last one located at count - 1 .\r\nAdditionally, if the dispatcher block predecessor is conditional, we also don’t want to remove the other branch which would\r\nbe located at count - 2 . For each iteration, I first check if the sub-pred is the child of a conditional block which is also\r\npointing to the same dispatcher predecessor. If so, I point both the conditional parent block and the sub-pred to a new copy\r\nof the dispatcher block predecessor that I insert into the graph. Otherwise, I just point the sub-pred to the block and don’t\r\ntouch the parent block. You can see this process illustrated below in the function we looked at that accesses the adapters:\r\nBefore:\r\n1\r\n2\r\n3\r\n4\r\n5\r\n6\r\n7\r\n8\r\n9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n; 1WAY-BLOCK 27 INBOUNDS: 26 OUTBOUNDS: 231 [START=420324 END=420329] MINREFS: STK=0/ARG=F8, MAXBSP: 84\r\ngoto @231 ; 420324 u=\r\n....\r\n; 2WAY-BLOCK 229 INBOUNDS: 150 OUTBOUNDS: 230 231 [START=420EEB END=420F03] MINREFS: STK=0/ARG=F8, MAXBSP: 84\r\n; USE: sp+DC.4\r\n; DEF: cf.1,zf.1,sf.1,of.1,pf.1,eax.4,ecx.4,sp+DC.4\r\n; DNU: cf.1,zf.1,sf.1,of.1,pf.1,eax.4\r\nmul #0x1920000.4, (%var_14.4 \u003e\u003el #7.1), ecx.4 ; 420EF1 split4 u=sp+DC.4 d=ecx.4\r\nmov ecx.4, %var_14.4 ; 420EF7 u=ecx.4 d=sp+DC.4\r\nmov #0x559A1BB7.4, eax.4 ; 420EFA u= d=eax.4\r\nmov #0.1, cf.1 ; 420EFF u= d=cf.1\r\nmov #0.1, of.1 ; 420EFF u= d=of.1\r\nsetz ecx.4, #0.4, zf.1 ; 420EFF u=ecx.4 d=zf.1\r\nsetp ecx.4, #0.4, pf.1 ; 420EFF u=ecx.4 d=pf.1\r\nsets ecx.4, sf.1 ; 420EFF u=ecx.4 d=sf.1\r\njz ecx.4, #0.4, @231 ; 420F01 u=ecx.4\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 10 of 19\n\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n; 1WAY-BLOCK 230 INBOUNDS: 229 OUTBOUNDS: 231 [START=420F03 END=420F08] MINREFS: STK=0/ARG=F8, MAXBSP: 84\r\n; DEF: eax.4\r\n; DNU: eax.4\r\nmov #0x937A5D68.4, eax.4 ; 420F03 u= d=eax.4\r\n; 1WAY-BLOCK 231 INBOUNDS: 27 ........ 229 230 OUTBOUNDS: 2 [START=420F08 END=420F10] MINREFS: STK=0/ARG=F8, MAXBSP: 84\r\n; USE: eax.4\r\n; DEF: sp+E0.4\r\nmov eax.4, %var_10.4 ; 420F08 u=eax.4 d=sp+E0.4\r\ngoto @2 ; 420F0B u=\r\nAfter\r\n1\r\n2\r\n3\r\n4\r\n5\r\n6\r\n7\r\n8\r\n9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n; 1WAY-BLOCK 27 INBOUNDS: 26 OUTBOUNDS: 232 [START=420324 END=420329] MINREFS: STK=0/ARG=F8, MAXBSP: 84\r\ngoto @232 ; 420324 u=\r\n....\r\n; 2WAY-BLOCK 229 INBOUNDS: 150 OUTBOUNDS: 230 231 [START=420EEB END=420F03] MINREFS: STK=0/ARG=F8, MAXBSP: 84\r\n; USE: sp+DC.4\r\n; DEF: cf.1,zf.1,sf.1,of.1,pf.1,eax.4,ecx.4,sp+DC.4\r\n; DNU: cf.1,zf.1,sf.1,of.1,pf.1,eax.4\r\nmul #0x1920000.4, (%var_14.4 \u003e\u003el #7.1), ecx.4 ; 420EF1 split4 u=sp+DC.4 d=ecx.4\r\nmov ecx.4, %var_14.4 ; 420EF7 u=ecx.4 d=sp+DC.4\r\nmov #0x559A1BB7.4, eax.4 ; 420EFA u= d=eax.4\r\nmov #0.1, cf.1 ; 420EFF u= d=cf.1\r\nmov #0.1, of.1 ; 420EFF u= d=of.1\r\nsetz ecx.4, #0.4, zf.1 ; 420EFF u=ecx.4 d=zf.1\r\nsetp ecx.4, #0.4, pf.1 ; 420EFF u=ecx.4 d=pf.1\r\nsets ecx.4, sf.1 ; 420EFF u=ecx.4 d=sf.1\r\njz ecx.4, #0.4, @231 ; 420F01 u=ecx.4\r\n; 1WAY-BLOCK 230 INBOUNDS: 229 OUTBOUNDS: 231 [START=420F03 END=420F08] MINREFS: STK=0/ARG=F8, MAXBSP: 84\r\n; DEF: eax.4\r\n; DNU: eax.4\r\nmov #0x937A5D68.4, eax.4 ; 420F03 u= d=eax.4\r\n; 1WAY-BLOCK 231 INBOUNDS: 229 230 OUTBOUNDS: 2 [START=420F08 END=420F10] MINREFS: STK=0/ARG=F8, MAXBSP: 84\r\n; USE: eax.4\r\n; DEF: sp+E0.4\r\nmov eax.4, %var_10.4 ; 420F08 u=eax.4 d=sp+E0.4\r\ngoto @2 ; 420F0B u=\r\n; 1WAY-BLOCK 232 INBOUNDS: 27 OUTBOUNDS: 2 [START=420F08 END=420F10] MINREFS: STK=0/ARG=F8, MAXBSP: 84\r\n; USE: eax.4\r\n; DEF: sp+E0.4\r\nmov eax.4, %var_10.4 ; 420F08 u=eax.4 d=sp+E0.4\r\ngoto @2 ; 420F0B u=\r\nTake a look at the after dump and you can see that the optimization block issue has been solved. Block #27 now\r\npoints to a copy of the optimization block that has no other incoming blocks, so it would be safe to patch the goto @2\r\ninstruction to its correct location later on. After all dispatcher predecessors have been confirmed to have no more than two\r\npredecessors, we can handle the complex branches. To do so, I again iterate the dispatcher predecessors. This time, I create a\r\n‘trace’ for each predecessor. The way I implemented tracing was to do a depth-first search from the dispatcher predecessor\r\nall the way up to the clusterhead. This will find every possible path to the clusterhead. I determine if we’ve hit the\r\nclusterhead by checking if its one of those jz/jnz comparison blocks from the beginning of our analysis. If so, the trace\r\nreturns.\r\nOnce this is done, I call another function which takes the trace information and turns it into a basic integer array of all the\r\npossible paths we found using the blocks’ serials.\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 11 of 19\n\n1\r\n2\r\n3\r\n4\r\n255 84 82 81\r\n255 84 83 82 81\r\n255 85 84 82 81\r\n255 85 84 83 82 81\r\nFor reference, here is a control flow graph of the microcode after all dispatcher predecessors have been ensured to have no\r\nmore than two predecessors. You can follow along yourself and see each possible path from the list above!\r\nNow that I have the trace, there are a few cases we need to handle. In the case above, there are four possible paths. When\r\nthis happens, I keep only the bottom two paths. Next, I check where the paths diverge from eachother. In the case of the\r\nbottom two paths, the divergence occurs at index 3. I then copy the blocks from below the divergence of the last path. This\r\nwould mean that blocks 83 , 84 , 85 and 255 are copied. I patch block 83 to be a jump to the newly copied block\r\npath.\r\nHere is how the control flow graph looks after the modifications.\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 12 of 19\n\nAs you can see, the divergence issue has been fixed. Now, just two more passes are required to handle the two predecessors\r\nof blocks 274 and 255 respectively. Below is the final product with no path issues, which is ready to be unflattened!:\r\nFor complex branches that have a few different checks inside of the if block, it will take subsequent passes of the loop to\r\ncomplete, which is why the loop does not exit until all blocks have a direct path to the dispatcher.\r\nThere are a few other cases we need to handle though. Firstly, sometimes we’ll get three possible paths such as here:\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 13 of 19\n\n1\r\n2\r\n3\r\n421 419 418 258\r\n421 420 199 198 197 196\r\n421 420 419 418 258\r\nWhat do we do in that case? Well, by taking a look at the paths above it is apparent that one path looks very different from\r\nthe others. That would be the middle path, and the first immediate indicator is that the end block is different ( 196 vs\r\n258 ). When this happens, I copy the entire block range that the oddball path takes ( 421 , 420 , 199 , 198 , 197 ) and\r\npoint the first block ( 196 ) to that instead. That ensures we can safely work on the remaining two paths.\r\nSecondly, what if we end up with two paths that start with different blocks? Here are some examples I dealt with in\r\nmw_main_routine :\r\n1\r\n2\r\n1276 359 358 357\r\n1276 359 358 457\r\nor\r\nor\r\n1\r\n2\r\n628 627 457\r\n628 627 626\r\nWhen this happens, I ensure that the path I modify is not the fallthrough branch of a conditional block. These would be\r\n1276 359 358 357 , 442 441 , or 628 627 626 above respectively. Since we can’t copy the path down from the\r\ndivergence and point the conditional block there, we simply take the other available path in the path list and modify that one\r\ninstead.\r\nConditional Starts\r\nNow, after all this are we finally ready to get to the actual unflattening part? NOPE! There is yet another problem we need to\r\ndeal with.\r\nTake a look at this function:\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 14 of 19\n\nThe variable v3 is the variable which controls the control flow. However, a variable v2 is being set at the beginning\r\ndepending if the argument arg_api_hash is empty. v2 is not being used until later on, when v3 takes its value. I had\r\nnever seen any other control flow unflattening writeup that ran into such an issue, and I came up with my own solution. Here\r\nis my idea:\r\n1) Identify the two possible conditions and the register (or stack variable in some cases) that the constants are moved into (in\r\nthis case EDI):\r\n2) Replace each constant with a 1 or 0 (representing true/false)\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 15 of 19\n\n3) Where the actual final fix for the conditional start happens is during the actual unflattening stage when we are iterating\r\neach dispatcher predecessor and see this:\r\nWhen we find an assignment to ebx (the variable the dispatcher uses) from the variable that was saved at the beginning\r\nwhen we noticed there was a conditional start ( edi ), we change the affected block to a branch by making it a jz . This\r\njz will check if edi is 0 or not (remember we switched the mov instructions at the beginning to move 1 or 0 instead of the\r\nblock assignment number?)\r\nThen, it will jump to whichever case is bound to 0 or 1. Here is how it looks after:\r\nThis plan seemed like a solid idea. The problem is that I seemed to possibly be encountering a Hex-Rays bug, particularly an\r\nincorrect optimization that prevented the decompilation from being correct. I contacted them about this issue and they\r\nacknowledged it, but I have not heard back. This would not be the first time I ended up breaking Hex-Rays while working\r\non this project, as I identified another incorrect optimzation prior to this which I contacted them about and that they actually\r\ndid fix for IDA 9.0.\r\nUnflattening…finally!\r\nNow that we have addressed all possible issues preventing us from unflattening, it is time to get it done. I do need to\r\nmention as well that I completely redid the DeferredGraphModifier class from the original plugin. Now, it uses a queue\r\nand supports many different operations which can be done to the graph:\r\n1\r\n2\r\n3\r\n4\r\n5\r\nenum graph_modification_type\r\n{\r\nblock_target_change,\r\nblock_fallthrough_change,\r\nblock_nop_insns,\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 16 of 19\n\n6\r\n7\r\n8\r\nnew_block_insertion,\r\nblock_convert_to_branch\r\n};\r\nI strongly suggest doing this and queueing up all your graph modifications. I made the mistake of trying to do a lot of graph\r\nchanges during the loop iteration which caused lots of problems and wasted time.\r\nThe way I did unflattening was to trace the writes to the dispatcher variable until we got to a numeric constant. Then, I\r\nwould remove the entire chain of write instructions later on. An important thing I did was to defer my instruction removals.\r\nWhat I mean by this, is after we patch the goto’s at the end of each flattened block to go to the proper destination, we want\r\nto remove the instruction which moves that numeric value into the dispatcher control variable, right? Well, what if we have a\r\nbranch that sets the control flow variable? This means if conditional block a branches to either block b or c and we are\r\ncurrently unflattening b and trace the writes into a and remove the instruction, then by the time we get to block c we\r\nhave no idea what value it should be using. All instructions that need to be removed now get removed AFTER the\r\nunflattening process is done.\r\nThe end result will be a completely unflattened graph, right? I revisited the main function excited to see the end result.\r\n…and was greeted with disappointment. I spent so much time on the unflattening, I had forgotten all about the opaque\r\npredicates from the beginning! They were still there, and causing all sorts of problems. Determined to get a clean decompiler\r\noutput once and for all, I set my sights on removing them.\r\nOpaque Predicates / Junk Code\r\nI ran into several issues trying to remove the opaque predicates. Firstly, legitimate instructions get interwoven with the\r\nopaque predicates / junk code. Based off of the few functions I had looked at when I originally tried to remove them, it\r\nsufficed to simply erase the entire affected block by filling it with nops. I later found out the consequences of doing this, so I\r\nhad to change my strategy.\r\n1\r\n2\r\n3\r\n4\r\n44. 0 xdu [ds.2:(%var_B4.4{85}+#3.4)].1, %var_4C.4{68} ; 4332CA u=ds.2,sp+18.4,(GLBLOW,GLBHIGH) d=sp+80.4 \u003c--------------\r\n44. 1 mul (%var_BC.4{71}-#0xFA.4){70}, (%var_BC.4{71}-#0xFA.4){70}, eax.4{72} ; 433336 split4 u=sp+10.4 d=eax.4\r\n44. 2 sub (((#0x8A.4*%var_BC.4{71})-#0xC210.4) \u003e\u003el #0xD.1), #0x20.4, %var_BC.4{73} ; 433352 u=sp+10.4 d=sp+10.4\r\n44. 3 jnz eax.4{72}, ((#0x139F.4*eax.4{72})-#1.4), @67 ; 43335D inverted_jx u=eax.4\r\nThe above is a screenshot from MMAT_CALLS showing how legitimate instruction can appear in blocks that have junk\r\ninstructions. You may be wondering why I’m suddenly at MMAT_CALLS after spending the entirety of this project working at\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 17 of 19\n\nMMAT_LOCOPT . Well, I was originally trying to remove the opaques/junk at MMAT_LOCOPT and was running some issues. See\r\nhow each instruction, despite containing a ton of operations only is technically one line? During MMAT_LOCOPT , Hex-Rays\r\nhad not performed any optimizations on the instructions and I was getting all sorts of different instruction patterns which\r\nbroke my signature.\r\n1\r\n2\r\n3\r\n4\r\n5\r\n6\r\n7\r\n8\r\n9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n226. 0 ; 2WAY-BLOCK 226 INBOUNDS: 131 OUTBOUNDS: 227 228 [START=43332C END=43335F] MINREFS: STK=0/ARG=D0, MAXBSP: 8\r\n226. 0 ; USE: sp+10.4\r\n226. 0 ; DEF: cf.1,zf.1,sf.1,of.1,pf.1,rax.8,ecx.4,sp+10.4\r\n226. 0 ; DNU: cf.1,zf.1,sf.1,of.1,pf.1,ecx.4\r\n226. 0 mul (%var_BC.4-#0xFA.4), (%var_BC.4-#0xFA.4), eax.4 ; 433336 split4 u=sp+10.4 d=eax.4\r\n226. 1 sub (#0x139F.4*eax.4), #1.4, edx.4 ; 43334B u=eax.4 d=edx.4\r\n226. 2 sub (((#0x8A.4*%var_BC.4)-#0xC210.4) \u003e\u003el #0xD.1), #0x20.4, %var_BC.4 ; 433352 u=sp+10.4 d=sp+10.4\r\n226. 3 mov #0xB2BA095E.4, ecx.4 ; 433356 u= d=ecx.4\r\n226. 4 setb eax.4, edx.4, cf.1 ; 43335B u=rax.8 d=cf.1\r\n226. 5 seto eax.4, edx.4, of.1 ; 43335B u=rax.8 d=of.1\r\n226. 6 setz eax.4, edx.4, zf.1 ; 43335B u=rax.8 d=zf.1\r\n226. 7 setp eax.4, edx.4, pf.1 ; 43335B u=rax.8 d=pf.1\r\n226. 8 sets (eax.4-edx.4), sf.1 ; 43335B u=rax.8 d=sf.1\r\n226. 9 jz eax.4, edx.4, @228 ; 43335D u=rax.8\r\n226. 9\r\nI decided to keep everything related to the unflattening in MMAT_LOCOPT , and wait until MMAT_CALLS to begin the opaque\r\npredicate removal. Thus, my plugin operates at two separate microcode maturities. While at the MMAT_CALLS stage, I was\r\nable to create a working signature to detect and find opaque predicate blocks.\r\nThe way this worked was to look for conditional blocks which multipled by a 16 bit constant and then subtracted the\r\nconstant 1 . I store all stack variables that were accessed by these blocks and kept track of their count every time another\r\nblock was detected. I patch each opaque predicate to remove its fake branch depending on if it was jz or jnz . Then, I\r\nused the most common stack variable stored from earlier to find other leftover junk instructions that got interwoven with\r\nother blocks and remove them.\r\nAnd what is the end result?\r\nFully deobfuscated code! Our largest function in the binary went from over 3000 lines of code to 429.\r\nThe great thing about the deobfuscator is that since it works on subsequent versions (until they added control flow\r\nindirection), we can easily track the evolution of the malware. For example, the same large function is actually even larger in\r\na different Lumma binary I found at over 5000 lines of code. In this version, the developers added string encryption as well\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 18 of 19\n\nas implemented the option to choose between two possible execution methods: LoadLibraryW and rundll32 for DLLs.\r\nThis feature is missing in the previous version as seen below:\r\nJourney’s End\r\nIn the end, I ended up deobfuscating probably around 50 opaque’d and flattened functions. This project was one of the\r\nhardest yet most rewarding I’ve ever done. There were many times I felt like I was trying to do something that couldn’t be\r\naccomplished, but I was relentless and would not give up. I have to give a huge thanks to Rolf Rolles for not only\r\ncreating the original plugin project that this was based off of, but also being kind enough to answer my questions\r\nabout Hex-Rays internals. Without his incredible knowledge of the microcode API, I don’t know if I’d ever have been\r\nable to finish this project.\r\nThe next addition to this project would probably be a microcode emulator. Lumma Stealer didn’t require one, but there are\r\nother malware families which likely would. Another feature I’d be interested in implementing is a profile system, maybe\r\nsimilar to what D-810 has although I have not looked at it in detail. All in all, there are endless possibilities for this project\r\nand it will surely come in use for analyzing obfuscated samples in the future. I hope you enjoyed reading this writeup as\r\nmuch as I did writing it, and I can’t wait to publish more in the future!\r\nLumma Stealer Sample SHA256: 00F1A9C6185B346F8FDF03E7928FACFC44FC63E6A847EB21FA0ECD7FB94BB7E3\r\nLumma Stealer Sample #2 SHA256:\r\nECABBEAE6218B373F2D3A473D9F6ADD4BA5482EA3B97851C931197FB8993F8EF\r\nSource: https://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nhttps://ryan-weil.github.io/posts/LUMMA-STEALER/\r\nPage 19 of 19",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"origins": [
		"web"
	],
	"references": [
		"https://ryan-weil.github.io/posts/LUMMA-STEALER/"
	],
	"report_names": [
		"LUMMA-STEALER"
	],
	"threat_actors": [],
	"ts_created_at": 1775434925,
	"ts_updated_at": 1775826704,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/4a0a71af45657599716252251a03b3e7008bceeb.pdf",
		"text": "https://archive.orkl.eu/4a0a71af45657599716252251a03b3e7008bceeb.txt",
		"img": "https://archive.orkl.eu/4a0a71af45657599716252251a03b3e7008bceeb.jpg"
	}
}