{
	"id": "1cb01881-3ee9-41d5-baef-e4279533efdb",
	"created_at": "2026-04-06T00:21:22.585213Z",
	"updated_at": "2026-04-10T13:12:52.302684Z",
	"deleted_at": null,
	"sha1_hash": "fe22a42ecbe7b533d933a44b645675e209020471",
	"title": "Resolving Stack Strings with Emulation | 0ffset Training Solutions",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 1549652,
	"plain_text": "Resolving Stack Strings with Emulation | 0ffset Training Solutions\r\nBy 0verfl0w_\r\nPublished: 2024-04-10 · Archived: 2026-04-05 14:01:03 UTC\r\nIt’s not uncommon to come across some kind of string encryption functionality within malware samples, often\r\nmore complex than a simple single-byte XOR operation which can often be brute-forced with simplicity.\r\nBy encrypting strings, malware authors are able to potentially lower the detection rate by anti-malware software,\r\nobscuring strings that may be identified as “malicious”, such as strings indicating malicious functionality, registry\r\nkeys or file paths linked to malicious activity, and so on. Additionally, encrypting strings can go a long way to\r\nslowing down malware analysts, by requiring them to decrypt the strings before properly analysing and building\r\nout detection rules for the sample.\r\nNow building out tools for string decryption from the analyst side can be quite simple once you’ve written a few –\r\nespecially if you’re decrypting the strings within a disassembler that provides an API:\r\nFind cross-references to the string decryption function\r\nFind the encrypted string pushed as a parameter to the function\r\nDecrypt the encrypted string within Python (replicate the algorithm first)\r\nAdd a comment, or overwrite the encrypted string\r\nThe difficulty increases when the algorithm is somewhat “polymorphic”, in that each instance of the decryption\r\nroutine is slightly different. Additionally, if the string is built dynamically compared to being hardcoded, it\r\nrequires the analyst to locate the pieces, and put them together prior to decryption.\r\nThis is what we’re going to be looking at in this post, and in a second follow-on part later on.\r\nWe’re going to be looking at a sample of malware that has a slightly different decryption routine for each string,\r\nand also builds the encrypted string on the stack during execution, prior to decryption. 
The specific sample is a\r\nversion of Conti ransomware.\r\nHowever, rather than build a string decryptor that relies on IDA Python, we’re going to be building a standalone\r\ndecryptor that uses the Capstone Disassembler and Unicorn Emulation Framework.\r\nWhy?\r\nWell, let’s take Emotet as an example. In the later iteration of the malware family, when it was still active, it stored\r\na list of roughly 64 C2 server IPs as obfuscated values, built on the stack prior to decryption. Each C2 server was\r\ndeobfuscated within its own function, and each function was added to an array. When Emotet needed to retrieve a\r\nC2, it would pick a random function from the array, and execute it to deobfuscate an IP. If the retrieved C2 was\r\nonline, Emotet would continue interacting with it. Otherwise, it would pick another one until it located an active\r\nserver.\r\nhttps://www.0ffset.net/reverse-engineering/capstone-resolving-stack-strings/\r\nPage 1 of 14\n\nThis meant a sandbox would only identify a few outgoing connections to IP addresses, rather than all 64 IPs being\r\nobserved. This is where manual extraction would be required, and in most cases you would need to emulate the 64\r\ndeobfuscation functions in order to avoid replicating all of them. Within something like IDA, you could leverage\r\nFLARE-EMU to do such a thing, while harnessing IDA’s disassembly/decompilation capabilities.\r\nThis is great for 1 or 2 samples, but if you want to extract C2 lists from 100+ Emotet samples, you’ll want to\r\navoid opening each in IDA, running a script, and taking the output. 
You could opt to run IDA without the UI\r\n(headless), but if:\r\nyou don’t own IDA Pro, or\r\nyou’re using a system that is running on an OS you don’t have an IDA license for, or\r\nyou want to avoid relying on external platforms\r\nyou’d want to use something else – ideally Capstone, and Unicorn.\r\nCapstone \u0026 Unicorn\r\nCapstone Disassembler:\r\nFrom Capstone’s website:\r\nCapstone is a lightweight multi-platform, multi-architecture disassembly framework.\r\nOur target is to make Capstone the ultimate disassembly engine for binary analysis and reversing in the security\r\ncommunity.\r\nhttps://www.capstone-engine.org/\r\nCapstone allows you to disassemble hex bytes to assembly representations – think IDA, but without all of the UI\r\nand nice additional features – Capstone just provides the raw information it identified based on the bytes you\r\npassed to it.\r\nUnicorn Emulation Framework:\r\nUnicorn on the other hand could be compared to a debugger, rather than a disassembler:\r\nUnicorn is a lightweight multi-platform, multi-architecture CPU emulator framework.\r\nhttps://www.unicorn-engine.org/\r\nIf you want to execute a snippet of assembly code within Python, Unicorn should be the go-to. Qiling is a good\r\noption, but is more focused on higher level emulation such as executing binary files:\r\nUnicorn is just a CPU emulator, so it focuses on emulating CPU instructions, that can understand emulator\r\nmemory. Beyond that, Unicorn is not aware of higher level concepts, such as dynamic libraries, system calls, I/O\r\nhandling or executable formats like PE, MachO or ELF. 
As a result, Unicorn can only emulate raw machine\r\ninstructions, without Operating System (OS) context\r\nQiling is designed as a higher level framework, that leverages Unicorn to emulate CPU instructions, but can\r\nunderstand OS: it has executable format loaders (for PE, MachO \u0026 ELF at the moment), dynamic linkers (so we\r\ncan load \u0026 relocate shared libraries), syscall \u0026 IO handlers. For this reason, Qiling can run executable binary\r\nwithout requiring its native OS\r\nhttps://github.com/qilingframework/qiling\r\nAdditionally, Qiling itself is built on Unicorn, so for full control over basic assembly Unicorn will do just fine.\r\nSo, let’s get started!\r\nAnalysis\r\nTL;DR\r\nSample Hash\r\nThe Script\r\nIdentifying the Encryption Routines\r\nThe first part of any string decryption automation process is of course identifying where the string decryption\r\noccurs. For stack string decryption this can be slightly more complex, as you can’t search through the data\r\nsections to find possibly encrypted strings and work backwards from there.\r\nInstead, we’ll need to walk through the code, searching for a lot of MOV instructions where data is being moved\r\ninto stack addresses.\r\nLuckily with Conti it’s very simple to locate this, as we can find it within the WinMain function.\r\nIf we decompile this block, it’s also quickly identifiable within the decompilation as a series of value assignment\r\noperations. 
IDA also attempts to parse some of the stack string bytes, so if each byte was an ASCII character, it\r\nwould likely display the entire string for you rather than breaking it up into what we see below.\r\nNow we’ve identified where a stack string is built, how do we find the decryption routine?\r\nAgain, this is simple enough in Conti as it is right below; however, if that wasn’t the case you’d want to search for\r\nXREFs within the function to the var_XX variable that the first/last byte in the stack string was moved into. In this\r\ncase, we’d look for var_53, and if it wasn’t referenced at all, we’d search for var_3A.\r\nLooking at the decryption code, it is pretty basic in that it is a single block with 9 main instructions excluding the\r\ninstructions to copy and overwrite bytes from the encrypted string, and the counter code.\r\n.text:0041DB70 mov al, [esp+esi+68h+var_53]\r\n.text:0041DB74 mov ecx, 31h ; '1'\r\n.text:0041DB79 movzx eax, al\r\n.text:0041DB7C sub ecx, eax\r\n.text:0041DB7E imul eax, ecx, 17h\r\n.text:0041DB81 cdq\r\n.text:0041DB82 idiv edi\r\n.text:0041DB84 lea eax, [edx+7Fh]\r\n.text:0041DB87 cdq\r\n.text:0041DB88 idiv edi\r\n.text:0041DB8A mov [esp+esi+68h+var_53], dl\r\n.text:0041DB8E inc esi\r\n.text:0041DB8F cmp esi, 1Ah\r\n.text:0041DB92 jb short loc_41DB70\r\nThe above can be summarised as follows:\r\nMove encrypted byte into AL\r\nSubtract encrypted byte value from 0x31\r\nMultiply the result of the above by 0x17, stored in EAX\r\nSigned division of EDX:EAX by the value in EDI\r\nAddition of remainder in EDX with 0x7F, stored in EAX\r\nSigned division of EDX:EAX by the value in EDI\r\nOverwrite encrypted byte with value in DL\r\nIncrement ESI\r\nCompare ESI against hardcoded string size\r\nWe can build the following snippet which we can execute in Python 
and see the output:\r\na = string[i]\r\nstring[i] = ((((0x31 - a) * 0x17) % 0x7F) + 0x7F) % 0x7F\r\nLooking at the IDA decompilation, it’s pretty much identical:\r\nfor ( i = 0; i \u003c 0x1A; ++i )\r\n v43[i] = (23 * (49 - (unsigned __int8)v43[i]) % 127 + 127) % 127;\r\nNow it would be great if we could just use this to decrypt all the strings, but if we have a look at another example,\r\nwe get the following decompilation:\r\nfor ( i = 0; i \u003c 0x1A; ++i )\r\n v13[i] = (24 * (78 - (unsigned __int8)v13[i]) % 127 + 127) % 127;\r\nAs you can see 23 is now 24, and 49 is now 78.\r\nOk, so what if we simply extract those values? We can write a regex that will set those integers as wildcards, and\r\nwe can then capture those and use them to modify the algorithm?\r\nIf we take a look at the assembly code for this snippet you may notice a difference or two:\r\n.text:00420FA1 mov al, [ebp+esi+var_24B]\r\n.text:00420FA8 mov ecx, 4Eh ; 'N'\r\n.text:00420FAD movzx eax, al\r\n.text:00420FB0 sub ecx, eax\r\n.text:00420FB2 lea eax, [ecx+ecx*2]\r\n.text:00420FB5 shl eax, 3\r\n.text:00420FB8 cdq\r\n.text:00420FB9 idiv ebx\r\n.text:00420FBB lea eax, [edx+7Fh]\r\n.text:00420FBE cdq\r\n.text:00420FBF idiv ebx\r\n.text:00420FC1 mov [ebp+esi+var_24B], dl\r\n.text:00420FC8 inc esi\r\n.text:00420FC9 cmp esi, 1Ah\r\n.text:00420FCC jb short loc_420FA1\r\n.text:00420FCE mov ebx, [ebp+var_250]\r\nInstead of an IMUL operation, we have a LEA instruction followed by a shift-left – so where did the 24 come from?\r\nWell, IDA basically simplified the algorithm, converting the LEA \u0026 SHL operations to an imul of 24: the LEA computes ECX * 3, and shifting left by 3 multiplies by 8, giving 3 * 8 = 24. 
The issue\r\nwe’ve got is that we now have to handle different methods of adding, subtracting, multiplying, etc.,\r\nwhich is why we’re going to be using Unicorn for emulation, rather than completely relying on Capstone to\r\nstatically disassemble.\r\nNow there is one last thing to take note of – Conti also has stack strings that are passed to a function and\r\ndecrypted; we won’t be looking at these in this part, but will be in the next part. The reason is that we will need\r\nto find cross-references for decryption functions, and then step back to find the stack string that is built. As we\r\naren’t able to easily find cross-references with Capstone, we’ll need to leverage another tool such as r2pipe, but\r\nagain that is something for the next post!\r\nFor now, let’s start with building out our automation script, by developing our first regular expression!\r\nBuilding the “Finder” Regular Expression\r\nWhen it comes to building regular expressions for byte patterns, I’ll usually fall back on IDA for identifying patterns\r\nthat could be used to accurately locate the correct blocks. Going to Search -\u003e Sequence of Bytes… we can use ??\r\nas wildcards for bytes that may change per iteration. For example, in the code below we can see that ESI is a\r\ncounter and is being compared against 0x1A.\r\n.text:00420FC8 46 inc esi\r\n.text:00420FC9 83 FE 1A cmp esi, 1Ah\r\n.text:00420FCC 72 D3 jb short loc_420FA1\r\nFor identifying similar blocks where the length may change, we can search for the following sequence of\r\nbytes within IDA:\r\n46 83 FE ?? 72\r\nFrom this query we can find 56 possible matches – which is quite low, and we’re also assuming that the ESI\r\nregister is always used as the counter. From looking through the different string decryption blocks, it looks like AL\r\nis always used in an initial MOV operation, moving the encrypted byte to it. 
These instructions appear to have one\r\nof two possible structures:\r\n8A 84 3D F1 FE FF FF\r\n8A 44 35 CD\r\nSo not really the best byte pattern given the changes – it might be better to focus on the division instructions,\r\nthough they also pick up the string decryption functions rather than just the inline code.\r\n.text:0041AB20 99 cdq\r\n.text:0041AB21 F7 FB idiv ebx\r\n.text:0041AB23 8D 42 7F lea eax, [edx+7Fh]\r\n.text:0041AB26 99 cdq\r\n.text:0041AB27 F7 FB idiv ebx\r\nSo, we’re going to instead work with the initial 8A 44 and 8A 84 instructions, and see how far that gets us!\r\nQuickly converting these thoughts into a script, we get the following:\r\nimport re\r\ndata = open(\"file.bin\", \"rb\").read()\r\nrule = re.compile(b\"\\x8A\\x44|\\x8A\\x84\")\r\nmatches = rule.finditer(data)\r\nprint (len(list(matches)))\r\n\u003e\u003e\u003e 103\r\nWhile there are likely more than 103 inline encrypted stack strings, our goal is to build a script that can actually\r\ndecrypt those 103 strings that we’ve found, before we can go and perfect it.\r\nSo! Now we can actually find the decryption routine, let’s start working on walking back to locate the first\r\nencrypted byte moved onto the stack.\r\nLocating the First Stack-Byte\r\nThe way we’re going to approach this is quite hacky, as I’m actually not sure if there is a way to iterate over a\r\nblock of code in reverse within Capstone. As a result, we’re basically going to disassemble the block, store it as a\r\nlist, and reverse the list.\r\nHow exactly are we going to determine the block size? Especially given we’ve got code blocks that are quite\r\nvaried throughout the binary, with certain instances having additional nop instructions, or slightly longer strings,\r\netc.? Well, if the last part wasn’t hacky, this part will be – we’re just going to take a chunk of ~500 bytes before\r\nthe MOV AL, … instruction, and use that. 
If we encounter many issues, we can of course increase this size, but\r\nagain, no point perfecting until we have a working script!\r\ndata_block = data[m.start()-500:m.end()]\r\ndisasm_list = list(md.disasm(data_block, 0, len(data_block)))\r\ndisasm_list = reversed(disasm_list)\r\nWe’re also going to want to get the stack offset of the variable referenced in the MOV instruction we’ve just\r\nfound, for cross referencing later on. In order to do this, we can run the following:\r\nstring = data[m.start():m.end()]\r\ndisasm = list(md.disasm(string, 0, len(string)))\r\noffset = disasm[0].operands[1].value.mem.disp\r\nIn the instruction below, the code above would give us the offset -87:\r\nmov al, [ebp+esi-87]\r\nNow we have the reversed list of instructions and the offset of the first encrypted byte, we want to start narrowing\r\ndown the size of the data block. We don’t want to emulate all 500 bytes, as 450 may be irrelevant and cause\r\nfurther issues.\r\nTo do that, we’re going to loop through the reversed list of instructions, and check for any instruction that has the\r\nMOV mnemonic, and moves data into memory (the stack). If we locate any matching instructions, we then want\r\nto check if the stack offset within the instruction is equal to the stack offset we found earlier. If it is, we have likely\r\nfound the start of the stack building routine!\r\nfor i in disasm_list:\r\n    # operand type 3 is X86_OP_MEM; operand details require md.detail = True\r\n    if i.mnemonic == \"mov\" and i.operands[0].type == 3:\r\n        if i.operands[0].value.mem.disp == offset:\r\n            print (\"found stack string creation address \u003e \", hex(i.address))\r\nBefore we continue, let’s give this a run:\r\nAs you can see, we had an error immediately. This is due to the m.start():m.end() line, as we’re only capturing 2\r\nbytes from the binary at this point. 
Let’s update the regex to search for 2-3 bytes of additional data for the 8A 44\r\npattern, and 2-6 bytes for the 8A 84 pattern.\r\nPerfect!\r\nNow we want to make sure we get rid of all the data we don’t need before the stack string creation address:\r\nsmaller_block = data[\r\n    ( ( m.start() - 500 ) + i.address ) : m.end() + 75\r\n]\r\nThe best way to check if this worked as expected is to quickly disassemble it with CyberChef to see if the MOV\r\ninstruction appears at the beginning as expected.\r\nNow you may be wondering, why did we add 75 to the end? Well, currently we only have the bytes up until the\r\nstart of the decryption routine – not the routine itself, which doesn’t really help us. So, we now want to iterate\r\nthrough the smaller data block and search for a “JB” instruction, which marks the end of the decryption loop (the\r\nbackward jump is what allows the decryption code to decrypt more than 1 byte of data). This is a simple enough addition:\r\nfor y in md.disasm(smaller_block, 0, len(smaller_block)):\r\n    if y.mnemonic == \"jb\":\r\n        print (\"found end of loop \u003e %08x\" % y.address)\r\n        smaller_block = smaller_block[:y.address+y.size]\r\n        print (binascii.hexlify(smaller_block))\r\n        break\r\nPutting this together and running it, and then double checking in CyberChef, you should be able to see that so far\r\nit’s running quite well!\r\nWith that, we have now extracted the entire block of code needed to decrypt the strings, and I think we’re just about\r\nready to start emulating! Before doing so, we just need to fine tune a few more things…\r\nPrepare for Emulation\r\nOne key thing to take note of is the division elements of each string decryption routine – the value 0x7F is used\r\nacross the sample, in every instance. 
Additionally, IDIV requires its operand to be a register or memory location, not an immediate value,\r\nso in this sample you’ll only come across IDIV EBX, IDIV ECX, etc. The issue we’ll have if we start emulating right now is\r\nsome of our blocks don’t include the instruction where 0x7F is moved into the register used for the IDIV\r\noperation. Without that initialisation, the divisor register is zero and the division faults, so you’ll likely get this error a lot:\r\nUnhandled CPU exception (UC_ERR_EXCEPTION)\r\nSo with that in mind, let’s build out our emulation code!\r\nI usually base a lot of my emulators off of Unicorn’s own sample code, so we can get set up quite quickly:\r\nfrom unicorn import *\r\nfrom unicorn.x86_const import *\r\ndef unicorn_block(block):\r\n    # init unicorn\r\n    mu = Uc(UC_ARCH_X86, UC_MODE_32)\r\n    # setup memory and write code to memory\r\n    ADDRESS = 0x1000000\r\n    mu.mem_map(ADDRESS, 4 * 1024 * 1024)\r\n    mu.mem_write(ADDRESS, block)\r\n    # set these registers to 0x7f\r\n    mu.reg_write(UC_X86_REG_ESI, 0x7f)\r\n    mu.reg_write(UC_X86_REG_EDI, 0x7f)\r\n    mu.reg_write(UC_X86_REG_EBX, 0x7f)\r\n    mu.reg_write(UC_X86_REG_ECX, 0x7f)\r\n    # setup stack and base pointer\r\n    mu.reg_write(UC_X86_REG_ESP, ADDRESS + 0x100000)\r\n    mu.reg_write(UC_X86_REG_EBP, ADDRESS + 0x200000)\r\n    # emulate machine code in infinite time\r\n    try:\r\n        mu.emu_start(ADDRESS, ADDRESS + len(block))\r\n    except Exception as E:\r\n        print (E)\r\n    # read value in EBP (pointer to base)\r\n    ebp = mu.reg_read(UC_X86_REG_EBP)\r\n    # read memory from EBP - 0x100000 and decode it as UTF-8 (the buffer is mostly \\x00 padding)\r\n    data = mu.mem_read(ebp - 0x100000, 0x100000).decode(\"utf-8\")\r\n    print (data)\r\n    return\r\nWith that sorted, let’s go ahead and run it!\r\nEmulate!\r\nAs you can see, it looks like it works quite well!\r\nHowever, if we recall back to the initial regex building, we’d identified a total of 
103 matches – yet we only\r\nhave 75 strings; where did the remaining 28 strings go? Let’s do some debugging.\r\nAdding a few checks and flags, we manage to get our first error-prone block:\r\nffffff20c6855cffffff4cc6855dffffff7dc6855effffff23c6855fffffff40c68560ffffff40c68561ffffff758a8555ffffffe8e6d5f\r\nDisassembling this, we get the following:\r\nSo as you can see, it appears our 500 byte chunk cut halfway through a MOV operation, and as a result the first\r\ntwo instructions are invalid – resulting in the failed disassembly. So, let’s add in some code to avoid this error.\r\nloop = 1\r\nwhile not disasm_list or len(disasm_list) \u003c 10:\r\n    data_block = data[ ( m.start() - 500 ) + loop - 1 : m.end() ]\r\n    disasm_list = list(md.disasm(data_block, 0, len(data_block)))\r\n    loop += 1\r\nHere we’re basically shifting the start of the chunk forward one byte at a time until the disasm_list variable has at least 10 elements (AKA instructions), to\r\navoid situations where perhaps the instruction at offset 8 might be corrupted.\r\nAnd now after a few tweaks we’ve got 97 extracted strings! Regarding the remaining 6, it’s possible our rule\r\npicked up some false positives, or the string was even larger (the ransom note is encrypted in the binary, so it’s\r\nvery likely that caused our issue, as we’re only reading 500 bytes). 
So if you’re up for a challenge, I’d say take a\r\nlook at the current script (or develop your own of course!), see where problems might be caused, and see if you\r\ncan decrypt 100% of the inline strings within the sample 😀\r\nFor now, that is pretty much it for this post – of course, as mentioned, Conti is an older ransomware sample and\r\nbuilding a tool like this doesn’t make a huge amount of sense as there isn’t a need to decrypt strings en masse\r\nacross thousands of Conti samples (unless you’re focused on ransom notes and performing linguistic analysis),\r\nthough the goal was to introduce Capstone \u0026 Unicorn, and how they can be used without relying on external tools\r\nduring the automation period.\r\nWith that, you can find the script here, and in the next part we’ll be looking at leveraging r2pipe, Unicorn, and\r\nCapstone to extract not only inline strings, but also encrypted stack strings passed to a function!\r\nSource: https://www.0ffset.net/reverse-engineering/capstone-resolving-stack-strings/",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"origins": [
		"web"
	],
	"references": [
		"https://www.0ffset.net/reverse-engineering/capstone-resolving-stack-strings/"
	],
	"report_names": [
		"capstone-resolving-stack-strings"
	],
	"threat_actors": [],
	"ts_created_at": 1775434882,
	"ts_updated_at": 1775826772,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/fe22a42ecbe7b533d933a44b645675e209020471.pdf",
		"text": "https://archive.orkl.eu/fe22a42ecbe7b533d933a44b645675e209020471.txt",
		"img": "https://archive.orkl.eu/fe22a42ecbe7b533d933a44b645675e209020471.jpg"
	}
}