{
	"id": "3aef0f5c-17bd-4731-a95a-386844c3a040",
	"created_at": "2026-04-06T00:21:18.34804Z",
	"updated_at": "2026-04-10T03:20:21.706197Z",
	"deleted_at": null,
	"sha1_hash": "65ac25fa2f09838621715217446f4f902a50a44d",
	"title": "GitHub - cecio/EMOTET-2020-Reversing: a State-Machine reversing exercise",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 660494,
	"plain_text": "GitHub - cecio/EMOTET-2020-Reversing: a State-Machine\r\nreversing exercise\r\nBy cecio\r\nArchived: 2026-04-05 18:20:56 UTC\r\nIntro\r\nAround the 20th of December 2020, there was one of the \"usual\" EMOTET email campaign hitting several\r\ncountries. I had the possibility to get some sample and I decided to make this little analysis, to deep dive some\r\nspecific aspects of the malware itself.\r\nIn particular I had a look to how the malware has been written, with an analysis of the interesting techniques used.\r\nThere is a very good analysis done by Fortinet in 2019, where the also the first stage has been analyzed. My\r\nexercise is more focused on the second stage on a recent sample.\r\nIn this repository you will find all the DLLs, scripts and tools used for the analysis, with the annotated Ghidra\r\nproject file, with all the mapping to my findings (API calls, program logic, etc). You can use this as starting point\r\nfor additional investigation on it. Enjoy ;-)\r\nThe Tools\r\nFireEye Speakeasy\r\nGhidra\r\nx64dbg\r\nPE Bear\r\ntime :-)\r\nThe infection chain\r\nEMOTET is usually spread by using e-mail campaign (in this case in Italian language)\r\nhttps://github.com/cecio/EMOTET-2020-Reversing\r\nPage 1 of 13\n\nThis particular sample is coming from what we can call the usual infection chain:\r\n1. delivery of an e-mail with a malicious zipped document\r\n2. once opened, the document runs an obfuscated powershell script and downloads the 2nd stage\r\n3. the 2nd stage (in form of a DLL) is then executed\r\n4. the 2nd stage establish some persistence and try to connect a C2\r\nThe initial triage\r\nAll the files used for this analysis are in the repository. 
The \"dangerous\" ones are password protected (with the\r\nusual pwd).\r\nThe DLL (sg.dll) has the following characteristics:\r\nFile Name: sg.dll\r\nSize: 340480\r\nSHA1: b08e07b1d91f8724381e765d695601ea785d8276\r\nThis DLL exports a single function named RunDLL: once executed, it decrypts an additional DLL in memory.\r\nThis one, dumped as dump_1_0418.bin, is the target of my analysis:\r\nFile Name: dump_1_0418.bin\r\nSize: 122880\r\nSHA1: 57cd8eac09714effa7b6f70b34039bbace4a3e23\r\nAn initial overview of the dumped DLL immediately shows that it has no visible strings and no imports,\r\nand a first look at the disassembly shows heavily obfuscated code. We need to do some work here.\r\nI fired up Ghidra and started to snoop around. Starting from the only exported function, RunDLL, you quickly end\r\nup in FUN_10009716, where you can spot a main loop with a kind of \"State-Machine\":\r\nIt looks like a given double-word (stored in ECX) controls what the program is doing. But this looks\r\nconvoluted and not very easy to unroll, since nothing is really in the clear. For example, if you try to isolate a library\r\nAPI call in x64dbg, you will face something like this:\r\nEvery single API call is done in this way: there is a bunch of MOV, XOR, SHIFT and PUSH instructions followed by a call to\r\nxxx606F (first red box), which decodes the address of the function into EAX (called by the second red box). The\r\nPUSH instructions just before the CALL EAX are the parameters, which could be worth inspecting.\r\nThe same \"state\" approach is also used in several sub-functions, not only in the main loop. 
So, everything looks\r\ntime consuming, and I wanted to find a way to get the high-level picture of it.\r\nSpeakeasy\r\nThis tool is a little gem: Speakeasy can emulate the execution of user and kernel mode malware, allowing you to\r\ninteract with the emulated code by using quick Python scripts. What I wanted to do was map every single state of\r\nthe machine (the ECX value of the main loop) to something more meaningful, like DLL API calls.\r\nI had to work a bit to get what I wanted:\r\nthe emulation was failing at more than one point, with some invalid reads. I investigated the reason a bit,\r\nand I saw that sometimes the CALL EAX at some locations was not valid (EAX set to 0). I decided to\r\ntake the easy way and just skip these calls\r\nI had to modify the call to a specific API (CryptStringToBinary)\r\nI mapped the machine state\r\nI added a --state switch to control the flow of the emulation. You can use it to explore all the states (e.g.\r\n--state 0x167196bc). You may encounter errors if the needed parts are not initialized, but you can\r\nreconstruct the proper flow by looking at the Ghidra decompilation\r\nin a second iteration, knowing where strings are decrypted, I added a dump of all the strings in the clear (see\r\nthe following sections)\r\nThen the execution of the final script (python emu_emotetdll.py -f sg.dll) gave me something very\r\ninteresting. 
The list of the imported DLLs (with related addresses):\r\n0x10017a4c: 'kernel32.LoadLibraryW(\"advapi32.dll\")' -\u003e 0x78000000\r\n0x10017a4c: 'kernel32.LoadLibraryW(\"crypt32.dll\")' -\u003e 0x58000000\r\n0x10017a4c: 'kernel32.LoadLibraryW(\"shell32.dll\")' -\u003e 0x69000000\r\n0x10017a4c: 'kernel32.LoadLibraryW(\"shlwapi.dll\")' -\u003e 0x67000000\r\n0x10017a4c: 'kernel32.LoadLibraryW(\"urlmon.dll\")' -\u003e 0x54500000\r\n0x10017a4c: 'kernel32.LoadLibraryW(\"userenv.dll\")' -\u003e 0x76500000\r\n0x10017a4c: 'kernel32.LoadLibraryW(\"wininet.dll\")' -\u003e 0x7bc00000\r\n0x10017a4c: 'kernel32.LoadLibraryW(\"wtsapi32.dll\")' -\u003e 0x63000000\r\n...\r\nand a lot of API calls, mapped to the machine state:\r\n[+] State: 1de2d3e5\r\n0x10010ba0: 'kernel32.GetProcessHeap()' -\u003e 0x7280\r\n0x10018080: 'kernel32.HeapAlloc(0x7280, 0x8, 0x4c)' -\u003e 0x72a0\r\n[+] State: 5c80354\r\n0x10010ba0: 'kernel32.GetProcessHeap()' -\u003e 0x7280\r\n0x10018080: 'kernel32.HeapAlloc(0x7280, 0x8, 0x20)' -\u003e 0x72f0\r\n0x10017a4c: 'kernel32.LoadLibraryW(\"advapi32.dll\")' -\u003e 0x78000000\r\n0x10010ba0: 'kernel32.GetProcessHeap()' -\u003e 0x7280\r\n0x10014b3a: 'kernel32.HeapFree(0x7280, 0x0, 0x72f0)' -\u003e 0x1\r\n0x10010ba0: 'kernel32.GetProcessHeap()' -\u003e 0x7280\r\n...\r\nThis list was not complete (because I skipped some failing calls on purpose and probably some calls were not\r\ncorrectly intercepted), but it gave me an overall picture of what was going on. Thanks FireEye!\r\nMapping\r\nWith the help of the Speakeasy output and a combination of dynamic and static analysis (done with x64dbg and\r\nGhidra), I was able to reconstruct the main flows of the malware. Consider that these flows are not complete:\r\nthey are high-level snapshots of what is going on for some (not all) of the \"states\". I'm sure something is missing. 
This\r\nis the \"main\" flow:\r\nThen we have the \"Persistency\" flow (the yellow boxes are the interesting ones):\r\nAnd the initial \"C2\" communication flow:\r\nNot all the states were explored. I focused on persistence and initial C2. The great thing about this approach is that\r\nyou can now alter the execution flow, by setting the ECX value you want to explore or execute.\r\nI added a lot of details to the Ghidra file, by renaming the API calls and inserting comments. Every number\r\nreported in the graphs (e.g. 19a) is in the comments, so you can easily track the code section.\r\nI renamed the functions with this standard:\r\na single underscore in front of API calls\r\na double underscore in front of internal function calls\r\nInteresting findings: encrypted strings\r\nAll the strings are encrypted in a BLOB, located, in this particular dumped sample, at 0x1C800\r\nThe green box is the XOR key and the yellow one is the length of the string. The functions used to perform the\r\ndecryption are __decrypt_buffer_string_FUN_10006aba and __decrypt_headers_footer_FUN_100033f4\r\nEvery single string is decrypted and then removed from memory after usage. This is true even for C format\r\nstrings. So you will not find anything in memory if you try to inspect the mapped sections at runtime.\r\nAs said before, I added a specific section in the Speakeasy script to dump those strings.\r\nInteresting findings: list of C2 servers\r\nThe C2 IPs are dumped from the same BLOB (in this case at 0x1CA00) just after the decryption in step 20a.\r\nAs stated in the Fortinet analysis, this list is made of IP (green box) and port (yellow box) pairs. 
You can decode the whole\r\nlist by piping this part of the binary into the following Python code:\r\nimport sys\r\nimport struct\r\n# each entry is 8 bytes: IP (4 bytes, reversed order), port (2 bytes, little-endian); the last 2 bytes are ignored\r\nb = bytearray(sys.stdin.buffer.read())\r\n# stop before any trailing partial entry\r\nfor x in range(0, len(b) - 7, 8):\r\n    print('%u.%u.%u.%u:%u' % (b[x+3],b[x+2],b[x+1],b[x],struct.unpack('\u003cH',bytes(b[x+4:x+6]))[0]))\r\nYou can find the full extracted list in the IoC section.\r\nInteresting findings: persistence\r\nThis particular sample obtains persistence by installing a System Service. This campaign deployed different\r\nversions of the DLL that also use different techniques: a Run Registry Key is one of them.\r\nThe section installing the service is 20a (state 0x204C3E9E). The high-level steps are the following:\r\ndecrypt the format string %s.%s\r\ngenerate random chars to build the service name (which results in something like xzyw.qwe)\r\nget one random \"Service Description\" from the existing ones, and use it as the description of the new service\r\nInteresting findings: encrypted communications with C2\r\nIn section 8a (state 0x1C904052) we can spot the loading of an RSA public key\r\nAfter this we have a call to CryptGenKey with algo CALG_AES_128. So it looks like the sample is going to use a\r\nsymmetric key to encrypt communication.\r\nIn section 20a (state 0x386459ce) we see how the communication is encrypted:\r\nCryptGenKey\r\nCryptEncrypt of the buffer to send, with the previous key\r\nCryptExportKey encrypted with the RSA public key\r\nthe exported and encrypted symmetric key is then prepended to the buffer sent via HTTP\r\nWrap up\r\nThe analysis is far from complete: there are a lot of unexplored parts of the sample. 
In the end, my goal was to\r\nbuild a procedure to make the analysis easier, even for different or future samples, where it would be faster to\r\nunderstand the overall picture.\r\nAppendix: IoC\r\nC2 IP list:\r\n118.38.110.192:80\r\n181.136.190.86:80\r\n167.71.148.58:443\r\n211.215.18.93:8080\r\n1.234.65.61:80\r\n209.236.123.42:8080\r\n187.162.250.23:443\r\n172.245.248.239:8080\r\n60.93.23.51:80\r\n177.144.130.105:443\r\n93.148.247.169:80\r\n177.144.130.105:8080\r\n110.39.162.2:443\r\n87.106.46.107:8080\r\n83.169.21.32:7080\r\n191.223.36.170:80\r\n95.76.153.115:80\r\n110.39.160.38:443\r\n45.16.226.117:443\r\n46.43.2.95:8080\r\n201.75.62.86:80\r\n190.114.254.163:8080\r\n12.162.84.2:8080\r\n46.101.58.37:8080\r\n197.232.36.108:80\r\n185.94.252.27:443\r\n70.32.84.74:8080\r\n202.79.24.136:443\r\n2.80.112.146:80\r\n202.134.4.210:7080\r\n105.209.235.113:8080\r\n187.162.248.237:80\r\n190.64.88.186:443\r\n111.67.12.221:8080\r\n5.196.35.138:7080\r\n50.28.51.143:8080\r\n181.30.61.163:443\r\n103.236.179.162:80\r\n81.215.230.173:443\r\n190.251.216.100:80\r\n51.255.165.160:8080\r\n149.202.72.142:7080\r\n192.175.111.212:7080\r\n178.250.54.208:8080\r\n24.232.228.233:80\r\n190.45.24.210:80\r\n45.184.103.73:80\r\n177.85.167.10:80\r\n212.71.237.140:8080\r\n181.120.29.49:80\r\n170.81.48.2:80\r\n68.183.170.114:8080\r\n35.143.99.174:80\r\n217.13.106.14:8080\r\n168.121.4.238:80\r\n172.104.169.32:8080\r\n111.67.12.222:8080\r\n62.84.75.50:80\r\n77.78.196.173:443\r\n177.23.7.151:80\r\nhttps://github.com/cecio/EMOTET-2020-Reversing\r\nPage 12 of 
13\n\n213.52.74.198:80\r\n12.163.208.58:80\r\n1.226.84.243:8080\r\n113.163.216.135:80\r\n188.225.32.231:7080\r\n191.182.6.118:80\r\n81.213.175.132:80\r\n104.131.41.185:8080\r\n152.169.22.67:80\r\n185.183.16.47:80\r\n192.232.229.54:7080\r\n186.146.13.184:443\r\n178.211.45.66:8080\r\n122.201.23.45:443\r\n70.32.115.157:8080\r\n190.24.243.186:80\r\n51.15.7.145:80\r\n46.105.114.137:8080\r\n81.214.253.80:443\r\n192.232.229.53:4143\r\n59.148.253.194:8080\r\n191.241.233.198:80\r\n181.61.182.143:80\r\n190.195.129.227:8090\r\n68.183.190.199:8080\r\n138.97.60.140:8080\r\n138.97.60.141:7080\r\n137.74.106.111:7080\r\n85.214.26.7:8080\r\n71.58.233.254:80\r\n94.176.234.118:443\r\n188.135.15.49:80\r\n80.15.100.37:80\r\n82.76.111.249:443\r\n155.186.9.160:80\r\n189.2.177.210:443\r\nSource: https://github.com/cecio/EMOTET-2020-Reversing",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"references": [
		"https://github.com/cecio/EMOTET-2020-Reversing"
	],
	"report_names": [
		"EMOTET-2020-Reversing"
	],
	"threat_actors": [],
	"ts_created_at": 1775434878,
	"ts_updated_at": 1775791221,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/65ac25fa2f09838621715217446f4f902a50a44d.pdf",
		"text": "https://archive.orkl.eu/65ac25fa2f09838621715217446f4f902a50a44d.txt",
		"img": "https://archive.orkl.eu/65ac25fa2f09838621715217446f4f902a50a44d.jpg"
	}
}