# Analyzing Emotet with Ghidra — Part 1

**[medium.com/@0xd0cf11e/analyzing-emotet-with-ghidra-part-1-4da71a5c8d69](https://medium.com/@0xd0cf11e/analyzing-emotet-with-ghidra-part-1-4da71a5c8d69)**

Cafe Babe, April 22, 2019

[Cafe Babe](https://medium.com/@0xd0cf11e?source=post_page-----4da71a5c8d69--------------------------------), Apr 19, 2019

In this post I'll show how I used Ghidra to analyze a recent sample of Emotet.

If you have read this, here is [Part 2.](https://medium.com/@0xd0cf11e/analyzing-emotet-with-ghidra-part-2-9efbea374b14)

**SHA256:**

The analysis is done on the unpacked binary file. In this post I'm skipping how I unpacked the file, since what I primarily want to show is how I used Ghidra's Python scripting manager to decrypt strings and API calls.

Some short descriptions:

_What is Ghidra?_ It is an open source reverse engineering tool suite. You can find out more on the project's website.

_Why Emotet?_ Emotet is a prevalent malware family. It started out as a banking trojan, it is persistent, and it keeps evolving its infection mechanisms. Other analyses of it already exist, and a quick search will turn them up.

_Why Ghidra and Emotet?_ For starters, I am looking for a new gig (a.k.a. unemployed) and hence cannot afford an IDA Pro license. Plus I want to continue being a Malware Analyst. Using the free version is still amazing, but I miss being able to use IDA Python. I did use IDA's own scripting language IDC but…I like Python.

Implemented just one of the functions of Emotet.

## Opening up Emotet with Ghidra

Ghidra is organized around projects. Following the on-screen instructions, I created a project named “Emotet”. To add files to the project for analysis, simply use the import option in the project window.

1. Imported Emotet binary

Ghidra displays the properties of the file being imported. Double-click the file name and it opens in CodeBrowser, the tool that disassembles the file.

2. Emotet view in CodeBrowser

Under the Symbol Tree (usually on the left; it can also be opened from the Window menu), I filtered for “entry” to get to the binary's entry point.

3. Entry Point of Emotet

Under Listing we see the disassembled code, and on the right is its decompiled code. Since I've already analyzed this binary, some of the subroutine calls and offsets in these images have been renamed by me. To rename an offset, right-click the offset value and edit its label from the context menu (a keyboard shortcut is also available).

## Emotet's Function Calls

Emotet encrypts its strings and stores its API call names as hashes, so the file is a pain to read statically. Without going into much detail about Emotet's payload (that would require another blog entry), I will show how to make this binary a bit easier to follow.

It does require initially going through each function and figuring out the math (a debugger makes this a little less painful).

In this case I wanted to figure out 2 methods used by Emotet. The first function is a simple xor routine that it uses to decrypt strings. It looked deceptively complex (because of the shift operators in the function), and only after running one iteration in a debugger did I realize what was happening.

The second function finds which API name matches which hash (I will cover this in [Part 2](https://medium.com/@0xd0cf11e/analyzing-emotet-with-ghidra-part-2-9efbea374b14)). This one felt a bit more clever, but it was still easy to understand after running it in a debugger.

Then, using Ghidra's Script Manager, I'll show how I implemented the Python scripts to decrypt the strings and resolve the API calls used in the binary.

## How are the Strings encrypted?

In the binary, I noticed a lot of references to one particular function. This function decrypts the strings, so I renamed it to decode_strings. To find the references made to a function, right-click the function and choose the show-references option from the context menu.

4. References to decode_strings

5. Call being made to decode_strings

The function takes 2 arguments, stored in ECX and EDX (Image 5). ECX is the offset of the encrypted string. EDX is the xor key. The decrypted string is stored in memory allocated on the heap, and the address of that buffer is handed back to the caller.

(Side track: I added the string “ecx = offset \n edx = key” as a repeatable comment on the function. Right-click the address and add it through the comment option in the context menu.)

The first dword at the offset, xor'ed with the key, gives the length of the string. The dwords that follow are xor'ed with the same key, up to the string's length.

Now for the more exciting part: automating this with a Python script in Ghidra.

## Using Python to Automate Decryption

6. Script Manager Icon

The icon in Image 6 sits in the top toolbar of Ghidra and takes us to the Script Manager. Alternatively, you can open it from the Window menu.

7. Script manager

The Script Manager displays a list of scripts written in either Java or Python that come with the installation, including some Python script examples. I filtered for .py scripts to help me understand how to go about writing a Python script.

The Python Interpreter interacts with Ghidra's Java API through [Jython](https://www.jython.org/archive/21/docs/whatis.html). The documentation on the Java APIs can be found in a zipped file in the docs directory of your Ghidra installation.

8. Create new script icon

To create a new Python script, select the icon shown in Image 8. Select Python and enter the name you'd like to give your script.

8. A sample test.py script created

Additionally, Ghidra's help docs include a description of the metadata tags that get generated when a new script is created.

I've uploaded the script to my GitHub repo and you can follow it here: [https://github.com/0xd0cf11e/ghidra/blob/master/ghidra_emotet_decode_strings.py](https://github.com/0xd0cf11e/ghidra/blob/master/ghidra_emotet_decode_strings.py)

9. Decrypted string displayed as comment

The idea behind the script is to display each decrypted string as a comment next to the instruction where the encrypted string's offset is loaded (Image 9).

10. Bytes patched in the binary.

It also patches the decrypted bytes into the binary (Image 10).

As a first step, I wanted to find all the code references made to the function. Iterating through each reference, the next step was locating the instructions that load ECX and EDX. These instructions weren't always immediately before the call to the function, so I iterated backwards through a maximum of 100 instructions to search for them.

After that I was all set to carry out the xor routine, patch the bytes, and add the comment at the instruction where the offset was loaded.
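To make the xor routine itself concrete, here is a minimal sketch in plain Python that runs outside Ghidra. It is not the script from the repo linked above. It assumes the dwords are little-endian (the usual case on x86) and that the decoded length counts bytes, neither of which is spelled out above. The names `decode_string` and `encode_string` are hypothetical, and `encode_string` exists only to build test data.

```python
import struct


def decode_string(blob, key):
    """Decode a string laid out as described in this post.

    Assumed layout: one little-endian dword holding (length ^ key),
    followed by the string data xor'ed with the key one dword at a time.
    """
    # First dword xor key gives the string length (assumed to be in bytes).
    (enc_len,) = struct.unpack_from("<I", blob, 0)
    length = enc_len ^ key

    # Round up to whole dwords, since the data is processed 4 bytes at a time.
    n_dwords = (length + 3) // 4

    out = bytearray()
    for i in range(n_dwords):
        (enc,) = struct.unpack_from("<I", blob, 4 + 4 * i)
        out += struct.pack("<I", enc ^ key)

    # Drop any padding bytes past the real length.
    return bytes(out[:length])


def encode_string(plain, key):
    """Hypothetical inverse of decode_string, used only to make test input."""
    # Pad with zero bytes up to a multiple of 4.
    padded = plain + b"\x00" * (-len(plain) % 4)
    out = struct.pack("<I", len(plain) ^ key)
    for i in range(0, len(padded), 4):
        (dword,) = struct.unpack_from("<I", padded, i)
        out += struct.pack("<I", dword ^ key)
    return out


if __name__ == "__main__":
    key = 0x1A2B3C4D  # made-up key, not taken from the sample
    blob = encode_string(b"kernel32.dll", key)
    print(decode_string(blob, key))
```

In a Ghidra script, the `blob` bytes would be read from the program's memory at the offset loaded into ECX, and the key would come from the value loaded into EDX.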