{
	"id": "2d208857-3004-4844-ab6d-7f01b6954c02",
	"created_at": "2026-04-06T01:30:17.338396Z",
	"updated_at": "2026-04-10T03:20:25.033716Z",
	"deleted_at": null,
	"sha1_hash": "84684d534f22304ebcda3d9bf7c3a3f9c7a0c258",
	"title": "Analyzing a “multilayer” Maldoc: A Beginner’s Guide",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 27186405,
	"plain_text": "Analyzing a “multilayer” Maldoc: A Beginner’s Guide\r\nBy Didier Stevens\r\nPublished: 2022-04-06 · Archived: 2026-04-06 00:18:43 UTC\r\nIn this blog post, we will not only analyze an interesting malicious document, but we will also demonstrate the\r\nsteps required to get you up and running with the necessary analysis tools. There is also a howto video for this\r\nblog post.\r\nI was asked to help with the analysis of a PDF document containing a DOCX file.\r\nThe PDF is REMMITANCE INVOICE.pdf, and can be found on VirusTotal, MalwareBazaar and Malshare (you\r\ndon’t need a subscription to download from MalwareBazaar or Malshare, so everybody that wants to, can follow\r\nalong).\r\nThe sample is interesting for analysis, because it involves 3 different types of malicious documents.\r\nAnd this blog post will also be different from other maldoc analysis blog posts we have written, because we show\r\nhow to do the analysis on a machine with a pristine OS and without any preinstalled analysis tools.\r\nTo follow along, you just need to be familiar with operating systems and their command-line interface.\r\nWe start with a Ubuntu LTS 20.0 virtual machine (make sure that it is up-to-date by issuing the “sudo apt update”\r\nand “sudo apt upgrade” commands). We create a folder for the analysis: /home/testuser1/Malware (we usually\r\ncreate a folder per sample, with the current date in the filename, like this: 20220324_twitter_pdf). testuser1 is the\r\naccount we use, you will have another account name.\r\nInside that folder, we copy the malicious sample. To clearly mark the sample as (potentially) malicious, we give it\r\nthe extension .vir. This also prevents accidental launching/execution of the sample. 
If you want to know more\r\nabout handling malware samples, take a look at this SANS ISC diary entry.\r\nhttps://blog.nviso.eu/2022/04/06/analyzing-a-multilayer-maldoc-a-beginners-guide/\r\nPage 1 of 33\n\nFigure 1: The analysis machine with the PDF sample\r\nThe original name of the PDF document is REMMITANCE INVOICE.pdf, and we renamed it to REMMITANCE\r\nINVOICE.pdf.vir.\r\nTo conduct the analysis, we need tools that I develop and maintain. These are free, open-source tools, designed for\r\nstatic analysis of malware. Most of them are written in Python (a free, open-source programming language).\r\nThese tools can be found here and on GitHub.\r\nPDF Analysis\r\nTo analyze a malicious PDF document like this one, we are not opening the PDF document with a PDF reader like\r\nAdobe Reader. Instead, we are using dedicated tools to dissect the document and find malicious code. This is\r\nknown as static analysis.\r\nOpening the malicious PDF document with a reader, and observing its behavior, is known as dynamic analysis.\r\nBoth are popular analysis techniques, and they are often combined. In this blog post, we are performing static\r\nanalysis.\r\nTo install the tools from GitHub on our machine, we issue the following “git clone” command:\r\nFigure 2: The “git clone” command fails to execute\r\nAs can be seen, this command fails, because on our pristine machine, git is not yet installed. 
Ubuntu is helpful and\r\nsuggests the command to execute to install git:\r\nsudo apt install git\r\nFigure 3: Installing git\r\nFigure 4: Installing git\r\nWhen the DidierStevensSuite repository has been cloned, we will find a folder DidierStevensSuite in our working\r\nfolder:\r\nFigure 5: Folder DidierStevensSuite is the result of the clone command\r\nWith this repository of tools, we have different maldoc analysis tools at our disposal, like PDF analysis tools.\r\npdfid.py and pdf-parser.py are two PDF analysis tools found in Didier Stevens’ Suite. pdfid is a simple triage tool\r\nthat looks for known keywords inside the PDF file that are regularly associated with malicious activity. pdf-parser.py is able to parse a PDF file and identify basic building blocks of the PDF language, like objects.\r\nTo run pdfid.py on our Ubuntu machine, we can start the Python interpreter (python3), and give it the pdfid.py\r\nprogram as first parameter, followed by options and parameters specific for pdfid. The first parameter we provide\r\nfor pdfid is the name of the PDF document to analyze. Like this:\r\nFigure 6: pdfid’s analysis report\r\nIn the report provided as output by pdfid, we see a bunch of keywords (first column) and a counter (second\r\ncolumn). This counter simply indicates the frequency of the keyword: how many times does it appear in the\r\nanalyzed PDF document?\r\nAs you can see, many counters are zero: keywords with a zero counter do not appear in the analyzed PDF\r\ndocument. To make the report shorter, we can use option -n. 
This option excludes zero counters (n = no zeroes)\r\nfrom the report, like this:\r\nFigure 7: pdfid’s condensed analysis report\r\nThe keywords that interest us the most are the ones after the /Page keyword.\r\nKeyword /EmbeddedFile means that the PDF contains an embedded file. This feature can be used for benign and\r\nmalicious purposes. So we need to look into it.\r\nKeyword /OpenAction means that the PDF reader should do something automatically when the document is\r\nopened, like launching a script.\r\nKeyword /ObjStm means that there are stream objects inside the PDF document. Stream objects are special\r\nobjects that contain other objects. These contained objects are compressed. pdfid is by nature a simple tool that is\r\nnot able to recognize and handle compressed data. This has to be done with pdf-parser.py. Whenever you see\r\nstream objects in pdfid’s report (e.g., /ObjStm with a counter greater than zero), you have to realize that pdfid is\r\nunable to give you a complete report, and that you need to use pdf-parser to get the full picture. This is what we do\r\nwith the following command:\r\nFigure 8: pdf-parser’s statistical report\r\nOption -a is used to have pdf-parser.py produce a report of all the different elements found inside the PDF\r\ndocument, together with keywords like pdfid.py produces.\r\nOption -O is used to instruct pdf-parser to decompress stream objects (/ObjStm) and include the contained objects\r\nin the statistical report. If this option is omitted, then pdf-parser’s report will be similar to pdfid’s report. To\r\nknow more about this subject, we recommend this blog post.\r\nIn this report, we see again keywords like /EmbeddedFile. 1 is the counter (i.e., there is one embedded file) and\r\n28 is the index of the PDF object for this embedded file.\r\nNew keywords that did appear are /JS and /JavaScript. 
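To make the idea of keyword counting concrete, here is a toy pdfid-style triage in Python (our own sketch; the real pdfid.py also handles hex-encoded names and much more):

```python
import re

# Illustrative subset of the keywords a pdfid-style triage counts.
KEYWORDS = (b'/Page', b'/EmbeddedFile', b'/OpenAction', b'/ObjStm', b'/JS', b'/JavaScript')

def triage(pdf_bytes, hide_zeroes=False):
    # Count keyword occurrences in the raw bytes. The lookahead stops
    # /JS from also matching inside /JavaScript.
    report = {}
    for keyword in KEYWORDS:
        pattern = re.escape(keyword) + rb'(?![A-Za-z])'
        count = len(re.findall(pattern, pdf_bytes))
        if count or not hide_zeroes:
            report[keyword.decode()] = count
    return report
```

For the sample in this post, the keywords that only surface once stream objects are expanded are /JS and /JavaScript.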
They indicate the presence of scripts (code) in the PDF\r\ndocument. The objects that represent these scripts are found (compressed) inside the stream objects (/ObjStm).\r\nThat is why they did not appear in pdfid’s report, and why they do in pdf-parser’s report (when option -O is used).\r\nJavaScript inside a PDF document is restricted in its interactions with operating system resources: it cannot\r\naccess the file system, the registry, … .\r\nNevertheless, the included JavaScript can be malicious code (a legitimate reason for the inclusion of JavaScript in\r\na PDF document is input validation for PDF forms).\r\nBut we will first take a look at the embedded file. We do this by searching for the /EmbeddedFile keyword, like\r\nthis:\r\nFigure 9: Searching for embedded files\r\nNotice that the search option -s is not case sensitive, and that you do not need to include the leading slash (/).\r\npdf-parser found one object that represents an embedded file: the object with index 28.\r\nNotice the keywords /Filter /FlateDecode: this means that the embedded file is not included in the PDF\r\ndocument as-is, but that it has been “filtered” first (i.e., transformed). /FlateDecode indicates which\r\ntransformation was applied: “deflation”, i.e., zlib compression.\r\nTo obtain the embedded file in its original form, we need to decompress the contained data (stream), by applying\r\nthe necessary filters. This is done with option -f:\r\nFigure 10: Decompressing the embedded file\r\nThe long string of data (it looks random) produced by pdf-parser when option -f is used, is the decompressed\r\nstream data in Python’s byte string representation. 
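What option -f does for a /FlateDecode stream can be approximated with Python’s zlib module (a sketch of this single filter only; real PDFs can chain several filters):

```python
import zlib

def flate_decode(stream_bytes):
    # /FlateDecode streams are zlib-compressed (RFC 1950) data.
    return zlib.decompress(stream_bytes)

def looks_like_zip(data):
    # ZIP containers (and therefore OOXML documents) start with PK.
    return data[:2] == b'PK'
```

Checking the first two bytes for PK is the same test as eyeballing pdf-parser’s decompressed output.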
Notice that this data starts with PK: this is a strong indication\r\nthat the embedded file is a ZIP container.\r\nWe will now use option -d to dump (write) the contained file to disk. Since it is (potentially) malicious, we again\r\nuse extension .vir.\r\nFigure 11: Extracting the embedded file to disk\r\nFile embedded.vir is the embedded file.\r\nOffice document analysis\r\nSince I was told that the embedded file is an Office document, we use a tool I developed for Office documents:\r\noledump.py\r\nBut if you did not know what type the embedded file is, you would first want to determine this. We will\r\nactually have to do that later, with a downloaded file.\r\nNow we run oledump.py on the embedded file we extracted: embedded.vir\r\nFigure 12: No ole file was found\r\nThe output of oledump here is a warning: no ole file was found.\r\nA bit of background can help understand what is happening here. Microsoft Office document files come in 2\r\nmajor formats: ole files and OOXML files.\r\nOle files (official name: Compound File Binary Format) are the “old” file format: the binary format that was\r\nthe default until Office 2007 was released. Documents using this internal format have extensions like .doc, .xls, .ppt,\r\n…\r\nOOXML files (Office Open XML) are the “new” file format. It’s the default since Office 2007. Its internal format\r\nis a ZIP container containing mostly XML files. Other contained file types that can appear are pictures (.png,\r\n.jpeg, …) and ole files (for VBA macros, for example). OOXML files have extensions like .docx, .xlsx, .docm, .xlsm,\r\n…\r\nOOXML is based on another format: OPC.\r\noledump.py is a tool to analyze ole files. Most malicious Office documents nowadays use VBA macros. 
VBA\r\nmacros are always stored inside ole files, even with the “new” format OOXML. OOXML documents that contain\r\nmacros (like .docm) have one ole file inside the ZIP container (often named vbaProject.bin) that contains the\r\nactual VBA macros.\r\nNow, let’s get back to the analysis of our embedded file: oledump tells us that it found no ole file inside the ZIP\r\ncontainer (OPC).\r\nThis tells us 1) that the file is a ZIP container, and more precisely, an OPC file (thus most likely an OOXML file)\r\nand 2) that it does not contain VBA macros.\r\nIf the Office document contains no VBA macros, we need to look at the files that are present inside the ZIP\r\ncontainer. This can be done with a dedicated tool for the analysis of ZIP files: zipdump.py\r\nWe just need to pass the embedded file as parameter to zipdump, like this:\r\nFigure 13: Looking inside the ZIP container\r\nEvery line of output produced by zipdump represents a contained file.\r\nThe presence of folder “word” tells us that this is a Word file, thus extension .docx (because it does not contain\r\nVBA macros).\r\nWhen an OOXML file is created/modified with Microsoft Office, the timestamp of the contained files will always\r\nbe 1980-01-01.\r\nIn the result we see here, there are many files that have a different timestamp: this tells us that this .docx file has\r\nbeen altered with a ZIP tool (like WinZip, 7zip, …) after it was saved with Office.\r\nThis is often an indicator of malicious intent.\r\nIf we are presented with an Office document that has been altered, it is recommended to take a look at the\r\ncontained files that were most recently changed, as these are likely the files that have been tampered with for\r\nmalicious purposes.\r\nIn our extracted sample, that contained file is the file with timestamp 2022-03-23 (that’s just a day ago at the time of\r\nwriting): file document.xml.rels.\r\nWe can use 
zipdump.py to take a closer look at this file. We do not need to type its full name to select it, we can\r\njust use its index: 14 (this index is produced by zipdump, it is not metadata).\r\nUsing option -s, we can select a particular file for analysis, and with option -a, we can produce a hexadecimal/ascii\r\ndump of the file content. We start with this type of dump, so that we can first inspect the data and assure ourselves that\r\nthe file is indeed XML (it should be pure XML, but since it has been altered, we must be careful).\r\nFigure 14: Hexadecimal/ascii dump of file document.xml.rels\r\nThis does indeed look like XML: thus we can use option -d to dump the file to the console (stdout):\r\nFigure 15: Using option -d to dump the file content\r\nThere are many URLs in this output, and XML is readable to us humans, so we can search for suspicious URLs.\r\nBut since this is XML without any newlines, it’s not easy to read. We might easily miss one URL.\r\nTherefore, we will use a tool to help us extract the URLs: re-search.py\r\nre-search.py is a tool that uses regular expressions to search through text files. And it comes with a small\r\nembedded library of regular expressions, for URLs, email addresses, …\r\nIf we want to use the embedded regular expression for URLs, we use option -n url.\r\nLike this:\r\nFigure 16: Extracting URLs\r\nNotice that we use option -u to produce a list of unique URLs (remove duplicates from the output) and that we are\r\npiping 2 commands together. 
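The zipdump | re-search pipeline can be approximated in one small script (our own sketch; the URL regular expression and the benign-domain list are simplified stand-ins for re-search.py’s embedded ones):

```python
import re
import zipfile

# Simplified URL pattern; re-search.py ships a more thorough one.
URL_RE = re.compile(r'https?://[A-Za-z0-9_.:/?#=&%~+-]+')

# Domains expected in any OOXML file (comparable to option -F officeurls).
BENIGN = ('schemas.openxmlformats.org', 'schemas.microsoft.com', 'purl.org', 'www.w3.org')

def suspicious_urls(docx_file, member='word/_rels/document.xml.rels'):
    # Read one file from the OOXML ZIP container and return the unique
    # URLs that are not standard OOXML schema URLs.
    with zipfile.ZipFile(docx_file) as container:
        text = container.read(member).decode('utf-8', errors='replace')
    urls = sorted(set(URL_RE.findall(text)))
    return [u for u in urls if not any(b in u for b in BENIGN)]
```

Anything this returns is worth a closer look.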
The output of command zipdump is provided as input to command re-search by\r\nusing a pipe (|).\r\nMany tools in Didier Stevens’ Suite accept input from stdin and produce output to stdout: this allows them to be\r\npiped together.\r\nMost URLs in the output of re-search have schemas.openxmlformats.org as FQDN: these are normal URLs, to be\r\nexpected in OOXML files. To help filter out URLs that are expected to be found in OOXML files, re-search\r\nhas an option to filter out these URLs. This is option -F with value officeurls.\r\nFigure 17: Filtered URLs\r\nOne URL remains: this is suspicious, and we should try to download the file for that URL.\r\nBefore we do that, we want to introduce another tool that can be helpful with the analysis of XML files:\r\nxmldump.py. xmldump parses XML files with Python’s built-in XML parser, and can represent the parsed output\r\nin different formats. One format is “pretty printing”: this makes the XML file more readable, by adding newlines\r\nand indentations. Pretty printing is achieved by passing parameter pretty to tool xmldump.py, like this:\r\nFigure 18: Pretty print of file document.xml.rels\r\nNotice that the \u003cRelationship\u003e element with the suspicious URL is the only one with attribute\r\nTargetMode=”External”.\r\nThis is an indication that this is an external template that is loaded from the suspicious URL when the Office\r\ndocument is opened.\r\nIt is therefore important to retrieve this file.\r\nDownloading a malicious file\r\nWe will download the file with curl. 
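The TargetMode check can also be automated with Python’s built-in XML parser (a sketch of the idea; xmldump.py itself only reformats XML, the helper below is ours):

```python
import xml.etree.ElementTree as ET

def external_targets(rels_xml):
    # Return the Target of every Relationship marked External: in a
    # .rels file these point outside the document, e.g. a remote template.
    root = ET.fromstring(rels_xml)
    return [
        element.get('Target')
        for element in root.iter()
        if element.tag.endswith('Relationship') and element.get('TargetMode') == 'External'
    ]
```

Any URL it returns is the external template to retrieve.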
Curl is a very flexible tool to perform all kinds of web requests.\r\nBy default, curl is not installed in Ubuntu:\r\nFigure 19: Curl is missing\r\nBut it can of course be installed:\r\nFigure 20: Installing curl\r\nAnd then we can use it to try to download the template. Often, we do not want to download that file using an IP\r\naddress that can be linked to us or our organisation. We often use the Tor network to hide behind. We use option -x\r\n127.0.0.1:9050 to direct curl to use a proxy, namely the Tor service running on our machine. And then we like to\r\nuse option -D to save the headers to disk, and option -o to save the downloaded file to disk with a name of our\r\nchoosing and extension .vir.\r\nNotice that we also number the header and download files, as we know from experience that often several\r\nattempts will be necessary to download the file, and that we want to keep the data of all attempts.\r\nFigure 21: Downloading with curl over Tor fails\r\nThis fails: the connection is refused. That’s because port 9050 is not open: the Tor service is not installed. We need\r\nto install it first:\r\nFigure 22: Installing Tor\r\nNext, we try again to download over Tor:\r\nFigure 23: The download still fails\r\nThe download still fails, but with another error. The CONNECT keyword tells us that curl is trying to use an\r\nHTTP proxy, and Tor uses a SOCKS5 proxy. 
I used the wrong option: instead of option -x, I should be using\r\noption --socks5 (-x is for HTTP proxies).\r\nFigure 24: The download seems to succeed\r\nBut taking a closer look at the downloaded file, we see that it is empty:\r\nFigure 25: The downloaded file is empty, and the headers indicate status 301\r\nThe content of the headers file indicates status 301: the file was permanently moved.\r\nCurl will not automatically follow redirections. This has to be enabled with option -L, let’s try again:\r\nFigure 26: Using option -L\r\nAnd now we have indeed downloaded a file:\r\nFigure 27: Download result\r\nNotice that we are using index 2 for the downloaded files, so as not to overwrite the first downloaded files.\r\nDownloading over Tor will not always work: some servers will refuse to serve the file to Tor clients.\r\nAnd downloading with curl can also fail because of the User Agent String. The User Agent String is a header that\r\ncurl includes whenever it performs a request: this header indicates that the request was done by curl. Some\r\nservers are configured to only serve files to clients with the “proper” User Agent String, like the ones used by\r\nOffice or common web browsers.\r\nIf you suspect that this is the case, you can use option -A to provide an appropriate User Agent String.\r\nAs the downloaded file is a template, we expect it is an Office document, and we use oledump.py to analyze it:\r\nFigure 28: Analyzing the downloaded file with oledump fails\r\nBut this fails. 
Oledump does not recognize the file type: the file is not an ole file or an OOXML file.\r\nWe can use Linux command file to try to identify the file type based on its content:\r\nFigure 29: Command file tells us this is pure text\r\nIf we are to believe this output, the file is a pure text file.\r\nLet’s do a hexadecimal/ascii dump with command xxd. Since this will produce many pages of output, we pipe the\r\noutput to the head command, to limit the output to the first 10 lines:\r\nFigure 30: Hexadecimal/ascii dump of the downloaded file\r\nRTF document analysis\r\nThe file starts with {\\rt : this is a deliberately malformed RTF file. Rich Text Format is a file format for Word\r\ndocuments that is pure text. The format does not support VBA macros. Most of the time, malicious RTF files\r\nperform malicious actions through exploits.\r\nProper RTF files should start with {\\rtf1. The fact that this file starts with {\\rt is a clear indication that the file has\r\nbeen tampered with (or generated with a maldoc generator): Word will not produce files like this. However,\r\nWord’s RTF parser is forgiving enough to accept files like this.\r\nDidier Stevens’ Suite contains a tool to analyze RTF files: rtfdump.py\r\nBy default, running rtfdump.py on an RTF file produces a lot of output:\r\nFigure 31: Parsing the RTF file\r\nThe most important fact we learn from this output is that this is indeed an RTF file, since rtfdump was able to\r\nparse it.\r\nAs RTF files often contain exploits, they often use embedded objects. Filtering rtfdump’s output for embedded\r\nobjects can be done with option -O:\r\nFigure 32: There are no embedded objects\r\nNo embedded objects were found. 
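Stepping back for a moment: the header check we just did by eye can be expressed as a tiny helper (our own sketch, not part of rtfdump.py):

```python
def rtf_header_verdict(data):
    # Word always writes a {\rtf1 header, but its parser also accepts
    # truncated headers like {\rt, which maldoc generators abuse.
    if data.startswith(b'{\\rtf1'):
        return 'well-formed RTF header'
    if data.startswith(b'{\\rt'):
        return 'malformed RTF header (tampered or generated)'
    return 'not RTF'
```

A malformed-but-parseable header like the one in this sample is a triage signal in its own right.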
Then we need to look at the hexadecimal data: since RTF is a text format,\r\nbinary data is encoded with hexadecimal digits. Looking back at figure 31, we see that the second entry (number\r\n2) contains 8349 hexadecimal digits (h=8349). That’s the first entry we will inspect further.\r\nNotice that 8349 is an odd number, and that encoding a single byte requires 2 hexadecimal digits. This is an\r\nindication that the RTF file is obfuscated, to thwart analysis.\r\nUsing option -s, we can select entry 2:\r\nFigure 33: Selecting the second entry\r\nIf you are familiar with the internals of RTF files, you would notice that the long, uninterrupted sequences of curly\r\nbraces are suspicious: it’s another sign of obfuscation.\r\nLet’s try to decode the hexadecimal data inside entry 2, by using option -H\r\nFigure 34: Hexadecimal decoding\r\nAfter some random-looking bytes and a series of NULL bytes, we see a lot of FF bytes. This is typical of ole\r\nfiles. Ole files start with a specific set of bytes, known as a magic header: D0 CF 11 E0 A1 B1 1A E1.\r\nWe cannot find this sequence in the data, however we find a sequence that looks similar: 0D 0C F1 1E 0A 1B 11\r\nAE 10 (starting at position 0x46).\r\nThis is almost the same as the magic header, but shifted by one hexadecimal digit. This means that the RTF file is\r\nobfuscated with a method that has not been foreseen in the deobfuscation routines of rtfdump. Remember that the\r\nnumber of hexadecimal digits is odd: this is the result. If rtfdump were able to properly deobfuscate this\r\nRTF file, the number would be even.\r\nBut that is not a problem: I’ve foreseen this, and there is an option in rtfdump to shift all hexadecimal strings by\r\none digit. 
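The one-digit shift idea can be illustrated with a short sketch (our own code, mimicking what rtfdump’s shift option does conceptually):

```python
import binascii

OLE_MAGIC = binascii.unhexlify('d0cf11e0a1b11ae1')

def find_ole_magic(hex_digits):
    # Try the hex string as-is and shifted by one digit; return the
    # (shift, byte offset) at which the ole magic header decodes.
    for shift in (0, 1):
        candidate = hex_digits[shift:]
        if len(candidate) % 2:
            candidate = candidate[:-1]  # drop a trailing half byte
        offset = binascii.unhexlify(candidate).find(OLE_MAGIC)
        if offset != -1:
            return shift, offset
    return None
```

In rtfdump, this shift is exposed as a command-line option.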
This is option -S:\r\nFigure 35: Using option -S to manually deobfuscate the file\r\nWe have different output now. Starting at position 0x47, we now see the correct magic header: D0 CF 11 E0 A1\r\nB1 1A E1\r\nAnd scrolling down, we see the following:\r\nFigure 36: ole file directory entries (UNICODE)\r\nWe see UNICODE strings RootEntry and ole10nAtiVE.\r\nEvery ole file contains a RootEntry.\r\nAnd ole10native is an entry for embedded data. It should all be lower case: the mixing of uppercase and lowercase\r\nis another indicator of malicious intent.\r\nAs we have now managed to direct rtfdump to properly decode this embedded ole file, we can use option -i to help\r\nwith the extraction:\r\nFigure 37: Extraction of the olefile fails\r\nUnfortunately, this fails: there is still some unresolved obfuscation. But that is not a problem, we can perform the\r\nextraction manually. For that, we locate the start of the ole file (position 0x47) and use option -c to “cut” it out of\r\nthe decoded data, like this:\r\nFigure 38: Hexadecimal/ascii dump of the embedded ole file\r\nWith option -d, we can perform a dump (binary data) of the ole file and write it to disk:\r\nFigure 39: Writing the embedded ole file to disk\r\nWe use oledump to analyze the extracted ole file (ole.vir):\r\nFigure 40: Analysis of the extracted ole file\r\nIt succeeds: it contains one stream.\r\nLet’s select it for further analysis:\r\nFigure 41: Content of the stream\r\nThis binary data looks random.\r\nLet’s use option -S to extract strings (this option is like the strings command) from this binary data:\r\nFigure 42: Extracting strings\r\nThere’s nothing recognizable here.\r\nLet’s summarize where we 
are: we extracted an ole file from an RTF file that was downloaded by a .docx file\r\nembedded in a PDF file. When we say it like this, we can only think that this is malicious.\r\nShellcode analysis\r\nRemember that malicious RTF files very often contain exploits? Exploits often use shellcode. Let’s see if we can\r\nfind shellcode.\r\nTo achieve this, we are going to use scdbg, a shellcode emulator developed by David Zimmer.\r\nFirst we are going to write the content of the stream to a file:\r\nFigure 43: Writing the (potential) shellcode to disk\r\nscdbg is a free, open-source tool that emulates 32-bit shellcode designed to run on the Windows operating\r\nsystem. Started as a project running on Windows and Linux, it is now further developed for Windows only.\r\nFigure 44: Scdbg\r\nWe download Windows binaries for scdbg:\r\nFigure 45: Scdbg binary files\r\nAnd extract executable scdbg.exe to our working directory:\r\nFigure 46: Extracting scdbg.exe\r\nFigure 47: Extracting scdbg.exe\r\nAlthough scdbg.exe is a Windows executable, we can run it on Ubuntu via Wine:\r\nFigure 48: Trying to use wine\r\nWine is not installed, but by now, we know how to install tools like this:\r\nFigure 49: Installing wine\r\nFigure 50: Tasting wine 😊\r\nWe can now run scdbg.exe like this:\r\nwine scdbg.exe\r\nscdbg requires some options: -f sc.vir to provide it with the file to analyze.\r\nShellcode has an entry point: the address from where it starts to execute. By default, scdbg starts to emulate from\r\naddress 0. 
Since this is an exploit (we have not yet recognized which exploit, but that does not prevent us from\r\ntrying to analyze the shellcode), its entry point will not be address 0. At address 0, we should find a data structure\r\n(that we have not identified) that is exploited.\r\nTo summarize: we don’t know the entry point, but it’s important to know it.\r\nSolution: scdbg.exe has an option to try out all possible entry points. Option -findsc.\r\nAnd we add one more option to produce a report: -r.\r\nLet’s try this:\r\nFigure 51: Running scdbg via wine\r\nThis looks good: after a bunch of messages and warnings from Wine that we can ignore, scdbg presents us with 8\r\n(0 through 7) possible entry points. We select the first one: 0\r\nFigure 52: Trying entry point 0 (address 0x95)\r\nAnd we are successful: scdbg.exe was able to emulate the shellcode, and show the different Windows API calls\r\nperformed by the shellcode. The most important one for us analysts is URLDownloadToFile. This tells us that the\r\nshellcode downloads a file and writes it to disk (name vbc.exe).\r\nNotice that scdbg did emulate the shellcode: it did not actually execute the API calls, no files were downloaded or\r\nwritten to disk.\r\nAlthough we don’t know which exploit we are dealing with, scdbg was able to find the shellcode and emulate it,\r\nproviding us with an overview of the actions executed by the shellcode.\r\nThe shellcode is obfuscated: that is why we did not see strings like the URL and filename when extracting the\r\nstrings (see figure 42). 
But by emulating the shellcode, scdbg also deobfuscates it.\r\nWe can now use curl again to try to download the file:\r\nFigure 53: Downloading the executable\r\nAnd it is indeed a Windows executable (.NET):\r\nFigure 54: Headers\r\nFigure 55: Running command file on the downloaded file\r\nTo determine what we are dealing with, we try to look it up on VirusTotal.\r\nFirst we calculate its hash:\r\nFigure 56: Calculating the MD5 hash\r\nAnd then we look it up through its hash on VirusTotal:\r\nFigure 57: VirusTotal report\r\nFrom this report, we conclude that the executable is Snake Keylogger.\r\nIf the file were not present on VirusTotal, we could upload it for analysis, provided we accept the fact that we\r\ncould potentially alert the criminals that we have discovered their malware.\r\nIn the video for this blog post, there’s a small bonus at the end, where we identify the exploit: CVE-2017-11882.\r\nConclusion\r\nThis is a long blog post, not only because of the different layers of malware in this sample, but also because in\r\nthis blog post, we provide more context and explanations than usual.\r\nWe explained how to install the different tools that we used.\r\nWe explained why we chose each tool, and why we execute each command.\r\nThere are many possible variations of this analysis, and other tools that can be used to achieve similar results. I,\r\nfor example, would pipe more commands together.\r\nThe important aspect of static analysis like this one is to use dedicated tools. Don’t use a PDF reader to open the\r\nPDF, don’t use Office to open the Word document, … Because if you do, you might execute the malicious code.\r\nWe have seen malicious documents like this before, and written blog posts for them, like this one. 
The sample we\r\nanalyzed here has more “layers” than these older maldocs, making the analysis more challenging.\r\nIn that blog post, we also explain how this kind of malicious document “works”, by also showing the JavaScript\r\nand by opening the document inside a sandbox.\r\nIOCs\r\nType Value\r\nPDF sha256: 05dc0792a89e18f5485d9127d2063b343cfd2a5d497c9b5df91dc687f9a1341d\r\nRTF sha256: 165305d6744591b745661e93dc9feaea73ee0a8ce4dbe93fde8f76d0fc2f8c3f\r\nEXE sha256: 20a3e59a047b8a05c7fd31b62ee57ed3510787a979a23ce1fde4996514fae803\r\nURL hxxps://vtaurl[.]com/IHytw\r\nURL hxxp://192[.]227[.]196[.]211/FRESH/fresh[.]exe\r\nThese files can be found on VirusTotal, MalwareBazaar and Malshare.\r\nAbout the authors\r\nDidier Stevens is a malware expert working for NVISO. Didier is a SANS Internet Storm Center senior handler\r\nand Microsoft MVP, and has developed numerous popular tools to assist with malware analysis. You can find\r\nDidier on Twitter and LinkedIn.\r\nYou can follow NVISO Labs on Twitter to stay up to date on all our future research and publications.\r\nSource: https://blog.nviso.eu/2022/04/06/analyzing-a-multilayer-maldoc-a-beginners-guide/",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"references": [
		"https://blog.nviso.eu/2022/04/06/analyzing-a-multilayer-maldoc-a-beginners-guide/"
	],
	"report_names": [
		"analyzing-a-multilayer-maldoc-a-beginners-guide"
	],
	"threat_actors": [],
	"ts_created_at": 1775439017,
	"ts_updated_at": 1775791225,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/84684d534f22304ebcda3d9bf7c3a3f9c7a0c258.pdf",
		"text": "https://archive.orkl.eu/84684d534f22304ebcda3d9bf7c3a3f9c7a0c258.txt",
		"img": "https://archive.orkl.eu/84684d534f22304ebcda3d9bf7c3a3f9c7a0c258.jpg"
	}
}