# Teasing the Secrets From Threat Actors: Malware Configuration Parsing at Scale

**[unit42.paloaltonetworks.com/teasing-secrets-malware-configuration-parsing](https://unit42.paloaltonetworks.com/teasing-secrets-malware-configuration-parsing)**

Mark Lim, Daniel Raygoza, Bob Jung May 3, 2023

By [Mark Lim](https://unit42.paloaltonetworks.com/author/mark-lim/), [Daniel Raygoza](https://unit42.paloaltonetworks.com/author/daniel-raygoza/) and [Bob Jung](https://unit42.paloaltonetworks.com/author/bob-jung/)

May 3, 2023 at 6:00 AM

[Category: Malware](https://unit42.paloaltonetworks.com/category/malware-2/)

Tags: [Advanced WildFire](https://unit42.paloaltonetworks.com/tag/advanced-wildfire/), [IcedID](https://unit42.paloaltonetworks.com/tag/icedid/), [memory detection](https://unit42.paloaltonetworks.com/tag/memory-detection/), [WildFire](https://unit42.paloaltonetworks.com/tag/wildfire/)

This post is also available in: [日本語 (Japanese)](https://unit42.paloaltonetworks.jp/teasing-secrets-malware-configuration-parsing/)

## Executive Summary

Configuration data that changes across each instance of deployed malware can be a gold
mine of information about what the bad guys are up to. The problem is that configuration
data in malware is usually difficult to parse statically from the file, by design. Malware authors
know the intelligence value of this data, because it provides the directives for how the malware
should behave.

Malware is like most complex software systems in that there are many advantages to code
reuse and abstraction. Therefore, it is not surprising to see that the concept of software
configuration is pervasive across the various malware families we analyze. After all, it’s pretty
hard to imagine a stereotypical cybercriminal wanting to bother with recompiling their code to
change an IP address or whatever else, when going after different targets.

But the good news is that statically armored configuration data can often easily be found and
parsed directly from memory. We will cover a nice example of an IcedID (information stealer)
configuration, how it was obfuscated and how we’ve extracted it.

Palo Alto Networks customers receive improved detection for the evasions discussed in this
blog through Advanced WildFire. As we continue to parse and extract this information from
malware families at scale, we hope to build out a pool of threat intelligence that will better
help us understand the campaigns and tactics of the various threat actors who are targeting
various organizations.

**Related Unit 42 Topics** **[Memory Detection](https://unit42.paloaltonetworks.com/tag/memory-detection)**, **[Malware](https://unit42.paloaltonetworks.com/category/malware-2/)**

- What Are Malware Configurations?
- IcedID Analysis
  - Unpacking IcedID Stage One
  - Locating the Encrypted Configuration Data Blob
  - Extracting the Encryption Key
  - Decrypting the Configuration Data Blob With the Encryption Key
  - Unpacking the IcedID Stage Two Binary
  - Locating the Encrypted Configuration Data Blob
  - Extracting the Encryption Key
  - Decrypting the Configuration Data Blob With the Encryption Key
- Scaling Up
- Conclusion
- Indicators of Compromise
- Additional Resources

## What Are Malware Configurations?

So what exactly do we mean by the term “configuration” when talking about malware?
Outside the context of malware, we think of configuration in terms of defining how systems
should behave. For example, we would consider the rules used to define which networking
routes for a firewall are allowed, or which font size your web browser uses while you read
this, as configurable information.

For malware, this is no different. Malware configurations are just collections of elements that
define how malware operates, such as the following:

- Command-and-control (C2) network addresses
- Passwords for remote administrators
- File paths in which to drop persistent payloads

The way these elements are embedded in malware components tends to be specific to each
malware family. Also, they might evolve over time as malware undergoes development, or
when malware authors change their build process.

Generally speaking, malware configuration elements tend to be the properties of malware
that the authors want to make easily editable between campaigns and deployments without
requiring manual code edits for each one. Malware configuration elements can also expose
latent behaviors and malware infrastructure that are not typically observable under routine
dynamic analysis.

Malware configurations have intelligence value for security practitioners because they
provide insights into campaigns over time. In some cases, defenders could use them as
actionable artifacts for network detection, or for identifying infected hosts. The successful
extraction and validation of a malware configuration can also be used to reinforce our
confidence when identifying a file as malicious.

Because malware configurations have value to security systems and defenders alike, it is
state-of-practice for modern malware authors to protect their configuration elements using
different techniques. These protections often include a blend of encryption, obfuscation and
compression. They might also be layered with [evasive techniques](https://unit42.paloaltonetworks.com/sandbox-evasion-memory-detection/).

This protection poses a significant challenge for malware configuration extractors that
operate solely by using static analysis, because all of these protections must be detected
and bypassed before extraction can be performed. Using an advanced dynamic analysis
sandbox combined with intelligent runtime memory analysis makes it possible to bypass
many of these protections and pinpoint the best opportunities to perform extraction.

When we represent and store these configurations using standardized schemas, it enables
us to extract maximum value through automation, machine learning and interactive analysis.
The [DC3-MWCP](https://github.com/dod-cyber-crime-center/DC3-MWCP) library defines a schema for many of the most common configuration
element types, and it provides a simple library for serialization to [JSON](https://www.json.org/json-en.html).

The [MITRE](https://www.mitre.org/) [MAEC](https://maecproject.github.io/) and [STIX](https://oasis-open.github.io/cti-documentation/) projects also provide us with a more general vocabulary for
representing malware configuration elements. This also allows us to correlate the elements
with observable objects collected during dynamic analysis.
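As a rough illustration of what a normalized record could look like, the sketch below serializes the IcedID stage one values reported later in this post to JSON using only the Python standard library. The `MalwareConfig` class and its field names (`family`, `sample_sha256`, `c2_addresses`, `campaign_id`) are hypothetical and are not taken from the DC3-MWCP, MAEC or STIX schemas.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MalwareConfig:
    """Hypothetical normalized record for one extracted configuration."""
    family: str
    sample_sha256: str
    c2_addresses: List[str] = field(default_factory=list)
    campaign_id: Optional[int] = None


def to_json(config: MalwareConfig) -> str:
    # sort_keys gives stable output, which helps when diffing records.
    return json.dumps(asdict(config), indent=2, sort_keys=True)


# Values taken from the stage one analysis in this post (domain kept defanged).
record = MalwareConfig(
    family="IcedID",
    sample_sha256="05a3a84096bcdc2a5cf87d07ede96aff7fd5037679f9585fee9a227c0d9cbf51",
    c2_addresses=["bayernbadabum[.]com"],
    campaign_id=1139942657,
)
print(to_json(record))
```

Keeping every family's output in one shape like this is what makes later indexing and correlation straightforward, whichever concrete schema is chosen.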

## IcedID Analysis

Let’s look at one IcedID binary and how its configurations are encrypted.

Hash 05a3a84096bcdc2a5cf87d07ede96aff7fd5037679f9585fee9a227c0d9cbf51



This [particular attack chain](https://twitter.com/Unit42_Intel/status/1588524735368937484?s=20&t=YXkHyDy_wX1vbbynVm9R6A), shown in Figure 1, was discovered in early November 2022. It
delivered IcedID, an information stealer also known as Bokbot, as the final payload. This
[threat](https://unit42.paloaltonetworks.com/atoms/monsterlibra/) is well-known malware that has been attacking people since 2019.

The following diagram shows the infection chain.

Figure 1. IcedID infection chain.

Authors of IcedID took pains to hide their configurations. Recent samples of IcedID stage two
would only be downloaded if the victim’s machine matched the requirements of the threat
actor.

The configurations of IcedID consisted of C2 URLs and their campaign IDs. The C2 URLs
included some that might not be revealed during the execution of the IcedID binaries. The
campaign ID links IcedID samples back to specific threat actors.

We will go through the following steps to extract the configurations found in the IcedID stage
one and two binaries:

1. Unpack the IcedID binary
2. Locate the encrypted configuration data blob
3. Extract the encryption key
4. Decrypt the configuration data blob with the encryption key

### Unpacking IcedID Stage One

IcedID stage one unpacks itself by first allocating memory using the VirtualAlloc function.
This is followed by erasing the allocated memory using the memset function, as shown in
Figure 2. Finally, it copies the unpacked data to the allocated memory using the memmove
function.

To dump the unpacked data, we set a breakpoint at memmove. The second argument of
memmove (the source pointer) contains the address of the unpacked data. Figure 2 also shows
the DOS MZ header of the unpacked IcedID stage one on the right-hand side of the hex dump.



Figure 2. Unpacking IcedID stage one.
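Once the buffer passed to memmove has been dumped to disk, a quick sanity check is to confirm that it really holds a PE image. The sketch below is a generic check and not the tooling used in this analysis: it looks for the `MZ` magic and follows the `e_lfanew` field at offset 0x3C to the `PE\0\0` signature. The function name `find_pe_images` is hypothetical.

```python
import struct
from typing import List


def find_pe_images(dump: bytes) -> List[int]:
    """Return offsets in `dump` where a DOS header points at a PE signature."""
    offsets = []
    pos = dump.find(b"MZ")
    while pos != -1:
        # A DOS header is 0x40 bytes; e_lfanew is its last 4-byte field.
        if pos + 0x40 <= len(dump):
            (e_lfanew,) = struct.unpack_from("<I", dump, pos + 0x3C)
            pe_offset = pos + e_lfanew
            # An out-of-range offset yields an empty slice and simply fails.
            if dump[pe_offset:pe_offset + 4] == b"PE\x00\x00":
                offsets.append(pos)
        pos = dump.find(b"MZ", pos + 1)
    return offsets
```

Checking the PE signature as well as the `MZ` magic filters out the many two-byte `MZ` sequences that occur by chance in memory dumps.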

### Locating the Encrypted Configuration Data Blob

Next, we located the encrypted configuration data blob using the unpacked stage one
IcedID. While debugging the unpacked IcedID stage one file, we set a breakpoint at the
address that called WinHttpConnect, as shown in Figure 3. The address pointed to by
register RDI contains the string of the C2 URL.

Figure 3. Debugging IcedID stage one.

By [backtracing](https://www.gnu.org/software/libc/manual/html_node/Backtraces.html#:~:text=A%20backtrace%20is%20a%20list,purposes%20of%20logging%20or%20diagnostics.) the code, we located a function that used the decrypted configuration, as
shown in Figure 4.

Figure 4. Tracing code in IcedID stage one.

Tracing the code flow back, we found the loop that decrypted the configuration, as shown in
Figure 5.

Figure 5. Configuration decryption loop for IcedID stage one.

The instruction at 0x7FEF33339CD loaded the address of the encrypted configuration data
blob (Encrypted Config) into register RDX.



### Extracting the Encryption Key

The instruction at 0x7FEF33339D4 reads the encryption key. The key is 0x40 bytes offset
from the address of Encrypted_Config. We also learned the configuration is 0x20 bytes long.
An [XOR loop](https://stackoverflow.com/questions/39442293/assembly-code-how-does-xor-in-a-loop-works) was used to decrypt the configuration.
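The layout described above can be sketched in a few lines of Python. This is a minimal illustration and not the script from this analysis. It assumes that the key is the same length as the configuration (0x20 bytes) and that each configuration byte is XORed with the key byte at the same index. The name `decrypt_stage_one` and the two constants are hypothetical.

```python
KEY_OFFSET = 0x40  # key starts 0x40 bytes after the encrypted configuration
CONFIG_LEN = 0x20  # the configuration is 0x20 bytes long


def decrypt_stage_one(blob: bytes) -> bytes:
    """XOR-decrypt a stage one blob that starts at Encrypted_Config.

    Assumption: byte i of the configuration is XORed with byte i of the key.
    """
    if len(blob) < KEY_OFFSET + CONFIG_LEN:
        raise ValueError("blob is too short to hold the configuration and key")
    encrypted = blob[:CONFIG_LEN]
    key = blob[KEY_OFFSET:KEY_OFFSET + CONFIG_LEN]
    return bytes(c ^ k for c, k in zip(encrypted, key))
```

The input is expected to be the bytes read from memory starting at the address loaded into RDX, covering at least 0x60 bytes so that both the configuration and the key are present.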

### Decrypting the Configuration Data Blob With the Encryption Key

After gathering the encryption key, the encrypted data blob and the decryption routine, we
can now decrypt the configuration using the following script shown in Figure 6.

Figure 6. Configuration decryption script for IcedID stage one.

The decrypted IcedID stage one configuration has the following format, as shown in Figure 7.

Figure 7. IcedID stage one configuration format.

From the decrypted configuration, we can extract the following IoCs:

| Indicator | Value |
| --- | --- |
| C2 URL | bayernbadabum[.]com |
| Campaign ID | 1139942657 |
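Pulling these two values out of the decrypted buffer might look like the sketch below. The field layout is an assumption made for illustration: a 4-byte little-endian campaign ID followed by a NUL-terminated C2 string. Verify it against the format in Figure 7 before relying on it. The function name `parse_stage_one_config` is hypothetical.

```python
import struct


def parse_stage_one_config(config: bytes) -> dict:
    """Split a decrypted stage one configuration into its two fields.

    Assumed layout: uint32 little-endian campaign ID, then a
    NUL-terminated ASCII C2 string.
    """
    if len(config) < 5:
        raise ValueError("configuration is too short")
    (campaign_id,) = struct.unpack_from("<I", config, 0)
    # Keep everything up to the first NUL byte after the campaign ID.
    c2_raw = config[4:].split(b"\x00", 1)[0]
    return {
        "campaign_id": campaign_id,
        "c2_url": c2_raw.decode("ascii", errors="replace"),
    }
```

Decoding with `errors="replace"` keeps the parser from raising on a wrongly decrypted buffer, which makes a failed decryption easy to spot in the output.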

Now, we will decrypt the configuration for the IcedID stage two binary.

### Unpacking the IcedID Stage Two Binary

As the IcedID stage two binary uses the same packer as stage one, we will not repeat the
unpacking steps here.

### Locating the Encrypted Configuration Data Blob

We set a breakpoint at the address that calls WinHttpConnect, as shown in Figure 8.

Figure 8. Debugging IcedID stage two.

After tracing the code, we located the function that used the decrypted configuration, as
shown in Figure 9.

Figure 9. Tracing code in IcedID stage two.

### Extracting the Encryption Key



Tracing the code flow even further back, we found the function that decrypts the
configuration. The first few instructions located the encrypted configuration blob. The
encrypted blob is 0x25c bytes long. The encryption key is the last 0x10 bytes of the
encrypted configuration blob, as shown in Figure 10.
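Separating the key from the ciphertext is the one step that follows directly from the sizes given above. The sketch below reads the description as meaning that the 0x25c-byte blob includes the trailing 0x10-byte key, which leaves 0x24c bytes of ciphertext. The name `split_stage_two_blob` is hypothetical. The decryption routine itself appears only in Figure 11, so it is not reproduced here.

```python
from typing import Tuple

BLOB_LEN = 0x25C  # total size of the encrypted configuration blob
KEY_LEN = 0x10    # the key occupies the last 0x10 bytes of the blob


def split_stage_two_blob(blob: bytes) -> Tuple[bytes, bytes]:
    """Return (ciphertext, key) for a stage two configuration blob."""
    if len(blob) != BLOB_LEN:
        raise ValueError("expected a blob of exactly 0x25c bytes")
    return blob[:-KEY_LEN], blob[-KEY_LEN:]
```

Rejecting blobs of the wrong size up front is a cheap way to notice that a newer variant has changed the layout.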

Figure 10. Loading the encryption key for IcedID stage two.

After retrieving the encryption key, the next step is the loop to decrypt the encrypted blob, as
shown in Figure 11.

Figure 11. Configuration decryption loop for IcedID stage two.

### Decrypting the Configuration Data Blob With the Encryption Key

We replicated the instructions in the decryption loop using Python. After gathering the
encryption key, encrypted data blob and the decryption routine, we can now decrypt the
configuration using the following script (shown in Figure 12).



Figure 12. Configuration decryption script for IcedID stage two.

Note: Jquinn147 and myrtus0x0 published a similar configuration decryption script for IcedID
in May 2021, called [IcedDecrypt (GitHub)](https://github.com/BinaryDefense/IcedDecrypt/blob/main/IcedDecrypt.py).

The decrypted IcedID stage two configuration has the following format, shown in Figure 13.

Figure 13. Configuration format for IcedID stage two.

From the decrypted configuration, we can extract the following indicators of compromise
(IoCs):

| Indicator | Value |
| --- | --- |
| C2 URLs | newscommercde[.]com |
| | spkdeutshnewsupp[.]com |
| | germanysupportspk[.]com |
| | nrwmarkettoys[.]com |
| C2 URI | news |
| Campaign ID | 1139942657 |

We have manually decrypted the configuration for both the IcedID stage one and two
binaries.

## Scaling Up

Now that we’ve discussed the work of figuring out how to target the configuration data in
memory, the next challenge is to figure out how to perform this at scale. The massive scale
of most malware processing systems means that most practitioners looking to build out a
configuration extraction system will need to be careful about adding overhead.
This means that we will need a mechanism to intelligently identify only the samples of
interest for each parser, so we’re not unnecessarily running dozens of parsers across
millions of samples.

We think a reasonable approach to this problem involves using intelligent runtime memory
analysis, as it provides us with excellent visibility into the secrets malware authors want to
protect. A typical workflow for our malware configuration extractors includes the following
activities:

- Scanning memory and/or other dynamic analysis artifacts
- Applying a noise filter on the results to identify the best candidates for extraction
- Performing extraction using the best fitting module and storing the results for reporting
  and indexing

Generalizing this common workflow presented us with the opportunity to make the following
improvements:

- Optimizing the search phase by only scanning analysis data once in most cases
- Applying abstractions and reusable code for many common tasks
- Limiting the impact of modules with problematic inputs or other bugs
- Giving our security researchers visibility into the performance of their modules
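A toy version of such a workflow is sketched below. It is not the Advanced WildFire implementation, and every name in it (`Extractor`, `run_extractors`, the `signature` field) is hypothetical. It shows three of the ideas in miniature: a cheap signature check filters out samples a module does not apply to, a `try`/`except` keeps one faulty module from affecting the others, and a timer records how long each module ran. For simplicity it checks each signature separately instead of scanning the data only once.

```python
import time
from typing import Callable, List, NamedTuple


class Extractor(NamedTuple):
    name: str
    signature: bytes                 # byte pattern that must be present
    parse: Callable[[bytes], dict]   # family-specific extraction routine


def run_extractors(memory: bytes, extractors: List[Extractor]) -> List[dict]:
    """Run only the extractors whose signature appears in `memory`."""
    results = []
    for extractor in extractors:
        # Noise filter: skip modules that clearly do not apply.
        if extractor.signature not in memory:
            continue
        started = time.perf_counter()
        try:
            config, error = extractor.parse(memory), None
        except Exception as exc:  # isolate failures of a single module
            config, error = None, repr(exc)
        results.append({
            "extractor": extractor.name,
            "config": config,
            "error": error,
            "seconds": time.perf_counter() - started,
        })
    return results
```

Recording the error and elapsed time alongside each result gives module authors the feedback they need without having to rerun samples by hand.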



The following example shows some of the IoCs from a recent IcedID extractor after being
deployed at scale. Having a nice framework for deploying configuration extractors means
that once you are finished crafting a configuration extraction script, it’s time to kick your feet
up and relax while hundreds of configurations flow into your malware configuration database.

Figure 14. IoCs from IcedID samples.

## Conclusion



Thank you for joining us in this overview of malware configurations and why we are working
hard to parse this information at scale in Advanced WildFire. Reverse engineering variants of
each malware family allows us to build out parsers to extract meaningful and relevant data for
all of them at scale.

There is a staggering amount of diversity among payloads in the malware landscape, which
makes the task of supporting them all more or less impossible. Where possible, we use
metrics-based approaches to prioritize focus on the malware families and variants most
relevant to our customers. In this ongoing area of research, our team will continue to expand
support for new malware families and variants.

Palo Alto Networks customers receive protections from threats such as those discussed in
this post with [Advanced WildFire](https://www.paloaltonetworks.com/network-security/advanced-wildfire).

## Indicators of Compromise

05a3a84096bcdc2a5cf87d07ede96aff7fd5037679f9585fee9a227c0d9cbf51

## Additional Resources

_Updated May 17, 2023, at 6:00 a.m. PT._
