HTML Smuggling Detection
By Micah Babinski
Published: 2022-12-28 · Archived: 2026-04-05 22:43:16 UTC
15 min read
Dec 28, 2022
Press enter or click to view image in full size
The most famous fictional smuggler that I could think of
Introduction
In this article I’ll delve into HTML smuggling detection, following the detection engineering process I’ve
described over my last two posts. This process includes research, testing, and development of new detection
concepts. Unlike my previous posts however, this time we’ll be observing, profiling, and detecting real QakBot
malware in the lab. With this piece I hope to show how current and aspiring detection engineers can go beyond
simulated, CTF-style challenges to study and detect real-world attacker techniques, flagging them for our SOC,
and allowing our incident response colleagues to neutralize the threat.
Our Itinerary
After some background and defining the objective, I’ll demonstrate how we can execute QakBot HTML
Smuggling malware in the lab, tracking it’s behavior in Splunk. Then, I’ll sequence observable events of this
attack, and identify useful detection points (which I think of as “building blocks,” or links of the attack chain). I’ll
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 1 of 14
also introduce correlation (specifically chain rules), a crucial tool in your threat detection toolbox. To illustrate
what this looks like, I’ll discuss and demonstrate an as-yet-unreleased capability of Sigma, called Sigma
Correlations. I’ll wrap up by testing a new chain rule against 10 additional samples of QakBot HTML Smuggling
malware to see how it performs.
Let’s get started! 🏁🏁🏁
Background
Early on in my cybersecurity journey, John Strand of Black Hills Information Security said something that stuck
with me: “There is no ‘YOU HAVE BEEN HACKED’ log.” Event logs can be confusing and hard to read even on
a good day. Deriving useful, actionable insights from logs is tricky! The challenge sometimes requires deep
analysis and specialized techniques to be successful.
I’ve had that lesson in the back of my mind throughout my career in security, and it was with this in mind that a
few months ago, I started seeing tons of posts like this one from @pr0xylife:
I like the way that pr0xylife and their peers summarize these attacks so succinctly, and I found myself repeating
the sequence of file types out loud — almost like a weird form of poetry!
I wanted to learn more, but didn’t know how to get started. I had reviewed some excellent research on QakBot by
the team at Trellix, so I knew a bit about their techniques. (If you have never heard of QakBot, please read the
Trellix report!) As I started following QakBot activity and digging deeper, I realized that pr0xylife and others were
describing the subtle combinations and variations in the way QakBot could trick its victims and evade our
defenses. Some examples:
It uses HTML Smuggling, where a malicious HTML file containing encoded JavaScript is executed by the
victim’s browser, downloading the next stage of the payload.
It uses password protected zip files to block sandboxing analysis.
It uses a disk image format called an .iso file to evade the Mark-of-the-Web protection, as Red Canary
explains very well.
LNK files disguised to lure users into executing to hidden .CMD and .DLL files.
And on and on with ever more deceptive tricks!
Viewed independently of one another, few of these behaviors will trigger an alert, much less get escalated, in
many SOCs. So it seemed like a good use case for correlation, where multiple security event log entries are
aggregated, sequenced, and timed in relation to each other to yield more sophisticated and high-fidelity alerts. If
this sounds confusing, keep reading! By the end of this article, you should have a much better understanding of
what correlation means and how you can put it to work for threat detection.
The Objective
There are some detections out there for HTML smuggling and other QakBot-connected techniques. These are
mostly based on matching suspicious log events, such as this one from Elastic security:
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 2 of 14
I wanted to see if I could build my own detections, utilizing correlation (as opposed to simple matching) to make
them resilient to subtle shifts in attack behavior.
TBH, I was also sick of QakBot’s dumb chicanery and wanted a reliable way to place it squarely within
our sights.
Me, after seeing yet another QakBot variation
Initial Observation and Analysis of QakBot Malware
To accomplish my objective, I had to go beyond threat intelligence reports from security vendors and Atomic Red
Team tests; I needed my own data created from real malware samples. I turned to an old favorite: malware-traffic-analysis.net by Brad Duncan (@malware_traffic). This site is among the most useful educational resources I have
come across in security. In addition to Wireshark tutorials and hands-on network traffic exercises, the site offers
quality analysis of real-world malware samples, including QakBot HTML smuggling files.
After making snapshots of my lab VMs, I fired up my virtual detection lab and visited a recent entry. Be careful,
the files hosted on this site are unsafe!
I downloaded the artifacts zip file and extracted it to my downloads folder using the WinRAR utility built into my
detection lab:
Press enter or click to view image in full size
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 3 of 14
The IOC notes were extremely helpful in giving context to the included files. Among the notes were the
following, which let me know that the infection chain started with the HTML file, called SCAN_DT6281.html.
2022-12-09 (FRIDAY) - HTML SMUGGLING FOR QAKBOT (QBOT) DISTRIBUTION TAG: AZD
DISTRIBUTION:
- Unknown source, possibly email --> HTML file --> password-protected zip archive --> extracted ISO i
I opened the HTML file in Chrome browser and noticed an immediate zip file download (bottom left below):
Press enter or click to view image in full size
Seems legit
…sure, let’s open up the zip file, why not?
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 4 of 14
The zip file was password protected but could be opened using the password displayed on the HTML page. The
zip file contained a single .img file, which I know to be another mountable disk image file format. Double-clicking this mounted the file as the D drive on my system:
Press enter or click to view image in full size
The shortcut LNK (SCAN_DT6281) and hidden folder (IncomingPay)
I also noticed a hidden directory called IncomingPay, which contained a .lnk file, two text files which contained (I
kid you not) excerpts from the Wikipedia page on Psychology, a .cmd file 🤔, and a .lc file 🤷.
As a security professional and dutiful watcher of corporate security awareness training, all of my 🚩🚩🚩 were
up and waving in the breeze. But this is for science, so I take the bait and double-click the LNK file, just as the
attacker wants the victim to do. 🎣 A command line window appears for a few seconds, then disappears:
Press enter or click to view image in full size
The LNK file target property is interesting:
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 5 of 14
C:\Windows\System32\cmd.exe /c IncomingPay\Issues.cmd A B C D E F G H I J K L M N O P Q R S T U V W X
This tells me that I’ve run the Issues.cmd file located in the IncomingPay directory using CMD, which is
important to know for later. Now back in Splunk, I start running queries to see what is happening on my victim
VM:
Press enter or click to view image in full size
Notice the commands logged immediately after I took the bait!
How do I know what searches to run? I don’t really, but through my experience as an analyst and researcher I
know that there are certain types of evidence that may appear in my logs — process executions, file creations,
network connections, and the like. This is where on-the-job experience as a security analyst, or a good dose of
CTF-style trainings (like those available at CyberDefenders or LetsDefend) come in handy.
Get Micah Babinski’s stories in your inbox
Join Medium for free to get updates from this writer.
Remember me for faster sign in
After refreshing my searches a few times and wondering if I had done something wrong, I noticed a definite,
incontrovertible sign of an intrusion: a burst of recon activity on my victim host:
Press enter or click to view image in full size
I forgot to include in the screenshot, but the parent process for all these commands was wermgr.exe
(what???)
As I was looking at these commands, I noted the time period in which they occurred — all within the span of a
few seconds! Also, the processes were all spawned by an unusual parent — wermgr.exe (the Windows Error
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 6 of 14
Reporting Manager). A good detection opportunity, perhaps? Event the most confused admin or overbuilt piece of
[legit] software will not run all these suspicious recon commands within such a short timeframe, and never as
children of a wermgr.exe. Swinging over to my network connections, I noticed that wermgr.exe had suddenly
become extremely chatty with external IP addresses:
Press enter or click to view image in full size
Uhhhhh………………
Following this initial assessment of my data, I determined that:
The malware sample contained a malicious initial access vector (duh, the rogues gallery of the .html, .zip,
.img, .lnk, .cmd, and .lc files).
The malware would download the zip file after opening the HTML page in a browser, and the zip file
contained a mountable .img file.
The mounted drive contained a malicious shortcut that would result in the execution of additional
commands.
An injected wermgr.exe process spawned an automated burst of recon activity and then connected to
suspicious external IP addresses, presumably to send the attacker information about our system and
network.
In Mitre ATT&CK terms, this all amounts to Initial Access, Defense Evasion, Discovery, and Command and
Control. In other words: big oof.
Press enter or click to view image in full size
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 7 of 14
Rogues Gallery
Breaking Down the Attack into Detection Building Blocks
Having performed an initial execution of the QakBot HTML smuggling technique in the lab and reverted to my
VM snapshots, it was time to dig further into the logs to see what was happening.
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 8 of 14
As I reviewed the various types of events generated by the attack, I began to develop a plain-language narrative
understanding of what was happening:
“An attacker sends a victim an HTML file that purports to contain a report, invoice, or other document
of interest to them. Basically, a phishing lure. When the victim opens the file in their browser,
embedded JavaScript code executes, which downloads or builds a password-protected zip file on the
victim’s system. When the victim unzips the archive, the extraction creates a mountable disk image
file. After mounting the drive, the user sees a shortcut that they believe will take them to the resource
they are trying to find. But the shortcut actually calls a command or scripting interpreter that executes
other malicious files that are hidden on the disk drive, leading to initial access compromise.”
That’s a wordy chunk of prose right there, but it makes sense in my head. If I am going to detect something, I need
to understand it first! Note the bold text: I used my narrative to highlight events that I could use to detect the
attack chain. Based on this review, I analyzed the logs from the attack, trying to isolate the following events:
1. Phishing email sent to victim containing an HTML attachment.
2. Creation of an HTML file in suspicious locations.
3. Opening of a stand-alone HTML file in a browser application.
4. Download/creation of a zip file by the browser application.
5. Opening/extraction of a password-protected zip file.
6. Creation of a mountable disk drive file format (.iso, .img, etc).
7. Mounting a drive.
8. Process execution on an external drive (either from an executable on that drive, or system executable
touching files on that drive).
Whew! That’s…a lot of events.
The Good News
The good news was, there are ways to detect nearly every event listed above! This means I had numerous ideas for
how to query and filter available logs to extract the meaningful events (building blocks) for future detections.
The Bad News
The bad news was, none of these many events is, on its own, malicious. Again, there is no YOU HAVE BEEN
HACKED log: each one of these events could occur in the course of normal business and be completely safe and
benign. The solution would be to correlate these events, in an ordered or unordered sequence, grouping them by a
common attribute like hostname, and triggering an alert when all of these occur within a time window, like an
hour. The problem is, from my analysis and testing (more on that in a moment), chaining or correlating that many
events would result in a brittle detection.
Brittle vs. Resilient
“Brittleness” describes the degree to which a detection idea falls apart in the face of subtle changes to attacker
techniques, variations in actions performed by the victim, problems with logging, or other factors outside of our
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 9 of 14
control. Brittle detections contrast with “resilient” detections, which are flexible and can withstand these subtle
shifts.
It’s not as simple as saying “Brittle = bad and resilient = good.” A brittle detection could be highly-targeted, with a
narrow “aperture.” The advantage could be that, if a brittle detection rule fires, there is a very high probability that
it is a true positive. Resilient detections may have a wider aperture and may match more potential malicious
activity. However, this could lead to false positives and a frustrated SOC if they are not developed with care.
With this in mind, I returned to my list of events from above.
Determining the Building Blocks to Test
In context, I realized that items one through three in the list above may not be good components of my HTML
Smuggling detection. Lack of visibility and a high volume of innocent behavior would hinder item 1. I struck item
two because I had limited ability to test this event, and item three proved unreliable depending on which browser I
used. Plus, while not technically HTML Smuggling, a lot of QakBot activity uses URLs, rather than stand-alone
HTML files, to deliver the initial payload.
Item five (extraction of a password-protected zip file) has lots of potential, and can be detected using this rule
written by Florian Roth and inspired by the research of @SBousseaden. However, I left it out of my scope because
1) that event was not logging in my lab and 2) some documentation indicates that it is only applicable to certain
operating systems.
After exploring my data, enumerating possible detection building blocks for my correlation, and winnowing that
list down based on further analysis, I had the following building blocks:
1. Web Browser Creates (Downloads) Zip Archive File (represents opening the malicious HTML file in a
browser).
2. ISO, VHD, LNK or IMG File Extracted from Zip (extracting the malicious disk image file).
3. Disk Image Mount (mounting the image — this one I pulled directly from Sigma).
4. Suspicious User-Initiated Process Execution on External Drive (clicking the .lnk file which runs or
references files on the external drive).
With these building block concepts in place, I wrote or adapted a query and Sigma rule for each one, then
correlated them into a chain rule.
Sigma Correlations
Correlation allows us to track log events through time. Rather than triggering an alert for each of the four events
listed above, a chain rule correlation allows us to alert only when all four events occur in order on the same host
by the same user, which is much more suspicious. Many SIEM products and their corresponding query languages
support this type of chaining (although some do not).
To support this functionality, Sigma has a Correlations standard in progress that will allow us to write custom
correlation rules in a common format then convert them to whatever SIEM product supports this logic. The draft
standard can be reviewed here:
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 10 of 14
What might this look like? It’s a draft standard, and subject to change, but a simple example of a brute force
chain rule might look like this:
action: correlation
type: temporal
rule:
- many_failed_logins
- successful_login
group-by:
- User
timespan: 1h
ordered: true
When the rule query “many_failed_logins” is matched followed by the “successful_login” rule within a one hour
window, the correlation rule will fire. For a good example of a SIEM product that supports correlations, check out
SumoLogic chain rules.
My draft correlations rule looks like this:
title: HTML Smuggling Activity - Chain Rule
id: 0952f2fa-e29b-4eb5-831c-ce21520c56e3
status: experimental
description: Detects HTML smuggling-style compromise (such as HTML > ZIP > ISO/IMG/VHD > CMD/BAT/VBS
references:
- https://blog.talosintelligence.com/html-smugglers-turn-to-svg-images/#:~:text=HTML%20smuggling%
- https://www.malwarebytes.com/blog/news/2021/11/evasive-maneuvers-html-smuggling-explained
- Original research and analysis performed off of QakBot intelligence gathered at https://github
author: Micah Babinski
date: 2022/12/27
tags:
- attack.s0650
- attack.s0483
- attack.initial_access
- attack.defense_evasion
- attack.execution
- attack.t1564
- attack.t1566.001
- attack.t1566
- attack.t1027
- attack.t1027.006
- attack.t1059
- attack.t1204
- attack.t1204.002
action: correlation
type: temporal
rule:
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 11 of 14
- 1_win_zipfile_drop.yml
- 2_win_susp_file_extraction.yml
- 3_win_security_iso_mount.yml
- 4_win_process_creation_ext_drive.yml
group-by:
- ComputerName
- User
timespan: 1h
ordered: true
falsepositives:
- Unknown
level: high
This looks like many other Sigma rules you may have seen before, but has some unique elements. The action and
type statements let you know this is a correlation rule of the temporal type. The four rules listed show the
component rules in scope (these must be in the same directory as the correlation). The timespan specifies that the
four participating rules must fire within one hour of each other, and the group-by statement defines fields in the
rules which relate them to each other. Finally, the ordered: true statement lets Sigma know that these rules should
occur in sequence.
Again, the Sigma Correlations specification is in development, and is subject to change. Still, this will
be a very useful expansion of Sigma’s capabilities, so I wanted to preview it for you now!
Testing the Correlation
With this detection concept taking shape, and a hypothesis developed in the form of my correlation rule, it was
time to test the detection with a larger sample size. I relied on the MalwareBazaar repository from Abuse.ch,
which provides a helpful library of tagged malware samples. I downloaded 10 QakBot HTML samples, mostly
reported by pr0xylife, ranging in date first seen from July 11 to December 22, 2022. I prepared the samples on my
lab VM, and named each one according to its first seen date and its “humanhash” property (a random, unique
sequence of human-readable words, such as (“dakota-earth-mockingbird-march”):
Press enter or click to view image in full size
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 12 of 14
My 10 lovely HTML smuggling samples all ready to test
To track my results, I made a simple table in Google Sheets:
Press enter or click to view image in full size
I made a clean, pre-test snapshot of my victim VM host to revert back to after each test, and started in on the test.
After testing seven of the ten samples, my sequence of four building blocks had perfect coverage! When I hit the
eighth sample, however, the fourth rule in the chain did not fire, because the .lnk file called RunDLL32.exe
directly, instead of cmd.exe or wscript.exe. This was fine — I simply made an adjustment to the conditions of my
Sigma rule, retested, and got perfect matches! I was delighted with the results and excited to share the detection
use case with the community.
Bonus Round!
Many QakBot phishing attacks do not use .html attachments, but instead use malicious .pdf files with embedded
links, or just good old-fashioned phishing links that point to zip files hosted on a compromised website. To test
whether my detection was flexible enough to catch these instances as well, I tested them on a couple of recent
examples from the IOC repository maintained by @ExecuteMalware, and found that these URL-based driveby
attacks were also caught by the detection. The one documented here even included a .wsf (Windows Script File)
— a file type with which I was unfamiliar but was nonetheless caught by my detection.
Press enter or click to view image in full size
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 13 of 14
Windows Script File (.wsf) Delivered During a Recent QakBot Attack
Hooray for resilience!
Conclusion
Congratulations! You’ve reached the end of a lengthy post about some dense, complicated topics. You can find all
the rules referenced here. Thank you for reading, and I hope you’ve enjoyed my ramblings on QakBot, HTML
Smuggling, Sigma, and correlations. I am truly excited for Sigma Correlations to launch. It will be a big win for
the detection engineering community and will allow us to share more sophisticated detection use cases.
I was really pleased with how my tests went, particularly when it showed that one of the building blocks in my
chain rule had failed and needed adjustment! After all, why test if you think your work is already perfect? Lastly,
this experience drove home for me what I had already started to believe — that we can’t detect what we don’t
understand. There’s no substitute for first-hand experience with real live malware if you are trying to see what it
does.
Thanks to the Detection Lab project that I’ve enthused about in previous posts, this real-life experience is in reach
for more people than ever. Lastly, thanks to pr0xylife, Brad (malware_traffic), and executemalware for providing
pristine repositories of well-documented QakBot malware samples for us to access, analyze, and understand.
These are truly an educational treasure trove!
Please feel free to send me ideas, comments, or suggestions on how I can improve. I am still new to this, and I
welcome respectful feedback and critique in any form.
As always, happy analyzing! 🧐
Source: https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841
Page 14 of 14