{
	"id": "9fef0eb3-a4a5-4064-9931-7cabdbc9d1f3",
	"created_at": "2026-04-06T00:08:32.255621Z",
	"updated_at": "2026-04-10T03:21:34.462661Z",
	"deleted_at": null,
	"sha1_hash": "bce30e637037119a59cebffa0a1e66bcd5c74ea2",
	"title": "HTML Smuggling Detection",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 2937334,
	"plain_text": "HTML Smuggling Detection\r\nBy Micah Babinski\r\nPublished: 2022-12-28 · Archived: 2026-04-05 22:43:16 UTC\r\n15 min read\r\nDec 28, 2022\r\nPress enter or click to view image in full size\r\nThe most famous fictional smuggler that I could think of\r\nIntroduction\r\nIn this article I’ll delve into HTML smuggling detection, following the detection engineering process I’ve\r\ndescribed over my last two posts. This process includes research, testing, and development of new detection\r\nconcepts. Unlike my previous posts however, this time we’ll be observing, profiling, and detecting real QakBot\r\nmalware in the lab. With this piece I hope to show how current and aspiring detection engineers can go beyond\r\nsimulated, CTF-style challenges to study and detect real-world attacker techniques, flagging them for our SOC,\r\nand allowing our incident response colleagues to neutralize the threat.\r\nOur Itinerary\r\nAfter some background and defining the objective, I’ll demonstrate how we can execute QakBot HTML\r\nSmuggling malware in the lab, tracking it’s behavior in Splunk. Then, I’ll sequence observable events of this\r\nattack, and identify useful detection points (which I think of as “building blocks,” or links of the attack chain). I’ll\r\nhttps://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841\r\nPage 1 of 14\n\nalso introduce correlation (specifically chain rules), a crucial tool in your threat detection toolbox. To illustrate\r\nwhat this looks like, I’ll discuss and demonstrate an as-yet-unreleased capability of Sigma, called Sigma\r\nCorrelations. I’ll wrap up by testing a new chain rule against 10 additional samples of QakBot HTML Smuggling\r\nmalware to see how it performs.\r\nLet’s get started! 
🏁🏁🏁\r\nBackground\r\nEarly on in my cybersecurity journey, John Strand of Black Hills Information Security said something that stuck\r\nwith me: “There is no ‘YOU HAVE BEEN HACKED’ log.” Event logs can be confusing and hard to read even on\r\na good day. Deriving useful, actionable insights from logs is tricky! The challenge sometimes requires deep\r\nanalysis and specialized techniques to be successful.\r\nI’ve had that lesson in the back of my mind throughout my career in security, and it was with this in mind that a\r\nfew months ago, I started seeing tons of posts like this one from @pr0xylife:\r\nI like the way that pr0xylife and their peers summarize these attacks so succinctly, and I found myself repeating\r\nthe sequence of file types out loud — almost like a weird form of poetry!\r\nI wanted to learn more, but didn’t know how to get started. I had reviewed some excellent research on QakBot by\r\nthe team at Trellix, so I knew a bit about their techniques. (If you have never heard of QakBot, please read the\r\nTrellix report!) As I started following QakBot activity and digging deeper, I realized that pr0xylife and others were\r\ndescribing the subtle combinations and variations in the way QakBot could trick its victims and evade our\r\ndefenses. Some examples:\r\nIt uses HTML Smuggling, where a malicious HTML file containing encoded JavaScript is executed by the\r\nvictim’s browser, downloading the next stage of the payload.\r\nIt uses password protected zip files to block sandboxing analysis.\r\nIt uses a disk image format called an .iso file to evade the Mark-of-the-Web protection, as Red Canary\r\nexplains very well.\r\nLNK files disguised to lure users into executing to hidden .CMD and .DLL files.\r\nAnd on and on with ever more deceptive tricks!\r\nViewed independently of one another, few of these behaviors will trigger an alert, much less get escalated, in\r\nmany SOCs. 
So it seemed like a good use case for correlation, where multiple security event log entries are\r\naggregated, sequenced, and timed in relation to each other to yield more sophisticated and high-fidelity alerts. If\r\nthis sounds confusing, keep reading! By the end of this article, you should have a much better understanding of\r\nwhat correlation means and how you can put it to work for threat detection.\r\nThe Objective\r\nThere are some detections out there for HTML smuggling and other QakBot-connected techniques. These are\r\nmostly based on matching suspicious log events, such as this one from Elastic Security:\r\nI wanted to see if I could build my own detections, utilizing correlation (as opposed to simple matching) to make\r\nthem resilient to subtle shifts in attack behavior.\r\nTBH, I was also sick of QakBot’s dumb chicanery and wanted a reliable way to place it squarely within\r\nour sights.\r\nMe, after seeing yet another QakBot variation\r\nInitial Observation and Analysis of QakBot Malware\r\nTo accomplish my objective, I had to go beyond threat intelligence reports from security vendors and Atomic Red\r\nTeam tests; I needed my own data created from real malware samples. I turned to an old favorite: malware-traffic-analysis.net by Brad Duncan (@malware_traffic). This site is among the most useful educational resources I have\r\ncome across in security. In addition to Wireshark tutorials and hands-on network traffic exercises, the site offers\r\nquality analysis of real-world malware samples, including QakBot HTML smuggling files.\r\nAfter making snapshots of my lab VMs, I fired up my virtual detection lab and visited a recent entry. 
Be careful,\r\nthe files hosted on this site are unsafe!\r\nI downloaded the artifacts zip file and extracted it to my downloads folder using the WinRAR utility built into my\r\ndetection lab:\r\nThe IOC notes were extremely helpful in giving context to the included files. Among the notes were the\r\nfollowing, which let me know that the infection chain started with the HTML file, called SCAN_DT6281.html.\r\n2022-12-09 (FRIDAY) - HTML SMUGGLING FOR QAKBOT (QBOT) DISTRIBUTION TAG: AZD\r\nDISTRIBUTION:\r\n- Unknown source, possibly email --\u003e HTML file --\u003e password-protected zip archive --\u003e extracted ISO i\r\nI opened the HTML file in Chrome browser and noticed an immediate zip file download (bottom left below):\r\nSeems legit\r\n…sure, let’s open up the zip file, why not?\r\nThe zip file was password protected but could be opened using the password displayed on the HTML page. The\r\nzip file contained a single .img file, which I know to be another mountable disk image file format. Double-clicking this mounted the file as the D drive on my system:\r\nThe shortcut LNK (SCAN_DT6281) and hidden folder (IncomingPay)\r\nI also noticed a hidden directory called IncomingPay, which contained a .lnk file, two text files which contained (I\r\nkid you not) excerpts from the Wikipedia page on Psychology, a .cmd file 🤔, and a .lc file 🤷.\r\nAs a security professional and dutiful watcher of corporate security awareness training, all of my 🚩🚩🚩 were\r\nup and waving in the breeze. But this is for science, so I take the bait and double-click the LNK file, just as the\r\nattacker wants the victim to do. 
🎣 A command line window appears for a few seconds, then disappears:\r\nThe LNK file target property is interesting:\r\nC:\\Windows\\System32\\cmd.exe /c IncomingPay\\Issues.cmd A B C D E F G H I J K L M N O P Q R S T U V W X\r\nThis tells me that I’ve run the Issues.cmd file located in the IncomingPay directory using CMD, which is\r\nimportant to know for later. Now back in Splunk, I start running queries to see what is happening on my victim\r\nVM:\r\nNotice the commands logged immediately after I took the bait!\r\nHow do I know what searches to run? I don’t really, but through my experience as an analyst and researcher I\r\nknow that there are certain types of evidence that may appear in my logs: process executions, file creations,\r\nnetwork connections, and the like. This is where on-the-job experience as a security analyst, or a good dose of\r\nCTF-style training (like that available at CyberDefenders or LetsDefend), comes in handy.\r\nAfter refreshing my searches a few times and wondering if I had done something wrong, I noticed a definite,\r\nincontrovertible sign of an intrusion: a burst of recon activity on my victim host:\r\nI forgot to include in the screenshot, but the parent process for all these commands was wermgr.exe\r\n(what???)\r\nAs I was looking at these commands, I noted the time period in which they occurred: all within the span of a\r\nfew seconds! Also, the processes were all spawned by an unusual parent, wermgr.exe (the Windows Error\r\nReporting Manager). 
A good detection opportunity, perhaps? Even the most confused admin or overbuilt piece of\r\n[legit] software will not run all these suspicious recon commands within such a short timeframe, and never as\r\nchildren of wermgr.exe. Swinging over to my network connections, I noticed that wermgr.exe had suddenly\r\nbecome extremely chatty with external IP addresses:\r\nUhhhhh………………\r\nFollowing this initial assessment of my data, I determined that:\r\nThe malware sample contained a malicious initial access vector (duh, the rogues gallery of the .html, .zip,\r\n.img, .lnk, .cmd, and .lc files).\r\nThe malware would download the zip file after opening the HTML page in a browser, and the zip file\r\ncontained a mountable .img file.\r\nThe mounted drive contained a malicious shortcut that would result in the execution of additional\r\ncommands.\r\nAn injected wermgr.exe process spawned an automated burst of recon activity and then connected to\r\nsuspicious external IP addresses, presumably to send the attacker information about our system and\r\nnetwork.\r\nIn MITRE ATT\u0026CK terms, this all amounts to Initial Access, Defense Evasion, Discovery, and Command and\r\nControl. 
In other words: big oof.\r\nRogues Gallery\r\nBreaking Down the Attack into Detection Building Blocks\r\nHaving performed an initial execution of the QakBot HTML smuggling technique in the lab and reverted to my\r\nVM snapshots, it was time to dig further into the logs to see what was happening.\r\nAs I reviewed the various types of events generated by the attack, I began to develop a plain-language narrative\r\nunderstanding of what was happening:\r\n“An attacker sends a victim an HTML file that purports to contain a report, invoice, or other document\r\nof interest to them. Basically, a phishing lure. When the victim opens the file in their browser,\r\nembedded JavaScript code executes, which downloads or builds a password-protected zip file on the\r\nvictim’s system. When the victim unzips the archive, the extraction creates a mountable disk image\r\nfile. After mounting the drive, the user sees a shortcut that they believe will take them to the resource\r\nthey are trying to find. But the shortcut actually calls a command or scripting interpreter that executes\r\nother malicious files that are hidden on the disk drive, leading to initial access compromise.”\r\nThat’s a wordy chunk of prose right there, but it makes sense in my head. If I am going to detect something, I need\r\nto understand it first! Note the bold text: I used my narrative to highlight events that I could use to detect the\r\nattack chain. Based on this review, I analyzed the logs from the attack, trying to isolate the following events:\r\n1. Phishing email sent to victim containing an HTML attachment.\r\n2. Creation of an HTML file in suspicious locations.\r\n3. Opening of a stand-alone HTML file in a browser application.\r\n4. 
Download/creation of a zip file by the browser application.\r\n5. Opening/extraction of a password-protected zip file.\r\n6. Creation of a mountable disk drive file format (.iso, .img, etc.).\r\n7. Mounting a drive.\r\n8. Process execution on an external drive (either from an executable on that drive, or from a system executable\r\ntouching files on that drive).\r\nWhew! That’s…a lot of events.\r\nThe Good News\r\nThe good news was, there are ways to detect nearly every event listed above! This means I had numerous ideas for\r\nhow to query and filter available logs to extract the meaningful events (building blocks) for future detections.\r\nThe Bad News\r\nThe bad news was, none of these many events is, on its own, malicious. Again, there is no YOU HAVE BEEN\r\nHACKED log: each one of these events could occur in the course of normal business and be completely safe and\r\nbenign. The solution would be to correlate these events, in an ordered or unordered sequence, grouping them by a\r\ncommon attribute like hostname, and triggering an alert when all of these occur within a time window, like an\r\nhour. The problem is, from my analysis and testing (more on that in a moment), chaining or correlating that many\r\nevents would result in a brittle detection.\r\nBrittle vs. Resilient\r\n“Brittleness” describes the degree to which a detection idea falls apart in the face of subtle changes to attacker\r\ntechniques, variations in actions performed by the victim, problems with logging, or other factors outside of our\r\ncontrol. 
Brittle detections contrast with “resilient” detections, which are flexible and can withstand these subtle\r\nshifts.\r\nIt’s not as simple as saying “Brittle = bad and resilient = good.” A brittle detection could be highly-targeted, with a\r\nnarrow “aperture.” The advantage could be that, if a brittle detection rule fires, there is a very high probability that\r\nit is a true positive. Resilient detections may have a wider aperture and may match more potential malicious\r\nactivity. However, this could lead to false positives and a frustrated SOC if they are not developed with care.\r\nWith this in mind, I returned to my list of events from above.\r\nDetermining the Building Blocks to Test\r\nIn context, I realized that items one through three in the list above may not be good components of my HTML\r\nSmuggling detection. Lack of visibility and a high volume of innocent behavior would hinder item 1. I struck item\r\ntwo because I had limited ability to test this event, and item three proved unreliable depending on which browser I\r\nused. Plus, while not technically HTML Smuggling, a lot of QakBot activity uses URLs, rather than stand-alone\r\nHTML files, to deliver the initial payload.\r\nItem five (extraction of a password-protected zip file) has lots of potential, and can be detected using this rule\r\nwritten by Florian Roth and inspired by the research of @SBousseaden. However, I left it out of my scope because\r\n1) that event was not logging in my lab and 2) some documentation indicates that it is only applicable to certain\r\noperating systems.\r\nAfter exploring my data, enumerating possible detection building blocks for my correlation, and winnowing that\r\nlist down based on further analysis, I had the following building blocks:\r\n1. Web Browser Creates (Downloads) Zip Archive File (represents opening the malicious HTML file in a\r\nbrowser).\r\n2. ISO, VHD, LNK or IMG File Extracted from Zip (extracting the malicious disk image file).\r\n3. 
Disk Image Mount (mounting the image; this one I pulled directly from Sigma).\r\n4. Suspicious User-Initiated Process Execution on External Drive (clicking the .lnk file which runs or\r\nreferences files on the external drive).\r\nWith these building block concepts in place, I wrote or adapted a query and Sigma rule for each one, then\r\ncorrelated them into a chain rule.\r\nSigma Correlations\r\nCorrelation allows us to track log events through time. Rather than triggering an alert for each of the four events\r\nlisted above, a chain rule correlation allows us to alert only when all four events occur in order on the same host\r\nby the same user, which is much more suspicious. Many SIEM products and their corresponding query languages\r\nsupport this type of chaining (although some do not).\r\nTo support this functionality, Sigma has a Correlations standard in progress that will allow us to write custom\r\ncorrelation rules in a common format, then convert them to whatever SIEM product supports this logic. A draft\r\nof the standard is available for review.\r\nWhat might this look like? It’s a draft standard, and subject to change, but a simple example of a brute force\r\nchain rule might look like this:\r\naction: correlation\r\ntype: temporal\r\nrule:\r\n - many_failed_logins\r\n - successful_login\r\ngroup-by:\r\n - User\r\ntimespan: 1h\r\nordered: true\r\nWhen the “many_failed_logins” rule matches and is followed by the “successful_login” rule within a one-hour\r\nwindow, the correlation rule will fire. 
For a good example of a SIEM product that supports correlations, check out\r\nSumoLogic chain rules.\r\nMy draft correlations rule looks like this:\r\ntitle: HTML Smuggling Activity - Chain Rule\r\nid: 0952f2fa-e29b-4eb5-831c-ce21520c56e3\r\nstatus: experimental\r\ndescription: Detects HTML smuggling-style compromise (such as HTML \u003e ZIP \u003e ISO/IMG/VHD \u003e CMD/BAT/VBS\r\nreferences:\r\n - https://blog.talosintelligence.com/html-smugglers-turn-to-svg-images/#:~:text=HTML%20smuggling%\r\n - https://www.malwarebytes.com/blog/news/2021/11/evasive-maneuvers-html-smuggling-explained\r\n - Original research and analysis performed off of QakBot intelligence gathered at https://github\r\nauthor: Micah Babinski\r\ndate: 2022/12/27\r\ntags:\r\n - attack.s0650\r\n - attack.s0483\r\n - attack.initial_access\r\n - attack.defense_evasion\r\n - attack.execution\r\n - attack.t1564\r\n - attack.t1566.001\r\n - attack.t1566\r\n - attack.t1027\r\n - attack.t1027.006\r\n - attack.t1059\r\n - attack.t1204\r\n - attack.t1204.002\r\naction: correlation\r\ntype: temporal\r\nrule:\r\n - 1_win_zipfile_drop.yml\r\n - 2_win_susp_file_extraction.yml\r\n - 3_win_security_iso_mount.yml\r\n - 4_win_process_creation_ext_drive.yml\r\ngroup-by:\r\n - ComputerName\r\n - User\r\ntimespan: 1h\r\nordered: true\r\nfalsepositives:\r\n - Unknown\r\nlevel: high\r\nThis looks like many other Sigma rules you may have seen before, but has some unique elements. The action and\r\ntype statements let you know this is a correlation rule of the temporal type. The four rules listed show the\r\ncomponent rules in scope (these must be in the same directory as the correlation). The timespan specifies that the\r\nfour participating rules must fire within one hour of each other, and the group-by statement defines fields in the\r\nrules which relate them to each other. 
Finally, the ordered: true statement lets Sigma know that these rules should\r\noccur in sequence.\r\nAgain, the Sigma Correlations specification is in development, and is subject to change. Still, this will\r\nbe a very useful expansion of Sigma’s capabilities, so I wanted to preview it for you now!\r\nTesting the Correlation\r\nWith this detection concept taking shape, and a hypothesis developed in the form of my correlation rule, it was\r\ntime to test the detection with a larger sample size. I relied on the MalwareBazaar repository from Abuse.ch,\r\nwhich provides a helpful library of tagged malware samples. I downloaded 10 QakBot HTML samples, mostly\r\nreported by pr0xylife, with first-seen dates ranging from July 11 to December 22, 2022. I prepared the samples on my\r\nlab VM, and named each one according to its first-seen date and its “humanhash” property (a short\r\nsequence of human-readable words derived from the sample’s hash, such as “dakota-earth-mockingbird-march”):\r\nMy 10 lovely HTML smuggling samples all ready to test\r\nTo track my results, I made a simple table in Google Sheets:\r\nI made a clean, pre-test snapshot of my victim VM host to revert back to after each test, and started in on the test.\r\nAfter testing seven of the ten samples, my sequence of four building blocks had perfect coverage! When I hit the\r\neighth sample, however, the fourth rule in the chain did not fire, because the .lnk file called RunDLL32.exe\r\ndirectly, instead of cmd.exe or wscript.exe. This was fine: I simply made an adjustment to the conditions of my\r\nSigma rule, retested, and got perfect matches! 
I was delighted with the results and excited to share the detection\r\nuse case with the community.\r\nBonus Round!\r\nMany QakBot phishing attacks do not use .html attachments, but instead use malicious .pdf files with embedded\r\nlinks, or just good old-fashioned phishing links that point to zip files hosted on a compromised website. To test\r\nwhether my detection was flexible enough to catch these instances as well, I tested it on a couple of recent\r\nexamples from the IOC repository maintained by @ExecuteMalware, and found that these URL-based drive-by\r\nattacks were also caught by the detection. The one documented here even included a .wsf (Windows Script File),\r\na file type with which I was unfamiliar but which was nonetheless caught by my detection.\r\nWindows Script File (.wsf) Delivered During a Recent QakBot Attack\r\nHooray for resilience!\r\nConclusion\r\nCongratulations! You’ve reached the end of a lengthy post about some dense, complicated topics. You can find all\r\nthe rules referenced here. Thank you for reading, and I hope you’ve enjoyed my ramblings on QakBot, HTML\r\nSmuggling, Sigma, and correlations. I am truly excited for Sigma Correlations to launch. It will be a big win for\r\nthe detection engineering community and will allow us to share more sophisticated detection use cases.\r\nI was really pleased with how my tests went, particularly when they showed that one of the building blocks in my\r\nchain rule had failed and needed adjustment! After all, why test if you think your work is already perfect? Lastly,\r\nthis experience drove home for me what I had already started to believe: that we can’t detect what we don’t\r\nunderstand. 
There’s no substitute for first-hand experience with real live malware if you are trying to see what it\r\ndoes.\r\nThanks to the Detection Lab project that I’ve enthused about in previous posts, this real-life experience is in reach\r\nfor more people than ever. Lastly, thanks to pr0xylife, Brad (malware_traffic), and executemalware for providing\r\npristine repositories of well-documented QakBot malware samples for us to access, analyze, and understand.\r\nThese are truly an educational treasure trove!\r\nPlease feel free to send me ideas, comments, or suggestions on how I can improve. I am still new to this, and I\r\nwelcome respectful feedback and critique in any form.\r\nAs always, happy analyzing! 🧐\r\nSource: https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"references": [
		"https://micahbabinski.medium.com/html-smuggling-detection-5adefebb6841"
	],
	"report_names": [
		"html-smuggling-detection-5adefebb6841"
	],
	"threat_actors": [],
	"ts_created_at": 1775434112,
	"ts_updated_at": 1775791294,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/bce30e637037119a59cebffa0a1e66bcd5c74ea2.pdf",
		"text": "https://archive.orkl.eu/bce30e637037119a59cebffa0a1e66bcd5c74ea2.txt",
		"img": "https://archive.orkl.eu/bce30e637037119a59cebffa0a1e66bcd5c74ea2.jpg"
	}
}