{
	"id": "79f894ea-6a8a-4713-8bd0-2c1bdc724236",
	"created_at": "2026-04-06T00:08:20.32373Z",
	"updated_at": "2026-04-10T03:20:21.398546Z",
	"deleted_at": null,
	"sha1_hash": "81c83be1f229edf110b7ce72ba5a3b15ed640c9c",
	"title": "Using similarity to expand context and map out threat campaigns",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 2069124,
	"plain_text": "Using similarity to expand context and map out threat campaigns\r\nBy Emiliano Martinez\r\nArchived: 2026-04-05 14:02:01 UTC\r\nTL;DR: VirusTotal allows you to search for similar files according to different orthogonal notions\r\n(structure, visual layout, icons, execution behaviour, etc.). File similarity can be combined with the “have:”\r\nsearch modifier in order to gain more context about threats, e.g. what are the emails or URLs that\r\ndistribute them.\r\nThis is the second blog post in our similarity series, the first article focused on how to trigger file similarity\r\nsearches and the different similarity vectors at your disposal. In the context of this series we have also done a\r\nwebinar that can be viewed on-demand, it focuses on using similarity to automatically produce optimal YARA\r\nrules to detect a given malware framework/family/campaign via VTDIFF.\r\nThis situation might sound familiar. As a SOC analyst or Incident Responder you are often confronted with files\r\nyou know nothing about. Your SIEM describes their internal sightings and actions but fails to transmit the bigger\r\npicture. You are constrained by the narrow visibility of your corporate logs. Context is king and the problem is\r\nthat you are fighting threat actors that operate globally with just a piece of the puzzle, your local data.\r\nWhat is this file? Who is behind it? What is their modus operandi? How did it get there? Are there other related\r\ncomponents? What does it do? Are there other variants that could have impacted my organization in the past? Any\r\nthat could impact us in the future? How do I contain it? Your SIEM, case management system, EDR, firewall, IDS\r\netc. don’t answer these questions. You are missing a necessary layer in your defense-in-depth security strategy.\r\nVirusTotal is your saving grace. You jump into VT ENTERPRISE and look up the hash: threat reputation is useful,\r\nbut you need further context. 
Your task is to identify IoCs that can be used for remediation, e.g. by blocking a\r\ncommand-and-control domain in the network perimeter, as well as artefacts that can be used for proactive threat\r\nhunting purposes, to determine whether there has been a breach and what its scope is. The issue is that sometimes\r\nVirusTotal does not have full context for a specific individual file in terms of sandbox reports, in-the-wild\r\nsightings, relationships, etc. and so your investigation might end here.\r\nHow to do it better\r\nIsolated hashes are of limited value. Many times they are unique per victim or campaign, so a better idea would be\r\nfinding the cluster/family/campaign they belong to in order to unearth remediation IoCs and threat hunting\r\npatterns. Most importantly, you need to leverage those groupings in order to surface command-and-control\r\ndomains, dropzones, distribution URLs, phishing emails, etc. that can be used for mitigation and containment,\r\nand to build proper understanding and situational awareness.\r\nSimilarity and the “have” search modifier to the rescue. Let’s imagine the initial hash that popped up as an alert in\r\nour environment was a first-stage EMOTET dropper, i.e. a document that delivers a malicious payload through\r\nmacros.\r\nhttps://blog.virustotal.com/2020/11/using-similarity-to-expand-context-and.html\r\nPage 1 of 9\n\nThreat reputation allows you to perform an immediate first assessment (alert triage), but other than that there is\r\nlittle context in terms of remediation IoCs and hunting artifacts. We still know nothing about how this file gets\r\ndistributed, i.e. its delivery vector. Similarly, we have no idea whether this is something spear-phished exclusively\r\nagainst our organization or part of a larger campaign. What about the threat network infrastructure? Does it\r\ndownload additional payloads? 
Does it communicate with a command-and-control server?\r\nThe next step in an incident response engagement - and this is what most analysts fail to do - is to jump into the\r\nfile’s cluster (its family/framework/campaign) in order to expand context and surface IoCs. This is just one click\r\naway:\r\nFor documents there is a limited number of approaches to find similar files (other file formats will expose more).\r\nThat said, they are very rich because they are fully orthogonal: structural features, visual layout, locality-sensitive\r\nfuzzy hashing, execution behaviour similarity. Let’s jump to other similar files based on the document’s visual\r\nlayout by clicking on “Similar by icon/thumbnail” or on the thumbnail itself, located in the top right:\r\nmain_icon_dhash:23232b2b00010000.\r\nThere are too many matches; we would have to iterate over every single one in order to surface particular patterns\r\nthat may allow us to understand the campaign.\r\nFinding phishing emails that distribute the threat\r\nWe can narrow down the search above to match exclusively those files that have been seen as an attachment in\r\nsome email uploaded to VirusTotal:\r\nmain_icon_dhash:23232b2b00010000 AND have:email_parents\r\n(Note that you can also use tag:attachment instead of have:email_parents)\r\nWe can now run through the matching files, open up their Relations tab and jump into the pertinent email parent,\r\nso as to understand the deception techniques being used in the campaign:\r\nThis particular instance poses as some kind of World Health Organization report on COVID. It is important to\r\ninspect all the other emails because not only will they tell us more about the lures, they will also allow us to identify\r\ntargeted industries, geographical spread, activity time spans, etc. 
For instance, there could be other localized\r\nvariants targeting other corporate branches. Access to these emails will not only give us greater\r\ninsight into the attacker, it is also something we can leverage tactically in order to improve filtering in our email\r\ngateways.\r\nDiscovering URLs that distribute this threat\r\nWe want to see if this campaign is also being distributed via download URLs. If that’s the case, we can block them\r\nin our network perimeter or use them to search across web proxy logs. Let’s ask VirusTotal whether any of the\r\nfiles in the cluster have associated in-the-wild URLs:\r\nmain_icon_dhash:23232b2b00010000 AND have:itw\r\nWe can now jump into the Relations tab in order to export these additional IoCs:\r\nThere are over 3K files with in-the-wild URLs; note that we can automate all of this via the API.\r\nIdentifying command-and-control/exfiltration infrastructure\r\nThe next step is to understand whether any of the machines in our corporate fleet are beaconing out to\r\ninfrastructure tied to this campaign. At the same time, we will probably want to block the CnC and exfiltration\r\npoints in order to mitigate the impact of historical undetected breaches. Let’s filter down the search to focus\r\nexclusively on those files that exhibited network communications when executed in a dynamic analysis sandbox:\r\nmain_icon_dhash:23232b2b00010000 AND have:behaviour_network\r\nMost of the matching files have been analysed by several sandboxes participating in our multi-sandbox effort.\r\nThis gives us unparalleled visibility into the campaign. For an attacker it is easy to evade a single sandbox; it is far\r\nmore complex to do so for 17+ of them at the same time. 
Each one of them is set up in a different geographical\r\nregion, going out to the internet through a different IP address, running different OS versions, with different\r\nsoftware and language packages installed, etc. As a result, we now have very interesting sightings in terms of\r\ninfrastructure:\r\nThese communication points can be very easily triaged. Remember that VirusTotal also characterizes domains, IP\r\naddresses and URLs. Threat reputation for these domains further confirms that they are accurate IoCs:\r\nThe domain relationships (in-the-wild sightings) tell the same story:\r\nWe now have additional IoCs that we can feed into our stack in order to proactively defend our organization from\r\nother variants. As a bonus point, pivoting to other campaign files that have sandbox behaviour reports allows us to\r\nshed more light on other TTPs that we might be tracking via MITRE ATT\u0026CK (e.g. installation, actions on\r\nobjectives, etc.).\r\nGaining context through the community\r\nBuilding on the use of the “have” search modifier, we can also leverage it to find files on which some VT\r\nCommunity user has placed a comment providing more context:\r\nmain_icon_dhash:23232b2b00010000 AND have:comments\r\nCommunity comments often give us interesting details in terms of in-the-wild observations, malware capabilities,\r\nreverse engineering reports, attribution, etc. 
For example, in this particular case we learn about additional\r\ndistribution URLs:\r\nThis other case helps us understand that this first stage is EMOTET and allows us to jump into a Pastebin dump\r\nwith further context about the campaign in terms of related hashes and network infrastructure:\r\nAdditional context\r\nThe “have” modifier accepts many other values; some of the more representative ones are:\r\ncompressed_parents: the files were seen inside a compressed file uploaded to VirusTotal.\r\npcap_parents: the files were seen in a network traffic recording uploaded to VirusTotal.\r\nembedded_(urls/domains/ips): a URL/domain/IP address pattern was extracted from the binary bodies of\r\nthe files.\r\nbehaviour: the files managed to execute in at least one sandbox and produced the pertinent dynamic\r\nanalysis report.\r\nbehaviour_registry: the files executed in a sandbox and interacted with the Windows Registry.\r\ncrowdsource_yara_rule: the files match some YARA rule coming from open-source community\r\nrepositories; these rules often provide additional references and descriptions about a threat.\r\nSumming up\r\nVirusTotal aggregates orthogonal means to cluster together groups of related files, i.e. files that may belong to the\r\nsame malware family/framework/campaign/actor. These file similarity vectors range from structural features to\r\ndynamic analysis observations.\r\nWe started off with a single IoC for which neither we nor VirusTotal had much context beyond basic threat\r\nreputation. By leveraging file similarity we managed to find thousands of other files related to the\r\ncampaign/malware framework. 
Through the “have” search modifier we then narrowed down our searches to\r\nidentify phishing emails used by the attackers, distribution URLs, additional network infrastructure such as CnCs\r\nand context shared by other threat researchers.\r\nAll of this is tactical intelligence that can be fed into network perimeter defenses, but also context that can be\r\noperationalized and digested into TTPs in order to characterize threat actors. Finally, this blog post presented an\r\nincident response scenario but the very same logic can be applied to threat actor tracking or campaign monitoring\r\nuse cases.\r\nThis post was authored by Emiliano Martinez.\r\nSource: https://blog.virustotal.com/2020/11/using-similarity-to-expand-context-and.html",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"references": [
		"https://blog.virustotal.com/2020/11/using-similarity-to-expand-context-and.html"
	],
	"report_names": [
		"using-similarity-to-expand-context-and.html"
	],
	"threat_actors": [],
	"ts_created_at": 1775434100,
	"ts_updated_at": 1775791221,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/81c83be1f229edf110b7ce72ba5a3b15ed640c9c.pdf",
		"text": "https://archive.orkl.eu/81c83be1f229edf110b7ce72ba5a3b15ed640c9c.txt",
		"img": "https://archive.orkl.eu/81c83be1f229edf110b7ce72ba5a3b15ed640c9c.jpg"
	}
}