{
	"id": "a5e30600-ff15-4986-9191-ea23a0b94330",
	"created_at": "2026-04-06T00:20:07.587722Z",
	"updated_at": "2026-04-10T03:21:12.041359Z",
	"deleted_at": null,
	"sha1_hash": "36c1de54b4eac5f05a056ec7f08169f1fe5d510c",
	"title": "A Death Match of Domain Generation Algorithms",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 829941,
	"plain_text": "A Death Match of Domain Generation Algorithms\r\nBy Yuriy Yuzifovich\r\nPublished: 2022-01-19 · Archived: 2026-04-05 16:06:45 UTC\r\nAuthors: Yuriy Yuzifovich and Hongliang Liu, originally published on December 29, 2017\r\nThe dictionary definition of domain generation algorithms (DGAs) is “algorithms seen in various families of\r\nmalware that are used to periodically generate a large number of domain names that can be used as rendezvous\r\npoints with their command and control servers” (https://en.wikipedia.org/wiki/Domain_generation_algorithm ). In\r\nthe real-life recursive DNS traffic we monitor at Nominum (now part of Akamai) we observe a lot of ‘strange’\r\nDNS queries, many of them generated by malware DGAs; below are some examples of domains generated by the\r\nDyre banking trojan (aka Dyreza):\r\nt3622c4773260c097e2e9b26705212ab85.ws.\r\nu83ccf36d9f02e9ea79a9d16c0336677e4.to.\r\nv02bec0c090508bc76b3ea81dfc2198a71.in.\r\nwa9e4628c334324e181e40f33f878c153f.hk.\r\nxdcc5481252db5f38d5fc18c9ad3b2f7fd.cn.\r\nyf32d9ac7f0a9f463e8da4736b12d7044a.tk.\r\nMalware creators use algorithmically generated domains as a diversion mechanism: they flood the DNS stream\r\nwith requests for thousands of DGA-based domains but select only a few domains to provide the true C\u0026C\r\nservice, where the malware can find its mothership and receive instructions. 
Meanwhile, poor security\r\nresearchers get overloaded with work trying to discover and block the selected few.\r\nBeyond creating a diversion, malware creators use DGAs because they are harder to detect than\r\nhardcoded IPs or domain names: by not hardcoding the location of the C\u0026C in the malware binary itself, the\r\nattacker can better hide and protect the mothership. Attackers keep creating new DGAs, which once again creates\r\nwork overload for security researchers, who need to reverse-engineer binaries or use various machine\r\nintelligence driven methods to discover the DGAs.\r\nIn this article, we discuss this deathmatch between attackers and security researchers on the DGA\r\nbattleground.\r\nTHE DGA BATTLEGROUND\r\nThe general methodology of any DGA is to use a deterministic pseudo-random number generator (PRNG) to generate a list\r\nof candidate domain names. The seed of the PRNG can be the current date, some magic numbers, an exchange rate,\r\netc. The generator can be a single uniform distribution generator, e.g. one that uses a combination of bit-shift, XOR,\r\ndivide, multiply, and modulo operations to generate a character sequence as the domain name (such as in Conficker,\r\nRamnit, and others); it can also be a rule-based generator, which selects from some knowledge base (such as in\r\nSuppobox). 
For example, the following DGA uses the current date as the seed and a PRNG to generate\r\na character sequence for the DGA domains ( https://en.wikipedia.org/wiki/Domain_generation_algorithm):\r\ndef generate_domain(year, month, day):\r\n    \"\"\"Generates a domain name for the given date.\"\"\"\r\n    domain = \"\"\r\n    for i in range(16):\r\n        # Scramble each date component with shifts, XORs and masks.\r\n        year = ((year ^ 8 * year) \u003e\u003e 11) ^ ((year \u0026 0xFFFFFFF0) \u003c\u003c 17)\r\n        month = ((month ^ 4 * month) \u003e\u003e 25) ^ 16 * (month \u0026 0xFFFFFFF8)\r\n        day = ((day ^ (day \u003c\u003c 13)) \u003e\u003e 19) ^ ((day \u0026 0xFFFFFFFE) \u003c\u003c 12)\r\n        # Map the combined state to one lowercase letter ('a' to 'y').\r\n        domain += chr(((year ^ month ^ day) % 25) + 97)\r\n    return domain\r\nThe key to the PRNG-based DGA methodology is a deterministic random generator: the DGA sequence is\r\npredictable by both the malware and the attacker, so the attacker can generate and select some of these domains\r\nfor C\u0026C service, and the malware just needs to loop over the list until it reaches the chosen C\u0026C.\r\nThis part requires some agreement: both the DGA and the seed must be known by both sides before generating\r\nDGA domains. However, this agreement is not exclusive to the malware and the attacker; anyone, including\r\nsecurity researchers like us, can replicate it even after the infection. This non-exclusive feature provides the breakthrough\r\npoint for security research: by obtaining both the DGA and the seeds, one can predict the malware DGA\r\ndomains and block them.\r\nTo obtain the DGA algorithm itself, security researchers might need to reverse engineer the malware binary after\r\ncapturing the malicious binary code. 
Many DGA algorithms have been reverse engineered and reported by multiple\r\nprojects and security blogs, such as:\r\nDGArchive https://dgarchive.caad.fkie.fraunhofer.de/\r\n360netlab’s DGA project https://github.com/360netlab/DGA\r\nJohannes Bader’s reversing efforts https://github.com/baderj/domain_generation_algorithms.\r\nHaving the DGA algorithm and knowing the DGA seed is a sufficient condition to predict DGA domains, but it is\r\nnot a necessary one for obtaining the DGA domain list: we can reduce the problem to separating DGA traffic from\r\nlegitimate traffic, and obtain the DGA domain list from the traffic. In DNS traffic, we can model a feature phase\r\nspace where DGA domain queries and legitimate queries are separable, where the ground truth of the DGA\r\n(algorithm and seeds) is not needed, and the task can be abstracted as finding this “golden phase” space.\r\nThere are several advanced machine learning methods to find this phase space and separate the malicious DGA traffic\r\nfrom the legitimate, without reverse engineering the binary. Most of these methods use client IP vs domain visit\r\ngraph features; for example:\r\nOur team’s Domain2vec correlation engine uses representation learning to discover DGA clusters in real-time\r\nDNS traffic. This method builds a sequence model to learn the domain correlation and captures the malware\r\nactivity, since the malware needs to loop over DGA names. 
(See also “Augmented Intelligence to Scale\r\nHumans Fighting Botnets”, https://www.botconf.eu/2017/augmented-intelligence-to-scale-humans-fighting-botnets/.)\r\nThese methods have reduced the strong condition to a looser yet more general condition and have removed the\r\ndifficulty of obtaining both the DGA and the seeds. In a later part of this post, we will talk about some cases that\r\nonly these loose-condition methods can detect.\r\nOnce the DGA algorithms are obtained, the battleground moves on to the random seed front…\r\nMAGIC NUMBER SEEDS\r\nSome DGAs use only the current date as the seed and hardcode some numbers in the binary; these DGAs can be\r\neasily predicted once the algorithm is reverse-engineered, for example in the Conficker families, Nymaim, etc.\r\nSince the attacker’s goal is to avoid detection, it becomes practical to use magic numbers as dynamic seeds. The\r\nmagic number technique is very common today, and Necurs (the backdoor) and Locky (the ransomware) ( https://test-nominum.pantheonsite.io/unlocking-locky/) are good examples of the combined usage of date/time and magic\r\nnumbers. Magic numbers are usually combined with the date through bit-shifting and provide additional variance.\r\nPopular malware like Locky can deploy many variants, each with different magic numbers, to evade detection (see\r\nalso https://blogs.forcepoint.com/security-labs/lockys-new-dga-seeding-new-domains).\r\nSince a DGA generates DNS traffic from its seeds, astute researchers can recover the seeds by using the DNS traffic and the\r\nDGA. 
To capture the dynamic magic number seeds, researchers usually use a “replay attack” technique:\r\nreproducing possible DGA domains and validating them against the DNS traffic. OpenDNS, for instance, has a brute force\r\nmethod that searches for the possible numerical seeds (magic numbers) in the DNS traffic by generating all 2³² sets of\r\nRamnit names ( https://www.slideshare.net/OpenDNS/using-algorithms-to-brute-force-algorithms-a-journey-through-time-and-namespace).\r\nThis method works well not only for Ramnit but also for Necurs and other DGAs, especially when the magic\r\nnumbers are small. However, this method is not always useful because generating all 2³² sets of names can be expensive,\r\nand the malware can easily escape it by upgrading to a 2⁶⁴ seed space, as already happened with Murofet’s DGA.\r\nOur team has proposed and implemented a more sophisticated hash collision method, primarily to crack\r\nLocky’s dynamic seeds ( https://www.botconf.eu/2017/math-gpu-dns-cracking-locky-seeds-in-real-time-without-analyzing-samples/). Instead of running a brute force linear test of all seeds against domains in the DNS traffic, this\r\nmethod uses GPU computing to collide the hash values of possible Locky DNS queries with real-time DNS\r\ntraffic in order to detect new seeds.\r\nBeyond magic numbers, magic strings or magic domain names are also used for generating DGA domains.\r\nCurrently, there are not many effective methods to detect these seeds beyond reverse engineering the binary.\r\nOther types of seeds\r\nCurrency exchange rates can be used as random seeds. Bedep, the ad/click fraud botnet, for example, uses\r\nforeign currency exchange rates as its seed ( https://www.arbornetworks.com/blog/asert/bedeps-dga-trading-foreign-exchange-for-malware-domains/). 
Some botnets use the most popular hashtag on Twitter as the DGA seed,\r\nas reported by Cybereason ( http://go.cybereason.com/rs/996-YZT-709/images/Cybereason-Lab-Analysis-Dissecting-DGAs-Eight-Real-World-DGA-Variants.pdf). The common idea behind these seeds is that they are\r\nhard to reproduce and that the seed may not be a simple number.\r\nGOOD DGA, BAD DGA\r\nThe creators of DGA algorithms want to keep their DGA unique so they can distinguish their C\u0026C traffic\r\nfrom legitimate traffic and also avoid collisions with other DGAs. Our research has shown us that some DGAs are\r\nsmarter than others.\r\nDictionary based DGA\r\nThe dictionary based method is a little twist on the way algorithmically generated domains are created. As we’ve\r\nseen, security researchers use features in the DNS string to separate malicious DGA traffic from legitimate traffic.\r\nThe modeling work looks at attributes such as randomness, entropy and other lexical string features, because DGAs\r\nfrequently generate domains with a ‘random’, ‘non-human readable’ look (see for example https://www.r-bloggers.com/building-a-dga-classifier-part-1-data-preparation/). Some cleverly designed DGAs such as\r\nSuppobox try to evade randomness-based detection by using dictionary words.\r\nHigh collision DGA\r\nDGAs like Pykspa and Virut are getting lower grades in our notebook: they have strong collisions with\r\nlegitimate names and other DGAs.\r\nPykspa is a worm whose DGA is reverse-engineered at https://www.johannesbader.ch/2015/03/the-dga-of-pykspa/. This DGA generates thousands of possible DGA domains using common TLDs like com, biz, net, org,\r\ninfo and cc, and its core domain has 6–15 chars. 
These thousands of domains flood the recursive DNS traffic.\r\nBecause of the common TLD set and the short domain length across this huge number of domains, security\r\nresearchers have a hard time clearly identifying and blocking them, even if they know the DGA and the seed needed to predict them. For\r\nexample, some short domains like wgxodod.info. ydnpxkv.info. hrvxccq.org. have a good chance of\r\ncolliding with other DGAs (such as Locky), or with legitimate .com names.\r\nVirut is another type of DGA where the domain name has only 6 a-z chars with the .com TLD, and the algorithm\r\nitself has a simplistic design, so the chance of a generated domain colliding with a legitimate service is very high.\r\nWe have observed many domains like wenxin.com , which was a legitimate domain, yet it was reported as Virut\r\nby a security researcher ( https://twitter.com/DGAFeedAlerts/status/917181597400600576 ). And by the way,\r\nthe domain `akamai.com` follows the exact pattern of a Virut DGA. But don’t get too concerned…\r\nBlocking these high collision DGA domains in a safe way requires security researchers to combine the domain\r\nprediction method with DNS traffic. Our team has recently implemented a real-time new core domain detection\r\nsystem (for domains never seen before), where predicted DGA domains are blocked only if they are also identified as\r\nnew core domains.\r\nNon DGA\r\nIn DNS traffic, we’ve observed many ‘DGA-look-alike’ domains, which are not in fact DGA domains. 
For example, in\r\nrecent traffic we saw these 7 char .ru domains with a very high infection rate:\r\nbhzlyxh.ru.\r\nqsxxzni.ru.\r\ngwjijru.ru.\r\nfyxkmbh.ru.\r\nqwoumzw.ru.\r\nkulfxxy.ru.\r\nnrxboty.ru.\r\n…\r\nInstead of using a DGA, the author has hardcoded them in the binary and deployed different lists in different binaries.\r\nThese names are used in Ruskill/Dorkbot as reported at http://tech.cert-hungary.hu/vulnerabilities/CH-14106 and\r\nhttps://github.com/360netlab/DGA/issues/36.\r\nUpdate: there is an ongoing discussion here https://github.com/360netlab/DGA/issues/36#issuecomment-350660012 about the DGA behind Dorkbot, where Johannes Bader has commented that Dorkbot generates these\r\nnames every 10 seconds and uses them as decoys.\r\nSUMMARY\r\nDGA is one of the most effective and most popular tools in the attackers’ toolbox. It is used by a variety of\r\nmalware families to hide the location of their C\u0026C servers and thereby maintain the robustness of the botnet. At\r\nthe same time, DGAs leave a substantial footprint in the DNS traffic.\r\nIn the deathmatch between DGA creators and security researchers, the attackers do their best to hide the C\u0026C and\r\nto avoid collision with other DGAs and legitimate services, while researchers use both traditional reverse\r\nengineering and modern machine learning to clearly identify and block these DGAs. This battle is far from over\r\nand will continue to evolve as both sides grow stronger. We will keep you updated, stay tuned…\r\nSource: https://medium.com/@yvyuz/a-death-match-of-domain-generation-algorithms-a5b5dbdc1c6e",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"MITRE"
	],
	"references": [
		"https://medium.com/@yvyuz/a-death-match-of-domain-generation-algorithms-a5b5dbdc1c6e"
	],
	"report_names": [
		"a-death-match-of-domain-generation-algorithms-a5b5dbdc1c6e"
	],
	"threat_actors": [],
	"ts_created_at": 1775434807,
	"ts_updated_at": 1775791272,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/36c1de54b4eac5f05a056ec7f08169f1fe5d510c.pdf",
		"text": "https://archive.orkl.eu/36c1de54b4eac5f05a056ec7f08169f1fe5d510c.txt",
		"img": "https://archive.orkl.eu/36c1de54b4eac5f05a056ec7f08169f1fe5d510c.jpg"
	}
}