{
	"id": "a84e82ee-d342-43ee-891d-1dcbefeb2196",
	"created_at": "2026-04-06T00:07:16.622684Z",
	"updated_at": "2026-04-10T13:11:40.863771Z",
	"deleted_at": null,
	"sha1_hash": "a35c2c740c3e7b60ad9c9afda401c92e256beffc",
	"title": "Combining supervised and unsupervised machine learning for DGA detection",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 2629648,
	"plain_text": "Combining supervised and unsupervised machine learning for\r\nDGA detection\r\nBy ByCamilla MontonenJustin IbarraCraig Chamberlain\r\nPublished: 2020-12-18 · Archived: 2026-04-05 21:56:18 UTC\r\nEditor’s Note — December 21, 2020: This blog has been updated since its original release to include a use\r\ncase that applies this workflow to the SUNBURST attack.\r\nIt is with great excitement that we announce our first-ever supervised ML and security integration! Today, we are\r\nreleasing a supervised ML solution package to detect domain generation algorithm (DGA) activity in your\r\nnetwork data.\r\nIn addition to a fully trained detection model, our release contains ingest pipeline configurations, anomaly\r\ndetection jobs, and detection rules that will make your journey from setup to DGA detection smooth and easy.\r\nNavigate to our detection rules repository to check out how you can get started using supervised machine learning\r\nto detect DGA activity in your network and start your free trial with Elastic Security today. \r\nDGAs: A breakdown\r\nDomain generation algorithms (DGA) are a technique employed by many malware authors to ensure that infection\r\nof a client machine evades defensive measures. The goal of this technique is to hide the communication between\r\nan infected client machine and the command \u0026 control (C \u0026 C or C2) server by using hundreds or thousands of\r\nrandomly generated domain names, which ultimately resolve to the IP address of a C \u0026 C server.\r\nTo more easily visualize what’s occurring in a DGA attack, imagine for a moment you’re a soldier on a battlefield.\r\nLike many soldiers, you have communication gear that uses radio frequencies for communication. Your enemy\r\nmay try to disrupt your communications by jamming your radio frequencies. 
One way to devise a countermeasure\r\nfor this is by frequency hopping — using a radio system that changes frequencies very quickly during the course\r\nof a transmission. To the enemy, the frequency changes appear to be random and unpredictable, so they are hard to\r\njam.\r\nDGAs are like a frequency-hopping communication channel for malware. They change domains so frequently that\r\nblocking the malware’s C2 communication channel becomes infeasible by means of DNS domain name blocking.\r\nThere are simply too many randomly generated DNS names to try to identify and block them. \r\nThis technique emerged in the world of malware with force in 2009, when the “Conficker” worm began using a\r\nvery large number of randomly generated domain names for communication. The worm’s authors developed this\r\ncountermeasure after a consortium of security researchers interrupted the worm’s C2 channel by shutting down the\r\nDNS domains it was using for communication. DNS mitigation was also performed in the case of the 2017\r\nWannaCry ransomware global outbreak.\r\nhttps://www.elastic.co/blog/supervised-and-unsupervised-machine-learning-for-dga-detection\r\nPage 1 of 12\n\nBlending in\r\nIf the best place to hide a tree is in a forest, malware operators have long recognized that blending in with normal\r\nweb traffic is one of the best ways to go undetected. Identifying an HTTP request with a randomly generated domain name is\r\na hard problem in network security monitoring and detection. The vast amount of HTTP traffic in modern\r\nnetworks makes manual review infeasible. Some malware and bots have unusual user agent strings that can be\r\nalerted on with search rules, but malware authors can easily leverage a user agent string that looks no different\r\nfrom a web browser.\r\nWith the rise of mobile and IoT, user agent strings have become so numerous that manual review for suspicious\r\nactivity is also becoming infeasible. 
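To make the idea of algorithmically generated domains concrete, here is a toy, seeded generator in Python. It is purely illustrative: it is not the algorithm used by Conficker or any real malware family, and the seed, date format, and TLD are made up for the sketch.

```python
import hashlib

def toy_dga(seed: str, date: str, count: int = 5, tld: str = ".example") -> list:
    """Derive pseudorandom-looking domains from a shared seed and date.

    Illustrative only: real DGAs use their own algorithms, seeds, and TLD lists.
    """
    domains = []
    for i in range(count):
        digest = hashlib.md5(f"{seed}:{date}:{i}".encode()).hexdigest()
        # Map the hex digest onto a lowercase label 8-15 characters long
        length = 8 + int(digest[0], 16) % 8
        label = "".join(chr(ord("a") + int(c, 16) % 26) for c in digest[:length])
        domains.append(label + tld)
    return domains

print(toy_dga("sample-seed", "2020-12-18"))
```

Because both sides can derive the same seed and date, an infected client and its operator compute the same candidate domains independently; defenders would have to predict and block every candidate, which is exactly what makes DNS blocklisting impractical here.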
Web proxies have long used categorization to look for URLs that are known\r\nto be suspicious, but DGA domains are so voluminous and short-lived that they are often not categorized. Threat\r\nintelligence feeds can identify IP addresses and HTTP requests that are associated with known malware families\r\nand campaigns, but these are so easily changed by malware operators that such lists are often outdated by the time\r\nwe put them to use in searches.\r\nThe sheer volume of network traffic collected in many organizations and the random nature of DGA-generated\r\ndomains make detection of this activity a challenge for rule-based techniques — and a perfect fit for our\r\nsupervised machine learning model! Using Inference, Elastic’s DGA detection ML model will examine Packetbeat\r\nDNS data as it is being ingested into your Elasticsearch cluster, automatically determining which domains are\r\npotentially malicious. Follow the steps in the next section to get started. \r\nGetting started\r\nTo get started with DGA detection within the security app, we have released a set of features to our publicly\r\navailable rules repository to assist with importing machine learning models to the Elastic Stack. This repo\r\nnot only provides our community a place to collaborate on threat detection, but also acts as a place to share the\r\ntools required to test and validate rules.\r\nPlease see our previous blog and webinar for additional information on the initiative. If you don’t already have an\r\nElastic Cloud subscription, you can try it out through our free 14-day cloud trial to start experimenting with the\r\nsupervised ML solution package to detect DGA activity.\r\nPart of this rule toolkit is a CLI (command line interface) to not only test rules, but also interact with your stack.\r\nFor instance, we have released various Python libraries to interact with the Kibana API. 
This was critical in\r\nsimplifying the process of importing the model dependencies needed to get your rules operational. To start enriching\r\nDNS data and receiving alerts for DGA activity, follow these three steps:\r\nStep one: Importing the model\r\nFirst, you must import the DGA model, Painless scripts, and ingest processors into your stack. Currently, DGA\r\nmodels and any unsupervised models for anomaly detection (more to come) are available in the detection-rules\r\nrepo via GitHub releases. To upload, run the following CLI command:\r\npython -m detection_rules es \u003cargs_or_config\u003e experimental setup-dga-model -t \u003crelease-tag\u003e\r\nFollowing the upload, you will need to update your Packetbeat configuration, as the model will enrich Packetbeat\r\nDNS events with a DGA score. This can easily be done by adding the additional configuration to your\r\nElasticsearch output configuration:\r\noutput.elasticsearch:\r\n hosts: [\"your-hostname:your-port\"]\r\n pipeline: dns_enrich_pipeline\r\nThe supervised model will then analyze and enrich Packetbeat DNS events, which contain these ECS fields:\r\ndns.question.name\r\ndns.question.registered_domain\r\nThe model will then add these fields to processed DNS events:\r\nml_is_dga.malicious_prediction: A value of “1” indicates the DNS domain is predicted to be the result of\r\nmalicious DGA activity. A value of “0” indicates it is predicted to be benign.\r\nml_is_dga.malicious_probability: A probability score, between 0 and 1, that the DNS domain is the result of\r\nmalicious DGA activity.\r\nA sample screenshot of enriched DNS data is shown below:\r\nNote: For more detailed information, please consult the detection-rules readme.\r\nAbout the DGA Rules\r\nNow let’s look at some conditional search rules that detect and alert on DGA activity. Two search rules are\r\nprovided in the package that can be enabled and run in the detection engine in the Elastic Security app:\r\n1. Machine Learning Detected a DNS Request Predicted to be a DGA Domain\r\n2. Machine Learning Detected a DNS Request With a High DGA Probability Score\r\nThe first rule matches any DNS event that has a DGA prediction value of 1, indicating the DNS domain name was\r\nprobably the product of a domain generation algorithm and is therefore suspicious. The rule, found here, simply\r\nlooks for the following condition:\r\nevent.category:network and network.protocol:dns and ml_is_dga.malicious_prediction: 1\r\nThe second rule matches any DNS event that has a DGA probability higher than 0.98, indicating the DNS domain\r\nname was probably the product of a domain generation algorithm and is therefore suspicious. The rule, found\r\nhere, simply looks for the following condition:\r\nevent.category:network and network.protocol:dns and ml_is_dga.malicious_probability \u003e 0.98\r\nLike all rules in the Elastic Detection Engine, they can be forked and customized to suit local conditions. The\r\nprobability score in the second rule can be adjusted up or down if you find that a different probability score works\r\nbetter with your DNS events. Either rule can have its risk score increased if you wish to raise the priority of DGA\r\ndetections in your alert queue. 
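The two rule conditions above are simple predicates over enriched events, so they are easy to restate outside the detection engine. A minimal Python sketch follows; the events here are hypothetical, and the field names are the ones the model adds as described above:

```python
def matches_prediction_rule(event: dict) -> bool:
    """Machine Learning Detected a DNS Request Predicted to be a DGA Domain."""
    return (
        event.get("event.category") == "network"
        and event.get("network.protocol") == "dns"
        and event.get("ml_is_dga.malicious_prediction") == 1
    )

def matches_probability_rule(event: dict, threshold: float = 0.98) -> bool:
    """Machine Learning Detected a DNS Request With a High DGA Probability Score.

    The default threshold mirrors the rule's 0.98 and can be tuned per environment.
    """
    return (
        event.get("event.category") == "network"
        and event.get("network.protocol") == "dns"
        and event.get("ml_is_dga.malicious_probability", 0.0) > threshold
    )

sample = {
    "event.category": "network",
    "network.protocol": "dns",
    "ml_is_dga.malicious_prediction": 1,
    "ml_is_dga.malicious_probability": 0.995,
}
print(matches_prediction_rule(sample), matches_probability_rule(sample))  # True True
```

Raising or lowering the threshold argument corresponds to editing the 0.98 cutoff in the second rule's query.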
Exceptions can be added to the rules in order to ignore false positives such as\r\ncontent distribution network (CDN) domains that may use pseudorandom domain names.\r\nAnother possibility we plan to explore is using event query language (EQL) to look for clusters of anomaly\r\nor search-based alerts using multivariate correlation. For example, if we see a cluster of alerts from a host engaged\r\nin probable DGA activity, confidence increases that we have a significant malware detection that needs attention.\r\nSuch a cluster could consist of DGA alerts combined with other anomaly detection alerts such as a rare process,\r\nnetwork process, domain, or URL. These additional anomaly detections are produced by the library of machine\r\nlearning packages included in the Elastic Security app.\r\nStep two: Importing the rules\r\nThe rules in the DGA package can be imported using the Kibana rule-upload feature in the detection-rules CLI (in\r\nthe format of .toml). Since the rules provided in detection-rules repo releases are in .toml format, simply run the\r\nfollowing command to upload a rule from the repo:\r\npython -m detection_rules kibana upload-rule -h\r\nKibana client:\r\nOptions:\r\n --space TEXT Kibana space\r\n -kp, --kibana-password TEXT\r\n -ku, --kibana-user TEXT\r\n --cloud-id TEXT\r\n -k, --kibana-url TEXT\r\nUsage: detection_rules kibana upload-rule [OPTIONS] TOML_FILES...\r\n Upload a list of rule .toml files to Kibana.\r\nOptions:\r\n -h, --help Show this message and exit.\r\nStep three: Enable rule and profit\r\nNow that we have the trained supervised ML model imported into the stack, DNS events being enriched, and rules\r\nat our disposal, all that is left to do is confirm that the rule is enabled and wait for alerts! 
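For intuition about the enrichment described in step one, the event-in, enriched-event-out flow can be mimicked with a stand-in scorer. The entropy heuristic below is not the released model; it is only a placeholder so the shape of the added ml_is_dga.* fields can be seen, and the 0.9 cutoff is arbitrary:

```python
import math
from collections import Counter

def dummy_dga_score(domain: str) -> float:
    """Stand-in scorer: normalized character entropy of the leftmost label.

    Placeholder only; the real enrichment uses Elastic's trained model.
    """
    label = domain.split(".")[0]
    if len(label) < 2:
        return 0.0
    counts = Counter(label)
    entropy = -sum((n / len(label)) * math.log2(n / len(label)) for n in counts.values())
    max_entropy = math.log2(len(label))  # entropy upper bound for a label this long
    return entropy / max_entropy

def enrich_dns_event(event: dict, threshold: float = 0.9) -> dict:
    """Mimic the ingest pipeline: add ml_is_dga.* fields to a Packetbeat-style event."""
    score = dummy_dga_score(event["dns.question.name"])
    event["ml_is_dga.malicious_probability"] = round(score, 3)
    event["ml_is_dga.malicious_prediction"] = 1 if score > threshold else 0
    return event

evt = enrich_dns_event({"dns.question.name": "xkqzmwpfrtvbnh.example.com",
                        "dns.question.registered_domain": "example.com"})
print(evt["ml_is_dga.malicious_prediction"], evt["ml_is_dga.malicious_probability"])
```

In the real deployment this transformation happens inside the dns_enrich_pipeline on ingest, so the rules only ever see already-enriched documents.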
\r\nWhen viewing the rule in the Detection Engine, you can confirm that it is activated as seen below:\r\nAnd now wait for alerts. Once an alert is generated, you can use the Timeline feature to examine the DNS event\r\nand start your investigation.\r\nHowever, no machine learning model is perfect! Some benign domains will be mistakenly labeled as\r\nmalicious, producing false positives. In the next section, we will investigate how to leverage preconfigured anomaly detection jobs and\r\naccompanying rules that ship with this release to tune out false positives.\r\nFalse positives? Anomaly detection to the rescue!\r\nAs with every detection technique, there will always be some false positives. These may come in the form of CDN\r\ntraffic or custom domains that appear to be malicious but that are actually normal in the environment. To make\r\nsure that our DGA detection adapts to each user’s environment, we have created a preconfigured anomaly\r\ndetection job named experimental-high-sum-dga-probability. When enabled, this ML job examines the DGA\r\nscores produced by the supervised DGA model (yes, it’s ML all the way down) and looks for anomalous patterns\r\nof unusually high scores for a particular source IP address. Such events are assigned an anomaly score.\r\nTo maximize the benefit from the anomaly detection job, we are releasing it together with a complementary rule:\r\nPotential DGA Activity. This will create an anomaly-based alert on the detections page in the security app.\r\nBoth the preconfigured anomaly detection job and complementary rule are available in our detection-rules repo\r\nreleases. \r\nHow to choose the right configuration for your environment\r\nIt all starts with the supervised DGA model. 
Every DNS request ingested through Packetbeat is analyzed by the\r\nmodel and assigned a probability that indicates the likely maliciousness of the domain involved in the request. You\r\ncan use the outputs of the supervised model directly in the security app using the conditional logic rules discussed\r\nin the ‘Getting started’ section, or you can import and enable our preconfigured anomaly detection job and rules\r\nto further customize the detections to the subtleties of your environment. \r\nHow to choose the right configuration for your environment? Start simple. Enable the conditional search rules\r\ndiscussed in the ‘Getting started’ section. These rules act directly on the outputs of the supervised model and will\r\nquickly give you an idea of how much false positive background noise there is in your environment. If you find\r\nthat the conditional search rules operating on the direct outputs of the supervised model produce too many alerts,\r\nyou may benefit from importing and enabling the anomaly detection job. \r\nIn particular, the ML detection rule that operates on the results of the anomaly detection job may be useful for\r\nfinding sources with high aggregate amounts of DGA activity rather than alerting on individual DGA scores one\r\nby one. 
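The core idea behind that per-source aggregation can be sketched in a few lines: sum each source's DGA probabilities and surface the outliers. The fixed threshold below is a crude stand-in for the experimental-high-sum-dga-probability job, which uses anomaly scoring rather than a hand-picked cutoff, and the events are hypothetical:

```python
from collections import defaultdict

def high_sum_dga_sources(events, threshold: float = 5.0) -> dict:
    """Sum ml_is_dga.malicious_probability per source IP and flag high totals.

    Crude stand-in for the anomaly detection job; real anomaly scoring
    adapts to each environment instead of using a fixed threshold.
    """
    totals = defaultdict(float)
    for e in events:
        totals[e["source.ip"]] += e.get("ml_is_dga.malicious_probability", 0.0)
    return {ip: round(total, 2) for ip, total in totals.items() if total > threshold}

events = (
    [{"source.ip": "10.0.0.5", "ml_is_dga.malicious_probability": 0.97}] * 8
    + [{"source.ip": "10.0.0.9", "ml_is_dga.malicious_probability": 0.02}] * 8
)
print(high_sum_dga_sources(events))  # only 10.0.0.5 exceeds the threshold
```

The point of aggregating first is that one host making many high-scoring requests stands out even when any single request might plausibly be a false positive.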
If you do not have the ML module running, start up a free trial, or you can try it out in Elastic Cloud.\r\nSample screenshots of the anomaly detection model and associated rules provided with the release are below:\r\nOutput of the experimental-high-sum-dga-probability unsupervised ML job\r\nOutput of the Potential DGA Activity ML rule that acts on output from this unsupervised ML job\r\nAlert created by the Machine Learning Detected a DNS Request With a High DGA Probability Score search rule\r\nAlert created by the Machine Learning Detected a DNS Request Predicted to be a DGA Domain search rule\r\nCase study: Detecting real-world DGA activity in the SUNBURST attack\r\nLet’s try to apply this experimental DGA workflow to the recent SUNBURST campaign. \r\nTo recap, on December 13, SolarWinds released a security advisory regarding a successful supply-chain attack on\r\nthe Orion network management platform. At the time of this writing, the attack affects Orion versions released\r\nbetween March and June of 2020. Likewise, on December 13, FireEye released information about a global\r\ncampaign involving a SolarWinds supply-chain compromise that affected some versions of Orion software.\r\nWe previously released a blog post addressing Elastic users and the SolarWinds case, commonly called\r\nSUNBURST. That post highlights that Elastic Security’s malware prevention technology used by both Elastic\r\nEndgame and Elastic endpoint security has been updated with detections for the attacks described in the\r\nSolarWinds disclosure.\r\nSUNBURST was a sophisticated software supply-chain attack that reportedly inserted malware into the\r\nSolarWinds Orion product and distributed it using an auto-update mechanism. 
The size, scope, and extent of the\r\nincident are still being assessed at the time of this writing. \r\nExisting Elastic Security detections\r\nA set of 1722 DGA-generated domain names used by the SUNBURST malware has been shared by a security\r\nresearcher. One of the existing Elastic Security machine learning-based detection rules, DNS Tunneling, produces\r\ntwo anomaly-based alerts on the DNS names in this sample. Similar to DNS tunneling, the ratio of child-to-parent\r\ndomains in the SUNBURST name sample is very high. The ML job associated with this rule is coded to analyze\r\nPacketbeat data, but it can be cloned and modified to ingest other DNS events in Elastic Common Schema (ECS)\r\nformat. This is the DNS Tunneling ML job:\r\nThis ML job has an associated detection rule named DNS Tunneling:\r\nUsing these Elastic Security rules, these anomaly detections, shown below, can be transformed into detection\r\nalerts and optional notifications in order to get them into appropriate incident triage and response work queues.\r\nHere is what these SUNBURST anomaly detections look like in the Elastic Machine Learning app:\r\nThis is a useful detection, but this job may not detect DGA activity all of the time. In order to strengthen DGA\r\ndetection, we are shipping the experimental DGA detection workflow.\r\nUsing the experimental DGA workflow\r\nWe found that the experimental DGA ML detection workflow detects most of this activity. We ran these\r\nSUNBURST DGA domains through the supervised DGA detection model discussed herein (see above for details\r\nof how to download and run this model and its rules). We found that the model tagged 82% of the names in the\r\nsample as DGA, which would have produced 1420 alerts on the sample set. 
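The child-to-parent domain ratio signal noted above is easy to compute directly from a list of names. A rough sketch with hypothetical names follows; production code should derive the parent from a public-suffix list rather than from the last two labels:

```python
from collections import defaultdict

def child_parent_ratios(domains) -> dict:
    """Count unique fully qualified names per registered (parent) domain.

    An unusually high unique-child count under one parent is the
    DNS-tunneling-style signal described above.
    """
    children = defaultdict(set)
    for fqdn in domains:
        parts = fqdn.split(".")
        parent = ".".join(parts[-2:])  # naive eTLD+1; use a public-suffix list in practice
        children[parent].add(fqdn)
    return {parent: len(kids) for parent, kids in children.items()}

# Hypothetical sample: many unique children under one parent, one under another
sample = [f"r{i}x{i * 7 % 97}.c2.example.net" for i in range(200)] + ["www.example.org"]
print(child_parent_ratios(sample))
```

A parent with hundreds of distinct children in a short window is exactly the pattern the DNS Tunneling job surfaces, whether the cause is tunneling or DGA beaconing.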
Here is a screenshot of SUNBURST\r\nDNS names that have been tagged as DGA activity by the supervised model:\r\nThese events can be turned into detection alerts using the detection rule Machine Learning Detected a DNS\r\nRequest Predicted to be a DGA Domain. We can also make a copy of this rule and modify it to match the observed\r\nparent domain used by a particular malware instance like SUNBURST. We can match this set of SUNBURST\r\nDGA events by adding a test to the rule query like this:\r\nnetwork.protocol:dns and ml_is_dga.malicious_prediction: 1 and dns.question.registered_domain: \"avsvm\r\nWe can then give this rule a critical severity level and a high risk score of 99 in order to move it towards the front\r\nof the alert and analysis work queue. Here is a screenshot of alerts generated by this rule modified to call attention\r\nto detection of SUNBURST DGA activity:\r\nWe have included this rule, Machine Learning Detected DGA activity using a known SUNBURST DNS domain, in\r\nthe package. Under real-world infection conditions, a population of high-frequency DGA-using malware instances\r\ncould produce enough alerts to trip the max_signals circuit breaker, which is set to 100 by default. In that case, we\r\nmight have alerts for some malware instances and not others, depending on which events were first matched by\r\nthe search. \r\nIn order to ensure we identify a greater number of infected hosts engaged in DGA activity, we have increased the\r\nmax_signals value in the DGA search rules to 10,000. Note: This setting cannot be modified in the rule editor; it\r\nmust be modified in an external rule file and then imported. 
The setting can be observed by viewing a rule file in\r\nan editor.\r\nIn cases where DGA activity is heavy and alerts are numerous, we can also aggregate and sift DGA alerts or\r\nevents in order to count them by hostname or source IP in a data table like this:\r\nWe are also including a sample dashboard for Packetbeat DGA events with visualizations and aggregations,\r\nincluding this data table visualization, which is aggregated by source.ip. Alternatively, you can aggregate by\r\nhost.name if your DNS events contain that field. This file is named dga-dashboard.ndjson and can be imported\r\ninto Kibana by selecting Import on the Saved Objects page, which can be found under Stack Management. \r\nHere is a screenshot of this dashboard rendering DGA events in a packetbeat-* index:\r\nWe’re here to help\r\nYou are not alone! If you run into any issues in this process or simply want to know more about our philosophies\r\non threat detection and machine learning, please reach out to us on our community Slack channel, our discussion\r\nforums, or even roll your sleeves up and work with us in our open detection repo. Thank you and enjoy!\r\nSource: https://www.elastic.co/blog/supervised-and-unsupervised-machine-learning-for-dga-detection",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"origins": [
		"web"
	],
	"references": [
		"https://www.elastic.co/blog/supervised-and-unsupervised-machine-learning-for-dga-detection"
	],
	"report_names": [
		"supervised-and-unsupervised-machine-learning-for-dga-detection"
	],
	"threat_actors": [],
	"ts_created_at": 1775434036,
	"ts_updated_at": 1775826700,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/a35c2c740c3e7b60ad9c9afda401c92e256beffc.pdf",
		"text": "https://archive.orkl.eu/a35c2c740c3e7b60ad9c9afda401c92e256beffc.txt",
		"img": "https://archive.orkl.eu/a35c2c740c3e7b60ad9c9afda401c92e256beffc.jpg"
	}
}