{
	"id": "e1c40817-51fc-44f1-8ebe-0a6bdcf89be3",
	"created_at": "2026-04-06T00:14:50.531608Z",
	"updated_at": "2026-04-10T03:20:56.937623Z",
	"deleted_at": null,
	"sha1_hash": "6a57d42fde431f6e76cce88dc8dc2d987daff31a",
	"title": "How to Write Simple but Sound Yara Rules - Part 2",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 165417,
	"plain_text": "How to Write Simple but Sound Yara Rules - Part 2\r\nBy Florian Roth\r\nPublished: 2026-03-30 · Archived: 2026-04-05 18:01:42 UTC\r\nMonths ago I wrote a blog article on “How to write simple but sound Yara rules“. Since then the mentioned\r\ntechniques and tools have improved. I’d like to give you a brief update on certain Yara features that I frequently\r\nuse and tools that I use to generate and test my rules.\r\nHandle Very Specific Strings Differently\r\nIn the past I was glad to see very specific strings in samples and sometimes used these strings as the only indicator\r\nfor detection. E.g. whenever I’ve found a certain typo in the PE header fields like “Micorsoft Corportation” I\r\ncheered and thought that this would make a great signature. But – and I have to admit that now – this only makes\r\na nice signature. Great signatures require not only to match on a certain sample in the most condensed way but\r\naims to match on similar samples created by the same author or group.\r\nLook at the following rule:\r\nrule Enfal_Malware_Backdoor {\r\nmeta:\r\ndescription = \"Generic Rule to detect the Enfal Malware\"\r\nauthor = \"Florian Roth\"\r\ndate = \"2015/02/10\"\r\nsuper_rule = 1\r\nhash0 = \"6d484daba3927fc0744b1bbd7981a56ebef95790\"\r\nhash1 = \"d4071272cc1bf944e3867db299b3f5dce126f82b\"\r\nhash2 = \"6c7c8b804cc76e2c208c6e3b6453cb134d01fa41\"\r\nscore = 60\r\nstrings:\r\n$x1 = \"Micorsoft Corportation\" fullword wide\r\n$x2 = \"IM Monnitor Service\" fullword wide\r\n$a1 = \"imemonsvc.dll\" fullword wide\r\n$a2 = \"iphlpsvc.tmp\" fullword\r\n$a3 = \"{53A4988C-F91F-4054-9076-220AC5EC03F3}\" fullword\r\n$s1 = \"urlmon\" fullword\r\n$s2 = \"Registered trademarks and service marks are the property of their\" wide\r\n$s3 = \"XpsUnregisterServer\" fullword\r\n$s4 = \"XpsRegisterServer\" fullword\r\ncondition:\r\nuint16(0) == 0x5A4D and\r\n(\r\n( 1 of ($x*) ) or\r\n( 2 of ($a*) and all of ($s*) 
)\r\n)\r\n}\r\nWhen I review the 20 strings that yarGen generates, I try to sort the extracted strings into three\r\ngroups:\r\nVery specific strings (one of them is sufficient for successful detection, e.g. IP addresses, payload URLs,\r\nPDB paths, user profile directories)\r\nSpecific strings (strings that look good but may appear in goodware as well, e.g. “wwwlib.dll”)\r\nOther strings (including strings that also appear in goodware, but excluding random code from compressed or\r\nencrypted data, e.g. “ModuleStart”)\r\nThen I create a condition that defines:\r\nA certain magic header (remove it in case of ASCII text like scripts or webshells)\r\n1 of the very specific strings OR\r\nsome of the specific strings combined with many (but not all) of the common strings\r\nHere is another example that has only very specific strings (x) and common strings (s):\r\nrule Cobra_Trojan_Stage1 {\r\nmeta:\r\ndescription = \"Cobra Trojan - Stage 1\"\r\nauthor = \"Florian Roth\"\r\nreference = \"https://blog.gdatasoftware.com/blog/article/analysis-of-project-cobra.ht\r\ndate = \"2015/02/18\"\r\nhash = \"a28164de29e51f154be12d163ce5818fceb69233\"\r\nstrings:\r\n$x1 = \"KmSvc.DLL\" fullword wide\r\n$x2 = \"SVCHostServiceDll_W2K3.dll\" fullword ascii\r\n$s1 = \"Microsoft Corporation. 
All rights reserved.\" fullword wide\r\n$s2 = \"srservice\" fullword wide\r\n$s3 = \"Key Management Service\" fullword wide\r\n$s4 = \"msimghlp.dll\" fullword wide\r\n$s5 = \"_ServiceCtrlHandler@16\" fullword ascii\r\n$s6 = \"ModuleStart\" fullword ascii\r\n$s7 = \"ModuleStop\" fullword ascii\r\n$s8 = \"5.2.3790.3959 (srv03.sp2.070216-1710)\" fullword wide\r\ncondition:\r\nuint16(0) == 0x5A4D and filesize \u003c 50000 and 1 of ($x*) and 6 of ($s*)\r\n}\r\nIf you can’t create a rule that is sufficiently specific, I recommend the following methods to restrict the rule:\r\nMagic Header (use it as the first element in the condition – see performance guidelines, e.g. “uint16(0) ==\r\n0x5A4D”)\r\nFile Size (malware that mimics valid system files, drivers or legitimate software often differs significantly\r\nin size; try to find the valid files online and set a size value in your rule, e.g. “filesize \u003e 200KB and filesize\r\n\u003c 600KB”)\r\nString Location (see the “Location is Everything” section)\r\nExclude strings that occur in false positives (e.g. $fp1 = “McAfeeSig”)\r\nLocation is Everything\r\nOne of the most underestimated features of Yara is the ability to define a range in which a string must occur in\r\norder to match. I used this technique to create a rule that detects Metasploit Meterpreter payloads quite reliably\r\neven if they are encoded/cloaked. How does that work?\r\nIf you see malware code that is hidden in an overlay at the end of a valid executable (e.g. 
“ab.exe”) and you see\r\nonly strings that are typical function exports or mimic a well-known executable, ask the following questions:\r\nIs it normal that these strings are located at this location in the file?\r\nIs it normal that these strings occur more than once in that file?\r\nIs the distance between two strings somehow specific?\r\nMalware Strings\r\nIn case of the unspecific malware code in the PE overlay, try to define a rule that looks for a certain file size (e.g.\r\nfilesize \u003e 800KB) and the malware strings relative to the end of the file (e.g. $s1 in (filesize-500..filesize)).\r\nThe following example shows an unspecified webshell that contains strings that an attacker may modify in\r\nfuture versions when it is deployed in a victim’s network. Always try to extract strings that are less likely to be\r\nchanged.\r\nWebshell Code PHP\r\nThe variable name “$code” is more likely to change than the function combination\r\n“@eval(gzinflate(base64_decode(” at the end of the file. It is possible that valid PHP code contains\r\n“eval(gzinflate(base64_decode(” somewhere in the code but it is less likely that it occurs in the last 50 bytes of the\r\nfile.\r\nI therefore wrote the following rule:\r\nrule Webshell_b374k_related_1 {\r\nmeta:\r\ndescription = \"Detects b374k related webshell\"\r\nauthor = \"Florian Roth\"\r\nreference = \"https://goo.gl/ZuzV2S\"\r\nscore = 65\r\nhash = \"d5696b32d32177cf70eaaa5a28d1c5823526d87e20d3c62b747517c6d41656f7\"\r\ndate = \"2015-10-17\"\r\nstrings:\r\n$m1 = \"\u003c?php\"\r\n $s1 = \"@eval(gzinflate(base64_decode(\" ascii\r\n condition:\r\n $m1 at 0 and $s1 in (filesize-50..filesize) and filesize \u003c 20KB\r\n}\r\nPerformance Guidelines\r\nI collected many ideas by Wesley Shields and Victor M. 
Alvarez and composed a gist called “Yara Performance\r\nGuidelines”. This guide shows you how to write Yara rules that use fewer CPU cycles by avoiding CPU-intensive\r\nchecks or by using new condition checking shortcuts introduced in Yara version 3.4.\r\nYara Performance Guidelines\r\nPE Module\r\nPeople sometimes ask why I don’t use the PE module. The reason is simple: I avoid using modules that are rather\r\nnew and would like to see them thoroughly tested prior to using them in my scanners running in production\r\nenvironments. It is a great module and a lot of effort went into it. I would always recommend using the PE module in lab\r\nenvironments or sandboxes. In scanners that walk huge directory trees, a minor memory leak in one of the modules\r\ncould lead to severe memory shortages. I’ll give it another year to prove its stability and then start using it in my\r\nrules.\r\nyarGen\r\nyarGen has had an opcode feature since the last minor version. It is active by default but only useful in cases in which\r\nnot enough strings could be extracted.\r\nI currently use the following parameters to create my rules:\r\npython yarGen.py --noop -z 0 -a \"Florian Roth\" -r \"http://link-to-sample\" /mal/malware\r\nThe problem with the opcode feature is that it requires about 2.5 GB more main memory during rule creation. I’ll\r\nchange it to an optional parameter in the next version.\r\nyarAnalyzer\r\nyarAnalyzer is a rather new tool that focuses on rule coverage. 
After creating a larger rule set or a generic rule\r\nthat should match on several samples, you will want to check the coverage of your rules in order to detect overlapping\r\nrules (which is often OK).\r\nyarAnalyzer helps you to get an overview of:\r\nrules that match on more than one sample\r\nsamples that show hits from more than one rule\r\nrules without hits\r\nsamples without hits\r\nyarAnalyzer Screenshot\r\nyarAnalyzer GitHub Repository\r\nString Extraction and Colorization\r\nTo review the strings in a sample I use a simple shell one-liner that a good friend sent me once.\r\n“strings” version for Linux\r\n#!/bin/bash\r\n(strings -a -td \"$@\" | sed 's/^\\(\\s*[0-9][0-9]*\\) \\(.*\\)$/\\1 A \\2/' ; strings -a -td -el \"$@\" | sed\r\n“gstrings” version for OS X (sudo port install binutils)\r\n#!/bin/bash\r\n(gstrings -a -td \"$@\" | gsed 's/^\\(\\s*[0-9][0-9]*\\) \\(.*\\)$/\\1 A \\2/' ; gstrings -a -td -el \"$@\" | gs\r\nIt produces output like that shown in the screenshot above (green text, captioned “Malware Strings”),\r\nshowing the offset, ascii (A) or wide (W) and the string at this offset.\r\nTo colorize the strings, check my new tool “prisma”, which colorizes any type of standard output.\r\nPrisma STDOUT colorization\r\nContact\r\nFollow me on Twitter: @Cyb3rOps\r\nAbout the author:\r\nFlorian Roth\r\nFlorian Roth serves as the Head of Research and Development at Nextron Systems. With a background in IT\r\nsecurity since 2000, he has delved deep into nation-state cyber attacks since 2012. Florian has developed the\r\nTHOR Scanner and actively engages with the community via his Twitter handle @cyb3rops. 
He has contributed to\r\nopen-source projects, including 'Sigma', a generic SIEM rule format, and 'LOKI', an open-source scanner.\r\nAdditionally, he has shared valuable resources like a mapping of APT groups and operations and an Antivirus\r\nEvent Analysis Cheat Sheet.\r\nSource: https://www.bsk-consulting.de/2015/10/17/how-to-write-simple-but-sound-yara-rules-part-2/",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"ETDA",
		"Malpedia"
	],
	"references": [
		"https://www.bsk-consulting.de/2015/10/17/how-to-write-simple-but-sound-yara-rules-part-2/"
	],
	"report_names": [
		"how-to-write-simple-but-sound-yara-rules-part-2"
	],
	"threat_actors": [],
	"ts_created_at": 1775434490,
	"ts_updated_at": 1775791256,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/6a57d42fde431f6e76cce88dc8dc2d987daff31a.pdf",
		"text": "https://archive.orkl.eu/6a57d42fde431f6e76cce88dc8dc2d987daff31a.txt",
		"img": "https://archive.orkl.eu/6a57d42fde431f6e76cce88dc8dc2d987daff31a.jpg"
	}
}