# How to Write Simple but Sound Yara Rules – Part 2 **[bsk-consulting.de/2015/10/17/how-to-write-simple-but-sound-yara-rules-part-2/](https://www.bsk-consulting.de/2015/10/17/how-to-write-simple-but-sound-yara-rules-part-2/)** October 17, 2015 [Months ago I wrote a blog article on “How to write simple but sound Yara rules“. Since then](https://www.bsk-consulting.de/2015/02/16/write-simple-sound-yara-rules/) the mentioned techniques and tools have improved. I’d like to give you a brief update on certain Yara features that I frequently use and tools that I use to generate and test my rules. ## Handle Very Specific Strings Differently In the past I was glad to see very specific strings in samples and sometimes used these strings as the only indicator for detection. E.g. whenever I’ve found a certain typo in the PE header fields like “Micorsoft Corportation” I cheered and thought that this would make a great signature. But – and I have to admit that now – this only makes a nice signature. Great signatures require not only to match on a certain sample in the most condensed way but aims to match on similar samples created by the same author or group. Look at the following rule: ----- rule Enfal_Malware_Backdoor { meta: description = "Generic Rule to detect the Enfal Malware" author = "Florian Roth" date = "2015/02/10" super_rule = 1 hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790" hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b" hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41" score = 60 strings: $x1 = "Micorsoft Corportation" fullword wide $x2 = "IM Monnitor Service" fullword wide $a1 = "imemonsvc.dll" fullword wide $a2 = "iphlpsvc.tmp" fullword $a3 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword $s1 = "urlmon" fullword $s2 = "Registered trademarks and service marks are the property of their" wide $s3 = "XpsUnregisterServer" fullword $s4 = "XpsRegisterServer" fullword condition: uint16(0) == 0x5A4D and ( ( 1 of ($x*) ) or ( 2 of ($a*) and all of ($s*) ) ) } What I do when I review the 20 strings that are generated by yarGen is that I try to categorize the extracted strings in 3 different groups: **Very specific strings (one of them is sufficient for successful detection, e.g. IP** addresses, payload URLs, PDB paths, user profile directories) **Specific strings (strings that look good but may appear in goodware as well, e.g.** “wwwlib.dll”) **Other strings (even strings that appear in goodware; without random code from** compressed or encrypted data; e.g. “ModuleStart”) Then I create a condition that defines: A Certain Magic Header (remove it in case of ASCII text like scripts or webshells) 1 of the very specific strings OR some of the specific strings combined with many (but not all) of the common strings ----- Here is another example that does only have very specific strings (x) and common strings (s): rule Cobra_Trojan_Stage1 { meta: description = "Cobra Trojan - Stage 1" author = "Florian Roth" reference = "https://blog.gdatasoftware.com/blog/article/analysis-of-project-cobra.html" date = "2015/02/18" hash = "a28164de29e51f154be12d163ce5818fceb69233" strings: $x1 = "KmSvc.DLL" fullword wide $x2 = "SVCHostServiceDll_W2K3.dll" fullword ascii $s1 = "Microsoft Corporation. All rights reserved." fullword wide $s2 = "srservice" fullword wide $s3 = "Key Management Service" fullword wide $s4 = "msimghlp.dll" fullword wide $s5 = "_ServiceCtrlHandler@16" fullword ascii $s6 = "ModuleStart" fullword ascii $s7 = "ModuleStop" fullword ascii $s8 = "5.2.3790.3959 (srv03.sp2.070216-1710)" fullword wide condition: uint16(0) == 0x5A4D and filesize < 50000 and 1 of ($x*) and 6 of ($s*) } If you can’t create a rule that is sufficiently specific, I recommend the following methods to restrict the rule: **Magic Header (use it as first element in condition – see performance guidelines, e.g.** “uint16(0) == 0x5A4D”) **File Size (malware that mimics valid system files, drivers or legitimate software often** differs significantly in size; try to find the valid files online and set a size value in your rule, e.g. “filesize > 200KB and filesize < 600KB") **String Location (see the “Location is Everything” section)** **Exclude strings that occur in false positives (e.g. $fp1 = “McAfeeSig”)** ## Location is Everything One of the most underestimated features of Yara is the possibility to define a range in which strings occur in order to match. I used this technique to create a rule that detect metasploit meterpreter payloads quite reliably even if it’s encoded/cloaked. How that? If you see malware code that is hidden in an overlay at the end of a valid executable (e.g. “ab.exe”) and you see only strings that are typical function exports or mimics a well-known executable ask the following questions: ----- Is it normal that these strings are located at this location in the file? Is it normal that these strings occur more than once in that file? Is the distance between two strings somehow specific? Malware Strings In case of the unspecific malware code in the PE overlay, try to define a rule that looks for a certain file size (e.g. filesize > 800KB) and the malware strings relative to the end of the file (e.g. $s1 in (filesize-500..filesize)). The following example shows a unspecified webshell that contains strings that may be modified by an attacker in future versions when applied in a victim’s network. Try always to ----- extract strings that are less likely to be changed. Webshell Code PHP The variable name “$code” is more likely to change than the function combination “@eval(gzinflate(base64_decode(” at the end of the file. It is possible that valid php code contains “eval(gzinflate(base64_decode(” somewhere in the code but it is less likely that it occurs in the last 50 bytes of the file. I therefore wrote the following rule: rule Webshell_b374k_related_1 { meta: description = "Detects b374k related webshell" author = "Florian Roth" reference = "https://goo.gl/ZuzV2S" score = 65 hash = "d5696b32d32177cf70eaaa5a28d1c5823526d87e20d3c62b747517c6d41656f7" date = "2015-10-17" strings: $m1 = "