{
	"id": "dd797c29-d768-43f8-898e-d388a0e274b6",
	"created_at": "2026-04-06T01:31:13.83152Z",
	"updated_at": "2026-04-10T03:38:09.789557Z",
	"deleted_at": null,
	"sha1_hash": "5dde650c5fda26a379c6f91bde8696c7f7ff1cab",
	"title": "/var/log/notes",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 408819,
	"plain_text": "/var/log/notes\r\nArchived: 2026-04-06 01:22:26 UTC\r\nBy Jeff White (karttoon)\r\nRegular expressions (regex) are a language construct that allows you to define a search pattern. The flexibility of\r\nthis language allows you to craft search patterns for tons of practical applications, including passive identification\r\nof network traffic. Specifically, they can allow you to pattern match on URL's so that you may quickly identify\r\nmalicious sites frequently used by malware command and control (C2), domain generation algorithms (DGA's),\r\nand other such activities.\r\nI fell in love with using regex as a defensive tool while doing incident response many years ago. The depth of\r\ncontrol they provide naturally lends itself to the forensic, analyst, and responder lines of work. This blog may be\r\nold hat to most blue teamers out there, but if not, hopefully it serves as an educational resource on how you can\r\nuse data to build PCRE's for network defense.\r\nOver the course of this blog, I'll cover developing Perl compatible regex (PCRE) for the Emotet banking malware\r\ndownload URL's and develop PCRE's that encompass multiple campaigns that can then be used on a proxy device\r\n(blocking), in a SIEM (identification), or whatever system you have that supports utilizing these expressions.\r\nEmotet is a great candidate for review as it has varying domain structures that are ripe for pattern matching. I'll\r\nwalk you through how I develop these PCRE's, along with refining them, and then finally how they can be vetted\r\nfor false-positives (FP) to make them ready for production.\r\nThroughout the blog, I'll be using a Python script I wrote called pcre_check to assist with the analysis. Essentially,\r\nall the tool does is take a parameter for a file containing your PCRE's, a parameter for a file containing the URL's,\r\nand then some flags for how to display the pattern matches and misses. 
This is helpful for the rapid development\r\nof PCRE's because, more often than not, you find yourself in the midst of developing these when the shit has\r\nhit the fan...or at least I always did.\r\nI'll be focusing solely on URL's in this example; however, on the off chance you're not familiar with regex, keep in\r\nmind that a myriad of tools, from the byte level all the way up to the application level that I'll be covering\r\nhere, can utilize regex. You should absolutely learn at least the basics as it's something that can be a life saver in\r\nyour daily toolbox.\r\nBefore I get too much further in, here are a couple of helpful links that I find myself constantly visiting, which\r\nyou may find useful if you want to review or build your own PCRE's. I'll try to explain the regex syntax and logic\r\nas I go but I'll assume you know the basic structure of the language. If not, hit the references below.\r\nhttps://regex101.com/ - Lets you test a PCRE (or some other flavors of regex) against a set of strings you\r\nprovide on the fly with color and syntax highlighting. It provides extremely helpful explanations that tell\r\nyou how your PCRE is being evaluated so you can adjust as needed.\r\nhttp://ropgadget.com/posts/defensive_pcres.html\r\nPage 1 of 23\n\nhttp://www.regular-expressions.info/ - This site probably has everything you ever wanted to know about\r\nthe regex language. A super handy quick reference for when you forget some of the nuances and syntax.\r\nThis will be a long blog, and a little free flowing, as I develop these while enumerating step-by-step. 
Below are\r\nsome jumps so you can skip around as needed.\r\nCreating your sample corpus\r\nIdentifying patterns and enumeration\r\nRound 1\r\nRound 2\r\nRound 3\r\nRound 4\r\nRound 5\r\nRound 6\r\nRefining the rules\r\nVetting the rules\r\nFinal product\r\nInitial Sample Corpus\r\nThe Emotet banking malware download locations have a lot of different URL structures across their different\r\ncampaigns. It's been popping up on my radar more and more lately so I want to try and enumerate the patterns\r\nhere to further expand what I can catch. That being said, the very first thing I need to do is collect a decent\r\nsample of the various campaigns so that I can begin to try and match them. Prior to my current $dayjob, I'd\r\napproach this by hitting up multiple blogs from researchers or security companies and compiling the URL set. When\r\nI didn't have access to systems that made this task fairly trivial, I would frequently build them from the below\r\nresources.\r\nAlienVault Open Threat Exchange (OTX) - An awesome aggregation project that lets you pivot around\r\nvarious reports, blogs, and events based on keywords and extract what you need. Searching for \"emotet\"\r\nreturns a list of results; each of those contains IOC's for URL's you can copy out to build your list.\r\nMalware don't need Coffee - Kafeine's site is more focused on exploit kits but almost always had a handful\r\nof URL's of interest and sometimes links to raw URL dumps on Github.\r\nMalware Traffic Analysis - MalwareTraffic's site is heavily focused on exploit kits and e-mail based\r\nthreats, but almost always includes domains/URL's as well.\r\nUsually just Googling the threat name, e.g. \"Emotet domains\", will bring you to sites like this one which have links to\r\nPastebin posts containing loads of samples. 
The more the better, but in general, in my experience, I'd say between\r\n15-30 URL's is usually enough to make a solid base for an individual pattern and then you can tweak it during the\r\nfalse-positive (FP) checking phase.\r\nI've placed 696 Emotet URL's on Github which you can use to follow along or throw in a blocklist.\r\nPattern Recognition / Enumeration\r\nOnce you have a decent sample set, the next step is to analyze the data and look for patterns. I'll show the various\r\nchanges to the PCRE's as I analyze the URL's and you can see how they evolve into the final product after each\r\niteration. To better illustrate this, I'll just focus on the last 20 URL's at a time but normally I'll have 3\r\nterminals open: top window editing the URL file, middle window with pcre_check output, bottom window editing the\r\nPCRE file. This layout allows me to quickly modify and validate changes on the fly, significantly reducing the\r\nturnaround time.\r\nBelow is the first run of the script showing that none of the URLs matched and truncated to the last 20.\r\nRound 1\r\n$ python pcre_check.py -p emotet_pcres -u emotet_urls -n\r\n[+] NO HITS [+]\r\nhttp://12back.com/dw3wz-ue164-qqv/\r\nhttp://4glory.net/p7lrq-s191-iv/\r\n...\r\n
http://www.melodywriters.com/INVOICE-864339-98261/\r\nhttp://www.prodzakaz.com.ua/H27560xzwsS/\r\nhttp://www.stellaimpianti.it/download2467/\r\nhttp://www.stepstonedev.com/field/download7812/\r\nhttp://www.surreycountycleaners.com/t5wx-x064-mzdb/\r\nhttp://www.voloskof.net/Sn83160EngQs/\r\nhttp://www.wildweek.com/EDHFR-08-77623-document-May-04-2017/\r\nhttp://www.ziyufang.studio/project/wp-content/plugins/nprojects/download5337/\r\nhttp://wyskocil.de/ORDER-525808-73297/\r\nhttp://xionglutions.com/NDKBS-51-84402-document-May-03-2017/\r\nhttp://xionglutions.com/wl7dh-uf201-asnw/\r\nhttp://xyphoid.com/RRT-13279129.dokument/\r\nhttp://xyphoid.com/SCANNED/MM3431UCNPCEZRO/\r\nhttp://yildiriminsaat.com.tr/JCV-71815736.dokument/\r\nhttp://zahahadidmiami.com/K38258Q/\r\nhttp://zeroneed.com/FNN-40446899.dokument/\r\nhttp://ziarahsutera.com/5377959590/\r\nhttp://zonasacra.com/zH83293YizhQ/\r\nhttp://zvarga.com/15-12-07/CUST-9405847-8348/\r\nhttp://zypern-aktiv.de/wp-content/plugins/wordfence/img9re-a789-stz/\r\nThere are a couple of things that jump out immediately on the first review.\r\nDomains seem unrelated to the URL path; they are most likely compromised sites.\r\nThe final path can be multiple levels down, so I'll need to account for this.\r\nAt least 6 different variations can be seen out of the gate.\r\nFor each of the PCRE's, I've grown accustomed to starting them with the below structure.\r\n^http:\\/\\/[^\\x2F]+\\/\r\nThis matches any line that begins (\"^\") with \"http://\" followed by one or more characters that are not (\"[^ ]\") a forward slash\r\n(\"\\x2F\"), up to and including the next forward slash. 
This ensures we match the domain regardless of what TLD or subdomains\r\nmay be present.\r\nFor ease of illustration, I'm going to group the variations and break them down individually.\r\n[Group 01]\r\nhttp://www.surreycountycleaners.com/t5wx-x064-mzdb/\r\nhttp://xionglutions.com/wl7dh-uf201-asnw/\r\nhttp://zypern-aktiv.de/wp-content/plugins/wordfence/img9re-a789-stz/\r\nFor this pattern, we have 4-5 alpha(lower)numeric, dash, 4-5 alpha(lower)numeric, dash, 3-4 alpha(lower). We'll\r\nalso want to account for the last line which has the path multiple levels in. We can accomplish this by putting our\r\n\"[^\\x2F]+\\/\" section in a group and saying the group can repeat one or more times (i.e. match everything between\r\nthe forward slashes until the last one, where our pattern is).\r\n^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{4,5}-[a-z0-9]{4,5}-[a-z]{3,4}\\/$\r\n[Group 02]\r\nhttp://www.melodywriters.com/INVOICE-864339-98261/\r\nhttp://wyskocil.de/ORDER-525808-73297/\r\nhttp://zvarga.com/15-12-07/CUST-9405847-8348/\r\nThis next group appears to use a word in caps, dash, 6-7 numbers, dash, 4-5 numbers. We'll need to account for\r\nthe subpaths again as well. In this case, I prefer to group full words instead of using a character range, which helps\r\nwith being false-positive averse.\r\n^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST)-[0-9]{6,7}-[0-9]{4,5}\\/$\r\n[Group 03]\r\nhttp://www.prodzakaz.com.ua/H27560xzwsS/\r\nhttp://www.voloskof.net/Sn83160EngQs/\r\nhttp://xyphoid.com/SCANNED/MM3431UCNPCEZRO/\r\nhttp://zahahadidmiami.com/K38258Q/\r\nhttp://ziarahsutera.com/5377959590/\r\nhttp://zonasacra.com/zH83293YizhQ/\r\nI feel this group may end up getting split later. We have one URL which is purely numerical and then two which\r\nhave no lowercase letters. We'll cross that bridge as we look at more samples, if necessary. 
Another thing to note is\r\nthat this group has a very weak pattern in that it is very generic, which means it will likely match a lot of\r\nlegitimate URL's and not hold up during FP testing. We'll cross that bridge when we get to it as well.\r\nFor now, it's a mix of 7-15 alphanumeric characters.\r\n^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{7,15}\\/$\r\n[Group 04]\r\nhttp://xyphoid.com/RRT-13279129.dokument/\r\nhttp://yildiriminsaat.com.tr/JCV-71815736.dokument/\r\nhttp://zeroneed.com/FNN-40446899.dokument/\r\nThis one, and the next two, all look pretty straightforward: 3 alpha(upper), dash, 8 numbers, period, \"dokument\"\r\nstring.\r\n^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{3}-[0-9]{8}\\.dokument\\/$\r\n[Group 05]\r\nhttp://www.wildweek.com/EDHFR-08-77623-document-May-04-2017/\r\nhttp://xionglutions.com/NDKBS-51-84402-document-May-03-2017/\r\nSimilarly, very structured (which is good for us): 5 alpha(upper), dash, 2 numbers, dash, 5 numbers, dash,\r\n\"document\" string, dash, \"May\" string, dash, 2 numbers, dash, \"2017\" string. 
I've defaulted to using \"2017\" as a\r\nstring since it aligns with their usage of it as a date so it seems unlikely to change.\r\n^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{5}-[0-9]{2}-[0-9]{5}-document-May-[0-9]{2}-2017\\/$\r\n[Group 06]\r\nhttp://www.stellaimpianti.it/download2467/\r\nhttp://www.stepstonedev.com/field/download7812/\r\nhttp://www.ziyufang.studio/project/wp-content/plugins/nprojects/download5337/\r\nThe string \"download\", 4 numbers.\r\n^http:\\/\\/([^\\x2F]+\\/)+download[0-9]{4}\\/$\r\nI'll throw these into the emotet_pcres file and see how each performs against our target data set of known-bad\r\nEmotet sites.\r\n[+] FOUND [+]\r\nCount: 24/696\r\nComment: Group 01 - [ t5wx-x064-mzdb ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{4,5}-[a-z0-9]{4,5}-[a-z]{3,4}\\/$\r\n[+] FOUND [+]\r\nCount: 43/696\r\nComment: Group 02 - [ INVOICE-864339-98261 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST)-[0-9]{6,7}-[0-9]{4,5}\\/$\r\n[+] FOUND [+]\r\nCount: 177/696\r\nComment: Group 03 - [ H27560xzwsS ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{7,15}\\/$\r\n[+] FOUND [+]\r\nCount: 30/696\r\nComment: Group 04 - [ RRT-13279129.dokument ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{3}-[0-9]{8}\\.dokument\\/$\r\n[+] FOUND [+]\r\nCount: 24/696\r\nComment: Group 05 - [ EDHFR-08-77623-document-May-04-2017 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{5}-[0-9]{2}-[0-9]{5}-document-May-[0-9]{2}-2017\\/$\r\n[+] FOUND [+]\r\nCount: 62/696\r\nComment: Group 06 - [ download2467 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+download[0-9]{4}\\/$\r\nPretty low across the board except for group 3, which is the one I mentioned is too loose to begin with. 
From here\r\non out, if I don't list a particular group, it implies there was no change to the PCRE.\r\nRound 2\r\nThe next 20 URL's are below.\r\nhttp://web2present.com/Invoice-538878-14610/\r\nhttp://webbmfg.com/krupy/gallery2/g2data/LUqc663BAyN333-HoO/\r\nhttp://webbsmail.co.uk/DIDE-19-85247-document-May-04-2017/\r\nhttp://webergy.co.uk/15-14-47/Cust-0910279-3981/\r\nhttp://webics.org/Cust-951068-69554/\r\nhttp://websajt.nu/ap6ohc-au152-urttp/\r\nhttp://wescographics.com/17-40-07/Invoice-5558936-1201/\r\nhttp://whiteroofradio.com/YD796MJO974-NNW/\r\nhttp://wightman.cc/ipa0oab-j490-keap/\r\nhttp://wilstu.com/hHiDSaaP03Y95TIGpIUS4Aa/\r\nhttp://wingitproductions.org/NUDA-X-52454-DE/\r\nhttp://wlrents.com/CUST.-Document-YDI-04-GQ389557/\r\nhttp://wnyil.org/wnyil_transfer/Ups__com__WebTracking__tracknum__4DFH74180493688150/ORDER.-Document-SY-92-E736730/\r\nhttp://wolffy.net/17-00-07/Invoice-9545415-1483/\r\nhttp://wortis.com/CH760Wcv003-Luh/\r\nhttp://www.anti-corruption.su/Cust-3708876-8210/\r\nhttp://www.anti-corruption.su/TNO-59-97413-document-May-04-2017/\r\nhttp://www.babyo.com.mx/Invoice-583156-73417/\r\nhttp://www.doodle.tj/yW1NZ-sh00-cH/\r\nhttp://zypern-aktiv.de/wp-content/plugins/wordfence/img9re-a789-stz/\r\nIt looks like we have a few new groups as well. I'll list each changed PCRE as an OLD/NEW pair, which\r\nmight make the changes clearer.\r\n[Group 01] - [ t5wx-x064-mzdb ]\r\nhttp://websajt.nu/ap6ohc-au152-urttp/\r\nhttp://wightman.cc/ipa0oab-j490-keap/\r\nhttp://www.doodle.tj/yW1NZ-sh00-cH/\r\nhttp://zypern-aktiv.de/wp-content/plugins/wordfence/img9re-a789-stz/\r\nYou'll note that the third one now introduces capital letters; it's possible this is a separate campaign but I'll circle\r\nback to this later during review. 
The main changes will be the addition of the capital letters and adjustment on the\r\nranges, which will likely be the case for the rest of the groups.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{4,5}-[a-z0-9]{4,5}-[a-z]{3,4}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{4,7}-[a-z0-9]{4,5}-[a-z]{2,5}\\/$\r\n[Group 02] - [ INVOICE-864339-98261 ]\r\nhttp://web2present.com/Invoice-538878-14610/\r\nhttp://webergy.co.uk/15-14-47/Cust-0910279-3981/\r\nhttp://webics.org/Cust-951068-69554/\r\nhttp://wescographics.com/17-40-07/Invoice-5558936-1201/\r\nhttp://wolffy.net/17-00-07/Invoice-9545415-1483/\r\nhttp://www.anti-corruption.su/Cust-3708876-8210/\r\nhttp://www.babyo.com.mx/Invoice-583156-73417/\r\nNew strings \"Invoice\" and \"Cust\".\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST)-[0-9]{6,7}-[0-9]{4,5}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\\/$\r\n[Group 03] - [ H27560xzwsS ]\r\nhttp://wilstu.com/hHiDSaaP03Y95TIGpIUS4Aa/\r\nRange adjustment (making this one even more useless).\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{7,15}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{7,23}\\/$\r\n[Group 05] - [ EDHFR-08-77623-document-May-04-2017 ]\r\nhttp://webbsmail.co.uk/DIDE-19-85247-document-May-04-2017/\r\nhttp://www.anti-corruption.su/TNO-59-97413-document-May-04-2017/\r\nRange adjustment.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{5}-[0-9]{2}-[0-9]{5}-document-May-[0-9]{2}-2017\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{3,5}-[0-9]{2}-[0-9]{5}-document-May-[0-9]{2}-2017\\/$\r\n[Group 07] - [ LUqc663BAyN333-HoO ]\r\nhttp://webbmfg.com/krupy/gallery2/g2data/LUqc663BAyN333-HoO/\r\nhttp://whiteroofradio.com/YD796MJO974-NNW/\r\nhttp://wortis.com/CH760Wcv003-Luh/\r\nThis cluster is defined by one dash towards the end: 11-14 alphanumeric, dash, 3 alpha.\r\n^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{11,14}-[a-zA-Z]{3}\\/$\r\n[Group 08] - [ NUDA-X-52454-DE 
]\r\nhttp://wingitproductions.org/NUDA-X-52454-DE/\r\nOnly one sample so I'll match it exactly: 4 alpha(upper), dash, 1 alpha(upper), dash, 5 numbers, dash, 2\r\nalpha(upper).\r\n^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4}-[A-Z]{1}-[0-9]{5}-[A-Z]{2}\\/$\r\n[Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]\r\nhttp://wlrents.com/CUST.-Document-YDI-04-GQ389557/\r\nhttp://wnyil.org/wnyil_transfer/Ups__com__WebTracking__tracknum__4DFH74180493688150/ORDER.-Document-SY-92-E736730/\r\nSimilar to Group 2: same word choice, period, dash, \"Document\" string, dash, 2-3 alpha(upper), dash, 2 numbers,\r\ndash, 7-8 alpha(upper)numeric.\r\n^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER)\\.-Document-[A-Z]{2,3}-[0-9]{2}-[A-Z0-9]{7,8}\\/$\r\nNote that the delta in the output after each group is just something I've included after the fact to show the progress\r\nfor the blog.\r\n[+] FOUND [+]\r\nCount: 66/696 (+42)\r\nComment: [Group 01] - [ t5wx-x064-mzdb ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{4,7}-[a-z0-9]{4,5}-[a-z]{2,5}\\/$\r\n[+] FOUND [+]\r\nCount: 80/696 (+37)\r\nComment: [Group 02] - [ INVOICE-864339-98261 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\\/$\r\n[+] FOUND [+]\r\nCount: 190/696 (+13)\r\nComment: [Group 03] - [ H27560xzwsS ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{7,23}\\/$\r\n[+] FOUND [+]\r\nCount: 30/696\r\nComment: [Group 04] - [ RRT-13279129.dokument ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{3}-[0-9]{8}\\.dokument\\/$\r\n[+] FOUND [+]\r\nCount: 59/696 (+35)\r\nComment: [Group 05] - [ EDHFR-08-77623-document-May-04-2017 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{3,5}-[0-9]{2}-[0-9]{5}-document-May-[0-9]{2}-2017\\/$\r\n[+] FOUND [+]\r\nCount: 62/696\r\nComment: [Group 06] - [ download2467 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+download[0-9]{4}\\/$\r\n[+] FOUND [+]\r\nCount: 15/696\r\nComment: [Group 07] - [ LUqc663BAyN333-HoO ]\r\nPCRE: 
^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{11,14}-[a-zA-Z]{3}\\/$\r\n[+] FOUND [+]\r\nCount: 3/696\r\nComment: [Group 08] - [ NUDA-X-52454-DE ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4}-[A-Z]{1}-[0-9]{5}-[A-Z]{2}\\/$\r\n[+] FOUND [+]\r\nCount: 20/696\r\nComment: [Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER)\\.-Document-[A-Z]{2,3}-[0-9]{2}-[A-Z0-9]{7,8}\\/$\r\nRound 3\r\nThe next set of 20 URL's.\r\nhttp://theocforrent.com/BG-47535325/zp3x-r88-wuh.view/\r\nhttp://thepogs.net/rs4eG-Md93-FSZV/\r\nhttp://thesubservice.com/ORDER.-Document-9543529814/\r\nhttp://theuntoldsorrow.co.uk/ORDER.-XI-80-UY913942/\r\nhttp://tiger12.com/TGA-48-76252-doc-May-04-2017/\r\nhttp://timmadden.com.au/qzw1s-wc740-m/\r\nhttp://toppprogramming.com/Cust-8328499631/\r\nhttp://tpsystem.net/TaVS391hyCaD623-dJ/\r\nhttp://transfinity.co.uk/sam/fathers-day/htdocs/b2m-qp699-jxmln/\r\nhttp://tridentii.com/OY-30676027.dokument/\r\nhttp://tscoaching.co.uk/l1R-q60-pe/\r\nhttp://uncover.jp/XwXL806QaDN792-jr/\r\nhttp://uncover.jp/r-2psl-vo440-lz.doc/\r\nhttp://visionsoflightphotography.com/FRMLW-RNT-41482-DE/\r\nhttp://visuals.com/CUST.-VT-38-RH422386/\r\nhttp://voxellab.com/BBM-07-75350-doc-May-04-2017/\r\nhttp://vspacecreative.co.uk/O2-view-report-818/c1o-jn07-er.view/\r\nhttp://wayanad.net/xhW017TRfP646-z/\r\nhttp://wb0rur.com/ZGAG-59-63863-doc-May-05-2017/\r\nhttp://www.doodle.tj/yW1NZ-sh00-cH/\r\nOne new variant in this set.\r\n[Group 01] - [ t5wx-x064-mzdb ]\r\nhttp://thepogs.net/rs4eG-Md93-FSZV/\r\nhttp://timmadden.com.au/qzw1s-wc740-m/\r\nhttp://transfinity.co.uk/sam/fathers-day/htdocs/b2m-qp699-jxmln/\r\nhttp://tscoaching.co.uk/l1R-q60-pe/\r\nhttp://www.doodle.tj/yW1NZ-sh00-cH/\r\nRange adjustment and additional case changes.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{4,7}-[a-z0-9]{4,5}-[a-z]{2,5}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{4,7}-[a-zA-Z0-9]{4,5}-[a-zA-Z]{1,5}\\/$\r\n[Group 02] - [ 
INVOICE-864339-98261 ]\r\nhttp://toppprogramming.com/Cust-8328499631/\r\nThis could be a different campaign as it breaks from the double-dashes but it's so similar to group 2 that I'll leave\r\nit for now and possibly revisit.\r\nThe second dash I'll make optional which should allow the lowest ranges of the numerical sections to match. I'll\r\nuse an optional capturing group (\"(-)?\") for the second dash. Effectively, creating a capture group and then placing the\r\n\"?\" quantifier after it causes the group to match zero or one time, making it optional.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}(-)?[0-9]{4,5}\\/$\r\n[Group 04] - [ RRT-13279129.dokument ]\r\nhttp://tridentii.com/OY-30676027.dokument/\r\nRange adjustment.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{3}-[0-9]{8}\\.dokument\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{2,3}-[0-9]{8}\\.dokument\\/$\r\n[Group 05] - [ EDHFR-08-77623-document-May-04-2017 ]\r\nhttp://tiger12.com/TGA-48-76252-doc-May-04-2017/\r\nhttp://voxellab.com/BBM-07-75350-doc-May-04-2017/\r\nhttp://wb0rur.com/ZGAG-59-63863-doc-May-05-2017/\r\nAdd \"doc\" string to grouping.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{3,5}-[0-9]{2}-[0-9]{5}-document-May-[0-9]{2}-2017\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{3,5}-[0-9]{2}-[0-9]{5}-(document|doc)-May-[0-9]{2}-2017\\/$\r\n[Group 07] - [ LUqc663BAyN333-HoO ]\r\nhttp://tpsystem.net/TaVS391hyCaD623-dJ/\r\nhttp://uncover.jp/XwXL806QaDN792-jr/\r\nhttp://wayanad.net/xhW017TRfP646-z/\r\nRange adjustment.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{11,14}-[a-zA-Z]{3}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{11,15}-[a-zA-Z]{1,3}\\/$\r\n[Group 08] - [ NUDA-X-52454-DE ]\r\nhttp://visionsoflightphotography.com/FRMLW-RNT-41482-DE/\r\nRange adjustment.\r\nOLD: 
^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4}-[A-Z]{1}-[0-9]{5}-[A-Z]{2}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4,5}-[A-Z]{1,3}-[0-9]{5}-[A-Z]{2}\\/$\r\n[Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]\r\nhttp://thesubservice.com/ORDER.-Document-9543529814/\r\nhttp://theuntoldsorrow.co.uk/ORDER.-XI-80-UY913942/\r\nhttp://visuals.com/CUST.-VT-38-RH422386/\r\nCouple of things going on here.\r\nNew grouping of words for second part and first entry is only numerical without dashes, which looks similar to\r\nthe new entry for Group 2. To account for these, I'll use optional capturing groups again to build around them. It\r\nmakes the rule slightly less accurate but with the other anchors in it, I think it'll still be unique enough to not\r\nFP.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER)\\.-Document-[A-Z]{2,3}-[0-9]{2}-[A-Z0-9]{7,8}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER)\\.-(Document|XI|VT)((-[A-Z]{2,3})?-[0-9]{2})?-[A-Z0-9]{7,10}\\/$\r\n[Group 10] - [ zp3x-r88-wuh.view ]\r\nhttp://theocforrent.com/BG-47535325/zp3x-r88-wuh.view/\r\nhttp://uncover.jp/r-2psl-vo440-lz.doc/\r\nhttp://vspacecreative.co.uk/O2-view-report-818/c1o-jn07-er.view/\r\nThe \"doc\" and \"view\" ones may be different campaigns but, again, I'll lump them together for now and will\r\nseparate at the end if necessary: 1-4 alpha(lower)numeric, dash, 3-4 alpha(lower)numeric, dash, optional 5\r\nalpha(lower)numeric, dash, 2-3 alpha(lower), period, group \"view\" or \"doc\" strings.\r\n^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{1,4}-[a-z0-9]{3,4}(-[a-z0-9]{5})?-[a-z]{2,3}\\.(view|doc)\\/$\r\nThe pcre_check output shows decent coverage improvements.\r\n[+] FOUND [+]\r\nCount: 93/696 (+27)\r\nComment: [Group 01] - [ t5wx-x064-mzdb ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{4,7}-[a-zA-Z0-9]{4,5}-[a-zA-Z]{1,5}\\/$\r\n[+] FOUND [+]\r\nCount: 89/696 (+9)\r\nComment: [Group 02] - [ INVOICE-864339-98261 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}(-)?[0-9]{4,5}\\/$\r\n[+] 
FOUND [+]\r\nCount: 190/696\r\nComment: [Group 03] - [ H27560xzwsS ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{7,23}\\/$\r\n[+] FOUND [+]\r\nCount: 56/696 (+26)\r\nComment: [Group 04] - [ RRT-13279129.dokument ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{2,3}-[0-9]{8}\\.dokument\\/$\r\n[+] FOUND [+]\r\nCount: 79/696 (+20)\r\nComment: [Group 05] - [ EDHFR-08-77623-document-May-04-2017 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{3,5}-[0-9]{2}-[0-9]{5}-(document|doc)-May-[0-9]{2}-2017\\/$\r\n[+] FOUND [+]\r\nCount: 62/696\r\nComment: [Group 06] - [ download2467 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+download[0-9]{4}\\/$\r\n[+] FOUND [+]\r\nCount: 43/696 (+28)\r\nComment: [Group 07] - [ LUqc663BAyN333-HoO ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{11,15}-[a-zA-Z]{1,3}\\/$\r\n[+] FOUND [+]\r\nCount: 10/696 (+7)\r\nComment: [Group 08] - [ NUDA-X-52454-DE ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4,5}-[A-Z]{1,3}-[0-9]{5}-[A-Z]{2}\\/$\r\n[+] FOUND [+]\r\nCount: 31/696 (+11)\r\nComment: [Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER)\\.-(Document|XI|VT)((-[A-Z]{2,3})?-[0-9]{2})?-[A-Z0-9]{7,10}\\/$\r\n[+] FOUND [+]\r\nCount: 3/696\r\nComment: [Group 10] - [ zp3x-r88-wuh.view ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{1,4}-[a-z0-9]{3,4}(-[a-z0-9]{5})?-[a-z]{2,3}\\.(view|doc)\\/$\r\nRound 4\r\nThe next 20 sites.\r\nhttp://pinoypiper.com/Sz1Mr-H23-Xw/\r\nhttp://proiecte-pac.ro/ORDER.-5883789520/\r\nhttp://proprints.dk/Rech-74779857260/\r\nhttp://pulmad.ee/B6y-Fb95-NMW/\r\nhttp://redkitecottages.com/Cust-Document-VMH-46-TJ804065/\r\nhttp://reichertgroup.com/d0r-tl410-cxa/\r\nhttp://sgbusiness.co.uk/YM-57911235-document-May-03-2017/\r\nhttp://sign1.no/dhl___status___2668292851/\r\nhttp://sloan3d.com/Cust-Document-WMV-26-EW054554/\r\nhttp://stacibockman.com/g2c-o179-pocja/\r\nhttp://streamingair.com/i0A-St59-m/\r\nhttp://sublevel3.us/G7n-Gh58-y/\r\nhttp://superalumnos.net/php/ORDER.-HW-84-Y947883/\r\n
http://technetemarketing.com/CUST.-8520279770/\r\nhttp://teed.ru/YG-47124992/bc7za-l30-v.view/\r\nhttp://texasbrits.com/m3s-r623-x/\r\nhttp://thegilbertlawoffice.com/m-9q-d054-gu.doc/\r\nhttp://thenursesagent.com/ORDER.-9592209302/\r\nhttp://transfinity.co.uk/sam/fathers-day/htdocs/b2m-qp699-jxmln/\r\nhttp://tscoaching.co.uk/l1R-q60-pe/\r\nOne new variant sticks out, otherwise business as usual.\r\n[Group 01] - [ t5wx-x064-mzdb ]\r\nhttp://pinoypiper.com/Sz1Mr-H23-Xw/\r\nhttp://pulmad.ee/B6y-Fb95-NMW/\r\nhttp://reichertgroup.com/d0r-tl410-cxa/\r\nhttp://stacibockman.com/g2c-o179-pocja/\r\nhttp://streamingair.com/i0A-St59-m/\r\nhttp://sublevel3.us/G7n-Gh58-y/\r\nhttp://texasbrits.com/m3s-r623-x/\r\nhttp://transfinity.co.uk/sam/fathers-day/htdocs/b2m-qp699-jxmln/\r\nhttp://tscoaching.co.uk/l1R-q60-pe/\r\nNearly half of the 20 are for this group. Just some small range adjustments.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{4,7}-[a-zA-Z0-9]{4,5}-[a-zA-Z]{1,5}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{3,7}-[a-zA-Z0-9]{3,5}-[a-zA-Z]{1,5}\\/$\r\n[Group 02] - [ INVOICE-864339-98261 ]\r\nhttp://proiecte-pac.ro/ORDER.-5883789520/\r\nhttp://proprints.dk/Rech-74779857260/\r\nhttp://technetemarketing.com/CUST.-8520279770/\r\nhttp://thenursesagent.com/ORDER.-9592209302/\r\nIt should be apparent now that Group 2 and 9 have a bit of overlap and I was going to wait till the end to course\r\ncorrect; however, I feel it's just too much at this point so I'm going to split them out: the URL's above, along with those previously\r\nmatched in both groups, where the \"ORDER\" and \"CUST\" strings are followed by 10 digits, become a new unique group.\r\nThat means I need to edit Group 2 and 9 to avoid these and the simplest way of doing that is reverting the\r\npreviously optional dash, making it required again. 
See Group 9 and 12 for further iteration details.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}(-)?[0-9]{4,5}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\\/$\r\n[Group 05] - [ EDHFR-08-77623-document-May-04-2017 ]\r\nhttp://sgbusiness.co.uk/YM-57911235-document-May-03-2017/\r\nThis new one breaks from the two parts separated by a dash. I can add the dash to the character list and up the\r\nrange, or I can opt for an optional grouping and up the range. I'm going to do the latter for the reason that it keeps\r\nthe structure intact; for this, I'm not as worried about FP's due to the ending part of the pattern being fairly unique.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{3,5}-[0-9]{2}-[0-9]{5}-(document|doc)-May-[0-9]{2}-2017\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{2,5}(-[0-9]{2})?-[0-9]{5,10}-(document|doc)-May-[0-9]{2}-2017\\/$\r\n[Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]\r\nhttp://redkitecottages.com/Cust-Document-VMH-46-TJ804065/\r\nhttp://sloan3d.com/Cust-Document-WMV-26-EW054554/\r\nhttp://superalumnos.net/php/ORDER.-HW-84-Y947883/\r\nSimilar to Group 2, I'm going to reverse course on the optional groupings so that the 10 digits are not captured. 
To\r\naccount for the new variants in Group 9, I'm adding an optional grouping for the period after the first word and for\r\nthe \"Document\" string, then moving the others back into the A-Z grouping that followed.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER)\\.-(Document|XI|VT)((-[A-Z]{2,3})?-[0-9]{2})?-[A-Z0-9]{7,10}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER|Cust)(.)?(-Document)?-[A-Z]{2,3}-[0-9]{2}-[A-Z0-9]{7,10}\\/$\r\n[Group 10] - [ zp3x-r88-wuh.view ]\r\nhttp://thegilbertlawoffice.com/m-9q-d054-gu.doc/\r\nRange adjustment.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{1,4}-[a-z0-9]{3,4}(-[a-z0-9]{5})?-[a-z]{2,3}\\.(view|doc)\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{1,4}-[a-z0-9]{3,4}(-[a-z0-9]{4,5})?-[a-z]{2,3}\\.(view|doc)\\/$\r\n[Group 11] - [ dhl___status___2668292851 ]\r\nhttp://sign1.no/dhl___status___2668292851/\r\nNot much to work with yet so it's fairly static.\r\n^http:\\/\\/([^\\x2F]+\\/)+dhl___status___[0-9]{10}\\/$\r\n[Group 12] - [ ORDER.-5883789520 ]\r\nPrevious set:\r\nhttp://thesubservice.com/ORDER.-Document-9543529814/\r\nhttp://toppprogramming.com/Cust-8328499631/\r\nCurrent set:\r\nhttp://proiecte-pac.ro/ORDER.-5883789520/\r\nhttp://proprints.dk/Rech-74779857260/\r\nhttp://technetemarketing.com/CUST.-8520279770/\r\nhttp://thenursesagent.com/ORDER.-9592209302/\r\nLooking at the data in Group 2 and 9, this pattern will have: string grouping of \"ORDER\", \"Rech\", \"CUST\",\r\n\"Cust\", optional period, dash, optional \"Document\" string, 10-11 numbers. 
By the way, \"rech\" is shorthand for\r\n\"rechnung\", which is German for \"bill\" - you see these variations quite a bit in phishing campaigns as they focus\r\non different regions.\r\n^http:\\/\\/([^\\x2F]+\\/)+(ORDER|Rech|CUST|Cust)(.)?(-Document)?-[0-9]{10,11}\\/$\r\nNext iteration below.\r\n[+] FOUND [+]\r\nCount: 127/696 (+34)\r\nComment: [Group 01] - [ t5wx-x064-mzdb ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{3,7}-[a-zA-Z0-9]{3,5}-[a-zA-Z]{1,5}\\/$\r\n[+] FOUND [+]\r\nCount: 80/696 (-9)\r\nComment: [Group 02] - [ INVOICE-864339-98261 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\\/$\r\n[+] FOUND [+]\r\nCount: 190/696\r\nComment: [Group 03] - [ H27560xzwsS ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{7,23}\\/$\r\n[+] FOUND [+]\r\nCount: 56/696\r\nComment: [Group 04] - [ RRT-13279129.dokument ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{2,3}-[0-9]{8}\\.dokument\\/$\r\n[+] FOUND [+]\r\nCount: 86/696 (+7)\r\nComment: [Group 05] - [ EDHFR-08-77623-document-May-04-2017 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{2,5}(-[0-9]{2})?-[0-9]{5,10}-(document|doc)-May-[0-9]{2}-2017\\/$\r\n[+] FOUND [+]\r\nCount: 62/696\r\nComment: [Group 06] - [ download2467 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+download[0-9]{4}\\/$\r\n[+] FOUND [+]\r\nCount: 43/696\r\nComment: [Group 07] - [ LUqc663BAyN333-HoO ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{11,15}-[a-zA-Z]{1,3}\\/$\r\n[+] FOUND [+]\r\nCount: 10/696\r\nComment: [Group 08] - [ NUDA-X-52454-DE ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4,5}-[A-Z]{1,3}-[0-9]{5}-[A-Z]{2}\\/$\r\n[+] FOUND [+]\r\nCount: 36/696 (+5)\r\nComment: [Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER|Cust)(.)?(-Document)?-[A-Z]{2,3}-[0-9]{2}-[A-Z0-9]{7,10}\\/$\r\n[+] FOUND [+]\r\nCount: 3/696\r\nComment: [Group 10] - [ zp3x-r88-wuh.view ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{1,4}-[a-z0-9]{3,4}(-[a-z0-9]{4,5})?-[a-z]{2,3}\\.(view|doc)\\/$\r\n[+] FOUND [+]\r\nCount: 3/696\r\nComment: [Group 11] - 
[\r\ndhl___status___2668292851 ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+dhl___status___[0-9]{10}\\/$ [+] FOUND [+] Count:\r\n31/696 Comment: [Group 12] - [ ORDER.-5883789520 ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+\r\n(ORDER|Rech|CUST|Cust)(.)?(-Document)?-[0-9]{10,11}\\/$\r\nRound 5\r\nSince there are only 31 URL's left I'm just going to add them all here and close out this phase.\r\nhttp://akhmerov.com/AuHffUo4L1BcEmca0BW5e4UtI/ http://albrightfinancial.com/gescanntes-Dokument-66764196575/ http://anjep.com/TBWEV-YCAP-91327-DE/ http://arroyave.net/Rech-K-682-GO1130/\r\nhttp://beowulf7.com/kgcee/ http://bitach.com/RIJW-FNFE-86299-DE/ http://bobrow.com/ito-6r-w193-pkr.doc/\r\nhttp://boningue.com/g843enx500-Jh/ http://carriedavenport.com/Scan-58146582290/\r\nhttp://ropgadget.com/posts/defensive_pcres.html\r\nPage 13 of 23\n\nhttp://davidberman.com/gescanntes-Dokument-85218870046/ http://dentaltravelpoland.co.uk/NUN-63376893/b4fe-nn88-s.view/ http://donnjo.com/Rechnung-IOOY-776-LUV2894/\r\nhttp://frossweddingcollections.co.uk/qdu-7p-wi523-hgnt.doc/ http://froufrouandthomas.co.uk/c644kNg297-uy/\r\nhttp://gabrielramos.com.br/lxu-3h-ip079-zgmg.doc/ http://genxvisual.com/U494KHq064-VK/ http://gestion-arte.com.ar/CLCJY-EMIE-76216-DE/ http://imnet.ro/gcxbh/ http://johncarta.com/jexaag/\r\nhttp://kowalenko.ca/D603ImA780-xxJ/ http://kratiroff.com/Scan-62799108494/ http://lapetitenina.com/eyym/\r\nhttp://magmaprod.com.br/FcmUZ9GGTFaq2SYC5HTuFgc4v7/ http://masmp.com/rby-4c-rp108-sqq.doc/\r\nhttp://missgypsywhitemoon.com.au/ismpce/ http://music111.com/VAQT-DYBC-27274-DE/\r\nhttp://myhorses.ca/lb8TApg9aZI6PP5RWRAIdmfU/ http://onlineme.w04.wh-2.com/LD-36666076/ir5r-mu75-\r\nh.view/ http://phoneworx.co.uk/HLqwOU1uNQ7rWLWkXW6VoMheZf/ http://teed.ru/YG-47124992/bc7za-l30-\r\nv.view/ http://thegilbertlawoffice.com/m-9q-d054-gu.doc/\r\n[Group 03] - [ H27560xzwsS ]\r\nhttp://akhmerov.com/AuHffUo4L1BcEmca0BW5e4UtI/ http://beowulf7.com/kgcee/ http://imnet.ro/gcxbh/\r\nhttp://johncarta.com/jexaag/ 
http://lapetitenina.com/eyym/\r\nhttp://magmaprod.com.br/FcmUZ9GGTFaq2SYC5HTuFgc4v7/\r\nhttp://myhorses.ca/lb8TApg9aZI6PP5RWRAIdmfU/\r\nhttp://phoneworx.co.uk/HLqwOU1uNQ7rWLWkXW6VoMheZf/\r\nI'll adjust the ranges on this one but you can see from the above that it looks like two distinct campaigns. I have no\r\ndoubt now that there will be more in this grouping but since it's almost over 200 URL's I'll review the entire set at\r\nthe end.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{7,23}\\/$ NEW: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{4,26}\\/$\r\n[Group 07] - [ LUqc663BAyN333-HoO ]\r\nhttp://boningue.com/g843enx500-Jh/ http://froufrouandthomas.co.uk/c644kNg297-uy/\r\nhttp://genxvisual.com/U494KHq064-VK/ http://kowalenko.ca/D603ImA780-xxJ/\r\nRange adjustment.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{11,15}-[a-zA-Z]{1,3}\\/$ NEW: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]\r\n{10,15}-[a-zA-Z]{1,3}\\/$\r\n[Group 08] - [ NUDA-X-52454-DE ]\r\nhttp://anjep.com/TBWEV-YCAP-91327-DE/ http://bitach.com/RIJW-FNFE-86299-DE/ http://gestion-arte.com.ar/CLCJY-EMIE-76216-DE/ http://music111.com/VAQT-DYBC-27274-DE/\r\nRange adjustment. 
Curious that these all end with \"DE\" too, possibly region-based given the \"Rech\" stuff seen\r\npreviously; will follow up after.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4,5}-[A-Z]{1,3}-[0-9]{5}-[A-Z]{2}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-[A-Z]{2}\\/$\r\n[Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]\r\nhttp://arroyave.net/Rech-K-682-GO1130/ http://donnjo.com/Rechnung-IOOY-776-LUV2894/\r\nAdded \"Rech\" and \"Rechnung\" to initial string grouping along with expanding some ranges.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER|Cust)(.)?(-Document)?-[A-Z]{2,3}-[0-9]{2}-[A-Z0-9]{7,10}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z0-9]{6,10}\\/$\r\n[Group 10] - [ zp3x-r88-wuh.view ]\r\nhttp://bobrow.com/ito-6r-w193-pkr.doc/ http://dentaltravelpoland.co.uk/NUN-63376893/b4fe-nn88-s.view/\r\nhttp://frossweddingcollections.co.uk/qdu-7p-wi523-hgnt.doc/ http://gabrielramos.com.br/lxu-3h-ip079-zgmg.doc/\r\nhttp://masmp.com/rby-4c-rp108-sqq.doc/ http://onlineme.w04.wh-2.com/LD-36666076/ir5r-mu75-h.view/\r\nhttp://teed.ru/YG-47124992/bc7za-l30-v.view/ http://thegilbertlawoffice.com/m-9q-d054-gu.doc/\r\nRange adjustment.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{1,4}-[a-z0-9]{3,4}(-[a-z0-9]{4,5})?-[a-z]{2,3}\\.(view|doc)\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{1,5}-[a-z0-9]{2,4}(-[a-z0-9]{4,5})?-[a-z]{1,4}\\.(view|doc)\\/$\r\n[Group 12] - [ ORDER.-5883789520 ]\r\nhttp://albrightfinancial.com/gescanntes-Dokument-66764196575/ http://carriedavenport.com/Scan-58146582290/\r\nhttp://davidberman.com/gescanntes-Dokument-85218870046/ http://kratiroff.com/Scan-62799108494/\r\nAdded \"gescanntes\" to initial string grouping (this is German for \"scanned\") and \"Scan\". 
Added \"Dokument\" to\r\nsecond optional grouping.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+(ORDER|Rech|CUST|Cust)(.)?(-Document)?-[0-9]{10,11}\\/$ NEW:\r\n^http:\\/\\/([^\\x2F]+\\/)+(ORDER|Rech|CUST|Cust|gescanntes|Scan)(.)?(-Document|-Dokument)?-[0-9]{10,11}\\/$\r\nAlright, now I've cleared all of the remaining matches.\r\n[+] FOUND [+] Count: 127/696 Comment: [Group 01] - [ t5wx-x064-mzdb ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{3,7}-[a-zA-Z0-9]{3,5}-[a-zA-Z]{1,5}\\/$ [+] FOUND [+] Count: 80/696 Comment: [Group 02] - [\r\nINVOICE-864339-98261 ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-\r\n9]{4,5}\\/$ [+] FOUND [+] Count: 199/696 (+9) Comment: [Group 03] - [ H27560xzwsS ] PCRE:\r\n^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{4,26}\\/$ [+] FOUND [+] Count: 56/696 Comment: [Group 04] - [ RRT-13279129.dokument ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{2,3}-[0-9]{8}\\.dokument\\/$ [+] FOUND [+] Count:\r\n86/696 Comment: [Group 05] - [ EDHFR-08-77623-document-May-04-2017 ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]\r\n{2,5}(-[0-9]{2})?-[0-9]{5,10}-(document|doc)-May-[0-9]{2}-2017\\/$ [+] FOUND [+] Count: 62/696 Comment:\r\n[Group 06] - [ download2467 ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+download[0-9]{4}\\/$ [+] FOUND [+] Count: 47/696\r\n(+4) Comment: [Group 07] - [ LUqc663BAyN333-HoO ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{10,15}-[a-zA-Z]{1,3}\\/$ [+] FOUND [+] Count: 14/696 (+4) Comment: [Group 08] - [ NUDA-X-52454-DE ] PCRE:\r\nhttp://ropgadget.com/posts/defensive_pcres.html\r\nPage 15 of 23\n\n^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-[A-Z]{2}\\/$ [+] FOUND [+] Count: 38/696 (+2)\r\nComment: [Group 09] - [ CUST.-Document-YDI-04-GQ389557 ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+\r\n(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z0-9]{6,10}\\/$ [+] FOUND\r\n[+] Count: 11/696 (+1) Comment: [Group 10] - [ zp3x-r88-wuh.view ] PCRE: 
^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]\r\n{1,5}-[a-z0-9]{2,4}(-[a-z0-9]{4,5})?-[a-z]{1,4}\\.(view|doc)\\/$ [+] FOUND [+] Count: 3/696 Comment: [Group\r\n11] - [ dhl___status___2668292851 ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+dhl___status___[0-9]{10}\\/$ [+] FOUND [+]\r\nCount: 35/696 (+4) Comment: [Group 12] - [ ORDER.-5883789520 ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+\r\n(ORDER|Rech|CUST|Cust|gescanntes|Scan)(.)?(-Document|-Dokument)?-[0-9]{10,11}\\/$\r\nRound 6\r\nThe next step is to validate the matches with the \"-s\" flag in pcre_check. This will show all of the respective\r\nmatches under each PCRE. For this phase, I just eyeball it to make sure there is no overlap and what's expected in\r\neach group is present.\r\nAll of the PCRE's look solid except Group 3, which I already mentioned would need more TLC, as it overlaps\r\nwith other PCRE's.\r\nFor Group 3, I'm going to visually break these down. I'll put 5 examples under each sub-grouping to show how I\r\nseparated them. Some are very good for matching while others will just have to be left behind. TAKE NOTE BAD\r\nGUYS, BEING GENERIC IS GOOD, UNIQUE SNOWFLAKES ARE THE FIRST AGAINST THE WALL.\r\n[Group 03] - [ dhl/paket/com/pkp/appmanager/8376315127 ]\r\nhttp://8kindsoffun.com/dhl/paket/com/pkp/appmanager/8376315127/\r\nhttp://balletopia.org/dhl/paket/com/pkp/appmanager/7293445574/\r\nhttp://cnwconsultancy.com/dhl/paket/com/pkp/appmanager/0622636111/\r\nhttp://cookieco.com/dhl/paket/com/pkp/appmanager/8333287922/\r\nhttp://cspdx.com/dhl/paket/com/pkp/appmanager/6213914600/\r\nI think this one would have stood out earlier had it not been clobbered by the previous PCRE. The path is very\r\nunique and ends with 10 digits. 
This PCRE will replace the old one for Group 3 and the other new ones will start\r\nat Group 13.\r\n^http:\\/\\/([^\\x2F]+\\/)+dhl\\/paket\\/com\\/pkp\\/appmanager\\/[0-9]{10}\\/$\r\n[Group 13] - [ 6572646300 ]\r\nhttp://alfareklama.cz/6572646300/ http://algicom.net/6673413599/ http://bourdin.name/0014489972/\r\nhttp://carbitech.net/dhl/2354409458/ http://dsltech.co.uk/0217183208/ ...\r\nhttp://oscartvazquez.com/DHL24/15382203695/\r\nI'm going to create a PCRE for this one but I don't expect it to live past the FP check. There is one that stands apart\r\nfrom the rest here with 11 digits instead of 10 - it may be that I just don't have enough samples to account for\r\nthat campaign. Finally, I'll need to exclude the previous set of matches which also end with 10 digits. To do this,\r\nI'll use a negative lookbehind to ensure that the 10-11 digits are not immediately preceded by \"appmanager/\".\r\n^http:\\/\\/([^\\x2F]+\\/)+(?\u003c!appmanager\\/)[0-9]{10,11}\\/$\r\n[ alpha(lower) ]\r\nhttp://aifesdespets.fr/kkrxtsmodw/ http://beowulf7.com/kgcee/ http://bunngalow.com/injeutznnb/\r\nhttp://carbofilms.com/cms/wp-content/upgrade/jcnfkvken/ http://dolphinrunvb.com/yozypdznpb/\r\nI don't see any good patterns in this set or the next one.\r\n[ alpha(lower)numeric ]\r\nhttp://benard.ca/z49641l/ http://jaqua.us/hid4kiwcvd84fljkpqpl/ http://krakhud.pl/rguen0ebxndrci41frworbr/\r\nhttp://micromatrices.com/qwh7zxijifxsnxg20mlwa/ http://patu.ch/bgrvm2wqpjw74hz/\r\n[Group 14] [ SCANNED/RZ7498WEXEZB ]\r\nhttp://icaredentalstudio.com/APE88743TZ/ http://lbcd.se/MFV09235UA/\r\nhttp://lucasliftruck.com/SCANNED/RZ7498WEXEZB/ http://meanconsulting.com/K44975X/\r\nhttp://sentios.lt/W95941C/ http://triadesolucoes.com.br/SCANNED/RBA6517MHPKCZDEX/\r\nhttp://xyphoid.com/SCANNED/MM3431UCNPCEZRO/ http://zahahadidmiami.com/K38258Q/\r\nThis group was characterized by alpha(upper)numeric, which normally wouldn't be worth pattern 
matching, but I\r\ncan see two patterns in the above that may be worth entertaining. For Group 14, I'll match on the URL's with\r\n\"SCANNED\" string in the path and the unique placement of the digits within the string: 2-3 alpha(upper), 4 digits,\r\n6-9 alpha(upper).\r\n^http:\\/\\/([^\\x2F]+\\/)+SCANNED\\/[A-Z]{2,3}[0-9]{4}[A-Z]{6,9}\\/$\r\n[Group 15] [ K44975X ]\r\nhttp://meanconsulting.com/K44975X/ http://sentios.lt/W95941C/ http://zahahadidmiami.com/K38258Q/\r\nFor Group 15, I'll match on 1 alpha(upper), 5 digits, 1 alpha(upper). The non-matched ones in the previous Group\r\n14 may be an expanded part of this campaign but it's such a weak PCRE and prone to FP that I'm not going to\r\nbother with it. It's highly likely to not make the final cut either way.\r\n^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{1}[0-9]{5}[A-Z]{1}\\/$\r\n[ alphanumeric long 18-26 ]\r\nhttp://akhmerov.com/AuHffUo4L1BcEmca0BW5e4UtI/ http://arosa.nl/crm/xs2ckmwotgcml95cxdhbo/\r\nhttp://crosslink.ca/nWlKL3PdKyi1goahyZfbNr/ http://ideaswebstudio.com/v3mzbzaink00sndmyz/\r\nhttp://infojass.com/gvtsl7ddrnjkupn50pp/\r\nhttp://ropgadget.com/posts/defensive_pcres.html\r\nPage 17 of 23\n\nNothing jumps out at me that would make for a good PCRE. 
It has a similar structure of alpha, digit, alpha but the\r\nranges are very broad, which makes it highly prone to FP again.\r\n[ alphanumeric short 7-14 ]\r\nhttp://akirmak.com/QhS33472le/ http://austinaaron.com/eCjH94174LaN/ http://campanus.cz/N6571iwA/\r\nhttp://carolsgardeninn.com/vX94098JvVJ/ http://cdoprojectgraduation.com/eaSz15612O/ ...\r\nhttp://www.alfredomartinez.com.mx/Afz3999lDtz/ http://www.kreodesign.pl/test/O77405ccSC/\r\nhttp://www.prodzakaz.com.ua/H27560xzwsS/ http://www.voloskof.net/Sn83160EngQs/\r\nhttp://zonasacra.com/zH83293YizhQ/\r\nThis next one follows the same pattern I identified for Group 15: 1-4 alpha, 4-5 digits, 1-5 alpha.\r\nI'll just update Group 15 and see how it fares in the FP check, but for what it's worth, it does match every single\r\nentry in this category, which had 30+ entries.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{1}[0-9]{5}[A-Z]{1}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[A-Za-z]{1,4}[0-9]{4,5}[a-zA-Z]{1,5}\\/$\r\nRefinement\r\nNow that everything is clustered together, I'll do one final visual inspection to see if any other patterns jump out\r\nthat allow us to tighten the rules up and avoid FP's.\r\n[Group 01] - [ t5wx-x064-mzdb ]\r\nhttp://12back.com/dw3wz-ue164-qqv/ http://4glory.net/p7lrq-s191-iv/ http://aconai.fr/v4OZ-PR72-gtS/\r\nhttp://adamkranitz.com/gqj5ijg-y250-ex/ http://allisonhibbard.com/x4b-th601-m/\r\nIn Group 1, we can actually refine this a bit once you see the underlying pattern. 
Almost every part of this one\r\nchanged so I'll just go back over it: 1-3 alpha, 1 digit, 1-3 alpha, dash, 1-2 alpha, 2-3 digits, dash, 1-5 alpha.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{3,7}-[a-zA-Z0-9]{3,5}-[a-zA-Z]{1,5}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z]{1,3}[0-9]{1}[a-zA-Z]{1,3}-[a-zA-Z]{1,2}[0-9]{2,3}-[a-zA-Z]{1,5}\\/$\r\n[Group 07] - [ LUqc663BAyN333-HoO ]\r\nhttp://agenity.com/EAVx829uahI723-tv/ http://argoinf.com/YFSR334KgXCe907-z/\r\nhttp://artmedieval.net/RK415njzzR555-p/ http://autoradio.com.br/fRq804tvz270-tWa/ http://belief-systems.com/obn247eaC420-Z/\r\nIn Group 7, the first part of the pattern can be refined: 1-4 alphanumeric, 3 digits, 1-5 alphanumeric, 3 digits.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{10,15}-[a-zA-Z]{1,3}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{1,4}[0-9]{3}[a-zA-Z]{1,5}[0-9]{3}-[a-zA-Z]{1,3}\\/$\r\n[Group 08] - [ NUDA-X-52454-DE ]\r\nhttp://altius.co.in/EJZB-T-66361-DE/ http://anjep.com/TBWEV-YCAP-91327-DE/ http://aquarthe.com/AIUO-P-70826-DE/ http://bitach.com/RIJW-FNFE-86299-DE/ http://cliftonsecurities.co.uk/YJTX-NMO-51102-DE/\r\nIn Group 8 they all end with \"DE\" so I'll convert that part to a static string.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-[A-Z]{2}\\/$\r\nNEW: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-DE\\/$\r\n[Group 09] - [ CUST.-Document-YDI-04-GQ389557 ]\r\nhttp://archabits.com/ORDER.-AXN-60-X400251/ http://arrosio.com.ar/ORDER.-Document-SF-41-F318806/\r\nhttp://arroyave.net/Rech-K-682-GO1130/ http://avenueevents.co.uk/Cust-PBP-03-D683320/\r\nhttp://babyo.com.mx/Cust-Document-KEQ-04-FF065857/\r\nIn Group 9, every entry ends with 1-3 alpha(upper) followed by 4-6 digits.\r\nOLD: ^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z0-9]{6,10}\\/$ NEW: 
^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-\r\n[0-9]{2,3}-[A-Z]{1,3}[0-9]{4,6}\\/$\r\nThe final run for the PCRE's before FP testing.\r\n[+] FOUND [+] Count: 127/696 Comment: [Group 01] - [ t5wx-x064-mzdb ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z]{1,3}[0-9]{1}[a-zA-Z]{1,3}-[a-zA-Z]{1,2}[0-9]{2,3}-[a-zA-Z]{1,5}\\/$ [+] FOUND [+] Count: 80/696\r\nComment: [Group 02] - [ INVOICE-864339-98261 ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+\r\n(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\\/$ [+] FOUND [+] Count: 29/696 (changed to\r\nnew pattern) Comment: [Group 03] - [ dhl/paket/com/pkp/appmanager/8376315127 ] PCRE:\r\n^http:\\/\\/([^\\x2F]+\\/)+dhl\\/paket\\/com\\/pkp\\/appmanager\\/[0-9]{10}\\/$ [+] FOUND [+] Count: 56/696 Comment:\r\n[Group 04] - [ RRT-13279129.dokument ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{2,3}-[0-9]{8}\\.dokument\\/$ [+]\r\nFOUND [+] Count: 86/696 Comment: [Group 05] - [ EDHFR-08-77623-document-May-04-2017 ] PCRE:\r\n^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{2,5}(-[0-9]{2})?-[0-9]{5,10}-(document|doc)-May-[0-9]{2}-2017\\/$ [+] FOUND [+]\r\nCount: 62/696 Comment: [Group 06] - [ download2467 ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+download[0-9]{4}\\/$ [+]\r\nFOUND [+] Count: 47/696 Comment: [Group 07] - [ LUqc663BAyN333-HoO ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{1,4}[0-9]{3}[a-zA-Z]{1,5}[0-9]{3}-[a-zA-Z]{1,3}\\/$ [+] FOUND [+] Count: 14/696 Comment:\r\n[Group 08] - [ NUDA-X-52454-DE ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-DE\\/$ [+]\r\nFOUND [+] Count: 38/696 Comment: [Group 09] - [ CUST.-Document-YDI-04-GQ389557 ] PCRE:\r\n^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z]{1,3}\r\n[0-9]{4,6}\\/$ [+] FOUND [+] Count: 11/696 Comment: [Group 10] - [ zp3x-r88-wuh.view ] PCRE:\r\n^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{1,5}-[a-z0-9]{2,4}(-[a-z0-9]{4,5})?-[a-z]{1,4}\\.(view|doc)\\/$ [+] FOUND [+]\r\nCount: 3/696 Comment: [Group 11] - [ 
dhl___status___2668292851 ] PCRE:\r\n^http:\\/\\/([^\\x2F]+\\/)+dhl___status___[0-9]{10}\\/$ [+] FOUND [+] Count: 35/696 Comment: [Group 12] - [\r\nORDER.-5883789520 ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+(ORDER|Rech|CUST|Cust|gescanntes|Scan)(.)?(-\r\nDocument|-Dokument)?-[0-9]{10,11}\\/$ [+] FOUND [+] Count: 15/696 Comment: [Group 13] - [ 6572646300 ]\r\nPCRE: ^http:\\/\\/([^\\x2F]+\\/)+(?\u003c!appmanager\\/)[0-9]{10,11}\\/$ [+] FOUND [+] Count: 3/696 Comment: [Group\r\n14] [ SCANNED/RZ7498WEXEZB ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+SCANNED\\/[A-Z]{2,3}[0-9]{4}[A-Z]\r\nhttp://ropgadget.com/posts/defensive_pcres.html\r\nPage 19 of 23\n\n{6,9}\\/$ [+] FOUND [+] Count: 60/696 Comment: [Group 15] [ K44975X ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Za-z]{1,4}[0-9]{4,5}[a-zA-Z]{1,5}\\/$\r\nThat leaves only 30 URL's that I was unable to reliably match - not too shabby! You can find the output of the\r\npcre_check script showing the matches and non-matches HERE.\r\nThe current PCRE list is below.\r\n^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z]{1,3}[0-9]{1}[a-zA-Z]{1,3}-[a-zA-Z]{1,2}[0-9]{2,3}-[a-zA-Z]{1,5}\\/$ [Group 01]\r\n- [ t5wx-x064-mzdb ] ^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\\/$\r\n[Group 02] - [ INVOICE-864339-98261 ] ^http:\\/\\/([^\\x2F]+\\/)+dhl\\/paket\\/com\\/pkp\\/appmanager\\/[0-9]{10}\\/$\r\n[Group 03] - [ dhl/paket/com/pkp/appmanager/8376315127 ] ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{2,3}-[0-9]\r\n{8}\\.dokument\\/$ [Group 04] - [ RRT-13279129.dokument ] ^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{2,5}(-[0-9]{2})?-[0-9]\r\n{5,10}-(document|doc)-May-[0-9]{2}-2017\\/$ [Group 05] - [ EDHFR-08-77623-document-May-04-2017 ]\r\n^http:\\/\\/([^\\x2F]+\\/)+download[0-9]{4}\\/$ [Group 06] - [ download2467 ] ^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]\r\n{1,4}[0-9]{3}[a-zA-Z]{1,5}[0-9]{3}-[a-zA-Z]{1,3}\\/$ [Group 07] - [ LUqc663BAyN333-HoO ]\r\n^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-DE\\/$ [Group 08] - [ NUDA-X-52454-DE 
]\r\n^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z]{1,3}\r\n[0-9]{4,6}\\/$ [Group 09] - [ CUST.-Document-YDI-04-GQ389557 ] ^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{1,5}-[a-z0-9]\r\n{2,4}(-[a-z0-9]{4,5})?-[a-z]{1,4}\\.(view|doc)\\/$ [Group 10] - [ zp3x-r88-wuh.view ]\r\n^http:\\/\\/([^\\x2F]+\\/)+dhl___status___[0-9]{10}\\/$ [Group 11] - [ dhl___status___2668292851 ]\r\n^http:\\/\\/([^\\x2F]+\\/)+(ORDER|Rech|CUST|Cust|gescanntes|Scan)(.)?(-Document|-Dokument)?-[0-9]{10,11}\\/$\r\n[Group 12] - [ ORDER.-5883789520 ] ^http:\\/\\/([^\\x2F]+\\/)+(?\u003c!appmanager\\/)[0-9]{10,11}\\/$ [Group 13] - [\r\n6572646300 ] ^http:\\/\\/([^\\x2F]+\\/)+SCANNED\\/[A-Z]{2,3}[0-9]{4}[A-Z]{6,9}\\/$ [Group 14] [\r\nSCANNED/RZ7498WEXEZB ] ^http:\\/\\/([^\\x2F]+\\/)+[A-Za-z]{1,4}[0-9]{4,5}[a-zA-Z]{1,5}\\/$ [Group 15] [\r\nK44975X ]\r\nRule Vetting\r\nThe last step is to check the PCRE's against a corpus of random URL's and see if they appear strict enough in their\r\nmatching to be used in a production environment. This is critical if you plan to use them for blocking instead of\r\njust identification. I can't stress enough how important this phase is; while it's nice to be alerted on access to one\r\nof these URL's, it's solid gold if you can prevent attacks and C2 from happening in the first place. Of course, with\r\nany blocking action, the caveat is that one wrong block could spell disaster so these need to be as close to perfect\r\nas possible.\r\nIdeally, you want to test against a large amount of URL's from your own environment that most closely resemble\r\nwhat traffic your users generate. 
Unfortunately that's not always possible, or you don't have users, so you need to\r\neither build your own corpus or find someone who can test the PCRE's for you.\r\nThere isn't much online in the way of random URL lists or logs but I've put together a few possible methods one\r\ncould try to compile a fairly random set of URL's, and then I'll detail my preferred method.\r\nSet up a Tor exit node for TCP/80 and just scrape URL's as they traverse.\r\nHit up a site like Lenny Zeltser's blocklist page to get a list of other frequently updated blocklists. These\r\nare mainly malicious though so not quite random.\r\nSearch Pastebin for \"http\" and pull out URL's. They have a Pastebin Scraping API ($24/yr) to pull\r\ndown the most recent posts. Sometimes you'll find huge lists with little effort.\r\nUse some open data sets. Definitely not too random and much more limited in scope.\r\nUse the Twitter Streaming API and filter Tweets for \"http\".\r\nThe Twitter option works nicely and can generate hundreds of thousands of unique URL's per day. Given enough\r\ntime, you'll have a solid base to test your PCRE's against.\r\nTo do this, you need to register an app with Twitter and get your API keys. Once you have those, I've included a\r\nPython script, twitter_scraper, that you can input them into and run in a continuous loop with a one-liner like the\r\nbelow.\r\nwhile true; do sleep 5; python twitter_scraper.py \u003e\u003e twitter_urls; done\r\nI've also included 2 million URL's on GitHub, which is just under the 25MB file limit compressed. These are ones\r\nthat I've scraped in the past few days and should help you get started.\r\nTypically I'll check this every so often and filter out things like URL shortening services or other sites that, for one\r\nreason or another, have bubbled up to the top of my domain list. 
This keeps it filled with fairly unique sites and\r\nhelps improve entropy.\r\nBelow is a GIF of the sites streaming by in real time, showing some of the variety.\r\nOnce we have our list, we can run pcre_check against the URL's and see how our PCRE's fare.\r\n$ python pcre_check.py -u twitter_urls -p emotet_pcres -s [+] FOUND [+] Count: 1290/2000000 Comment:\r\n[Group 13] - [ 6572646300 ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+(?\u003c!appmanager\\/)[0-9]{10,11}\\/$ [-] MATCH [-]\r\nhttp://db.netkeiba.com/horse/1985105175/ ... http://www.northernminer.com/news/lukas-lundin-copper-commodity-choice/1003786598/ http://www.oita-trinita.co.jp/news/20170532318/ ...\r\nhttp://www.schuh.co.uk/womens/irregular-choice-x-disney-how-do-i-look?-pink-flat-shoes/1364153360/ ...\r\nhttp://www.yutaro-miura.com/info/event/2017/0528100324/ http://yapi.ta2o.net/maseli/2017052901/ [+] FOUND\r\n[+] Count: 2595/2000000 Comment: [Group 15] [ K44975X ] PCRE: ^http:\\/\\/([^\\x2F]+\\/)+[A-Za-z]{1,4}[0-9]\r\n{4,5}[a-zA-Z]{1,5}\\/$ [-] MATCH [-] http://epcaf.com/c2805tw/ ... http://hobbyostrov.ru/automodels/electro-http://ropgadget.com/posts/defensive_pcres.html\r\nPage 21 of 23\n\nmonster-1-10/tra3602g/ ... http://monipla.jp/mfpa/card2017ss/ http://ncode.syosetu.com/N0588Q/ ...\r\nhttp://www.nollieskateboarding.com/fs5050grind/ http://www.profootballweekly.com/2017/05/30/victor-cruz-prepared-to-produce-and-mentor-in-chicago-bears-transitioning-wr-corps/a4613p/\r\nUsing the \"-s\" (show matches) flag in pcre_check will allow you to manually review the false positives. If the sites\r\ndon't look legitimate or match a little too perfectly, you'll want to do a little manual research to make sure they are\r\nin fact FP's and not true positives you didn't know about. I've truncated the results but above shows a few under\r\neach to give you an idea of the kind of output I'm looking for to conclude it's not up-to-par.\r\nAs you can see, Group 13 and 15 have numerous false-positives. 
This isn't surprising given Group 13 is simply 10\r\ndigits and Group 15 is a small range of alpha, digits, alpha, which continued to repeat itself throughout my\r\nanalysis.\r\nAdditionally, I sent these PCRE's to some fellow miscreant punchers who ran them through billions of URL's\r\nfrom their environment and received similar output with FP's only for Group 13 and 15.\r\nThe last check I'll perform for this set is to remove the trailing forward slash (\"/\") that was included in the PCRE's.\r\nThe reason for this is that, while every URL in my Emotet seed list included the forward slash, the URL's I'm scraping may\r\nnot have it and I just want to try to further identify any potential issues.\r\n$ python pcre_check.py -p emotet_pcres_mod -u twitter_urls\r\nNada. Fantastic!\r\nWrapping-up\r\nAll in all, 13 total PCRE's make the cut and cover the seen Emotet download URL's. These will provide good\r\nhistorical forensic capability and good passive blocking for future victims of these campaigns.\r\nWith that, the below is the final list for publishing and available on GitHub, along with all of the above iterations.\r\n^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z]{1,3}[0-9]{1}[a-zA-Z]{1,3}-[a-zA-Z]{1,2}[0-9]{2,3}-[a-zA-Z]{1,5}\\/$ karttoon 31MAY2017 - Emotet download - [ t5wx-x064-mzdb ]\r\n^http:\\/\\/([^\\x2F]+\\/)+(INVOICE|ORDER|CUST|Invoice|Cust)-[0-9]{6,7}-[0-9]{4,5}\\/$ karttoon 31MAY2017 - Emotet download - [ INVOICE-864339-98261 ]\r\n^http:\\/\\/([^\\x2F]+\\/)+dhl\\/paket\\/com\\/pkp\\/appmanager\\/[0-9]{10}\\/$ karttoon 31MAY2017 - Emotet download - [ dhl/paket/com/pkp/appmanager/8376315127 ]\r\n^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{2,3}-[0-9]{8}\\.dokument\\/$ karttoon 31MAY2017 - Emotet download - [ RRT-13279129.dokument ]\r\n^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{2,5}(-[0-9]{2})?-[0-9]{5,10}-(document|doc)-May-[0-9]{2}-2017\\/$ karttoon 31MAY2017 - Emotet download - [ EDHFR-08-77623-document-May-04-2017 ]\r\n^http:\\/\\/([^\\x2F]+\\/)+download[0-9]{4}\\/$ karttoon 
31MAY2017 - Emotet download - [ download2467 ]\r\n^http:\\/\\/([^\\x2F]+\\/)+[a-zA-Z0-9]{1,4}[0-9]{3}[a-zA-Z]{1,5}[0-9]{3}-[a-zA-Z]{1,3}\\/$ karttoon 31MAY2017 - Emotet download - [ LUqc663BAyN333-HoO ]\r\n^http:\\/\\/([^\\x2F]+\\/)+[A-Z]{4,5}-[A-Z]{1,4}-[0-9]{5}-DE\\/$ karttoon 31MAY2017 - Emotet download - [ NUDA-X-52454-DE ]\r\n^http:\\/\\/([^\\x2F]+\\/)+(CUST|ORDER|Cust|Rech|Rechnung)(.)?(-Document)?-[A-Z]{1,4}-[0-9]{2,3}-[A-Z]{1,3}[0-9]{4,6}\\/$ karttoon 31MAY2017 - Emotet download - [ CUST.-Document-YDI-04-GQ389557 ]\r\n^http:\\/\\/([^\\x2F]+\\/)+[a-z0-9]{1,5}-[a-z0-9]{2,4}(-[a-z0-9]{4,5})?-[a-z]{1,4}\\.(view|doc)\\/$ karttoon 31MAY2017 - Emotet download - [ zp3x-r88-wuh.view ]\r\n^http:\\/\\/([^\\x2F]+\\/)+dhl___status___[0-9]{10}\\/$ karttoon 31MAY2017 - Emotet download - [ dhl___status___2668292851 ]\r\n^http:\\/\\/([^\\x2F]+\\/)+(ORDER|Rech|CUST|Cust|gescanntes|Scan)(.)?(-Document|-Dokument)?-[0-9]{10,11}\\/$ karttoon 31MAY2017 - Emotet download - [ ORDER.-5883789520 ]\r\n^http:\\/\\/([^\\x2F]+\\/)+SCANNED\\/[A-Z]{2,3}[0-9]{4}[A-Z]{6,9}\\/$ karttoon 31MAY2017 - Emotet download [ SCANNED/RZ7498WEXEZB ]\r\nHopefully this was helpful to some and demonstrated the ease with which these can be created to identify malicious\r\npatterns.\r\nThe more the merrier in the sharing community!\r\nCiao!\r\nSource: http://ropgadget.com/posts/defensive_pcres.html",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"references": [
		"http://ropgadget.com/posts/defensive_pcres.html"
	],
	"report_names": [
		"defensive_pcres.html"
	],
	"threat_actors": [
		{
			"id": "4b076dcb-516e-42fb-9c8f-f153902cd5e9",
			"created_at": "2022-10-25T16:07:23.708745Z",
			"updated_at": "2026-04-10T02:00:04.720108Z",
			"deleted_at": null,
			"main_name": "Hidden Lynx",
			"aliases": [
				"Aurora Panda",
				"Group 8",
				"Heart Typhoon",
				"Hidden Lynx",
				"Operation SMN"
			],
			"source_name": "ETDA:Hidden Lynx",
			"tools": [
				"AGENT.ABQMR",
				"AGENT.AQUP.DROPPER",
				"AGENT.BMZA",
				"AGENT.GUNZ",
				"BlackCoffee",
				"HiKit",
				"MCRAT.A",
				"Mdmbot.E",
				"Moudoor",
				"Naid",
				"PNGRAT",
				"Trojan.Naid",
				"ZoxPNG",
				"gresim"
			],
			"source_id": "ETDA",
			"reports": null
		},
		{
			"id": "dabb6779-f72e-40ca-90b7-1810ef08654d",
			"created_at": "2022-10-25T15:50:23.463113Z",
			"updated_at": "2026-04-10T02:00:05.369301Z",
			"deleted_at": null,
			"main_name": "APT1",
			"aliases": [
				"APT1",
				"Comment Crew",
				"Comment Group",
				"Comment Panda"
			],
			"source_name": "MITRE:APT1",
			"tools": [
				"Seasalt",
				"ipconfig",
				"Cachedump",
				"PsExec",
				"GLOOXMAIL",
				"Lslsass",
				"PoisonIvy",
				"WEBC2",
				"Mimikatz",
				"gsecdump",
				"Pass-The-Hash Toolkit",
				"Tasklist",
				"xCmd",
				"pwdump"
			],
			"source_id": "MITRE",
			"reports": null
		},
		{
			"id": "cf7fc640-acfe-41c4-9f3d-5515d53a3ffb",
			"created_at": "2023-01-06T13:46:38.228042Z",
			"updated_at": "2026-04-10T02:00:02.883048Z",
			"deleted_at": null,
			"main_name": "APT1",
			"aliases": [
				"PLA Unit 61398",
				"Comment Crew",
				"Byzantine Candor",
				"Comment Group",
				"GIF89a",
				"Group 3",
				"TG-8223",
				"Brown Fox",
				"ShadyRAT",
				"G0006",
				"COMMENT PANDA"
			],
			"source_name": "MISPGALAXY:APT1",
			"tools": [],
			"source_id": "MISPGALAXY",
			"reports": null
		},
		{
			"id": "3fad11c6-4336-4b28-a606-f510eca5452e",
			"created_at": "2022-10-25T16:07:24.346573Z",
			"updated_at": "2026-04-10T02:00:04.948823Z",
			"deleted_at": null,
			"main_name": "Turbine Panda",
			"aliases": [
				"APT 26",
				"Black Vine",
				"Bronze Express",
				"Group 13",
				"JerseyMikes",
				"KungFu Kittens",
				"PinkPanther",
				"Shell Crew",
				"Taffeta Typhoon",
				"Turbine Panda",
				"WebMasters"
			],
			"source_name": "ETDA:Turbine Panda",
			"tools": [
				"Agent.dhwf",
				"Agentemis",
				"BleDoor",
				"Cobalt Strike",
				"CobaltStrike",
				"Derusbi",
				"Destroy RAT",
				"DestroyRAT",
				"FF-RAT",
				"FormerFirstRAT",
				"Hurix",
				"Kaba",
				"Korplug",
				"LOLBAS",
				"LOLBins",
				"Living off the Land",
				"Mivast",
				"PlugX",
				"RbDoor",
				"RedDelta",
				"RibDoor",
				"Sakula",
				"Sakula RAT",
				"Sakurel",
				"Sogu",
				"StreamEx",
				"TIGERPLUG",
				"TVT",
				"Thoper",
				"Winnti",
				"Xamtrav",
				"cobeacon",
				"ffrat"
			],
			"source_id": "ETDA",
			"reports": null
		},
		{
			"id": "a7aefdda-98f1-4790-a32d-14cc99de2d60",
			"created_at": "2023-01-06T13:46:38.281844Z",
			"updated_at": "2026-04-10T02:00:02.909711Z",
			"deleted_at": null,
			"main_name": "APT17",
			"aliases": [
				"BRONZE KEYSTONE",
				"G0025",
				"Group 72",
				"G0001",
				"HELIUM",
				"Heart Typhoon",
				"Group 8",
				"AURORA PANDA",
				"Hidden Lynx",
				"Tailgater Team"
			],
			"source_name": "MISPGALAXY:APT17",
			"tools": [],
			"source_id": "MISPGALAXY",
			"reports": null
		},
		{
			"id": "46a151bd-e4c2-46f9-aee9-ee6942b01098",
			"created_at": "2023-01-06T13:46:38.288168Z",
			"updated_at": "2026-04-10T02:00:02.911919Z",
			"deleted_at": null,
			"main_name": "APT19",
			"aliases": [
				"DEEP PANDA",
				"Codoso",
				"KungFu Kittens",
				"Group 13",
				"G0009",
				"G0073",
				"Checkered Typhoon",
				"Black Vine",
				"TEMP.Avengers",
				"PinkPanther",
				"Shell Crew",
				"BRONZE FIRESTONE",
				"Sunshop Group"
			],
			"source_name": "MISPGALAXY:APT19",
			"tools": [],
			"source_id": "MISPGALAXY",
			"reports": null
		},
		{
			"id": "3aaf0755-5c9b-4612-9f0e-e266ef1bdb4b",
			"created_at": "2022-10-25T16:07:23.480196Z",
			"updated_at": "2026-04-10T02:00:04.626125Z",
			"deleted_at": null,
			"main_name": "Comment Crew",
			"aliases": [
				"APT 1",
				"BrownFox",
				"Byzantine Candor",
				"Byzantine Hades",
				"Comment Crew",
				"Comment Panda",
				"G0006",
				"GIF89a",
				"Group 3",
				"Operation Oceansalt",
				"Operation Seasalt",
				"Operation Siesta",
				"Shanghai Group",
				"TG-8223"
			],
			"source_name": "ETDA:Comment Crew",
			"tools": [
				"Auriga",
				"Cachedump",
				"Chymine",
				"CookieBag",
				"Darkmoon",
				"GDOCUPLOAD",
				"GLOOXMAIL",
				"GREENCAT",
				"Gen:Trojan.Heur.PT",
				"GetMail",
				"Hackfase",
				"Hacksfase",
				"Helauto",
				"Kurton",
				"LETSGO",
				"LIGHTBOLT",
				"LIGHTDART",
				"LOLBAS",
				"LOLBins",
				"LONGRUN",
				"Living off the Land",
				"Lslsass",
				"MAPIget",
				"ManItsMe",
				"Mimikatz",
				"MiniASP",
				"Oceansalt",
				"Pass-The-Hash Toolkit",
				"Poison Ivy",
				"ProcDump",
				"Riodrv",
				"SPIVY",
				"Seasalt",
				"ShadyRAT",
				"StarsyPound",
				"TROJAN.COOKIES",
				"TROJAN.FOXY",
				"TabMsgSQL",
				"Tarsip",
				"Trojan.GTALK",
				"WebC2",
				"WebC2-AdSpace",
				"WebC2-Ausov",
				"WebC2-Bolid",
				"WebC2-Cson",
				"WebC2-DIV",
				"WebC2-GreenCat",
				"WebC2-Head",
				"WebC2-Kt3",
				"WebC2-Qbp",
				"WebC2-Rave",
				"WebC2-Table",
				"WebC2-UGX",
				"WebC2-Yahoo",
				"Wordpress Bruteforcer",
				"bangat",
				"gsecdump",
				"pivy",
				"poisonivy",
				"pwdump",
				"zxdosml"
			],
			"source_id": "ETDA",
			"reports": null
		}
	],
	"ts_created_at": 1775439073,
	"ts_updated_at": 1775792289,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/5dde650c5fda26a379c6f91bde8696c7f7ff1cab.pdf",
		"text": "https://archive.orkl.eu/5dde650c5fda26a379c6f91bde8696c7f7ff1cab.txt",
		"img": "https://archive.orkl.eu/5dde650c5fda26a379c6f91bde8696c7f7ff1cab.jpg"
	}
}