# Pointer: Hunting Cobalt Strike globally **[medium.com/@shabarkin/pointer-hunting-cobalt-strike-globally-a334ac50619a](https://medium.com/@shabarkin/pointer-hunting-cobalt-strike-globally-a334ac50619a)** Pavel Shabarkin November 21, 2021 Pav el [Pavel Shabarkin](https://shabarkin.medium.com/?source=post_page-----a334ac50619a--------------------------------) [Follow](https://medium.com/m/signin?actionUrl=https%3A%2F%2Fmedium.com%2F_%2Fsubscribe%2Fuser%2F9284dc7b6da6&operation=register&redirect=https%3A%2F%2Ftowardsaws.com%2Fpointer-hunting-cobalt-strike-globally-a334ac50619a&user=Pavel+Shabarkin&userId=9284dc7b6da6&source=post_page-9284dc7b6da6----a334ac50619a---------------------follow_byline-----------) Sep 16, 2021 14 min read ## Introduction Cobalt Strike is a commercial, full-featured, remote access tool that bills itself as “adversary simulation software designed to execute targeted attacks and emulate the post-exploitation actions of advanced threat actors”. Cobalt Strike’s interactive post-exploit capabilities cover the full range of ATT&CK tactics, all executed within a single, integrated system. In addition to its own capabilities, Cobalt Strike leverages the capabilities of other well-known tools such as Metasploit and Mimikatz. Cobalt Strike is a legitimate security tool used by penetration testers and red teamers to emulate threat actor activity in a network. However, lately, this tool has been hijacked and abused by cybercriminals. Our goal was to develop a tool to help identify default Cobalt Strike servers exposed on the Internet. We strongly believe that understanding and mapping adversaries and their use of Cobalt Strike can improve defenses and boost organization detection & response controls. Blocking, mapping and tracking adversaries is a good start. ----- Pointer logo ## Tool Development A review of existing Cobalt Strike detection tools and public research showed that current tools can only scan a small number of potential Cobalt Strike instances (1–5k hosts). Our goal was to increase the scanning capabilities and validate several million potential Cobalt instances in less than an hour. To achieve the above goal within a reasonable timeframe and on a small budget, it was necessary to adapt and scale the current understanding of the Cobalt Strike hunting methodology. The following content assumes an understanding of what Cobalt Strike is and ----- how to locate and identify Cobalt strike instances. Before going into the details of the tool and their components, let’s take a look at the general architecture. ## Architecture review Scanning a large number of hosts in a reasonable amount of time does not scale and has physical, cost and power limitations. Unless you have a great home lab and the bandwidth to support it, personal computing cannot really solve the scaling problem, so the decision was made to use AWS to affordably scale and achieve the desired goals. ## General architecture review The tool is developed and heavily based on AWS SQS, Lambda and DynamoDB. ----- The Pointer client parses the local json file with a list of IPs, optimally splits them into packets (10–20 IPs), and then adds the packets to be processed to the SQS queue. The SQS queue is setup to invoke a lambda function for each packet in the queue. The lambda function (Pointer server) performs the actual scanning of the provided packet of IPs and saves results to DynamoDB. In cases where Lambda fails or throws an error, packets are returned to the SQS queue and will wait for a retry. If the packet fails a second time, a new Lambda function is launched that logs the failed packet to DynamoDB for further analysis and rescan each IP individually to locate the failed IPs. ----- ## Code Review The scan functionality of the “Pointer server” consists of 4 parts: 1. Port Scanning (Port Workers) 2. HTTP Webservice scan (HTTP Workers) Certificate parsing JARM parsing 3. HTTPS Webservice scan (HTTPS Workers) 4. Beacon Parsing (Beacon Workers) The tool was designed with an asynchronous approach to IP processing. Each scan probe stands as an independent unit, which is then processed by a Worker. The probes include a port scanning, Certificate Issuer parsing, JARM parsing, webservice scanning, and Beacon parsing. Once each probe is completed, the result is sent to the corresponding controller, which writes the result to the global map. After all scan workers are done, the data is sorted and ordered before being combined into the `Target structure. Overall, this reduces the number of delays` since each service(ip:port) has its own scan pipeline. ----- Internal architecture of Lambda function **Detailed review** Initially the lambda function launches Port, HTTP, HTTPS, and Beacon workers. The number of workers depends on the level of internal concurrency (Internal concurrency is the controllable CLI parameter). Each type of worker is portioned accordingly to the required power resources. Portioning has been calculated based on the number of probes each worker performs in average. Each targeted IP address is scanned for 27 predefined ports, this list includes common ports on which Cobalt Strike beacons are hosted. The “launcher” sends service (ip:port) to the Port Workers through `portChannel Golang channel.` ----- Code snippet of the Service Launcher Port workers then scan the individual ports. If a port is open, the worker sends the service to the HTTP Worker and Output controller through `httpChannel and` `outputChannel Golang` channels. If the port is closed the Port Worker exits the function. ----- Code snippet of Port Worker All workers send results through a single Golang channel, `outputChannel, which are then` processed by the output controller and saved to the global map ( Sorter struct). Each result produced by the workers has its own type tag (Ex: ``` "Service|", "Certificate|", "Jarm|", … ), ensuring that the ValidateOutput function ``` can sort the results based on their types. ----- Code snippet of Output Controller The HTTP worker waits for IP and port tuple (service) to be provided by the Port Worker via the ``` httpChannel . If the HTTP Worker receives port 50050 it attempts the following actions: ``` Parse the certificate issuer -> identifying the default self-signed Cobalt certificate Parse the JARM signature -> detecting malicious JARM signatures For other services, it performs a web request to analyse response behaviour. Beacon’s HTTP/HTTPS indicators are controlled by a malleable C2 profile, if the server uses the default malleable C2 profile, it responds with a 404 status code and 0 content-length for requests made to the root web endpoint. (http://domain.com/) ----- If the request to the targeted web service fails, HTTP Worker sends the service through ``` httpsChannel channel further to the HTTPS Worker to perform the web request through ``` HTTPS protocol. Code snippet of the HTTP Worker Being inspired by the “Analyzing Cobalt Strike for Fun and Profit” research and its corresponding tool for cobalt strike beacon parsing (developed using Python), we integrated the similar logic into our tool for beacon parsing (developed using Golang). _The guy, who researched how the beacon is packed, how to parse the beacon, how to decrypt_ _the beacon, and how to work with that in general, you did the good job a big thank you!_ ----- All identified web services that have been configured with default malleable C2 profile are sent to the Beacon Workers. The Beacon Worker attempts to parse the beacon config. If the parsing succeeds, Beacon Worker sends the `CobaltStrikeBeaconStruct struct to the Beacon` controller through `beaconStructChannel channel, and the beacon location URI to the output` controller through `outputChannel channel.` Code snippet of Beacon Worker ----- Code snippet of Beacon Controller When all workers finish the scans, the `Sort method maps all gathered scan results sent to` the output controller into the array of `CobaltStrikeStruct type:` ----- Code snippet of `CobaltStrikeStruct data type` The `Probability field is assigned when the` `Voter function calls the internal method` ``` Vote for each CobaltStrikeStruct object within the array. ``` In case the certificate issuer matches the default Cobalt Strike self-signed certificate, the `Vote` method gives 100% probability that it is the Cobalt Strike server. The same applies if the Beacon Worker successfully parses the beacon config hosted on the web service. Default web service response and malicious JARM signature results cannot give us confidence in assigning the probability rate. Because other web services can respond with 0 content length and 404 status code, and servers can be configured with the same TLS options (if you don’t ----- understand what JARM is). If the `Vote method matches only those two indicators, it assigns` the 70% probability to the object. If none of these meet our requirements, it is probably not a Cobalt Strike server. But, again, this tool targets only Cobalt Strike servers with default malleable C2 profile configurations. Code snippet of the Vote method ## DynamoDB Component ----- We chose DynamoDB service to store scan results. DynamoDB can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second. That is what we needed 100%! We wanted to scan 20000–25000 targets per 60 seconds, which is about 40k-50k writing requests to the database. On the first implementation, the Output and Beacon workers exceeded DynamoDB rate limits because they performed a write request to the DynamoDB table for each target object separately, and, in addition, we used the default DynamoDB configuration. The default capacity configuration could not handle that many requests, but by increasing the capacity we would pay more money for autoscaling during constant scanning. Further examination of the AWS documentation revealed that AWS had implemented the batch write to DynamoDB. For each lambda invocation, we have 10–20 targets (depending on the packet size) to scan, so this should reduce the number of requests to DynamoDB tables by a factor of 10–20. We found that DynamoDB’s `BatchWriteItemInput function allows writing up to 25 items and` up to 16 Mb in one request. The batch write implementation significantly decreased the number of requests and removed the rate limiting issue at the default configuration level. We did not have to pay for unnecessary autoscaling. This method has the disadvantage that if one of the items in the batch fails to be written, the whole batch will not be saved. (The partition key must be unique and not exist in the table, but this is suitable in our case, as we filter our targets by unique values before launching the scans). ----- Code snippet of the WriteBatchTarget function Also, for unpredictable cases where the default capacity configuration cannot handle a large number of requests, we configure autoscaling: AWS Console → DynamoDB → choose the Table → Edit Capacity → Read / Write Capacity increase to 10–15. To enable autoscaling we should give the required permissions for the DynamoDB service role . ## Lambda Component ----- AWS Lambda is an interesting service. We wanted to try Lambda as a core service for our scans, however we did not want to get a crazy paycheck at the end of the month, so we had several things to figure out: 1. How much memory to allocate for Lambda execution 2. What default timeout to set for Lambda execution 3. How to manage Lambda concurrency 4. What internal concurrency would be suited for our model; 5. What request timeouts would be suited for our model 6. What packet size would be suited for our model And the most difficult question — How to setup everything the way it would be efficient, cheap, and with minimum loss rate? **Lambda memory allocation** It was interesting to research how AWS allocates memory and CPU for Lambda functions, because it is physically impossible to divide 1/10 of the CPU. But it can allocate 1/10 of the time of the CPU to a single function, and you can have 10 of them working at the same time to share [the same CPU core (check this research, it explains how AWS Lambda allocates CPU).](https://engineering.opsgenie.com/how-does-proportional-cpu-allocation-work-with-aws-lambda-41cd44da3cac) The only controllable parameter in AWS for Lambda functions is memory usage: ----- Example of the memory configuration in AWS Console We designed our model with a multithreaded architecture — the more cores we have, the better performance we can potentially obtain. But the nasty thing here is what the price of this luxury is.))) We cannot directly control the number of cores we want to use. The CPU performance scales with the memory configuration. Lambda functions used to always have 2 vCPU cores, regardless of the allocated memory. The rest of the cores are throttled at certain memory configurations. By increasing the memory allocation, we obtain more cores. I found the [research that discovered how the number of vCPUs and multithreaded computation power vary](https://www.sentiatechblog.com/aws-re-invent-2020-day-3-optimizing-lambda-cost-with-multi-threading) depending on the memory configuration. ----- The price for using the AWS Lambda function is based on the function runtime (in milliseconds) multiplied by the allocated memory (fixed prices per Mb). So, allocating 3008MB for the Lambda function, we get 2 vCPUs, and allocating 3009MB we get 3vCPUs. By allocating 3009MB memory, we could gain more performance, at almost the same price.))) According to the research, we get better performance gain for multithreading with each spike transition (jump in cores). But for our model we do not need more than 3 cores, 3009MB is enough for our purposes. Correlation between Lambda memory configuration and number of cores By the way, we decided to measure the computational power ourselves, and the practical tests showed that the power spike between 3008–3009 MB is bigger than between 5307–5308 MB. This once again confirms that the 3009MB memory configuration is the best choice for us. ----- Code snippet for measuring the multithreaded computation power **Internal parameter tuning** As we got a solid understanding of how many resources to use, we started tuning and suiting other parameters for our model. In my opinion, the lifetime of the Pointer lambda function should not be more than 60 seconds, because otherwise it will not be a true server-less tool with easy management, stable to errors and autoscaled architecture. With a memory configuration of 3009 Mb and a default timeout of 60 seconds for one Lambda execution, we could scan from 10–20 targets in a single packet. ----- In case the lambda execution fails, we do not want to rescan all the targets inside the packet again. By having less number of targets in the packet, we minimize the probability that the packet will be crashed. Therefore, the optimal size, in my opinion, is 10–20 targets. By having less number of targets inside the packet, we are minimising the chance that the packet will be crashed. In case the lambda execution fails, we must rescan all targets inside the packet (even those that have been successfully scanned). When we defined lambda configuration parameters, the rest of the parameters were tuned with a big number of tests: ``` Targets/per packet 20 (Items)Concurrency 140 (Items)Lambda Memory 3009 (Mb)Lambda Timeout 60 (sec)Http timeout 4 (sec)Port timeout 2 (sec)Beacon timeout 10 (sec) ``` **Lambda Concurrency** AWS Lambda provides autoscaling for function instances. But we simply cannot deploy as many instances as we want. We are limited by the AWS region quota (All the lambda functions of an account can use the pool of 1000 unreserved concurrent executions). Thus, having only 1 deployed function, we could get 1000 concurrent executions at the same time. Dorking potential Cobalt Strike servers through Shodan, we could retrieve around 200–300k potential targets. However, we are designing the tool to scan 2M-10M targets. For example, 2M targets is about 100k packets (20 targets per packet), which means 100k lambda function invocations. If Lambda function is invoked 100k times, the lambda puller would process only 1k requests at a time, and the rest would be just throttled. So even if we increase the AWS region quota, it will not be enough. So the question arises — how we can manage the invocation process? The answer is simple — SQS. ## SQS Component **Configuration** Amazon Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. This means we can send all our packets to the queue and SQS will manage the process of lambda invocation. The SQS management can be configured according to our needs: 1. We can control the number of retries. We can configure the maximum number of retries the SQS would perform, if the batch of messages(packets) fails ----- The SQS sends messages packed in the batches, we can control the number of messages inside the batch we want to pass to the Lambda function, so having 1 message in the batch will equal 1 message. In our model we decided that If message fails more than once, it would be sent to the (DLQ). We designed the (DLQ) to redirect the failed messages to the Lambda function with the same logic as the core one, but before scanning activities it writes the the failed packet to the DynamoDB table and rescans each target separately. 2. Visibility timeout The visibility timeout sets the length of time that a message received from the queue (by one lambda function) will not be visible to the lambda function again. If the lambda function fails to process and delete the message before the visibility timeout expires, the message becomes visible to the lambda function again. 3. SQS batch size We configured SQS batch size to a single message. **SQS & Lambda autoscaling** For standard queues, Lambda uses long polling to poll the queue until it becomes active. When messages are available, Lambda reads up to 5 batches and sends them to our lambda function. If messages are still available, Lambda increases the number of processes that read batches up to 60 more instances per minute. The maximum number of batches that can be processed simultaneously by event source mapping is 1000. This means that the full power we can get after 16 minutes of continuous scanning. ## Results At the first launch, when we ran a scan for 160k targets, we were able to identify 1,700 Cobalt Strike servers and parse 1,400 of their beacon configurations within 40 minutes. The Pointer tool can produce best performance results if the target size exceeds 500k. Scanning 160k targets took a little longer because 1000 concurrent lambda executions were achieved only after 30 minutes when the tool was launched. For the current implementation, the cost of scanning 250k targets is about 20$, however we are looking for a solution that will make it cheaper. ## Targets table [sample] We have developed 2 tables, first one for identified Cobalt Strike servers, and the second for parsed beacon configurations. Identified Cobalt Strike servers can be described by 7 features: IP address is a unique sorting key probability that it’s the actual cobalt strike server (easier filtering) JARM signature ----- Certificate Issuer Opened Ports Response behaviour Links to the beacon configurations that we parsed and saved to another table There is an example of the cobalt strike server table: Table of parsed Cobalt Strike targets ## Beacons table [sample] ----- The Beacon configuration table has the `uri feature as a unique sorting key, when the rest of` the features are the actual parsed beacon configurations. Here is an example of the table with parsed beacon configurations: The full version of tables you can find here: ## Data analysis We are using collected data to map attackers infrastructure and understand how the attackers operate Cobalt Strike. ----- We know that threat intelligence groups are tracking specific ransomware groups with the help of Watermarks, For example: Sodinokibi (Watermark 452436291) APT 27 (Watermark 305419896). Based on the beacon’s `spawnto locations the blue teams can develop detection controls.` Location of servers (IP) (Hosting provides) ----- Watermarks ----- Countries ----- Spawn location ----- Sample of the Dork database (not completed) ## Summary & Future work For the first Pointer version, we have developed a system with a complete hunting methodology that could be easily scalable up to 2–3 millions targets. The first tests showed that the tool can scan 200k targets in 45–50 minutes with 10% packet loss. We strongly believe that we have taken a big step in hunting and detection system. But, of course, we have not achieved the results we wanted, this is just the first demo version of the Pointer. ----- Any feedback is more than welcome and if you have any ideas, suggestions, and recommendations, or if you want to help us improve the Cobalt Strike Hunting tool, just contact Pavel Shabarkin and Michael Koczwara. ## References -----