{
	"id": "b8231bb1-8811-4e70-b2dd-22d12f9c54a7",
	"created_at": "2026-04-06T00:19:00.612987Z",
	"updated_at": "2026-04-10T03:21:41.695449Z",
	"deleted_at": null,
	"sha1_hash": "2c2a6ed028522f859d0518bf50b6ab36397dd35f",
	"title": "Automating Qakbot Malware Analysis with Binary Ninja",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 832261,
	"plain_text": "Automating Qakbot Malware Analysis with Binary Ninja\r\nArchived: 2026-04-05 14:51:51 UTC\r\nOverview\r\nWe recently finished a stream series where we wrote a static unpacker and deobfuscation scripts for 64-bit Qakbot\r\nsamples using Binary Ninja. Binary Ninja is a powerhouse reverse engineering suite that provides a plethora of\r\nfunctionality that is useful when reverse engineering malware. It has a robust Python API for interacting with\r\nabstractions (semantic representations) generated by their multiple levels of Binary Ninja Intermediate Languages\r\n(BNILs). These abstractions result in large simplifications of disassembled instructions into intrinsic functions and\r\nhigh level languages that can be accessed directly and easily, which we leveraged multiple times throughout these\r\nstreams.\r\nSeamless Headless Experience\r\nBinary Ninja commercial versions provide the ability to run in headless mode, which we worked with throughout\r\nthese streams. We have yet to see a reverse engineering suite that can provide such a seamless headless experience\r\nas Binary Ninja:\r\nbv = BinaryViewType['PE'].open(\u003cPath\u003e)\r\nbv.update_analysis_and_wait()\r\nThese two lines will load and process the entire provided binary producing a Binary Ninja database that can be\r\nused to access all levels of BNILs from the BinaryView.\r\nSidekick\r\nWhat’s a 2024 blog without talking about AI? Vector35 has released a plugin named Binary Ninja Sidekick which\r\nintegrates multiple large language models directly within the UI that can be interacted with to assist with database\r\nmarkups, ask questions and generate automation scripts for you. The models have contextual awareness of the\r\ndatabase which can be leveraged to produce automation based on the content itself, rather than abstract asks like\r\nyou would do with other LLM chat interfaces or development environments. 
We’ve seen multiple open source\r\nprojects that have attempted to achieve the same seamless interaction with LLMs, but this is the best that we’ve\r\nobserved thus far. We experimented with Sidekick in the third part of our stream series to recover structures,\r\nautomatically translate High Level Intermediate Language (HLIL) into Python code, and identify encoding\r\nalgorithms.\r\nObfuscation Techniques\r\nThe Qakbot samples we analyzed contain the following obfuscation techniques that we addressed throughout\r\nthese streams:\r\nhttps://invokere.com/posts/2024/02/automating-qakbot-malware-analysis-with-binary-ninja/\r\nPage 1 of 18\n\nStack strings\r\nMultiple levels of packers resulting in a final Qakbot DLL\r\nDynamic function resolution using function hash tables\r\nEncrypted string tables\r\nThe methods in which all of these obfuscation techniques are addressed using Binary Ninja with the help of other\r\nprojects will be described throughout this post.\r\nStack Strings\r\nThe stack strings encountered were no match for Binary Ninja’s HLIL. 
The following disassembly contains a\r\nstack string that is used as an XOR key to decrypt a first-stage shellcode segment:\r\n691bc02b c644247040 mov byte [rsp+0x70 {copied_alphabet}], 0x40\r\n691bc030 c644247141 mov byte [rsp+0x71 {var_147}], 0x41\r\n691bc035 c64424726c mov byte [rsp+0x72 {var_146}], 0x6c\r\n691bc03a c64424737a mov byte [rsp+0x73 {var_145}], 0x7a\r\nThese instructions (truncated for readability) are simplified into the HLIL instruction:\r\n__builtin_strncpy(dest: \u0026copied_alphabet, src: \"@AlzsQ1DSS...\", n: 0x3a)\r\nThis makes it easy for us to extract the XOR key passed as a parameter to __builtin_strncpy.\r\nStatic Unpacker\r\nThe packer that we encountered contains multiple stages, which are depicted in the following diagram:\r\nOur goal was to unpack every step within this diagram statically using Binary Ninja scripting where possible, and\r\nextract a final Qakbot binary for analysis. The final script can be found here.\r\nThe first stage of this packer extracts position-independent shellcode ciphertext from an embedded PE resource\r\nusing Windows resource APIs, decrypts this shellcode using the XOR key mentioned in the previous section, and\r\nexecutes the shellcode in memory. In order for us to automatically extract this resource and decrypt it, we needed\r\nto process the binary using Binary Ninja and leverage the Binary Ninja Python API to extract needed attributes.\r\nGeneric Function Identification\r\nFirst, we needed a method of generically identifying the location of these attributes within the database HLIL. In\r\nthis case, we only needed to identify a single function, which contains both a reference to the resource identifier\r\nthat maps to the resource, and the XOR key to decrypt the ciphertext. 
The Binary Ninja Python API contains a\r\nfunction called binaryninja.highlevelil.HighLevelILFunction.visit which allows recursive enumeration of HLIL\r\ninstructions. This is the best approach (over iteration) as HLIL instructions are often nested within (tree)\r\nstructures. We can use visit to find the target __builtin_strncpy intrinsic function:\r\ndef get_key(bv):\r\n    print(\"Looking for decryption key...\")\r\n    key = None\r\n    for inst in bv.hlil_instructions:\r\n        inst.visit(visitor)\r\n    return key\r\ndef visitor(_a, inst, _c, _d) -\u003e bool:\r\n    if isinstance(inst, commonil.Localcall):\r\n        if len(inst.params) \u003e 1:\r\n            if inst.tokens[0].text == '__builtin_strncpy':\r\n                print(\"Found target strncpy: {}\".format(inst))\r\n                key = bytes(inst.params[1].tokens[1].text + \"\\x00\", 'ascii')\r\nWe call isinstance to check that the instruction in question is of type commonil.Localcall (the IL object\r\nfor calls) and that the instruction contains the token text of __builtin_strncpy. Once verified, the XOR key is\r\nextracted using key = bytes(inst.params[1].tokens[1].text + \"\\x00\", 'ascii'). Another strength of the\r\nBinary Ninja API is the ability to “work backwards” from an instruction object, which we use to acquire the\r\nresource identifier in this function:\r\n                rsrc_inst_index = 4\r\n                rsrcid = None\r\n                rsrc_inst = list(inst.function.instructions)[rsrc_inst_index]\r\n                rsrcid = rsrc_inst.operands[1].value.value\r\nHere we knew each resource identifier would be at index 4 of the instruction list (from looking at multiple samples) within\r\neach HLIL representation of the target function (a weak heuristic, but it works nonetheless). 
Therefore, we acquire\r\nall HLIL instructions for the function using inst.function.instructions, and extract the resource identifier\r\nvalue from the instruction’s operand at this index using rsrc_inst.operands[1].value.value.\r\nUsing the extracted resource identifier, we can extract the resource data using pefile to enumerate the PE\r\nresource directories:\r\ndef extract_resource(fpath, min_resource_size, rsrcid):\r\n    rsrc_data = None\r\n    pe = pefile.PE(fpath)\r\n    pe_mapped = pe.get_memory_mapped_image()\r\n    for rsrc in pe.DIRECTORY_ENTRY_RESOURCE.entries:\r\n        for entry in rsrc.directory.entries:\r\n            if entry.directory.entries[0].data.struct.Size \u003e= min_resource_size and entry.struct.Name == rsrcid:\r\n                rsrc_offset = entry.directory.entries[0].data.struct.OffsetToData\r\n                rsrc_size = entry.directory.entries[0].data.struct.Size\r\n                rsrc_data = pe_mapped[rsrc_offset:rsrc_offset + rsrc_size]\r\n    return rsrc_data\r\nThe XOR key is then used to decrypt the resource data:\r\ndef xor(key, ct):\r\n    r = bytes()\r\n    for i, b in enumerate(ct):\r\n        r += (b ^ key[i % len(key)]).to_bytes(1, 'little')\r\n    return r\r\nFortunately, all later-stage portable executables are plaintext within the infection chain, so we can simply\r\ncarve all subsequent plaintext PEs within the decrypted shellcode blob. The only difficulty is acquiring the exact\r\nsize of the PEs, which requires parsing their optional headers for sections and adding all physical section sizes\r\ntogether to acquire their total size as they’d reside on disk. Fortunately, there’s the amazing Binary Refinery\r\nproject by our friend Jesko that contains a “unit” that will do exactly this: carve_pe. 
We ended up using code from\r\nthe project and the get_pe_size function to carve out multiple PEs from the plaintext shellcode segment:\r\n# Based on https://binref.github.io/units/pattern/carve_pe.html\r\n# Thanks Rattle \u003c3\r\ndef carve_pe(data):\r\n cursor = 0\r\n mv = memoryview(data)\r\n carved = []\r\n while True:\r\n offset = data.find(B'MZ', cursor)\r\n if offset \u003c cursor: break\r\n cursor = offset + 2\r\n ntoffset = mv[offset + 0x3C:offset + 0x3E]\r\n if len(ntoffset) \u003c 2:\r\n return None\r\n ntoffset, = unpack('H', ntoffset)\r\n if mv[offset + ntoffset:offset + ntoffset + 2] != B'PE':\r\n print(F'invalid NT header signature for candidate at 0x{offset:08X}')\r\n continue\r\n try:\r\n pe = PE(data=data[offset:], fast_load=True)\r\n print(\"Found a valid PE: {}\".format(bytes(data[offset:offset+256]).hex()))\r\n except PEFormatError as err:\r\n print(F'parsing of PE header at 0x{offset:08X} failed:', err)\r\n continue\r\n pesize = get_pe_size(pe, memdump=False)\r\n pedata = mv[offset:offset + pesize]\r\n carved.append(bytes(pedata))\r\n return carved\r\nThis results in the following output:\r\n$ python3 extract_qakbot.py qakbot-64-bit/dll/780be7a70ce3567ef268f6c768fc5a3d2510310c603bf481ebffd65e4fe95ff3\r\nLooking for decryption key...\r\nFound target strncpy: __builtin_strncpy(\u0026var_148, \"@AlzsQ1DSS...\", 0x3a)\r\nIdentified key: b'@AlzsQ1DSS...' // Identified Resource ID: 948\r\nFound a valid PE: 4d5a9\r\ninvalid NT header signature for candidate at 0x00001AC6\r\nFound a valid PE: 4d5a9\r\nhttps://invokere.com/posts/2024/02/automating-qakbot-malware-analysis-with-binary-ninja/\r\nPage 5 of 18\n\nThe second PE is the Qakbot DLL that is written to disk by the script for further analysis.\r\nDynamic Function Resolution\r\nThe Qakbot DLL performs dynamic function resolution using a series of hash tables. In the sample that we\r\nanalyzed, the dynamic function resolution is performed in two steps:\r\n1. 
Decrypt given DLL ciphertext using a hard-coded XOR key ( 0xa235cb91 )\r\n2. The resulting DLL’s export table is then walked to resolve a provided hash table, which contains a series of\r\nCRC32 hashes modified with a single XOR operation that map to Windows API functions\r\nThe resulting API functions are then stored within a function table whose first address is stored within a global\r\nvariable. These steps are performed for multiple function tables and the resulting global variables are used to\r\naccess functions through relative offsets within these tables throughout the binary.\r\nIn order for us to recover the function tables used throughout the binary, we wrote automation to perform the\r\nfollowing:\r\n1. Decrypt each DLL name\r\n2. Parse each DLL export table to extract all possible function names\r\n3. Hash each export and compare them to each hash table entry\r\n4. Write discovered function names into a format that can be consumed by Binary Ninja\r\nGiven that each hash table was resolved by the same function, we were able to extract all calls to this function in\r\norder to extract each hash table and their respective DLL names:\r\ndef resolve_hash_tables(bv, module_ct_bytes, xor_key_bytes, hash_xor, import_res_addr):\r\n resolved_hashes = {}\r\n import_resolution_func = bv.get_function_at(import_res_addr)\r\n for call_site in import_resolution_func.caller_sites:\r\n tokens = call_site.hlil.tokens\r\n hash_table_addr = tokens[3].value\r\n offset = tokens[7].value\r\n #This shift is done in the code\r\n hash_table_size = (tokens[5].value \u003e\u003e 3) * 4\r\nCross-references to our hash table resolution function are acquired using import_resolution_func.caller_sites\r\nand all parameters for the specific function call are acquired using the tokens for the call using\r\ncall_site.hlil.tokens .\r\nRecovering DLL Names\r\nEach DLL ciphertext is stored within a buffer that we’ve named module_ct_bytes . 
Each ciphertext segment is\r\nstored at a relative offset that is passed within the offset parameter. The length of the encrypted DLL name is\r\nstored within a BYTE value at this offset, which is followed by the ciphertext in the format of:\r\nhttps://invokere.com/posts/2024/02/automating-qakbot-malware-analysis-with-binary-ninja/\r\nPage 6 of 18\n\nstruct enc_module_info {\r\n BYTE module_name_length;\r\n BYTE module_name_ct[];\r\n};\r\nHere we use these values to extract the ciphertext at each offset within module_ct_bytes and decrypt them using\r\na 4-byte XOR key:\r\ndef decrypt_mod_name(module_ct_bytes, offset, xor_key):\r\n offset_mod = offset * 0x21\r\n module_name_length = module_ct_bytes[offset_mod]\r\n module_name_ct = module_ct_bytes[offset_mod+1:offset_mod+1+module_name_length]\r\n rbytes = bytes()\r\n xoff = 0\r\n for i, b in enumerate(module_name_ct):\r\n rbytes += (b ^ xor_key[i \u0026 3]).to_bytes(1, byteorder='little')\r\n return rbytes.decode('ascii')\r\nHashing Export Functions with Binary Ninja\r\nThe Export functions of each DLL dependency can be acquired by opening each DLL with Binary Ninja\r\nheadlessly and extracting the export symbols for each DLL:\r\ndef gen_dll_hash_table(dllname, hash_xor):\r\n #print(\"Export functions for DLL: %s\" % dllname)\r\n dbv = BinaryViewType['PE'].open(\"dependencies/{}\".format(dllname))\r\n table = {}\r\n for symbol in dbv.get_symbols_of_type(SymbolType.FunctionSymbol):\r\n if symbol.binding is SymbolBinding.GlobalBinding or symbol.binding is SymbolBinding.WeakBinding:\r\n tmp_symbol = re.sub(\"Stub\", \"\", symbol.full_name)\r\n rhash = zlib.crc32(bytes(tmp_symbol, 'ascii')) ^ hash_xor\r\n table[rhash] = tmp_symbol\r\n return table\r\nIn order to acquire the needed DLLs, we simply decrypted all required DLL names first and then copied them into\r\nthe dependencies directory on our development machine. Each export name is hashed using CRC32 and XORed\r\nwith the hard-coded XOR key. 
Each result is stored within a hash table for comparison against the hash tables\r\nextracted from the Qakbot DLL.\r\nStructure Generation and Relative Pointer Markups in Binary Ninja\r\nCalls to relative offsets within each dynamically resolved function table can be seen throughout the Binary Ninja\r\ndatabase:\r\nhttps://invokere.com/posts/2024/02/automating-qakbot-malware-analysis-with-binary-ninja/\r\nPage 7 of 18\n\nIn order for us to mark these relative offsets with resolved function names, we need to create structures that label\r\nthese relative offsets as these resolved functions:\r\n hash_table = bv.read(hash_table_addr, hash_table_size)\r\n hash_table_hashes = struct.unpack(\"I\"*(hash_table_size//4), hash_table)\r\n dll_hash_table = gen_dll_hash_table(dllname, hash_xor)\r\n for chash in hash_table_hashes:\r\n if chash in dll_hash_table:\r\n found_func_name = dll_hash_table[chash]\r\n else:\r\n found_func_name = \"unk_%x\" % chash\r\n if found_func_name:\r\n resolved_hashes[found_func_name] = chash\r\n print(generate_header(resolved_hashes, call_site.address))\r\nFirst, we read the hash table in its entirety using bv.read using the function parameters acquired earlier, then\r\nstruct.unpack these into INT32 values to compare to our dynamically generated hash tables. 
Each discovered\r\nhash is set into a resolved_hashes dictionary, which is then passed to a struct generation function:\r\ndef generate_header(resolved_hashes, hash_table_addr):\r\n rstr = bytes()\r\n rstr += b\"struct hashes_%x {\" % hash_table_addr\r\n for k, v in resolved_hashes.items():\r\n rstr += bytes(\"int64_t {}; \".format(k, v), 'ascii')\r\nhttps://invokere.com/posts/2024/02/automating-qakbot-malware-analysis-with-binary-ninja/\r\nPage 8 of 18\n\nrstr += b\"};\"\r\n return rstr.decode('ascii')\r\nThis creates a struct definition with each int64_t member being the name of each resolved API function name.\r\nThis results in the following output of C structures:\r\nstruct hashes_1800018d2 {int64_t LoadLibraryA; int64_t LoadLibraryW; int64_t FreeLibrary; int64_t GetProcAddres\r\nstruct hashes_1800018ee {int64_t NtAllocateVirtualMemory; int64_t NtFreeVirtualMemory; int64_t RtlAllocateHeap;\r\nstruct hashes_18000190c {int64_t MessageBoxA; int64_t EnumWindows; int64_t RegisterClassExA; int64_t CreateWindo\r\nstruct hashes_180001928 {int64_t CreateCompatibleDC; int64_t GetDeviceCaps; int64_t CreateCompatibleBitmap; int6\r\nstruct hashes_180001944 {int64_t unk_23737de0; int64_t unk_43a8d158; int64_t NetWkstaGetInfo; int64_t unk_642a3b\r\nstruct hashes_18000195f {int64_t SetFileSecurityW; int64_t AdjustTokenPrivileges; int64_t SetEntriesInAclA; int6\r\nstruct hashes_18000197b {int64_t StrStrIA; int64_t StrStrIW; int64_t StrCmpIW; int64_t PathCombineA; int64_t Pat\r\nstruct hashes_180001997 {int64_t ShellExecuteW; int64_t SHGetFolderPathW; };\r\nstruct hashes_1800019b3 {int64_t GetUserProfileDirectoryW; };\r\nstruct hashes_180002343 {int64_t WTSQueryUserToken; int64_t WTSQuerySessionInformationW; int64_t WTSEnumerateSes\r\nstruct hashes_180002a25 {int64_t InternetOpenA; int64_t InternetOpenUrlA; int64_t InternetCloseHandle; int64_t H\r\nstruct hashes_180002a41 {int64_t ObtainUserAgentString; };\r\nstruct hashes_18000a3c1 {int64_t CryptDecodeObjectEx; int64_t 
CryptImportPublicKeyInfo; int64_t CryptUnprotectDa\r\nstruct hashes_180001080 {int64_t LoadLibraryA; int64_t LoadLibraryW; int64_t FreeLibrary; int64_t GetProcAddress\r\nstruct hashes_18000109b {int64_t SetFileSecurityW; int64_t AdjustTokenPrivileges; int64_t SetEntriesInAclA; int6\r\nThis script can be viewed in its entirety here.\r\nWe then imported these structures into Binary Ninja by right-clicking in the Types menu and clicking on Create\r\nTypes from C Source... :\r\nhttps://invokere.com/posts/2024/02/automating-qakbot-malware-analysis-with-binary-ninja/\r\nPage 9 of 18\n\nWe then applied each structure to their function table pointer within the database. Here is an example function\r\ntable with a structure applied:\r\nhttps://invokere.com/posts/2024/02/automating-qakbot-malware-analysis-with-binary-ninja/\r\nPage 10 of 18\n\nDecrypting String Tables\r\nWith the dynamic function tables resolved, we were able to analyze the string table decryption functionality.\r\nQakbot decrypts string tables using AES-256 in CBC mode where the key is derived by hashing a hard-coded\r\nvalue with SHA256. 
The derived key is used to decrypt an XOR key that is used to decrypt the string table in its\r\nentirety.\r\nWe use the same visit technique to generically identify all string table decryption function call locations to\r\ngather necessary attributes:\r\ndef visitor(_a, inst, _c, _d):\r\n if isinstance(inst, commonil.Localcall):\r\n if len(inst.params) == 7:\r\n if len(list(inst.function.instructions)) == 1:\r\n xor_key, ct = get_xor_key_and_ct(inst)\r\n decrypted_str_table = xor_byte_data(ct, xor_key)\r\n markup_str_as_comments(decrypted_str_table, inst)\r\n return False\r\ndef markup_string_tables(bv):\r\n key = None\r\n for inst in bv.hlil_instructions:\r\n inst.visit(visitor)\r\n return key\r\nhttps://invokere.com/posts/2024/02/automating-qakbot-malware-analysis-with-binary-ninja/\r\nPage 11 of 18\n\nHere we call get_xor_key_and_ct which extracts all data related to the string table from our identified function\r\ncall:\r\ndef get_xor_key_and_ct(inst):\r\n #mw_decrypt_data(\u0026data_180027240, 0x5ab, \u0026data_1800271a0, 0x90,\\\r\n # \u0026data_180027150, 0x47, arg1)\r\n tokens = inst.tokens\r\n ct_addr = tokens[3].value\r\n ct_len = tokens[5].value\r\n ct = bv.read(ct_addr, ct_len)\r\n \r\n iv_xor_ct_data_addr = tokens[8].value\r\n iv_xor_ct_data_len = tokens[10].value\r\n iv_xor_ct_data = bv.read(iv_xor_ct_data_addr, iv_xor_ct_data_len)\r\n \r\n aes_key_data_addr = tokens[13].value\r\n aes_key_data_len = tokens[15].value\r\n aes_key_data = bv.read(aes_key_data_addr, aes_key_data_len)\r\n xor_key = decrypt_aes(aes_key_data, iv_xor_ct_data)\r\n return xor_key, ct\r\nThis includes iv_xor_ct_data which contains the following structure:\r\nstruct iv_xor_ct_data {\r\n BYTE aes_iv[16];\r\n BYTE encrypted_xor_key[];\r\n};\r\nThis structure contains an initialization vector followed by an encrypted XOR key that is used to decrypt the string\r\ntable. 
This structure along with all other required data is then passed to decrypt_aes that hashes aes_key_data\r\nwith SHA256 and uses this as an AES-256 key to decrypt the XOR key:\r\ndef decrypt_aes(aes_key_data, iv_xor_ct_data):\r\n h = SHA256.new()\r\n h.update(aes_key_data)\r\n aes256key = h.digest()\r\n cipher = AES.new(aes256key, AES.MODE_CBC, iv_xor_ct_data[:16])\r\n xor_key_ct = iv_xor_ct_data[16:]\r\n xor_key = unpad(cipher.decrypt(xor_key_ct), AES.block_size)\r\n return xor_key\r\nThe decrypted XOR key and the string table ciphertext are then returned from get_xor_key_and_ct as shown\r\nabove. The XOR key is then used to decrypt the ciphertext within our visitor function:\r\nhttps://invokere.com/posts/2024/02/automating-qakbot-malware-analysis-with-binary-ninja/\r\nPage 12 of 18\n\ndef visitor(_a, inst, _c, _d):\r\n if isinstance(inst, commonil.Localcall):\r\n if len(inst.params) == 7:\r\n if len(list(inst.function.instructions)) == 1:\r\n xor_key, ct = get_xor_key_and_ct(inst)\r\n decrypted_str_table = xor_byte_data(ct, xor_key)\r\n markup_str_as_comments(decrypted_str_table, inst)\r\n return False\r\nWhenever a string is required throughout the execution of Qakbot, a string table decryption function is called and\r\nan offset to access a string within the table is provided. We can therefore enumerate all calls to each string table\r\ndecryption function and identify these offsets and markup each call site with each decrypted string. This is done in\r\nthe markup_str_as_comments function.\r\ndef markup_str_as_comments(decrypted_str_table, inst):\r\n rstr = {}\r\n for callsite in inst.function.source_function.caller_sites:\r\n #print(F\"Found call site: {callsite.address:08X}\")\r\n str_offset = get_str_offsets(callsite)\r\n dec_str = decrypted_str_table[str_offset:].split(b\"\\x00\")[0]\r\nFirst, we enumerate all call sites from our generically identified instruction using\r\ninst.function.source_function.caller_sites . 
We then acquire each string offset within the decrypted string\r\ntable using get_str_offsets :\r\ndef get_str_offsets(call_site):\r\n #Get first call within nested operands, does not account for nested\r\n #calls, but we haven't seen those.\r\n rcall = recurse_get_call(call_site.hlil)\r\n #Get first constant within operands of call\r\n rconst = recurse_get_const(rcall)\r\n if rconst:\r\n return rconst.value.value\r\n else:\r\n return None\r\nHere we had the unique problem of identifying each call associated with the callsite and extracting the offset (a\r\nconstant) from the call. These calls differed in shapes and sizes in the HLIL, so we decided to recursively acquire\r\nthe first call made at a call site and recursively acquire the constant when a call was found:\r\ndef recurse_get_call(instr):\r\n # Base case: If the instruction is a call, return it\r\n if instr.operation == HighLevelILOperation.HLIL_CALL:\r\n return instr\r\nhttps://invokere.com/posts/2024/02/automating-qakbot-malware-analysis-with-binary-ninja/\r\nPage 13 of 18\n\n# If the instruction has operands, recursively search within them\r\n if hasattr(instr, 'operands'):\r\n for operand in instr.operands:\r\n if isinstance(operand, HighLevelILInstruction):\r\n result = recurse_get_call(operand)\r\n if result is not None:\r\n return result\r\n # If no call is found in this instruction or its operands\r\n return None\r\ndef recurse_get_const(instr):\r\n # Base case: If the instruction is a constant, return it\r\n if instr.operation == HighLevelILOperation.HLIL_CONST:\r\n return instr\r\n # If the instruction has operands, recursively search within them\r\n if hasattr(instr, 'operands'):\r\n for operand in instr.operands:\r\n if isinstance(operand, HighLevelILInstruction):\r\n result = recurse_get_const(operand)\r\n if result is not None:\r\n return result\r\n elif isinstance(operand, list):\r\n for op in operand:\r\n result = recurse_get_const(op)\r\n if result is not None:\r\n return result\r\n 
return None\r\nWe were actually able to get Sidekick to help us with this by asking it how we could do this. Very cool!\r\nOnce the target offset is acquired, we are able to get the decrypted string from its table using: dec_str =\r\ndecrypted_str_table[str_offset:].split(b\"\\x00\")[0]. Each decrypted string is then marked at each callsite by\r\nadding it as a comment:\r\n        rstr[\"0x%x\" % callsite.address] = dec_str.decode('ascii')\r\n        bv.set_comment_at(callsite.address, dec_str)\r\nThis allows us to see which decrypted string is used at each location throughout the Binary Ninja database.\r\nRecovering C2 and Campaign Info\r\nQakbot uses a specific string within these decrypted string tables as a SHA256 input with the result being used as\r\nan AES-256 key to decrypt its C2 server and campaign information (e.g.,\r\newW300ns6\u00266HyygkKzfVVCJHq210vQLq7*uCNorQns ). While enumerating string decryption callsites, we have\r\nadditional checks for callsites that match the functions which process these strings and decrypt the data that we’re\r\ninterested in:\r\n        #While we're enumerating all callsites, we should check them for the\r\n        #strings that are being used to decrypt the campaign info and C2\r\n        campaign_info = enum_callsite_for_campaign_func(callsite, dec_str)\r\n        if campaign_info:\r\n            print(campaign_info)\r\n        c2_info = enum_callsite_for_c2_func(callsite, dec_str)\r\n        if c2_info:\r\n            print(c2_info)\r\nThese functions use the same techniques described above: they fingerprint the callsite functions using heuristics and\r\nextract the required parameters to recover the needed attributes. 
The first recovers the campaign info:\r\ndef enum_callsite_for_campaign_func(inst, dec_str):\r\n    insts = list(inst.function.hlil.instructions)\r\n    campaign_info = None\r\n    #insts[6] is read below, so at least 7 instructions are required\r\n    if len(insts) \u003e 6:\r\n        #Fingerprint with surrounding instruction types and number of basic blocks.\r\n        if(type(insts[4]) == HighLevelILVarInit\r\n            and type(insts[4].operands[1]) == HighLevelILCall\r\n            and type(insts[5]) == HighLevelILVarInit\r\n            and type(insts[6]) == HighLevelILIf and\r\n            len(inst.function.basic_blocks) == 5):\r\n            print(F\"Found call to campaign info decryption function: {inst.address:08X}\")\r\n            #uint32_t ct_len = zx.d(ct_len)\r\n            iv_ct_len_addr = insts[1].tokens[7].value\r\n            #WORD from acquired address for ciphertext length\r\n            iv_ct_len = (struct.unpack(\"H\", bv.read(iv_ct_len_addr, 2))[0])\r\n            iv_ct_addr = insts[4].tokens[8].value\r\n            #Read ciphertext and IV from this address\r\n            iv_ct = bv.read(iv_ct_addr, iv_ct_len)\r\n            aes_key_data = dec_str\r\n            campaign_info = parse_campaign_info(decrypt_config_aes(aes_key_data, iv_ct))\r\n            #Need to get this as the decrypted string from the call site\r\n    return campaign_info\r\nHere the callsite function for processing campaign ciphertext information contains the AES IV/ciphertext address\r\nand the length of this data (in the same format as the encrypted string table data). We use this information to\r\nrecover the AES IV and ciphertext with bv.read. We then decrypt the campaign info in the decrypt_config_aes\r\nfunction, which hashes the decrypted string with SHA256 and uses the digest as the AES-256 key. 
The decrypted campaign info is\r\nthen parsed with the parse_campaign_info function:\r\ndef parse_campaign_info(info_blob):\r\n    #Skip over SHA256 of config\r\n    info = info_blob[32:]\r\n    s_info = info.split(b\"\\r\\n\")\r\n    campaign_info = {}\r\n    for cinfo in s_info:\r\n        if b\"10=\" in cinfo:\r\n            campaign_info['campaign_id'] = cinfo.split(b\"=\")[1].decode('ascii')\r\n        elif b\"3=\" in cinfo:\r\n            campaign_info['timestamp'] = cinfo.split(b\"=\")[1].decode('ascii')\r\n    return campaign_info\r\nThis is followed by a similar technique to recover and decrypt the C2 information:\r\ndef enum_callsite_for_c2_func(inst, dec_str):\r\n    insts = list(inst.function.hlil.instructions)\r\n    c2_info = None\r\n    #Fingerprint with surrounding instructions, we can probably improve\r\n    #on this.\r\n    if(len(insts) \u003e 20 and type(insts[17]) == HighLevelILVarInit and\r\n        len(inst.function.basic_blocks) == 46):\r\n        print(F\"Found call to C2 decryption function: {inst.address:08X}\")\r\n        #uint32_t ct_len = zx.d(ct_len)\r\n        iv_ct_len_addr = insts[14].tokens[7].value\r\n        #WORD from acquired address for ciphertext length\r\n        iv_ct_len = (struct.unpack(\"H\", bv.read(iv_ct_len_addr, 2))[0])\r\n        iv_ct_addr = insts[17].tokens[8].value\r\n        #Read ciphertext and IV from this address\r\n        iv_ct = bv.read(iv_ct_addr, iv_ct_len)\r\n        aes_key_data = dec_str\r\n        return parse_c2_info(decrypt_config_aes(aes_key_data, iv_ct))\r\nWe then parse the decrypted C2 information:\r\ndef parse_c2_info(c2_info_blob):\r\n    c2_info = []\r\n    #Skip over C2 config SHA256 and boolean\r\n    info = c2_info_blob[32:]\r\n    #Each C2 entry is 8 bytes\r\n    num_entries = len(info) // 8\r\n    for i in range(0, num_entries):\r\n        ip_port = info[i*8:i*8+8]\r\n        ip = socket.inet_ntoa(ip_port[1:5])\r\n        port = struct.unpack(\"\u003eH\", 
ip_port[5:5+2])[0]\r\n        c2_info.append({\"IP\": ip, \"Port\": port})\r\n    return c2_info\r\nThis results in the recovery of the Qakbot campaign and C2 information from the binary:\r\npython3 decrypt_aes.py qakbot-64-bit/dll/780be7a70ce3567ef268f6c768fc5a3d2510310c603bf481ebffd65e4fe95ff3/sc-tm\r\nFound call to campaign info decryption function: 180003313\r\n{'campaign_id': 'tchk06', 'timestamp': '1702463600'}\r\nFound call to C2 decryption function: 180006175\r\n[{'IP': '45[.]138[.]74[.]191', 'Port': 443}, {'IP': '65[.]108[.]218[.]24', 'Port': 443}]\r\nThe full script for performing string table markups and extracting this information can be found here. From a\r\nmalware analysis/triage perspective, with recovered strings, APIs, C2 and campaign information, the next step\r\nwould be to reverse engineer the remaining functionality of Qakbot using Binary Ninja. We have, however,\r\nautomated the recovery of obfuscated information using techniques that work across multiple samples in order to\r\nassist during the reverse engineering process. The extracted indicators of compromise can also be used to identify\r\nfurther infections within an environment and block outbound communications to the botnet C2 infrastructure.\r\nCampaign information can be used to track distribution campaigns.\r\nConclusion\r\nBinary Ninja provides a robust API for interacting with its database and syntactic representations of code. These\r\nAPIs and BNILs can assist in extracting information that may be difficult to acquire directly from disassembled\r\ncode, and provide helper functions for navigating its intermediate representations quickly. 
Tools like Sidekick\r\nprovide a means of leveraging AI models to assist during the reverse engineering process, and we are excited\r\nabout the prospect of leveraging them further within our workflows.\r\nAcknowledgements \u0026 References\r\nHuge shoutout to Jordan from Vector35 @psifertex for hopping on our streams multiple times during his\r\nweekends to help us with our scripting and Binary Ninja use.\r\nHuge shoutout to Sergei from OALabs @herrcore for helping us throughout this stream series with various\r\naspects.\r\nTracking 15 Years of Qakbot Development - used as a reference when decrypting string tables. Thanks\r\nZScaler team (shoutz ThreatLabz and BSG)!\r\nhttps://malcat.fr/blog/writing-a-qakbot-50-config-extractor-with-malcat/ - used as a reference when\r\nfiguring out the campaign info and C2 data formats. Thanks Malcat team!\r\nSamples\r\n780be7a70ce3567ef268f6c768fc5a3d2510310c603bf481ebffd65e4fe95ff3\r\n12094a47a9659b1c2f7c5b36e21d2b0145c9e7b2e79845a437508efa96e5f305\r\nAll the best,\r\nThe Invoke RE Team\r\nSource: https://invokere.com/posts/2024/02/automating-qakbot-malware-analysis-with-binary-ninja/",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"references": [
		"https://invokere.com/posts/2024/02/automating-qakbot-malware-analysis-with-binary-ninja/"
	],
	"report_names": [
		"automating-qakbot-malware-analysis-with-binary-ninja"
	],
	"threat_actors": [],
	"ts_created_at": 1775434740,
	"ts_updated_at": 1775791301,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/2c2a6ed028522f859d0518bf50b6ab36397dd35f.pdf",
		"text": "https://archive.orkl.eu/2c2a6ed028522f859d0518bf50b6ab36397dd35f.txt",
		"img": "https://archive.orkl.eu/2c2a6ed028522f859d0518bf50b6ab36397dd35f.jpg"
	}
}