{
	"id": "419757b6-3bd3-478a-88a4-04680ad7ad3a",
	"created_at": "2026-04-06T00:21:39.811206Z",
	"updated_at": "2026-04-10T13:12:54.911783Z",
	"deleted_at": null,
	"sha1_hash": "ef658e944ebf3162f14fd38c71edae3ae89f3d2c",
	"title": "Reverse Engineering Crypto Functions: RC4 and Salsa20",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 872230,
	"plain_text": "Reverse Engineering Crypto Functions: RC4 and Salsa20\r\nBy Jacob Pimental\r\nPublished: 2021-08-25 · Archived: 2026-04-05 13:46:31 UTC\r\n25 August 2021\r\nBy Jacob Pimental\r\nMany malware samples use encryption for Command and Control (C2) communications, encrypting files, string\r\nobfuscation, and many other tasks. It can be challenging to know which encryption algorithm you are looking at\r\nwhen analyzing a sample. This post aims to teach newer analysts about common encryption algorithms, how they\r\nwork, and how you can identify them when reverse engineering.\r\n1. RC4 Algorithm\r\n1. How it Works\r\n1. Key Scheduling Algorithm (KSA)\r\n2. Pseudo-Random Generation Algorithm (PRGA)\r\n3. Putting it All Together\r\n2. Identifying RC4 in Assembly\r\n2. Salsa20 Algorithm\r\n1. How it Works\r\n1. State Generation\r\n2. Generating Keystream\r\nhttps://www.goggleheadedhacker.com/blog/post/reversing-crypto-functions\r\nPage 1 of 12\n\n3. Putting it All Together\r\n2. Identifying Salsa20 in Assembly\r\n3. Conclusion\r\nRC4 Algorithm\r\nHow it Works\r\nRC4’s internal state is an array of 256 bytes, denoted as S[] , ranging from 0-255. RC4 will use its Key\r\nScheduling Algorithm (KSA) to randomly swap the bytes in S[] using the user inputted key as the seed. S[] is\r\nthen used to generate a keystream via the Pseudo-Random Generation Algorithm (PRGA). This keystream,\r\ndenoted as KS[] , is the same size as the plaintext input. Finally, RC4 will XOR the keystream by the plaintext to\r\ncreate the encrypted ciphertext.\r\nKey Scheduling Algorithm (KSA)\r\nThe Key Scheduling Algorithm for RC4 will take the internal state referenced earlier, denoted as S[] , and\r\npermutate it based on a key the user inputs. For each index in S[] , the algorithm will swap the value with\r\nanother index of S[] based on the value: (j + S[index] + key[index % keylength]) % 256 , where j has a\r\nstarting value of zero. 
This can be shown in the following Python code:\r\ndef KSA(key):\r\n    \"\"\"Rearranges the values in an array of 256 bytes based on key.\r\n    Params:\r\n        key (str): Key used to permute the bytes\r\n    Returns:\r\n        list: A permutation of 256 bytes used to generate keystream\r\n    \"\"\"\r\n    S = list(range(256))  # Initialize array of 256 bytes\r\n    j = 0\r\n    for i in range(256):\r\n        k = ord(key[i % len(key)])\r\n        j = (j + S[i] + k) % 256  # Calculate index to swap\r\n        S[i], S[j] = S[j], S[i]  # Swap the values at i and j\r\n    return S\r\nPseudo-Random Generation Algorithm (PRGA)\r\nThe output from the Key Scheduling Algorithm is used to generate a keystream using RC4’s PRGA. This\r\nkeystream will be the same size as the plaintext input. Both i and j start at zero. For each byte of plaintext, i is\r\nincremented by one (modulo 256), j is updated to (j + S[i]) % 256, and the values at S[i] and S[j] are\r\nswapped. After this swap, the value S[(S[i] + S[j]) % 256] is appended to the\r\nkeystream, thus creating a pseudo-random list of bytes. This can be shown in the following Python code:\r\ndef PRGA(S, amount):\r\n    \"\"\"Pseudo-Random algorithm that creates the final keystream used to encrypt.\r\n    Params:\r\n        S (list): The 256 byte array generated by KSA\r\n        amount (int): Length the keystream needs to be (size of plaintext)\r\n    Returns:\r\n        list: The final keystream used for encryption\r\n    \"\"\"\r\n    i = 0\r\n    j = 0\r\n    K = []\r\n    for _ in range(amount):\r\n        i = (i + 1) % 256\r\n        j = (j + S[i]) % 256\r\n        S[i], S[j] = S[j], S[i]  # Swap the values at i and j\r\n        K.append(S[(S[i] + S[j]) % 256])\r\n    return K\r\nPutting it All Together\r\nOnce the keystream is generated, the RC4 algorithm will use it to encrypt the plaintext input by XORing the bytes\r\ntogether. Decryption works by deriving the same keystream using the original key and XORing that with the\r\nciphertext. 
This entire process is shown in the following Python code:\r\ndef XOR(pt, k):\r\n    \"\"\"XORs the plaintext with the keystream.\r\n    Params:\r\n        pt (str): The plaintext to encrypt\r\n        k (list): The keystream to XOR with\r\n    Returns:\r\n        list: The ciphertext\r\n    \"\"\"\r\n    ct = []\r\n    for i in range(len(pt)):\r\n        ct.append(ord(pt[i]) ^ k[i])\r\n    return ct\r\ndef RC4(plaintext, key):\r\n    \"\"\"Main RC4 function.\r\n    Params:\r\n        plaintext (str): The plaintext to encrypt\r\n        key (str): The key used for encryption\r\n    Returns:\r\n        list: List of encrypted bytes\r\n    \"\"\"\r\n    S = KSA(key)\r\n    print(S)\r\n    K = PRGA(S, len(plaintext))\r\n    print(K)\r\n    ct = XOR(plaintext, K)\r\n    return ct\r\nIdentifying RC4 in Assembly\r\nAn easy way of identifying that an application is using the RC4 algorithm is by looking for the value 256 when\r\nthe algorithm is creating the initial state (S[]). This normally occurs in two loops that run 256 times each and\r\neither create or modify an array.\r\nLoop that creates initial S array of bytes from 0 to 255\r\nIt is important to notice that in the second loop in RC4’s key scheduling algorithm the bytes in S[] will be\r\nswapped around. You can see this in the following screenshot:\r\nSecond loop in S array creation that swaps bytes\r\nYou can also identify RC4 by its pseudo-random generation algorithm. Two important things to notice here are the\r\nuse of the previously created S[] variable and the XOR instruction being used. Keep in mind that this loop\r\nruns once per byte of plaintext, not 256 times like the KSA.\r\nMain loop used for RC4 PRGA\r\nBy identifying both functionalities in the code, it is reasonable to conclude that this is the RC4 algorithm. 
This particular\r\nexample was from my analysis of the Sodinokibi Ransomware in a previous post.\r\nSalsa20 Algorithm\r\nHow it Works\r\nSalsa20 works by encrypting data in 64-byte “blocks”. The algorithm is counter-based, meaning that a counter\r\nvariable is used when generating the keystream, depending on which “block” of data is being encrypted. The internal\r\nstate of Salsa20 consists of an array of sixteen 32-bit words that can be shown as a 4x4 matrix:\r\nSalsa20’s initial state\r\nThis state then undergoes a “quarter-round” function which scrambles the values in the matrix. Once the state has been\r\nrun through the configured number of rounds, normally 20, the final result is added back to the initial state’s\r\nvalues. This becomes the keystream that will be XOR’d against 64 bytes of the plaintext data. Finally, the counter\r\nvariable is incremented and the process starts again with the next 64 bytes.\r\nState Generation\r\nThe initial state for Salsa20 is made up of sixteen 32-bit words drawn from the following:\r\nState variable | Description\r\nKey | 16- or 32-byte key defined by the user\r\nNonce | Eight-byte nonce value that can be randomly generated or given\r\nCounter | Eight-byte counter variable that denotes which “block” is being encrypted\r\nConstant | Constant value of either “expand 32-byte k” or “expand 16-byte k”, depending on the length of the key\r\nIf the length of the key is 32 bytes, then it is split between the two sets of four 32-bit words in the state, with the\r\nfirst 16 bytes in the first set and the last 16 bytes in the second. Otherwise, if the length of the key is 16 bytes, it is\r\nrepeated in both sets of four 32-bit words. 
The state generation can be defined in the following Python\r\ncode:\r\nimport struct\r\ndef setup_keystate(self, key, nonce, counter=0):\r\n    \"\"\"Sets up initial keystate for Salsa20.\r\n    Params:\r\n        key (bytes): Key used to encrypt data (16 or 32 bytes)\r\n        nonce (bytes): Eight-byte nonce (number used once) mixed into the keystate\r\n        counter (int): Determines which block is being encrypted\r\n    \"\"\"\r\n    nonce = list(struct.unpack('\u003c2I', nonce)) # Splits nonce into 2 words\r\n    count = [counter \u003e\u003e 32, counter \u0026 0xffffffff] # High and low 32-bit words of the 64-bit counter\r\n    if len(key) == 32:\r\n        const = list(struct.unpack('\u003c4I', b'expand 32-byte k')) # Splits const into 4 words\r\n        k = list(struct.unpack('\u003c8I', key)) # Splits key into 8 words\r\n        self.state = [const[0], k[0], k[1], k[2],\r\n                      k[3], const[1], nonce[0], nonce[1],\r\n                      count[1], count[0], const[2], k[4],\r\n                      k[5], k[6], k[7], const[3]]\r\n    elif len(key) == 16:\r\n        const = list(struct.unpack('\u003c4I', b'expand 16-byte k'))\r\n        k = list(struct.unpack('\u003c4I', key)) # Splits key into 4 words\r\n        self.state = [const[0], k[0], k[1], k[2],\r\n                      k[3], const[1], nonce[0], nonce[1],\r\n                      count[1], count[0], const[2], k[0],\r\n                      k[1], k[2], k[3], const[3]]\r\nAn example of how the state would look with a 16 byte and 32 byte key can be seen below:\r\nDifference between 16 and 32 Byte key in Salsa20\r\nGenerating Keystream\r\nTo generate the keystream, Salsa20 uses a “quarter-round” function to scramble the data in its initial state. This\r\nfunction is called “quarter-round” as it works on one column or row at a time out of four, or one “quarter” at a\r\ntime. The default number of “rounds” is 20, unless otherwise specified. Counting rounds from zero, on even rounds the algorithm\r\ntransforms its columns using the quarter-round function and on odd rounds it transforms its rows. 
The\r\nquarter-round function can be shown in the following Python code:\r\ndef rol(value, shift):\r\n    \"\"\"Rotates a 32-bit value left by shift bits (helper assumed by QR, not shown in the original post).\"\"\"\r\n    return ((value \u003c\u003c shift) | (value \u003e\u003e (32 - shift))) \u0026 0xffffffff\r\ndef QR(self, x, a, b, c, d):\r\n    \"\"\"quarter-round function used in Salsa20.\r\n    Params:\r\n        x (array): Starting array to permute\r\n        a (int): index value for array\r\n        b (int): index value for array\r\n        c (int): index value for array\r\n        d (int): index value for array\r\n    \"\"\"\r\n    x[b] ^= rol((x[a] + x[d]) \u0026 0xffffffff, 7)\r\n    x[c] ^= rol((x[b] + x[a]) \u0026 0xffffffff, 9)\r\n    x[d] ^= rol((x[c] + x[b]) \u0026 0xffffffff, 13)\r\n    x[a] ^= rol((x[d] + x[c]) \u0026 0xffffffff, 18)\r\nOnce the initial state has been run through this permutation function for the number of rounds specified, the\r\nnewly scrambled state is added to its original values. Because the rounds alone are invertible, this addition is what\r\nprevents the process from being reversed to recover the key. The entire keystream generation process looks like:\r\ndef generate_ks(self):\r\n    \"\"\"Generates Keystream for Salsa20\r\n    Returns:\r\n        bytes: 64-byte keystream\r\n    \"\"\"\r\n    x = self.state[:]\r\n    for i in range(10):  # 10 double rounds = 20 rounds\r\n        # Column round\r\n        self.QR(x, 0, 4, 8, 12)\r\n        self.QR(x, 5, 9, 13, 1)\r\n        self.QR(x, 10, 14, 2, 6)\r\n        self.QR(x, 15, 3, 7, 11)\r\n        # Row round\r\n        self.QR(x, 0, 1, 2, 3)\r\n        self.QR(x, 5, 6, 7, 4)\r\n        self.QR(x, 10, 11, 8, 9)\r\n        self.QR(x, 15, 12, 13, 14)\r\n    out = []\r\n    for i in range(len(self.state)):\r\n        out.append((self.state[i] + x[i]) \u0026 0xffffffff)  # Add scrambled state to the original\r\n    out = struct.pack('\u003c16I', *out)  # Serialize the 16 words as little-endian bytes\r\n    return out\r\nPutting it All Together\r\nAfter the keystream is generated, the Salsa20 algorithm will XOR it with the first 64 bytes, or fewer, of the plaintext.\r\nIf there are more than 64 bytes in the plaintext data, then the counter variable is incremented and a new keystream\r\nis generated for the next 64-byte block. 
This process continues until the entirety of the plaintext is encrypted. The\r\nPython code for this would look like the following:\r\ndef encrypt(self):\r\n    ct = []\r\n    print(len(self.data))\r\n    for i in range(0, len(self.data), 64):\r\n        block = self.data[i:i+64]\r\n        self.setup_keystate(self.key, self.nonce, i//64)\r\n        ks = self.generate_ks()\r\n        for x in range(len(block)):\r\n            ct.append(block[x] ^ ks[x])\r\n    return ct\r\nIdentifying Salsa20 in Assembly\r\nThe easiest way to identify Salsa20 when analyzing a binary is to look for the constants expand 32-byte k or\r\nexpand 16-byte k. These will almost always be present for Salsa20 and are a strong indicator.\r\nSalsa20 constant value being moved into the state\r\nHowever, in order to evade analysis, the author might change these constant values. If these values are changed, the\r\nnext thing to look for would be the quarter-round function that Salsa20 uses to generate the keystream. To locate\r\nthis, the analyst should look for rol instructions with the normal quarter-round rotation amounts: 7, 9, 13,\r\nand 18.\r\nSalsa20 quarter-round function showing the rol instructions\r\nThe examples for this section were from an open source version of the Salsa20 algorithm written in C by\r\nalexwebr.\r\nConclusion\r\nHopefully this post will help newer analysts in identifying basic crypto functions that can be used by malware.\r\nLearning how the algorithms operate at a low level makes it easier to spot them in the wild and to\r\nidentify variations of the same algorithm that an author may use to evade detection. 
If you have\r\nany questions or comments about this post, feel free to message me on my Twitter or LinkedIn.\r\nThanks for reading and happy reversing!\r\nTutorial, Encryption, RC4, Salsa20\r\nSource: https://www.goggleheadedhacker.com/blog/post/reversing-crypto-functions",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"Malpedia"
	],
	"origins": [
		"web"
	],
	"references": [
		"https://www.goggleheadedhacker.com/blog/post/reversing-crypto-functions"
	],
	"report_names": [
		"reversing-crypto-functions"
	],
	"threat_actors": [],
	"ts_created_at": 1775434899,
	"ts_updated_at": 1775826774,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/ef658e944ebf3162f14fd38c71edae3ae89f3d2c.pdf",
		"text": "https://archive.orkl.eu/ef658e944ebf3162f14fd38c71edae3ae89f3d2c.txt",
		"img": "https://archive.orkl.eu/ef658e944ebf3162f14fd38c71edae3ae89f3d2c.jpg"
	}
}