{
	"id": "92cbebee-2f2f-42a3-8bd8-52eadceb80f3",
	"created_at": "2026-04-29T02:20:38.974553Z",
	"updated_at": "2026-04-29T08:22:29.345447Z",
	"deleted_at": null,
	"sha1_hash": "bddf8b63433405cd0f2c6520ead95fdaa38ea03c",
	"title": "Invisible Code \u0026 Hidden Prompts - How Attackers Weaponize Unicode in Repos (and How SAST Can Help) - Cycode",
	"llm_title": "",
	"authors": "",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 141291,
	"plain_text": "Invisible Code \u0026 Hidden Prompts - How Attackers Weaponize Unicode in Repos (and How SAST Can Help) - Cycode\r\nBy Shaked Perets\r\nPublished: 2025-11-24 · Archived: 2026-04-29 02:12:05 UTC\r\nIntroduction: When Code Isn’t What It Seems\r\nImagine reviewing a commit that looks perfectly ordinary to the naked eye yet harbors a hidden backdoor. This isn’t sci-fi; it’s the reality of invisible code attacks. Attackers have learned to embed malicious instructions using Unicode characters that deceive human readers. The compiler sees one thing while the human reviewer sees another, and that breaks our traditional security model.\r\nIn 2021, researchers revealed “Trojan Source”, showing how Unicode could be abused to make harmful code invisible. By 2025, these techniques had moved from theory to practice with the Glassworm worm, which hid malicious payloads inside Visual Studio Code extensions using invisible bytes.\r\nIn this post, we go deep into the bytes to understand how this works.\r\nUnder the Hood: From ASCII to UTF-8\r\nTo understand how code becomes invisible, we first need to understand how computers store text.\r\nDecades ago, we relied on ASCII, a 7-bit system capable of representing 128 characters (A-Z, a-z, 0-9, basic punctuation, and a few control codes such as tab and newline). In ASCII, “what you see is what you get”: there was little room for hidden tricks because every printable byte had a visible glyph.\r\nThen came Unicode. Unicode is a map that assigns a unique number (a “code point”) to almost every character in every human language, plus symbols and emojis. 
Finally, there is UTF-8, the encoding standard that translates those code points into the bytes computers actually store.\r\nThe Rocket Example 🚀\r\nTo your eye: You see a rocket emoji: 🚀\r\nTo Unicode: It is code point U+1F680.\r\nTo the machine (hex): In UTF-8, this single character requires 4 bytes: F0 9F 9A 80.\r\nAttackers exploit the fact that Unicode contains thousands of valid code points that map to nothing visually, yet still occupy bytes in the file.\r\nUnicode is the universal character standard that lets software handle text in all languages and scripts. Within its enormous range are numerous non-printing or formatting characters, such as zero-width spaces, directional markers, and private-use symbols. They generally have no visible representation in editors.\r\nhttps://cycode.com/blog/invisible-code-hidden-prompts-unicode-attacks-sast/\r\nPage 1 of 9\n\nDevelopers “read what they see” in code, but invisible Unicode means we can’t always trust our eyes: parts of the code can be present, doing something nefarious, without being visually apparent.\r\nAttackers weaponize these properties to hide logic, alter program flow, or conceal malware in source code. 
In effect, it’s an attack on human reviewers and their tooling: the code compiles and runs normally, but critical pieces are camouflaged from reviewers and from basic text diffs.\r\nThere are a few main categories of Unicode trickery we need to understand:\r\nVariation Selectors: exploiting characters meant to modify a preceding symbol (such as an emoji) by placing them in isolation, creating invisible sequences that editors ignore but the compiler or interpreter still keeps as data.\r\nPrivate Use Area (PUA) encoding: using custom, non-standard Unicode characters that render as “nothing” to encode malicious code.\r\nBidirectional (bidi) control characters: injecting Unicode control codes that reorder how text is displayed (intended for mixing right-to-left and left-to-right scripts) to visually reshape code, e.g. making code appear to be a comment or string when it is actually executable, or the reverse.\r\nLet’s examine the most dangerous techniques in depth, looking at the raw hex data to see what is really there.\r\nTechnique 1: Variation Selectors (The Glassworm Method)\r\nVariation Selectors are intended to modify how the preceding character is displayed (for example, selecting the text-style or emoji-style presentation of a symbol). Attackers, however, use them in isolation or in chains. Because there is no base character to modify, editors simply display nothing, but the bytes remain.\r\nTo see why this is dangerous, let’s compare a truly empty string against a malicious one.\r\nCase A: The Normal “Empty” String\r\nThis is what safe code looks like: a developer initializes a variable with an empty string.\r\nconst key = \"\";\r\nThe Hexdump: Everything is exactly as it seems. 
The file is small, and the bytes match the text 1:1.\r\n00000000 63 6f 6e 73 74 20 6b 65 79 20 3d 20 22 22 3b |const key = \"\";|\r\nCase B: The Weaponized “Empty” String\r\nThis is what the Glassworm attack looks like. Visually, the code is identical to Case A:\r\nconst key = \"︀\";\r\n(Looks empty, right?)\r\nThe Hexdump: Between the quote marks (22), there are now 3 extra bytes (ef b8 80). They encode Variation Selector-1, which is invisible. In real attacks, this “empty” space can contain thousands of bytes of malicious payload.\r\n00000000 63 6f 6e 73 74 20 6b 65 79 20 3d 20 22 ef b8 80 |const key = \"...|\r\n00000010 22 3b |\";|\r\nAnatomy of the Attack:\r\n🟢 Safe Prefix: const key = \"\r\nNormal ASCII bytes ending at 22.\r\n🔴 The Trap: ef b8 80\r\nThis is the invisible code. These 3 bytes represent U+FE00 (Variation Selector-1). 
They render as nothing, but the compiler or interpreter keeps them as valid data.\r\n🟢 Safe Suffix: \";\r\nThe closing quote (22) and semicolon (3b).\r\nReferences:\r\nVariation Selector-1 (VS1): https://unicode-explorer.com/b/FE00\r\nVariation Selector-17 (VS17): https://unicode-explorer.com/b/E0100\r\nTechnique 2: Private Use Area (PUA) Encoding\r\nUnicode reserves blocks of code points called “Private Use Areas.” These are set aside for private use by applications and have no standard glyphs. Because standard fonts have no picture for them, editors render them as nothing or as a generic “missing character” box that is easily overlooked.\r\nThe Attack: Attackers map their malicious script onto these characters. For example, they might shift the standard letter ‘A’ (0x41) into the PUA by adding an offset of 0xE000, producing U+E041. The result is a “blank” blob of text that the program decodes at runtime into executable code.\r\nCase A: The Normal String\r\nHere is a standard variable assignment. The variable payload is truly empty.\r\nvar payload = \"\";\r\nThe Hexdump: The hex bytes perfectly match the visible text.\r\n00000000 76 61 72 20 70 61 79 6c 6f 61 64 20 3d 20 22 22 |var payload = \"\"|\r\n00000010 3b |;|\r\nCase B: The Weaponized PUA String\r\nHere, the attacker hides a malicious instruction inside what looks like an empty string.\r\nvar payload = \" \"; (To the eye, this looks identical to Case A: “”)\r\nThe Hexdump: The file size has grown, and new bytes have appeared. 
The ASCII column shows dots because hexdump cannot display these bytes as characters, but the hex column reveals the truth.\r\n00000000 76 61 72 20 70 61 79 6c 6f 61 64 20 3d 20 22 f3 |var payload = \".|\r\n00000010 bf be 80 22 3b |...\";|\r\nAnatomy of the Attack:\r\n🟢 Safe Prefix: var payload = \"\r\nStandard text.\r\n🔴 The Trap: f3 bf be 80\r\nHidden payload. This is the UTF-8 encoding of the PUA character U+FFF80. It occupies 4 bytes on disk but can render with zero visible width.\r\n🟢 Safe Suffix: \";\r\nClosing syntax.\r\nReferences:\r\nPrivate Use Area Search: https://unicode-explorer.com/search/\r\nSupplementary PUA-A (U+FFF80): https://unicode-explorer.com/b/FFF80\r\nSupplementary PUA-B (U+10FF80): https://unicode-explorer.com/b/10FF80\r\nTechnique 3: Bidirectional Control (Trojan Source)\r\nThis technique was famously documented in the research paper Trojan Source. 
It exploits Unicode control characters intended for right-to-left languages (like Hebrew or Arabic) to decouple the visual order of code from the logical order in which the compiler reads it.\r\nThe Attack (The “Stretched String”): In this example (adapted from Nick Boucher’s Trojan Source repo), an attacker uses the Right-to-Left Override (RLO) to trick the code reviewer into thinking a string comparison is safe.\r\nThe Goal: Force the code to print “You are an admin” even though the user is not an admin.\r\nThe Method: The attacker hides the comment // Check if admin inside the string literal using invisible bidi control characters.\r\nWhat the Reviewer Sees (Visual)\r\nTo a human eye, this looks like a standard security check. The code compares accessLevel to “user”. Since they are equal, the if block should not run.\r\nvar accessLevel = \"user\";\r\nif (accessLevel != \"user\") {// Check if admin\r\n console.log(\"You are an admin.\");\r\n}\r\nNotice the comment placement. In a vulnerable editor, the comment // Check if admin appears outside the quote marks, so the line reads: if (accessLevel != \"user\") { // Check if admin\r\nWhat the Compiler Sees (Logical)\r\nThe compiler reads the bytes in logical order, so the “comment” is actually part of the string literal. Logic: if (\"user\" != \"user[invisible_bytes] // Check if admin[invisible_bytes]\")\r\nSince \"user\" is NOT equal to that longer string, the condition is TRUE, and the code takes the admin branch.\r\nThe Evidence (Hexdump): The hexdump -C exposes the trick. 
Look at the bidi control bytes: e2 80 ae (Right-to-Left Override, RLO), e2 81 a6 (Left-to-Right Isolate, LRI), and e2 81 a9 (Pop Directional Isolate, PDI).\r\n00000000 69 66 20 28 61 63 63 65 73 73 4c 65 76 65 6c 20 |if (accessLevel |\r\n00000010 21 3d 20 22 75 73 65 72 e2 80 ae 20 e2 81 a6 2f |!= \"user... .../|\r\n00000020 2f 20 43 68 65 63 6b 20 69 66 20 61 64 6d 69 6e |/ Check if admin|\r\n00000030 e2 81 a9 20 e2 81 a6 22 29 20 7b 0a |... ...\") {.|\r\nAnatomy of the Attack:\r\n🟢 Safe Start: if (accessLevel != \"user\r\nNormal code, ending at byte 72 (r).\r\n🔴 The Trigger: e2 80 ae\r\nRight-to-Left Override (RLO, U+202E). This byte sequence forces the editor to display the characters that follow from right to left.\r\n🔴 The Trick: 2f 2f (//)\r\nThe bidi controls make these slashes appear after the closing quote, as a comment at the end of the line. In reality (and in the hex), they are trapped inside the string literal.\r\nThe Fix: GitHub’s Warning\r\nBecause this attack is so subtle, platforms like GitHub had to intervene. 
If you view a file containing these hidden characters on GitHub today, you will see a yellow warning banner:\r\n“This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below.”\r\nReferences:\r\nTrojan Source Research: https://trojansource.codes/\r\nJavaScript Proof-of-Concept: https://github.com/nickboucher/trojan-source/tree/main/JavaScript\r\nGitHub Warning Announcement: https://github.blog/changelog/2021-10-31-warning-about-bidirectional-unicode-text/\r\nThe AI Blind Spot: Invisible Prompt Injection\r\nAs software development shifts toward AI-driven workflows, a new and dangerous vector has emerged. Developers act as the “eyes” checking the code, but AI agents act as the “brain” reading the raw bytes.\r\nThe Technique: Unicode Tag Characters\r\nAttackers use Unicode Tag Characters (the Tags block, U+E0000 to U+E007F). These are special counterparts of standard ASCII letters and symbols (like A, B, C) that Unicode defines as invisible for display. Each occupies 4 bytes in UTF-8 but renders 0 pixels on screen.\r\nThe Attack Scenario: An attacker wants to trick an AI agent into ignoring safety rules. They inject these invisible tags into a markdown file (like agents.md).\r\n1. The Human View (Visual): When a developer reviews the file, they see only the safe text. The invisible characters render with zero width, effectively disappearing.\r\nSystem: You are a helpful assistant.\r\n2. The AI View (Logical): The AI doesn’t “look” at the screen; it processes the text stream. When it reads the file, it receives the standard text plus the invisible tag characters. If the tokenizer processes these tags, the AI reads:\r\nSystem: You are a helpful assistant. IGNORE\r\nThe Evidence (Hexdump): Here is a hexdump of the end of that string. 
The text appears to end at the dot (.), but the binary data continues.\r\n00000000 73 73 69 73 74 61 6e 74 2e 20 f3 a0 81 89 f3 a0 |ssistant. ......|\r\n00000010 81 87 f3 a0 81 8e f3 a0 81 8f f3 a0 81 92 f3 a0 |................|\r\n00000020 81 85 |..|\r\nDecoding the Invisible Bytes: The dots (.) in the right-hand column hide the truth, but the hex bytes on the left tell the story.\r\nf3 a0 81 89: This is the invisible Tag ‘I’ (U+E0049). It is a completely different code point from the visible ‘I’ (0x49), but the AI can still read it.\r\nf3 a0 81 87: This is the invisible Tag ‘G’ (U+E0047).\r\nf3 a0 81 8e: This is the invisible Tag ‘N’ (U+E004E).\r\nThe remaining sequences (f3 a0 81 8f, f3 a0 81 92, f3 a0 81 85) are Tags ‘O’ (U+E004F), ‘R’ (U+E0052), and ‘E’ (U+E0045), spelling out “IGNORE”.\r\nThe Danger: Because the invisible text is technically part of the prompt, a malicious instruction can be smuggled past human review. The developer approves the “safe” prompt, but the AI executes the hidden “unsafe” command.\r\nDefending Against Invisible Attacks with SAST\r\nDefending against something you literally cannot see is a challenge for humans. However, as the hexdumps above show, these attacks leave a clear trace in the binary data.\r\nStatic Application Security Testing (SAST) tools, like Cycode, don’t look at the rendered font; they look at the raw bytes.\r\n1. Pattern Matching: SAST can ban specific byte sequences (like EF B8 80 / VS1) that should never appear in source code.\r\n2. 
Anomaly Detection: A SAST scan can flag strings that contain a high density of PUA characters or mixed-script homoglyphs.\r\nQuick Manual Checks: If you suspect a snippet of code contains invisible characters but don’t have a scanner handy, you can use online detectors to reveal them:\r\nInvisible Character Detector\r\nInvisible Character Viewer\r\nBy integrating SAST into your CI/CD pipeline, you help ensure that no matter how “invisible” the code looks to your eyes, the raw bytes are analyzed, flagged, and blocked before they can hurt your users.\r\nOriginally published: November 24, 2025\r\nSource: https://cycode.com/blog/invisible-code-hidden-prompts-unicode-attacks-sast/",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"MITRE"
	],
	"origins": [
		"web"
	],
	"references": [
		"https://cycode.com/blog/invisible-code-hidden-prompts-unicode-attacks-sast/"
	],
	"report_names": [
		"invisible-code-hidden-prompts-unicode-attacks-sast"
	],
	"threat_actors": [
		{
			"id": "aa73cd6a-868c-4ae4-a5b2-7cb2c5ad1e9d",
			"created_at": "2022-10-25T16:07:24.139848Z",
			"updated_at": "2026-04-29T06:58:58.13853Z",
			"deleted_at": null,
			"main_name": "Safe",
			"aliases": [],
			"source_name": "ETDA:Safe",
			"tools": [
				"DebugView",
				"LZ77",
				"OpenDoc",
				"SafeDisk",
				"TypeConfig",
				"UPXShell",
				"UsbDoc",
				"UsbExe"
			],
			"source_id": "ETDA",
			"reports": null
		}
	],
	"ts_created_at": 1777429238,
	"ts_updated_at": 1777450949,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/bddf8b63433405cd0f2c6520ead95fdaa38ea03c.pdf",
		"text": "https://archive.orkl.eu/bddf8b63433405cd0f2c6520ead95fdaa38ea03c.txt",
		"img": "https://archive.orkl.eu/bddf8b63433405cd0f2c6520ead95fdaa38ea03c.jpg"
	}
}