{
	"id": "b4dbd31d-433c-42bc-baca-0e18c8b2341d",
	"created_at": "2026-04-06T01:30:48.213121Z",
	"updated_at": "2026-04-10T03:20:53.92613Z",
	"deleted_at": null,
	"sha1_hash": "402175ba078ba7df042405edab08577ea612b6f7",
	"title": "AI-Powered Voice Spoofing for Next-Gen Vishing Attacks",
	"llm_title": "",
	"authors": "Emily Astranova, Pascal Issa",
	"file_creation_date": "0001-01-01T00:00:00Z",
	"file_modification_date": "0001-01-01T00:00:00Z",
	"file_size": 54542,
	"plain_text": "AI-Powered Voice Spoofing for Next-Gen Vishing Attacks\r\nBy Mandiant\r\nPublished: 2024-07-23 · Archived: 2026-04-06 01:27:51 UTC\r\nWritten by: Emily Astranova, Pascal Issa\r\nExecutive Summary\r\nAI-powered voice cloning can now mimic human speech with uncanny precision, creating more\r\nrealistic phishing schemes. \r\nAccording to news reports, scammers have leveraged voice cloning and deepfakes to steal over HK$200\r\nmillion from an organization.\r\nAttackers can use AI-powered voice cloning in various phases of the attack lifecycle, including initial\r\naccess, lateral movement, and privilege escalation.\r\nMandiant's Red Team uses AI-powered voice spoofing to test defenses, demonstrating the effectiveness of\r\nthis increasingly sophisticated attack technique.\r\nOrganizations can take steps to defend against this threat by educating employees and using source\r\nverification such as code words. \r\nIntroduction\r\nLast year, Mandiant published a blog post on threat actor use of generative AI, exploring how attackers were using\r\ngenerative AI (gen AI) in phishing campaigns and information operations (IO), notably to craft more convincing\r\ncontent such as images and videos. We also shared insights into attackers' use of large language models (LLMs) to\r\ndevelop malware. In the post, we emphasized that while attackers are interested in gen AI, use has remained\r\nrelatively limited.\r\nThis post builds on that initial research, diving into some new AI tactics, techniques, and procedures (TTPs)\r\nand trends. We take a look at AI-powered voice spoofing, demonstrate how Mandiant red teams use it to test\r\ndefenses, and provide security considerations to help stay ahead of the threat.\r\nGrowing AI-Powered Voice Spoofing Threat\r\nGone are the days of robotic scammers with barely decipherable scripts. 
AI-powered voice cloning can now\r\nmimic human speech with uncanny precision, injecting a potent dose of realism into phishing schemes. We are\r\nreading more stories on this threat in the news, such as the scammers that reportedly stole over HK$200 million\r\nfrom a company using voice cloning and deepfakes, and now the Mandiant Red Team has incorporated these\r\nTTPs when testing defenses. \r\nBrief Overview of Vishing\r\nhttps://cloud.google.com/blog/topics/threat-intelligence/ai-powered-voice-spoofing-vishing-attacks\r\nPage 1 of 5\n\nUnlike its traditionally email-based counterpart, vishing (voice phishing) uses a voice-based approach. Rather than\r\nsending out an email with the hopes of garnering clicks, threat actors will instead place phone calls directly to\r\nindividuals in order to earn trust and manipulate emotions, often by creating a sense of urgency. \r\nLike traditional phishing, a threat actor's goal is to deceive individuals into divulging sensitive information,\r\ninitiating malicious actions, or transferring funds using social engineering tactics. These deceptive calls often\r\nimpersonate trustworthy entities such as banks, government agencies, or tech support, adding an extra layer of\r\nauthenticity to the scam.\r\nThe rise of powerful AI tools such as text generators, image creators, and voice synthesizers has sparked a wave of\r\nopen-source projects, making these technologies more accessible than ever before. This rapid development is\r\nputting the power of AI into the hands of a wider audience, fueling the potential for more convincing vishing\r\nattacks.\r\nAI-Powered Voice Spoofing in the Attack Lifecycle\r\nModern voice cloning involves recording and processing audio and training a model. 
Training the model relies on\r\na powerful combination of open-source libraries and algorithms, of which there are many popular choices today.\r\nOnce these initial steps are completed, attackers may take additional time to understand the speech patterns of the\r\nindividual being impersonated, and even write a script before conducting operations. This adds an extra\r\nlayer of authenticity and makes the attack more likely to succeed.\r\nNext, attackers may use AI-powered voice spoofing in different stages of the attack lifecycle.\r\nInitial Access\r\nThere are various ways a threat actor can gain initial access using a spoofed voice. Threat actors can impersonate\r\nexecutives, colleagues, or even IT support personnel to trick victims into revealing confidential information,\r\ngranting remote access to systems, or transferring funds. The inherent trust associated with a familiar voice can be\r\nexploited to manipulate victims into taking actions they would not normally take, such as clicking on malicious\r\nlinks, downloading malware, or divulging sensitive data. Although voice-based trust systems are seldom used, AI-spoofed voices can also potentially bypass voice-based authentication systems used for multi-factor authentication\r\nor password resets, granting unauthorized access to critical accounts.\r\nLateral Movement and Privilege Escalation\r\nThreat actors can leverage AI voice spoofing to hop from system to system, impersonating trusted individuals to\r\nmanipulate their way to higher access levels. There are a few ways this may unfold.\r\nOne method of lateral movement is chaining impersonations. Imagine an attacker initially gaining access by\r\nimpersonating a helpdesk employee. After establishing communication with a network administrator, the attacker\r\ncould subtly record the administrator's voice during the interaction. 
This captured audio can then be used to train a\r\nnew AI voice spoofing model, allowing the attacker to seamlessly impersonate the administrator and initiate\r\ncommunication with other unsuspecting targets within the network. This chaining of impersonations enables the\r\nattacker to move laterally, potentially gaining access to more sensitive systems and data.\r\nAnother method can occur during the initial access phase: threat actors might discover readily available voice recordings\r\non a compromised host, such as voicemails, meeting recordings, or even training materials. These recordings can\r\nbe leveraged to train AI voice-spoofing models, allowing the attacker to impersonate specific individuals within\r\nthe organization without needing to interact with them directly. This can be particularly effective for targeting\r\nhigh-value individuals or bypassing systems that rely on voice biometrics for access control.\r\nMandiant Red Team Proactive Case Study\r\nIn late 2023, Mandiant conducted a controlled red team exercise with a client, using AI voice spoofing to gain\r\ninitial access to their internal network. This case study highlights the effectiveness of this increasingly\r\nsophisticated attack technique.\r\nThe exercise began with obtaining client consent and crafting a custom, realistic social engineering pretext. The\r\nRed Team opted to impersonate a member of the client's security team, requiring a natural voice sample. After the pretext was reviewed with the client, the client provided explicit permission to use their voice for this exercise.\r\nNext, we obtained the necessary audio data to train a model, and achieved a passable level of realism. Open-source intelligence (OSINT) played a crucial role in the next phase. 
By gathering employee data (job titles,\r\nlocations, phone numbers), the Red Team identified potential targets most likely to recognize the impersonated\r\nvoice and possess the necessary permissions for our objectives. With a curated target list, the team initiated\r\ncalls via VoIP services with number spoofing.\r\nAfter facing voicemail greetings and other initial hurdles, the first unsuspecting victim answered with a trusting\r\n\"Hey boss, what's up?\". The Red Team had reached a security administrator who reported to the person whose\r\nvoice was spoofed. Leveraging the pretext of a \"VPN client misconfiguration,\" the Red Team exploited the\r\nopportune timing of a recent global outage impacting the client's VPN provider. This carefully chosen scenario\r\ninstilled a sense of urgency and increased the victim's susceptibility to our instructions.\r\nDue to the trust in the voice on the phone, the victim bypassed security prompts from both Microsoft Edge and\r\nWindows Defender SmartScreen, unknowingly downloading and executing a pre-prepared malicious payload onto\r\ntheir workstation. The successful detonation of the payload marked the completion of the exercise, showcasing the\r\nalarming ease with which AI voice spoofing can facilitate the breach of an organization.\r\nSecurity Considerations\r\nThis type of exploitation is social in nature, and technical detection controls are currently limited. Available\r\nmitigations center on three major principles: awareness, source verification, and future technical\r\nconsiderations.\r\nAwareness\r\nEducate employees, particularly those who control money and access, on the existence and methodologies of AI\r\nvishing attacks. Consider adding AI-enhanced threats to security awareness training. 
With such effective and\r\naccessible mimicry available to threat actors, everyone should now adopt a healthy dose of skepticism when\r\ndealing with phone calls, especially if they fall under one or more of the following cases:\r\nThe caller is saying things that sound too good to be true.\r\nThe call is from an untrusted number/entity.\r\nThe caller tries to assert questionable authority.\r\nThe caller is out of character for the source.\r\nEmployees in trusted positions should be extremely wary of high-urgency calls that demand immediate action,\r\nespecially when the caller asks for or gives financial or access-oriented information, such as requesting a one-time\r\npassword (OTP). Employees should be empowered to hang up and report suspicious calls, especially if they\r\nbelieve AI vishing is involved. It is likely another employee is about to receive the same attack.\r\nSource Verification\r\nWhen possible, cross-reference the information with trusted sources. This includes hanging up and calling back at\r\na number previously validated for the source. The caller can be asked to send a text message from a previously\r\nvalidated number, or to send an email or an enterprise chat message.\r\nTrain employees to spot audio inconsistencies, such as sudden variations in background noise, which could be a\r\nsymptom of the threat actor not spending enough time cleaning the audio. Look for unusual speech patterns, like a\r\ncompletely different vernacular than what the source typically uses. Watch for unnatural inflections, fillers not\r\ncommonly used by the source, strange clicks, pauses, or abnormal repetition. Pay attention to voice timbre (tone)\r\nand cadence as well.\r\nEstablish code words for executives and critical staff who deal with sensitive and/or financial information. 
Do this\r\nout of band so there is no trace within the enterprise, limiting exposure in case of a breach. The code words can\r\nthen be used to validate individuals in case of doubt.\r\nIf possible, let unknown numbers go to voicemail. Apply the same vigilance to voice calls that you would\r\notherwise apply to emails. Report any suspicious calls for wider awareness.\r\nFuture Technical Considerations\r\nToday, organizations can, at best, implement traditional security measures to protect audio conversations within\r\nthe organization, like using separate networks for VoIP channels as well as implementing authentication and\r\ntransmission encryption for those channels. However, this does not address attacks made against employees' personal\r\nphones.\r\nGoing forward, organizations should consider protecting all audio assets, implementing technologies such as\r\ndigital watermarking that are subtle enough to be imperceptible to the human ear, but easily detected by AI\r\ntechnologies.\r\nEventually, mobile device management tools will offer technologies to help verify callers. In the meantime,\r\norganizations should consider requiring all sensitive conversations to occur over enterprise chat channels, where\r\nstrong authentication is required, and identities are not easily spoofed.\r\nResearch and tools are actively being developed to help detect deepfakes. While they have inconsistent\r\naccuracy today, they can still provide value in identifying deepfakes in voicemail or offline voice notes. The\r\ndetection capabilities will improve over time and eventually be adopted into supportable enterprise tooling. 
For\r\nadditional reading, consider the active research into real-time detection, such as DF-Captcha, which\r\nproposes a simple challenge-response application that uses human prompts to validate the\r\nidentity of the party on the other end of the line.\r\nConclusion\r\nIn this blog post, we explored how modern AI tools can help create more convincing vishing attacks. The alarming\r\nsuccess of Mandiant's vishing exercise underscores the urgent need for heightened security measures against AI voice-spoofing attacks. While technology offers powerful tools for both attackers and defenders, the human element\r\nremains the critical vulnerability. The case study we shared should serve as a wake-up call, urging organizations\r\nand individuals alike to take proactive steps.\r\nMandiant started leveraging AI voice-spoofing attacks in its more complex Red Team Assessments and Social\r\nEngineering Assessments to demonstrate the impact such an attack could have on an organization. As threat actors'\r\nuse of this technique increases in frequency, it is imperative that defenders plan and take precautions.\r\nPosted in\r\nThreat Intelligence\r\nSource: https://cloud.google.com/blog/topics/threat-intelligence/ai-powered-voice-spoofing-vishing-attacks",
	"extraction_quality": 1,
	"language": "EN",
	"sources": [
		"MITRE"
	],
	"references": [
		"https://cloud.google.com/blog/topics/threat-intelligence/ai-powered-voice-spoofing-vishing-attacks"
	],
	"report_names": [
		"ai-powered-voice-spoofing-vishing-attacks"
	],
	"threat_actors": [],
	"ts_created_at": 1775439048,
	"ts_updated_at": 1775791253,
	"ts_creation_date": 0,
	"ts_modification_date": 0,
	"files": {
		"pdf": "https://archive.orkl.eu/402175ba078ba7df042405edab08577ea612b6f7.pdf",
		"text": "https://archive.orkl.eu/402175ba078ba7df042405edab08577ea612b6f7.txt",
		"img": "https://archive.orkl.eu/402175ba078ba7df042405edab08577ea612b6f7.jpg"
	}
}