Digital deception: the rise of AI voice manipulation in virtual kidnapping

Richard van Hooijdonk
Using biometric information from public platforms allows cybercriminals to replicate voices with alarming accuracy and use them in virtual kidnappings.
  • Global survey sheds light on the alarming efficacy of AI voice cloning scams
  • Insights into virtual kidnapping tactics
  • The rise of generative AI in voice impersonation cyberattacks
  • Real-life cases of virtual kidnappings
  • What will the future of cyber kidnapping look like?

Even though AI and machine learning tools improve numerous aspects of our daily lives, streamline intricate tasks, and reshape entire industries, we need to acknowledge that they also come with challenges and darker implications. These advanced tools, built with the intention of improving productivity and quality of life, are increasingly falling into the hands of nefarious individuals. Cybercriminals, always adept at leveraging emerging technologies, have begun to harness the power of AI to deceive, impersonate, and exploit unsuspecting individuals. A glaring example came to light when the Federal Bureau of Investigation (FBI) issued a warning highlighting an alarming trend: cyber offenders using deepfake technology (AI's ability to forge hyper-realistic images and sounds) to manipulate everyday photos and videos for sextortion purposes. These deceptive and distressing schemes have proven immensely profitable for cybercriminals.

One of the dark avenues that malicious actors have ventured into is the domain of AI-generated deepfake audio, also known as voice cloning. By scraping even the tiniest bit of biometric data from social media platforms, and even official government portals, cybercriminals can now recreate voices with unsettling precision. Tools like VoiceLab are readily available for synthesising these deepfake voices, which can be nearly indistinguishable from the real thing. These authentic-sounding voice clones have unlocked new doors for extortionists, who fabricate distressing scenarios, often borrowing heavily from movie narratives, to make it seem as if someone's loved one is in grave peril. A rising tide of incidents shows cybercriminals playing back such voice clones to unsuspecting victims, coercing them into transferring vast sums as ransom. According to the Federal Trade Commission, impersonation scams were responsible for staggering losses of $2.6 billion in 2022 alone. This emergence of AI-driven cyber threats forces us to reconsider the balance between innovation and its potential misuse in our digital age.

“Cybercriminals create the kind of messages you might expect. Ones full of urgency and distress. They will use the cloning tool to impersonate a victim’s friend or family member with a voice message that says they’ve been in a car accident, or maybe that they’ve been robbed or injured. Either way, the bogus message often says they need money right away”.

Amy Bunn, chief communications officer at McAfee

Global survey sheds light on the alarming efficacy of AI voice cloning scams

A recent study conducted by cybersecurity giant McAfee, involving 7,000 participants from across the globe, found that a startling one in four people had either fallen victim to such scams or knew someone to whom it had happened. The modus operandi is frighteningly simple: armed with a brief audio snippet, these scammers can generate a nearly indistinguishable replica of any voice, subsequently deploying these voice clones to send voicemails or voice-text messages. Even accents from diverse regions across the world can now be replicated with astonishing accuracy, a testament to the advancements in voice imitation technologies. However, individuals who possess a unique rhythm or cadence in their speech, or those who have distinctive verbal idiosyncrasies, present a more complex challenge for these systems. The McAfee study underscores a significant vulnerability among the public, with a whopping 70 per cent admitting their inability to confidently discern between an authentic voice and an AI-generated version.

Some 53 per cent of adults share their voice data online at least once a week, and nearly half (49 per cent) do so as often as ten times a week, presenting a vast pool of data that can be used for cyber exploitation.

McAfee

Disturbingly, the survey highlighted that one in ten respondents had encountered an AI voice clone attempting to deceive them. Even more concerning was the revelation that of these individuals, a staggering 77 per cent ended up losing money to these sophisticated scams. Delving deeper into the financial impact, over a third (36 per cent) reported losses of up to $3,000, while a significant seven per cent suffered losses of between $5,000 and $15,000. The research further underscored the ease with which cybercriminals can access genuine voice recordings to create convincing clones. With more than half (53 per cent) of adults sharing their voice data online at least once a week, and nearly half (49 per cent) doing so as often as ten times a week, there is a vast pool of data that can be used for cyber exploitation. Voice data is typically shared via podcasts, reels on social media, or video posts on YouTube. Young people and public figures, often at the forefront of embracing new tech and rapidly growing social media platforms, are increasingly vulnerable to unauthorised biometric data collection for potential use in virtual abduction schemes. Platforms like Instagram, Facebook, and TikTok provide an easy way for culprits to identify potential targets and gather specific information, enhancing the authenticity of their deceptive acts.

The convenience of voice-activated technology further exacerbates potential security vulnerabilities, underscoring the urgent need for greater public awareness and robust countermeasures against these and similar cyber threats. “Cybercriminals create the kind of messages you might expect. Ones full of urgency and distress. They will use the cloning tool to impersonate a victim’s friend or family member with a voice message that says they’ve been in a car accident, or maybe that they’ve been robbed or injured. Either way, the bogus message often says they need money right away”, says Amy Bunn, chief communications officer at McAfee. The survey also indicated that almost half of the participants admitted to responding to voicemails or voice messages that seemed to originate from a close friend or family member, particularly if the voice appeared to be that of a partner, parent, or child. These cybercriminals craft convincing narratives in which the caller appears to be in dire circumstances: perhaps they’ve been the victim of a theft, been involved in a car accident, misplaced their wallet, or suffered some other misfortune. The urgency of these made-up scenarios clouds the listener’s judgement and compels them to act quickly. Making matters worse, these cyber crooks frequently demand payment through channels that are notoriously difficult to trace or reverse, such as traditional wire transfers, rechargeable debit cards, gift cards, and cryptocurrencies.

Using advanced AI-driven software, criminals can replicate the victim’s voice using sound bites from dramatic movie scenes, creating an uncanny and distressing fake scenario.

Insights into virtual kidnapping tactics

The general blueprint of a virtual kidnapping scheme typically consists of meticulously planned phases. Initially, perpetrators focus on identifying someone with the financial capability to pay a substantial ransom. In many cases, this person is closely related to the intended victim, amplifying the emotional impact of the situation. Once the person who’ll be virtually ‘abducted’ is selected, the criminals craft a compelling and emotionally charged narrative designed to overwhelm and confuse the individual who will be pressured into paying the ransom. And fear is a potent manipulator. When someone is terrified, rational thinking can be short-circuited, leading to impulsive actions without thorough contemplation. As technology continues to evolve, so does the toolbox of these cybercriminals. Modern-day cyber kidnappers often use sophisticated tools to make their schemes more believable. One method involves scouring the victim’s social media profiles for voice recordings. But in instances where higher levels of deception are necessary, deepfake technology comes into play. By utilising this advanced AI-driven software, criminals can replicate the victim’s voice using sound bites from dramatic movie scenes, creating an uncanny and distressing fake scenario. 

And of course, timing is everything. The criminals keep a close eye on the direct victim’s online presence, waiting for the perfect window of opportunity. Their aim is to strike when the direct victim is away from the person who needs to pay the ransom, making rapid verification more complicated. As they make the pivotal ransom call, they often employ voice modulation software, altering their voice to make their demands appear even more harrowing. Amplifying this effect, they might play those distressing, manipulated voice recordings in the background, making their false narrative hard to question. Cybercriminals sometimes also use a tactic known as SIM jacking to gain control over the ‘kidnapped’ person’s phone number. This enables them to redirect all the calls and messages to their own device, leaving the actual owner of the number cut off from communication. In virtual kidnapping scenarios, this makes it seem like the ‘victim’ is truly unreachable, thereby increasing the chances that the worried party will pay the ransom. Finally, after successfully extorting the ransom, they turn their attention to covering their tracks. Funds are often sent through intricate laundering processes to obscure their origins. Digital footprints are meticulously scrubbed from the internet, and any physical tools, such as disposable phones, are promptly disposed of. This methodical process ensures they remain as elusive shadows, ready to strike again.

The rise of generative AI in voice impersonation cyberattacks

As technology continues to advance, so do the tools available to cybercriminals. The latest wave of cyberattacks has seen attackers harness generative AI to create voice cloning scams that are alarmingly convincing. Beyond using AI for voice replication, there’s a growing trend where attackers use generative AI to compensate for their skill limitations and to streamline what would otherwise be labour-intensive stages of their attacks. Pinpointing victims traditionally demands sifting through extensive datasets. However, with generative AI tools like ChatGPT, culprits can now integrate vast volumes of prospective victim data, incorporating not just audio and visual details but also additional markers like geolocation, facilitated through API connections.

Information is forwarded to ChatGPT and subsequently relayed to the intended recipient. When the attacker gets the recipient’s feedback, they can refine their message using ChatGPT for enhanced efficacy. Such analytical insights can be sourced from numerous public platforms and can be refined according to expected profit margins and payout probabilities. This allows the attacker to adopt a risk-assessment strategy when picking targets, potentially amplifying the success rate and profitability of these cyberattacks. In the future — or even now, provided there’s significant funding — cybercriminals could even turn gen-AI-generated texts into audio using text-to-speech technologies. This would enable the perpetrator as well as the virtual kidnap victim (a voice clone of a real individual) to exist entirely in the digital realm. If these audio files are then disseminated through large-scale calling services, the reach and efficacy of virtual kidnapping could greatly increase. Such advancements underscore the double-edged sword of technology, illustrating how innovative methods could either bolster or hinder cybercriminals specialising in virtual abductions.

Real-life cases of virtual kidnappings

In a harrowing incident from April this year, Jennifer DeStefano, a resident of Arizona, received a chilling call from an unidentified individual claiming to have abducted her 15-year-old daughter. The kidnapper demanded a hefty ransom of $1 million, threatening heinous acts against the daughter, including drugging and raping her, should DeStefano not comply. What was particularly distressing was the familiar sound of her daughter’s voice pleading, crying, and screaming in the background. Although the caller refused to let her speak directly with her daughter, after some back and forth, the demanded amount was lowered to $50,000. Fortunately, before any payment was made, DeStefano was able to ascertain her daughter’s safety and realised the kidnapping was fictitious. Law enforcement was promptly alerted and recognised this as an increasingly common scam tactic.

Another real-life example of an unsuspecting person falling victim to a virtual kidnapping attempt is that of Larry Magid, a long-standing tech journalist and CEO of an online safety organisation. Despite his expertise in online and smartphone scams, Magid found himself caught up in an ordeal that began when he received a call from a number he thought was his wife’s. The voice of a woman crying, which he initially believed to be his wife’s, was followed by a man’s voice, posing first as a police officer and then revealing himself as a supposed drug cartel member. The scammer claimed to have kidnapped the journalist’s wife in San Francisco, the city she had travelled to that morning, and demanded a $5,000 ransom. While the scammer insisted on not involving anyone else, the journalist managed to silently dial 911 during the conversation, enabling the operator to listen in on the call and contact the local police. The scammer attempted to manipulate him further, urging him to get into his car, likely as a tactic to secure his compliance. After more than ten harrowing minutes, the scammer hung up, sensing that Magid would not be persuaded. Police officers, who had been dispatched to the journalist’s home and to locate his wife in San Francisco, confirmed that it had been a virtual kidnapping attempt. Although Magid’s wife was safe, the incident was deeply unsettling for both of them. Reflecting on what happened, the journalist found it troubling how convincing the caller was, with tactics mirroring those described on an FBI site dedicated to virtual kidnapping scams.

What will the future of cyber kidnapping look like?

In the near future, criminals may use advanced profiling technologies to identify vast lists of potential victims, allowing them to automate their schemes in much the same way businesses automate cold-calling. On the dark web, perpetrators can easily acquire SIM-jacking tools, credentials from data breaches, and intermediary services for illicit transactions. Virtual kidnapping is essentially an AI-powered scam that borrows elements from both legitimate marketing strategies and malicious phishing attempts. This new breed of AI-weaponised, emotion-driven extortion is evolving in ways reminiscent of the patterns observed in ransomware attacks. Unlike traditional scams, virtual kidnapping relies on audio and visual content to manipulate victims, content that is typically not monitored or policed by security software.

But as networks grow more advanced and acquire a deeper understanding of data contexts in the coming years, security mechanisms might soon use various forms of telemetry to identify and address these sophisticated data abuses. Systems that are ‘data-context-aware’ can essentially make decisions based on how pieces of data relate to each other. For instance, a multi-tiered system that recognises identity patterns might be capable of discerning whether the supposed virtual kidnapping victim’s phone is being used in a normal way (as detected by the device’s built-in accelerometer), something that would be improbable if he or she were genuinely kidnapped. Much like in other extortion schemes, victims who succumb to the demands and pay the ransom inadvertently encourage cybercriminals to continue launching attacks on other individuals. What’s more, the victim’s payment also leads to their details being added to a database of ‘lucrative targets’, which then gets sold to other criminals. It’s not hard to see how this leads to a perpetual cycle of cyber abuse.
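To make the idea of a data-context-aware check more concrete, here is a minimal, purely hypothetical Python sketch. It assumes a security service could query a few telemetry signals for the allegedly kidnapped person’s phone (recent accelerometer motion, screen unlocks, and whether the device is at a routine location) and combine them into a rough ‘normal use’ score. The signal names, weights, and threshold are illustrative assumptions, not any vendor’s actual API.

```python
from dataclasses import dataclass


@dataclass
class DeviceTelemetry:
    """Hypothetical telemetry snapshot for the allegedly kidnapped person's phone."""
    accelerometer_motion_score: float  # 0.0 (stationary) .. 1.0 (typical daily movement)
    screen_unlocks_last_hour: int      # successful unlocks in the past hour
    location_matches_routine: bool     # device is at a usual location (home, work, commute)


def normal_use_score(t: DeviceTelemetry) -> float:
    """Combine telemetry signals into a rough 0..1 score of 'phone is in normal use'."""
    score = 0.0
    score += 0.4 * t.accelerometer_motion_score
    score += 0.3 * min(t.screen_unlocks_last_hour, 5) / 5
    score += 0.3 * (1.0 if t.location_matches_routine else 0.0)
    return score


def flag_probable_virtual_kidnapping(t: DeviceTelemetry, threshold: float = 0.6) -> bool:
    """If the device appears to be in normal use, the 'kidnapping' is probably fake."""
    return normal_use_score(t) >= threshold


if __name__ == "__main__":
    snapshot = DeviceTelemetry(
        accelerometer_motion_score=0.8,
        screen_unlocks_last_hour=4,
        location_matches_routine=True,
    )
    if flag_probable_virtual_kidnapping(snapshot):
        print("Telemetry suggests the phone is in normal use; the ransom call is likely a scam.")
    else:
        print("Telemetry inconclusive; treat the call as a possible real emergency.")
```

The point of such a sketch is the design choice, not the specific numbers: combining several weak, independent context signals makes it harder for a scammer, who controls only the phone call, to fake a consistent picture of a genuine abduction.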

With the rise in virtual kidnapping scams, traditional cybercrime ransom strategies are evolving, integrating media that are increasingly difficult to police, such as voice, video, and even emerging virtual worlds like the metaverse. These sophisticated, high-context forms of communication are more complex than standard network-based security measures can manage. As a result, there’s a growing need for security techniques that are aware of user identities. As these scams proliferate, they’ll generate more data, and analysing this data can enhance security measures, potentially leading to more advanced, identity-conscious security tools.

While the sophistication of these scams presents significant challenges, it also creates opportunities. As cybercriminals’ tactics and use of technology evolve, so too will security solutions, driven by data analytics and advanced recognition systems. To curtail the rise of virtual kidnappings, stakeholders, including tech companies, law enforcement, and cybersecurity experts, will need to collaborate to develop proactive, data-driven defences. Staying one step ahead in this digital cat-and-mouse game will be crucial in safeguarding users in this increasingly digitised world.
