The Terrifying A.I. Scam That Uses Your Loved One’s Voice

A Brooklyn couple got a call from relatives who were being held for ransom. Their voices—like many others these days—had been cloned.
Illustration by Ariel Davis

On a recent night, a woman named Robin was asleep next to her husband, Steve, in their Brooklyn home, when her phone buzzed on the bedside table. Robin is in her mid-thirties with long, dirty-blond hair. She works as an interior designer, specializing in luxury homes. The couple had gone out to a natural-wine bar in Cobble Hill that evening, and had come home a few hours earlier and gone to bed. Their two young children were asleep in bedrooms down the hall. “I’m always, like, kind of one ear awake,” Robin told me, recently. When her phone rang, she opened her eyes and looked at the caller I.D. It was her mother-in-law, Mona, who never called after midnight. “I’m, like, maybe it’s a butt-dial,” Robin said. “So I ignore it, and I try to roll over and go back to bed. But then I see it pop up again.”

She picked up the phone, and, on the other end, she heard Mona’s voice wailing and repeating the words “I can’t do it, I can’t do it.” “I thought she was trying to tell me that some horrible tragic thing had happened,” Robin told me. Mona and her husband, Bob, are in their seventies. She’s a retired party planner, and he’s a dentist. They spend the warm months in Bethesda, Maryland, and winters in Boca Raton, where they play pickleball and canasta. Robin’s first thought was that there had been an accident. Robin’s parents also winter in Florida, and she pictured the four of them in a car wreck. “Your brain does weird things in the middle of the night,” she said. Robin then heard what sounded like Bob’s voice on the phone. (The family members requested that their names be changed to protect their privacy.) “Mona, pass me the phone,” Bob’s voice said, then, “Get Steve. Get Steve.” Robin took this—that they didn’t want to tell her while she was alone—as another sign of their seriousness. She shook Steve awake. “I think it’s your mom,” she told him. “I think she’s telling me something terrible happened.”

Steve, who has close-cropped hair and an athletic build, works in law enforcement. When he opened his eyes, he found Robin in a state of panic. “She was screaming,” he recalled. “I thought her whole family was dead.” When he took the phone, he heard a relaxed male voice—possibly Southern—on the other end of the line. “You’re not gonna call the police,” the man said. “You’re not gonna tell anybody. I’ve got a gun to your mom’s head, and I’m gonna blow her brains out if you don’t do exactly what I say.”

Steve used his own phone to call a colleague with experience in hostage negotiations. The colleague was muted, so that he could hear the call but wouldn’t be heard. “You hear this???” Steve texted him. “What should I do?” The colleague wrote back, “Taking notes. Keep talking.” The idea, Steve said, was to continue the conversation, delaying violence and trying to learn any useful information.

“I want to hear her voice,” Steve said to the man on the phone.

The man refused. “If you ask me that again, I’m gonna kill her,” he said. “Are you fucking crazy?”

“O.K.,” Steve said. “What do you want?”

The man demanded money for travel; he wanted five hundred dollars, sent through Venmo. “It was such an insanely small amount of money for a human being,” Steve recalled. “But also: I’m obviously gonna pay this.” Robin, listening in, reasoned that someone had broken into Steve’s parents’ home to hold them up for a little cash. On the phone, the man gave Steve a Venmo account to send the money to. It didn’t work, so he tried a few more, and eventually found one that did. The app asked what the transaction was for.

“Put in a pizza emoji,” the man said.

After Steve sent the five hundred dollars, the man patched in a female voice—a girlfriend, it seemed—who said that the money had come through, but that it wasn’t enough. Steve asked if his mother would be released, and the man got upset that he was bringing this up with the woman listening. “Whoa, whoa, whoa,” he said. “Baby, I’ll call you later.” The implication, to Steve, was that the woman didn’t know about the hostage situation. “That made it even more real,” Steve told me. The man then asked for an additional two hundred and fifty dollars to get a ticket for his girlfriend. “I’ve gotta get my baby mama down here to me,” he said. Steve sent the additional sum, and, when it processed, the man hung up.

By this time, about twenty-five minutes had elapsed. Robin cried and Steve spoke to his colleague. “You guys did great,” the colleague said. He told them to call Bob, since Mona’s phone was clearly compromised, to make sure that he and Mona were now safe. After a few tries, Bob picked up the phone and handed it to Mona. “Are you at home?” Steve and Robin asked her. “Are you O.K.?”

Mona sounded fine, but she was unsure of what they were talking about. “Yeah, I’m in bed,” she replied. “Why?”

Artificial intelligence is revolutionizing seemingly every aspect of our lives: medical diagnosis, weather forecasting, space exploration, and even mundane tasks like writing e-mails and searching the Internet. But with increased efficiencies and computational accuracy has come a Pandora’s box of trouble. Deepfake video content is proliferating across the Internet. The month after Russia invaded Ukraine, a video surfaced on social media in which Ukraine’s President, Volodymyr Zelensky, appeared to tell his troops to surrender. (He had not done so.) In early February of this year, Hong Kong police announced that a finance worker had been tricked into paying out twenty-five million dollars after taking part in a video conference with people he thought were members of his firm’s senior staff. (They were not.) Thanks to large language models like ChatGPT, phishing e-mails have grown increasingly sophisticated, too. Steve and Robin, meanwhile, fell victim to another new scam, which uses A.I. to replicate a loved one’s voice. “We’ve now passed through the uncanny valley,” Hany Farid, who studies generative A.I. and manipulated media at the University of California, Berkeley, told me. “I can now clone the voice of just about anybody and get them to say just about anything. And what you think would happen is exactly what’s happening.”

Robots aping human voices are not new, of course. In 1984, an Apple computer became one of the first that could read a text file in a tinny robotic voice of its own. “Hello, I’m Macintosh,” a squat machine announced to a live audience, at an unveiling with Steve Jobs. “It sure is great to get out of that bag.” The computer took potshots at Apple’s main competitor at the time, saying, “I’d like to share with you a maxim I thought of the first time I met an I.B.M. mainframe: never trust a computer you can’t lift.” In 2011, Apple released Siri; inspired by “Star Trek” ’s talking computers, the program could interpret precise commands—“Play Steely Dan,” say, or, “Call Mom”—and respond with a limited vocabulary. Three years later, Amazon released Alexa. Synthesized voices were cohabiting with us.

Still, until a few years ago, advances in synthetic voices had plateaued. They weren’t entirely convincing. “If I’m trying to create a better version of Siri or G.P.S., what I care about is naturalness,” Farid explained. “Does this sound like a human being and not like this creepy half-human, half-robot thing?” Replicating a specific voice is even harder. “Not only do I have to sound human,” Farid went on. “I have to sound like you.” In recent years, though, the problem began to benefit from more money, more data—importantly, troves of voice recordings online—and breakthroughs in the underlying software used for generating speech. In 2019, this bore fruit: a Toronto-based A.I. company called Dessa cloned the podcaster Joe Rogan’s voice. (Rogan responded with “awe” and acceptance on Instagram, at the time, adding, “The future is gonna be really fucking weird, kids.”) But Dessa needed a lot of money and hundreds of hours of Rogan’s very available voice to make its product. Its success was a one-off.

In 2022, though, a New York-based company called ElevenLabs unveiled a service that produced impressive clones of virtually any voice quickly; breathing sounds had been incorporated, and more than two dozen languages could be cloned. ElevenLabs’s technology is now widely available. “You can just navigate to an app, pay five dollars a month, feed it forty-five seconds of someone’s voice, and then clone that voice,” Farid told me. The company is now valued at more than a billion dollars, and the rest of Big Tech is chasing closely behind. The designers of Microsoft’s Vall-E cloning program, which débuted last year, used sixty thousand hours of English-language audiobook narration from more than seven thousand speakers. Vall-E, which is not available to the public, can reportedly replicate the voice and “acoustic environment” of a speaker with just a three-second sample.
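In programmatic terms, the workflow Farid describes amounts to a two-step exchange with a cloning service: upload a short recording, receive an identifier for the new voice, then request arbitrary speech in that voice. The sketch below is a hypothetical illustration of that pattern in Python; the service address, endpoints, and field names are invented for the example, and are not ElevenLabs’s actual A.P.I.

```python
# Hypothetical sketch of the clone-then-speak workflow described above.
# The base URL, endpoints, and JSON fields are invented for illustration;
# they do not correspond to any real voice-cloning service's API.
import requests

API_KEY = "your-api-key"                       # placeholder credential
BASE = "https://voice-service.example.com/v1"  # placeholder service

# Step 1: upload roughly forty-five seconds of recorded speech and get
# back an identifier for the cloned voice.
with open("sample_45_seconds.mp3", "rb") as sample:
    resp = requests.post(
        f"{BASE}/voices",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"audio": sample},
        data={"name": "cloned-voice"},
    )
    resp.raise_for_status()
voice_id = resp.json()["voice_id"]

# Step 2: ask the service to speak new text in that voice, and save the
# synthesized audio to disk.
resp = requests.post(
    f"{BASE}/speech/{voice_id}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"text": "Hello. This is a test of a cloned voice."},
)
resp.raise_for_status()
with open("cloned_output.mp3", "wb") as out:
    out.write(resp.content)
```

The point of the sketch is how little the process demands: a brief recording, a few lines of code, and a small monthly fee.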

Voice-cloning technology has undoubtedly improved some lives. The Voice Keeper is among a handful of companies that are now “banking” the voices of those suffering from voice-depriving diseases like A.L.S., Parkinson’s, and throat cancer, so that, later, they can continue speaking with their own voice through text-to-speech software. A South Korean company recently launched what it describes as the first “AI memorial service,” which allows people to “live in the cloud” after their deaths and “speak” to future generations. The company suggests that this can “alleviate the pain of the death of your loved ones.” The technology has other legal, if less altruistic, applications. Celebrities can use voice-cloning programs to “loan” their voices to record advertisements and other content: the College Football Hall of Famer Keith Byars, for example, recently let a chicken chain in Ohio use a clone of his voice to take orders. The film industry has also benefitted. Actors in films can now “speak” other languages—English, say, when a foreign movie is released in the U.S. “That means no more subtitles, and no more dubbing,” Farid said. “Everybody can speak whatever language you want.” Multiple publications, including The New Yorker, use ElevenLabs to offer audio narrations of stories. Last year, New York’s mayor, Eric Adams, sent out A.I.-enabled robocalls in Mandarin and Yiddish—languages he does not speak. (Privacy advocates called this a “creepy vanity project.”)

But, more often, the technology seems to be used for nefarious purposes, like fraud. This has become easier now that TikTok, YouTube, and Instagram store endless videos of regular people talking. “It’s simple,” Farid explained. “You take thirty or sixty seconds of a kid’s voice and log in to ElevenLabs, and pretty soon Grandma’s getting a call in Grandson’s voice saying, ‘Grandma, I’m in trouble, I’ve been in an accident.’ ” A financial request is almost always the end game. Farid went on, “And here’s the thing: the bad guy can fail ninety-nine per cent of the time, and they will still become very, very rich. It’s a numbers game.” The prevalence of these illegal efforts is difficult to measure, but, anecdotally, they’ve been on the rise for a few years. In 2020, a corporate attorney in Philadelphia took a call from what he thought was his son, who said he had been injured in a car wreck involving a pregnant woman and needed nine thousand dollars to post bail. (He found out it was a scam when his daughter-in-law called his son’s office, where he was safely at work.) In January, voters in New Hampshire received a robocall in Joe Biden’s voice telling them not to vote in the primary. (The man who admitted to generating the call said that he had used ElevenLabs software.) “I didn’t think about it at the time that it wasn’t his real voice,” an elderly Democrat in New Hampshire told the Associated Press. “That’s how convincing it was.”

Predictably, technology has outstripped regulation. Current copyright laws don’t protect a person’s voice. “A key question is whether authentication tools can keep up with advances in deepfake synthesis,” Senator Jon Ossoff, of Georgia, who chaired a Senate Judiciary Committee hearing on the matter last year, told me. “Can we get good enough fast enough at discerning real from fake, or will we lose the ability to verify the authenticity of voices, images, video, and other media?” He described the matter as an “urgent” one for lawmakers. In January, a bipartisan group introduced the QUIET Act, which would increase penalties for those who use A.I. to impersonate people. In Arizona, a state senator introduced a bill that would designate A.I. as a weapon when used in conjunction with a crime, also allowing lengthier sentences.

The Federal Trade Commission, which investigates consumer fraud, reported that Americans lost more than two and a half billion dollars to impostor scams of various kinds in 2022. Last year, the F.T.C. put out a voice-cloning advisory, noting, “If the caller says to wire money, send cryptocurrency, or buy gift cards and give them the card numbers and PINs, those could be signs of a scam.” But it, too, has not yet created any guidelines for the use of voice-cloning technology. Even if laws are enacted, policing them will be exceedingly difficult. Scammers can use encrypted apps to execute their schemes, and calls are completed in minutes. “By the time you get there, the scam is over, and everybody’s moved on,” Farid said.

A decade ago, the F.T.C. sponsored a competition to counter the rise of robocalls, and one of its winners went on to create Nomorobo, a call-blocking service that has helped to reduce—but not eliminate—the phenomenon. Late last year, the commission offered a twenty-five-thousand-dollar prize for the development of new ways to protect consumers from voice cloning. It received around seventy-five submissions, which focus on prevention, authentication, and real-time detection. Some of the submissions use artificial intelligence, while others rely on metadata or watermarking. (Judging will be completed by April.) Will Maxson, who is managing the F.T.C.’s challenge, told me, “We’re hoping we’ll spur some innovators to come up with products and services that will help reduce this new threat.” But it’s not at all clear how effective they will be. “There are no silver bullets,” he acknowledged.

A few months ago, Farid, the Berkeley professor, participated in a Zoom call with Barack Obama. The former President was interested, he said, in learning about generative A.I. During the Zoom call, Farid found himself in an increasingly familiar online state of mind: doubt. “People have spent so much time trying to make deepfakes with Obama that I spent, like, the first ten minutes being, like, I don’t know, man, I don’t think this is him,” he said, laughing. In the end, he determined that it was the real Obama. Still, the experience was unnerving. “Shit’s getting weird,” he said.

One Friday last January, Jennifer DeStefano, who lives in Scottsdale, Arizona, got a call while walking into a dance studio where the younger of her two teen-age daughters, Aubrey, had just wrapped up a rehearsal. The caller I.D. read “unknown,” so DeStefano ignored it at first. Then she reconsidered: Briana, her older daughter, was on a ski trip up north, and, DeStefano thought, maybe something had happened. She took the call on speaker phone. “Mom, I messed up!” Briana’s voice said, sobbing in her uniquely controlled way. A man with a Spanish accent could be heard telling her, “Lay down and put your head back.” Then Briana said, “Mom, these bad men have me. Help me, help me, help me.” One of the men took the phone, as Briana sobbed and pleaded in the background. “I have your daughter,” he said. “If you seek any help from anyone, I’ll pump her stomach so full of drugs.” He’d have his way with her, he continued, and then he’d leave her for dead.

DeStefano ran into the dance studio and screamed for help. Three other mothers responded: one called 911, one called DeStefano’s husband, and one sat with DeStefano while she talked on the phone. First, the man demanded a million dollars, but DeStefano said that wasn’t possible, so he lowered the sum to fifty thousand. As they discussed how to get the money to him, the mother who’d called 911 came back inside and said that she’d learned that the call might be a scam. DeStefano, who considers herself “pretty savvy,” was unconvinced. “I talked to her,” DeStefano replied. She continued speaking to the man, who decided that he wanted to arrange a physical pickup of the money: a white van would meet DeStefano somewhere; someone would put a bag over her head and bring her to him. She recalled, “He said I had better have all the cash, or else we were both dead.”

Soon, though, the second mother hurried over. She had located DeStefano’s husband, who confirmed that he was with Briana. DeStefano eventually got ahold of her older daughter. “I have no idea what’s going on, or what you’re talking about,” Briana told her. “I’m with Dad.” Eventually, DeStefano returned to her phone call. “I called the guys out for being the lowest of the low,” DeStefano said. “I used vulgar words. Then I just hung up.”

DeStefano went public with her experience, eventually testifying about it before the Senate Judiciary Committee. Other victims reached out. Another mother at the dance studio had a cousin who’d been scammed just two weeks earlier. “The call came in from her daughter’s phone, and she actually sent fifteen hundred dollars,” DeStefano said. She told me that a friend had received a call from what sounded like her nine-year-old son: “He’d been kidnapped, he said. But she’d just tucked him in bed after reading a story, so she knew it wasn’t true.”

RaeLee Jorgensen, a thirty-four-year-old teacher’s aide, contacted DeStefano. Last April, while waiting for her two youngest children to get out of school, she received a phone call from her oldest son’s number. “Hey, Mom,” her fourteen-year-old son’s voice said. “This is Tate.” He was using his family nickname. “And it was his voice,” Jorgensen told me. “But I could tell something was wrong. I asked what it was.” Then another voice said, “I have your son and I’m going to shoot him in the head.” Jorgensen panicked and hung up. Ten minutes later, she received confirmation from Tate’s school that her son was safe, and now sitting in the principal’s office. Even DeStefano’s mother received a scam call. Months before DeStefano’s ordeal, someone had called her mother claiming to be DeStefano’s brother, and asking for money to pay a hospital bill related to a car accident. But DeStefano’s mother could sense that something was off. “She’s hard of hearing, but she’s still sharp,” DeStefano said. “She hung up.”

Robin and Steve, in Brooklyn, eventually got their money back from Venmo. Today, they’re able to joke about some aspects of the ordeal: the pizza-emoji instruction, for example. “But we told everyone we knew to be aware of this very sophisticated thing,” Robin said. The family has created a plan for the next time. “It doesn’t seem like this scam is going to stop anytime soon,” Robin told me. “So we came up with an extended-family password. If one of us is in trouble, others can verify that it’s really them.” When I recently called up Mona, her mother-in-law, though, she confessed that she’d already forgotten the family password: “I’m going to have to go over it.” She added that it took her a while to accept one aspect of the call. “Seven hundred and fifty dollars,” she said. “I still can’t believe that’s all I was worth.” ♦