Software speaking in stars’ voices

NEW YORK — AT&T Labs will start selling speech software that it says is so good at reproducing the sounds, inflections and intonations of a human voice that it can re-create voices and even bring the voices of long-dead celebrities back to life.

The software, which turns printed text into synthesized speech, makes it possible for a company to use recordings of a person's voice to utter things that the person never said.

The software, called Natural Voices, is not flawless — its utterances still contain a few robotic tones and unnatural inflections. But some of those who have tested the technology say it is the first text-to-speech software to raise the specter of replicating a person's voice so perfectly that the human ear cannot tell the difference.

"If ABC wanted to use Regis Philbin's voice for all of its automated customer-service calls, it could," said Lawrence R. Rabiner, vice president for AT&T Labs Research.

Potential customers for the software, which is priced in the thousands of dollars, include telephone call centers, companies that make software that reads digital files aloud and makers of voice-activated devices.

Scientists say the technology is not yet good enough to perpetrate fraud.

To build the software that re-creates voices (which AT&T Labs is calling its "custom voice" product), a person must first go to a studio and record 10 to 40 hours of readings. The recordings are then chopped into fragments of sounds and sorted into databases. When the software processes a text, it retrieves the sounds and reassembles them to form new sentences.

In the case of long-dead celebrities, archival recordings could be used in the same way.
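
As a rough illustration of the chop, store, retrieve and reassemble process described above, here is a minimal Python sketch. It is not AT&T's implementation: the sound-unit labels, the `FRAGMENTS` database, the `LEXICON` and the `synthesize` function are all hypothetical, and short lists of integers stand in for recorded audio.

```python
from typing import Dict, List

# Hypothetical fragment database: each sound unit maps to the samples
# cut from a studio recording. Integers stand in for real audio data.
FRAGMENTS: Dict[str, List[int]] = {
    "HH": [1, 2],
    "EH": [3, 4, 5],
    "L": [6],
    "OW": [7, 8],
}

# Hypothetical lexicon: each word maps to the sound units it is built from.
LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "EH", "L", "OW"],
    "low": ["L", "OW"],
}


def synthesize(text: str) -> List[int]:
    """Look up each word's sound units and join their stored fragments."""
    samples: List[int] = []
    for word in text.lower().split():
        # A word missing from the lexicon raises KeyError in this sketch.
        for unit in LEXICON[word]:
            samples.extend(FRAGMENTS[unit])
    return samples


if __name__ == "__main__":
    # "low" was never recorded as a whole word; it is assembled from
    # fragments that also appear in "hello".
    print(synthesize("low hello"))
```

The sketch only shows why a voice can be made to say sentences its owner never recorded. A production system would also have to choose among many candidate fragments for each sound and smooth the joins between them, which this example does not attempt.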

Others, like IBM Research and Lernout & Hauspie, are also experimenting with this technique. It is a big step up, engineers say, from speech engines built from pre-recorded whole words. And it is also a vast improvement, some say, over the entirely computer-generated and therefore robotic sounds that are used in many versions of text-to-speech software on the market today.

Now aided by the declining cost and increasing speed of microprocessors, far smoother sentences are possible, Rabiner said.

Analysts at McKinsey & Co. have predicted that the market for text-to-speech software will reach more than $1 billion in the next five years.

AT&T Labs' speech technology will be the first product sold by the laboratory itself, which typically serves as a research and development division.