Apple Investigates New Technology to Stream Synthesized Speech Over the Net

Cupertino, CA. With uncompressed audio over the net weighing in at upwards of 500 kbits/second, and the most heavily compressed speech files moving at around 8 kbits/second, Apple Computer figures there’s room for a new bandwidth-saving approach to moving speech across the web.

Apple engineers think the answer may be SPIDI (Streamed Phoneme Intonation Description Interchange), which couples a MIDI-like data description protocol with Apple’s speech synthesis software technology. Their goal is to provide natural, human-like speech at a very low bandwidth for interactive multimedia distribution.

Rather than encoding audio data in terms of digital samples, SPIDI represents speech as a sequence of basic sound units, or phonemes, combined with information about the rhythm and inflection of these sounds. The result is a form of synthetic speech which captures the emotion and emphasis used by human speakers, but which takes up only about 0.8 kbits/second of transmission bandwidth.
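To make the idea concrete, here is a minimal Python sketch of how a stream of phoneme events might be packed and how its bit rate could be estimated. The actual SPIDI wire format is not described in this article, so everything here is an assumption made for illustration: the `PhonemeEvent` fields, the 5-byte record layout, and the helper names `encode`, `decode`, and `bitrate_kbits_per_second` are all hypothetical.

```python
import struct
from dataclasses import dataclass

# Hypothetical record layout (NOT the real SPIDI format):
# 1 byte phoneme index, 2 bytes duration in ms, 2 bytes pitch in Hz.
RECORD = struct.Struct(">BHH")  # big-endian, no padding, 5 bytes per event


@dataclass
class PhonemeEvent:
    phoneme: int      # index into an assumed phoneme table (0-255)
    duration_ms: int  # rhythm: how long the sound is held
    pitch_hz: int     # inflection: target pitch for the sound


def encode(events):
    """Pack a list of events into a compact byte string."""
    return b"".join(
        RECORD.pack(e.phoneme, e.duration_ms, e.pitch_hz) for e in events
    )


def decode(data):
    """Unpack a byte string produced by encode() back into events."""
    return [
        PhonemeEvent(*RECORD.unpack_from(data, offset))
        for offset in range(0, len(data), RECORD.size)
    ]


def bitrate_kbits_per_second(events):
    """Encoded bits divided by spoken milliseconds (bits/ms equals kbits/s)."""
    total_bits = len(encode(events)) * 8
    total_ms = sum(e.duration_ms for e in events)
    return total_bits / total_ms


# Ten made-up phonemes of 80 ms each, with a slowly rising pitch.
sample = [PhonemeEvent(phoneme=i, duration_ms=80, pitch_hz=120 + i) for i in range(10)]
print(len(encode(sample)), "bytes,", bitrate_kbits_per_second(sample), "kbits/second")
```

With this made-up layout, a few bytes per phoneme at conversational speaking rates lands in the same sub-1 kbits/second range as the figure quoted above, which is why a phoneme-level description can be so much smaller than sampled audio. The numbers depend entirely on the assumed field sizes and speaking rate.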

Creating data in the SPIDI format requires a special authoring tool that uses speech recognition to extract vocal inflections from recorded human speech. The inflected SPIDI data is then played back using the existing synthetic voices in Apple’s PlainTalk system software.

The SPIDI team is developing cross-platform player components that will be integrated into the QuickTime Media Layer (QTML) architecture. If the effort succeeds, spoken-word audio could be delivered over the Internet without disruption even in low-bandwidth settings. The technology is also said to address the media authoring problem of maintaining lip synch.
