Amazon Polly has announced the general availability of its generative engine with three new voices: Ruth and Matthew in American English, and Amy in British English. The generative engine, trained with a variety of voices, languages, and styles, renders context-dependent prosody, pausing, spelling, dialectal properties, and foreign word pronunciation with high precision.
Amazon Polly is a machine learning service that converts text to lifelike speech, a technology known as text-to-speech (TTS). It offers high-quality, natural-sounding voices in dozens of languages. Users can choose among several voice options, including neural, long-form, and generative voices, which improve speech quality and produce highly expressive, emotionally engaged speech.
The generative engine is the latest addition to Amazon Polly's voice engines, which also include standard TTS voices, Neural TTS (NTTS) voices, and long-form voices. The generative voices are created using a new research TTS model called Big Adaptive Streamable TTS with Emergent abilities (BASE), introduced in February 2024.
The new generative voices can be accessed through the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the AWS SDKs. Users can enter text and then listen to or download the generated audio. The generative voices are available in the US East (N. Virginia) Region, and users pay only for what they use, billed by the number of characters of text converted to speech.
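For readers who want to try the SDK route, here is a minimal sketch using the AWS SDK for Python (boto3). It assumes boto3 is installed and AWS credentials are already configured. The helper names `build_request` and `synthesize_to_file`, the MP3 output format, and the output path are illustrative choices, not part of the announcement. The voice names and the Region come from the text above.

```python
# Voices named in the announcement, mapped to their language codes.
GENERATIVE_VOICES = {"Ruth": "en-US", "Matthew": "en-US", "Amy": "en-GB"}

# US East (N. Virginia), the Region named in the announcement.
REGION = "us-east-1"


def build_request(text, voice_id="Ruth", output_format="mp3"):
    """Build keyword arguments for Polly's SynthesizeSpeech call.

    Hypothetical helper: it only assembles a dict, so it can be checked
    without network access or credentials.
    """
    if voice_id not in GENERATIVE_VOICES:
        raise ValueError(f"{voice_id!r} is not one of {sorted(GENERATIVE_VOICES)}")
    return {
        "Engine": "generative",
        "VoiceId": voice_id,
        "LanguageCode": GENERATIVE_VOICES[voice_id],
        "OutputFormat": output_format,
        "Text": text,
    }


def synthesize_to_file(text, path, voice_id="Ruth"):
    """Send the text to Amazon Polly and write the returned audio to `path`.

    Hypothetical helper: requires boto3 and valid AWS credentials.
    """
    import boto3  # imported lazily so build_request works without boto3

    polly = boto3.client("polly", region_name=REGION)
    response = polly.synthesize_speech(**build_request(text, voice_id))
    # AudioStream is a streaming body; read it fully and save the bytes.
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

A call such as `synthesize_to_file("Hello from the generative engine.", "hello.mp3", voice_id="Amy")` would request the British English voice. Because billing is per character, the length of the submitted text is what drives cost in a script like this.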