Exploring Cartesia AI: The Future of Text-to-Speech Technology

Today, we’re diving into text-to-speech (TTS) technology by exploring Cartesia AI, one of the best AI voice generators currently available.

What is Cartesia AI?

Cartesia AI is a new company whose TTS system is both fast and highly versatile, and I’m here to show you how to use its API in your own projects. The system can also clone voices, but we’ll talk about that in an upcoming article.

Exploring the Voice Library

Cartesia AI offers a variety of voices, including a 1920s radio man, an anime girl, a confident British man, and an announcer man. Here are three of them:

1920s Radio Man: This voice has a nostalgic feel, reminiscent of old-time radio broadcasts.
Anime Girl: Perfect for those high-pitched, energetic characters.
Confident British Man: A voice that exudes confidence and clarity.

These are just a few examples of the voices available. You can listen to each voice and judge its speed and quality on the official Cartesia AI website.

How to Use Cartesia AI’s TTS System

Let’s dive into how you can use Cartesia AI’s TTS system. The process is straightforward and user-friendly.

Writing the Prompt:

Enter the text you want to convert to speech.
Select the voice you want to use from the library.

Voice Cloning:

Optionally add a cloned voice, which we’ll discuss in our next article and video.

Settings:

Adjust the speed and add emotions like anger, curiosity, positivity, surprise, and sadness.

For example, if I write “Hello, I am Simplify AI,” select the announcer man voice, and raise the positivity setting, I can click ‘Speak’ to hear the result. The response is almost instant, taking just 117 milliseconds in this example. You can also download the audio by clicking the ‘Download’ button.
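If you want to keep track of the settings you liked in the playground, it helps to think of them as a small piece of structured data. The sketch below is hypothetical: the helper name, the speed label, the intensity levels, and the dict layout are my own assumptions for illustration, not Cartesia’s documented schema. Only the five emotion names come from the settings listed above, so check the official docs before sending anything like this to the API.

```python
# Hypothetical helper: the dict layout and level names are assumptions,
# not a confirmed Cartesia API schema.
ALLOWED_EMOTIONS = {"anger", "curiosity", "positivity", "surprise", "sadness"}
ALLOWED_LEVELS = {"lowest", "low", "high", "highest"}


def build_voice_controls(speed="normal", emotions=None):
    """Return a settings dict, e.g. {"speed": "normal", "emotion": ["positivity:high"]}."""
    tags = []
    for name, level in (emotions or {}).items():
        if name not in ALLOWED_EMOTIONS:
            raise ValueError(f"unknown emotion: {name!r}")
        if level is None:
            tags.append(name)  # no explicit intensity
        elif level in ALLOWED_LEVELS:
            tags.append(f"{name}:{level}")
        else:
            raise ValueError(f"unknown level: {level!r}")
    return {"speed": speed, "emotion": tags}


# Mirrors the playground example: default speed, increased positivity.
controls = build_voice_controls(speed="normal", emotions={"positivity": "high"})
print(controls)
```

Validating the emotion names up front means a typo fails loudly in your own code instead of being silently ignored later.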

Implementing Cartesia AI’s API

The main aim of today’s article and video is to demonstrate how you can integrate Cartesia AI’s API into your projects.

Get the API Key:

Go to the Cartesia website and click on the ‘Docs’ option in the menu bar to open the documentation.
Create your API key from the console.
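Once you have a key, it’s safer to keep it out of your source files. Here is a minimal sketch of loading it from an environment variable with a clear error when it is missing. The function name load_api_key and the variable name CARTESIA_API_KEY are my assumptions; use whatever name you prefer, as long as your code and your shell agree.

```python
import os


def load_api_key(env_var="CARTESIA_API_KEY", environ=None):
    """Return the API key stored in env_var, or raise a clear error if it is missing."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Demonstration with a stand-in mapping instead of the real environment.
print(load_api_key(environ={"CARTESIA_API_KEY": "demo-key"}))
```

In a real script you would call load_api_key() with no arguments and pass the result to the client as api_key.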

Voice ID:

Select any voice from the voice library to get the voice ID.

Coding:

Install the required packages: cartesia and pyaudio.
Copy the example code provided and paste it into your IDE (e.g., Visual Studio Code).
Replace the placeholders with your API key and voice ID.
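Before pasting the example, you can quickly confirm that the install step worked. This is a small sketch using only the standard library; the helper name missing_packages is mine, and it only checks that the modules can be found, not that they are a particular version. SDK releases can differ in how they are called, so if the example fails on a newer release, compare it against the current docs.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["cartesia", "pyaudio"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```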

Here’s a simplified version of the code:

from cartesia import Cartesia
import pyaudio
import os

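# os.environ.get takes the NAME of an environment variable, not the key itself.
# Set CARTESIA_API_KEY to your API key before running.
client = Cartesia(api_key=os.environ.get("CARTESIA_API_KEY"))
voice_name = "Barbershop Man"  # label only; not used by the calls below
voice_id = "YOUR_VOICE_ID"  # replace with an ID copied from the voice library
voice = client.voices.get(id=voice_id)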

transcript = "hello i am simplify AI"
model_id = "sonic-english"
output_format = {
    "container": "raw",
    "encoding": "pcm_f32le",
    "sample_rate": 44100,
}

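p = pyaudio.PyAudio()
rate = output_format["sample_rate"]  # keep playback rate in sync with the requested format
stream = None  # opened lazily when the first audio chunk arrives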

for output in client.tts.sse(
    model_id=model_id,
    transcript=transcript,
    voice_embedding=voice["embedding"],
    stream=True,
    output_format=output_format,
):
    buffer = output["audio"]
    if not stream:
        stream = p.open(format=pyaudio.paFloat32, channels=1, rate=rate, output=True)
    stream.write(buffer)

# Clean up; stream is still None if no audio was received.
if stream is not None:
    stream.stop_stream()
    stream.close()
p.terminate()
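The example above plays the audio directly. If you would rather keep the result as a file, similar to the ‘Download’ button in the playground, here is a rough sketch that converts raw mono pcm_f32le chunks (the format requested in output_format) into a 16-bit WAV file using only the standard library. The function name save_f32le_as_wav is my own, and the demo feeds it a generated test tone rather than real API output.

```python
import array
import math
import os
import struct
import tempfile
import wave


def save_f32le_as_wav(chunks, path, sample_rate=44100):
    """Write raw mono pcm_f32le byte chunks to a 16-bit PCM WAV file.

    Returns the number of samples written.
    """
    data = b"".join(chunks)  # chunk boundaries do not matter once joined
    samples = array.array("h")
    for (value,) in struct.iter_unpack("<f", data):
        clipped = max(-1.0, min(1.0, value))  # clamp to the valid float range
        samples.append(int(clipped * 32767))  # scale to signed 16-bit
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(samples.tobytes())
    return len(samples)


# Stand-in for streamed audio: 0.1 s of a 440 Hz tone, split into two chunks.
tone = struct.pack(
    "<4410f",
    *(0.5 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)),
)
demo_path = os.path.join(tempfile.gettempdir(), "tts_demo.wav")
print(save_f32le_as_wav([tone[:1000], tone[1000:]], demo_path))
```

With the streaming loop from the example, you would append each output["audio"] buffer to a list and pass that list to the function once the loop finishes.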

Running the Code

To run this code, follow these steps:

Create a Virtual Environment:

Use the command: conda create -n your_env_name python=3.10, then activate it with: conda activate your_env_name

Install Dependencies:

Install cartesia and pyaudio with: pip install cartesia pyaudio

Execute the Code:

Type python your_file_name.py in your terminal.

Within a second or two, you should hear the AI-generated voice, confirming successful integration into your project.
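If you are curious how your own setup compares with the latency shown in the playground, you can time how long the first audio chunk takes to arrive. This is a minimal sketch: time_to_first_chunk and fake_stream are names I made up, and the fake stream only simulates a delay so the snippet runs without a network connection or an API key.

```python
import time


def time_to_first_chunk(make_stream, clock=time.perf_counter):
    """Start a stream and return (seconds until the first chunk, list of all chunks).

    make_stream is a zero-argument callable, so the time spent starting the
    request is included in the measurement. The latency is None if the stream
    yields nothing.
    """
    start = clock()
    first_latency = None
    collected = []
    for chunk in make_stream():
        if first_latency is None:
            first_latency = clock() - start
        collected.append(chunk)
    return first_latency, collected


def fake_stream():
    """Stand-in for a streaming TTS response (hypothetical, no network involved)."""
    time.sleep(0.05)  # simulated server latency
    yield b"\x00" * 1024
    yield b"\x00" * 1024


latency, chunks = time_to_first_chunk(fake_stream)
print(f"first chunk after {latency * 1000:.0f} ms, {len(chunks)} chunks total")
```

To measure the real thing, pass a lambda that calls client.tts.sse(...) with the same arguments as in the example. The number you get includes network round trips, so expect it to vary from run to run.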

That’s all for Cartesia AI and how you can integrate it into your personal projects. I hope you found this explanation helpful! Stay tuned for more articles and videos where we explore advanced features like voice cloning and more. Happy coding, and see you next time!