Voices

You can generate audio utterances from one of your custom voices.

You can either create a voice by getting audio recordings of the person whose voice you want to clone and uploading these recordings to our servers, or you can contact us so we can help you with the recording process. Check the recordings section for more information.

Voice models

In both cases, the voice that you will have access to is created from a model. A model represents a procedure to create a voice. Different models are available, with different trade-offs: some models have very high audio quality but are slow to generate, while others are very fast to generate but have slightly lower quality. Contact us so that we can help you choose the best model for your use cases.

Listing your voices

You can get a list of all the digital voices you have access to:

CURL
# Request #
curl -H 'Authorization: Bearer API_KEY' 'https://custom.lyrebird.ai/api/v0/voices?offset=0&limit=10'
# Response (simplified) #
{
  "results": [
    {
      "id": "VOICE_ID",
      "description": "A male voice that is very fast to generate",
      "model_id": "MODEL_ID",
      "name": "Realtime Male Voice",
    }
  ]
}
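
If you call the listing endpoint from code, a small helper can build the paginated URL and index the returned voices. The following is a minimal sketch, not an official client: the helper names voices_page_url and voice_names_by_id are hypothetical, and it assumes that offset is the number of voices to skip and limit is the maximum number returned, which the example request suggests but does not state.

```python
from urllib.parse import urlencode

# Base URL taken from the curl examples in this section.
BASE_URL = "https://custom.lyrebird.ai/api/v0"


def voices_page_url(offset: int = 0, limit: int = 10) -> str:
    """Build the URL for one page of the voices list.

    Assumption: offset skips that many voices, limit caps the page size.
    """
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    query = urlencode({"offset": offset, "limit": limit})
    return f"{BASE_URL}/voices?{query}"


def voice_names_by_id(response: dict) -> dict:
    """Map each voice id to its name, given a parsed list response."""
    return {voice["id"]: voice["name"] for voice in response.get("results", [])}
```

The id values returned by voice_names_by_id are what the rest of this section calls VOICE_ID.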

You can also get one by its voice id (returned in the previous request):

CURL
# Request #
curl -H 'Authorization: Bearer API_KEY' 'https://custom.lyrebird.ai/api/v0/voices/VOICE_ID'

Generating audio from your voice

Once you have picked a voice from your voice list, you can use it to generate audio. There are two ways to generate audio from your voice. Both produce the same audio but differ in how you access it after it is generated:

  • You can generate audio synchronously, in which case the audio is generated as soon as your request is sent and is returned directly (you can also access it later). This is appropriate for real-time use cases where you need the audio utterance of a short sentence as fast as possible.
  • You can also generate audio asynchronously, which lets you generate the audio of potentially very large amounts of text. Generation happens in the background, and the resulting utterances are added to your utterances list when they are ready. This is appropriate for large texts or when generation latency is not a concern.

Generating audio synchronously

To generate audio synchronously, you must send a text of no more than 250 characters (for longer sentences, generate audio asynchronously instead). You can also specify some metadata as described in the metadata section.

VOICE_ID is the id field of a voice returned when listing your voices, as described above.

CURL
# Request #
curl -H 'Authorization: Bearer API_KEY' 'https://custom.lyrebird.ai/api/v0/voices/VOICE_ID/generate_sync' -X POST -d '{
    "text": "I am very angry.",
    "metadata": {
        "emotion": "angry",
        "session": "march22"
    }
}'

The request blocks until generation ends, then returns the WAVE audio file of the utterance in the HTTP response body. The response also includes an X-Duration header giving the duration of the generated audio in seconds, and an X-Utteranceid header giving the id of the utterance, which you can use to download it again later.
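
The synchronous flow above can be sketched in Python using only the standard library. This is an illustration under stated assumptions rather than an official client: the function names build_sync_request and generate_sync are hypothetical, the 250-character limit is assumed to count Python string characters, and the request body is sent the same way the curl example sends it.

```python
import json
import urllib.request

# Base URL taken from the curl examples in this section.
BASE_URL = "https://custom.lyrebird.ai/api/v0"
MAX_SYNC_CHARS = 250  # limit stated in the text above


def build_sync_request(api_key, voice_id, text, metadata=None):
    """Build (but do not send) a synchronous generation request."""
    if len(text) > MAX_SYNC_CHARS:
        # Assumption: the limit counts characters as len() does.
        raise ValueError("text is too long for synchronous generation; "
                         "use asynchronous generation instead")
    payload = {"text": text}
    if metadata:
        payload["metadata"] = metadata
    return urllib.request.Request(
        f"{BASE_URL}/voices/{voice_id}/generate_sync",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def generate_sync(api_key, voice_id, text, metadata=None):
    """Send the request and return (wav_bytes, duration_seconds, utterance_id)."""
    request = build_sync_request(api_key, voice_id, text, metadata)
    with urllib.request.urlopen(request) as response:
        audio = response.read()  # WAVE file bytes from the response body
        duration = float(response.headers["X-Duration"])
        utterance_id = response.headers["X-Utteranceid"]
    return audio, duration, utterance_id
```

Checking the length before sending avoids a round trip for text that has to go through the asynchronous endpoint anyway.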

Generating audio asynchronously

To generate audio asynchronously, you must send a list of paragraphs to generate.

There is no limit on the length of text that can be sent, but long paragraphs (typically more than 1000 characters) will be internally split, generated independently and joined back. We try to split them at the best places to avoid any noticeable audio change when joining them back, but if you want precise control over where your paragraphs are split, you should simply split them yourself and send a list of small paragraphs rather than one large paragraph. For example, in the case of the generation of an audio book, we typically recommend sending a paragraph for each paragraph of the audio book.

You can also specify metadata (as described in the metadata section) for each paragraph, so that paragraphs can have independent metadata.
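
If you want to control the splitting yourself, one simple approach is to split your source text on blank lines and attach metadata to each paragraph before sending. The sketch below assumes paragraphs in your text are separated by blank lines; the helper names split_paragraphs and build_async_payload are hypothetical and not part of the API.

```python
import json
import re


def split_paragraphs(text):
    """Split text on blank lines, dropping empty chunks and outer whitespace."""
    chunks = re.split(r"\n\s*\n", text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def build_async_payload(paragraphs, metadata=None):
    """Return the JSON body for an asynchronous generation request.

    paragraphs: list of strings, one per paragraph.
    metadata: optional list of dicts (or None entries), one per paragraph.
    """
    if metadata is None:
        metadata = [None] * len(paragraphs)
    if len(metadata) != len(paragraphs):
        raise ValueError("metadata must have one entry per paragraph")
    payload = []
    for text, meta in zip(paragraphs, metadata):
        item = {"text": text}
        if meta:
            item["metadata"] = meta  # only include metadata when provided
        payload.append(item)
    return json.dumps(payload)
```

The returned string has the same shape as the request body used for asynchronous generation: a JSON list of objects with a text field and an optional metadata field.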

VOICE_ID is the id field of a voice returned when listing your voices, as described above.

CURL
# Request #
curl -H 'Authorization: Bearer API_KEY' 'https://custom.lyrebird.ai/api/v0/voices/VOICE_ID/generate_async' -X POST -d '[
    {
      "text": "This is the first paragraph, corresponding to the first paragraph of an audio book I want to translate.",
      "metadata": {
          "emotion": "calm"
      }
    },
    {
      "text": "This is the second paragraph, in which I am surprised by some event. My voice will sound the same as the recordings I tagged as being surprised.",
      "metadata": {
          "emotion": "surprised"
      }
    }
]'
# Response #
{
  "async_job_id": "ASYNC_JOB_ID"
}

The request queues the audio generation of these paragraphs and returns immediately with an asynchronous job id, which you can use later to track the progress of the audio generation.

If you need real-time audio generation of short sentences, use the synchronous generation endpoint instead.

After you get the asynchronous job id, you can get the status of the generations:

CURL
# Request #
curl -H 'Authorization: Bearer API_KEY' 'https://custom.lyrebird.ai/api/v0/async_jobs/ASYNC_JOB_ID'
# Response (simplified) #
{
  "created_at": "2018-05-29T20:56:43Z",
  "id": "ASYNC_JOB_ID",
  "status": "processing"
}

The returned object contains the status of the asynchronous job:

  • processing: the audio utterances are being generated (some of them could already have been generated and added to your utterances);
  • done: all the audio utterances have been generated and added to your utterances;
  • error: one or more audio utterances couldn't be generated. This shouldn't happen; contact us if you get this status repeatedly.
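
One way to act on these statuses is to poll the job until it leaves the processing state. The following is a minimal sketch: the names fetch_job_status and wait_for_job are hypothetical, and the polling interval and maximum number of polls are arbitrary choices, not values recommended by the API.

```python
import json
import time
import urllib.request

# Base URL taken from the curl examples in this section.
BASE_URL = "https://custom.lyrebird.ai/api/v0"


def fetch_job_status(api_key, job_id):
    """Return the status field of an asynchronous job."""
    request = urllib.request.Request(
        f"{BASE_URL}/async_jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["status"]


def wait_for_job(get_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Call get_status() until it returns "done".

    Raises RuntimeError on "error" and TimeoutError after max_polls attempts.
    Any other status (such as "processing") triggers another poll.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == "done":
            return status
        if status == "error":
            raise RuntimeError("one or more utterances could not be generated")
        sleep(poll_seconds)  # still processing: wait before asking again
    raise TimeoutError("job did not finish within the allotted polls")
```

A typical call would be wait_for_job(lambda: fetch_job_status(api_key, job_id)). Passing the status function as an argument keeps the polling loop independent of the HTTP call.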

You can then download the generated utterances as described below.

Getting and downloading utterances

After you have generated utterances either synchronously or asynchronously, you can get and download them.

You can list all your utterances, and optionally specify an asynchronous job id to filter by, in case you want to see only the utterances corresponding to a specific asynchronous audio generation request.

CURL
# Request #
curl -H 'Authorization: Bearer API_KEY' 'https://custom.lyrebird.ai/api/v0/utterances?async_job_id=ASYNC_JOB_ID'
# Response (simplified) #
{
  "duration": 0.562,
  "id": "UTTERANCE_ID",
  "index_in_batch": 0,
  "metadata": {
    "emotion": "calm"
  },
  "model_id": "MODEL_ID",
  "text": "I'm very relaxed and calm.",
  "voice_id": "VOICE_ID"
}

The utterances do not have a direct link to the audio file, but they have an utterance id which you can use to download the audio utterance, along with its audio duration and other information.
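
When working with the utterances of one asynchronous job, it can help to build the filtered URL and put the results back in request order. This sketch rests on an assumption that the text does not confirm: that index_in_batch is the position of the paragraph in the original request. The helper names utterances_url, in_batch_order and total_duration are hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the curl examples in this section.
BASE_URL = "https://custom.lyrebird.ai/api/v0"


def utterances_url(async_job_id=None):
    """Build the utterances listing URL, optionally filtered by job id."""
    url = f"{BASE_URL}/utterances"
    if async_job_id is not None:
        url += "?" + urlencode({"async_job_id": async_job_id})
    return url


def in_batch_order(utterances):
    """Sort utterance objects by index_in_batch.

    Assumption: index_in_batch is the paragraph's position in the request.
    """
    return sorted(utterances, key=lambda utterance: utterance["index_in_batch"])


def total_duration(utterances):
    """Sum the duration field (in seconds) over a list of utterance objects."""
    return sum(utterance["duration"] for utterance in utterances)
```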

You can also get a single utterance by its id, or delete it.

Once you have an utterance id, you can use it to download its audio utterance:

CURL
# Request #
curl -L -H 'Authorization: Bearer API_KEY' 'https://custom.lyrebird.ai/api/v0/utterances/UTTERANCE_ID/download'

This will redirect you to a temporary link for the WAVE audio file of the utterance. Make sure you enable following redirects for this HTTP request.
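
The download step can be sketched in Python as follows. The function names build_download_request and download_utterance are hypothetical. Python's urllib.request.urlopen follows HTTP redirects by default, so no extra option is needed here; other HTTP clients may require one.

```python
import shutil
import urllib.request

# Base URL taken from the curl examples in this section.
BASE_URL = "https://custom.lyrebird.ai/api/v0"


def build_download_request(api_key, utterance_id):
    """Build (but do not send) the download request for one utterance."""
    return urllib.request.Request(
        f"{BASE_URL}/utterances/{utterance_id}/download",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def download_utterance(api_key, utterance_id, destination,
                       opener=urllib.request.urlopen):
    """Save the WAVE file of an utterance to destination and return the path.

    opener is injectable so the function can be exercised without a network.
    """
    request = build_download_request(api_key, utterance_id)
    with opener(request) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)  # stream the body to disk
    return destination
```

Because the link you are redirected to is temporary, it is safer to request the download endpoint again each time than to store the redirected URL.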
