POST /generation/qwen/qwen-3-tts/text-to-speech/1.7b
Qwen 3 TTS 1.7B
curl --request POST \
  --url https://open.skills.video/api/v1/generation/qwen/qwen-3-tts/text-to-speech/1.7b \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "text": "<string>"
}
'
{
  "id": "gen_123",
  "status": "IN_QUEUE",
  "input": {
    "text": "Hello world.",
    "prompt": "Hello world.",
    "voice": "Vivian"
  },
  "usage": {
    "total": 20,
    "subscription": 20,
    "permanent": 0
  }
}
Text-to-speech with preset voices or cloned speaker embeddings and controllable decoding parameters.

Authorizations

Authorization
string
header
required

API key for the API endpoints, sent as a Bearer token: Authorization: Bearer <token>
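The curl call above can be mirrored with Python's standard library. The following is a minimal sketch, not an official client: the helper name `build_request` and the placeholder key `YOUR_API_KEY` are illustrative assumptions, while the URL and headers are taken from the curl example. The request is only built here, not sent.

```python
import json
import urllib.request

# URL copied from the curl example in this page.
API_URL = "https://open.skills.video/api/v1/generation/qwen/qwen-3-tts/text-to-speech/1.7b"


def build_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request.

    `build_request` is a hypothetical helper, not part of any SDK.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("YOUR_API_KEY", {"text": "Hello world."})

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```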

Body

text
string
required

The text to be converted to speech.

Example:

"I am solving the equation x = (-b +/- sqrt(b^2-4ac)) / 2a. Nobody can, it is a disaster and very sad."

prompt
string

Optional prompt that guides the style of the generated speech. Ignored when a speaker embedding is provided.

Example:

"Very happy."

voice
enum<string>

Preset voice for synthesis. Ignored when a speaker embedding is provided.

Available options: Vivian, Serena, Uncle_Fu, Dylan, Eric, Ryan, Aiden, Ono_Anna, Sohee

Example:

"Vivian"

language
enum<string>
default:Auto

The language of the voice.

Available options: Auto, English, Chinese, Spanish, French, German, Italian, Japanese, Korean, Portuguese, Russian

Example:

"English"
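A request body for a preset voice can be assembled and checked on the client before it is sent. This is a sketch under stated assumptions: the helper `build_preset_payload` is hypothetical, and it assumes the enum values listed above are matched exactly (case-sensitive), which the page does not state.

```python
# Enum values copied from the "voice" and "language" fields above.
VOICES = {
    "Vivian", "Serena", "Uncle_Fu", "Dylan", "Eric",
    "Ryan", "Aiden", "Ono_Anna", "Sohee",
}
LANGUAGES = {
    "Auto", "English", "Chinese", "Spanish", "French", "German",
    "Italian", "Japanese", "Korean", "Portuguese", "Russian",
}


def build_preset_payload(text, voice=None, language="Auto", prompt=None):
    """Return a request body for a preset voice (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")
    if voice is not None and voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")

    payload = {"text": text, "language": language}
    # Optional fields are only included when the caller sets them.
    if voice is not None:
        payload["voice"] = voice
    if prompt is not None:
        payload["prompt"] = prompt
    return payload


payload = build_preset_payload(
    "Hello world.", voice="Vivian", language="English", prompt="Very happy."
)
```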

speaker_voice_embedding_file_url
string<uri>

URL of a speaker embedding safetensors file produced by fal-ai/qwen-3-tts/clone-voice. If provided, the cloned voice is used instead of the preset voices.

Example:

"https://storage.googleapis.com/falserverless/example_outputs/qwen3-tts/clone_out.safetensors"

reference_text
string

Optional reference text that was used when creating the speaker embedding. Providing it can improve quality when using a cloned voice.

Example:

"Okay. Yeah. I resent you. I love you. I respect you. But you know what? You blew it!"
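Because `prompt` and `voice` are documented as ignored once a speaker embedding is supplied, a cloned-voice request can simply leave them out. The sketch below illustrates that; `build_cloned_payload` is a hypothetical helper, and the http/https check is a client-side assumption, not a documented server rule.

```python
from urllib.parse import urlparse


def build_cloned_payload(text, embedding_url, reference_text=None):
    """Return a request body that uses a cloned voice (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")

    # Assumption: the embedding must be reachable over http(s).
    parsed = urlparse(embedding_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {embedding_url!r}")

    # "prompt" and "voice" are omitted on purpose: the field descriptions
    # say both are ignored when a speaker embedding is provided.
    payload = {
        "text": text,
        "speaker_voice_embedding_file_url": embedding_url,
    }
    if reference_text is not None:
        payload["reference_text"] = reference_text
    return payload


cloned = build_cloned_payload(
    "Hello world.",
    "https://storage.googleapis.com/falserverless/example_outputs/qwen3-tts/clone_out.safetensors",
    reference_text="Okay. Yeah. I resent you.",
)
```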

top_k
integer
default:50

Top-k sampling parameter.

Required range: x >= 0

top_p
number
default:1

Top-p sampling parameter.

Required range: 0 <= x <= 1

temperature
number
default:0.9

Sampling temperature; higher means more random output.

Required range: 0 <= x <= 1

repetition_penalty
number
default:1.05

Penalty to reduce repeated tokens/codes.

Required range: x >= 0

subtalker_dosample
boolean
default:true

Sampling switch for the sub-talker.

subtalker_top_k
integer
default:50

Top-k sampling for the sub-talker.

Required range: x >= 0

subtalker_top_p
number
default:1

Top-p for sub-talker sampling.

Required range: 0 <= x <= 1

subtalker_temperature
number
default:0.9

Temperature for sub-talker sampling.

Required range: 0 <= x <= 1

max_new_tokens
integer
default:200

Maximum number of new codec tokens to generate.

Required range: 1 <= x <= 8192
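The required ranges for the decoding parameters can be checked locally before a request is made, which avoids a round trip for an obviously invalid value. This is a sketch only: `validate_decoding` and the `RANGES` table are hypothetical names, the bounds are copied from the field descriptions above, and how the server itself reports an out-of-range value is not documented here.

```python
# (accepted Python types, minimum, maximum or None for "no upper bound")
RANGES = {
    "top_k": ((int,), 0, None),
    "top_p": ((int, float), 0, 1),
    "temperature": ((int, float), 0, 1),
    "repetition_penalty": ((int, float), 0, None),
    "subtalker_top_k": ((int,), 0, None),
    "subtalker_top_p": ((int, float), 0, 1),
    "subtalker_temperature": ((int, float), 0, 1),
    "max_new_tokens": ((int,), 1, 8192),
}


def validate_decoding(params: dict) -> dict:
    """Check decoding parameters against the documented ranges."""
    for name, value in params.items():
        if name == "subtalker_dosample":
            if not isinstance(value, bool):
                raise TypeError("subtalker_dosample must be a boolean")
            continue
        if name not in RANGES:
            raise KeyError(f"unknown decoding parameter: {name}")
        kinds, low, high = RANGES[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, kinds):
            raise TypeError(f"{name} has the wrong type: {type(value).__name__}")
        if value < low or (high is not None and value > high):
            raise ValueError(f"{name}={value} is outside the required range")
    return params


decoding = validate_decoding(
    {"temperature": 0.7, "top_k": 40, "max_new_tokens": 1024, "subtalker_dosample": True}
)
```

Parameters that are left out fall back to the defaults listed above, so only the values being overridden need to be passed.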

Response

Generation status

id
string
required

Generation ID.

status
enum<string>
required

Generation status

Available options: starting, processing, succeeded, failed, canceled

usage
object
required

Credit usage breakdown for the request

input
Qwen 3 TTS 1.7B Input · object
required
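The four required response fields can be read into a small typed structure. The sketch below is illustrative: the `Generation` class and `parse_generation` are hypothetical names, and treating `succeeded`, `failed`, and `canceled` as terminal states is an assumption based on their names. The sample response on this page shows the status `IN_QUEUE`, which is not among the listed options, so any unrecognized status is treated as not yet finished.

```python
import json
from dataclasses import dataclass

# Assumption: these three statuses mean the generation will not change again.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}
REQUIRED_FIELDS = ("id", "status", "usage", "input")


@dataclass
class Generation:
    id: str
    status: str
    usage: dict
    input: dict

    @property
    def is_terminal(self) -> bool:
        # Unknown values such as "IN_QUEUE" count as still in progress.
        return self.status in TERMINAL_STATUSES


def parse_generation(raw: str) -> Generation:
    """Parse a JSON response body and check the required fields."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return Generation(
        id=data["id"],
        status=data["status"],
        usage=data["usage"],
        input=data["input"],
    )


# Sample response copied from the top of this page.
sample = """
{
  "id": "gen_123",
  "status": "IN_QUEUE",
  "input": {"text": "Hello world.", "prompt": "Hello world.", "voice": "Vivian"},
  "usage": {"total": 20, "subscription": 20, "permanent": 0}
}
"""
gen = parse_generation(sample)
```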