API Key for API endpoints
Text to convert to speech. Use <#x#> for pauses (x = 0.01-99.99 seconds). Supports interjection tags: (laughs), (sighs), (coughs), (clears throat), (gasps), (sniffs), (groans), (yawns).
Length: 1 - 10000
"Hello world! Welcome to MiniMax's new text to speech model <#0.1#> Speech 2.8 HD (laughs) now available on Fal!"
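The pause syntax and length limit above can be checked before a request is sent. The following is a minimal sketch, not part of the API: the helper names `pause` and `check_text` are hypothetical, and it assumes the 1 - 10000 limit counts characters.

```python
import re

# Matches pause markers such as <#0.1#> or <#2#>.
PAUSE_RE = re.compile(r"<#(\d+(?:\.\d+)?)#>")


def pause(seconds: float) -> str:
    """Build a pause marker; the reference allows 0.01 to 99.99 seconds."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    # Round to two decimals and drop trailing zeros (0.1 -> "0.1").
    return f"<#{round(seconds, 2):g}#>"


def check_text(text: str) -> str:
    """Validate length and every pause marker in the text, then return it."""
    # Assumption: the documented 1 - 10000 limit is a character count.
    if not 1 <= len(text) <= 10000:
        raise ValueError("text length must be between 1 and 10000")
    for match in PAUSE_RE.finditer(text):
        value = float(match.group(1))
        if not 0.01 <= value <= 99.99:
            raise ValueError(f"pause out of range: {match.group(0)}")
    return text


text = check_text(f"Hello world! {pause(0.5)} Nice to meet you (laughs).")
```

Validating locally only catches out-of-range pauses and over-long input early; the service remains the authority on what it accepts.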
Predefined voice ID or preset name.
Wise_Woman, Friendly_Person, Inspirational_girl, Deep_Voice_Man, Calm_Woman, Casual_Guy, Lively_Girl, Patient_Man, Young_Knight, Determined_Man, Lovely_Girl, Decent_Boy, Imposing_Manner, Elegant_Man, Abbess, Sweet_Girl_2, Exuberant_Girl
Minimum length: 1
"Wise_Woman"
"Friendly_Person"
"Calm_Woman"
Speech speed (0.5 - 2.0).
0.5 <= x <= 2
Volume (0.01 - 10).
0.01 <= x <= 10
Voice pitch (-12 to 12).
-12 <= x <= 12
Emotion style of generated speech.
happy, sad, angry, fearful, disgusted, surprised, neutral
Enables English text normalization to improve number reading performance, with a slight increase in latency.
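The speed, volume, pitch, and emotion constraints above lend themselves to client-side validation. The sketch below is illustrative only: the class and every field name (`speed`, `volume`, `pitch`, `emotion`, `english_normalization`) are hypothetical, since this reference does not show the actual parameter names. Unset fields are omitted from the payload so that no defaults are assumed.

```python
from dataclasses import dataclass
from typing import Optional

EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def _check_range(name, value, low, high):
    """Raise if a value that was set falls outside the inclusive range."""
    if value is not None and not low <= value <= high:
        raise ValueError(f"{name} must satisfy {low} <= x <= {high}, got {value}")


@dataclass
class VoiceSettings:
    # All field names are hypothetical placeholders, not the API's own names.
    speed: Optional[float] = None
    volume: Optional[float] = None
    pitch: Optional[float] = None
    emotion: Optional[str] = None
    english_normalization: Optional[bool] = None

    def __post_init__(self):
        _check_range("speed", self.speed, 0.5, 2.0)
        _check_range("volume", self.volume, 0.01, 10)
        _check_range("pitch", self.pitch, -12, 12)
        if self.emotion is not None and self.emotion not in EMOTIONS:
            raise ValueError(f"unsupported emotion: {self.emotion!r}")

    def to_payload(self) -> dict:
        """Return only the fields that were explicitly set."""
        return {key: value for key, value in vars(self).items() if value is not None}


settings = VoiceSettings(speed=1.2, emotion="happy").to_payload()
```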
Enhance recognition of specified languages and dialects.
auto, Chinese, Chinese,Yue, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Slovak, Swedish, Croatian, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Afrikaans
Sample rate of generated audio.
8000, 16000, 22050, 24000, 32000, 44100
Bitrate of generated audio.
32000, 64000, 128000, 256000
Audio format.
mp3, pcm, flac
Number of audio channels (1=mono, 2=stereo).
1, 2
Enable loudness normalization for the audio.
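Because the audio options above accept only fixed values, a lookup table is enough to catch mistakes such as a 48000 sample rate before sending a request. This is a rough sketch; the keys `sample_rate`, `bitrate`, `format`, and `channel` are hypothetical names chosen for illustration, not confirmed parameter names.

```python
# Allowed values copied from the reference; the dictionary keys are
# hypothetical names for the corresponding parameters.
ALLOWED = {
    "sample_rate": {8000, 16000, 22050, 24000, 32000, 44100},
    "bitrate": {32000, 64000, 128000, 256000},
    "format": {"mp3", "pcm", "flac"},
    "channel": {1, 2},
}


def validate_audio_settings(settings: dict) -> dict:
    """Check each provided setting against its allowed values."""
    for key, value in settings.items():
        if key not in ALLOWED:
            raise KeyError(f"unknown audio setting: {key}")
        if value not in ALLOWED[key]:
            options = sorted(ALLOWED[key])
            raise ValueError(f"{key} must be one of {options}, got {value!r}")
    return settings


audio = validate_audio_settings({"sample_rate": 44100, "format": "mp3", "channel": 1})
```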
Target loudness in LUFS (default -18.0).
-70 <= x <= -10
Target loudness range in LU (default 8.0).
0 <= x <= 20
Target peak level in dBTP (default -0.5).
-3 <= x <= 0
Pitch adjustment in semitones. Range: -100 to 100. Positive values raise pitch, negative values lower it.
-100 <= x <= 100
Intensity or energy of the voice. Range: -100 to 100. Higher values create more energetic speech.
-100 <= x <= 100
Timbre adjustment. Range: -100 to 100. Affects the tonal quality of the voice.
-100 <= x <= 100
Format of the output content (non-streaming only).
url, hex
url, hex
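If the hex option is selected, the audio presumably arrives as a hexadecimal string instead of a download URL. The sketch below assumes exactly that: the string is the hex encoding of the raw audio bytes. The function names are hypothetical, and where the string sits in the response is not specified here, so the caller passes it in directly.

```python
from pathlib import Path


def decode_hex_audio(hex_audio: str) -> bytes:
    """Convert a hex string into raw bytes (assumed to be the audio data)."""
    return bytes.fromhex(hex_audio.strip())


def save_hex_audio(hex_audio: str, path: str) -> int:
    """Decode the hex string, write it to disk, and return the byte count."""
    data = decode_hex_audio(hex_audio)
    Path(path).write_bytes(data)
    return len(data)


# "494433" is the hex form of the ASCII bytes "ID3", used here as dummy input.
sample = decode_hex_audio("494433")
```

The file extension passed to `save_hex_audio` should match the audio format requested (mp3, pcm, or flac).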