POST /generation/minimax/speech-2.8-hd
MiniMax Speech 2.8 HD
curl --request POST \
  --url https://open.skills.video/api/v1/generation/minimax/speech-2.8-hd \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "prompt": "<string>"
}
'
{
  "id": "gen_123",
  "status": "IN_QUEUE",
  "input": {
    "prompt": "Hello world.",
    "voice_id": "Wise_Woman",
    "speed": 1
  },
  "usage": {
    "total": 20,
    "subscription": 20,
    "permanent": 0
  }
}
High-quality text-to-speech with multilingual voices and controllable voice/audio settings.

Authorizations

Authorization
string
header
required

API Key for API endpoints
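As a rough Python counterpart to the curl call above, the sketch below builds the same POST request with the standard library. It only constructs the request and does not send it. The URL and headers come from the curl example; `build_request` and the placeholder token are illustrative names, not part of the API.

```python
import json
import urllib.request

# Endpoint URL copied from the curl example on this page.
API_URL = "https://open.skills.video/api/v1/generation/minimax/speech-2.8-hd"


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# "YOUR_API_KEY" is a placeholder, not a real credential.
req = build_request("YOUR_API_KEY", {"prompt": "Hello world."})
# To send it, pass `req` to urllib.request.urlopen and parse the JSON body.
```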

Body

prompt
string
required

Text to convert to speech. Insert <#x#> to add a pause of x seconds (x = 0.01-99.99). Supports the following interjection tags: (laughs), (sighs), (coughs), (clears throat), (gasps), (sniffs), (groans), (yawns).

Required string length: 1 - 10000
Example:

"Hello world! Welcome to MiniMax's new text to speech model <#0.1#> Speech 2.8 HD (laughs) now available on Fal!"

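The pause syntax and length limit described for prompt can be sketched as small helpers. This is a minimal illustration: `pause` and `check_prompt` are hypothetical names, formatting the pause with two decimals is an assumption based on the stated 0.01-99.99 range, and the length check counts characters, which may differ from how the server counts.

```python
def pause(seconds: float) -> str:
    """Return a pause marker such as <#0.50#> (hypothetical helper)."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    # Two decimals is an assumption; the page only gives the numeric range.
    return f"<#{seconds:.2f}#>"


def check_prompt(text: str) -> str:
    """Reject prompts outside the documented 1-10000 length limit."""
    if not 1 <= len(text) <= 10000:
        raise ValueError("prompt length must be between 1 and 10000")
    return text


prompt = check_prompt(
    f"Welcome back. {pause(0.5)} Here is today's summary (sighs)."
)
```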
voice_id
enum<string>
default:Wise_Woman

Predefined voice ID or preset name.

Available options:
Wise_Woman,
Friendly_Person,
Inspirational_girl,
Deep_Voice_Man,
Calm_Woman,
Casual_Guy,
Lively_Girl,
Patient_Man,
Young_Knight,
Determined_Man,
Lovely_Girl,
Decent_Boy,
Imposing_Manner,
Elegant_Man,
Abbess,
Sweet_Girl_2,
Exuberant_Girl
Minimum string length: 1
Examples:

"Wise_Woman"

"Friendly_Person"

"Calm_Woman"

speed
number
default:1

Speech speed (0.5 - 2.0).

Required range: 0.5 <= x <= 2
vol
number
default:1

Volume (0.01 - 10).

Required range: 0.01 <= x <= 10
pitch
integer
default:0

Voice pitch (-12 to 12).

Required range: -12 <= x <= 12
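The ranges for speed, vol, and pitch can be checked on the client before a request is sent. The sketch below is one way to do that; `check_ranges` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
# Documented ranges for the three basic voice controls.
RANGES = {"speed": (0.5, 2.0), "vol": (0.01, 10.0), "pitch": (-12, 12)}


def check_ranges(params: dict) -> list[str]:
    """Return a list of problems found in params (empty if none)."""
    errors = []
    for name, (lo, hi) in RANGES.items():
        if name in params and not lo <= params[name] <= hi:
            errors.append(f"{name}={params[name]} is outside {lo}..{hi}")
    # pitch is documented as an integer, unlike speed and vol.
    if "pitch" in params and not isinstance(params["pitch"], int):
        errors.append("pitch must be an integer")
    return errors


ok = check_ranges({"speed": 1.25, "vol": 1, "pitch": -2})
bad = check_ranges({"speed": 3})
```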
emotion
enum<string>

Emotion style of generated speech.

Available options:
happy,
sad,
angry,
fearful,
disgusted,
surprised,
neutral
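Since voice_id and emotion are both enums, a client can validate them against the lists on this page before building the request body. The following is a sketch under that assumption; `speech_payload` is a hypothetical helper, and emotion is left out of the body when not set because the page lists no default for it.

```python
# Values copied from the "Available options" lists on this page.
VOICES = {
    "Wise_Woman", "Friendly_Person", "Inspirational_girl", "Deep_Voice_Man",
    "Calm_Woman", "Casual_Guy", "Lively_Girl", "Patient_Man", "Young_Knight",
    "Determined_Man", "Lovely_Girl", "Decent_Boy", "Imposing_Manner",
    "Elegant_Man", "Abbess", "Sweet_Girl_2", "Exuberant_Girl",
}
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def speech_payload(prompt: str, voice_id: str = "Wise_Woman", emotion=None) -> dict:
    """Build a request body with a validated voice and optional emotion."""
    if voice_id not in VOICES:
        raise ValueError(f"unknown voice_id: {voice_id}")
    payload = {"prompt": prompt, "voice_id": voice_id}
    if emotion is not None:
        if emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {emotion}")
        payload["emotion"] = emotion
    return payload


payload = speech_payload("Good news!", voice_id="Calm_Woman", emotion="happy")
```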
english_normalization
boolean
default:false

Enables English text normalization, which improves how numbers are read aloud at the cost of slightly higher latency.

language_boost
enum<string>
default:auto

Enhances recognition of the specified language or dialect.

Available options:
auto,
Chinese,
Chinese,Yue,
English,
Arabic,
Russian,
Spanish,
French,
Portuguese,
German,
Turkish,
Dutch,
Ukrainian,
Vietnamese,
Indonesian,
Japanese,
Italian,
Korean,
Thai,
Polish,
Romanian,
Greek,
Czech,
Finnish,
Hindi,
Bulgarian,
Danish,
Hebrew,
Malay,
Slovak,
Swedish,
Croatian,
Hungarian,
Norwegian,
Slovenian,
Catalan,
Nynorsk,
Afrikaans
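One entry in the list above, Chinese,Yue, contains a comma. Judging from how the options are laid out, it appears to be a single enum value rather than two separate ones. Under that assumption, it would be sent as one string, as in this short sketch.

```python
import json

# Assumption: "Chinese,Yue" is one option, passed as a single string,
# not a list of two languages.
payload = {"prompt": "你好，世界。", "language_boost": "Chinese,Yue"}

# ensure_ascii=False keeps the non-ASCII prompt readable in the JSON body.
body = json.dumps(payload, ensure_ascii=False)
```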
audio_sample_rate
enum<integer>
default:32000

Sample rate of generated audio.

Available options:
8000,
16000,
22050,
24000,
32000,
44100
audio_bitrate
enum<integer>
default:128000

Bitrate of generated audio.

Available options:
32000,
64000,
128000,
256000
audio_format
enum<string>
default:mp3

Audio format.

Available options:
mp3,
pcm,
flac
audio_channel
enum<integer>
default:1

Number of audio channels (1=mono, 2=stereo).

Available options:
1,
2
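The four audio settings above are closed enums, so they are easy to validate locally. The sketch below does that and adds a rough size estimate for mp3 output. Both helpers are hypothetical, and the estimate assumes audio_bitrate is a constant rate in bits per second, which this page does not state explicitly.

```python
# Allowed values copied from the option lists on this page.
AUDIO_OPTIONS = {
    "audio_sample_rate": {8000, 16000, 22050, 24000, 32000, 44100},
    "audio_bitrate": {32000, 64000, 128000, 256000},
    "audio_format": {"mp3", "pcm", "flac"},
    "audio_channel": {1, 2},
}


def audio_settings(**kwargs) -> dict:
    """Return the given audio settings after checking them against the enums."""
    for name, value in kwargs.items():
        if name not in AUDIO_OPTIONS:
            raise KeyError(f"unknown audio setting: {name}")
        if value not in AUDIO_OPTIONS[name]:
            raise ValueError(f"{name}={value!r} is not an allowed option")
    return kwargs


def estimate_mp3_bytes(duration_s: float, audio_bitrate: int = 128000) -> int:
    """Approximate mp3 size, assuming a constant bitrate in bits per second."""
    return int(duration_s * audio_bitrate / 8)


settings = audio_settings(audio_format="flac", audio_sample_rate=44100)
size = estimate_mp3_bytes(10)  # ten seconds at the default bitrate
```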
normalization_enabled
boolean
default:true

Enable loudness normalization for the audio.

target_loudness
number
default:-18

Target loudness in LUFS (default -18.0).

Required range: -70 <= x <= -10
target_range
number
default:8

Target loudness range in LU (default 8.0).

Required range: 0 <= x <= 20
target_peak
number
default:-0.5

Target peak level in dBTP (default -0.5).

Required range: -3 <= x <= 0
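The three loudness targets can be bundled into one validated group of fields. This is a minimal sketch: `loudness_settings` is a hypothetical helper, its defaults mirror the documented ones, and the -16 LUFS value in the usage line is only an illustrative choice.

```python
def loudness_settings(
    target_loudness: float = -18.0,
    target_range: float = 8.0,
    target_peak: float = -0.5,
) -> dict:
    """Return loudness normalization fields after checking documented ranges."""
    limits = {
        "target_loudness": (target_loudness, -70, -10),  # LUFS
        "target_range": (target_range, 0, 20),           # LU
        "target_peak": (target_peak, -3, 0),             # dBTP
    }
    for name, (value, lo, hi) in limits.items():
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside {lo}..{hi}")
    settings = {"normalization_enabled": True}
    settings.update({name: value for name, (value, _, _) in limits.items()})
    return settings


# Example: a slightly louder target with more peak headroom.
settings = loudness_settings(target_loudness=-16.0, target_peak=-1.0)
```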
voice_modify_pitch
integer
default:0

Pitch adjustment in semitones. Range: -100 to 100. Positive values raise pitch, negative values lower it.

Required range: -100 <= x <= 100
voice_modify_intensity
integer
default:0

Intensity or energy of the voice. Range: -100 to 100. Higher values create more energetic speech.

Required range: -100 <= x <= 100
voice_modify_timbre
integer
default:0

Timbre adjustment. Range: -100 to 100. Affects the tonal quality of the voice.

Required range: -100 <= x <= 100
output_format
enum<string>
default:url

Format of the output content (non-streaming only).

Available options:
url,
hex
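With output_format set to hex, the audio is presumably returned as a hexadecimal string instead of a download URL. This page does not show which response field carries it, so the sketch below only covers the decoding step and uses a made-up three-byte sample in place of a real API result.

```python
def hex_to_audio_bytes(hex_audio: str) -> bytes:
    """Decode a hexadecimal string into raw audio bytes."""
    return bytes.fromhex(hex_audio)


# Made-up sample: the bytes 0x49 0x44 0x33 spell "ID3",
# the marker that opens an ID3v2 tag in many mp3 files.
sample_hex = "494433"
audio = hex_to_audio_bytes(sample_hex)

# Writing the result to disk would look like:
#   with open("speech.mp3", "wb") as f:
#       f.write(audio)
```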

Response

Generation status

id
string
required

Generation ID.

status
enum<string>
required

Generation status

Available options:
starting,
processing,
succeeded,
failed,
canceled
usage
object
required

Credit usage breakdown for the request

input
MiniMax Speech 2.8 HD Input · object
required
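A client will usually want to know whether a generation has reached a final state. The sketch below treats succeeded, failed, and canceled as final and anything else as pending, an assumption that also covers the IN_QUEUE value in the sample response, which is not in the status list above. How to fetch an updated generation is not described on this page, so only the parsing step is shown, using the sample response.

```python
import json

# Statuses assumed to be final, taken from the status enum on this page.
TERMINAL = {"succeeded", "failed", "canceled"}


def is_finished(generation: dict) -> bool:
    """Return True once the generation has reached a final status."""
    return generation["status"] in TERMINAL


# Sample response body copied from the example on this page.
sample = json.loads("""
{
  "id": "gen_123",
  "status": "IN_QUEUE",
  "input": {"prompt": "Hello world.", "voice_id": "Wise_Woman", "speed": 1},
  "usage": {"total": 20, "subscription": 20, "permanent": 0}
}
""")

pending = not is_finished(sample)
credits_used = sample["usage"]["total"]
```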