Utterance detection
Detect the end of spoken utterances with the utterance_end_ms parameter
Utterance detection identifies when a speaker has finished a turn by monitoring for a configurable period of silence. When the silence threshold is reached, the server sends an UtteranceEnd message.
Usage
Set the utterance_end_ms parameter to the silence duration (in milliseconds) that should trigger an utterance end:
wss://api.aldea.ai/v1/listen?utterance_end_ms=1000&encoding=mp3

This example fires an UtteranceEnd event after 1 second of silence.
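As a minimal sketch of how a client might assemble this URL, the helper below builds the query string with Python's standard library. The function name `build_listen_url` and the constant `BASE_URL` are hypothetical names chosen for illustration; only the endpoint and the two query parameters come from the example above.

```python
from urllib.parse import urlencode

# Endpoint taken from the example above.
BASE_URL = "wss://api.aldea.ai/v1/listen"


def build_listen_url(utterance_end_ms: int, **extra_params: str) -> str:
    """Return the streaming URL with utterance detection enabled.

    build_listen_url is a hypothetical helper, not part of any SDK.
    utterance_end_ms is the silence duration, in milliseconds, that
    should trigger an UtteranceEnd message.
    """
    # Dicts preserve insertion order, so utterance_end_ms comes first.
    params = {"utterance_end_ms": utterance_end_ms, **extra_params}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_listen_url(1000, encoding="mp3")
```

Building the query with `urlencode` rather than string concatenation keeps any additional parameter values correctly escaped.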
UtteranceEnd message
When the silence threshold is reached, the server sends:
{
"type": "UtteranceEnd",
"channel": [0],
"last_word_end": 2.5
}

| Field | Description |
|---|---|
| type | Always "UtteranceEnd" |
| channel | Array indicating which channel detected the utterance end |
| last_word_end | Timestamp (in seconds) of the last detected word |
Word timestamps must be enabled for UtteranceEnd events to fire.
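On the client side, incoming messages can be filtered for this type. The sketch below parses one raw JSON text frame and returns a typed record when it is an UtteranceEnd message. It assumes each server message arrives as a single JSON text frame; the names `UtteranceEnd` (the dataclass) and `parse_utterance_end` are hypothetical, and only the three fields listed in the table above are read.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UtteranceEnd:
    """Hypothetical container for the fields documented above."""
    channel: List[int]
    last_word_end: float  # seconds


def parse_utterance_end(raw: str) -> Optional[UtteranceEnd]:
    """Return an UtteranceEnd for matching messages, otherwise None.

    Assumes each server message is one JSON text frame.
    """
    message = json.loads(raw)
    # Ignore any message that is not an UtteranceEnd object.
    if not isinstance(message, dict) or message.get("type") != "UtteranceEnd":
        return None
    return UtteranceEnd(
        channel=list(message["channel"]),
        last_word_end=float(message["last_word_end"]),
    )


sample = '{"type": "UtteranceEnd", "channel": [0], "last_word_end": 2.5}'
event = parse_utterance_end(sample)
```

Returning `None` for other message types lets a receive loop pass every frame through the same function and act only when a speaker's turn has ended.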
Next steps
- Endpointing to fine-tune sentence finalization timing
- VAD events to detect when speech starts
- WebSocket protocol