Utterance detection

Detect the end of a spoken utterance with the utterance_end_ms parameter

Utterance detection identifies when a speaker has finished a turn by monitoring for a configurable period of silence. When the silence threshold is reached, the server sends an UtteranceEnd message.

Usage

Set the utterance_end_ms parameter to the silence duration (in milliseconds) that should trigger an utterance end:

wss://api.aldea.ai/v1/listen?utterance_end_ms=1000&encoding=mp3

With this URL, the server sends an UtteranceEnd message after 1 second (1000 ms) of silence.
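As an illustration, the connection URL above can be assembled programmatically rather than by hand. The sketch below uses only Python's standard library; the helper name build_listen_url and the positive-value check are assumptions made for this example, not part of the Aldea API.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in this section.
BASE_URL = "wss://api.aldea.ai/v1/listen"


def build_listen_url(utterance_end_ms: int, encoding: str = "mp3") -> str:
    """Return a streaming URL with utterance detection enabled.

    build_listen_url is a hypothetical helper. The check below is a local
    sanity guard, not a documented API limit.
    """
    if utterance_end_ms <= 0:
        raise ValueError("utterance_end_ms must be a positive number of milliseconds")
    query = urlencode({"utterance_end_ms": utterance_end_ms, "encoding": encoding})
    return f"{BASE_URL}?{query}"


# Request an UtteranceEnd message after 1 second of silence.
url = build_listen_url(1000)
print(url)
```

Building the query string with urlencode keeps parameter values correctly escaped if more options are added later.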

UtteranceEnd message

When the silence threshold is reached, the server sends:

{
  "type": "UtteranceEnd",
  "channel": [0],
  "last_word_end": 2.5
}

| Field | Description |
| --- | --- |
| type | Always "UtteranceEnd" |
| channel | Array indicating which channel detected the utterance end |
| last_word_end | Timestamp (in seconds) of the last detected word |
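
A client typically needs to pick UtteranceEnd messages out of the stream of text frames it receives. The following is a minimal sketch of that step, assuming each frame arrives as a JSON string shaped like the example above. The UtteranceEnd dataclass and parse_utterance_end function are hypothetical names for this illustration, and the "Other" message type is a placeholder; how frames are read from the WebSocket is left out.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UtteranceEnd:
    """Fields of an UtteranceEnd message (hypothetical container type)."""
    channel: List[int]
    last_word_end: float


def parse_utterance_end(raw: str) -> Optional[UtteranceEnd]:
    """Return an UtteranceEnd if the frame is one, otherwise None."""
    message = json.loads(raw)
    # Ignore anything that is not a JSON object of type "UtteranceEnd".
    if not isinstance(message, dict) or message.get("type") != "UtteranceEnd":
        return None
    return UtteranceEnd(
        channel=list(message["channel"]),
        last_word_end=float(message["last_word_end"]),
    )


# The sample message from this section.
raw = '{"type": "UtteranceEnd", "channel": [0], "last_word_end": 2.5}'
event = parse_utterance_end(raw)
if event is not None:
    print(f"Utterance ended on channel {event.channel}, last word at {event.last_word_end}s")

# Frames of any other type are skipped ("Other" is a placeholder type).
assert parse_utterance_end('{"type": "Other"}') is None
```

Returning None for unrelated frames lets the same receive loop pass every message through this function without special-casing other message types.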

Word timestamps must be enabled for UtteranceEnd events to fire.

Next steps