Real-time streaming

Stream audio and get transcripts in real-time over WebSocket

In this quickstart, you open a WebSocket connection to the Aldea Speech-to-Text API and receive transcripts in real time as audio streams in. By the end, you'll have a working client that sends audio and prints the transcripts it receives.

The API exposes a WebSocket endpoint at wss://api.aldea.ai/v1/listen. You send binary audio frames and receive JSON messages containing interim and final transcripts.

Prerequisites

  • An Aldea API key. Sign up and generate one from the API Keys page.

Step 1: Connect and stream

Pick your language and run the example. Each one connects to the WebSocket endpoint, streams audio, and prints transcripts as they arrive.

Install the websockets library:

pip install websockets

Stream audio from a file and print transcripts as they arrive:

stream.py
import asyncio
import json
import websockets

API_KEY = "YOUR_ALDEA_API_KEY"
AUDIO_FILE = "audio.wav"
WS_URL = "wss://api.aldea.ai/v1/listen?encoding=linear16&sample_rate=16000"

async def stream():
    headers = {"Authorization": f"Bearer {API_KEY}"}

    async with websockets.connect(WS_URL, additional_headers=headers) as ws:

        async def send_audio():
            with open(AUDIO_FILE, "rb") as f:
                while chunk := f.read(4096):
                    await ws.send(chunk)
            await ws.send(json.dumps({"type": "CloseStream"}))

        async def receive_transcripts():
            async for message in ws:
                data = json.loads(message)
                if data.get("type") == "Results" and data.get("is_final"):
                    transcript = data["channel"]["alternatives"][0]["transcript"]
                    if transcript:
                        print(transcript)

        await asyncio.gather(send_audio(), receive_transcripts())

asyncio.run(stream())

Run it:

python stream.py

Replace YOUR_ALDEA_API_KEY with your API key (starts with org_).

This example streams a pre-recorded file over WebSocket for simplicity. In production, you'd stream audio from a microphone or live source. The API behaves the same way: send binary audio frames, receive JSON transcripts.
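If you do stream a pre-recorded file, you can pace the chunks at real-time speed so the server sees timing similar to a live source. A minimal sketch of the timing arithmetic, assuming raw linear16 audio at 16 kHz, 16-bit mono (the chunk_seconds and send_paced helpers are illustrative names, not part of the API):

```python
import asyncio

def chunk_seconds(num_bytes: int, sample_rate: int = 16000, bytes_per_sample: int = 2) -> float:
    """Duration of a raw linear16 chunk: bytes / (sample rate x sample width)."""
    return num_bytes / (sample_rate * bytes_per_sample)

async def send_paced(ws, f, chunk_size: int = 4096):
    """Send file chunks at roughly real-time speed to mimic a live source."""
    while chunk := f.read(chunk_size):
        await ws.send(chunk)
        # Sleep for the audio duration of the chunk before sending the next one
        await asyncio.sleep(chunk_seconds(len(chunk)))
```

For 16 kHz 16-bit mono audio, a 4096-byte chunk covers 4096 / (16000 × 2) = 0.128 s, so sleeping that long between sends approximates real time.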

Install the Deepgram SDK (Aldea's streaming API is Deepgram-compatible, so the official SDK works when pointed at Aldea's URL):

npm install @deepgram/sdk

Stream audio and print transcripts:

stream.mjs
import { createClient, LiveTranscriptionEvents } from "@deepgram/sdk";
import { createReadStream } from "fs";

const API_KEY = "YOUR_ALDEA_API_KEY";
const AUDIO_FILE = "audio.wav";

const client = createClient({
  key: API_KEY,
  global: { url: "https://api.aldea.ai" },
});

const connection = client.listen.live({
  encoding: "linear16",
  sample_rate: 16000,
  interim_results: true,
});

connection.on(LiveTranscriptionEvents.Open, () => {
  console.log("Connected. Streaming audio...");

  const stream = createReadStream(AUDIO_FILE, { highWaterMark: 4096 });
  stream.on("data", (chunk) => connection.send(chunk));
  stream.on("end", () => {
    // Give the server a moment to flush final results before closing
    setTimeout(() => connection.finish(), 1000);
  });
});

connection.on(LiveTranscriptionEvents.Transcript, (data) => {
  const transcript = data?.channel?.alternatives?.[0]?.transcript;
  if (transcript && data.is_final) {
    console.log(transcript);
  }
});

connection.on(LiveTranscriptionEvents.Error, (err) => {
  console.error("Error:", err.message);
});

Run it:

node stream.mjs

Replace YOUR_ALDEA_API_KEY with your API key (starts with org_).

This example uses your microphone to capture audio and stream it to Aldea in real-time. Save it as an HTML file and open it in your browser:

live.html
<!DOCTYPE html>
<html>
<body>
  <h1>Aldea Live Transcription</h1>
  <button id="start">Start</button>
  <button id="stop" disabled>Stop</button>
  <div id="transcript" style="margin-top: 1rem; white-space: pre-wrap;"></div>

  <script>
    const API_KEY = "YOUR_ALDEA_API_KEY"; 
    const WS_URL = "wss://api.aldea.ai/v1/listen";

    let socket, recorder;

    document.getElementById("start").onclick = async () => {
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

      socket = new WebSocket(WS_URL, ["token", API_KEY]);

      socket.onmessage = (event) => {
        const data = JSON.parse(event.data);
        if (data.type === "Results") {
          const text = data.channel.alternatives[0].transcript;
          if (text && data.is_final) {
            document.getElementById("transcript").textContent += text + "\n";
          }
        }
      };

      socket.onopen = () => {
        recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
        recorder.ondataavailable = (e) => {
          if (e.data.size > 0 && socket.readyState === WebSocket.OPEN) {
            socket.send(e.data);
          }
        };
        recorder.start(250); // Send audio every 250ms
        document.getElementById("start").disabled = true;
        document.getElementById("stop").disabled = false;
      };
    };

    document.getElementById("stop").onclick = () => {
      recorder?.stop();
      socket?.send(JSON.stringify({ type: "CloseStream" }));
      socket?.close();
      document.getElementById("start").disabled = false;
      document.getElementById("stop").disabled = true;
    };
  </script>
</body>
</html>

Open live.html in your browser, click Start, and speak. Transcripts appear in real-time.

Security: Never expose your API key in client-side code in production. Use a backend proxy to authenticate WebSocket connections. This example is for local testing only.

Step 2: Understand the response

Each message from the server is a JSON object. Transcript results look like this:

Streaming result
{
  "type": "Results",
  "channel_index": [0],
  "duration": 1.98,
  "start": 0.0,
  "is_final": true,
  "speech_final": true,
  "channel": {
    "alternatives": [
      {
        "transcript": "Hello world",
        "confidence": 0.95,
        "words": [
          ["Hello", 0, 320],
          ["world", 320, 640]
        ]
      }
    ]
  },
  "metadata": {
    "request_id": "...",
    "model_info": {
      "name": "<model>",
      "version": null,
      "arch": "aldea-asr"
    }
  }
}

Key fields

  • is_final: false - Interim result; the transcript may change as more audio arrives
  • is_final: true - Final result for this audio segment
  • speech_final: true - The speaker finished talking (end of utterance)
  • confidence - Confidence score (0–1) for the transcript
  • words - Array of [word, start_ms, end_ms] entries; timestamps are in milliseconds

Interim results are enabled by default. Set interim_results=false in the query string if you only want final transcripts.

Next steps

You can control endpointing sensitivity, enable PII redaction, set the transcription language, and more by adding query parameters to the WebSocket URL. See the full streaming API reference for all available options.
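Query parameters can be appended with the standard library rather than built by hand. A minimal sketch (listen_url is an illustrative helper; only encoding, sample_rate, and interim_results appear in this guide, so check the reference for other parameter names):

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.aldea.ai/v1/listen"

def listen_url(**params) -> str:
    """Append query parameters to the streaming endpoint URL."""
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL

url = listen_url(encoding="linear16", sample_rate=16000, interim_results="false")
print(url)
# -> wss://api.aldea.ai/v1/listen?encoding=linear16&sample_rate=16000&interim_results=false
```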