Pre-recorded audio

Transcribe an audio file with a single API call

In this quickstart, you send an audio file to the Aldea Speech-to-Text API and receive a JSON transcript with word-level timestamps. By the end, you have a working API call you can adapt for your own audio files.

The API accepts a POST request to /v1/listen with the raw audio bytes in the body. It auto-detects the audio format (WAV, MP3, AAC, FLAC, OGG, WebM, Opus, M4A) from the file's binary headers, so raw audio uploads need no Content-Type header. Add the timestamps: true header to include per-word timings in the response.
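
If you want to catch an obviously wrong file before uploading it, you can run a similar check on the client. The sketch below illustrates the general technique of recognizing a format from its leading bytes; it is not Aldea's detection logic, and the function name sniff_audio_format is a hypothetical helper introduced only for this example.

```python
from typing import Optional


def sniff_audio_format(head: bytes) -> Optional[str]:
    """Guess an audio format from the first bytes of a file (illustrative only)."""
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[:4] == b"fLaC":
        return "flac"
    if head[:4] == b"OggS":
        return "ogg"  # Ogg container, which can also carry Opus audio
    if head[:4] == b"\x1a\x45\xdf\xa3":
        return "webm"  # EBML header shared by WebM and Matroska
    if head[4:8] == b"ftyp":
        return "m4a"  # ISO base media file (MP4 / M4A)
    if head[:3] == b"ID3":
        return "mp3"  # ID3v2 tag, typically found at the start of MP3 files
    if len(head) >= 2 and head[0] == 0xFF:
        if (head[1] & 0xF6) == 0xF0:
            return "aac"  # ADTS frame sync (layer bits are 00)
        if (head[1] & 0xE0) == 0xE0:
            return "mp3"  # MPEG audio frame sync
    return None  # unknown: let the API decide


def sniff_file(path: str) -> Optional[str]:
    # Only the first few bytes are needed for the checks above.
    with open(path, "rb") as f:
        return sniff_audio_format(f.read(12))
```

A return value of None only means this simplified check did not recognize the file; the API may still accept it.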

Prerequisites

  • An Aldea API key. Sign up and generate one from the API Keys page.

Step 1: Make your first API request

Use one of the following examples to test your API key and transcribe speech to text. Replace YOUR_ALDEA_API_KEY with your actual API key.

Send a local audio file directly to the API:

Transcribe a local file
curl -X POST "https://api.aldea.ai/v1/listen" \
  -H "Authorization: Bearer YOUR_ALDEA_API_KEY" \
  -H "timestamps: true" \
  --data-binary @audio.wav

This works with any supported file format (MP3, WAV, AAC, FLAC, OGG, WebM, Opus, M4A).

You can also transcribe audio from a URL without downloading it first:

Transcribe from URL
curl -X POST "https://api.aldea.ai/v1/listen" \
  -H "Authorization: Bearer YOUR_ALDEA_API_KEY" \
  -H "Content-Type: application/json" \
  -H "timestamps: true" \
  -d '{"url": "https://platform.aldea.ai/aldea_sample.wav"}'

In Python, install the requests library if you don't have it:

pip install requests

Send an audio file to the API:

transcribe.py
import requests

API_KEY = "YOUR_ALDEA_API_KEY"

with open("audio.wav", "rb") as f:
    response = requests.post(
        "https://api.aldea.ai/v1/listen",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "timestamps": "true",
        },
        data=f.read(),
    )

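# Raise on HTTP error responses instead of trying to parse them as a transcript.
response.raise_for_status()
result = response.json()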
print(result["results"]["channels"][0]["alternatives"][0]["transcript"])

Run it:

python transcribe.py
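
The URL variant shown in the curl example can be sketched in Python as well. This version uses only the standard library (urllib) so it runs without installing anything; the function names build_url_request and transcribe_url are hypothetical helpers, and the endpoint, headers, and JSON body are taken from the curl example.

```python
import json
import urllib.request

API_URL = "https://api.aldea.ai/v1/listen"


def build_url_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build the POST request that asks the API to fetch and transcribe audio_url."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # A JSON body is not raw audio, so the content type is stated explicitly.
            "Content-Type": "application/json",
            # urllib normalizes header capitalization; HTTP header names are case-insensitive.
            "timestamps": "true",
        },
    )


def transcribe_url(api_key: str, audio_url: str) -> str:
    """Send the request and return the transcript text."""
    request = build_url_request(api_key, audio_url)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result["results"]["channels"][0]["alternatives"][0]["transcript"]
```

To try it, call transcribe_url("YOUR_ALDEA_API_KEY", "https://platform.aldea.ai/aldea_sample.wav") and print the returned string.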

In Node.js, no dependencies are needed. The example uses the built-in fetch API (Node.js 18+):

transcribe.mjs
import { readFileSync } from "fs";

const API_KEY = "YOUR_ALDEA_API_KEY";

const response = await fetch("https://api.aldea.ai/v1/listen", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    timestamps: "true",
  },
  body: readFileSync("audio.wav"),
});

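// Stop on HTTP error responses instead of trying to parse them as a transcript.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const result = await response.json();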
console.log(result.results.channels[0].alternatives[0].transcript);

Run it:

node transcribe.mjs

Step 2: Read the response

The API returns a JSON object with the transcript, confidence score, and word-level timestamps (when the timestamps: true header is included):

Response
{
  "metadata": {
    "request_id": "77aaccd1-3b19-4000-9055-3f91009751b4",
    "created": "2026-03-04T12:00:00.000000Z",
    "duration": 6.916625,
    "channels": 1
  },
  "results": {
    "channels": [
      {
        "alternatives": [
          {
            "transcript": "Something, you know, it's just like I'm saying...",
            "confidence": 0.802,
            "words": [
              { "word": "Something,", "start": 0.04, "end": 0.36 },
              { "word": "you", "start": 0.44, "end": 0.52 }
            ]
          }
        ]
      }
    ]
  }
}

Field                                            Description
results.channels[0].alternatives[0].transcript   The full transcript text
confidence                                       Confidence score (0–1) for the transcript
words                                            Array of word objects with word, start, and end (in seconds). Requires the timestamps: true header.
metadata.duration                                Audio duration in seconds
metadata.request_id                              Unique identifier for the request
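
As a rough illustration of working with these fields, the sketch below pulls the per-word timings out of a parsed response. The helper name word_timings is hypothetical, and the sample dictionary is a trimmed copy of the response shown above; the words array is treated as optional because it is only present when the timestamps: true header is sent.

```python
def word_timings(result: dict) -> list:
    """Return (word, start, end) tuples from the first alternative of the first channel."""
    alternative = result["results"]["channels"][0]["alternatives"][0]
    # "words" is absent unless the request included the timestamps: true header.
    return [(w["word"], w["start"], w["end"]) for w in alternative.get("words", [])]


# Trimmed copy of the sample response from this page.
sample = {
    "metadata": {"duration": 6.916625, "channels": 1},
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "Something, you know, it's just like I'm saying...",
                        "confidence": 0.802,
                        "words": [
                            {"word": "Something,", "start": 0.04, "end": 0.36},
                            {"word": "you", "start": 0.44, "end": 0.52},
                        ],
                    }
                ]
            }
        ]
    },
}

for word, start, end in word_timings(sample):
    # Times are in seconds from the start of the audio.
    print(f"{start:6.2f}s to {end:6.2f}s  {word}")
```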

Next steps

You can enable speaker diarization, set the transcription language, process audio asynchronously with callbacks, and more by adding query parameters to the request URL. Explore all available features in the API reference.
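
Adding query parameters to the request URL might look like the sketch below. The helper with_query is hypothetical, and the parameter names language and diarization are placeholders invented for illustration, not confirmed Aldea parameters; check the API reference for the real names and values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query(url: str, params: dict) -> str:
    """Return url with params merged into its query string."""

    def encode(value):
        # Render booleans as lowercase strings, as is common for query parameters.
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update({key: encode(value) for key, value in params.items()})
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder parameter names: replace them with the ones listed in the API reference.
url = with_query(
    "https://api.aldea.ai/v1/listen",
    {"language": "en", "diarization": True},
)
```

The resulting url can be passed to requests.post in place of the plain endpoint used in transcribe.py.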