Pre-recorded audio
Transcribe an audio file with a single API call
Pre-recorded audio
In this quickstart, you send an audio file to the Aldea Speech-to-Text API and receive a JSON transcript with word-level timestamps. By the end, you have a working API call you can adapt for your own audio files.
The API accepts a POST request to /v1/listen with the raw audio bytes in the body. It auto-detects the audio format (WAV, MP3, AAC, FLAC, OGG, WebM, Opus, M4A) from binary headers, so no Content-Type header is needed. Add the timestamps: true header to include per-word timings in the response.
Prerequisites
- An Aldea API key. Sign up and generate one from the API Keys page.
Step 1: Make your first API request
Use one of the following examples to test your API key and transcribe text from speech. Replace YOUR_ALDEA_API_KEY with your actual API key.
Send a local audio file directly to the API:
curl -X POST "https://api.aldea.ai/v1/listen" \
-H "Authorization: Bearer YOUR_ALDEA_API_KEY" \
-H "timestamps: true" \
--data-binary @audio.wavThis works with any supported file format (MP3, WAV, AAC, FLAC, OGG, WebM, Opus, M4A).
You can also transcribe audio from a URL without downloading it first:
curl -X POST "https://api.aldea.ai/v1/listen" \
-H "Authorization: Bearer YOUR_ALDEA_API_KEY" \
-H "Content-Type: application/json" \
-H "timestamps: true" \
-d '{"url": "https://platform.aldea.ai/aldea_sample.wav"}'Replace YOUR_ALDEA_API_KEY with your API key.
Install the requests library if you don't have it:
pip install requestsSend an audio file to the API:
import requests
API_KEY = "YOUR_ALDEA_API_KEY"
with open("audio.wav", "rb") as f:
response = requests.post(
"https://api.aldea.ai/v1/listen",
headers={
"Authorization": f"Bearer {API_KEY}",
"timestamps": "true",
},
data=f.read(),
)
result = response.json()
print(result["results"]["channels"][0]["alternatives"][0]["transcript"])Run it:
python transcribe.pyReplace YOUR_ALDEA_API_KEY with your API key.
No dependencies needed. Uses the built-in fetch API (Node.js 18+):
import { readFileSync } from "fs";
const API_KEY = "YOUR_ALDEA_API_KEY";
const response = await fetch("https://api.aldea.ai/v1/listen", {
method: "POST",
headers: {
Authorization: `Bearer ${API_KEY}`,
timestamps: "true",
},
body: readFileSync("audio.wav"),
});
const result = await response.json();
console.log(result.results.channels[0].alternatives[0].transcript);Run it:
node transcribe.mjsReplace YOUR_ALDEA_API_KEY with your API key.
Step 2: Read the response
The API returns a JSON object with the transcript, confidence score, and word-level timestamps (when the timestamps: true header is included):
{
"metadata": {
"request_id": "77aaccd1-3b19-4000-9055-3f91009751b4",
"created": "2026-03-04T12:00:00.000000Z",
"duration": 6.916625,
"channels": 1
},
"results": {
"channels": [
{
"alternatives": [
{
"transcript": "Something, you know, it's just like I'm saying...",
"confidence": 0.802,
"words": [
{ "word": "Something,", "start": 0.04, "end": 0.36 },
{ "word": "you", "start": 0.44, "end": 0.52 }
]
}
]
}
]
}
}| Field | Description |
|---|---|
results.channels[0].alternatives[0].transcript | The full transcript text |
confidence | Confidence score (0–1) for the transcript |
words | Array of word objects with word, start, and end (in seconds). Requires the timestamps: true header. |
metadata.duration | Audio duration in seconds |
metadata.request_id | Unique identifier for the request |
Next steps
You can enable speaker diarization, set the transcription language, process audio asynchronously with callbacks, and more by adding query parameters to the request URL. Explore all available features in the API reference.