Audio field to your Output:
Audio format
Audio data is a NumPy array of signed 16-bit integers:

| Property | Value |
|---|---|
| dtype | int16 |
| Shape | (channels, samples) - typically (1, N) for mono |
| Sample rate | 48,000 Hz (default) |
| Channels | 1 (mono, default) |
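As a minimal sketch, a block of audio in this format could be built like so. The 440 Hz tone is purely illustrative; only the dtype and the `(channels, samples)` shape matter:

```python
import numpy as np

# Generate 100 ms of a 440 Hz sine tone at the default 48 kHz mono format.
sample_rate = 48_000
duration_s = 0.1
t = np.arange(int(sample_rate * duration_s)) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

# Scale to the int16 range and add the leading channel axis: (1, N).
samples = (tone * 32767).astype(np.int16)[np.newaxis, :]
```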
Custom sample rates
If your model generates audio at a different sample rate, subclass Audio:
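The Audio class itself is not shown in this excerpt, so the following is only a sketch with a stand-in base class; the attribute name `sample_rate` is an assumption about the real API:

```python
# Stand-in for the runtime's Audio class, which is not shown in this
# excerpt; the real base class and attribute name may differ.
class Audio:
    sample_rate: int = 48_000  # runtime default


class Audio24k(Audio):
    """Audio for a model whose decoder emits 24 kHz samples."""
    sample_rate = 24_000
```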
Synchronizing audio and video
When you emit an Output with both audio and video, the runtime keeps them in sync: it splits a batch of video into individual frames and distributes the audio samples proportionally across them. For models that produce audio and video at different rates (e.g. in separate decoder threads), pair them by block index before emitting: each block carries frames_per_block video frames plus that many video-frames' worth of samples. At 30 fps with 3 frames per block, that's round(48000 / 30) * 3 = 4,800 samples (100 ms) per block.
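The block arithmetic above can be written out directly, assuming the default 48 kHz audio rate:

```python
sample_rate = 48_000   # default audio rate (Hz)
fps = 30               # video frame rate
frames_per_block = 3   # video frames emitted per block

# Samples that cover one video frame, then one whole block.
samples_per_frame = round(sample_rate / fps)
samples_per_block = samples_per_frame * frames_per_block
block_duration_ms = 1000 * samples_per_block / sample_rate
```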
Full example
A model that plays back an MP4 file with synchronized audio and video:

config.yml

reactor.yaml
Next
Video Input
Read webcam and media streams from the client.
The Run Loop
Emitting frames, batches, backpressure, and adaptive frame rates.