Audio tracks work the same way as video tracks. Declare them on your Output dataclass, and the runtime handles encoding, transport, and synchronization. Add an Audio field to your Output:
from dataclasses import dataclass
from reactor_runtime import Audio, Output, Video

@dataclass
class MyOutput(Output):
    main_video: Video
    main_audio: Audio
Emit audio alongside video in your run loop:
async def run(self):
    while True:
        await self.connected.wait()
        while self.connected.is_set():
            video_frame = self.pipe.forward(prompt=self.prompt)
            audio_samples = self.pipe.generate_audio()
            await self.emit(MyOutput(
                main_video=video_frame,    # (H, W, 3) uint8
                main_audio=audio_samples,  # (1, N) int16
            ))

Audio format

Audio data is a NumPy array of signed 16-bit integers:
Property       Value
dtype          int16
Shape          (channels, samples) - typically (1, N) for mono
Sample rate    48,000 Hz (default)
Channels       1 (mono, default)
The runtime encodes audio to Opus and streams it over WebRTC. The 48 kHz sample rate matches the Opus standard, so no resampling is needed if your audio is already at 48 kHz.
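As a quick sanity check, a buffer in this layout can be built with NumPy alone. This is an illustrative sketch; `make_tone` is a hypothetical helper, not part of the runtime:

```python
import numpy as np

SAMPLE_RATE = 48_000

def make_tone(freq_hz=440.0, duration_s=0.1):
    """Generate a mono sine tone shaped (1, N) as int16, ready to emit."""
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    wave = np.sin(2 * np.pi * freq_hz * t)     # float in [-1, 1]
    samples = (wave * 32767).astype(np.int16)  # scale into the int16 range
    return samples[np.newaxis, :]              # add channel axis -> (1, N)

tone = make_tone()  # shape (1, 4800), dtype int16: 100 ms of mono audio
```

Emitting `tone` as `main_audio` would produce 100 ms of sound, since 4,800 samples at 48 kHz is exactly 0.1 s.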

Custom sample rates

If your model generates audio at a different sample rate, subclass Audio:
from reactor_runtime import Audio

class Audio16k(Audio):
    sample_rate = 16_000

@dataclass
class MyOutput(Output):
    main_video: Video
    main_audio: Audio16k
The runtime resamples to 48 kHz for transport automatically.
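For intuition, the conversion is conceptually similar to the naive linear-interpolation sketch below. The runtime's actual transport resampler is an implementation detail; `resample_linear` is only an illustration of the ratio involved:

```python
import numpy as np

def resample_linear(samples: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Naive linear-interpolation resampler for a (1, N) int16 buffer."""
    n_src = samples.shape[1]
    n_dst = int(round(n_src * dst_rate / src_rate))
    src_t = np.arange(n_src) / src_rate  # timestamp of each source sample
    dst_t = np.arange(n_dst) / dst_rate  # timestamp of each output sample
    out = np.interp(dst_t, src_t, samples[0].astype(np.float64))
    return out[np.newaxis, :].astype(np.int16)

block_16k = np.zeros((1, 1600), dtype=np.int16)            # 100 ms at 16 kHz
block_48k = resample_linear(block_16k, 16_000, 48_000)     # 100 ms at 48 kHz
```

Going from 16 kHz to 48 kHz triples the sample count while preserving duration, which is why declaring the true sample rate on the subclass matters: it tells the runtime which ratio to apply.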

Synchronizing audio and video

When you emit an Output containing both audio and video, the runtime keeps them in sync: it splits a batched video tensor into individual frames and distributes the audio samples proportionally across them. For models that produce audio and video at different rates (e.g. separate decoder threads), pair blocks by index before emitting:
import asyncio
import queue
import threading

from reactor_runtime import ReactorModel

AUDIO_SAMPLE_RATE = 48000

class AVModel(ReactorModel):
    fps = 30

    def load(self, config):
        self.fpb = config.get("frames_per_block", 3)

    async def run(self):
        while True:
            await self.connected.wait()
            block_queue = queue.Queue()

            # Decode audio and video on separate threads
            audio_thread = threading.Thread(
                target=self._decode_audio, args=(block_queue,)
            )
            video_thread = threading.Thread(
                target=self._decode_video, args=(block_queue,)
            )
            audio_thread.start()
            video_thread.start()

            # Pair blocks by index and emit synchronized
            await self._demux_and_emit(block_queue)

    async def _demux_and_emit(self, block_queue):
        audio_pending = {}
        video_pending = {}
        next_idx = 0

        while self.connected.is_set():
            # queue.Queue.get() blocks, so run it in a worker thread
            # to avoid stalling the event loop
            kind, idx, data = await asyncio.to_thread(block_queue.get)

            if kind == "audio":
                audio_pending[idx] = data
            elif kind == "video":
                video_pending[idx] = data

            # Emit when both tracks are ready for the next block
            while next_idx in audio_pending and next_idx in video_pending:
                await self.emit(AVOutput(
                    main_video=video_pending.pop(next_idx),
                    main_audio=audio_pending.pop(next_idx),
                ))
                next_idx += 1
Each audio block should contain enough samples to cover frames_per_block video frames. At 30 fps with 3 frames per block, that's round(48000 / 30) * 3 = 4,800 samples (100 ms) per block.
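The block-size arithmetic above can be checked directly. This is a standalone sketch; `samples_per_block` is a hypothetical helper, not a runtime API:

```python
AUDIO_SAMPLE_RATE = 48000

def samples_per_block(fps: int, frames_per_block: int) -> int:
    """Number of audio samples that cover frames_per_block video frames."""
    return round(AUDIO_SAMPLE_RATE / fps) * frames_per_block

n = samples_per_block(fps=30, frames_per_block=3)
# n == 4800 samples, i.e. 100 ms of audio per block at 48 kHz
```

If your decoder produces audio blocks of a different length, either adjust frames_per_block so the two match or re-chunk the audio before pushing it onto the queue.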

Full example

A model that plays back an MP4 file with synchronized audio and video:
from dataclasses import dataclass
from reactor_runtime import (
    Audio, Output, Video, ReactorModel, ModelMessage,
    connected, disconnected, event,
)
import numpy as np
import av

AUDIO_SAMPLE_RATE = 48000

@dataclass
class AVOutput(Output):
    main_video: Video
    main_audio: Audio

@dataclass
class PlaybackStarted(ModelMessage):
    pass

@dataclass
class PlaybackPaused(ModelMessage):
    pass

class PlaybackModel(ReactorModel):
    fps = 30

    def load(self, config):
        self.video_path = config["video_path"]
        self.fpb = config.get("frames_per_block", 3)
        self.audio_rtf = config.get("audio_rtf", 1.0)
        self.video_rtf = config.get("video_rtf", 1.0)
        self.playing = False

    @connected
    async def on_connect(self):
        self.playing = True
        await self.send(PlaybackStarted())

    @disconnected
    async def on_disconnect(self):
        self.playing = False
        self.output_buffer.flush()

    @event(name="pause", description="Pause playback")
    async def pause(self):
        self.playing = False
        await self.send(PlaybackPaused())

    @event(name="play", description="Resume playback")
    async def play(self):
        self.playing = True
        await self.send(PlaybackStarted())

    async def run(self):
        # See the synchronization section above for the
        # threaded decode + demux pattern
        ...
config.yml
fps: 30
frames_per_block: 3
audio_rtf: 1.0
video_rtf: 1.0
reactor.yaml
model: model:PlaybackModel
name: audio-example
config: config.yml

Next

Video Input

Read webcam and media streams from the client.

The Run Loop

Emitting frames, batches, backpressure, and adaptive frame rates.