inference() is where your model produces output. It’s a Python generator that the runtime calls automatically. You write the loop, yield frames, and the runtime handles everything else.
## Basic pattern
- Read `self.state` for current parameters.
- Run your forward pass.
- `yield` an `Output` instance to send it to the client.
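A minimal sketch of this loop. `Output` and the driving runtime are hypothetical stand-ins here; the real classes come from the framework:

```python
# Hypothetical stand-in for the runtime's Output type.
class Output:
    def __init__(self, video=None):
        self.video = video

class Model:
    def __init__(self):
        # Populated by the runtime per connection; toy defaults here.
        self.state = {"brightness": 1.0}

    def inference(self):
        while True:
            scale = self.state["brightness"]   # 1. read current parameters
            frame = [[scale] * 3] * 4          # 2. forward pass (toy stand-in)
            yield Output(video=frame)          # 3. send the frame to the client

# Outside the runtime, the generator can be driven by hand:
model = Model()
gen = model.inference()
first = next(gen)
```

Because parameters are re-read at the top of every iteration, state changes between frames take effect on the very next yield.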
## Sync vs async
inference() can be a regular generator or an async generator. Use async when you need to await inside the loop, for example sending messages to the client.
## What to yield
Yield an `Output` instance with data for each track.
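For illustration, a hypothetical `Output` stand-in that takes one keyword argument per output track; the real class's track names and signature may differ:

```python
# Hypothetical Output stand-in: one keyword argument per track.
class Output:
    def __init__(self, **tracks):
        self.tracks = tracks

# A 2x2 RGB frame as nested lists (a real model would yield an array).
frame = [[[0, 0, 0], [0, 0, 0]],
         [[0, 0, 0], [0, 0, 0]]]
out = Output(video=frame)
```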
## Batch yields
If your model generates frames in batches, yield the entire batch, shaped `(N, H, W, 3)`, in a single yield. The runtime splits it and emits each frame at the target rate.
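A sketch of a batch yield, with plain nested lists standing in for a real `(N, H, W, 3)` array:

```python
N, H, W = 4, 2, 2

def inference():
    while True:
        # Toy forward pass producing N frames of H x W RGB zeros.
        batch = [[[[0, 0, 0] for _ in range(W)]
                  for _ in range(H)]
                 for _ in range(N)]
        yield batch   # one yield for the whole batch; the runtime splits it

batch = next(inference())
```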
## Idling
Yield `Idle` or `None` to skip an iteration without emitting anything.
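A sketch of idling until input is available; `Idle` here is a hypothetical stand-in for the runtime's sentinel:

```python
Idle = object()   # hypothetical stand-in for the runtime's Idle sentinel

def inference(source):
    while True:
        frame = source.pop(0) if source else None
        if frame is None:
            yield None          # or: yield Idle (nothing is emitted)
        else:
            yield {"video": frame}

source = [None, "frame-0"]
gen = inference(source)
skipped = next(gen)   # first iteration idles
emitted = next(gen)   # second iteration emits a frame
```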
## Lifecycle
The runtime manages the generator's lifetime automatically:

- Created when a client connects, after `@connected` fires.
- Closed when the client disconnects. State is reset and buffers are flushed.
- Restarted if it returns early (e.g. a finite loop ends). The runtime creates a new generator on the same model instance as long as the client is still connected.
`self.state` is created for each connection, so every client starts with clean defaults.
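The restart behavior can be illustrated with a toy driver (not the real runtime) that re-creates the generator on the same instance whenever it returns:

```python
class Model:
    def __init__(self):
        self.restarts = 0

    def inference(self):
        self.restarts += 1
        for i in range(2):       # finite loop: the generator returns early
            yield i

def run_while_connected(model, frames_wanted):
    """Toy stand-in for the runtime's restart loop."""
    emitted = []
    gen = model.inference()
    while len(emitted) < frames_wanted:
        try:
            emitted.append(next(gen))
        except StopIteration:
            gen = model.inference()   # restart on the same model instance
    return emitted

m = Model()
frames = run_while_connected(m, 5)
```

Note that instance attributes (like `restarts` above) survive restarts; only the generator itself is re-created.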
## State consistency
Events are dispatched only between `yield` points. While your generator is running an iteration (reading state, doing a forward pass, awaiting I/O), no event handler can mutate `self.state`. You can safely read state multiple times within an iteration without worrying about concurrent changes.
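A toy illustration of this dispatch model: the driver below (a stand-in for the runtime) applies queued events only between yields, so repeated reads within one iteration always agree:

```python
state = {"scale": 1}
pending_events = []

def inference():
    while True:
        a = state["scale"]   # read once...
        b = state["scale"]   # ...and again: guaranteed equal within the iteration
        yield (a, b)

def dispatch_pending():
    # Applied only between yields, mimicking the runtime's guarantee.
    while pending_events:
        pending_events.pop(0)()

gen = inference()
pending_events.append(lambda: state.update(scale=2))
first = next(gen)     # event is queued but not yet applied
dispatch_pending()    # applied between yields
second = next(gen)
```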
## Next
- Interactive State: add richer parameters with validation and custom logic.
- Events & Messages: custom events, lifecycle hooks, and outbound messages.