RTVI-AI: Real-time Voice and Video Inference

Today Daily is announcing an open standard for Real-time Voice and Video Inference: RTVI-AI. The RTVI abstractions and data structures define how client applications communicate with inference services.

These are the “real-time APIs” for use cases like:

Voice chat with LLMs
Enterprise voice workflows such as healthcare patient intake
Video avatars and immersive experiences
Voice-driven user interfaces
Voice conversational apps for education, customer support, and games
High-framerate image generation and streaming generative video

The Daily team is shipping open source reference JavaScript and React SDKs today, with iOS, Android, and other platform SDKs coming soon.

Voice chat demo: https://demo.rtvi.ai
RTVI overview: https://github.com/rtvi-ai/
JavaScript and React SDKs: https://github.com/rtvi-ai/rtvi-client-web

This first release has been several months in the making, and incorporates work and insights from GroqInc, Deepgram, @FAL and others.

With RTVI, a “hello world” voice-to-voice AI chat app in JavaScript is 21 lines of code.

If you want to build real-time AI applications, implement infrastructure for real-time inference, or implement your own SDKs that leverage the RTVI standard, you are more than welcome to join this project.


import { RealtimeClient } from "@realtime-ai/voice-sdk";

// Play each remote audio track as soon as it starts.
function myTrackHandler(track, participant, voiceclient) {
  // Ignore the local participant's tracks and anything that is not audio.
  if (participant.isLocal || track.kind !== 'audio') {
    return;
  }
  let audioElement = document.createElement('audio');
  audioElement.srcObject = new MediaStream([track]);
  document.body.appendChild(audioElement);
  audioElement.play();
}

const voiceClient = new RealtimeClient({
  baseUrl, // must be defined earlier in your app
  enableMic: true,
  eventHandlers: {
    trackStarted: myTrackHandler,
  },
});

voiceClient.start();
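
The track handler above mixes a filtering decision with DOM work, which makes it hard to exercise outside a browser. As a minimal sketch, the filter can be pulled into a pure function and checked with plain objects. The helper name shouldPlayTrack is hypothetical and not part of the RTVI SDKs; it assumes only the two fields the handler already reads, participant.isLocal and track.kind.

```javascript
// Hypothetical helper (not part of the RTVI SDKs): decide whether a track
// should be played. Mirrors the early-return check in myTrackHandler.
function shouldPlayTrack(track, participant) {
  return !participant.isLocal && track.kind === 'audio';
}

// Plain objects stand in for the SDK's track and participant objects.
const remoteAudio = shouldPlayTrack({ kind: 'audio' }, { isLocal: false });
const localAudio = shouldPlayTrack({ kind: 'audio' }, { isLocal: true });
const remoteVideo = shouldPlayTrack({ kind: 'video' }, { isLocal: false });

console.log(remoteAudio, localAudio, remoteVideo); // true false false
```

With this split, myTrackHandler could return early when shouldPlayTrack is false and keep only the audio element setup.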

We welcome all contributions and ideas!
