RTVI-AI: Real-time Voice and Video Inference

Today Daily is announcing an open standard for Real-time Voice and Video Inference: RTVI-AI. The RTVI abstractions and data structures define how client applications communicate with inference services.

These are the “real-time APIs” for use cases like:

  • Voice chat with LLMs
  • Enterprise voice workflows such as healthcare patient intake
  • Video avatars and immersive experiences
  • Voice-driven user interfaces
  • Voice conversational apps for education, customer support, and games
  • High-framerate image generation and streaming generative video
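The announcement does not spell out the RTVI message format, so what follows is only a hypothetical sketch of the underlying idea: client and inference service exchange typed messages as shared data structures, and the client routes each incoming message by its type. The envelope fields (`type`, `id`, `data`), the helpers `makeMessage` and `dispatch`, and the `"transcript"` message type are illustrative assumptions, not part of the RTVI specification.

```javascript
// Hypothetical message envelope, NOT the published RTVI schema.
// The field names (type, id, data) are assumptions for illustration only.
function makeMessage(type, data, id = String(Date.now())) {
  return { type, id, data };
}

// Route an incoming message to the handler registered for its type.
// Returns true if a handler ran, false if the type was unknown.
function dispatch(message, handlers) {
  const handler = handlers[message.type];
  if (!handler) {
    return false; // unknown message types are ignored
  }
  handler(message.data, message);
  return true;
}

// Usage: a client reacting to a (hypothetical) transcript message.
const received = [];
const handlers = {
  transcript: (data) => received.push(data.text),
};
dispatch(makeMessage("transcript", { text: "hello" }), handlers);
dispatch(makeMessage("unknown-type", {}), handlers);

// Messages are plain objects, so they can be serialized as JSON
// for transport over a data channel or WebSocket.
const wire = JSON.stringify(makeMessage("transcript", { text: "hi" }, "42"));
const parsed = JSON.parse(wire);
```

Keeping messages as plain, typed objects is one simple way to let any platform SDK (JavaScript, iOS, Android) speak the same protocol. The real RTVI data structures may differ.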

The Daily team is shipping open source reference JavaScript and React SDKs today, with iOS, Android, and other platform SDKs coming soon.

This first release has been several months in the making, and incorporates work and insights from Groq, Deepgram, fal, and others.

With RTVI, a “hello world” voice-to-voice AI chat app in JavaScript is 21 lines of code.

If you want to build real-time AI applications, implement infrastructure for real-time inference, or implement your own SDKs that leverage the RTVI standard, you are more than welcome to join this project.

Voice-to-voice example code
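// Minimal voice-to-voice client using the RTVI JavaScript SDK
import { RealtimeClient } from "@realtime-ai/voice-sdk";

// Play remote audio tracks (the AI's voice); ignore local and non-audio tracks.
function myTrackHandler(track, participant, voiceclient) {
  if (participant.isLocal || track.kind !== "audio") {
    return;
  }
  const audioElement = document.createElement("audio");
  audioElement.srcObject = new MediaStream([track]);
  document.body.appendChild(audioElement);
  audioElement.play();
}

// baseUrl is assumed to be defined elsewhere: the address of your
// RTVI-compatible inference service.
const voiceClient = new RealtimeClient({
  baseUrl,
  enableMic: true,
  eventHandlers: {
    trackStarted: myTrackHandler,
  },
});

voiceClient.start();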
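The early-return check in `myTrackHandler` decides which tracks get an audio element. As a hypothetical refactor (not part of the SDK), that condition can be pulled out into a pure function, `shouldPlayTrack`, and checked without a browser. Plain objects stand in for real `MediaStreamTrack` and participant objects here.

```javascript
// Hypothetical helper with the same condition as myTrackHandler:
// play a track only if it is audio AND comes from a remote participant.
// Equivalent to the negation of (participant.isLocal || track.kind !== "audio").
function shouldPlayTrack(track, participant) {
  return !participant.isLocal && track.kind === "audio";
}

// Stand-ins for real participant objects.
const remote = { isLocal: false };
const local = { isLocal: true };

console.log(shouldPlayTrack({ kind: "audio" }, remote)); // true
console.log(shouldPlayTrack({ kind: "audio" }, local)); // false
console.log(shouldPlayTrack({ kind: "video" }, remote)); // false
```

Inside the handler, the guard would then read `if (!shouldPlayTrack(track, participant)) return;`, keeping the DOM work separate from the decision logic.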

We welcome all contributions and ideas!
