Today Daily is announcing an open standard for Real-time Voice and Video Inference: RTVI-AI. The RTVI abstractions and data structures define how client applications communicate with inference services.
These are the “real-time APIs” for use cases like:
- Voice chat with LLMs
- Enterprise voice workflows such as healthcare patient intake
- Video avatars and immersive experiences
- Voice-driven user interfaces
- Voice conversational apps for education, customer support, and games
- High-framerate image generation and streaming generative video
The Daily team is shipping open source reference JavaScript and React SDKs today, with iOS, Android, and other platform SDKs coming soon.
- Voice chat demo: https://demo.rtvi.ai
- RTVI overview: https://github.com/rtvi-ai/
- JavaScript and React SDKs: https://github.com/rtvi-ai/rtvi-client-web
This first release has been several months in the making, and incorporates work and insights from GroqInc, Deepgram, @FAL and others.
With RTVI, a “hello world” voice-to-voice AI chat app in JavaScript is 21 lines of code.
If you want to build real-time AI applications, implement infrastructure for real-time inference, or implement your own SDKs that leverage the RTVI standard, you are more than welcome to join this project.
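// "Hello world" voice-to-voice chat client.
// baseUrl is assumed to be defined elsewhere and to point at your inference service.
import { RealtimeClient } from "@realtime-ai/voice-sdk";

function myTrackHandler(track, participant, voiceclient) {
  // Ignore our own tracks and anything that is not audio.
  if (participant.isLocal || track.kind !== "audio") {
    return;
  }
  // Play the remote audio track through a new <audio> element.
  let audioElement = document.createElement("audio");
  audioElement.srcObject = new MediaStream([track]);
  document.body.appendChild(audioElement);
  audioElement.play();
}

const voiceClient = new RealtimeClient({
  baseUrl,
  enableMic: true,
  eventHandlers: {
    trackStarted: myTrackHandler,
  },
});

voiceClient.start();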
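The filtering step in the track handler above can be pulled out into a small pure function, which makes the logic easy to test without a browser. This is only a sketch: `shouldPlayTrack` and `attachRemoteAudio` are hypothetical helper names, not part of the RTVI SDK, and the `track` and `participant` shapes are assumed to match what the example's `trackStarted` handler receives.

```javascript
// Hypothetical helpers (not part of the RTVI SDK). They mirror the checks
// made by myTrackHandler in the example above.

// Returns true only for audio tracks that come from a remote participant.
function shouldPlayTrack(track, participant) {
  return !participant.isLocal && track.kind === "audio";
}

// Attaches a remote audio track to the page and starts playback.
// `doc` and `StreamCtor` default to the browser globals, but can be injected
// so the function can be exercised outside a browser.
function attachRemoteAudio(track, participant, doc = document, StreamCtor = MediaStream) {
  if (!shouldPlayTrack(track, participant)) {
    return null; // local or non-audio track: nothing to play
  }
  const audioElement = doc.createElement("audio");
  audioElement.srcObject = new StreamCtor([track]);
  doc.body.appendChild(audioElement);
  audioElement.play();
  return audioElement;
}
```

In a browser, passing `attachRemoteAudio` as the `trackStarted` handler should behave like the original handler, assuming the handler is called with the track and participant as its first two arguments.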
We welcome all contributions and ideas!