Latency Metrics Guide
In voice AI, latency is the time between the user finishing a sentence and the AI agent starting to speak. High latency breaks the conversational flow and makes the agent feel robotic.
Understanding the Latency Stack
- ASR Latency: Time to transcribe speech to text.
- LLM Latency: Time for the “Brain” to think and generate the first token of the response.
- TTS Latency: Time to synthesize that text into audio.
- Network Latency: Time for all this data to travel over the internet.
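To see how these stages add up, here is an illustrative latency budget for a sub-500 ms turn. The per-stage numbers are assumptions chosen for the example, not Movoice AI measurements; your own breakdown will vary by model and region.

```python
# Illustrative latency budget for a sub-500 ms voice pipeline.
# All per-stage values below are assumed for the example.
BUDGET_MS = {
    "asr": 150,      # speech-to-text (endpointing + transcription)
    "llm": 200,      # time to first token from the language model
    "tts": 100,      # time to first audio chunk from text-to-speech
    "network": 50,   # round trips between user, servers, and gateway
}

total = sum(BUDGET_MS.values())
print(f"Total: {total} ms (target: 500 ms)")
assert total <= 500, "budget exceeded -- tune the slowest stage first"
```

Budgeting per stage makes it obvious which layer to optimize first when the end-to-end number creeps up.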
How We Optimize for < 500ms
Movoice AI uses several techniques to achieve industry-leading response times:
- Streaming: We “stream” results from each layer. As soon as the first token of a sentence is ready from the LLM, it’s sent to the TTS.
- Edge Deployment: Our servers are deployed close to telephony gateways to minimize network travel.
- Small Models: Using models like GPT-4o-mini or Llama 3 8B significantly reduces “Time to First Word.”
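The streaming technique above can be sketched as a pipeline that forwards each completed sentence to TTS as soon as it appears in the token stream, instead of waiting for the full reply. Everything here is a toy stand-in: `fake_llm_stream` and `fake_tts` are hypothetical placeholders, not Movoice AI APIs.

```python
import re

def fake_llm_stream():
    # Hypothetical stand-in for an LLM emitting tokens one at a time.
    for token in ["Sure", ",", " I", " can", " help", ".",
                  " What", " is", " your", " name", "?"]:
        yield token

def fake_tts(sentence: str) -> bytes:
    # Hypothetical stand-in for a TTS engine; returns dummy audio bytes.
    return f"<audio:{sentence}>".encode()

def stream_pipeline(llm_stream, tts):
    """Send each finished sentence to TTS immediately,
    rather than buffering the entire LLM response."""
    buffer = ""
    for token in llm_stream:
        buffer += token
        # A sentence boundary lets synthesis start early.
        if re.search(r"[.!?]$", buffer.strip()):
            yield tts(buffer.strip())
            buffer = ""
    if buffer.strip():          # flush any trailing partial sentence
        yield tts(buffer.strip())

for chunk in stream_pipeline(fake_llm_stream(), fake_tts):
    print(chunk)
```

The key design point: audio for the first sentence starts playing while the LLM is still generating the rest, which is what hides most of the LLM latency from the caller.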
Monitoring Your Agent
You can see real-time latency metrics for every call in the Analytics Tab. Look for the TTFB (Time to First Byte) metric.
Recommendations for Low Latency
- Use Deepgram Aura or Sarvam AI for TTS.
- Keep your System Prompts clean and efficient.
- Use Streaming Mode (Enabled by default in Movoice AI).
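If you want to sanity-check the dashboard's TTFB numbers, a rough client-side measurement can be taken by timestamping the end of user speech and the arrival of the first audio byte. The sketch below is generic and assumes caller-supplied hooks; it is not part of any Movoice AI SDK.

```python
import time

def measure_ttfb(send_utterance, receive_first_audio):
    """Return TTFB in milliseconds for one conversational turn.
    `send_utterance` and `receive_first_audio` are caller-supplied
    callables (hypothetical hooks into your audio client)."""
    t_end_of_speech = time.monotonic()
    send_utterance()
    receive_first_audio()  # blocks until the first audio byte arrives
    t_first_byte = time.monotonic()
    return (t_first_byte - t_end_of_speech) * 1000.0

# Toy usage with stubs standing in for real network calls:
ttfb_ms = measure_ttfb(lambda: None, lambda: time.sleep(0.05))
print(f"TTFB: {ttfb_ms:.0f} ms")
```

Comparing a few of these client-side samples against the Analytics Tab helps confirm whether any gap is in your pipeline or on the network path.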
