Latency Metrics Guide

In voice AI, latency is the time between the user finishing a sentence and the AI agent starting to speak. High latency breaks the conversational flow and makes the agent feel robotic.

Understanding the Latency Stack

  1. ASR Latency: Time to transcribe speech to text.
  2. LLM Latency: Time for the “Brain” to think and generate the first token of the response.
  3. TTS Latency: Time to turn that text into audible speech.
  4. Network Latency: Time for all this data to travel over the internet.
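End-to-end latency is roughly the sum of these four stages. A minimal sketch, using hypothetical placeholder values (not measured Movoice AI numbers), shows how the budget adds up:

```python
# Illustrative breakdown of the voice latency stack. All millisecond
# values below are hypothetical placeholders for this example only.
stack_ms = {
    "asr": 150,       # speech -> text
    "llm_ttft": 200,  # LLM time to first token
    "tts": 80,        # first text chunk -> first audio bytes
    "network": 60,    # data travel between user and servers
}

total_ms = sum(stack_ms.values())
print(f"Estimated voice-to-voice latency: {total_ms} ms")  # 490 ms
```

With numbers like these, even a modest saving in one stage (for example, a faster LLM time-to-first-token) is what keeps the total under the 500 ms target discussed below.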

How We Optimize for < 500 ms

Movoice AI uses several techniques to achieve industry-leading response times:
  • Streaming: We “stream” results from each layer. As soon as the first token of a sentence is ready from the LLM, it’s sent to the TTS.
  • Edge Deployment: Our servers are deployed close to telephony gateways to minimize network travel.
  • Small Models: Using models like GPT-4o-mini or Llama 3 8B significantly reduces “Time to First Word.”
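The streaming idea can be sketched in a few lines: instead of waiting for the LLM's full response, forward each sentence to the TTS engine as soon as it is complete. This is a simplified illustration, not Movoice AI's actual pipeline code; the token stream below is simulated.

```python
# Minimal sketch of layer-to-layer streaming: tokens from an LLM stream
# are forwarded to TTS at each sentence boundary, instead of waiting
# for the full response.
def stream_sentences(token_stream):
    """Yield complete sentences from an incremental token stream."""
    buffer = ""
    for token in token_stream:
        buffer += token
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():  # flush any trailing partial sentence
        yield buffer.strip()

# Simulated LLM token stream; in a real pipeline each yielded sentence
# would be handed to the TTS engine immediately.
tokens = ["Hello", " there", ".", " How", " can", " I", " help", "?"]
for sentence in stream_sentences(tokens):
    print(sentence)
```

The first sentence reaches TTS while the LLM is still generating the second, so audio playback can begin well before the full response exists.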

Monitoring Your Agent

You can see real-time latency metrics for every call in the Analytics Tab. Look for the TTFB (Time to First Byte) metric.
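If you want to sanity-check the dashboard numbers, you can time TTFB yourself on the client side. This is a generic sketch, not part of the Movoice AI SDK: the clock starts when the user stops speaking and stops on the agent's first audio byte, which is simulated here with a short sleep.

```python
import time

# Sketch of a client-side TTFB measurement. `wait_for_first_audio` is a
# hypothetical callable that blocks until the agent's first audio byte
# arrives; here it is simulated with a 50 ms sleep.
def measure_ttfb(wait_for_first_audio):
    start = time.monotonic()
    wait_for_first_audio()
    return (time.monotonic() - start) * 1000  # milliseconds

ttfb_ms = measure_ttfb(lambda: time.sleep(0.05))
print(f"TTFB: {ttfb_ms:.0f} ms")
```

Comparing a measurement like this against the Analytics Tab helps separate network latency on your side from processing time inside the stack.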

Recommendations for Low Latency

  • Use Deepgram Aura or Sarvam AI for TTS.
  • Keep your System Prompts clean and efficient.
  • Use Streaming Mode (Enabled by default in Movoice AI).