LLM Tab

The LLM Tab allows you to configure the “brain” of your voice AI agent. You can select the provider and specific model that best fits your latency and intelligence requirements.

Supported LLM Models

OpenAI (Industry Leader)

  • GPT-4o: Highly intelligent, multimodal. Best for complex reasoning.
  • GPT-4o-mini: Specialized for speed and cost-effectiveness. Recommended for most voice agents.

Anthropic (Nuanced Conversation)

  • Claude 3.5 Sonnet: Excellent at following strict instructions and handling multi-step logic.

Deepgram/Groq (Lowest Latency)

  • Llama 3.1 (70B/8B): Open-source models optimized for inference speed. Best for extremely fast, conversational pacing.
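Taken together, the provider and model choice might be expressed in an agent configuration like the sketch below. The field names and model identifiers are illustrative assumptions, not the platform's actual schema:

```python
# Illustrative agent configuration; keys and model IDs are assumptions,
# not the platform's exact schema.
AGENT_CONFIG = {
    "provider": "openai",    # or "anthropic", "groq"
    "model": "gpt-4o-mini",  # speed/cost default recommended above
}

def pick_model(priority):
    """Map a priority (hypothetical helper) to a default from the lists above."""
    table = {
        "reasoning": ("openai", "gpt-4o"),
        "instructions": ("anthropic", "claude-3-5-sonnet"),
        "latency": ("groq", "llama-3.1-8b"),
    }
    # Fall back to the general-purpose speed/cost recommendation.
    return table.get(priority, ("openai", "gpt-4o-mini"))
```

For example, `pick_model("latency")` would select the Groq-hosted Llama 3.1 8B, matching the guidance above for extremely fast conversational pacing.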

Key Configurations

1. Temperature

Controls the “creativity” or “randomness” of the agent’s output.
  • Low (0.0 - 0.3): Highly consistent and factual. Recommended for technical support or banking agents.
  • Medium (0.4 - 0.7): Balanced. Good for general customer service.
  • High (0.8 - 1.0): More varied responses. Good for storytelling or informal chat.
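To make the effect concrete, the sketch below applies temperature the way samplers typically do, by scaling logits before a softmax. It illustrates the general mechanism, not this platform's implementation:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Lower temperature sharpens the distribution (more deterministic);
    # higher temperature flattens it (more varied output).
    # Temperature 0.0 is usually special-cased as greedy argmax,
    # since dividing by zero is undefined.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # near-deterministic
high = softmax_with_temperature(logits, 1.0)  # more varied
# At low temperature, almost all probability mass lands on the top token.
```

Here `low[0]` is roughly 0.99 while `high[0]` is only about 0.63, which is why low settings produce consistent answers and high settings produce varied ones.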

2. Max Tokens

Limits the length of the agent’s response. Note: Voice AI agents should generally be limited to short responses of 50-100 tokens to keep latency low.
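As a sketch, a request capping response length might look like the payload below. The shape mirrors common chat-completion APIs and is an assumption, not this platform's exact schema:

```python
# Hypothetical request payload; keys mirror common chat-completion APIs.
request = {
    "model": "gpt-4o-mini",
    "max_tokens": 75,  # keep spoken replies in the 50-100 token range
    "messages": [
        {"role": "system", "content": "You are a concise voice support agent."},
        {"role": "user", "content": "Where is my order?"},
    ],
}
```

A cap like 75 tokens corresponds to a few spoken sentences, which keeps the text-to-speech turnaround fast.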

3. Context Window

Determines how many previous turns of the conversation the agent remembers. Typically, the last 5-10 turns are sufficient for most tasks.
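One common way to enforce such a window is to trim the message history before each request. The helper below is a sketch, assuming one "turn" means a user message plus the agent's reply (two messages):

```python
def trim_history(messages, max_turns=8):
    """Keep the system prompt plus the last `max_turns` conversation turns.

    Assumes one turn = one user message + one assistant reply (two messages).
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns * 2:]

# Example: 20 turns of history trimmed down to the most recent 8.
history = [{"role": "system", "content": "Be brief."}]
for i in range(20):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, max_turns=8)
# 1 system message + 8 turns * 2 messages = 17 messages survive
```

Note that the system prompt is always retained, so the agent's core instructions persist even as older turns are dropped.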