BUILD AI (VIDEO) AGENTS
The Conversational Video Interface (CVI) is the end-to-end pipeline for face-to-face AI: perception, dialogue, and real-time rendering. Plug it in alongside your existing audio or text stack, or build from scratch.
CVI Preview
2B interactions with video agents
<500ms average response time
Best-in-class engineering support community
15x user retention vs voice-only agents
Enterprise-grade security & compliance
1080p real-time avatar rendering
Interactive video interface
CVI is an API-first platform for shipping AI video conversations fast. Start with our end-to-end defaults, then swap in your own LLM, voice, and knowledge stack as you scale, without rebuilding the pipeline. The result: AI agents that feel present in real time, with natural turn-taking, active listening, and high-fidelity video output.
CVI Session
2B interactions
500ms latency
100+ stock replicas
15x retention
Deploy video agents at scale
Video agents are the output; CVI is how you deploy them reliably at scale. We handle the real-time infrastructure, including latency, concurrency, and streaming, so you can launch globally with enterprise-grade security, compliance, and white-glove support from day one.
CVI Session
Persona Config
{ "persona_name": "Sales Agent",
Β "tools": ["book_meeting", "send_quote"],
Β "language": "english" }
How CVI works
- Lexical + semantic awareness
- Custom hotwords
- Speaker identification
Live in 10 lines of code
Integration
Sign up & get API key
Pick a replica
Create a persona
Drop in 10 lines of code (see the sketch below)
Deploy & go live
Monitor and iterate
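To make step four concrete, here is a minimal TypeScript sketch. The endpoint path, auth header, field names, and IDs are illustrative assumptions rather than a verbatim API reference; swap in the values from your dashboard.

// Minimal sketch: create a CVI conversation and get back a join URL.
// Endpoint path, auth header, and request/response field names are
// illustrative assumptions, not a confirmed API contract.
const res = await fetch("https://tavusapi.com/v2/conversations", {
  method: "POST",
  headers: {
    "x-api-key": process.env.TAVUS_API_KEY!, // step 1: your API key
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    replica_id: "r_example", // step 2: the replica you picked (hypothetical ID)
    persona_id: "p_example", // step 3: the persona you created (hypothetical ID)
  }),
});
const { conversation_url } = await res.json();
console.log(conversation_url); // embed or open this URL to go live

From there, deploying is dropping the returned URL into your app; monitoring comes from the conversation data layer described below.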
Models
We build models that teach machines perception, empathy, and expression so AI can finally understand the world as we do.
Rendering
Render & react in real-time
Real-time facial behavior engine that produces full-face animation, micro-expressions, and emotion-driven reactions with context-aware active listening. Studio-quality lip-sync with consistent identity preservation at 1080p: the highest-fidelity real-time rendering on the market.
Best-in-class 1080p full-face rendering at 40+ FPS
Context-aware active listening (reacts while listening)
Explicit emotion control + micro-expressions across 10+ emotions
Perception
See & understand
Multimodal perception that analyzes facial expressions, tone of voice, gaze, emotion, and ambient environment in real time. Feeds rich context into the LLM so your agent actually understands what it sees and hears.
Visual + audio emotion detection
LLM-oriented encoding
Trigger tools from visual/audio events
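As a rough sketch of how a visual event might drive a tool call, the hypothetical config below attaches a perception-triggered function to a persona. The layer structure, model identifier, and field names are assumptions, not the confirmed schema.

// Hedged sketch: a perception layer that raises a function call when a
// visual event is detected. Field names below are assumptions.
const perceptionLayer = {
  perception_model: "raven-1", // assumed identifier for the Raven model
  perception_tools: [
    {
      type: "function",
      function: {
        name: "flag_id_card_shown", // hypothetical tool
        description: "Fire when the user holds an ID card up to the camera.",
        parameters: { type: "object", properties: {}, required: [] },
      },
    },
  ],
};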
Dialogue
AI conversation that flows
Transformer-based turn-taking that handles natural pauses, interruptions, and conversational timing.
Smart turn-taking & interruption handling
Configurable patience & interruptibility
Learns & evolves with every conversation
Everything you need to build
Nine capabilities that turn a basic video agent into a production system. Mix and match to fit your use case.
Bring Your Own LLM + Audio
Plug in any OpenAI-compatible LLM and any TTS: ElevenLabs, Cartesia, or your own. Custom voices, custom models, fully modular.
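As an illustration, a persona's model and voice might be swapped via a layered config along these lines. The layer and field names are assumptions for the sketch, not the exact schema.

// Hedged sketch: pointing a persona at your own OpenAI-compatible LLM and a
// third-party TTS voice. Layer structure and field names are assumptions.
const layers = {
  llm: {
    model: "your-model-name",               // any OpenAI-compatible model
    base_url: "https://llm.example.com/v1", // your inference endpoint (hypothetical)
    api_key: process.env.LLM_API_KEY,
  },
  tts: {
    tts_engine: "elevenlabs", // or "cartesia", etc. (assumed value)
    voice_id: "voice_123",    // hypothetical voice ID
  },
};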
Visual Perception
Raven-1 reads facial expressions, emotions, gaze direction, and objects in real time. Trigger function calls from visual or audio events.
Function Calling
Your agent can book meetings, pull records, submit forms, and call external APIs mid-conversation. Define tools and let the LLM decide when to use them.
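Tool definitions follow the familiar OpenAI function-calling shape. Here is a sketch of the book_meeting tool referenced in the persona config above; the parameter schema is invented for illustration.

// Sketch: an OpenAI-style tool definition the LLM can invoke mid-conversation.
// The parameter schema is illustrative.
const tools = [
  {
    type: "function",
    function: {
      name: "book_meeting",
      description: "Book a meeting on the prospect's calendar.",
      parameters: {
        type: "object",
        properties: {
          time: { type: "string", description: "ISO 8601 start time" },
          duration_minutes: { type: "number" },
        },
        required: ["time"],
      },
    },
  },
];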
Knowledge Base
Upload PDFs, docs, or crawl websites. 30ms retrieval with configurable quality, the fastest on the market today. Your agent answers from your data, not hallucinations.
Cross-Session Memory
Agents remember context across conversations using flexible memory stores. Tie memories to users, sessions, or shared contexts like classrooms.
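For instance, a conversation might be pinned to one or more named stores along these lines; the memory_stores field name and key format are assumptions for the sketch.

// Hedged sketch: attaching a conversation to named memory stores so context
// carries across sessions. The "memory_stores" field is an assumed name.
const conversationRequest = {
  replica_id: "r_example",
  persona_id: "p_example",
  memory_stores: ["user_42", "classroom_7b"], // hypothetical store keys
};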
Emotionally aware conversations
CVI listens and responds with emotion you can see and hear. Phoenix renders real-time micro-expressions and natural timing so your agent feels present and human.
Multilingual
Deploy agents in 50+ languages with native-quality voices. Auto-detect speaker language and respond in kind. One agent, global reach.
Conversational Override
Take the wheel anytime. Inject responses verbatim or directionally, set turn-taking patience, force topic changes, or let the LLM run on autopilot. From fully autonomous to fully puppeted, and everything in between.
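A verbatim injection might look something like the sketch below, sent over the live session's message channel. The event names and payload shape are assumptions, not the confirmed interactions protocol.

// Hedged sketch: force the agent to speak a line verbatim mid-session.
// Event names and payload shape are assumptions.
const echoEvent = {
  message_type: "conversation",
  event_type: "conversation.echo", // assumed event name
  properties: { text: "Let me pull up that quote for you." },
};
// send echoEvent over the active session's data channel (transport not shown)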
Conversation Data Layer
Every conversation generates structured data: full transcripts, emotion timelines, perception events, sentiment shifts. Export, query, or analyze at scale and in real time! Your conversations are a goldmine.
Capabilities combine into real outcomes
Questions? Answers
How fast is it?
~600ms from speech to video. Sub-500ms average. Industry leading.
Can I bring my own LLM?
Yes. Any OpenAI-compatible API. Keep your logic private. 100% yours.
How much does it cost?
Free tier for dev. Starter $59/mo. Growth $397/mo. Custom for enterprise.
What languages are supported?
30+ languages with accent preservation. Auto-detection. Real multilingual support.
Can the agent see the user?
Yes, with Raven perception. Emotion detection, facial expressions, objects. Optional and configurable.
Is there a React SDK?
@tavus/react-cvi on npm. Drop-in components. Full TypeScript support.
Do I need to record my own avatar?
No. Use 100+ stock avatars. Or upload 2 minutes of video to create your own.
Is it secure and compliant?
SOC2, HIPAA, GDPR compliant. White-label for enterprise. Privacy first.