

BUILD AI (VIDEO) AGENTS

The Conversational Video Interface (CVI) is the end-to-end pipeline for face-to-face AI: perception, dialogue, and real-time rendering. Plug it in alongside your existing audio or text stack, or build from scratch.

Try the API
Quickstart


2B interactions with video agents

<500ms average response time

Best-in-class engineering support community

15x user retention vs voice-only agents

Enterprise-grade security & compliance

1080p real-time avatar rendering


The Framework

Interactive video interface

CVI is an API-first platform for shipping AI video conversations fast. Start with our end-to-end defaults, then swap in your own LLM, voice, and knowledge stack as you scale, without rebuilding the pipeline. The result: AI agents that feel present in real time, with natural turn-taking, active listening, and high-fidelity video output.

See Docs
Sign Up for Free

CVI Session

2B

Interactions

500ms

Latency

100+

Stock Replicas

15x

Retention

The Output

Deploy video
agents at scale

Video agents are the output; CVI is how you deploy them reliably at scale. We handle the real-time infrastructure, including latency, concurrency, and streaming, so you can launch globally with enterprise-grade security, compliance, and white-glove support from day one.

See Docs
Book Demo

CVI Session

Persona Config

{
  "persona_name": "Sales Agent",
  "tools": ["book_meeting", "send_quote"],
  "language": "english"
}
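The persona config shown here can be assembled in code before submitting it. A minimal Python sketch; `build_persona` is a hypothetical helper, and the `/v2/personas` endpoint in the comment is an assumption, so check the API reference for the real schema:

```python
# Sketch: assemble a persona config shaped like the sample above.
# build_persona and the POST endpoint in the comment are illustrative
# assumptions, not a confirmed schema.
import json

def build_persona(name, tools, language="english"):
    """Return a persona config dict matching the sample config."""
    return {
        "persona_name": name,
        "tools": list(tools),
        "language": language,
    }

persona = build_persona("Sales Agent", ["book_meeting", "send_quote"])
print(json.dumps(persona, indent=2))

# To create it (API key elided, as in the samples on this page):
# requests.post("https://tavusapi.com/v2/personas", json=persona,
#               headers={"x-api-key": ""})
```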

Architecture

How CVI works

CVI runs as a live, closed-loop system that captures human signals at every step and responds in real time with speech and rendering.
[Diagram: real-time conversation loop] The user's audio and video stream in. STT (Speech-to-Text) produces a live transcript while Raven-1 (Perception) extracts visual and emotional context. Sparrow-1 (Turn-Taking) reads the transcript and emits a turn signal; the LLM (Language Model) generates a response; TTS (Text-to-Speech) synthesizes the voice; and Phoenix-4 (Rendering) streams video back to the user as the agent. From user speech to agent response takes under 500ms end-to-end.

Sparrow-1's transformer-based turn-taking reads speech rhythm, pauses, and cues, with configurable patience and interruptibility per use case:
  • Lexical + semantic awareness
  • Custom hotwords
  • Speaker identification
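Purely as an illustration, the loop can be sketched in Python with mock stand-ins for each model; none of these functions exist in the real API:

```python
# Illustrative-only sketch of the CVI loop. Each function is a mock
# stand-in for the real model (STT, Raven-1, Sparrow-1, LLM, TTS, Phoenix-4).
def stt(audio): return f"transcript({audio})"
def raven(video): return f"context({video})"
def sparrow(transcript): return True  # turn signal: agent's turn to speak
def llm(transcript, context): return f"reply to {transcript} given {context}"
def tts(text): return f"speech({text})"
def phoenix(speech): return f"video({speech})"

def cvi_step(audio, video):
    """One pass through the loop: user input in, rendered agent video out."""
    transcript = stt(audio)          # speech becomes a live transcript
    context = raven(video)           # perception adds visual/emotional context
    if not sparrow(transcript):      # user still speaking: keep listening
        return None
    return phoenix(tts(llm(transcript, context)))

frame = cvi_step("hello", "webcam")
```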
7
Modular layers
30+
Languages
1
API call to start
100%
Configurable

Live in 10 lines of code

Get an interactive video agent running in minutes. Everything you need in one place.

Integration

curl:

curl --request POST \
  --url https://tavusapi.com/v2/conversations \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: ' \
  --data '
{
  "replica_id": "rf4e9d9790f0",
  "persona_id": "pcb7a34da5fe"
}
'

JavaScript:

// x-api-key is left blank here; paste your key before running
const options = {
  method: 'POST',
  headers: {'x-api-key': '', 'Content-Type': 'application/json'},
  body: JSON.stringify({replica_id: 'rf4e9d9790f0', persona_id: 'pcb7a34da5fe'})
};

fetch('https://tavusapi.com/v2/conversations', options)
  .then(res => res.json())
  .then(res => console.log(res))
  .catch(err => console.error(err));

Python:

import requests

url = "https://tavusapi.com/v2/conversations"

payload = {
    "replica_id": "rf4e9d9790f0",
    "persona_id": "pcb7a34da5fe"
}
headers = {
    "x-api-key": "",  # paste your API key here
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)
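Whichever variant you use, the next step is usually pulling the join URL out of the response. A minimal Python sketch; the `conversation_url` field name and the sample values below are assumptions, so inspect your own response body:

```python
# Sketch: parse the create-conversation response and pull out the URL to
# join. The "conversation_url" field name and the sample values here are
# assumptions, not a confirmed response schema.
import json

def extract_join_url(response_body):
    """Return the URL a user opens (or you embed) to join the conversation."""
    data = json.loads(response_body)
    return data["conversation_url"]

# Simulated response body (hypothetical values):
sample = '{"conversation_id": "c123", "conversation_url": "https://example.com/c123"}'
print(extract_join_url(sample))
```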
API reference
Full docs
01

Sign up & get API key

Free credits, no credit card required.
02

Pick a replica

100+ stock replicas or train your own from 2 min of video.
03

Create a persona

Customize your conversation by adding your system prompt and guardrails.
04

Drop in 10 lines of code

React component, REST API, or plain iframe with full customization.
05

Deploy & go live

Ship to Vercel, AWS, or any infrastructure.
06

Monitor and iterate

Rich data channels, transcripts, recordings, and tools to iterate fast.
Start building free
Capabilities

Models

We build models that teach machines perception, empathy, and expression so AI can finally understand the world as we do.

Phoenix-4

Rendering

Render & react in real-time

Real-time facial behavior engine that produces full-face animation, micro-expressions, and emotion-driven reactions with context-aware active listening. Studio-quality lip-sync with consistent identity preservation at 1080p: the highest-fidelity real-time rendering on the market.

  • Best-in-class 1080p full-face rendering at 40+ FPS

  • Context-aware active listening (reacts while listening)

  • Explicit emotion control + micro-expressions across 10+ emotions

Read the Research

Raven-1

Perception

See & understand

Multimodal perception that analyzes facial expressions, tone of voice, gaze, emotion, and ambient environment in real time. Feeds rich context into the LLM so your agent actually understands what it sees and hears.

  • Visual + audio emotion detection

  • LLM-oriented encoding

  • Trigger tools from visual/audio events

Read the Research

Sparrow-1

Dialogue

AI conversation that flows

Transformer-based turn-taking that handles natural pauses, interruptions, and conversational timing.

  • Smart turn-taking & interruption handling

  • Configurable patience & interruptibility

  • Learns & evolves with every conversation

Read the Research
Capabilities

Everything you need to build

Nine capabilities that turn a basic video agent into a production system. Mix and match to fit your use case.

view all

BYO

Bring Your Own LLM + Audio

Plug in any OpenAI-compatible LLM and any TTS: ElevenLabs, Cartesia, or your own. Custom voices, custom models, fully modular.
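As a sketch of what such a config could look like: the `layers` structure and every field name below are illustrative assumptions, not the confirmed schema, so consult the persona API reference.

```python
# Sketch: a hypothetical bring-your-own-LLM + voice persona payload.
# The "layers" structure and all field names are illustrative assumptions.
def byo_layers(llm_base_url, llm_model, tts_voice_id):
    """Build a hypothetical layers config pointing at your own LLM and TTS."""
    return {
        "layers": {
            # any OpenAI-compatible endpoint: your own server, a fine-tune, etc.
            "llm": {"base_url": llm_base_url, "model": llm_model},
            # e.g. an ElevenLabs or Cartesia voice id
            "tts": {"voice_id": tts_voice_id},
        }
    }

config = byo_layers("https://my-llm.example.com/v1", "my-fine-tune", "voice_abc")
```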


VISION

Visual Perception

Raven-1 reads facial expressions, emotions, gaze direction, and objects in real time. Trigger function calls from visual or audio events.


ACTIONS

Function Calling

Your agent can book meetings, pull records, submit forms, and call external APIs mid-conversation. Define tools and let the LLM decide when to use them.
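Tool definitions typically follow the OpenAI function-calling shape. The sketch below is a hypothetical example (name, fields, and parameter schema invented for illustration), not a confirmed Tavus schema:

```python
# Sketch: an OpenAI-style function/tool definition the LLM layer could
# invoke mid-conversation. Name and parameters are hypothetical examples.
book_meeting_tool = {
    "type": "function",
    "function": {
        "name": "book_meeting",
        "description": "Book a follow-up meeting on the caller's calendar.",
        "parameters": {
            "type": "object",
            "properties": {
                "email": {"type": "string", "description": "Attendee email"},
                "start_time": {"type": "string", "description": "ISO 8601 start"},
            },
            "required": ["email", "start_time"],
        },
    },
}
# The LLM decides when to call it; your backend executes the booking.
```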


RAG

Knowledge Base

Upload PDFs and docs, or crawl websites. With 30ms retrieval and configurable quality, the fastest on the market today, your agent answers from your data, not hallucinations.


MEMORY

Cross-Session Memory

Agents remember context across conversations using flexible memory stores. Tie memories to users, sessions, or shared contexts like classrooms.


EMOTION

Emotionally aware conversations

CVI listens and responds with emotion you can see and hear. Phoenix renders real-time micro-expressions and natural timing so your agent feels present and human.


I18N

Multilingual

Deploy agents in 50+ languages with native-quality voices. Auto-detect speaker language and respond in kind. One agent, global reach.


OVERRIDE

Conversational Override

Take the wheel anytime. Inject responses verbatim or directionally, set turn-taking patience, force topic changes, or let the LLM run on autopilot. From fully autonomous to fully puppeted, and everything in between.


DATA

Conversation Data Layer

Every conversation generates structured data: full transcripts, emotion timelines, perception events, sentiment shifts. Export, query, or analyze at scale and in real time! Your conversations are a goldmine.

use cases

Capabilities combine
into real outcomes

Each card shows how CVI capabilities combine to solve a specific industry problem. This is what a configurable pipeline actually enables.
Healthcare
Perception + Function Calling → Data Layer

AI-Powered Patient Intake

The agent reads facial cues for distress or confusion, adjusts its tone, and triggers intake form submission and follow-up booking, all within a single video call.
Scenario
  1. Patient joins video intake
  2. Raven detects anxiety from facial expression
  3. Agent softens tone, asks simpler questions
  4. Function call submits intake form to EHR
  5. Schedules follow-up appointment automatically
Education
Memory + Emotion Control → Data Layer

Adaptive AI Tutor

Remembers where each student struggled last session. Adjusts pacing, difficulty, and emotional tone to match the student's current state and learning history.
Scenario
  1. Student returns for algebra session
  2. Memory recalls last session's struggles with fractions
  3. Starts with targeted review, not generic intro
  4. Detects frustration → slows pace, adds encouragement
  5. Tracks progress in memory for next session
Sales
Custom LLM + Knowledge Base → Data Layer

Technical Sales Engineer

Connects your proprietary product database via RAG and your fine-tuned model. Answers deep technical questions with accurate specs pulled in real time.
Scenario
  1. Prospect asks about API rate limits
  2. RAG retrieves current pricing & limits doc
  3. Custom LLM generates accurate comparison
  4. Prospect asks follow-up about enterprise tier
  5. Agent pulls enterprise docs, no hallucination
HR & Recruiting
Function Calling + Memory → Data Layer

Structured Interview Agent

Conducts consistent, rubric-scored interviews across hundreds of candidates. Remembers multi-session contexts and submits structured evaluations to your ATS.
Scenario
  1. Candidate joins scheduled interview
  2. Agent follows structured question flow
  3. Scores responses against rubric in real time
  4. Function call submits evaluation to ATS
  5. Memory stores context for panel debrief
Customer Support
Perception + Emotion Control → Data Layer

Empathetic Support Agent

Sees confusion or frustration through visual perception and shifts to a calmer, more empathetic communication style. Escalates automatically when needed.
Scenario
  1. Customer explains billing issue
  2. Raven detects confusion from furrowed brow
  3. Agent simplifies explanation, adds visual aids
  4. Detects rising frustration → triggers empathy mode
  5. Offers live agent handoff if unresolved
Real Estate
Knowledge Base + Function Calling

24/7 Property Concierge

Loaded with your full property listings via RAG. Answers detailed questions about any listing and books viewings directly into your calendar, around the clock.
Scenario
  1. Buyer asks about 3BR homes under $500K
  2. RAG retrieves matching listings instantly
  3. Agent walks through features and neighborhood
  4. Buyer wants to visit → function call books viewing
  5. Follow-up details sent via email automatically
Travel & Hospitality
Multilingual + Override → Data Layer

Multilingual Concierge

Auto-detects guest language and responds natively. Loaded with local recommendations, booking info, and hotel policies. Staff can override to inject special offers or upgrades in real time.
Scenario
  1. Guest joins from hotel room tablet
  2. Agent detects Japanese, switches instantly
  3. Answers questions about spa availability
  4. Staff injects a room upgrade offer via override
  5. Agent books spa + confirms upgrade in Japanese
Financial Services
Data Layer + Perception → Data Layer

Compliance-Ready Advisor

Every conversation is fully transcribed with emotion timelines and sentiment analysis. Perception flags confusion before clients sign. Memory ensures continuity across advisory sessions.
Scenario
  1. Client joins quarterly portfolio review
  2. Memory loads previous session context & goals
  3. Agent walks through performance with RAG data
  4. Perception detects hesitation on risk allocation
  5. Full transcript + sentiment exported for compliance
Live Events
Override + Emotion Control → Data Layer

Interactive Event Host

A live event host that can be puppeted by producers in real time. Override injects scripted announcements, emotion control sets the energy level, and multilingual support handles global audiences.
Scenario
  1. Virtual conference opens with 10K attendees
  2. Producer injects welcome script via override
  3. Agent delivers with high-energy emotion setting
  4. Switches to Spanish for LATAM segment
  5. Q&A mode: agent goes autonomous, answers live


FAQs

Questions? Answers

What's your latency?

Sub-500ms on average from speech to rendered video, roughly 600ms at the upper end. Industry leading.

Can I use my own LLM?

Yes. Any OpenAI-compatible API. Keep your logic private. 100% yours.

How much does it cost?

Free tier for dev. Starter $59/mo. Growth $397/mo. Custom for enterprise.

What languages are supported?

30+ languages with accent preservation. Auto-detection. Real multilingual support.

Can it see me?

Yes, with Raven perception. Emotion detection, facial expressions, objects. Optional and configurable.

Is there a React library?

@tavus/react-cvi on npm. Drop-in components. Full TypeScript support.

Do I need my own avatar?

No. Use 100+ stock avatars. Or upload 2 minutes of video to create your own.

Is it secure?

SOC2, HIPAA, GDPR compliant. White-label for enterprise. Privacy first.

developers

The future is just
an API call away

Free tier · No credit card · Full API access · 100+ stock replicas
Start building free
read the docs


© 2026 Tavus | THE HUMAN COMPUTING COMPANY | All Rights Reserved