
our research

A new kind of
research lab

Bridging the human-machine divide

our approach

Human communication
is like a dance

Human conversation is a rhythm—every glance, pause, and tone changes the meaning. At Tavus, we study that rhythm, designing AI that understands emotion, intent, and timing as one signal. We’re building systems that don’t just respond but move with you.

SEE DOCS

The Dance

we study and pioneer

We’re building AI that feels human—machines that see, listen, and respond naturally.



Models

We build models that teach machines perception, empathy, and expression so AI can finally understand the world as we do.
Our Research
Rendering

Phoenix-4

Phoenix-4 is a Gaussian-diffusion rendering model that synthesizes high-fidelity facial behavior at the speed of human interaction. It grew out of building real-time facial animation systems that reproduce subtle, temporally consistent expressions with precise control over motion and identity.
Perception

Raven-1

Raven-1 is a novel multimodal perception model that unifies object recognition, emotion detection, and adaptive attention in a single contextual framework. It emerged from modeling how machines interpret people and environments, integrating visual input, emotional signals, and spatial relationships.
Emotional understanding

Sparrow-1

Sparrow-1 is a transformer-based dialogue model that captures conversational timing, responsiveness, and humanlike interaction flow using multimodal alignment techniques. It embodies our research into parsing communicative intent, emotional state, and turn-level structure across voice, language, and gesture.

Research areas

We study how intelligence perceives context, emotion, and tone to create AI that understands and acts as humans do.

contextual perception

Understanding meaning beyond words. Tone, timing, intent, and everything unsaid.

Audio understanding

Teaching machines to truly listen. Not just to sounds, but to emotion, cadence, and rhythm.

Agentic interaction

Building systems that act with awareness, not automation. Capable of response, reasoning, and restraint.

human-like speech

Synthesizing voice that carries emotion, not just words. Warmth, hesitation, humor, humanity.

Real-time rendering

Turning intelligence into motion. Seamless, lifelike expression that feels natural and alive.

Conversational intelligence

Making dialogue intuitive and human. Conversations that adapt, remember, and build trust over time.


Read our latest research



Research

From random noise to real images: Understanding diffusion and flow matching

A clear intro to diffusion and flow-matching: data distributions, ODE vs SDE, and the path from Gaussian noise to realistic images/videos powering SOTA models.

Karthik Ragunath Ananda Kumar

22/9/2025


Research

Understanding intuition behind multi-turn LLMs through the prism of search

Discover the latest research in how LLMs use reinforcement learning to search, reason, and refine answers across multiple turns—boosting accuracy and enabling active problem-solving.

Karthik Ragunath Ananda Kumar

8/7/2025


Research

Sparrow-0: Advancing Conversational Responsiveness in Video Agents with Transformer-Based Turn-Taking

In this paper, we dive into the development and research behind Sparrow-0, exploring the innovative transformer-based approach for turn-taking and its integration alongside Raven and Phoenix models within our Conversational Video Interface (CVI), an end-to-end operating system designed for building responsive video agents.

Brian Johnson

2/4/2025

See all research

Ethical and aligned
by design

We believe technology earns trust through honesty, not opacity. Tavus is built on informed consent, transparent systems, and full disclosure—no fine print, no hidden levers. Every model, dataset, and likeness we use exists with permission and purpose. You deserve to know how the magic works, and we’re here to show you.

Learn more

Where research becomes reality

Our research manifests as the traits that make AI feel human.

EXPRESSIVE
Empathetic
Actionable
Personalized

Expressive
(and authentic)

AI Humans bring face-to-face connection to every conversation.

Get Started for free

benefit [1]

Real-time conversation

Trained on millions of conversations to deliver smooth, humanlike dialogue.

benefit [2]

Superhuman perception

Understands actions, emotions, and screenshares to respond with context.

benefit [3]

Lifelike presence

Displays expressive reactions and movement that build trust and engagement.

Perceptive (and aware)

AI Humans are modeled after us: they see, sense, and understand to build trust through real conversation.


benefit [1]

Perception

Deciphers nonverbal cues like body language and micro-expressions. Uses context to adapt responses and create meaningful, two-way interactions.

benefit [2]

Multimodal

Every input adds context, ensuring the AI Human sees the full picture: screenshare, voice, and surroundings.

benefit [3]

Awareness

Monitors key events and behaviors to trigger function calls while continuously sensing subtle background shifts with real-time data.

Thinking (with agency)

AI Humans are fully formed, with the cognitive skills needed for efficient, effective conversations.


benefit [1]

Knowledge

Industry-leading RAG grounds responses in your data. 15x faster than other solutions.

benefit [2]

Memory

Remembers past interactions to personalize responses and pick up conversations where they left off. Free to toggle on or off to fit any interaction.

benefit [3]

Structure

Uses customizable frameworks and logic branching to naturally structure conversations and keep moving toward your goals. 

Deployable (and customizable)

AI Humans are designed to work for you: scalable, flexible, and ready to perform.


benefit [1]

Scale 

Deploy and manage AI Humans at scale, with infrastructure, WebRTC, VAD, and ASR fully managed behind the scenes.

benefit [2]

Insights 

Transcripts, visual context, and emotional markers from every conversation are accessible and used to improve user experiences.

benefit [3]

White-labeled

Developer-first APIs. With simple, plug-and-play endpoints, you can embed AI Humans into any website or platform with ease.
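The plug-and-play pattern described above amounts to a single authenticated request. A minimal sketch follows; the endpoint URL, the `x-api-key` header, and the `replica_id` and `conversational_context` field names are all assumptions for illustration, not the documented Tavus API—consult the docs for the real endpoints.

```python
import json

def build_conversation_request(api_key: str, replica_id: str, context: str) -> dict:
    """Assemble a plain description of the HTTP request that would start a
    video conversation. Nothing is sent; this only shows the request shape."""
    return {
        "method": "POST",
        "url": "https://api.example.com/v2/conversations",  # placeholder endpoint
        "headers": {
            "x-api-key": api_key,               # hypothetical auth header
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "replica_id": replica_id,           # which AI Human to use (assumed field)
            "conversational_context": context,  # grounding for the dialogue (assumed field)
        }),
    }

req = build_conversation_request("demo-key", "r-123", "Help the user set up billing.")
print(req["method"], req["url"])
```

Because the endpoint is just HTTPS plus JSON, the same request can be issued from any backend or serverless function, which is what makes white-labeled embedding straightforward.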

Join the team decoding conversation

Join the team shaping how humans and machines understand each other. We’re researchers, engineers, and artists building AI that listens, learns, and connects like people do. If you care about the future of intelligence and how it feels, you’ll fit right in.

Careers
PALs

You’ve never talked to AI like this before.

meet the pals
ENTERPRISE

Bring human connection to every AI interaction.

TALK TO A (real) HUMAN

© 2025 Tavus | All Rights Reserved