Talent acquisition software: adding AI video to your recruiting stack

Written by Tavus Team

Published May 8, 2026

Recruiting teams have spent the last decade building sophisticated data pipelines. They can source from dozens of channels, score resumes with machine learning models, and move applicants through multi-stage workflows.

Candidate communication often loses momentum between the application form and the hiring manager's calendar. Some candidates hit silence. Others get a one-way video prompt that feels like talking into a void.

The data and workflow layers are mature, but candidate communication still goes quiet at the moments when applicants most need someone present and paying attention. Real-time conversational video puts live interaction at the points in the hiring funnel where candidates would typically get a form, silence, or a recorded prompt. AI Personas that see, hear, understand, and respond in live face-to-face interactions give every applicant a real conversation regardless of volume, time zone, or recruiter availability.

What talent acquisition software does

Talent acquisition software is the category of platforms that hiring teams use to attract, evaluate, and hire candidates. Gartner defines it as the operational backbone of an organization's hiring stack. An applicant tracking system (ATS) is a component within a broader talent acquisition suite that integrates applicant tracking with CRM, recruitment marketing, employer branding, onboarding, and analytics.

Gartner's analysis identifies four forces reshaping talent acquisition in 2026: high-volume recruiting going AI-first, recruiter skills shifting toward more complex work, early-career programs being reshaped for future roles, and AI reshaping talent assessment.

Inside the modern recruiting stack

A mid-to-large enterprise runs talent acquisition across five layers: sourcing and candidate discovery, applicant tracking, screening and assessment, scheduling and candidate communication, and the onboarding handoff into employee records in the human resources information system (HRIS).

Aptitude Research has noted that interviews are often neglected relative to upstream investments like job advertising and recruitment marketing. Scheduling and candidate communication are where fragmentation becomes most visible to candidates. Across all five layers, lack of integration within and between systems remains a major pain point for HR technology users.

The conversation gap in today's hiring stack

The recruiting stack has a structural gap: between the moment someone applies and the moment a recruiter picks up the phone, most teams cannot hold real conversations with candidates at application volume.

Candidates notice broken communication quickly. They expect clear communication, transparency, and respect throughout the hiring process. Lighthouse Research found that candidates want to see videos of hiring managers 2.5 times more often than company overviews, and 10 times more often than a message from HR.

Application volume per recruiter has surged in recent years; recruiter teams are handling more with fewer people, and candidate drop-off remains a persistent challenge. One-way video interviews can become a dropout point when they feel impersonal.

Candidates want speed and signs that someone is actually paying attention.

How AI video fits into talent acquisition software

Most recruiting teams have encountered AI-generated video in the form of pre-recorded clips for employer branding or as one-way candidate prompts. Real-time conversational video supports live, two-way conversations where the AI video agent sees the candidate, hears their responses, and replies with the timing and behavior of a person on the other end of a call.

Tavus provides real-time conversational video infrastructure that deploys AI Personas capable of seeing, hearing, understanding, and responding in these live interactions. The Conversational Video Interface (CVI) sits as an infrastructure layer within the recruiting stack, connecting to the ATS, scheduling systems, and candidate communication tools through APIs and webhooks. CVI is white-labeled, so the AI Persona carries the employer's brand.

The architecture behind candidate-facing AI video is a closed-loop behavioral stack. Sparrow-1 governs conversational flow; Raven-1 fuses the candidate's vocal tone, expression, and posture into a unified understanding of their state; the large language model (LLM) layer reasons about what to say and do next; and Phoenix-4 renders responsive facial behavior. An AI Persona is what the user sees, while the behavioral stack makes the conversation real.
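To make the order of that loop concrete, here is a minimal Python sketch. Every name in it (Frame, perceive, should_take_floor, reason, render, step) is hypothetical and not part of any Tavus API, and each stage is a toy rule standing in for a trained model. It shows only how the layers hand off to one another on each tick of a call.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical placeholder stages for the four layers described above.
# None of these are Tavus APIs; each layer is reduced to a toy rule.

@dataclass
class Frame:
    transcript: str   # what the candidate said during this tick
    speaking: bool    # whether the candidate is still talking
    tone: str         # e.g. "flat" or "warm"
    posture: str      # e.g. "tense" or "relaxed"

@dataclass
class LoopState:
    history: list = field(default_factory=list)

def perceive(frame: Frame) -> dict:
    # Perception (the role the post assigns to Raven-1): combine signals into one state.
    return {"text": frame.transcript, "affect": (frame.tone, frame.posture)}

def should_take_floor(frame: Frame) -> bool:
    # Turn-taking (the role assigned to Sparrow-1). A real model predicts floor
    # ownership; this placeholder simply waits until the candidate stops talking.
    return not frame.speaking

def reason(state: LoopState, perception: dict) -> str:
    # Reasoning (the LLM layer): choose the next utterance. Toy rule only.
    state.history.append(perception["text"])
    if perception["affect"] == ("flat", "tense"):
        return "Take your time. Would it help if I rephrased the question?"
    return "Thanks. Can you tell me more about that?"

def render(reply: str) -> str:
    # Rendering (the role assigned to Phoenix-4): here the "face" is just a text tag.
    return f"[persona speaks] {reply}"

def step(state: LoopState, frame: Frame) -> Optional[str]:
    """Run one tick of the loop; return None while the persona keeps listening."""
    perception = perceive(frame)
    if not should_take_floor(frame):
        # A real renderer would still animate listening behavior here.
        return None
    return render(reason(state, perception))

state = LoopState()
print(step(state, Frame("I managed a team of", True, "warm", "relaxed")))  # None
print(step(state, Frame("I'm fine", False, "flat", "tense")))
```

The point of the structure is that perception runs on every tick, while reasoning and rendering a reply only happen once the turn-taking layer decides the floor is free.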

Candidate-facing recruiting video works when the system can respond in real time while maintaining context, timing, and presence.

AI video use cases across the hiring funnel

Conversational AI video maps to specific points in the hiring funnel.

24/7 candidate screening and qualification is the highest-volume use case. A single evening could see 100 candidates each complete a 15-minute AI-led screening conversation: 1,500 minutes, or 25 hours, of conversation that no single recruiter could cover.
Role explanation and company culture introductions address a documented candidate need. Candidates can ask about day-to-day responsibilities, team structure, or benefits, and the AI Persona answers from the employer's own documentation.
Interview practice and candidate coaching extend beyond the employer's own funnel. Final Round AI, built on Tavus infrastructure, has logged over 1.2 million practice minutes, with 100,000 or more active users and an average session length of 12 minutes.
Scheduling, Q&A, and offer-stage conversations fill the mid-funnel communication gap where candidates most often get ghosted. An AI Persona can confirm interview logistics, answer benefits questions, or walk a candidate through next steps at 2 AM.
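The arithmetic behind the screening example in the list above can be checked in a few lines of Python. The 4-hour length of "a single evening" is an assumption for illustration, not a figure from this post.

```python
import math

def screening_load(candidates: int, minutes_each: int, window_hours: float) -> tuple:
    """Return (total conversation hours, concurrent sessions needed to fit the window)."""
    total_hours = candidates * minutes_each / 60
    concurrent = math.ceil(total_hours / window_hours)
    return total_hours, concurrent

# 100 candidates x 15 minutes, fitted into an assumed 4-hour evening window.
hours, sessions = screening_load(100, 15, 4)
print(hours, sessions)  # 25.0 7
```

A single recruiter can hold only one call at a time, so the 25 hours are sequential for them; parallel AI sessions are what compress the same load into one evening.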

Candidates need answers, next steps, and a responsive presence at each of these points in the funnel.

For screening conversations, the AI Persona asks role-specific questions, branches based on responses using Objectives and Guardrails, and passes structured scorecards to the ATS. The Knowledge Base, a proprietary retrieval-augmented generation (RAG) system with retrieval latency of approximately 30ms, grounds responses in uploaded job descriptions, compensation bands, and company materials.
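The branching-plus-scorecard pattern can be sketched in plain Python. This is a minimal illustration under assumed structures: Objective, Scorecard, and run_screen are hypothetical names, not the CVI Objectives schema, and the pass/fail checks are toy rules where a real deployment would rely on the model's judgment.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Objective:
    """One screening objective (hypothetical structure, not the CVI schema)."""
    key: str
    question: str
    is_met: Callable[[str], bool]        # judges whether an answer satisfies the objective
    follow_up: Optional[str] = None      # asked once if the first answer falls short

@dataclass
class Scorecard:
    candidate_id: str
    results: dict = field(default_factory=dict)

def run_screen(candidate_id: str, objectives: list, ask: Callable[[str], str]) -> Scorecard:
    """Ask each objective's question, branch to the follow-up if needed, record the outcome."""
    card = Scorecard(candidate_id)
    for obj in objectives:
        answer = ask(obj.question)
        met = obj.is_met(answer)
        if not met and obj.follow_up:
            # Branch: give the candidate one more chance with a narrower question.
            answer = ask(obj.follow_up)
            met = obj.is_met(answer)
        card.results[obj.key] = {"met": met, "answer": answer}
    return card

# Usage with scripted answers standing in for a live candidate.
objectives = [
    Objective("weekend_availability", "Can you work weekend shifts?",
              lambda a: "yes" in a.lower(),
              follow_up="Could you cover at least one weekend day per month?"),
    Objective("customer_story", "Tell me about a time you handled an upset customer.",
              lambda a: len(a.split()) >= 5),
]
scripted = {
    "Can you work weekend shifts?": "Not every weekend",
    "Could you cover at least one weekend day per month?": "Yes, that works",
    "Tell me about a time you handled an upset customer.": "A guest was double billed and I fixed it",
}
card = run_screen("cand-001", objectives, scripted.__getitem__)
print(card.results)
```

Keeping the scorecard as a plain dictionary of per-objective results makes it straightforward to serialize and hand to an ATS.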

CVI supports candidate interactions in 42+ languages, but the Knowledge Base currently supports English-language content only, a limitation worth factoring in for global recruiting programs that operate across multiple language markets.

Function Calling can trigger actions in the employer's existing systems, while webhooks or a callback URL push structured interview data back to the ATS.
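On the receiving side, a webhook handler mostly validates the callback body and reshapes it into an ATS record update. The sketch below assumes a hypothetical payload with conversation_id, candidate_id, and scorecard fields; the real CVI callback schema may differ, so treat every field name here as a placeholder to check against the provider's documentation.

```python
import json

# Hypothetical callback fields; verify against the actual webhook schema.
REQUIRED_FIELDS = ("conversation_id", "candidate_id", "scorecard")

def to_ats_update(raw_body: bytes) -> dict:
    """Turn a conversation-ended callback body into an ATS record update."""
    event = json.loads(raw_body)
    missing = [f for f in REQUIRED_FIELDS if f not in event]
    if missing:
        raise ValueError(f"callback missing fields: {missing}")

    scorecard = event["scorecard"]
    met = sum(1 for result in scorecard.values() if result.get("met"))
    return {
        # Key the update on the candidate so it attaches to the existing ATS record.
        "candidate_id": event["candidate_id"],
        "stage": "ai_screen_complete",
        "notes": f"AI screen {event['conversation_id']}: {met}/{len(scorecard)} objectives met",
        "fields": {key: result.get("met", False) for key, result in scorecard.items()},
    }

body = json.dumps({
    "conversation_id": "conv-42",
    "candidate_id": "cand-001",
    "scorecard": {"shift_availability": {"met": True}, "customer_experience": {"met": False}},
}).encode()
print(to_ats_update(body)["notes"])  # AI screen conv-42: 1/2 objectives met
```

Rejecting incomplete payloads early keeps malformed callbacks from writing partial records into the ATS.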

Memories lets the AI Persona retain context across sessions. A candidate returning for a follow-up conversation doesn't repeat information they already provided.
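The effect described here, not asking for information a returning candidate already gave, can be pictured with a small in-memory store keyed by candidate ID. CandidateMemory and its methods are hypothetical; this does not describe how the Memories feature is implemented.

```python
from collections import defaultdict

class CandidateMemory:
    """Toy per-candidate memory keyed by a stable candidate ID (hypothetical)."""

    def __init__(self) -> None:
        self._facts = defaultdict(dict)  # candidate_id -> {fact_key: value}

    def remember(self, candidate_id: str, key: str, value: str) -> None:
        self._facts[candidate_id][key] = value

    def known(self, candidate_id: str) -> dict:
        return dict(self._facts.get(candidate_id, {}))

    def questions_to_ask(self, candidate_id: str, questions: dict) -> list:
        """Return only the questions whose answers this candidate has not given yet."""
        facts = self._facts.get(candidate_id, {})
        return [q for key, q in questions.items() if key not in facts]

memory = CandidateMemory()
memory.remember("cand-001", "start_date", "June 1")  # captured in the first session
questions = {"start_date": "When could you start?", "commute": "How would you get to the site?"}
print(memory.questions_to_ask("cand-001", questions))  # ['How would you get to the site?']
```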

Conversational video fits naturally at the moments in the funnel where candidates need answers, context, or a responsive presence.

What to evaluate when adding AI video to your recruiting stack

When recruiting teams evaluate AI video, they need to look past the demo and examine the parts of the system that shape candidate experience and operational fit.

Conversational flow and response quality determine whether candidates complete the interaction or abandon it. Silence-based endpoint detection, the approach most voice systems use, creates stilted conversations because the system can't distinguish between "I'm done talking" and "I'm thinking." Sparrow-1 addresses this by predicting conversational floor ownership at the frame level, which allows more natural turn-taking than silence- or timeout-based approaches.
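The weakness of the silence-based baseline is easy to reproduce. The sketch below is a generic silence-timeout endpointer, the approach the paragraph criticizes, not Sparrow-1; the 20 ms frame size and 600 ms timeout are assumed values for illustration.

```python
def silence_endpoints(frames: list, frame_ms: int = 20, timeout_ms: int = 600) -> list:
    """Baseline silence-based endpointing.

    frames: per-frame voice activity (True = speech detected).
    Returns the frame indices where the system would decide "the candidate is
    done" and start talking: the frame where continuous silence after some
    speech first reaches timeout_ms.
    """
    needed = timeout_ms // frame_ms
    endpoints, silent_run, heard_speech = [], 0, False
    for i, voiced in enumerate(frames):
        if voiced:
            heard_speech, silent_run = True, 0
            continue
        silent_run += 1
        if heard_speech and silent_run == needed:
            endpoints.append(i)
            heard_speech = False  # wait for new speech before firing again
    return endpoints

# 1 s of speech, an 800 ms thinking pause, 1 s more speech, then real silence.
frames = [True] * 50 + [False] * 40 + [True] * 50 + [False] * 50
print(silence_endpoints(frames))                   # [79, 169]: cuts in mid-thought at frame 79
print(silence_endpoints(frames, timeout_ms=1000))  # [189]: no false cut, but a full second of dead air first
```

Tuning the timeout only trades one failure for the other: a short timeout interrupts thinking pauses, a long one makes every reply feel late. That trade-off is why the paragraph argues for predicting floor ownership instead.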

Sparrow-1 treats timing as a first-class modeling problem and governs precisely when the AI Persona should speak or hold. In benchmark testing across 28 challenging real-world conversational samples, it achieves 55ms median floor-prediction latency with 100% precision, 100% recall, and zero interruptions.
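Precision, recall, and median latency for turn-end prediction can be computed as below. The post does not describe how the benchmark matched predictions to labeled turn ends, so the matching rule here (a prediction counts if it lands within an assumed 300 ms after an unmatched labeled end) is an illustrative assumption, and the example numbers are made up.

```python
from statistics import median

def turn_end_scores(predicted_ms: list, actual_ms: list, tolerance_ms: int = 300) -> dict:
    """Score predicted turn ends against labeled ones (assumed matching rule).

    A prediction is a true positive if it lands 0..tolerance_ms after a labeled
    turn end that has not been matched yet; every other prediction is a false
    positive, and every labeled end left unmatched is a false negative.
    """
    unmatched = sorted(actual_ms)
    tp = fp = 0
    latencies = []
    for p in sorted(predicted_ms):
        match = next((a for a in unmatched if 0 <= p - a <= tolerance_ms), None)
        if match is None:
            fp += 1
        else:
            unmatched.remove(match)
            tp += 1
            latencies.append(p - match)
    fn = len(unmatched)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "median_latency_ms": median(latencies) if latencies else None,
    }

actual = [1000, 5000, 9000]  # labeled turn ends, in ms
print(turn_end_scores([1050, 3000, 5060], actual))  # spurious prediction at 3000, missed end at 9000: precision 2/3, recall 2/3
print(turn_end_scores([1040, 5050, 9070], actual))  # all three caught: precision 1.0, recall 1.0
```

Precision penalizes speaking when the floor was not free; recall penalizes missing the moment it was. A score of 100% on both means every prediction matched a real turn end and no turn end was missed.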

Multimodal perception of candidate signals adds a layer that most text- and voice-only systems miss. If a candidate says "I'm fine" while their posture tightens and their tone flattens, Raven-1 fuses the flat tone with the tightened posture and catches the mismatch between what the candidate says and how they say it, so the LLM layer can adjust accordingly.
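The idea of catching a say/show mismatch can be illustrated with a toy late-fusion rule. This is not how Raven-1 works (the post does not describe its internals); the phrase list, cue scores, and threshold below are all invented for illustration.

```python
from dataclasses import dataclass

# Toy scores, purely illustrative; not Raven-1's method or values.
POSITIVE_PHRASES = ("i'm fine", "all good", "no problem", "sounds great")
CUE_SCORES = {
    "tone": {"flat": -0.6, "shaky": -0.8, "warm": 0.5},
    "posture": {"tense": -0.6, "slumped": -0.4, "relaxed": 0.4},
}

@dataclass
class Observation:
    words: str
    tone: str
    posture: str

def verbal_score(words: str) -> float:
    text = words.lower()
    return 0.7 if any(p in text for p in POSITIVE_PHRASES) else 0.0

def nonverbal_score(obs: Observation) -> float:
    # Average the tone and posture cues; unknown labels count as neutral (0.0).
    tone = CUE_SCORES["tone"].get(obs.tone, 0.0)
    posture = CUE_SCORES["posture"].get(obs.posture, 0.0)
    return (tone + posture) / 2

def mismatch(obs: Observation, threshold: float = 1.0) -> bool:
    """Flag when what is said and how it is said point in opposite directions."""
    return verbal_score(obs.words) - nonverbal_score(obs) >= threshold

print(mismatch(Observation("I'm fine", "flat", "tense")))    # True
print(mismatch(Observation("I'm fine", "warm", "relaxed")))  # False
```

A flag like this would be passed to the reasoning layer as extra context, for example to slow down or offer to rephrase, rather than treated as a judgment about the candidate.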

Knowledge accuracy and ATS integration determine whether a conversation holds together in production. On the knowledge side, retrieval delays break conversational rhythm; on the integration side, webhooks let structured interview data flow directly into ATS records without manual re-entry.

Guardrails, fairness, and bias considerations are non-negotiable for production recruiting deployments. AI-driven hiring is under active scrutiny, and candidate disclosure and audit requirements are becoming part of the deployment landscape. Objectives and Guardrails, native to CVI, set completion criteria and compliance boundaries for each conversation, keeping the AI Persona within scope and escalating to a human recruiter when a question falls outside approved territory or a candidate raises a sensitive topic.

The same production standard applies to the visual side of the interaction. Phoenix-4, a real-time facial behavior engine running at 40fps and 1080p, animates the AI Persona's face with active-listening behaviors while the candidate is still speaking, including a nod when the candidate pauses mid-thought and micro-expressions that match the conversational tone.

Sparrow-1 timing, Raven-1 fusion, ATS integration, Objectives and Guardrails, and Phoenix-4 behavior together determine whether AI video holds up in production or only looks good in a demo.

How to roll out AI video in your recruiting stack

Start with one high-volume conversation type. First-round screening for frontline or customer service roles works well because volume is high, conversation structure is well-defined, and the stakes per interaction are lower than in executive hiring.

Define the recruiter handoff and escalation rules before launch. Set clear criteria for when a human recruiter takes over, whether that's a question outside the Knowledge Base scope, a sensitive topic, or a Persona Builder flag for human review.
Measure candidate experience alongside efficiency gains. Track completion rates, candidate satisfaction scores, and recruiter feedback on the quality of the scorecard.
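The three handoff criteria in the rollout steps above (a question outside the Knowledge Base scope, a sensitive topic, or a flag for human review) can be written down as an explicit, testable rule before launch. The sketch below is hypothetical: the topic list, the retrieval-score cutoff, and the Turn fields are assumptions, and substring matching on topics is deliberately crude.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values for illustration; a real deployment would define its own.
SENSITIVE_TOPICS = ("disability", "pregnancy", "medical", "harassment", "immigration status")
MIN_RETRIEVAL_SCORE = 0.5  # below this, treat the question as outside the knowledge base

@dataclass
class Turn:
    question: str
    retrieval_score: float            # best knowledge-base match for the question, 0..1 (assumed)
    flagged_for_review: bool = False  # e.g. set by a review flag configured for the persona

def handoff_reason(turn: Turn) -> Optional[str]:
    """Return why a human recruiter should take over, or None to continue."""
    if turn.flagged_for_review:
        return "flagged_for_review"
    text = turn.question.lower()
    if any(topic in text for topic in SENSITIVE_TOPICS):
        return "sensitive_topic"
    if turn.retrieval_score < MIN_RETRIEVAL_SCORE:
        return "outside_knowledge_base"
    return None

print(handoff_reason(Turn("What shifts are available?", 0.9)))  # None
print(handoff_reason(Turn("Do you offer stock options?", 0.2)))  # outside_knowledge_base
```

Checking the explicit review flag first means a human-review request always wins, even when the question itself looks routine.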

Completion rates, satisfaction scores, and recruiter feedback help talent teams evaluate candidate experience and operational fit as a system, not just time-to-screen. Starting with one conversation type gives teams a cleaner way to test handoffs, candidate trust, and scorecard quality before expanding.
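A pilot summary along these lines needs only a few fields per session. The Session structure and the 1-5 scales below are assumptions for illustration, not a prescribed reporting format.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

@dataclass
class Session:
    completed: bool                   # did the candidate finish the conversation?
    csat: Optional[int]               # post-conversation survey score 1-5, None if skipped (assumed scale)
    recruiter_rating: Optional[int]   # recruiter's 1-5 rating of the scorecard, None if not reviewed

def pilot_summary(sessions: list) -> dict:
    """Summarize completion, candidate satisfaction, and scorecard quality for a pilot."""
    completed = [s for s in sessions if s.completed]
    csat = [s.csat for s in completed if s.csat is not None]
    ratings = [s.recruiter_rating for s in completed if s.recruiter_rating is not None]
    return {
        "completion_rate": len(completed) / len(sessions) if sessions else 0.0,
        "avg_csat": mean(csat) if csat else None,
        "avg_scorecard_rating": mean(ratings) if ratings else None,
        "n": len(sessions),
    }

sessions = [Session(True, 5, 4), Session(True, 4, None), Session(False, None, None), Session(True, None, 3)]
print(pilot_summary(sessions))
```

Reporting all three numbers side by side makes it harder for a fast but unpopular screen, or a pleasant one that produces weak scorecards, to look like a success.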

The future of talent acquisition software

Every recruiter remembers the candidate who almost didn't make it through the funnel, the one who was quiet at first, needed space to think, and then gave the answer that made the hiring manager sit up. Hiring has always depended on those small, human moments of recognition: a nod that says "keep going," a follow-up question that shows someone is listening.

That feeling of presence, the sense that someone on the other side of the screen genuinely sees you, is what candidates lose when the stack replaces conversation with forms and silence. The right infrastructure can return that sense of attention at every stage of the funnel, for every applicant, at any hour.

See it for yourself. Book a demo.