AI interviews: how video agents conduct screening conversations


Most high-volume recruiting breaks at the screening stage. Recruiters repeat the same 15- to 20-minute call hundreds of times, take uneven notes, and make fast decisions under pressure.
Two recruiting teams at mid-size insurance companies post the same customer service role on the same day, and each receives roughly 800 applications. Six weeks later, one team has a shortlist of candidates who match the role's actual requirements, confirmed through structured conversations that captured how each person thinks through problems.
The other has a spreadsheet of scores from one-way video recordings, and half of their top applicants never completed them. The difference is that the first team treated screening as a conversation, and the second treated it as a form. Live AI screening conversations create a record of how candidates respond from the first exchange.
An AI interview is a candidate-screening interaction conducted by artificial intelligence, in which AI poses structured questions, evaluates responses, and generates scored outputs that feed into a recruiting workflow, without a human interviewer present.
The format ranges from text-based chatbot screens to recorded video responses to live, two-way video conversations. Each format produces a different kind of signal for recruiters to review.
One-way video interviews ask candidates to record answers to preset prompts without real-time interaction. Live AI screening conversations ask questions in real time, adapt based on what the candidate says, and respond with the timing and attentiveness of a human interviewer.
Platforms like Tavus deploy AI Personas, agents that see, hear, understand, and respond in live video, to conduct these two-way screening conversations at scale.
Recruiting teams define structured question sets tied to specific competencies. Each question is mapped to behavioral anchors that describe what a strong, adequate, or weak response looks like. Standardized questions asked in the same way to every candidate reduce evaluator variance and produce more equitable comparisons across the applicant pool.
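A structured question set of this kind can be sketched as plain data: each question carries its competency and the behavioral anchors an evaluator (human or AI) scores against. The field names and anchor wording below are illustrative, not any platform's actual schema.

```python
# Hypothetical structured question set with behavioral anchors.
# Keys and anchor text are illustrative, not a specific platform's schema.

QUESTIONS = [
    {
        "competency": "conflict_resolution",
        "prompt": "Tell me about a time you handled an upset customer.",
        "anchors": {
            3: "Describes a specific incident, the de-escalation steps taken, and the outcome.",
            2: "Describes a real incident but is vague on steps or outcome.",
            1: "Speaks only in generalities, with no concrete example.",
        },
    },
    {
        "competency": "prioritization",
        "prompt": "How do you decide what to work on first when everything feels urgent?",
        "anchors": {
            3: "Names an explicit triage method and a time it changed a decision.",
            2: "Mentions a method but gives no concrete application.",
            1: "No method; defers entirely to whoever asks loudest.",
        },
    },
]

def anchor_for(question, score):
    """Return the behavioral anchor text matching a 1-3 score."""
    return question["anchors"][score]
```

Because every candidate is scored against the same anchors, two evaluators looking at the same answer have a shared definition of what a 3 versus a 2 means.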
Real-time follow-up adds depth on top of that structure. Voice, expression, and conversational pacing give the AI a richer signal than a transcript alone. If a candidate mentions project management experience, the AI probes for team size and budget scope. If technical depth is thin, it shifts toward culture fit.
After the conversation, the AI generates a structured report containing scores, a transcript, and a recording, which is written back into the applicant tracking system (ATS) for recruiter review. Human recruiters still decide who advances.
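The report that lands in the ATS can be pictured as a small JSON payload. The keys below are assumptions for the sketch, not a documented integration format; the point is that scores, transcript, and recording travel together, and the recommendation field is advisory, with a human still making the call.

```python
import json

# Illustrative shape of a post-interview report written back to an ATS.
# All keys and URLs are assumptions, not a documented integration format.

report = {
    "candidate_id": "cand_1042",
    "role": "claims_adjuster",
    "overall_score": 2.7,  # mean of per-competency scores on a 1-3 scale
    "competency_scores": {
        "conflict_resolution": 3,
        "prioritization": 2,
        "claims_experience": 3,
    },
    "transcript_url": "https://example.com/transcripts/cand_1042",
    "recording_url": "https://example.com/recordings/cand_1042",
    "recommendation": "advance_to_recruiter_review",  # advisory; a human decides
}

payload = json.dumps(report, indent=2)
```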
The AI interview category contains four distinct formats, each serving a different purpose in the hiring funnel.
For high-volume hiring, live AI-led screening conversations and conversational assessments usually produce the most useful structured data for recruiters. They fit best when teams need consistency across a large applicant pool.
Published evidence in this category is still limited. Direct field comparisons across human recruiters, AI voice agents, and mixed screening paths remain sparse.
The available research shows that structured interviews outperform unstructured ones in predictive validity, and AI interviews enforce that structure by default. Among organizations using AI in recruiting, interviewing, or hiring, SHRM's 2024 Talent Trends survey found that nearly 9 in 10 do so to save time or increase efficiency.
AI interviews ask every candidate the same questions in the same way, providing recruiters with more consistent, structured information to review. Recruiters can compare candidates against the same set of signals.
That consistency also frees recruiters to focus on higher-value work, such as offer negotiations and candidate closing. Reducing evaluator variance changes how candidates are compared and advanced, particularly when paired with calibration sessions across hiring managers.
Candidate trust in AI fairness is low. Pew Research's 2023 survey on Americans' views of AI in hiring found that 66% of U.S. adults said they would not want to apply for a job that used AI to make hiring decisions, and 71% opposed AI making final hiring decisions.
Candidates in covered jurisdictions must receive clear disclosure about AI involvement, what is being evaluated, and how to request an alternative process. New York City's Local Law 144 (the Automated Employment Decision Tools rule) sets one of the more specific U.S. standards.
During the conversation, the candidate's experience depends on the AI's conversational quality. An AI that interrupts or fails to acknowledge what was just said signals that no one is listening.
Presence is the feeling that someone on the other side is paying genuine attention. It turns the interview into a conversation that a candidate can stay engaged in. After the conversation, candidates should receive clear next steps, a decision timeline, and a path to human contact.
Bias in AI hiring tools is documented and actively litigated. In 2023, iTutorGroup paid $365,000 to settle EEOC charges that its hiring software automatically rejected female applicants aged 55 and older and male applicants aged 60 and older.
The rules around these systems are tightening across jurisdictions. The EU AI Act classifies most employment uses of AI as high-risk. It prohibits emotion recognition systems in the workplace and education contexts, with limited exceptions for medical or safety reasons.
In the U.S., employers remain responsible for the effects of algorithmic hiring tools on candidates under existing anti-discrimination frameworks. Structured AI screens that flag uncertainty or sensitive topics should still keep a human recruiter in the loop.
Many AI interview platforms rely on transcripts generated from audio or video. Those transcripts can compress tone shifts and hesitations that show how a candidate thinks through problems.
Tavus is a real-time conversational video infrastructure platform. Tavus's Conversational Video Interface (CVI) deploys AI Personas that see, hear, understand, and respond in live video in real time, built for enterprise recruiting teams.
An AI Persona isn't an avatar reading from a script; it's a system with perception, timing, memory, and reasoning, where the face is what the user sees, and the behavioral stack is what makes the conversation real.
The behavioral stack powering CVI operates as a closed loop across four components. Sparrow-1 governs conversational flow; Raven-1 perceives and fuses the candidate's emotional and attentional signals; the large language model (LLM) intelligence layer reasons about what to say and do next; and Phoenix-4 renders responsive facial behavior.
Raven-1, Tavus's multimodal perception system, fuses audio and visual signals into a unified understanding of the candidate's state. It catches the mismatch between a confident answer and a voice that tightens mid-sentence.
Sparrow-1, the conversational flow model, governs when the AI Persona speaks, waits, or holds the floor open. On Tavus's benchmark, Sparrow-1 achieves a median floor-prediction latency of 55ms with 100% precision, 100% recall, and zero interruptions across 28 challenging real-world conversational samples. That precision is what lets the AI Persona distinguish between a pause that means "I'm done" and one that means "I'm gathering my thoughts."
The LLM intelligence layer reasons about what to say next, routing the AI Persona to the appropriate follow-up question based on what Raven-1 has just perceived.
Phoenix-4, the real-time facial behavior engine, renders responsive facial behavior at 40fps and 1080p. When a candidate delivers a strong answer, the LLM directs the response, and Phoenix-4 renders the matching expression of engaged listening.
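The closed loop across those four components can be sketched in a few stub functions: perception feeds reasoning, reasoning drives both turn-taking and rendering. Every name and threshold below is a placeholder for illustration, not a Tavus API.

```python
# Conceptual sketch of the four-component closed loop described above.
# Each stub stands in for one component's role; names, thresholds, and
# logic are placeholders, not Tavus APIs.

def perceive(av_frame):
    # Raven-1's role: fuse audio and visual cues into one state estimate.
    return {"speaking": av_frame["audio_energy"] > 0.2,
            "pause_ms": av_frame["silence_ms"]}

def should_speak(percepts):
    # Sparrow-1's role: take the floor only after a genuine end-of-turn pause,
    # not during a mid-thought hesitation.
    return not percepts["speaking"] and percepts["pause_ms"] > 600

def decide_next_utterance(percepts, history):
    # LLM layer's role: choose the follow-up given what was just perceived.
    return f"Follow-up #{len(history) + 1}"

def render_expression(utterance):
    # Phoenix-4's role: pair the words with responsive facial behavior.
    return {"utterance": utterance, "expression": "engaged_listening"}

def conversation_turn(av_frame, history):
    percepts = perceive(av_frame)
    if not should_speak(percepts):
        return None  # keep listening; the loop simply runs again
    utterance = decide_next_utterance(percepts, history)
    history.append(utterance)
    return render_expression(utterance)
```

The design point is the loop itself: perception gates turn-taking, turn-taking gates reasoning, and reasoning drives rendering, so the face never reacts to something the system hasn't actually perceived.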
The Knowledge Base, Tavus's retrieval-augmented generation (RAG) system with ~30ms retrieval speed, grounds every response in uploaded job descriptions, leveling guides, and interview rubrics. (Knowledge Base currently supports English-language content, which is worth factoring in for recruiting teams running screens in multiple languages.)
Objectives and Guardrails define question flow, branching logic, and completion criteria, such as confirming that a candidate has addressed all five core competencies before the conversation ends. Guardrails also enforce compliance boundaries, allowing the AI Persona to redirect or constrain conversations based on recruiter-defined rules, including sensitive topics that should be escalated to a human.
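Conceptually, objectives and guardrails reduce to a completion check plus a topic-to-action policy. The structure below is a sketch of the idea under assumed names, not a documented configuration schema.

```python
# Illustrative objective and guardrail definitions for a screening
# conversation. Field names and topics are assumptions, not a documented
# configuration schema.

OBJECTIVES = {
    "core_competencies": [
        "claims_experience", "conflict_resolution", "prioritization",
        "written_communication", "schedule_flexibility",
    ],
}

GUARDRAILS = [
    {"topic": "salary_negotiation", "action": "defer_to_recruiter"},
    {"topic": "protected_characteristics", "action": "redirect"},
    {"topic": "medical_accommodations", "action": "escalate_to_human"},
]

def interview_complete(covered):
    """True once every core competency has been addressed."""
    return set(OBJECTIVES["core_competencies"]) <= set(covered)

def guardrail_action(topic):
    """Look up the recruiter-defined action for a sensitive topic, if any."""
    for rule in GUARDRAILS:
        if rule["topic"] == topic:
            return rule["action"]
    return None
```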
Memories retain context across sessions. If a returning candidate has already confirmed three years of claims experience in a prior screen, the AI Persona skips that ground and focuses on what remains.
Function Calling connects the AI Persona to external systems mid-conversation. It can trigger actions such as scheduling a follow-up interview or retrieving a candidate's application status in the ATS, all without leaving the conversation.
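A function-calling setup of this kind usually pairs tool schemas the model can invoke with a local dispatcher that routes each call to a handler. The tool names, parameters, and handlers below are hypothetical stand-ins, not Tavus or ATS APIs.

```python
# Hypothetical function-calling setup: tool schemas the model can invoke
# mid-conversation, plus a local dispatcher. All names are illustrative.

TOOLS = {
    "schedule_followup": {
        "description": "Book a follow-up interview slot for the candidate.",
        "parameters": {"candidate_id": "string", "slot": "ISO-8601 datetime"},
    },
    "get_application_status": {
        "description": "Fetch the candidate's current status from the ATS.",
        "parameters": {"candidate_id": "string"},
    },
}

def dispatch(call):
    """Route a model-issued tool call to a local handler (stubbed here)."""
    handlers = {
        "schedule_followup": lambda a: f"booked {a['slot']} for {a['candidate_id']}",
        "get_application_status": lambda a: {
            "candidate_id": a["candidate_id"],
            "status": "screen_in_progress",  # stubbed ATS response
        },
    }
    return handlers[call["name"]](call["arguments"])
```

The conversation never breaks: the model emits a call, the dispatcher returns the result, and the AI Persona folds it into its next turn.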
A candidate sitting in her apartment at 9 PM, interviewing for a claims adjuster role she found that afternoon, finishes explaining how she handled an angry customer at her last job. The face on her screen nods slightly, pauses, then asks a follow-up about what she'd do differently now.
She pauses to think, and the AI Persona waits without rushing her. She answers honestly, and for a few minutes she forgets she's talking to a system.
That is presence. It's what makes the candidate feel heard, and it's what makes a structured record of how she thinks worth more than another scorecard. The same conversation runs 300 times that week, and on the recruiter's side, every one of those candidates gets the same patient hearing.
Screening has long meant repetitive conversations that strain recruiter time and attention. A real conversation, conducted with consistency, perception, and the patience to let a person finish their thought, is what every applicant has always wanted, and what every recruiter has always wished they had time to give.
See it for yourself. Book a demo.