Using Llama 3 8B to Power Your Conversational AI

Written by Yujian Tang

Publish date: September 24, 2024


Large Language Models (LLMs) are one of the most fluid pieces of AI technology in development today. Everyone is competing to make their own version of an ideal LLM. Here at Tavus, we’ve embraced the entire LLM landscape. We offer our own version of integrated LLMs, and we also offer you the ability to switch them out as you’d like. Find the full notebook here.

In this article, we’ll show one way you can use Llama 3 8B, one of the most popular open source LLMs to date, in place of a Tavus-provided LLM. We cover:

What is Llama 3?
Using Llama 3 8B as a Custom LLM for a Persona
- Setting Up
- Defining the Persona and LLM
Testing the Persona in a Conversational Video Interface
Summary of Using Llama 3 8B as a Custom LLM for a Persona

What is Llama 3?

Llama 3 is an open source LLM developed by Meta. The Llama model family was famously open sourced as Meta’s response to the rapidly growing LLM scene. Llama 2 was released in July 2023 in partnership with Microsoft, within months of OpenAI’s announcement of GPT 3.5 and Mistral’s entrance into the market. Less than a year later, in April 2024, Meta released Llama 3.

Llama 3 comes in two main sizes: 8 billion and 70 billion parameters. Although both are considerably smaller than the 175 billion parameters commonly cited for GPT 3.5, Llama 3 delivers comparable performance on many common tasks. It supports languages such as English, Spanish, French, German, and Chinese, and it also handles coding, reasoning, and tool use.

Using Llama 3 8B as a Custom LLM for a Persona

Our own version of Llama, `tavus-llama`, is incredibly fast, but if you have a customized version, you can set it up easily using this guide. In this section, we show how you can use OctoAI to access Llama 3 and set it up as the LLM for a Persona. The Persona controls how the Replica acts in the Tavus real-time Conversational Video Interface (CVI).

Setting Up

Before we get started, make sure you have the right API keys. For this specific example, you’ll need an API key from Tavus and one from OctoAI. In the code block below, I’ve saved my API keys in a `.env` file and I’m loading them into the environment with the `python-dotenv` package.

from dotenv import load_dotenv
import os

load_dotenv()

TAVUS_API_KEY = os.environ["TAVUS_API_KEY"]
OCTOAI_API_KEY = os.environ["OCTOAI_API_KEY"]
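
Reading a key with `os.environ[...]` raises a bare `KeyError` when the variable is missing. A minimal sketch of a friendlier check follows. It assumes a hypothetical helper named `require_env`, which is not part of the Tavus or OctoAI tooling, and an optional `env` argument that exists only so the lookup can be tried in isolation.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable or fail with a clear message.

    `require_env` is a hypothetical helper for this walkthrough. `env` defaults
    to `os.environ`; passing a plain dict is only a convenience for testing.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

After `load_dotenv()`, you could then write `TAVUS_API_KEY = require_env("TAVUS_API_KEY")` and get a readable error when the `.env` file is incomplete.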

Defining the Persona and LLM

Once we have the API keys loaded, we can now define the Persona. In this example, we create a career coach to help software engineers advance from individual contributors to management. With Tavus, custom Personas are created via API calls. We begin by defining the JSON payload.

There are five first-level parameters: `persona_name`, `system_prompt`, `context`, `default_replica_id`, and `layers`. For this example, we name our persona “Career Coach” and give it a system prompt that directs it to act as a career coach who specializes in helping software engineers move from IC roles into management.

Beyond telling the Persona how it should act and what it should do, we also provide some specific examples in the `context` parameter. Treat this entry like a place to give the Persona some memories. In this case, we provide the background that it should have. Next, we give the Persona a default Replica ID to use if one is not provided in the CVI later on.

The last entry we come to is `layers`. This is where we can swap out pieces like the LLM and text-to-speech (TTS). In this example, we swap out the LLM. To swap out the LLM, we need to provide the name of the model, a base URL, and the API key to use when hitting that endpoint. For this example, I’ve used Llama 3 8B instruct on the OctoAI platform.

import requests

url = "https://tavusapi.com/v2/personas"

payload = {
    "persona_name": "Career Coach",
    "system_prompt": "As a Career Coach, you are a dedicated professional who specializes in helping software engineers advance their careers from IC to management.",
    # Background "memories" for the Persona
    "context": """You spent two years as a software engineer, you did a great job on the technology, then you were promoted to be a senior software engineer.
You spent three years as a senior software engineer. You did a great job not only designing robust systems, but also helping and mentoring more junior engineers.
Your work helping many other engineers got you promoted into a software engineering manager.""",
    "default_replica_id": "r79e1c033f",
    # Swap in a custom LLM: model name, base URL, and the provider's API key
    "layers": {
        "llm": {
            "model": "meta-llama-3-8b-instruct",
            "base_url": "https://text.octoai.run/v1",
            "api_key": OCTOAI_API_KEY,
        }
    },
}
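
If you plan to create several Personas with different models, it may help to wrap the payload in a small function. The sketch below is one way to do that. The function name `build_persona_payload` and its argument names are hypothetical; only the dictionary keys come from the payload described in this article.

```python
def build_persona_payload(name, system_prompt, context, replica_id,
                          llm_model, llm_base_url, llm_api_key):
    """Assemble the five first-level Persona fields with a custom LLM layer.

    Hypothetical helper: the keys mirror the payload in this article, while
    the function and parameter names are chosen for illustration only.
    """
    return {
        "persona_name": name,
        "system_prompt": system_prompt,
        "context": context,
        "default_replica_id": replica_id,
        "layers": {
            "llm": {
                "model": llm_model,
                "base_url": llm_base_url,
                "api_key": llm_api_key,
            }
        },
    }


# Example call with a placeholder key (replace with your real OctoAI key).
example = build_persona_payload(
    "Career Coach",
    "You are a career coach for software engineers.",
    "You were promoted from engineer to engineering manager.",
    "r79e1c033f",
    "meta-llama-3-8b-instruct",
    "https://text.octoai.run/v1",
    "YOUR_OCTOAI_API_KEY",
)
```

Swapping to another provider then only means changing the last three arguments.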


Before we can finish creating the Persona, we need to create our headers. In this case, all we need to pass is the Tavus API key and the content type. Then we send a POST request, with the payload and headers, to the URL defined at the top of the previous code snippet.

headers = {
    "x-api-key": TAVUS_API_KEY,
    "Content-Type": "application/json"
}

response = requests.request("POST", url, json=payload, headers=headers)
print(response.text)
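
Printing `response.text` works for a quick look, but it does not tell you whether the call failed. As a hedged sketch, the request can be wrapped so that HTTP errors surface immediately and the body comes back as a dictionary. The helper name `post_json`, the `post` hook, and the 30-second timeout are assumptions made for illustration, not part of the Tavus API.

```python
def post_json(url, payload, api_key, post=None, timeout=30):
    """POST a JSON payload with the Tavus headers and return the decoded body.

    `post` defaults to `requests.post`. Passing another callable that accepts
    the same keyword arguments is a hypothetical hook for offline experiments.
    """
    if post is None:
        import requests  # imported lazily so the helper can be stubbed
        post = requests.post
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    response = post(url, json=payload, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises on 4xx/5xx instead of failing silently
    return response.json()
```

With this in place, `persona = post_json(url, payload, TAVUS_API_KEY)` would return the parsed response, so `persona["persona_id"]` can be passed straight to the next call.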


Your response should look something like this:

{
    "persona_id": "p0b46deb",
    "persona_name": "Career Coach",
    "created_at": "2024-09-18T20:11:40.582Z"
}

Testing the Persona in a Conversational Video Interface

Once we’ve created a Persona, we can interact with it in a CVI instance. To do this, we set up another endpoint and payload. This payload has five first-level parameters: `replica_id`, `persona_id`, `conversation_name`, `conversational_context`, and `properties`. The Replica ID is optional. If we don’t provide one, the conversation uses the default Replica assigned to the Persona.

To use the Persona we just created, we do need to provide a Persona ID. This is the ID returned by the Create Persona API call we made above. Next, we provide a conversation name, which lets you find and reuse the conversation later.

We also should provide some conversational context. This is different from the context in the Persona. This is a more specific context not about the Persona itself, but about the upcoming conversation. Ideally, you would provide information about the person joining the call. Next up is the properties section. We use the properties to define a max call length and how long to wait to end the room if the participant disconnects.

conversation_url = "https://tavusapi.com/v2/conversations"

payload = {
    "replica_id": "r79e1c033f",
    "persona_id": "p0b46deb",
    "conversation_name": "Career Coach Test",
    # Context about this specific conversation, not about the Persona
    "conversational_context": "You are about to talk to a senior software engineer who just got promoted and is looking to learn about the role and how to excel in it.",
    "properties": {
        "max_call_duration": 3600,
        "participant_left_timeout": 60,
    },
}
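
Because the Replica ID is optional, the conversation payload is a natural fit for a small builder. The sketch below uses a hypothetical function, `build_conversation_payload`, and assumes that leaving the `replica_id` key out of the payload is how you fall back to the Persona's default Replica. The default values simply mirror the numbers used in this article.

```python
def build_conversation_payload(persona_id, name, conversational_context,
                               replica_id=None, max_call_duration=3600,
                               participant_left_timeout=60):
    """Build a Create Conversation payload, omitting `replica_id` when unset.

    Hypothetical helper: omitting the key to use the Persona's default
    Replica is an assumption based on the description in this article.
    """
    payload = {
        "persona_id": persona_id,
        "conversation_name": name,
        "conversational_context": conversational_context,
        "properties": {
            "max_call_duration": max_call_duration,
            "participant_left_timeout": participant_left_timeout,
        },
    }
    if replica_id is not None:
        payload["replica_id"] = replica_id
    return payload
```

Calling it without `replica_id` produces a four-key payload, and passing one adds the fifth key.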


Just like before, once we create a payload, we send a POST request to the URL.

response = requests.request("POST", conversation_url, json=payload, headers=headers)
print(response.text)


The response should look something like this:

{
    "conversation_id": "c11e0b3a",
    "conversation_name": "Career Coach Test",
    "conversation_url": "https://tavus.daily.co/c11e0b3a",
    "status": "active",
    "callback_url": null,
    "created_at": "2024-09-18T20:15:08.875Z"
}

All you need to do to join the room and chat with your custom Persona is click the link provided in `conversation_url`.
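
If you would rather pull the link out in code than copy it by hand, something like the following sketch could work. The helper name `conversation_link` is hypothetical, and the check that the link uses `https` is an added assumption rather than a documented Tavus guarantee.

```python
import json
from urllib.parse import urlparse


def conversation_link(response_text):
    """Return `conversation_url` from a Create Conversation response body.

    Hypothetical helper: rejecting non-https links is an extra safety
    assumption, not something the API is known to require.
    """
    body = json.loads(response_text)
    link = body.get("conversation_url")
    if not link or urlparse(link).scheme != "https":
        raise ValueError(f"no usable conversation_url in response: {body}")
    return link
```

From there, the standard library call `webbrowser.open(conversation_link(response.text))` would open the room in your default browser.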

Summary of Using Llama 3 8B as a Custom LLM for a Persona

In this article, we looked at how to use Llama 3 8B as a custom LLM for a Persona in Tavus’ Conversational Video Interface. One of Tavus’ features is its modular build: each conversation can be customized using different Personas and Replicas. On top of that, Personas can be easily customized using different system prompts, contexts, and components such as LLMs.

Tavus Personas have a section called “layers” which allows you to bring custom LLMs or text-to-speech engines as well as turn vision question-answering on and off. To bring a custom LLM, all you need is the name of the model you want to use, a base URL for the API endpoint, and an API key from the provider. In this example, we used Llama 3 8B from OctoAI.
