Retrieval augmentation for GPT-4o
May 28, 2024

In this part you'll learn how to retrieve contexts relevant to your queries from Pinecone and pass them to a GPT-4o model to generate an answer backed by real data sources.
As an example, I'll store the transcripts of some of my videos in Pinecone. This will allow GPT to answer questions based on the detailed information shared in these videos, ensuring responses are accurate and contextually relevant.
I'll walk you through setting up your Pinecone account, creating an index, and integrating it with a Python script using the LangChain library. Let's get started!
What is Pinecone?
Pinecone is a high-performance vector database that's perfect for fast and scalable similarity searches. It's especially useful for applications that need to efficiently handle and query high-dimensional vectors, like the ones generated by machine learning models for tasks such as text and image embeddings.
Create an Account
First things first, head over to pinecone.io and create an account. Once you're registered, you can go ahead and create a new index.
Create an Index
- Name: Pick a name for your index.
- Dimension: Set the dimension to 1536. This matches the dimensionality of embeddings produced by OpenAI's embedding models, such as text-embedding-ada-002, which represent the semantic meaning of text as 1536-dimensional vectors.
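If you prefer to script this step, you can also create the index with the Pinecone Python client instead of the web UI. The snippet below is a minimal sketch that assumes a serverless index on AWS and the pinecone package (v3+); the index name, cloud, and region are placeholders you should adapt.

import os
from pinecone import Pinecone, ServerlessSpec

# Sketch: create the index programmatically (assumes PINECONE_API_KEY is set).
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
if "transcripts" not in pc.list_indexes().names():
    pc.create_index(
        name="transcripts",   # pick your own index name
        dimension=1536,       # must match the embedding model's dimensionality
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # adjust to your project
    )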
API Key
Next, navigate to the API Keys section and copy your key. You'll need this in your Python script to interact with Pinecone.
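The LangChain integrations used below typically read their credentials from environment variables. As a small sketch (PINECONE_API_KEY and OPENAI_API_KEY are the variable names these libraries conventionally look for), you can either export them in your shell or set them at the top of your scripts:

import os

# Replace the placeholders with your actual keys, or export these
# variables in your shell instead of hard-coding them.
os.environ["PINECONE_API_KEY"] = "your-pinecone-api-key"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"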
Let's start coding ...
Next, let's create a Python script to store document chunks as embeddings in Pinecone using the LangChain library.
LangChain is a handy library that makes it easier to work with large language models and vector databases. It offers tools for loading documents, splitting text, generating embeddings, and interacting with vector stores like Pinecone.
Overview of the Process
Here's an overview of the process we'll follow:
- Document Loading: We'll start by loading documents from local txt files containing the transcripts of the last AI for Devs videos.
- Splitting: The documents are then split into smaller chunks to make processing more manageable.
- Storage: These chunks are stored in a vector store, which is a specialized database optimized for handling vector-based data.
- Retrieval: When you have a query, the system retrieves the most relevant chunks from the vector store.
- Output: Finally, these relevant chunks are used to create a prompt for a large language model (LLM) like GPT-4o, which generates the answer.
Storing Data
Let's start with our first script, which stores the content of a large text file in Pinecone. Create a new Python file called store.py and begin with a few imports.
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_pinecone import PineconeVectorStore
Before running the script, install the required libraries:
pip install langchain_community langchain_openai langchain_pinecone langchain_text_splitters openai
These modules let us load documents, split text, create embeddings, and interface with the Pinecone vector database:
- TextLoader: Loads documents from a specified source.
- OpenAIEmbeddings: Uses OpenAI's models to generate embeddings.
- CharacterTextSplitter: Splits text documents into smaller chunks based on character count.
- PineconeVectorStore: Manages storing and retrieving vector data in Pinecone.
Now that we have our modules ready, let's move on to loading and splitting our text documents. We'll start by loading a document, then splitting it into smaller chunks for easier processing.
loader = TextLoader("transcripts.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0, separator="\n")
docs = text_splitter.split_documents(documents)
First, we load the transcripts from transcripts.txt using TextLoader. Then we split the text into chunks of about 1000 characters each with no overlap, using new lines as the separator via CharacterTextSplitter. Finally, we store the resulting chunks in the docs variable.
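To sanity-check the splitting step, you can print how many chunks were produced and peek at the first one; this is just a quick debugging sketch, not part of the final script:

# Quick check: number of chunks and a preview of the first one.
print(f"Number of chunks: {len(docs)}")
print(docs[0].page_content[:200])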
Now that we've split the complete text into nice little chunks, it's time to convert them into embeddings and store those embeddings in Pinecone.
embeddings = OpenAIEmbeddings()
index_name = "transcripts"
docsearch = PineconeVectorStore.from_documents(docs, embeddings, index_name=index_name)
Here's what this code does:
- embeddings: Initializes an OpenAIEmbeddings object that uses OpenAI's model to convert text to embeddings. These embeddings are vector representations that capture the semantic meaning of the text.
- index_name: Sets the name of the Pinecone index where the document vectors will be stored.
- docsearch: Uses the from_documents class method of PineconeVectorStore. This method takes the list of text documents (docs), converts them to embeddings using the embeddings object, and stores them in the specified Pinecone index.
After running the script, you can head over to the Pinecone website and check out your index. You'll see that it's been filled with the document embeddings.
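If you'd rather verify this from code than from the web console, the Pinecone client can report the index statistics. A small sketch, assuming the pinecone package and your API key are configured as above:

import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
# Shows, among other things, the total number of vectors stored in the index.
print(pc.Index("transcripts").describe_index_stats())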
Querying and Responding with Pinecone and OpenAI
Now that we've set up our vector store, let's build on this to generate a response using OpenAI's GPT-4o. This part of the process involves retrieving the relevant documents, augmenting the query with context, and getting a response from the AI model.
First, we need to import the necessary libraries and set up our environment:
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
from openai import OpenAI
These imports bring in the modules needed to manage the vector store, generate embeddings, and interact with OpenAI's API.
Retrieving the Documents
We'll start by configuring our vector store to use the previously created index and generate embeddings for our query:
index_name = "transcripts"
embeddings = OpenAIEmbeddings()
vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings)
query = "What can be done with gpt and a fridge?"
docs = vectorstore.similarity_search(query, k=3)
Here's what this code does:
- embeddings: Initializes an OpenAIEmbeddings object to generate vector embeddings from text using OpenAI's models.
- vectorstore: Sets up the PineconeVectorStore with the index name and embeddings object, establishing the connection to Pinecone.
- query: Defines the text query for which we want to find semantically similar documents in our Pinecone index.
- docs: Calls the similarity_search method on the vectorstore object, using the query to find the top matching documents (k=3 specifies that we want the three most similar documents).
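If you also want to see how strongly each chunk matches the query, the vector store exposes a scored variant of the same search. A brief sketch:

# Retrieve the three best matches together with their similarity scores.
results = vectorstore.similarity_search_with_score(query, k=3)
for doc, score in results:
    print(f"{score:.3f}  {doc.page_content[:80]}...")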
Building the Augmented Query
Next, we need to extract the content from the retrieved documents and build an augmented query to provide more context to our AI model:
contexts = [item.page_content for item in docs]
augmented_query = "\n\n---\n\n".join(contexts)+"\n\n-----\n\n"+query
Here's what this code does:
- contexts: Extracts the content from the retrieved documents.
- augmented_query: Combines the context from the documents with the original query, using delimiters for clarity. This helps the AI model understand the context of the query better.
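To get a feel for what the model will actually receive, you can print a preview of the combined prompt; again, just a quick debugging sketch:

# Preview the beginning of the augmented prompt and its total length.
print(f"Prompt length: {len(augmented_query)} characters")
print(augmented_query[:500])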
Generating a Response
Finally, we send the augmented query to OpenAI's GPT-4o model to generate a response:
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": """
            You are a Q&A bot.
            A highly intelligent system that answers user questions
            based on the information provided by the user above each question.
            If the information cannot be found in the information provided
            by the user, you truthfully say "I don't know".
            """
        },
        {
            "role": "user",
            "content": augmented_query
        },
    ],
)

print(response.choices[0].message.content)
Here's what this code does:
- client: Initializes the OpenAI client to interact with OpenAI's API.
- response: Sends the augmented query to the GPT-4o model. The conversation context is provided in the messages parameter, with the system role containing the Q&A bot instructions and the user role containing the augmented query.
- print(response.choices[0].message.content): Outputs the model's response to the console.
When you run this script, the query is converted into a vector using OpenAI embeddings, and Pinecone performs a similarity search to find the most relevant documents in the index. The content of the retrieved documents is combined with the query and sent to the GPT-4o model to generate a response.
The result is then printed out, allowing you to see the AI's answer based on the contextual information from the indexed documents.
"You can use GPT, specifically a multimodal model like GPT-4V which can handle images, to create recipes from a simple image of your fridge. By analyzing the image to identify the items inside, GPT-4V can generate a menu or suggest recipes based on those ingredients."