Enhancing Large Language Models with Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is emerging as a crucial technique for large language models (LLMs), which are becoming central to many organizations as they adopt artificial intelligence. While LLMs are popular for good reason, using them without safeguards can lead to significant drawbacks, including unexpected responses, fabricated information, and bias. The tendency to produce plausible-sounding but false or unsupported content is termed "hallucination," and it can occur for a variety of reasons.
To counteract LLM hallucinations, several strategies have been developed, including fine-tuning, prompt engineering, and notably, Retrieval Augmented Generation (RAG). RAG has garnered attention for its effectiveness in addressing the misinformation generated by large language models.
In this article, we will explore how RAG works through a practical implementation that uses SingleStore as the vector database.
What is Retrieval Augmented Generation (RAG)?
LLMs sometimes generate hallucinated outputs, and RAG serves as one of the methods to mitigate this issue. In response to a user query, RAG retrieves relevant information from a pre-defined source or dataset, which is stored in a vector database. Unlike traditional databases, a vector database is specifically designed for storing vector data.
Vector data is represented as embeddings: numerical vectors that capture the context and meaning of an object. For instance, if you want customized responses from an AI application, your organization’s documents can be transformed into embeddings using an embedding model and stored in a vector database. When a user issues a query, it is converted into a query embedding with the same model, and a vector similarity search finds the most similar stored objects. Because the LLM is then instructed to answer using this retrieved custom data, the application is far less likely to hallucinate.
A practical application could be in customer support, where specific data relevant to products or services is stored in a vector database. When a user inquiry is made, the application can generate an appropriate response rather than a generic one. Thus, RAG is transforming various domains.
RAG Pipeline
The RAG pipeline consists of three essential components: Retrieval, Augmentation, and Generation.
- Retrieval: This component is responsible for sourcing relevant information from an external knowledge base, such as a vector database, for any given user query. This step is critical for curating meaningful and contextually accurate responses.
- Augmentation: This stage folds the retrieved information into the prompt, giving the model the relevant context it needs to address the user query.
- Generation: Finally, the LLM produces the final response, combining its prior knowledge with the provided context to formulate an appropriate answer.
These three components form the backbone of the RAG pipeline, ensuring that users receive contextually rich and accurate information. This is why RAG is particularly valuable in developing chatbots, question-answering systems, and similar applications.
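To make these stages concrete, here is a minimal sketch of how they fit together. It assumes a vector_store object with a LangChain-style similarity_search method and an llm client with an invoke method; the concrete versions of both are built in the tutorial below.

# Minimal RAG sketch: `vector_store` and `llm` are placeholders for the
# vector database and LLM client built later in this tutorial.
def answer_with_rag(query, vector_store, llm, k=3):
    # Retrieval: fetch the k most similar chunks for the query
    retrieved_docs = vector_store.similarity_search(query, k=k)

    # Augmentation: fold the retrieved text into the prompt as context
    context = "\n\n".join(doc.page_content for doc in retrieved_docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

    # Generation: the LLM produces the final answer from the augmented prompt
    return llm.invoke(prompt)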
RAG Tutorial
Let's construct a straightforward AI application that can retrieve contextually relevant information from our own dataset for any user query.
Begin by signing up for a SingleStore database to employ it as our vector database. After registration, create a workspace, which is a simple and free process.
Once your workspace is established, create a database and name it as desired.
From the interface, you can create the database via the 'Create Database' tab.
Next, navigate to ‘Develop’ to access the Notebooks feature, akin to Jupyter Notebooks.
Create a new Notebook and give it a name of your choice.
Before proceeding, select your workspace and database from the dropdown menu in the Notebook.
Now, start adding the following code snippets into your newly created Notebook.
Install the Required Libraries
!pip install openai numpy pandas singlestoredb langchain==0.1.8 langchain-community==0.0.21 langchain-core==0.1.25 langchain-openai==0.0.6
Vector Embeddings Example
def word_to_vector(word):
    # Define some basic rules for our vector components
    vector = [0] * 5  # Initialize a vector of 5 dimensions

    # Rule 1: Length of the word (normalized to a max of 10 characters for simplicity)
    vector[0] = len(word) / 10

    # Rule 2: Number of vowels in the word (normalized to the length of the word)
    vowels = 'aeiou'
    vector[1] = sum(1 for char in word if char in vowels) / len(word)

    # Rule 3: Whether the word starts with a vowel (1) or not (0)
    vector[2] = 1 if word[0] in vowels else 0

    # Rule 4: Whether the word ends with a vowel (1) or not (0)
    vector[3] = 1 if word[-1] in vowels else 0

    # Rule 5: Percentage of consonants in the word
    vector[4] = sum(1 for char in word if char not in vowels and char.isalpha()) / len(word)

    return vector
# Example usage
word = "example"
vector = word_to_vector(word)
print(f"Word: {word}\nVector: {vector}")
Vector Similarity Example
import numpy as np

def cosine_similarity(vector_a, vector_b):
    # Calculate the dot product of vectors
    dot_product = np.dot(vector_a, vector_b)

    # Calculate the norm (magnitude) of each vector
    norm_a = np.linalg.norm(vector_a)
    norm_b = np.linalg.norm(vector_b)

    # Calculate cosine similarity
    similarity = dot_product / (norm_a * norm_b)
    return similarity
# Example usage
word1 = "example"
word2 = "sample"
vector1 = word_to_vector(word1)
vector2 = word_to_vector(word2)

# Calculate and print cosine similarity
similarity_score = cosine_similarity(vector1, vector2)
print(f"Cosine similarity between '{word1}' and '{word2}': {similarity_score}")
Embedding Models
OPENAI_KEY = "INSERT OPENAI KEY"

from openai import OpenAI
client = OpenAI(api_key=OPENAI_KEY)

def openAIEmbeddings(input):
    response = client.embeddings.create(
        input=input,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding
print(openAIEmbeddings("Golden Retriever"))
Creating a Vector Database with SingleStoreDB
We will use the LangChain framework with SingleStore as the vector database to store our embeddings, along with a public .txt file containing the Sherlock Holmes story A Study in Scarlet.
Add the OpenAI API key as an environment variable:

import os
os.environ['OPENAI_API_KEY'] = 'mention your openai api key'
Next, import the necessary libraries, specify the file to use in the example, load the file, split it, and insert the content into the SingleStore database. Finally, pose a query related to the document used.

import openai
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores.singlestoredb import SingleStoreDB
import os
import pandas as pd
import requests
# URL of the public .txt file you want to use
file_url = "https://sherlock-holm.es/stories/plain-text/stud.txt"

# Send a GET request to the file URL
response = requests.get(file_url)

# Proceed if the file was successfully downloaded
if response.status_code == 200:
    file_content = response.text

    # Save the content to a file
    file_path = 'downloaded_example.txt'
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write(file_content)

    # Now you can proceed using 'downloaded_example.txt'
    # Load and process documents
    loader = TextLoader(file_path)  # Use the downloaded document
    documents = loader.load()

    text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
    docs = text_splitter.split_documents(documents)

    # Generate embeddings and create a document search database
    OPENAI_KEY = "add your openai key"  # Replace with your OpenAI API key
    embeddings = OpenAIEmbeddings(api_key=OPENAI_KEY)

    # Create the vector database (the table name "scarlet" is arbitrary; use any name you like)
    vector_database = SingleStoreDB.from_documents(docs, embeddings, table_name="scarlet")

    # Run a similarity search against the stored embeddings
    query = "which university did he study?"
    docs = vector_database.similarity_search(query)
    print(docs[0].page_content)
else:
    print("Failed to download the file. Please check the URL and try again.")
After executing the above code, the similarity search prints the passage from the story that is most relevant to the query. You can edit the query string in the Notebook to ask other questions about the referenced Sherlock Holmes story.
We successfully retrieved pertinent information from the provided dataset, which guided the response generation process. By converting our file into embeddings and storing them in the SingleStore database, we established a retrievable corpus of information. This ensures that responses are not only relevant but also content-rich, derived from the provided dataset.
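The steps above cover retrieval; to complete the generation stage, you can pass the retrieved passages to an LLM as context. Below is a minimal sketch that reuses the vector_database object created earlier and the langchain-openai package installed at the start; the model choice and prompt wording are illustrative assumptions, not part of the original setup.

from langchain_openai import ChatOpenAI

# Retrieve the most relevant chunks for the question
question = "which university did he study?"
retrieved = vector_database.similarity_search(question, k=3)
context = "\n\n".join(doc.page_content for doc in retrieved)

# Augment the prompt with the retrieved context and generate an answer
llm = ChatOpenAI(model="gpt-3.5-turbo", api_key=OPENAI_KEY)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
answer = llm.invoke(prompt)
print(answer.content)

Grounding the prompt in the retrieved passages is what keeps the answer tied to the dataset rather than the model's general knowledge.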