Create your own Notion Chatbot with LangChain, OpenAI and Streamlit
Introduction
Hey! My name is Logan Vendrix and I’m a Data Scientist at Arinti, an AI company based in Belgium. Lately, I have been focusing on LLM projects, and it’s been fascinating. I have learned a lot thanks to people sharing their knowledge online, and now I want to give back by teaching you how to build your own Notion chatbot using LangChain, OpenAI, FAISS, and Streamlit!
All project files are available on my GitHub!
Why a Notion Chatbot?
Problem
At my company, we have all our company content in Notion. As we have a lot of documentation, it can be hard to find the right information quickly. So I built a Notion chatbot to make it easier to find information, using the latest AI technologies.
Solution
First, we use LangChain to load and split the Notion content, which we then convert to vectors using OpenAI embeddings, and store them in a vector database (FAISS in this case). Using LangChain, we build a Conversational Retrieval Chain to link our vector database and OpenAI GPT to answer the user’s question. The answer is based on the most relevant information found in our Notion content. We improve the chatbot by creating a custom system prompt and adding memory to it. Finally, we create a chat interface with Streamlit and embed the app directly in our Notion!
Roadmap
Step 1: Project structure and initialization
We look at the project structure and install the required dependencies. We also retrieve our OpenAI API key and duplicate a public Notion page that will serve as the base for this project.
Step 2: Document Ingestion
We convert all the content from our Notion pages into numerical representations (vectors). Because LLMs like GPT can only process a limited amount of text at once (their context window), we first need to split the content into smaller text chunks. We’ll use LangChain for that. Once split, we use OpenAI’s embedding model to convert the chunks into vectors. Finally, we store the vectors in a vector database.
Step 3: Query
The user enters a question, which we convert into a vector using the same embedding model. We use this vector to look for similar vectors in the vector database created earlier. The relevant text content is finally passed along with the user’s question to OpenAI GPT to create an answer.
For a better chat experience, we add memory to our chatbot by storing previous messages in a chat history. During the conversation, the chatbot has access to it.
Step 4. Chatbot application
We create the chatbot interface using Streamlit. With it, we can easily create a beautiful chat application that we can then deploy online. Finally, we embed the app directly into our Notion page.
This tutorial is inspired by one from Harrison Chase (LangChain’s founder) on how to chat with Notion content using LangChain. We improve on his implementation by:
- Using specific markdown characters for better content splitting
- Adding memory to the bot
- Using Streamlit with its new chat features as our front-end chat application
- Embedding the Streamlit chat application into a Notion page
Tutorial
1. Project structure and initialization
1.1 Project structure
The structure of the project notion-chatbot consists of:
- .streamlit/secrets.toml ⇒ to store your OpenAI API key
- faiss_index ⇒ FAISS index (vector database) where all the vectors are stored
- notion_content ⇒ folder containing our Notion content, in markdown files
- .gitignore ⇒ to ignore your OpenAI API key and Notion content
- app.py ⇒ script of the Streamlit chat application
- ingest.py ⇒ script to convert Notion content to vectors and store them in a vector index
- utils.py ⇒ script to create the Conversational Retrieval Chain
- requirements.txt ⇒ necessary packages to deploy to Streamlit Community Cloud
We’ll create each file step-by-step during this tutorial, so no need to create them all at once.
Now, let’s initialize the project!
1.2 Project initialization
- Start by creating a project folder notion-chatbot
- Create a new environment and install the required dependencies:
pip install streamlit langchain openai tiktoken faiss-cpu
- Create the .gitignore file to specify which files not to track:
# .gitignore
notion_content/
.streamlit/
- Head to OpenAI’s website to get your API key
- Create the folder .streamlit
- In .streamlit, create the file secrets.toml to store your OpenAI API key as follows:
# secrets.toml
OPENAI_API_KEY = 'sk-A1B2C3D4E5F6G7H8I9J'
- For this tutorial, we’ll use the Blendle Employee Handbook as our knowledge base
- No Notion account? Head to their website and create an account. It’s free!
- Select Duplicate in the top-right corner to duplicate it into your own Notion
2. Document ingestion
2.1 Export your Notion content
- Head to the main Notion page of the Blendle Employee Handbook
- In the top-right corner, click on the 3 dots
- Choose Export
- Select Markdown and CSV as the export format
- Select Include subpages
- Save your file as notion_content.zip
- Unzip your folder
- Place the content folder notion_content into your project folder notion-chatbot
It’s also possible to use Notion’s API to get your Notion content. To keep this tutorial simple, we export the content manually.
Great! You should now have all the Notion content as .md files in the folder notion_content, within your project folder notion-chatbot.
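If you want a quick sanity check that the export ended up where the ingestion script will look for it, a small optional snippet like the following lists the exported markdown files (check_export.py is a hypothetical helper, not one of the project files):
# check_export.py (optional, hypothetical helper)
from pathlib import Path

# Count and list the markdown files exported from Notion
md_files = list(Path("notion_content").rglob("*.md"))
print(f"Found {len(md_files)} markdown files")
for path in md_files[:5]:
    print(path)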
2.2 Convert Notion content to vectors
To use the content of our Notion pages as the knowledge base of our chatbot, we need to convert all the content into vectors and store them.
For this, we’ll use LangChain, the OpenAI embedding model, and FAISS.
Open your project folder in your favorite IDE and create the file ingest.py
# ingest.py
import streamlit as st
import openai
from langchain.document_loaders import NotionDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
# Load OpenAI API key
openai.api_key = st.secrets["OPENAI_API_KEY"]
# Load the Notion content located in the folder 'notion_content'
loader = NotionDirectoryLoader("notion_content")
documents = loader.load()
# Split the Notion content into smaller chunks
markdown_splitter = RecursiveCharacterTextSplitter(
    separators=["#", "##", "###", "\n\n", "\n", "."],
    chunk_size=1500,
    chunk_overlap=100)
docs = markdown_splitter.split_documents(documents)
# Initialize OpenAI embedding model
embeddings = OpenAIEmbeddings()
# Convert all chunks into vector embeddings using the OpenAI embedding model
# Store all vectors in FAISS index and save to local folder 'faiss_index'
db = FAISS.from_documents(docs, embeddings)
db.save_local("faiss_index")
print('Local FAISS index has been successfully saved.')
Let’s go over the code:
- We load the OpenAI API key stored in .streamlit/secrets.toml
- We load the Notion content located in the notion_content folder using NotionDirectoryLoader
- We split the content into smaller text chunks using RecursiveCharacterTextSplitter. There are different ways to split text. As our Notion content consists of markdown files with headings (# for H1, ## for H2, ### for H3), we choose to split on those specific characters. This ensures that we split the content at the best place, between paragraphs, and not between sentences of the same paragraph. If the split can’t be done on headings, it tries the characters '\n\n', '\n', '.' that separate paragraphs and sentences. RecursiveCharacterTextSplitter follows the order of the separator list you provide, moving on to the next separator until the chunks are small enough. We choose a chunk size of 1500 and an overlap of 100. Experiment with different values to see what works best for your project; the short experiment after this list gives a feel for the splitter’s behavior.
- We convert each text chunk into a vector using the OpenAI embedding model
- We store all the vectors in a FAISS index
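To get a feel for how RecursiveCharacterTextSplitter behaves before ingesting everything, here is a small, self-contained experiment; the sample text and the tiny chunk size are made up purely for illustration:
# splitter_demo.py (optional experiment)
from langchain.text_splitter import RecursiveCharacterTextSplitter

# A made-up markdown sample with H1 and H2 headings
sample = (
    "# Vacation policy\n\n"
    "You can take time off whenever you need it, as long as your work gets done.\n\n"
    "## Requesting time off\n\n"
    "Let your team lead know at least two weeks in advance."
)

# Deliberately tiny chunk size so the splitting is visible on a short text
splitter = RecursiveCharacterTextSplitter(
    separators=["#", "##", "###", "\n\n", "\n", "."],
    chunk_size=80,
    chunk_overlap=10,
)

for i, chunk in enumerate(splitter.split_text(sample)):
    print(f"--- chunk {i} ---")
    print(chunk)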
Run python ingest.py and you should see the confirmation message once the index has been saved. Nice, that was easy! Now that we have converted our Notion content to vectors and stored them in a vector database, let’s see how we can interact with them!
3. Query
3.1 Flow
- A chat history is initially created. Whenever the user asks a question or the chatbot gives an answer, we store these messages in the chat history. As the conversation grows, the chatbot keeps track of the previous messages. This is the memory of our chatbot!
- The user writes a question. The question is immediately stored in the chat history.
- We combine the question and chat history into a stand-alone question.
- We convert the stand-alone question into a vector. We use the same embedding model we used in the Document Ingestion phase.
- We pass the vector to the vector database and perform a similarity search (aka vector search). To explain it simply: given a set of stored vectors and a query vector (the user’s question), we look for the stored vectors that are closest to the query. A nearest neighbor search with k=3, for instance, returns the 3 vectors closest to the query vector. (You can try this yourself with the optional retrieval check shown below.)
- With the most similar vectors found, we give the Notion content linked to these vectors with the stand-alone question to GPT.
- GPT formulates an answer following the guidelines of a system prompt.
- The chatbot replies to the user with the answer from GPT.
- We pass the chatbot’s answer to the chat history.
- Repeat!
Makes sense? Nice! Now, let’s code this!
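Before wiring everything into a chain, you can also sanity-check the retrieval step on its own. The snippet below is a hypothetical helper, not one of the project files; run it from the project folder, assuming faiss_index and .streamlit/secrets.toml already exist:
# retrieval_check.py (optional, hypothetical helper)
import streamlit as st
import openai
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Load the OpenAI API key the same way as in ingest.py
openai.api_key = st.secrets["OPENAI_API_KEY"]

# Load the saved index with the same embedding model used during ingestion
embeddings = OpenAIEmbeddings()
db = FAISS.load_local("faiss_index", embeddings)

# Retrieve the 3 chunks most similar to an example question
for doc in db.similarity_search("How does Blendle handle vacation days?", k=3):
    print(doc.metadata.get("source"))
    print(doc.page_content[:200], "\n---")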
3.2 Query
First, we’ll create a LangChain Conversational Retrieval Chain, which will be the brain of our application. In the file utils.py, we define the function load_chain() that returns a Conversational Retrieval Chain.
# utils.py
import streamlit as st
import openai
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory
from langchain.chat_models import ChatOpenAI
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import SystemMessagePromptTemplate
openai.api_key = st.secrets["OPENAI_API_KEY"]
@st.cache_resource
def load_chain():
    """
    The `load_chain()` function initializes and configures a conversational retrieval chain for
    answering user questions.
    :return: The `load_chain()` function returns a ConversationalRetrievalChain object.
    """
    # Load OpenAI embedding model
    embeddings = OpenAIEmbeddings()
    # Load OpenAI chat model
    llm = ChatOpenAI(temperature=0)
    # Load our local FAISS index as a retriever
    vector_store = FAISS.load_local("faiss_index", embeddings)
    retriever = vector_store.as_retriever(search_kwargs={"k": 3})
    # Create memory 'chat_history'
    memory = ConversationBufferWindowMemory(k=3, memory_key="chat_history")
    # Create system prompt
    template = """
You are an AI assistant for answering questions about the Blendle Employee Handbook.
You are given the following extracted parts of a long document and a question. Provide a conversational answer.
If you don't know the answer, just say 'Sorry, I don't know... 😔'.
Don't try to make up an answer.
If the question is not about the Blendle Employee Handbook, politely inform them that you are tuned to only answer questions about the Blendle Employee Handbook.
{context}
Question: {question}
Helpful Answer:"""
    # Create the Conversational Retrieval Chain
    chain = ConversationalRetrievalChain.from_llm(llm=llm,
                                                  retriever=retriever,
                                                  memory=memory,
                                                  get_chat_history=lambda h: h,
                                                  verbose=True)
    # Add the system prompt to the chain
    # Can only add it at the end for ConversationalRetrievalChain
    QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)
    chain.combine_docs_chain.llm_chain.prompt.messages[0] = SystemMessagePromptTemplate(prompt=QA_CHAIN_PROMPT)
    return chain
Let’s go over the code:
- We load the OpenAI API key stored in .streamlit/secrets.toml
- We create the function load_chain() that returns a ConversationalRetrievalChain object. We use st.cache_resource to make our application more efficient: load_chain() only runs once, at the very beginning, and its result is stored in a local cache. On subsequent reruns, Streamlit skips the execution and reuses the cached chain. Very convenient!
- We load the OpenAI embedding model that will convert the user’s queries into vectors.
- We load the OpenAI chat model that will generate the answers. To do so, it uses the stand-alone question (combining the user’s question and chat history) and the relevant documents. We specify a temperature of 0, meaning that the model will always select the highest-probability word. A higher temperature means that the model might select a word with a slightly lower probability, leading to more variation, randomness, and creativity. Play with it to see what works best for you.
- We load our local FAISS index as a retriever, meaning that the chain will use it to search for relevant information. We define k=3, meaning that we’ll look for the 3 most relevant documents in the vector database.
- We create the memory of our chatbot using ConversationBufferWindowMemory. We define k=3, meaning the chatbot will look at the last 3 interactions when creating the stand-alone question. This keeps a sliding window of the most recent interactions, so the buffer does not get too large.
- We create the system prompt, which acts as the guidelines for our chatbot. We specify how the chatbot should behave and what it should do when it cannot find an answer or when the user’s question is not within its scope.
- We finally create the chain using ConversationalRetrievalChain, linking all the previous elements together. We set verbose=True to see what’s happening under the hood when running the chain. This makes it easier to see what information the chatbot uses to answer the user’s questions.
- We add the system prompt to the chain. Currently, it appears that we can only add it after defining the chain when using ConversationalRetrievalChain.from_llm.
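If you’d like to try the chain before building the UI, a minimal, hypothetical test script could look like this (run it from the project folder so that faiss_index and .streamlit/secrets.toml are found):
# test_chain.py (optional, hypothetical)
from utils import load_chain

chain = load_chain()

# Ask two related questions; the second relies on the chat memory
print(chain({"question": "What does the handbook say about vacation days?"})["answer"])
print(chain({"question": "And who should I notify about them?"})["answer"])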
4. Chatbot application
4.1 Streamlit application
Now that the brain of our chatbot is built, let’s put it all in a Streamlit application!
# app.py
import time
import streamlit as st
from utils import load_chain
# Custom image for the app icon and the assistant's avatar
company_logo = 'https://www.app.nl/wp-content/uploads/2019/01/Blendle.png'
# Configure Streamlit page
st.set_page_config(
    page_title="Your Notion Chatbot",
    page_icon=company_logo
)

# Initialize LLM chain in session_state
if 'chain' not in st.session_state:
    st.session_state['chain'] = load_chain()

# Initialize chat history
if 'messages' not in st.session_state:
    # Start with first message from assistant
    st.session_state['messages'] = [{"role": "assistant",
                                     "content": "Hi human! I am Blendle's smart AI. How can I help you today?"}]

# Display chat messages from history on app rerun
# Custom avatar for the assistant, default avatar for user
for message in st.session_state.messages:
    if message["role"] == 'assistant':
        with st.chat_message(message["role"], avatar=company_logo):
            st.markdown(message["content"])
    else:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

# Chat logic
if query := st.chat_input("Ask me anything"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": query})
    # Display user message in chat message container
    with st.chat_message("user"):
        st.markdown(query)

    with st.chat_message("assistant", avatar=company_logo):
        message_placeholder = st.empty()
        # Send user's question to our chain
        result = st.session_state['chain']({"question": query})
        response = result['answer']
        full_response = ""
        # Simulate stream of response with milliseconds delay
        for chunk in response.split():
            full_response += chunk + " "
            time.sleep(0.05)
            # Add a blinking cursor to simulate typing
            message_placeholder.markdown(full_response + "▌")
        message_placeholder.markdown(full_response)
    # Add assistant message to chat history
    st.session_state.messages.append({"role": "assistant", "content": response})
Let’s go over the code:
- From utils.py, we import load_chain(), which loads the Conversational Retrieval Chain we created earlier.
- We load an image from a URL, which we use as our app’s page icon as well as our assistant’s avatar in the chat application.
- We initialize our chain in the session state.
- We initialize the chat history in the session state. We also include a first message from the assistant, welcoming the user.
- We display all the messages of the chat history. We specify a custom avatar for the assistant, while the user’s avatar is the default one.
- We create the chat logic. It:
  - receives the query from the user and stores it in the chat history
  - displays the user’s query in the chat
  - passes the user’s query to our chain via st.session_state['chain']({"question": query})
  - gets a response back
  - displays the response in the chat, simulating a human-speed response by slowing down the typing speed
  - stores the response in the chat history
4.2 Deployment on Streamlit Cloud
You can test the app locally by running streamlit run app.py from your project folder. When everything works and you are happy with your chatbot, it’s time to go live! To deploy it on Streamlit Community Cloud:
- Create a requirements.txt file to store the required dependencies:
# requirements.txt
openai
langchain
faiss-cpu
tiktoken
- Deploy the app and click on Advanced settings. There, specify your Python version and your OpenAI API key (same as in your local secrets.toml file)
4.3 Embed your Streamlit app in Notion
- Once your app has been successfully deployed, copy the URL of your app
- Head to your Notion page
- Embed your app by selecting Embed in the block options
- Paste your app’s URL and click on Embed link
- Voilà! Have fun interacting with your content using your new Notion chatbot!
Congratulations!
In this tutorial, you have learned to:
- Convert your Notion content to vectors using the OpenAI embedding model and store them in a FAISS index
- Build a Conversational Retrieval Chain using LangChain, with a custom prompt and memory
- Build and deploy a Streamlit chat application using its latest chat features
- Embed a Streamlit chat application into your Notion page
If you have any questions, please post them in the comments below. And if you want to learn more about AI and LLMs, let’s connect on LinkedIn!