Putting Gen AI to good use: building an internal HR chatbot

Our HR chatbot in its natural habitat
Onboarding everywhere, at once

When I joined Arinti, I received a lot of onboarding emails. Because the information was scattered all over the place, it was hard to find the relevant piece when I needed it, and I often had to ask my manager instead. You can tell that this process was rather inefficient.

With the rise of Large Language Models (LLMs) like ChatGPT, it seemed like the perfect opportunity to try out these new technologies.

The goal was simple: build a chatbot that answers any of our employees’ questions efficiently, and replace our current onboarding process.

(This contribution is made by Logan Vendrix, one of our amazing in-house data scientists, who’s on top of everything Large Language Model related!)

TL;DR

  • Our onboarding process leaves room for questions, which we want answered for everyone, whenever they have them.
  • So, we developed a smart chatbot to interact with our internal onboarding content, which is stored in various Notion pages on our tenant.
  • We used technologies like OpenAI GPT to create a seamless chat experience, making Q&A more convenient and fast.
  • As the backbone of the LLM pipelines, we implemented LangChain.
  • Notion content is converted into vectors using OpenAI embedding models and stored in a vector database.
  • Azure Functions automatically fetch any new Notion content on a daily basis, to keep our chatbot’s content up to date.
  • The bot has memory states, so it can keep track of previous conversations.
  • We used Streamlit to build the front-end chat interface and embedded the app into our Notion page.
Scope of this project

Breaking it down

Our project consists of three main parts:

We start with the Document Ingestion phase, where we convert all the content from our Notion onboarding page into numerical representations (vectors). Because Large Language Models like GPT can’t handle long text, we first need to split the content into smaller chunks before doing any conversion. To do this conversion, we use an embedding model from OpenAI. Finally, we store those vectors in a vector database, which we access in the second phase.

In the second phase, Query, the user enters a question in plain text, which we convert into a vector using the same embedding model as in the first phase. We then use this new vector to look for relevant/similar vectors in the vector database created earlier. The content linked to those relevant vectors is then passed, along with the user’s question, to an LLM like OpenAI GPT to create a meaningful answer.

To create a better user experience, we add memory to our chatbot, so it keeps track of previous messages.

Finally, we create a chatbot interface where the user can interact with the bot. For this project, we use Streamlit for the UI. We then embed the app into our original onboarding Notion page.

Follow along to understand how we achieved all this!

A schematic of the bot's structure
Step 1: Document ingestion

Gathering content from our Notion

Goal: Convert Notion content into vectors and store them in a vector database.

Graph showing we convert Notion content to a vector database

Notion content

Firstly, we transfer all the content of our internal onboarding emails into a Notion page. We create one main page, with sub-pages for each topic (Expenses, Parking, Internet Plan, Team, …). Each sub-page contains more detailed information about its topic.

Using the Notion API, we can export all the content as markdown files, which keeps the text structured into titles, paragraphs, … Luckily, LangChain has many integrations, including one for Notion. That means we can easily feed the content of our Notion page into a LangChain pipeline.
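As a minimal sketch, loading a Notion markdown export into LangChain documents looks roughly like this (the folder name is illustrative, not our actual setup):

```python
# A minimal sketch: load a Notion markdown export into LangChain documents.
# "Notion_DB" is an illustrative folder name, not our actual export path.
from langchain.document_loaders import NotionDirectoryLoader

loader = NotionDirectoryLoader("Notion_DB")
docs = loader.load()  # one Document per markdown file, with source metadata

print(f"Loaded {len(docs)} Notion pages")
```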

Structuring our HR content in Notion

Split content into chunks

Large Language Models (LLMs) such as OpenAI GPT can’t handle big pieces of information: you can’t simply feed a whole book to GPT and ask questions, because there is a token limit. In our case, we are using GPT-3.5 Turbo, which has a limit of 4,096 tokens.

Tokens can be thought of as pieces of words. In general,

  • 1 token ~= 4 chars in English
  • 1 token ~= ¾ words
  • 1-2 sentences ~= 30 tokens

For example, the sentence ‘I want a chatbot to talk with my data.’ represents 11 tokens for 38 characters.
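If you want to check token counts yourself, OpenAI’s tiktoken library tokenizes text the same way the model does; a quick sketch (exact counts can vary slightly between tokenizer versions):

```python
# Count tokens the way GPT-3.5 Turbo sees them, using OpenAI's tiktoken.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
sentence = "I want a chatbot to talk with my data."
tokens = enc.encode(sentence)
print(len(tokens), len(sentence))  # roughly 11 tokens for 38 characters
```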

There are different ways to split text into smaller pieces. Simple methods consist of defining a fixed text size and using it to split the original text. For example, with a chunk size of 500 words and an original text of 5,000 words, we end up with 5000/500 = 10 chunks, each containing 500 words.

This technique is very simple but has one main downside: what if you split in the middle of a sentence? The sentence might lose its meaning. To counter that, you can introduce an overlap, meaning that the last words of the previous chunk are repeated at the start of the next one, so that a sentence cut at a boundary still appears in full in one of the chunks.

The technique we use is slightly more advanced, as we define specific characters to split on. Our content uses markdown ‘#’ characters for titles, so we specify that we want to split on ‘#’. This means the content of each paragraph stays together and is never cut in the middle.
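In LangChain, one way to express this is a character splitter with ‘#’ as the separator; the chunk size below is an illustrative value, not our exact setting:

```python
# Split the markdown content on '#' headings so paragraphs stay intact.
# chunk_size is an illustrative value, not our exact production setting.
from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(
    separator="#",    # split right before markdown titles
    chunk_size=1000,  # maximum characters per chunk
    chunk_overlap=0,
)
chunks = splitter.split_documents(docs)  # `docs` loaded from Notion above
```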

Splitting content right before titles, rather than in the middle

Now that we have split our text into smaller pieces, we’ll convert them into vectors. Computers don’t understand words as humans do. So, we need to convert those texts into vectors so that computers are able to ‘understand’ them.

For this, we use embedding models, like the ones offered by OpenAI. OpenAI has trained models that not only convert text into vectors, but also manage to capture the meaning of the text in those vectors! This is very useful, as similar pieces of text have similar vector embeddings. When comparing vectors, we look at how ‘close’ they are to each other to know whether they are similar or not.

In the image below, we see that the vector representing ‘A sad boy is walking’ is closer to ‘A little boy is walking’ than ‘Look at my little cat!’. This is very useful for different ML tasks such as classification, recommendation, clustering, …
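The standard measure of ‘closeness’ is cosine similarity. A toy illustration (the 3-D vectors below are made up for the example; real OpenAI embeddings have 1,536 dimensions):

```python
# Toy illustration of vector "closeness" using cosine similarity.
# These 3-D vectors are invented for the example; real OpenAI embeddings
# (text-embedding-ada-002) have 1,536 dimensions.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sad_boy    = np.array([0.9, 0.1, 0.3])  # "A sad boy is walking"
little_boy = np.array([0.8, 0.2, 0.3])  # "A little boy is walking"
little_cat = np.array([0.1, 0.9, 0.7])  # "Look at my little cat!"

print(cosine_similarity(sad_boy, little_boy))  # high: similar meaning
print(cosine_similarity(sad_boy, little_cat))  # low: different meaning
```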

Vector embeddings explained

(For those interested: you can find a deeper explanation on vector embeddings on the Pinecone website.)

Great! Now we know why we should convert text to vectors: computers don’t understand words but understand numbers, and a vector representation retains information about the meaning of the text.

When the conversion is done, we store the vectors in a vector database. To make this whole process easier, we use the LangChain framework.
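Put together, the end of the ingestion phase looks roughly like this. The exact vector store isn’t the point here, so the sketch uses FAISS as a simple local stand-in (and assumes an OPENAI_API_KEY environment variable):

```python
# Embed the chunks with OpenAI and store them in a vector database.
# FAISS is a simple local stand-in, not necessarily our actual store;
# assumes OPENAI_API_KEY is set in the environment.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()  # defaults to text-embedding-ada-002
vectordb = FAISS.from_documents(chunks, embeddings)  # `chunks` from the splitter above
vectordb.save_local("onboarding_index")  # illustrative local path
```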

Automation

To keep the content of our chatbot up to date, we need to automate the Document Ingestion process. To do so, we use Azure Functions. They enable us to trigger the process manually through an HTTP request, as well as to run it automatically on a daily basis.
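As a sketch, the daily run can be a timer-triggered function like the one below; the schedule lives in function.json, and run_ingestion is a hypothetical wrapper around the pipeline described above (the HTTP-triggered variant would call the same wrapper):

```python
# Sketch of a timer-triggered Azure Function that re-runs ingestion daily.
# The schedule is defined in function.json, e.g. "0 0 6 * * *" for 06:00 daily.
import logging

import azure.functions as func

def run_ingestion() -> None:
    # Hypothetical wrapper: load Notion -> split into chunks -> embed -> store.
    ...

def main(mytimer: func.TimerRequest) -> None:
    if mytimer.past_due:
        logging.warning("Ingestion timer is running late")
    run_ingestion()
    logging.info("Notion content re-ingested")
```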

Step 2: Query the content

Retrieving the content from the vector database

Basic query

Goal: Based on the user’s question, retrieve the most relevant pieces of information and answer the user’s question.

Answering a question with the vector database

Flow

As all our content is now stored as vectors in a vector database, we can take a question from the user and interact with the database to retrieve useful content. To do this, we once again use LangChain to create the connections between the different parts of the pipeline. The process is as follows:

  1. We convert the user’s question into a vector (using the same embedding model we used in the Document Ingestion phase)
  2. We pass the vector to the vector database and perform a similarity search (or vector search). Given a set of vectors (in red) and a query vector (user’s question in blue), we need to find the most similar items to the query. In the example below, we show the nearest neighbour search, which looks for the 3 closest points to the query vector.
  3. With the most similar vectors found, we pass the content linked to these vectors, together with the user’s question, to an LLM, in our case OpenAI GPT-3.5 Turbo.
  4. GPT formulates an answer based on these elements and a prompt.

You can think of a prompt as a guideline that the LLM has to follow. In this example, our prompt looks something like “You are a helpful AI assistant working at Arinti. Based on the given documents, answer the user’s question. Do not try to answer if you are not sure.”
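In LangChain, this basic flow can be wired up with a retrieval QA chain. A hedged sketch, reusing the vector database from step 1 (k=3 mirrors the nearest-neighbour example above, not necessarily our production setting):

```python
# Sketch of the basic query flow with LangChain's RetrievalQA chain.
# `vectordb` is the vector database built during Document Ingestion.
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "You are a helpful AI assistant working at Arinti. "
        "Based on the given documents, answer the user's question. "
        "Do not try to answer if you are not sure.\n\n"
        "Documents:\n{context}\n\nQuestion: {question}"
    ),
)

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=vectordb.as_retriever(search_kwargs={"k": 3}),  # 3 nearest neighbours
    chain_type_kwargs={"prompt": prompt},
)

print(qa.run("How do I get reimbursed for my expenses?"))  # example question
```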

Query with memory

Goal: Add memory capabilities to the chatbot, to keep track of previous messages.

Re-use previous messages to understand follow-up questions

To create a human-like interaction, we need to create some sort of memory to track all the previous messages of the conversation. It adds a bit of complexity to our previous flow, but is worth it. To achieve this, we follow the process below (a code sketch follows the list):

  1. A chat history is initially created. Whenever the user asks a question or gets an answer back from the chatbot, we store the information in the chat history. As the conversation grows, the chat history keeps getting bigger. This chat history is the memory of our chatbot!
  2. The user writes a question. The question is immediately stored in the chat history.
  3. We combine the question and chat history into a stand-alone question.
  4. It gets converted into a vector.
  5. We pass it to the vector database to perform a similarity search.
  6. With the most similar vectors found, we pass the content linked to these vectors, together with the stand-alone question, to an LLM.
  7. GPT formulates an answer based on these elements and a prompt.
  8. The chatbot replies with the answer from GPT.
  9. We pass the chatbot’s answer to the chat history.
  10. Repeat!
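LangChain bundles steps 2 to 9 into its ConversationalRetrievalChain: it condenses the chat history and the new question into a stand-alone question, retrieves similar chunks, and lets GPT answer. A minimal sketch, again reusing the vector database from step 1:

```python
# Sketch of the memory-enabled flow with ConversationalRetrievalChain.
# The chain condenses chat history + new question into a stand-alone question,
# retrieves similar chunks, and answers with GPT. `vectordb` is from step 1.
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chat = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=vectordb.as_retriever(search_kwargs={"k": 3}),
    memory=memory,
)

print(chat.run("What is our internet plan?"))
print(chat.run("And how do I request it?"))  # follow-up resolved via memory
```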
Our HR chatbot showcasing memory persistence
Step 3: Interface building

Making it all come together in Notion

The chat interface

Goal: Create a chatbot interface that receives a user’s question and gives back an answer. The app should be available in Notion.

We use Streamlit to wrap the ‘Query with memory’ process in a nice chat interface, and then make it available in Notion through an embedded frame.
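A minimal sketch of such a Streamlit chat page, where get_answer is a hypothetical wrapper around the ‘Query with memory’ chain from step 2:

```python
# Minimal sketch of the Streamlit chat interface.
import streamlit as st

def get_answer(question: str) -> str:
    # Hypothetical: call the 'Query with memory' chain from step 2 here.
    return "..."

st.title("Arinti HR chatbot")

if "messages" not in st.session_state:
    st.session_state.messages = []

for msg in st.session_state.messages:  # replay the conversation so far
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if question := st.chat_input("Ask me anything about onboarding"):
    st.session_state.messages.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.write(question)

    answer = get_answer(question)
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.write(answer)
```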

A chat interface showing a question and answer
A chat interface showing a question and answer

Deployment and Notion

After making sure our chatbot application works perfectly, we finally deploy it online. We also protect it with a password, so that only our employees can use it.
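One simple way to add such a password gate in Streamlit (the secret name below is illustrative, not our actual configuration):

```python
# Simple password gate at the top of the Streamlit app.
# "app_password" in st.secrets is an illustrative name.
import streamlit as st

password = st.text_input("Password", type="password")
if password != st.secrets["app_password"]:
    st.stop()  # halt rendering until the correct password is entered
```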

To make the experience even better, we embed the Streamlit app into Notion, so that employees don’t have to go to another place to use the bot.

And that’s it! We now have our own internal ChatGPT linked to our own Notion content!

Continuous development

  • This was the first version of our bot. We are planning to extend its knowledge to all of our documentation: projects, ideas, … making it a one-stop shop for all our internal questions!
  • We used an onboarding knowledge base as the base of our chatbot, but a FAQ page would work just as well. As long as the content is stored somewhere, we can use it as the base of a bot!

Well, that’s a wrap on our deep dive into chatbot development! Now that you’ve had an insider’s view on how these virtual helpers are crafted, can you imagine the possibilities they could unlock for your own business?

Picture this: Streamlined HR processes, instant responses to FAQs, and maybe even a digital receptionist welcoming visitors to your website. The future is here and it’s incredibly exciting!

Our amazing team, with Logan leading the charge on LLMs, has both the expertise and enthusiasm to tailor a chatbot that’s just right for you. Think of it as adding an extra gear to your business machinery – a gear that runs smoothly, tirelessly, and adds that zing of efficiency.

Feeling curious? Inspired? Or just want a chatbot to share virtual jokes with? Whatever your reason, reach out to us. We’re all set to transform your digital journey with a chatbot sidekick!

Till then, happy innovating and here’s to embracing the future!
