
Building Agents With LangGraph Part 3/4: RAG and Long-Term Memory

Piotr Zborowski · Jan 21, 2026 · 6 min read

In the previous part of our series, we added web search and intelligent shutdown to our chatbot while exploring tools and structured output.

In this tutorial, we’re going to enrich our chatbot with two functionalities — Retrieval Augmented Generation and long-term memory. You can find scripts from this part here.

If you’re not familiar with how Retrieval Augmented Generation works, make sure to read our in-depth article on RAG.

Adding Retrieval Augmented Generation to our LangGraph chatbot

Let’s start by loading the doc(s) — as an example, we’re going to use the React Native ExecuTorch documentation. The library lets you run AI models natively on edge devices:

from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://docs.swmansion.com/react-native-executorch/")
docs = loader.load()

Then, we need to split the loaded docs into chunks:

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

We also need to create a vector store for the chunks, backed by an embedding model. The embedding model doesn't have to match the foundation model, since it's only used to retrieve text from the vector store. A popular option here is all-MiniLM-L6-v2.

from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = InMemoryVectorStore(embeddings)
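Under the hood, similarity search ranks chunks by how close their embedding vectors are to the query's embedding, typically by cosine similarity. A minimal stdlib sketch with made-up 3-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" of two chunks (real ones would have hundreds of dimensions).
chunks = {
    "react native executorch runs models on-device": [0.9, 0.1, 0.0],
    "bananas are rich in potassium": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "how do I run a model on-device?"

best = max(chunks, key=lambda text: cosine(chunks[text], query_vec))
# -> "react native executorch runs models on-device"
```

InMemoryVectorStore does essentially this over the stored chunk embeddings when you call similarity_search.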

Next, we add the chunks to the vector store:

_ = vector_store.add_documents(documents=all_splits)

The only thing left is to retrieve the relevant pieces for the user’s prompt. To do that, we update ask_llm. In this example, we just pass the retrieved text directly as context. Alternatively, you could call the LLM separately to generate a summary of that context — feel free to try that approach as an exercise.

def ask_llm(state: State) -> State:
    user_query = input("query: ")

    retrieved_docs = vector_store.similarity_search(user_query)
    context = "\n\n".join([doc.page_content for doc in retrieved_docs])
    user_message = HumanMessage(f"Context:\n{context}\n\nUser question:\n{user_query}")

    # homework: try to add an intermediate step, where a prompt is created by the LLM based on the query and context, before passing it to the LLM below.

    answer_message: AIMessage = model_with_search.invoke(
        state["messages"] + [user_message]
    )

    return {
        "messages": [user_message, answer_message],
    }

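If you want to try the intermediate summarization step mentioned above, here's a hedged sketch. condense_context is a hypothetical helper, not part of LangChain; it assumes any chat model whose invoke method returns a message with a .content attribute:

```python
def condense_context(model, user_query: str, context: str) -> str:
    """Ask the model to compress the retrieved context down to what's relevant to the query."""
    prompt = (
        "Summarize the context below, keeping only information relevant "
        f"to this question: {user_query}\n\nContext:\n{context}"
    )
    return model.invoke(prompt).content
```

In ask_llm you would then build user_message from condense_context(model, user_query, context) instead of the raw joined chunks, trading an extra LLM call for a shorter final prompt.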

Check our GitHub repository to see what the final code should look like. The parts added in this tutorial are highlighted below:

from langchain_core.vectorstores import InMemoryVectorStore
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = InMemoryVectorStore(embeddings)

loader = WebBaseLoader("https://docs.swmansion.com/react-native-executorch/")
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

_ = vector_store.add_documents(documents=all_splits)

def ask_llm(state: State) -> State:
    user_query = input("query: ")

    retrieved_docs = vector_store.similarity_search(user_query)
    context = "\n\n".join([doc.page_content for doc in retrieved_docs])
    user_message = HumanMessage(f"Context:\n{context}\n\nUser question:\n{user_query}")

    # homework: try to add an intermediate step, where a prompt is created by the LLM based on the query and context, before passing it to the LLM below.

    answer_message: AIMessage = model_with_search.invoke(
        state["messages"] + [user_message]
    )

    return {
        "messages": [user_message, answer_message],
    }

Here's a sample conversation with the RAG-enabled chatbot:
query: Hi
answer:  Hello! How can I assist you today?
query: what's react native executorch?
answer:  React Native ExecuTorch is an on-device AI and large language model (LLM) toolkit tailored for the React Native ecosystem. It allows developers to run AI models and LLMs locally on mobile devices, leveraging Meta’s ExecuTorch AI framework. Here are some key aspects of React Native ExecuTorch:

1. **On-Device Model Execution**: It enables AI models and LLMs to operate directly on the device, ensuring that user data remains private and eliminating the need for external API calls.

2. **Cost-Effective**: By performing computations on-device, it reduces dependency on cloud infrastructure, which can lower server costs and minimize latency.

3. **Privacy-Focused**: By executing models locally, it ensures that data stays on the device, offering maximum privacy for users.

4. **Developer-Friendly**: With a declarative API for on-device inference, developers do not need extensive AI expertise to implement these models in their React Native applications.

5. **Part of PyTorch Edge**: It is built on the ExecuTorch foundation, which is part of the PyTorch Edge ecosystem, enabling efficient AI deployment for cross-platform applications in React Native.

This toolkit is ideal for developers looking to integrate AI capabilities within their React Native apps without relying on external cloud services.
query: Can I see it in action?
answer:  To see React Native ExecuTorch in action, you can download and use "Private Mind," which is an on-device AI chatbot. This application operates entirely offline and showcases the capabilities of React Native ExecuTorch. Here’s how you can interact with it:

1. **Download and Install**: You can download Private Mind from the App Store or relevant app distribution platforms.

2. **Features to Explore**:
   - **Chat Freely**: Engage in unrestricted conversations with the AI chatbot.
   - **Privacy Assurance**: Experience the privacy benefits as all data processing occurs locally on your device.
   - **Model Testing and Benchmarking**: Browse, test, and evaluate local language models supported by React Native ExecuTorch.
   - **Customization**: Customize AI assistants to fit your workflow and personal style.

This application serves as a practical demonstration of how React Native ExecuTorch enables the deployment of AI models on mobile devices. It illustrates the privacy, cost-effectiveness, and variety of models you can leverage through this toolkit.

Adding long-term memory

So far, you’ve seen how to add short-term memory — the message history within a single conversation. There’s also long-term memory, which persists across workflow invocations. In this article, we’ll focus on a simple example of resuming a conversation, though the topic is much broader. You can read more about it here.

First, we need to set up a checkpointer and a store, which we’ll pass as parameters when compiling the graph:

from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.memory import InMemoryStore

checkpointer = InMemorySaver()  # used to store individual states
store = InMemoryStore()  # used to store data across threads
workflow = graph.compile(checkpointer=checkpointer, store=store)

To be able to resume the conversation, you need to identify it with thread_id:

config = {"recursion_limit": 100, "configurable": {"thread_id": "default-session"}}

Now, you pass the config when invoking the workflow, and you can resume thanks to get_state:

# we start a thread
workflow.invoke(
    {"iteration": 0},
    config=config,
)

# now we resume the conversation (thread) in another invocation
workflow.invoke(
    workflow.get_state(config).values,  # values of the last checkpointed state
    config=config,
)

# homework: modify the agent so that it stores the state externally and, when you run the script, loads a summary or truncated history of your previous discussions and resumes from there.

Your script should look like this, with the parts added in this tutorial shown below:

from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.memory import InMemoryStore

checkpointer = InMemorySaver()
store = InMemoryStore()
workflow = graph.compile(checkpointer=checkpointer, store=store)

config = {"recursion_limit": 100, "configurable": {"thread_id": "default-session"}}

workflow.invoke(
    {"iteration": 0},
    config=config,
)

workflow.invoke(
    workflow.get_state(config).values,
    config=config,
)

Congrats! You’ve finished building your agentic AI chatbot in LangGraph.

What’s next in the series?

In part 4, we’re diving into a real-world agentic system we built to identify marketing opportunities. This case study offers a hands-on look at how a LangGraph-based system can help your team in practice. So, be sure to check it out here.

We are Software Mansion — software development consultants, a team of React Native core contributors, multimedia and AI experts. Drop us a line at [email protected] and let’s find out how we can help you with your project.
