Building Production-Ready RAG with LangChain
Retrieval-Augmented Generation (RAG) remains one of the most reliable ways to ground language models in private, proprietary enterprise data. The core concept is simple: find relevant documents and stuff them into the prompt. Building a system that works reliably in production, however, requires careful engineering.
In this guide, we will build a RAG pipeline using LangChain v0.3, covering document loading, recursive chunking, embedding and vector storage with Chroma, and a retrieval chain composed with LangChain Expression Language (LCEL).
Prerequisites
- Python 3.10+
- An OpenAI API Key
- A Pinecone or Chroma Vector Database instance
Step 1: Environment Setup
First, install the necessary LangChain packages. LangChain has recently modularized its ecosystem to keep the core library lightweight.
pip install langchain langchain-openai langchain-community langchain-text-splitters chromadb tiktoken pypdf
Import the core modules in your Python script:
import os
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
os.environ["OPENAI_API_KEY"] = "your-api-key-here"  # for real projects, load this from a .env file or secret manager; never commit keys
Step 2: Advanced Document Loading & Chunking
The biggest mistake developers make with RAG is poor chunking. If your chunks are too small, the LLM lacks context. If they are too large, you dilute the semantic meaning of the vector, leading to poor retrieval accuracy.
We will use the RecursiveCharacterTextSplitter. This algorithm looks for the largest logical separators first (like double newlines \n\n representing paragraphs), ensuring it doesn't split a sentence in half unless absolutely necessary.
# 1. Load the proprietary data
loader = PyPDFLoader("enterprise_handbook_2026.pdf")
raw_documents = loader.load()
# 2. Configure the Splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,  # a 20% overlap ensures context isn't lost between chunks
    length_function=len,
    is_separator_regex=False,
)
# 3. Process the documents
chunks = text_splitter.split_documents(raw_documents)
print(f"Split document into {len(chunks)} chunks.")
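To see why the splitter behaves this way, here is a deliberately simplified sketch of the recursive idea in plain Python: try the biggest separator first, and fall back to smaller ones only when a piece is still too long. This is not LangChain's actual implementation (the real splitter also merges small pieces and applies overlap), and the sample text and 40-character limit are made up for illustration:

```python
def recursive_split(text, max_len=40, separators=("\n\n", "\n", " ")):
    """Toy version of recursive splitting: largest separator first."""
    if len(text) <= max_len or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= max_len:
            chunks.append(piece)  # piece fits, keep it whole
        else:
            # still too long: retry with the next-smaller separator
            chunks.extend(recursive_split(piece, max_len, rest))
    return chunks

doc = ("First paragraph stays whole.\n\n"
       "Second paragraph is longer.\n"
       "It has two lines, split on the newline.")
for chunk in recursive_split(doc):
    print(repr(chunk))
```

Notice that the first paragraph survives intact, while the over-long second paragraph is broken at the newline rather than mid-sentence.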
Step 3: Embedding and Vector Storage
Now we need to convert these text chunks into numerical vectors using OpenAI's text-embedding-3-small model, which offers strong retrieval quality at a low cost.
We will store these vectors in Chroma, an open-source, local vector database perfect for getting started.
# Initialize the embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create the Vector Store
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
)
# Create a retrieval interface
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
Note: Setting k=4 tells the database to only return the top 4 most semantically relevant chunks for any given query.
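Conceptually, the retriever ranks every stored chunk by vector similarity to the query and returns the k closest. The toy sketch below illustrates this with made-up 3-dimensional vectors and cosine similarity; real embeddings have hundreds or thousands of dimensions, and the store contents here are entirely hypothetical:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Hypothetical chunks with made-up 3-d "embeddings"
store = {
    "remote work policy": [0.9, 0.1, 0.0],
    "expense reimbursement": [0.7, 0.6, 0.1],
    "office dress code": [0.0, 0.2, 0.9],
    "holiday schedule": [0.1, 0.0, 0.8],
}

def top_k(query_vec, k=2):
    """Return the k stored texts most similar to the query vector."""
    ranked = sorted(store, key=lambda text: cosine(query_vec, store[text]), reverse=True)
    return ranked[:k]

print(top_k([0.8, 0.3, 0.0], k=2))
```

A query vector pointing toward the "work/expenses" region of this toy space pulls back those two chunks first, which is exactly what k=2 asks for.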
Step 4: Constructing the LCEL Pipeline
LangChain Expression Language (LCEL) is the modern way to chain components together. It automatically handles streaming, batching, and parallel execution under the hood.
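The pipe operator is worth demystifying before we use it. Conceptually, each LCEL component is a callable, and | feeds one component's output into the next. The toy class below models only that composition idea; it is not LangChain's Runnable implementation, and the stages are hypothetical stand-ins:

```python
class Step:
    """Toy analogue of a chainable LCEL component."""
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b builds a new step that runs a, then b on a's output
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

# Hypothetical stages standing in for retriever, prompt, and LLM
retrieve = Step(lambda q: f"[context for: {q}]")
build_prompt = Step(lambda ctx: f"Answer using {ctx}")
fake_llm = Step(lambda p: p.upper())

chain = retrieve | build_prompt | fake_llm
print(chain.invoke("remote work"))
```

Real LCEL runnables layer streaming, batching, and async support on top of this same composition pattern.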
We will design a prompt that strictly instructs the LLM to only use the retrieved context.
# Define the Chat Model
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
# Define the Prompt
template = """You are an expert internal assistant for the company. Use the following pieces of retrieved context to answer the question.
If you don't know the answer based on the context, say "I don't have enough information in the handbook to answer that."
Do not make up information.
Context:
{context}
Question: {question}
Answer:"""
prompt = PromptTemplate.from_template(template)
# Format the documents to raw text
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
# Build the Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
)
Step 5: Invoking the Chain
Your RAG pipeline is fully assembled! You can now query it. The chain will automatically embed the user's question, run a similarity search in Chroma, format the results, insert them into the prompt, and generate an answer.
user_question = "What is the company policy on remote work expenses?"
response = rag_chain.invoke(user_question)
print(response.content)
Next Level: Adding Memory
This basic pipeline is stateless. If you want the chain to remember conversational history, you will need create_history_aware_retriever, which rewrites follow-up questions into standalone questions, combined with create_retrieval_chain. But mastering this single-turn LCEL pipeline is the crucial first step.
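Under the hood, those helpers implement a simple two-step pattern: first rewrite the follow-up question into a standalone question using the chat history, then run the usual retrieve-and-answer flow. The stubs below sketch that flow without any LLM or vector store calls; condense_question and retrieve here are hypothetical stand-ins for the real LLM-powered steps:

```python
def condense_question(history, question):
    """Stub for the LLM rewriting step: make a follow-up self-contained."""
    if history and question.lower().startswith(("what about", "and")):
        last_topic = history[-1][0]
        return f"{question} (in the context of: {last_topic})"
    return question

def retrieve(standalone_question):
    """Stub for the vector-store lookup."""
    return f"[chunks matching: {standalone_question}]"

def answer(history, question):
    standalone = condense_question(history, question)
    context = retrieve(standalone)
    reply = f"Answer based on {context}"
    history.append((question, reply))  # record the turn for next time
    return reply

history = []
print(answer(history, "What is the remote work policy?"))
print(answer(history, "What about expenses?"))
```

The second call shows why the rewrite matters: "What about expenses?" is meaningless to a similarity search on its own, but becomes retrievable once the previous topic is folded in.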