LangChain
The go-to framework for wiring up LLM apps with chains, agents, and retrieval
Why It Exists
The moment an LLM app goes beyond a single API call, the glue code starts piling up. Retrieval logic, prompt templates, output parsers, retry handling, agent loops, evaluation scripts. Every team was building this stuff from scratch before Harrison Chase released LangChain in late 2022.
Early LangChain deserved a lot of the criticism it got. Too many abstractions. Simple LLM calls buried under layers of indirection. I've seen teams rip it out after realizing they'd spent more time fighting the framework than building features. Fair enough.
But the ecosystem has changed a lot since then. Modern LangChain (2025-2026) is genuinely more modular. LangChain Core provides the basics: Runnables, prompt templates, output parsers. LangGraph handles the hard stuff like agent orchestration. LangSmith handles production observability. Teams can pull in just the piece they need and ignore the rest.
How It Works
LangChain Core is built around the Runnable protocol. Any component that takes input and produces output is a Runnable. Runnables expose three methods: invoke() for single inputs, batch() for multiple inputs, and stream() for streaming output. Compose them with the pipe operator.
A typical RAG chain looks like this: retriever | prompt | llm | parser. The retriever fetches relevant documents, the prompt template slots the question and context together, the LLM generates an answer, and the parser pulls out structured output. Each step's output feeds directly into the next step's input. Clean and predictable.
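Here is roughly what that looks like in code. A minimal sketch, assuming langchain-openai and langchain-chroma are installed and an OpenAI key is set; the model name and collection are placeholders.

```python
# Minimal LCEL RAG chain: retriever | prompt | llm | parser.
# Assumes an existing Chroma collection and an OpenAI API key;
# model and collection names are placeholders.
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

retriever = Chroma(
    collection_name="docs", embedding_function=OpenAIEmbeddings()
).as_retriever()

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

# The dict step runs the retriever and passes the raw question through
# in parallel, then feeds both into the prompt template.
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

answer = chain.invoke("What does the Runnable protocol define?")
```

Because every step is a Runnable, the same chain also supports `chain.stream(...)` and `chain.batch([...])` with no extra code.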
LangGraph is where things get interesting. It models agent workflows as state machines. Nodes are Python functions that transform state. Edges define transitions, including conditional edges that branch based on state values. The graph carries a persistent state object through each node.
This matters because it enables patterns that linear chains simply can't handle. Agent loops where the LLM calls a tool, looks at the result, and decides whether to try another tool. Human-in-the-loop flows where execution pauses and waits for user input. Parallel branches that run multiple retrievers at the same time. These are real production patterns, not academic exercises.
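A bare-bones sketch of the shape of the API, using a made-up "retry up to twice" flow rather than a real agent, just to show nodes, conditional edges, and a cycle:

```python
# Bare-bones LangGraph sketch: one node, one conditional edge, a cycle.
# The retry logic is invented purely to show the API shape; real nodes
# would call an LLM or a tool.
from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    question: str
    answer: str
    attempts: int

def generate(state: State) -> dict:
    # Nodes return partial state updates; LangGraph merges them in.
    return {
        "answer": f"draft answer to {state['question']}",
        "attempts": state["attempts"] + 1,
    }

def should_retry(state: State) -> str:
    # Conditional edge: branch on state values, loop back or finish.
    return "generate" if state["attempts"] < 2 else END

graph = StateGraph(State)
graph.add_node("generate", generate)
graph.add_edge(START, "generate")
graph.add_conditional_edges("generate", should_retry)

app = graph.compile()
result = app.invoke({"question": "what is LCEL?", "answer": "", "attempts": 0})
```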
LangSmith is a SaaS platform (self-hosted option available) that traces every step in a LangChain or LangGraph application. Each trace shows the full execution tree: what prompts were sent, what came back, how long each step took, how many tokens were burned. Traces can be marked as correct or incorrect, evaluation datasets can be built, and automated evals can compare different prompt versions or models.
I'll be honest: LangSmith is probably the strongest argument for using LangChain at all. LLM debugging without tracing is miserable. You end up staring at a wrong output with no idea which of the five steps produced it.
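Turning tracing on is mostly configuration rather than code. A sketch, assuming a LangSmith account and the environment variables the LangSmith docs describe:

```python
# Enabling LangSmith tracing; no changes to the chain itself.
# Assumes a LangSmith account and API key.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"             # turn tracing on
os.environ["LANGCHAIN_API_KEY"] = "<langsmith-api-key>" # from the LangSmith UI
os.environ["LANGCHAIN_PROJECT"] = "rag-prod"            # optional: group traces

# Every chain or graph invoked after this point is traced automatically,
# with per-step inputs, outputs, latency, and token counts.
```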
Architecture Deep Dive
Document Processing Pipeline: LangChain ships 160+ document loaders (PDF, HTML, Notion, Confluence, GitHub, S3, and more) that normalize content into a standard Document object with page_content and metadata. Text splitters chunk documents using different strategies. RecursiveCharacterTextSplitter is the default and usually the right starting point. It tries paragraph boundaries first, then line breaks, then words, then individual characters. Chunks get embedded and stored in any of 50+ supported vector stores.
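A sketch of that load, split, embed, store pipeline, assuming pypdf, langchain-openai, and langchain-chroma are installed; the file path, chunk sizes, and collection name are illustrative.

```python
# Load -> split -> embed -> store, as described above. Assumes pypdf,
# langchain-openai, and langchain-chroma are installed; the file path,
# chunk sizes, and collection name are illustrative.
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = PyPDFLoader("handbook.pdf").load()  # list[Document] with metadata

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # characters per chunk
    chunk_overlap=200,  # overlap to keep context across boundaries
)
chunks = splitter.split_documents(docs)

vectorstore = Chroma.from_documents(
    chunks, OpenAIEmbeddings(), collection_name="handbook"
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```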
One practical detail: document loader quality matters more than most people realize. A bad PDF parser that mangles tables or drops headers will poison the entire RAG pipeline downstream. Test loaders on real documents, not just clean examples.
Agent Architecture: A typical LangGraph agent has three nodes: an LLM node that decides what to do, a tool execution node that runs the chosen tool, and a routing edge that checks whether the LLM wants to call another tool or return a final answer. State holds the conversation history, available tools, and intermediate results. When the LLM returns a tool call, the router sends execution to the tool node. When it returns a final answer, the router sends it to the output.
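A hedged sketch of that three-node loop, using LangGraph's prebuilt ToolNode; the search tool and model name are placeholders.

```python
# LLM node, tool node, and a routing edge that loops until the model
# stops requesting tool calls. The search tool and model are placeholders.
from typing import Annotated, TypedDict

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

@tool
def search(query: str) -> str:
    """Hypothetical search tool for the example."""
    return f"results for {query}"

class State(TypedDict):
    messages: Annotated[list, add_messages]  # append-only conversation history

llm = ChatOpenAI(model="gpt-4o-mini").bind_tools([search])

def call_model(state: State) -> dict:
    return {"messages": [llm.invoke(state["messages"])]}

def route(state: State) -> str:
    # Tool call requested -> run the tool; final answer -> stop.
    return "tools" if state["messages"][-1].tool_calls else END

graph = StateGraph(State)
graph.add_node("llm", call_model)
graph.add_node("tools", ToolNode([search]))
graph.add_edge(START, "llm")
graph.add_conditional_edges("llm", route)
graph.add_edge("tools", "llm")  # loop back so the LLM sees the tool result
agent = graph.compile()
```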
This loop is simple on paper. In practice, getting an agent to reliably choose the right tool and stop looping at the right time is the real engineering challenge. Prompt engineering and guardrails matter more than the framework here.
Memory and Persistence: LangGraph supports checkpointing, saving the full state of a graph execution to a database (PostgreSQL, SQLite, or custom backends). This unlocks long-running conversations that survive server restarts, human-in-the-loop workflows where execution might pause for days, and time-travel debugging where execution can be replayed from any checkpoint.
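A minimal sketch using the in-memory checkpointer; a production deployment would swap in the Postgres or SQLite savers. The echo node and thread_id are illustrative.

```python
# Checkpointing sketch with the in-memory saver; production deployments
# would use a durable Postgres or SQLite checkpointer instead.
from typing import Annotated, TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]

def echo(state: State) -> dict:
    return {"messages": [("assistant", f"seen {len(state['messages'])} messages")]}

graph = StateGraph(State)
graph.add_node("echo", echo)
graph.add_edge(START, "echo")
graph.add_edge("echo", END)
app = graph.compile(checkpointer=MemorySaver())

# thread_id ties separate invocations to one persisted conversation.
config = {"configurable": {"thread_id": "user-42"}}
app.invoke({"messages": [("user", "hello")]}, config)
# The second call resumes from the checkpointed state for thread "user-42".
app.invoke({"messages": [("user", "still there?")]}, config)
```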
Checkpointing sounds like a nice-to-have until the first time a production conversation crashes mid-flow and the user has to start over. Then it becomes essential.
When to Use LangChain vs. Direct SDK
Use the provider SDK directly for simple, single-turn LLM calls. When the lowest possible latency overhead matters. When the team prefers explicit control and doesn't want another dependency.
Use LangChain when the pipeline has 3+ steps (retrieval, processing, generation). When it needs to plug into multiple vector stores or LLM providers. When production tracing through LangSmith matters. When building anything that uses tools.
Use LangGraph when the workflow has conditional logic based on LLM output. When agents need to loop (try a tool, evaluate, try another). When persistent state across sessions is a requirement. When building multi-agent systems.
The honest answer: teams unsure whether they need LangChain probably don't yet. Start with direct SDK calls. When the same retrieval-prompt-parse boilerplate gets written for the third time, that is when LangChain starts earning its keep.
Pros
- • Widest integration ecosystem out there (150+ LLMs, 50+ vector stores, 100+ tools)
- • LangGraph lets you build agent workflows with cycles, branching, and persistent state
- • LangSmith gives you real observability: tracing, evaluation, and debugging in production
- • Good built-in abstractions for RAG, agents, and structured output
- • Very active community, ships fast, docs are solid
Cons
- • Abstraction layers make debugging harder than you'd expect
- • Breaking changes between versions happen more than they should
- • Overkill for simple use cases. Sometimes a direct API call is all you need
- • Performance overhead from chain composition and serialization
- • LangGraph's state machine model takes time to click
When to use
- • Your LLM pipeline has multiple steps, tools, and data sources
- • You need pre-built integrations with vector stores, LLMs, or tools
- • You want to prototype LLM apps fast using well-known patterns
- • You're building multi-agent systems that need state management and tool coordination
When NOT to use
- • You're making a single LLM call with a static prompt. Just use the provider SDK.
- • Latency matters so much that framework overhead is a problem
- • Your team wants minimal dependencies and full control over every LLM interaction
- • You need stability more than you need the latest features
Key Points
- • LCEL uses pipe syntax (chain = prompt | llm | parser) to compose chains. Each component implements the Runnable interface with invoke/stream/batch methods, delivering streaming and parallel execution without extra work.
- • LangGraph extends LangChain with graph-based execution. Nodes are functions, edges define control flow. Unlike simple chains, LangGraph supports cycles (agent loops), conditional branching, and persistent state.
- • LangSmith traces every LLM call, retrieval, and tool invocation with inputs, outputs, latency, and token counts. When a 5-step chain gives a wrong answer at 2 AM, this is how to figure out which step broke.
- • The retriever abstraction puts vector stores, BM25 indexes, SQL databases, and custom retrieval behind one interface. Multiple sources can be combined (ensemble retrieval) or contextual compression can be added on top.
- • with_structured_output() uses function calling or JSON schema to force LLM responses into a Pydantic model (a quick sketch follows this list). If downstream code expects typed data, this is non-negotiable.
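A minimal sketch of that pattern; the Pydantic model, its fields, and the model name are illustrative.

```python
# with_structured_output() returning a typed Pydantic object instead of text.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class TicketTriage(BaseModel):
    category: str = Field(description="billing, bug, or feature_request")
    urgency: int = Field(description="1 (low) to 5 (critical)")

structured_llm = ChatOpenAI(model="gpt-4o-mini").with_structured_output(TicketTriage)

# Returns a TicketTriage instance, not raw text.
result = structured_llm.invoke("Customer says checkout crashes at payment.")
```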
Common Mistakes
- ✗Reaching for LangChain when it is not needed. For a single LLM call with a static prompt, the provider SDK is simpler and faster. LangChain pays off in composition and integration, not simple calls.
- ✗Skipping LangSmith. LLM chains are non-deterministic. Without tracing, figuring out why a multi-step chain produced garbage is basically guesswork.
- ✗Using linear chains when the logic actually needs branching. If the next step depends on what the LLM said, LangGraph's conditional edges are the right tool, not a sequential chain.
- ✗Ignoring token costs in long chains. Each step may call the LLM. A 4-step chain at 2K tokens per call costs 4x a single call. Track total tokens per request with LangSmith.
- ✗Not pinning integration package versions. LangChain ships fast, and integrations can break between minor versions. Pin langchain-openai, langchain-anthropic, and friends to exact versions.