LangChain
The go-to framework for wiring up LLM apps with chains, agents, and retrieval
Why It Exists
The moment an LLM app goes beyond a single API call, the glue code starts piling up. Retrieval logic, prompt templates, output parsers, retry handling, agent loops, evaluation scripts. Every team was building this stuff from scratch before Harrison Chase released LangChain in late 2022.
Early LangChain deserved a lot of the criticism it got. Too many abstractions. Simple LLM calls buried under layers of indirection. I've seen teams rip it out after realizing they'd spent more time fighting the framework than building features. Fair enough.
But the ecosystem has changed a lot since then. Modern LangChain (2025-2026) is genuinely more modular. LangChain Core provides the basics: Runnables, prompt templates, output parsers. LangGraph handles the hard stuff like agent orchestration. LangSmith handles production observability. Teams can pull in just the piece they need and ignore the rest.
How It Works
LangChain Core is built around the Runnable protocol. Any component that takes input and produces output is a Runnable. Runnables expose three methods: invoke() for single inputs, batch() for multiple inputs, and stream() for streaming output. Compose them with the pipe operator.
A typical RAG chain looks like this: retriever | prompt | llm | parser. The retriever fetches relevant documents, the prompt template slots the question and context together, the LLM generates an answer, and the parser pulls out structured output. Each step's output feeds directly into the next step's input. Clean and predictable.
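Here is roughly what that looks like in code. A minimal sketch, assuming langchain-openai and langchain-chroma are installed and an OpenAI key is set; the model name and collection are placeholders.

```python
# Minimal LCEL RAG chain: retriever | prompt | llm | parser.
# Assumes an existing Chroma collection and an OpenAI API key;
# model and collection names are placeholders.
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

retriever = Chroma(
    collection_name="docs", embedding_function=OpenAIEmbeddings()
).as_retriever()

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

# The dict step runs the retriever and passes the raw question through
# in parallel, then feeds both into the prompt template.
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

answer = chain.invoke("What does the Runnable protocol define?")
```

Because every step is a Runnable, the same chain also supports `chain.stream(...)` and `chain.batch([...])` with no extra code.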
LangGraph is where things get interesting. It models agent workflows as state machines. Nodes are Python functions that transform state. Edges define transitions, including conditional edges that branch based on state values. The graph carries a persistent state object through each node.
This matters because it enables patterns that linear chains simply can't handle. Agent loops where the LLM calls a tool, looks at the result, and decides whether to try another tool. Human-in-the-loop flows where execution pauses and waits for user input. Parallel branches that run multiple retrievers at the same time. These are real production patterns, not academic exercises.
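A bare-bones sketch of the shape of the API, using a made-up "retry up to twice" flow rather than a real agent, just to show nodes, conditional edges, and a cycle:

```python
# Bare-bones LangGraph sketch: one node, one conditional edge, a cycle.
# The retry logic is invented purely to show the API shape; real nodes
# would call an LLM or a tool.
from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    question: str
    answer: str
    attempts: int

def generate(state: State) -> dict:
    # Nodes return partial state updates; LangGraph merges them in.
    return {
        "answer": f"draft answer to {state['question']}",
        "attempts": state["attempts"] + 1,
    }

def should_retry(state: State) -> str:
    # Conditional edge: branch on state values, loop back or finish.
    return "generate" if state["attempts"] < 2 else END

graph = StateGraph(State)
graph.add_node("generate", generate)
graph.add_edge(START, "generate")
graph.add_conditional_edges("generate", should_retry)

app = graph.compile()
result = app.invoke({"question": "what is LCEL?", "answer": "", "attempts": 0})
```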
LangSmith is a SaaS platform (self-hosted option available) that traces every step in a LangChain or LangGraph application. Each trace shows the full execution tree: what prompts were sent, what came back, how long each step took, how many tokens were burned. Traces can be marked as correct or incorrect, evaluation datasets can be built, and automated evals can compare different prompt versions or models.
I'll be honest: LangSmith is probably the strongest argument for using LangChain at all. LLM debugging without tracing is miserable. You end up staring at a wrong output with no idea which of the five steps produced it.
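Turning tracing on is mostly configuration rather than code. A sketch, assuming a LangSmith account and the environment variables the LangSmith docs describe:

```python
# Enabling LangSmith tracing; no changes to the chain itself.
# Assumes a LangSmith account and API key.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"             # turn tracing on
os.environ["LANGCHAIN_API_KEY"] = "<langsmith-api-key>" # from the LangSmith UI
os.environ["LANGCHAIN_PROJECT"] = "rag-prod"            # optional: group traces

# Every chain or graph invoked after this point is traced automatically,
# with per-step inputs, outputs, latency, and token counts.
```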
Architecture Deep Dive
Document Processing Pipeline: LangChain ships 160+ document loaders (PDF, HTML, Notion, Confluence, GitHub, S3, and more) that normalize content into a standard Document object with page_content and metadata. Text splitters chunk documents using different strategies. RecursiveCharacterTextSplitter is the default and usually the right starting point. It tries paragraph boundaries first, then line breaks, then words, then individual characters. Chunks get embedded and stored in any of 50+ supported vector stores.
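A sketch of that load, split, embed, store pipeline, assuming pypdf, langchain-openai, and langchain-chroma are installed; the file path, chunk sizes, and collection name are illustrative.

```python
# Load -> split -> embed -> store, as described above. Assumes pypdf,
# langchain-openai, and langchain-chroma are installed; the file path,
# chunk sizes, and collection name are illustrative.
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = PyPDFLoader("handbook.pdf").load()  # list[Document] with metadata

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # characters per chunk
    chunk_overlap=200,  # overlap to keep context across boundaries
)
chunks = splitter.split_documents(docs)

vectorstore = Chroma.from_documents(
    chunks, OpenAIEmbeddings(), collection_name="handbook"
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```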
One practical detail: document loader quality matters more than most people realize. A bad PDF parser that mangles tables or drops headers will poison the entire RAG pipeline downstream. Test loaders on real documents, not just clean examples.
Agent Architecture: A typical LangGraph agent has three nodes: an LLM node that decides what to do, a tool execution node that runs the chosen tool, and a routing edge that checks whether the LLM wants to call another tool or return a final answer. State holds the conversation history, available tools, and intermediate results. When the LLM returns a tool call, the router sends execution to the tool node. When it returns a final answer, the router sends it to the output.
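A hedged sketch of that three-node loop, using LangGraph's prebuilt ToolNode; the search tool and model name are placeholders.

```python
# LLM node, tool node, and a routing edge that loops until the model
# stops requesting tool calls. The search tool and model are placeholders.
from typing import Annotated, TypedDict

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

@tool
def search(query: str) -> str:
    """Hypothetical search tool for the example."""
    return f"results for {query}"

class State(TypedDict):
    messages: Annotated[list, add_messages]  # append-only conversation history

llm = ChatOpenAI(model="gpt-4o-mini").bind_tools([search])

def call_model(state: State) -> dict:
    return {"messages": [llm.invoke(state["messages"])]}

def route(state: State) -> str:
    # Tool call requested -> run the tool; final answer -> stop.
    return "tools" if state["messages"][-1].tool_calls else END

graph = StateGraph(State)
graph.add_node("llm", call_model)
graph.add_node("tools", ToolNode([search]))
graph.add_edge(START, "llm")
graph.add_conditional_edges("llm", route)
graph.add_edge("tools", "llm")  # loop back so the LLM sees the tool result
agent = graph.compile()
```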
This loop is simple on paper. In practice, getting an agent to reliably choose the right tool and stop looping at the right time is the real engineering challenge. Prompt engineering and guardrails matter more than the framework here.
Memory and Persistence: LangGraph supports checkpointing, saving the full state of a graph execution to a database (PostgreSQL, SQLite, or custom backends). This unlocks long-running conversations that survive server restarts, human-in-the-loop workflows where execution might pause for days, and time-travel debugging where execution can be replayed from any checkpoint.
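A minimal sketch using the in-memory checkpointer; a production deployment would swap in the Postgres or SQLite savers. The echo node and thread_id are illustrative.

```python
# Checkpointing sketch with the in-memory saver; production deployments
# would use a durable Postgres or SQLite checkpointer instead.
from typing import Annotated, TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]

def echo(state: State) -> dict:
    return {"messages": [("assistant", f"seen {len(state['messages'])} messages")]}

graph = StateGraph(State)
graph.add_node("echo", echo)
graph.add_edge(START, "echo")
graph.add_edge("echo", END)
app = graph.compile(checkpointer=MemorySaver())

# thread_id ties separate invocations to one persisted conversation.
config = {"configurable": {"thread_id": "user-42"}}
app.invoke({"messages": [("user", "hello")]}, config)
# The second call resumes from the checkpointed state for thread "user-42".
app.invoke({"messages": [("user", "still there?")]}, config)
```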
Checkpointing sounds like a nice-to-have until the first time a production conversation crashes mid-flow and the user has to start over. Then it becomes essential.
When to Use LangChain vs. Direct SDK
Use the provider SDK directly for simple, single-turn LLM calls. When the lowest possible latency overhead matters. When the team prefers explicit control and doesn't want another dependency.
Use LangChain when the pipeline has 3+ steps (retrieval, processing, generation). When it needs to plug into multiple vector stores or LLM providers. When production tracing through LangSmith matters. When building anything that uses tools.
Use LangGraph when the workflow has conditional logic based on LLM output. When agents need to loop (try a tool, evaluate, try another). When persistent state across sessions is a requirement. When building multi-agent systems.
The honest answer: teams unsure whether they need LangChain probably don't yet. Start with direct SDK calls. When the same retrieval-prompt-parse boilerplate gets written for the third time, that is when LangChain starts earning its keep.
Pros
- • Widest integration ecosystem out there (150+ LLMs, 50+ vector stores, 100+ tools)
- • LangGraph lets you build agent workflows with cycles, branching, and persistent state
- • LangSmith gives you real observability: tracing, evaluation, and debugging in production
- • Good built-in abstractions for RAG, agents, and structured output
- • Very active community, ships fast, docs are solid
Cons
- • Abstraction layers make debugging harder than you'd expect
- • Breaking changes between versions happen more than they should
- • Overkill for simple use cases. Sometimes a direct API call is all you need
- • Performance overhead from chain composition and serialization
- • LangGraph's state machine model takes time to click
When to use
- • Your LLM pipeline has multiple steps, tools, and data sources
- • You need pre-built integrations with vector stores, LLMs, or tools
- • You want to prototype LLM apps fast using well-known patterns
- • You're building multi-agent systems that need state management and tool coordination
When NOT to use
- • You're making a single LLM call with a static prompt. Just use the provider SDK.
- • Latency matters so much that framework overhead is a problem
- • Your team wants minimal dependencies and full control over every LLM interaction
- • You need stability more than you need the latest features
Key Points
- • LCEL uses pipe syntax (chain = prompt | llm | parser) to compose chains. Each component implements the Runnable interface with invoke/stream/batch methods, delivering streaming and parallel execution without extra work.
- • LangGraph extends LangChain with graph-based execution. Nodes are functions, edges define control flow. Unlike simple chains, LangGraph supports cycles (agent loops), conditional branching, and persistent state.
- • LangSmith traces every LLM call, retrieval, and tool invocation with inputs, outputs, latency, and token counts. When a 5-step chain gives a wrong answer at 2 AM, this is how to figure out which step broke.
- • The retriever abstraction puts vector stores, BM25 indexes, SQL databases, and custom retrieval behind one interface. Multiple sources can be combined (ensemble retrieval) or contextual compression can be added on top.
- • with_structured_output() uses function calling or JSON schema to force LLM responses into a Pydantic model (a quick sketch follows this list). If downstream code expects typed data, this is non-negotiable.
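A minimal sketch of that pattern; the Pydantic model, its fields, and the model name are illustrative.

```python
# with_structured_output() returning a typed Pydantic object instead of text.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class TicketTriage(BaseModel):
    category: str = Field(description="billing, bug, or feature_request")
    urgency: int = Field(description="1 (low) to 5 (critical)")

structured_llm = ChatOpenAI(model="gpt-4o-mini").with_structured_output(TicketTriage)

# Returns a TicketTriage instance, not raw text.
result = structured_llm.invoke("Customer says checkout crashes at payment.")
```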
Common Mistakes
- ✗Reaching for LangChain when it is not needed. For a single LLM call with a static prompt, the provider SDK is simpler and faster. LangChain pays off in composition and integration, not simple calls.
- ✗Skipping LangSmith. LLM chains are non-deterministic. Without tracing, figuring out why a multi-step chain produced garbage is basically guesswork.
- ✗Using linear chains when the logic actually needs branching. If the next step depends on what the LLM said, LangGraph's conditional edges are the right tool, not a sequential chain.
- ✗Ignoring token costs in long chains. Each step may call the LLM. A 4-step chain at 2K tokens per call costs 4x a single call. Track total tokens per request with LangSmith.
- ✗Not pinning integration package versions. LangChain ships fast, and integrations can break between minor versions. Pin langchain-openai, langchain-anthropic, and friends to exact versions.