Overview
LangChain's streaming system lets you surface live feedback from agent runs to your application. What's possible with LangChain streaming:

- Stream agent progress: get state updates after each agent step.
- Stream LLM tokens: stream language model tokens as they're generated.
- Stream thinking / reasoning tokens: surface model reasoning as it's generated.
- Stream custom updates: emit user-defined signals (e.g., "Fetched 10/100 records").
- Stream multiple modes: choose from `updates` (agent progress), `messages` (LLM tokens + metadata), or `custom` (arbitrary user data).
Supported stream modes
Pass one or more of the following stream modes as a list to the `stream` or `astream` methods:
| Mode | Description |
|---|---|
| `updates` | Streams state updates after each agent step. If multiple updates are made in the same step (e.g., multiple nodes are run), those updates are streamed separately. |
| `messages` | Streams tuples of `(token, metadata)` from any graph nodes where an LLM is invoked. |
| `custom` | Streams custom data from inside your graph nodes using the stream writer. |
Agent progress
To stream agent progress, use the `stream` or `astream` methods with `stream_mode="updates"`. This emits an event after every agent step.
For example, if you have an agent that calls a tool once, you should see the following updates:

- LLM node: `AIMessage` with tool call requests
- Tool node: `ToolMessage` with execution result
- LLM node: Final AI response
Streaming agent progress
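A minimal sketch of the consumption loop for `stream_mode="updates"`. The hard-coded chunks below are simulated stand-ins shaped like the real output (one dict per step, keyed by node name); with a real agent you would iterate `agent.stream(inputs, stream_mode="updates")` instead.

```python
# Simulated update chunks: one dict per agent step, keyed by node name.
simulated_updates = [
    {"model": {"messages": ["AIMessage(tool_calls=[get_weather])"]}},
    {"tools": {"messages": ["ToolMessage(content='Sunny, 22C')"]}},
    {"model": {"messages": ["AIMessage(content='It is sunny in SF.')"]}},
]

# Each step yields exactly one update per node that ran in that step.
for chunk in simulated_updates:
    for node_name, update in chunk.items():
        print(f"update from node {node_name!r}: {update}")
```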
LLM tokens
To stream tokens as they are produced by the LLM, use `stream_mode="messages"`. Below you can see the output of the agent streaming tool calls and the final response.
Streaming LLM tokens
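A sketch of consuming token streams in `"messages"` mode: each item is a `(token, metadata)` tuple. The tuples below are simulated; with a real agent you would iterate `agent.stream(inputs, stream_mode="messages")`.

```python
# Simulated (token, metadata) tuples standing in for a live messages stream.
simulated_stream = [
    ("The", {"langgraph_node": "model"}),
    (" weather", {"langgraph_node": "model"}),
    (" is sunny.", {"langgraph_node": "model"}),
]

final_text = ""
for token, metadata in simulated_stream:
    final_text += token  # in a live app: print(token, end="", flush=True)
print(final_text)
```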
Custom updates
To stream updates from tools as they are executed, you can use `get_stream_writer`.
Streaming custom updates
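A minimal sketch of the custom-update pattern. In a real tool you would call langgraph's `get_stream_writer()` inside the tool body; here a plain list stands in for the writer so the sketch runs without a LangGraph execution context.

```python
# Stand-in for writer = get_stream_writer(); the real writer pushes
# each value into the "custom" stream.
events = []
writer = events.append

def fetch_records(n: int) -> str:
    """Fetch n records, emitting progress as custom updates."""
    for i in range(0, n, 10):
        writer(f"Fetched {min(i + 10, n)}/{n} records")
    return f"{n} records"

fetch_records(30)
print(events)
```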
If you add `get_stream_writer` inside your tool, you won't be able to invoke the tool outside of a LangGraph execution context.

Stream multiple modes
You can specify multiple streaming modes by passing stream mode as a list: `stream_mode=["updates", "custom"]`.
The streamed outputs will be tuples of `(mode, chunk)`, where `mode` is the name of the stream mode and `chunk` is the data streamed by that mode.
Streaming multiple modes
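A sketch of routing on the `(mode, chunk)` tuples described above. The data is simulated; with a real agent you would iterate `agent.stream(inputs, stream_mode=["updates", "custom"])`.

```python
# Simulated multi-mode stream: each item is a (mode, chunk) tuple.
simulated = [
    ("updates", {"model": {"messages": ["AIMessage(tool_calls=[...])"]}}),
    ("custom", "Fetched 10/100 records"),
    ("updates", {"tools": {"messages": ["ToolMessage(...)"]}}),
]

for mode, chunk in simulated:
    if mode == "custom":
        print("progress:", chunk)
    else:  # "updates": dict keyed by the node that ran
        print("step:", list(chunk))
```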
Common patterns
Below are examples showing common use cases for streaming.

Streaming thinking / reasoning tokens
Some models perform internal reasoning before producing a final answer. You can stream these thinking / reasoning tokens as they're generated by filtering standard content blocks for the type `"reasoning"`.

Reasoning output must be enabled on the model. See the reasoning section and your provider's integration page for configuration details. To quickly check a model's reasoning support, see models.dev.

Use `stream_mode="messages"` and filter for reasoning content blocks:
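A sketch of the filtering loop. Plain dicts stand in for streamed message chunks here so the example runs without a model; the `"reasoning"` and `"text"` keys mirror the normalized content block shape, but treat the exact field names as an assumption to verify against your provider's output.

```python
# Simulated message chunks, each exposing normalized content blocks.
simulated_chunks = [
    {"content_blocks": [{"type": "reasoning", "reasoning": "User wants weather; "}]},
    {"content_blocks": [{"type": "reasoning", "reasoning": "call the tool."}]},
    {"content_blocks": [{"type": "text", "text": "It's sunny."}]},
]

reasoning, answer = "", ""
for chunk in simulated_chunks:
    for block in chunk["content_blocks"]:
        if block["type"] == "reasoning":
            reasoning += block.get("reasoning", "")
        elif block["type"] == "text":
            answer += block.get("text", "")
print("reasoning:", reasoning)
print("answer:", answer)
```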
LangChain normalizes provider-specific reasoning formats (e.g., Anthropic thinking blocks, OpenAI reasoning summaries) into a standard `"reasoning"` content block type via the `content_blocks` property.
To stream reasoning tokens directly from a chat model (without an agent), see streaming with chat models.
Streaming tool calls
You may want to stream both:

- Partial JSON as tool calls are generated
- The completed, parsed tool calls that are executed

`stream_mode="messages"` will stream incremental message chunks generated by all LLM calls in the agent. To access the completed messages with parsed tool calls:

- If those messages are tracked in the state (as in the model node of `create_agent`), use `stream_mode=["messages", "updates"]` to access completed messages through state updates (demonstrated below).
- If those messages are not tracked in the state, use custom updates or aggregate the chunks during the streaming loop (next section).
Refer to the section below on streaming from sub-agents if your agent includes multiple LLMs.
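A sketch of watching both forms at once. The tuples below are simulated stand-ins shaped like LangChain's tool-call chunks (partial `args` JSON accumulates across `"messages"` items, while the `"updates"` item carries the completed, parsed call); with a real agent you would iterate `agent.stream(inputs, stream_mode=["messages", "updates"])`.

```python
partial_args = ""
simulated = [
    ("messages", {"tool_call_chunks": [{"name": "get_weather", "args": '{"city": '}]}),
    ("messages", {"tool_call_chunks": [{"name": None, "args": '"SF"}'}]}),
    ("updates", {"model": {"tool_calls": [{"name": "get_weather", "args": {"city": "SF"}}]}}),
]

for mode, chunk in simulated:
    if mode == "messages":
        # Accumulate the partial JSON as it is generated.
        for tc in chunk.get("tool_call_chunks", []):
            partial_args += tc["args"] or ""
            print("partial:", partial_args)
    else:
        # Completed, parsed tool calls arrive via state updates.
        for call in chunk["model"]["tool_calls"]:
            print("completed:", call["name"], call["args"])
```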
Accessing completed messages
In some cases, completed messages are not reflected in state updates. If you have access to the agent internals, you can use custom updates to access these messages during streaming. Otherwise, you can aggregate message chunks in the streaming loop (see below).
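The chunk-aggregation fallback can be sketched as follows. Strings stand in for real message chunks here; with LangChain's `AIMessageChunk` objects, `+` merges content (and partial tool calls) in the same accumulate-as-you-go way.

```python
# Simulated streamed chunks; real code would receive AIMessageChunk objects.
chunks = ["The answer", " is", " 42."]

completed = None
for chunk in chunks:
    # Merge each chunk into the running message; `+` concatenates here,
    # and merges content/tool-call fragments for real message chunks.
    completed = chunk if completed is None else completed + chunk
print(completed)
```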
Consider the example below, where we incorporate a stream writer into a simplified guardrail middleware. This middleware demonstrates tool calling to generate a structured "safe / unsafe" evaluation (one could also use structured outputs for this):
Streaming with human-in-the-loop
To handle human-in-the-loop interrupts, we build on the above example:

- We configure the agent with human-in-the-loop middleware and a checkpointer
- We collect interrupts generated during the `"updates"` stream mode
- We respond to those interrupts with a command
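A sketch of the interrupt-collection step. In `"updates"` mode an interrupt arrives as an `"__interrupt__"` entry; the chunks below are simulated. With a real agent you would then resume by streaming a `Command(resume=...)` with the same thread config, which is elided here.

```python
# Simulated "updates" chunks: the second one carries an interrupt.
simulated_updates = [
    {"model": {"messages": ["AIMessage(tool_calls=[send_email])"]}},
    {"__interrupt__": [{"value": "Approve sending this email?"}]},
]

pending = []
for chunk in simulated_updates:
    if "__interrupt__" in chunk:
        pending.extend(chunk["__interrupt__"])  # collect for human review
    else:
        print("step:", list(chunk))

for interrupt in pending:
    print("needs human input:", interrupt["value"])
    # Real code resumes with: agent.stream(Command(resume=...), config, ...)
```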
Streaming from sub-agents
When there are multiple LLMs at any point in an agent, it's often necessary to disambiguate the source of messages as they are generated. To do this, pass a `name` to each agent when creating it. This name is then available in metadata via the `lc_agent_name` key when streaming in `"messages"` mode.
Below, we update the streaming tool calls example:
- We replace our tool with a `call_weather_agent` tool that invokes an agent internally
- We add a `name` to each agent
- We specify `subgraphs=True` when creating the stream
- Our stream processing is identical to before, but we add logic to keep track of which agent is active using `create_agent`'s `name` parameter
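A sketch of grouping streamed tokens by their source agent via the `lc_agent_name` metadata key. The `(token, metadata)` tuples are simulated, and the agent names (`supervisor`, `weather_agent`) are hypothetical; a real `subgraphs=True` stream supplies equivalent metadata.

```python
# Simulated tokens from two differently named agents.
simulated = [
    ("Checking", {"lc_agent_name": "supervisor"}),
    (" weather", {"lc_agent_name": "weather_agent"}),
    (" now.", {"lc_agent_name": "weather_agent"}),
]

tokens_by_agent = {}
for token, metadata in simulated:
    # lc_agent_name identifies which agent produced this token.
    name = metadata["lc_agent_name"]
    tokens_by_agent.setdefault(name, []).append(token)
print(tokens_by_agent)
```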
Disable streaming
In some applications you might need to disable streaming of individual tokens for a given model. This is useful when:

- Working with multi-agent systems to control which agents stream their output
- Mixing models that support streaming with those that do not
- Deploying to LangSmith and wanting to prevent certain model outputs from being streamed to the client

To disable streaming, set `streaming=False` when initializing the model.
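A configuration sketch, assuming `init_chat_model` and an OpenAI model string; any chat model integration's constructor accepts the same idea.

```python
from langchain.chat_models import init_chat_model

# Disable token streaming for this model instance.
model = init_chat_model("openai:gpt-4o", streaming=False)

# For integrations without a `streaming` parameter, use the base-class flag:
# model = init_chat_model("openai:gpt-4o", disable_streaming=True)
```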
Not all chat model integrations support the `streaming` parameter. If your model doesn't support it, use `disable_streaming=True` instead. This parameter is available on all chat models via the base class.

Related
- Frontend streaming: Build React UIs with `useStream` for real-time agent interactions
- Streaming with chat models: Stream tokens directly from a chat model without using an agent or graph
- Reasoning with chat models: Configure and access reasoning output from chat models
- Standard content blocks: Understand the normalized content block format used for reasoning, text, and other content types
- Streaming with human-in-the-loop: Stream agent progress while handling interrupts for human review
- LangGraph streaming: Advanced streaming options including `values`, `debug` modes, and subgraph streaming