
Streaming: Definition and Examples

Streaming is a technique for transmitting AI model responses in real time, token by token, rather than waiting for the complete generation before display.

Full definition

Streaming, in the context of generative artificial intelligence, is the transmission mode in which the response of a large language model (LLM) is sent to the user progressively, token by token, as it is generated. Unlike the classic non-streaming mode (sometimes called 'batch'), where the user must wait for the entire response to be produced, streaming displays text as it arrives, which makes the interaction feel smooth and responsive.

This approach relies on communication protocols such as Server-Sent Events (SSE) or WebSockets, which maintain an open connection between the model server and the client. Each generated text fragment is transmitted immediately, significantly reducing the perceived wait time for the user, even though the total generation time remains the same.
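To make the SSE mechanism concrete, here is a minimal sketch of how a client might split an SSE text stream into events. It follows the basic framing rules of the format (fields such as `event:` and `data:`, a blank line ending each event, lines starting with `:` treated as comments). The function name `parse_sse` and the sample event names are assumptions made for illustration, not part of any provider's API.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of text lines (sketch)."""
    event_type = "message"  # default event type when no "event:" field is sent
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line marks the end of an event: dispatch it if it has data.
            if data_parts:
                yield {"event": event_type, "data": "\n".join(data_parts)}
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)
        # Other fields (id, retry) are ignored in this sketch.


# Hypothetical stream: two text fragments, a keep-alive comment, an end marker.
sample = [
    "event: delta\n",
    "data: Hel\n",
    "\n",
    ": keep-alive\n",
    "event: delta\n",
    "data: lo\n",
    "\n",
    "event: done\n",
    "data: end\n",
    "\n",
]
events = list(parse_sse(sample))
text = "".join(e["data"] for e in events if e["event"] == "delta")
```

In a real application the lines would come from an open HTTP response rather than a list, so each fragment can be displayed as soon as its blank-line terminator arrives.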

Streaming has become a standard in modern conversational interfaces like ChatGPT, Claude, or Gemini. It not only improves user experience by providing a sense of responsiveness but also allows developers to implement advanced features: progressive display with formatting, early interruption of generation, or intermediate processing of received tokens.

For developers integrating AI APIs, streaming involves a different handling of responses: instead of receiving a single JSON object, they process a stream of events that must be assembled on the client side. Most modern SDKs (OpenAI, Anthropic, Google) offer dedicated helpers to simplify this management.
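The client-side assembly described above can be sketched as follows. The event shape used here (a JSON object with a `type` of `text_delta` or `done`) is a simplified assumption for illustration; each provider defines its own event names and fields, and the official SDK helpers hide most of this work.

```python
import json
from typing import Iterable


def assemble_text(payloads: Iterable[str]) -> str:
    """Rebuild the full response from a stream of JSON event payloads (sketch)."""
    parts: list[str] = []
    for payload in payloads:
        event = json.loads(payload)
        if event.get("type") == "text_delta":
            # Hypothetical event carrying one fragment of generated text.
            parts.append(event["text"])
        elif event.get("type") == "done":
            # Hypothetical end-of-stream marker.
            break
    return "".join(parts)


# Simulated stream; in practice these payloads arrive one by one over the network.
stream = [
    '{"type": "text_delta", "text": "Stream"}',
    '{"type": "text_delta", "text": "ing "}',
    '{"type": "text_delta", "text": "works."}',
    '{"type": "done"}',
]
full_text = assemble_text(stream)
```

Collecting fragments in a list and joining once at the end avoids repeated string copies on long responses.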

Etymology

The term 'streaming' comes from the English word 'stream' (flow, current). Borrowed from the multimedia domain, where it refers to the continuous transmission of audio or video content without prior download, it has been adopted in generative AI to describe the continuous and progressive transmission of tokens generated by a language model.

Concrete examples

API integration with streaming enabled

Use the Claude API with stream=true to display the response progressively in my chat interface.

User experience improvement

Generate a detailed analysis of this document. I want to see your response appear in real time so I can start reading while you continue writing.

Developing an interruptible chatbot

Implement a React component that displays streaming responses and allows the user to cancel the ongoing generation with a Stop button.

Practical usage

In prompt engineering, streaming does not affect response quality but transforms the interaction experience. Enable it systematically in your conversational applications to reduce perceived wait time. On the development side, always include a cancellation mechanism and an accumulation buffer to properly handle progressive Markdown rendering.
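A minimal sketch of the cancellation mechanism and accumulation buffer mentioned above, assuming the stream is exposed as an iterable of text chunks. The names `consume_stream`, `cancel`, and `render` are hypothetical. The renderer receives the whole buffer on every update, so a Markdown marker that is still open (such as `**Bold`) is re-rendered correctly once its closing half arrives.

```python
import threading
from typing import Callable, Iterable


def consume_stream(
    chunks: Iterable[str],
    cancel: threading.Event,
    on_update: Callable[[str], None],
) -> str:
    """Accumulate chunks into a buffer, stopping early if cancel is set."""
    buffer = ""
    for chunk in chunks:
        if cancel.is_set():
            break  # the user pressed Stop: ignore the rest of the stream
        buffer += chunk
        on_update(buffer)  # re-render from the full buffer, not from the chunk
    return buffer


cancel = threading.Event()
snapshots: list[str] = []


def render(text: str) -> None:
    snapshots.append(text)
    if len(snapshots) == 3:
        cancel.set()  # simulate a click on the Stop button after three updates


chunks = ["**Bold", " text**", " and", " more", " text"]
result = consume_stream(chunks, cancel, render)
```

With a real API, stopping the loop should also close the underlying connection so the server can stop sending data; how that is done depends on the SDK or HTTP client in use.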

Related concepts

Token, Latency, Server-Sent Events (SSE), API

FAQ

Does streaming change the quality of AI responses?
No, streaming is only a transmission mode. The model generates its response in the same way as in batch mode; only the way the response is delivered to the user changes. Tokens are sent as they are generated instead of being grouped into a single response.
Does streaming consume more tokens or cost more?
No, the number of tokens consumed is identical in streaming and classic modes. Billing is the same for all major providers (OpenAI, Anthropic, Google). Streaming adds no extra cost; it only changes the delivery protocol of the response.
Can streaming be used with tools (function calling / tool use)?
Yes, modern APIs support streaming combined with tool use. Tool calls are transmitted progressively, and the developer can detect when a tool call is in progress to adapt the display. This allows, for example, showing a loading indicator during tool execution, then resuming streaming text display.
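
As a rough illustration of that last answer, the sketch below walks through a stream that mixes text and a tool call. The event types (`text_delta`, `tool_call_start`, `tool_call_delta`, `tool_call_end`) and the `get_weather` tool are invented for the example; real APIs use their own event names and structures. The point shown is that tool arguments arrive as partial JSON that must be buffered until the call is complete.

```python
import json
from typing import Iterable


def handle_events(events: Iterable[dict]) -> list[tuple]:
    """Turn a mixed text/tool-call stream into a list of UI actions (sketch)."""
    actions: list[tuple] = []
    tool_name = None
    args_buffer = ""
    for event in events:
        kind = event["type"]
        if kind == "text_delta":
            actions.append(("text", event["text"]))
        elif kind == "tool_call_start":
            tool_name = event["name"]
            args_buffer = ""
            actions.append(("spinner_on", tool_name))  # show a loading indicator
        elif kind == "tool_call_delta":
            args_buffer += event["arguments"]  # partial JSON, not yet parseable
        elif kind == "tool_call_end":
            args = json.loads(args_buffer)  # arguments are complete only now
            actions.append(("run_tool", tool_name, args))
            actions.append(("spinner_off",))
    return actions


# Hypothetical stream: text, then a tool call split across deltas, then text.
events = [
    {"type": "text_delta", "text": "Let me check. "},
    {"type": "tool_call_start", "name": "get_weather"},
    {"type": "tool_call_delta", "arguments": '{"city": '},
    {"type": "tool_call_delta", "arguments": '"Paris"}'},
    {"type": "tool_call_end"},
    {"type": "text_delta", "text": "It is sunny."},
]
actions = handle_events(events)
```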
