Streaming: Definition and Examples
Streaming is a technique for transmitting AI model responses in real time, token by token, rather than waiting for the complete generation before display.
Full definition
Streaming, in the context of generative artificial intelligence, refers to the transmission mode in which a large language model (LLM) sends its response to the user progressively, token by token (roughly word by word), as it is generated. Unlike the classic non-streaming mode (sometimes called 'batch'), where the user must wait for the entire response to be produced, streaming displays text as it arrives, creating a smooth and interactive experience.
This approach relies on communication protocols such as Server-Sent Events (SSE) or WebSockets, which keep a connection open between the model server and the client. Each generated text fragment is transmitted immediately, so the user sees the first words almost as soon as generation starts. This significantly reduces the perceived wait time, even though the total generation time remains the same.
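To make the SSE side of this more concrete, here is a minimal sketch of a client-side parser for the SSE text format. The function name `parse_sse` is hypothetical, and the sketch only handles the `event` and `data` fields plus comment lines; a production client would normally rely on an existing SSE library or on the SDK of the AI provider.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of text lines.

    Hypothetical helper for illustration; not part of any SDK.
    """
    event_type = "message"  # default event type in the SSE format
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield {"event": event_type, "data": "\n".join(data_parts)}
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)
        # Other fields (id, retry) are ignored in this sketch.
```

Each yielded dictionary corresponds to one fragment pushed by the server; the client can display its `data` as soon as it arrives instead of waiting for the connection to close.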
Streaming has become a standard in modern conversational interfaces like ChatGPT, Claude, or Gemini. It not only improves user experience by providing a sense of responsiveness but also allows developers to implement advanced features: progressive display with formatting, early interruption of generation, or intermediate processing of received tokens.
For developers integrating AI APIs, streaming requires handling responses differently: instead of receiving a single JSON object, they process a stream of events that must be assembled on the client side. Most modern SDKs (OpenAI, Anthropic, Google) offer dedicated helpers to simplify this work.
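The client-side assembly described above can be sketched as follows. The event shape used here (`{"type": "text_delta", "text": ...}` followed by `{"type": "done"}`) is a simplified assumption made for illustration, not the exact format of any provider, and the names `iter_text` and `collect` are hypothetical.

```python
import json
from typing import Callable, Iterable, Iterator, Optional


def iter_text(payloads: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from JSON event payloads (assumed shape)."""
    for payload in payloads:
        event = json.loads(payload)
        if event.get("type") == "text_delta":
            yield event["text"]
        elif event.get("type") == "done":
            return  # end of stream: ignore anything that follows


def collect(
    payloads: Iterable[str],
    on_fragment: Optional[Callable[[str], None]] = None,
) -> str:
    """Assemble the full response, calling on_fragment for each piece."""
    parts = []
    for fragment in iter_text(payloads):
        if on_fragment is not None:
            on_fragment(fragment)  # e.g. append the fragment to the UI
        parts.append(fragment)
    return "".join(parts)
```

The callback is where progressive display happens, while the joined string is what a non-streaming call would have returned in one piece.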
Etymology
The term 'streaming' comes from the English word 'stream' (flow, current). Borrowed from the multimedia domain, where it refers to the continuous transmission of audio or video content without prior download, it has been adopted in generative AI to describe the continuous and progressive transmission of tokens generated by a language model.
Concrete examples
API integration with streaming enabled
Use the Claude API with stream=true to display the response progressively in my chat interface.
User experience improvement
Generate a detailed analysis of this document. I want to see your response appear in real time so I can start reading while you continue writing.
Developing an interruptible chatbot
Implement a React component that displays streaming responses and allows the user to cancel the ongoing generation with a Stop button.
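The cancellation logic behind such a Stop button does not depend on React. A minimal sketch in Python, assuming the tokens arrive through an ordinary iterator and that the button sets a shared flag (the name `stream_until_stopped` is hypothetical):

```python
import threading
from typing import Iterable, Iterator


def stream_until_stopped(
    tokens: Iterable[str], stop: threading.Event
) -> Iterator[str]:
    """Yield tokens until the stop flag is set (hypothetical helper)."""
    for token in tokens:
        if stop.is_set():
            break  # the user pressed Stop: abandon the rest of the stream
        yield token


# Usage: the UI thread would call stop.set() when Stop is clicked.
stop = threading.Event()
shown = []
for token in stream_until_stopped(["Hello", ", ", "world", "!"], stop):
    shown.append(token)
    if len(shown) == 2:
        stop.set()  # simulate a click on Stop after two tokens
```

In a real application, the underlying network connection should also be closed when the flag is set, so that the client stops receiving data it no longer needs.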
Practical usage
In prompt engineering, streaming does not affect response quality, but it transforms the interaction experience. Enable it by default in your conversational applications to reduce perceived wait time. On the development side, always include a cancellation mechanism and an accumulation buffer so that partially received Markdown renders correctly.
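One possible shape for such an accumulation buffer is sketched below. It assumes the only Markdown construct that needs care is the fenced code block, and the class name `MarkdownBuffer` is hypothetical. The idea is to keep every received chunk and, when a code fence is still open, close it temporarily so the partial text renders cleanly.

```python
FENCE = "`" * 3  # the Markdown code-fence marker


class MarkdownBuffer:
    """Accumulate streamed chunks and return text that is safe to render."""

    def __init__(self) -> None:
        self._chunks = []

    def push(self, chunk: str) -> str:
        """Store a new chunk and return the current renderable text."""
        self._chunks.append(chunk)
        return self.renderable()

    def renderable(self) -> str:
        text = "".join(self._chunks)
        if text.count(FENCE) % 2 == 1:
            # An odd number of fences means a code block is still open:
            # close it for display only, without altering the buffer.
            if not text.endswith("\n"):
                text += "\n"
            text += FENCE
        return text
```

This sketch does not handle a fence marker that is itself split across two chunks, nor other constructs such as unfinished tables or links; a complete implementation would need additional rules or a streaming-aware Markdown renderer.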
FAQ
Does streaming change the quality of AI responses?
Does streaming consume more tokens or cost more?
Can streaming be used with tools (function calling / tool use)?