Whisper: Definition and Examples
Whisper is an automatic speech recognition (ASR) model developed by OpenAI, capable of transcribing and translating speech into text with remarkable accuracy in many languages.
Full definition
Whisper is an automatic speech recognition (ASR) system created by OpenAI and released as open source in September 2022. Trained on 680,000 hours of multilingual, multitask supervised audio collected from the web, it can transcribe speech to text, automatically detect the spoken language, and translate from many languages into English.
Unlike traditional ASR systems that require language- or domain-specific training, Whisper adopts a large-scale multitask supervised approach. Its architecture is an encoder-decoder Transformer: audio is divided into 30-second segments, each segment is converted to a log-Mel spectrogram and passed to the encoder, and the decoder then generates the corresponding text. The scale and diversity of the training data, more than the architecture itself, give it strong robustness to accents, background noise, and technical jargon.
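The 30-second segmentation step can be sketched in a few lines of plain Python. This is only an illustration: the 16 kHz sampling rate and the 160-sample frame hop are assumptions taken from the reference implementation rather than from the text above, and `split_into_windows` and `frames_per_window` are hypothetical helpers, not functions of any Whisper library.

```python
SAMPLE_RATE = 16_000      # assumption: samples per second expected by the model
WINDOW_SECONDS = 30       # segment length described above
HOP_LENGTH = 160          # assumption: samples between two spectrogram frames (10 ms)
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def split_into_windows(samples, window=WINDOW_SAMPLES):
    """Cut a sequence of samples into fixed-size windows, zero-padding the last one."""
    windows = []
    for start in range(0, len(samples), window):
        chunk = list(samples[start:start + window])
        # Pad the final, shorter chunk with silence so every window has the same length.
        chunk.extend([0.0] * (window - len(chunk)))
        windows.append(chunk)
    return windows


def frames_per_window(window=WINDOW_SAMPLES, hop=HOP_LENGTH):
    """Number of spectrogram frames the encoder sees for one window."""
    return window // hop


if __name__ == "__main__":
    audio = [0.1] * (SAMPLE_RATE * 70)  # 70 seconds of dummy audio
    windows = split_into_windows(audio)
    print(len(windows), len(windows[-1]), frames_per_window())
```

Under these assumptions, 70 seconds of audio yields three windows, the last one padded with silence, and each window corresponds to 3,000 spectrogram frames.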
Whisper is available in several model sizes (tiny, base, small, medium, large), allowing a trade-off between accuracy and speed: smaller models run faster and need less memory, while larger ones transcribe more accurately. According to OpenAI, Whisper approaches human-level accuracy on English speech recognition; results vary by language. Its open-source nature has spawned a rich ecosystem: faster-whisper for optimized inference, whisper.cpp for lightweight local execution (including on CPU), and integrations into many productivity tools.
In the field of prompt engineering, Whisper plays a key role by converting voice instructions into text usable by language models. It thus serves as an essential link in multimodal pipelines where voice becomes a natural interface for interacting with AI.
Etymology
The name "Whisper" is generally read as a reference to the model's ability to capture and transcribe even subtle, hard-to-hear speech. It also evokes discretion and precision, in keeping with a system designed to work in difficult audio conditions.
Concrete examples
Transcribing a meeting to feed an LLM summary
Transcribe this meeting audio recording with Whisper, then summarize the key decisions and action items using the following format: Decision | Responsible | Deadline.
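In code, the second half of this prompt amounts to wrapping the transcript in the summarization instruction before sending it to an LLM. The sketch below assumes the transcript is already available as a string; `build_summary_prompt` is a hypothetical helper, and the call to the LLM itself is left out.

```python
def build_summary_prompt(transcript: str) -> str:
    """Wrap a meeting transcript in the summarization instruction."""
    # Collapse the line breaks and repeated spaces that transcription often leaves behind.
    cleaned = " ".join(transcript.split())
    return (
        "Summarize the key decisions and action items from the meeting "
        "transcript below.\n"
        "Use one line per item in this format: Decision | Responsible | Deadline.\n\n"
        f"Transcript:\n{cleaned}"
    )


if __name__ == "__main__":
    print(build_summary_prompt("We ship  on Friday.\nAna owns the release notes."))
```

Keeping the output format in the prompt, rather than parsing free text afterwards, makes the summary easier to paste into a tracker or spreadsheet.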
Creating multilingual subtitles for video content
Use Whisper to transcribe this French-language video, then translate the subtitles into English and Spanish while preserving the timecodes of each segment.
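Preserving timecodes is mostly a matter of carrying each segment's start and end times through the translation step unchanged. The sketch below assumes segments are dictionaries with `start`, `end` (in seconds) and `text` keys, which is the shape returned by the reference `openai-whisper` package, and writes them out as SRT subtitles. The `translate` argument is a hypothetical callable standing in for whatever translation service or LLM is used.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timecode (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def translate_segments(segments, translate):
    """Return new segments with translated text and unchanged timecodes."""
    return [{**seg, "text": translate(seg["text"])} for seg in segments]


def segments_to_srt(segments) -> str:
    """Serialize segments as numbered SRT subtitle blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    french = [
        {"start": 0.0, "end": 2.5, "text": " Bonjour à tous."},
        {"start": 2.5, "end": 4.0, "text": " Merci d'être là."},
    ]
    # Stand-in translator for the demo: a real one would call a translation model.
    english = translate_segments(french, lambda text: text.upper())
    print(segments_to_srt(english))
```

Translating segment by segment keeps the subtitles aligned with the audio, at the cost of giving the translator less context than a full-transcript translation would.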
Voice interface for an AI assistant
Set up a voice pipeline: audio capture → Whisper transcription → send text to Claude → speech synthesis of the response. The user should be able to ask questions about their documents by speaking naturally.
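One way to structure such a pipeline is to treat each stage as an interchangeable function. The sketch below is a minimal illustration under that assumption: `transcribe`, `ask_llm` and `synthesize` are hypothetical callables that would wrap Whisper, an LLM API and a text-to-speech engine respectively, and are replaced by stubs in the demo.

```python
from typing import Callable


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    ask_llm: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one turn: audio in -> transcript -> LLM answer -> audio out."""
    question = transcribe(audio).strip()
    if not question:
        # Nothing intelligible was heard: ask the user to repeat instead of calling the LLM.
        return synthesize("Sorry, I did not catch that. Could you repeat?")
    answer = ask_llm(question)
    return synthesize(answer)


if __name__ == "__main__":
    # Stub stages so the sketch runs without any audio hardware or API key.
    reply = run_voice_turn(
        b"raw-audio",
        transcribe=lambda audio: "What does my contract say about notice periods?",
        ask_llm=lambda question: f"You asked: {question}",
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(reply.decode("utf-8"))
```

Passing the stages in as arguments makes it easy to swap the transcription backend (for example a smaller model for lower latency) without touching the rest of the loop.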
Practical usage
In prompt engineering, Whisper is primarily used to convert voice inputs into text before submitting them to an LLM. It enables building voice-to-text-to-AI pipelines where users dictate their prompts instead of typing them. You can also use it to transcribe large audio corpora (podcasts, interviews, lectures) for indexing and querying via a RAG system.
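For the RAG use case, long transcripts usually need to be split into smaller passages before indexing. The sketch below shows one simple approach, overlapping word windows; the window size and overlap are arbitrary assumptions to tune, and `chunk_transcript` is a hypothetical helper rather than part of any RAG framework.

```python
def chunk_transcript(text: str, max_words: int = 200, overlap: int = 40):
    """Split a transcript into overlapping chunks of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the transcript
    return chunks


if __name__ == "__main__":
    transcript = "word " * 450
    for i, chunk in enumerate(chunk_transcript(transcript)):
        print(i, len(chunk.split()))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, which tends to help retrieval on spoken content where sentence boundaries are fuzzy.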