Whisper: Definition and Examples

Whisper is an automatic speech recognition (ASR) model developed by OpenAI, capable of transcribing and translating speech into text with remarkable accuracy in many languages.

Full definition

Whisper is an automatic speech recognition (ASR) system created by OpenAI and released as open source in September 2022. Trained on 680,000 hours of multilingual audio collected from the web, it can transcribe speech to text, automatically detect the spoken language, and translate speech from many languages into English.

Unlike traditional ASR systems that require language- or domain-specific training, Whisper adopts a large-scale multitask supervised approach. Its architecture is an encoder-decoder Transformer: audio is divided into 30-second segments, each segment is converted to a log-Mel spectrogram and processed by the encoder, and the decoder then generates the corresponding text. Combined with the scale and diversity of the training data, this design makes Whisper robust to accents, background noise, and technical jargon.
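To make the segmentation step concrete, here is a minimal sketch that splits a stream of audio samples into 30-second windows and zero-pads the last one. The constants (16 kHz sample rate, 160-sample hop) are the values used by the reference openai-whisper code; the helper name chunk_audio is hypothetical, and the fixed, non-overlapping windows are a simplification of how the real transcription loop advances through a file.

```python
SAMPLE_RATE = 16_000                      # samples per second expected by Whisper
CHUNK_SECONDS = 30                        # length of one input window
HOP_LENGTH = 160                          # samples between spectrogram frames
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per window
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3,000 spectrogram frames per window


def chunk_audio(samples):
    """Split a sequence of samples into zero-padded 30-second windows."""
    chunks = []
    for start in range(0, len(samples), N_SAMPLES):
        window = list(samples[start:start + N_SAMPLES])
        # Pad the final window with silence so every window has the same length.
        window.extend([0.0] * (N_SAMPLES - len(window)))
        chunks.append(window)
    return chunks


if __name__ == "__main__":
    seventy_seconds = [0.0] * (70 * SAMPLE_RATE)
    windows = chunk_audio(seventy_seconds)
    print(len(windows), "windows of", N_FRAMES, "frames each")
```

Each window would then be turned into a log-Mel spectrogram before reaching the encoder; that step is left out here because it needs a numerical library.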

Whisper is available in several model sizes (tiny, base, small, medium, large), allowing a trade-off between accuracy and speed. On several benchmarks, the large model approaches human-level transcription accuracy. Its open-source nature has spawned a rich ecosystem: faster-whisper for optimized inference, whisper.cpp for local CPU execution, and integrations into many productivity tools.
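The size trade-off can be sketched as a small helper that picks the largest model fitting a parameter budget. The parameter counts are the approximate figures listed in the openai-whisper README; pick_model and the budget parameter are hypothetical names, and the transcribe function assumes the openai-whisper package and ffmpeg are installed.

```python
# Approximate parameter counts in millions, per the openai-whisper README.
MODEL_PARAMS_M = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1550}


def pick_model(max_params_m):
    """Return the largest model whose parameter count fits the budget."""
    candidates = [name for name, size in MODEL_PARAMS_M.items() if size <= max_params_m]
    if not candidates:
        raise ValueError("no Whisper model fits this budget")
    return max(candidates, key=MODEL_PARAMS_M.get)


def transcribe(path, max_params_m=250):
    """Transcribe an audio file with the largest model that fits the budget."""
    import whisper  # third-party: pip install openai-whisper

    model = whisper.load_model(pick_model(max_params_m))
    return model.transcribe(path)["text"]
```

With the default budget of 250 million parameters, pick_model selects small; raising the budget trades speed and memory for accuracy.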

In the field of prompt engineering, Whisper plays a key role by converting voice instructions into text usable by language models. It thus serves as an essential link in multimodal pipelines where voice becomes a natural interface for interacting with AI.

Etymology

The name "Whisper" refers to the model's ability to capture and transcribe even the subtlest speech. It also evokes the system's discretion and precision, capable of operating effectively even in difficult audio conditions.

Concrete examples

Transcribing a meeting to feed an LLM summary

Transcribe this meeting audio recording with Whisper, then summarize the key decisions and action items using the following format: Decision | Responsible | Deadline.
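One way to wire this prompt into code is sketched below: a first function transcribes the recording, a second wraps the transcript in the summary instruction. The function names and the choice of the base model are assumptions for illustration; sending the prompt to an LLM is left to whichever client you use.

```python
def transcribe_meeting(path, model_name="base"):
    """Return the plain-text transcript of a meeting recording."""
    import whisper  # third-party: pip install openai-whisper

    return whisper.load_model(model_name).transcribe(path)["text"]


def build_summary_prompt(transcript):
    """Wrap a meeting transcript in the summary instruction from the example."""
    return (
        "Summarize the key decisions and action items from this meeting "
        "using the following format: Decision | Responsible | Deadline.\n\n"
        "Transcript:\n" + transcript.strip()
    )
```

Keeping prompt construction separate from transcription makes it easy to reuse the same transcript with other instructions.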

Creating multilingual subtitles for video content

Use Whisper to transcribe this French video, then translate the subtitles into English and Spanish while preserving the timecodes of each segment.
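Preserving timecodes is mostly a matter of carrying each segment's start and end times through to the subtitle file. The sketch below converts segments shaped like those returned by openai-whisper (dictionaries with start, end, and text keys) into SubRip (SRT) text; the helper names are hypothetical. Note that Whisper itself translates only into English, so the Spanish track would need a separate translation step applied to each segment's text.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to the SRT timecode form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render a list of {'start', 'end', 'text'} segments as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

To produce a translated track, replace each segment's text with its translation and call segments_to_srt again; the start and end values stay untouched.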

Voice interface for an AI assistant

Set up a voice pipeline: audio capture → Whisper transcription → send text to Claude → speech synthesis of the response. The user should be able to ask questions about their documents by speaking naturally.
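The pipeline above can be sketched as one function that chains three interchangeable steps. Everything here is a hypothetical skeleton: voice_turn and its parameters are illustrative names, and the three callables stand in for a Whisper transcription call, an LLM client, and a speech synthesis engine of your choice.

```python
from typing import Callable


def voice_turn(
    audio_path: str,
    transcribe: Callable[[str], str],
    ask_llm: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one voice interaction: audio in, synthesized answer out."""
    question = transcribe(audio_path)   # speech to text (for example, Whisper)
    answer = ask_llm(question)          # text to text (any LLM client)
    return synthesize(answer)           # text to speech (any TTS engine)


if __name__ == "__main__":
    # Stub components so the flow can be exercised without any model.
    audio = voice_turn(
        "question.wav",
        transcribe=lambda path: "What is in my report?",
        ask_llm=lambda question: "Answer to: " + question,
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(audio)
```

Passing the steps in as functions keeps each stage replaceable and lets the flow be tested with stubs before any real model is connected.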

Practical usage

In prompt engineering, Whisper is primarily used to convert voice inputs into text before submitting them to an LLM. It enables building voice-to-text-to-AI pipelines where users dictate their prompts instead of typing them. You can also use it to transcribe large audio corpora (podcasts, interviews, lectures) for indexing and querying via a RAG system.
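For the RAG use case, transcripts are usually split into passages before indexing. The sketch below merges consecutive Whisper-style segments (dictionaries with start, end, and text keys) into passages of bounded length while keeping their time range, so a retrieved passage can point back to the right moment in the audio. The function name and the 500-character default are assumptions, not a recommended setting.

```python
def group_segments(segments, max_chars=500):
    """Merge consecutive segments into passages of roughly max_chars characters.

    A single segment longer than max_chars is kept whole as its own passage.
    """
    passages, current = [], None
    for seg in segments:
        text = seg["text"].strip()
        if current and len(current["text"]) + 1 + len(text) <= max_chars:
            # Still room in the current passage: append and extend its end time.
            current["text"] += " " + text
            current["end"] = seg["end"]
        else:
            if current:
                passages.append(current)
            current = {"start": seg["start"], "end": seg["end"], "text": text}
    if current:
        passages.append(current)
    return passages
```

Each resulting passage can then be embedded and stored with its start and end times as metadata.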

Related concepts

Automatic speech recognition (ASR)
Automatic speech processing (audio NLP)
Multimodal model
Text-to-Speech (TTS)

FAQ

Is Whisper free and open source?
Yes, Whisper is released under the MIT license by OpenAI. The source code and model weights are freely available on GitHub. You can run it locally without any API cost. OpenAI also offers a paid Whisper API for those who prefer not to manage the infrastructure.
How accurate is Whisper in French?
Whisper large-v3 achieves excellent performance in French, with a word error rate (WER) comparable to leading commercial solutions. It handles regional accents and technical vocabulary well, although errors may occur on rare proper nouns or highly specialized jargon.
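Word error rate is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal implementation for checking Whisper on your own French recordings; it splits on whitespace only and applies no text normalization (case, punctuation), which published benchmarks usually do.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of an extra word
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)
```

For example, comparing the reference "le chat dort sur le lit" with the output "le chien dort sur lit" counts one substitution and one deletion over six reference words.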
Can Whisper be used in real time?
The original Whisper model processes in 30-second segments, which introduces latency. However, optimized implementations like faster-whisper or whisper.cpp enable near real-time transcription, especially with smaller models (tiny, base) and GPU acceleration. For true streaming, derivative solutions like whisper-streaming exist.
