Embedding: Definition and Examples
An embedding is a numerical representation of text, images, or other data as a vector of numbers, which lets AI models measure the semantic similarity between items.
Full definition
An embedding (or vector embedding) is a fundamental technique in artificial intelligence that transforms complex data (words, sentences, images, or documents) into fixed-size numerical vectors. These vectors capture the meaning of items and the semantic relationships between them: two texts that are close in meaning have vectors that are close together in the vector space.
Concretely, an embedding model takes a text and produces a list of numbers (for example, 1536 values for OpenAI's text-embedding-ada-002). The individual numbers are not interpretable by humans, but taken together they encode the meaning of the text. The distance or cosine similarity between two vectors then indicates how semantically close the two texts are.
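As a minimal sketch of the cosine similarity mentioned above, the following pure-Python snippet compares three vectors. The 4-dimensional vectors and their names are invented for illustration; real embeddings come from an embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, made up for this example (not model output).
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.8, 0.2, 0.1, 0.3]
invoice = [0.0, 0.9, 0.8, 0.1]

# The related pair scores higher than the unrelated pair.
print("cat vs kitten :", round(cosine_similarity(cat, kitten), 3))
print("cat vs invoice:", round(cosine_similarity(cat, invoice), 3))
```

A score near 1 means the vectors point in almost the same direction, while a score near 0 means they are nearly orthogonal, which is read as "unrelated".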
Embeddings are at the heart of many modern applications: semantic search (finding relevant documents even without exact keyword matches), recommendation systems, text classification, and especially RAG (Retrieval-Augmented Generation), which enriches an LLM's responses with external knowledge.
In prompt engineering, understanding embeddings is essential for designing systems that make efficient use of large knowledge bases. Rather than sending an entire knowledge base into a prompt, embeddings are used to identify the most relevant passages so that only the necessary information is provided to the model.
Etymology
The term "embedding" comes from the verb "to embed" (to insert, to set in). In mathematics, an embedding is an injective map from one structure into another that preserves certain properties. In AI, the term was popularized by the work on Word2Vec (2013) by Tomas Mikolov and colleagues at Google, which demonstrated that words could be represented in a continuous vector space where semantic relationships are preserved.
Concrete examples
Semantic search in a knowledge base
You are an assistant that answers based solely on the following documents, retrieved via semantic search using embeddings. Relevant documents:
[{retrieved documents}]
User question: {question}
Answer by citing your sources.
Automatic classification of support tickets
Here is a customer support ticket. Using semantic proximity with our predefined categories (whose embeddings are pre-computed), this ticket has been classified into the category '{category}'. Write an appropriate response for this category.
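The classification step behind this prompt can be sketched as a nearest-category lookup: the ticket's embedding is compared with each pre-computed category embedding, and the closest category fills the {category} variable. The category names, the 3-dimensional vectors, and the classify helper are all hypothetical; a real system would obtain the vectors from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(ticket_vector, category_vectors):
    """Return the category whose embedding is most similar to the ticket's."""
    return max(
        category_vectors,
        key=lambda name: cosine_similarity(ticket_vector, category_vectors[name]),
    )


# Hypothetical pre-computed category embeddings (invented values).
CATEGORY_VECTORS = {
    "billing": [0.9, 0.1, 0.1],
    "technical issue": [0.1, 0.9, 0.2],
    "account access": [0.1, 0.2, 0.9],
}

# Stand-in for the embedding of a ticket such as "I was charged twice".
ticket_vector = [0.8, 0.2, 0.1]

category = classify(ticket_vector, CATEGORY_VECTORS)
prompt = (
    "Here is a customer support ticket. Using semantic proximity with our "
    "predefined categories (whose embeddings are pre-computed), this ticket "
    "has been classified into the category '{category}'. Write an appropriate "
    "response for this category."
).format(category=category)
print(prompt)
```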
Duplicate detection in a FAQ
Here is a new question submitted by a user: '{question}'. The existing questions closest according to cosine similarity of their embeddings are: {list}. Determine if this question is a duplicate or if it deserves a new entry.
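One way to produce the {list} variable in this prompt is to rank the existing questions by cosine similarity to the new one and keep the top few. The sketch below assumes invented 3-dimensional vectors and a hypothetical nearest_questions helper; the 0.9 cut-off is an arbitrary illustration and would need tuning on real data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_questions(new_vector, faq_vectors, k=3):
    """Return up to k (similarity, question) pairs, most similar first."""
    scored = [
        (cosine_similarity(new_vector, vector), question)
        for question, vector in faq_vectors.items()
    ]
    scored.sort(reverse=True)
    return scored[:k]


# Hypothetical embeddings of existing FAQ entries (invented values).
FAQ_VECTORS = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "How do I change my email address?": [0.5, 0.5, 0.1],
    "What payment methods do you accept?": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of "I forgot my password, what now?".
new_vector = [0.85, 0.15, 0.05]

neighbours = nearest_questions(new_vector, FAQ_VECTORS, k=2)
for score, question in neighbours:
    print(f"{score:.3f}  {question}")

# Assumed threshold: above it, the pair is flagged as a likely duplicate.
LIKELY_DUPLICATE = 0.9
likely = [question for score, question in neighbours if score >= LIKELY_DUPLICATE]
print("Likely duplicates:", likely)
```

The similarity ranking only narrows the candidates; the final duplicate decision is still left to the model, as the prompt above requests.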
Practical usage
In prompt engineering, embeddings are primarily used to power RAG systems: you vectorize your document base, then for each user query you retrieve the most relevant passages and inject them into the prompt. This makes it possible to build specialized assistants that can answer questions about private or recent data without fine-tuning. The embedding model, the chunk size, and the splitting strategy all directly influence the quality of the answers.
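The retrieval flow described here (split, vectorize, retrieve, inject) can be sketched end to end in a few lines. Everything below is an assumption made for illustration: toy_embed is a word-count stand-in that captures only shared vocabulary, not meaning, and a real system would call an embedding model instead; the chunk sizes, helper names, and sample documents are likewise invented.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into chunks of chunk_size words that overlap by `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def toy_embed(text):
    """Toy stand-in for an embedding model: a sparse vector of word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vector = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Inject the retrieved passages into a prompt template."""
    documents = "\n".join(f"- {passage}" for passage in passages)
    return (
        "You are an assistant that answers based solely on the following "
        "documents, retrieved via semantic search using embeddings. "
        f"Relevant documents:\n{documents}\n"
        f"User question: {question}\n"
        "Answer by citing your sources."
    )


# Invented sample documents; each is short enough to form a single chunk.
documents = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
    "To request a refund, open the Orders page and select the order.",
]
chunks = [chunk for doc in documents for chunk in chunk_text(doc)]

question = "How do I request a refund?"
passages = retrieve(question, chunks, k=2)
prompt = build_prompt(question, passages)
print(prompt)
```

In practice the chunk vectors are computed once and stored, so only the query has to be embedded at request time; this sketch recomputes them on every call to stay short.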
FAQ
What is the difference between an embedding and a token?
Do I need a different embedding model from the LLM used to generate responses?
How much does using embeddings cost?