
Embedding: Definition and Examples

An embedding is a numerical representation of text, an image, or another type of data as a vector of numbers, which lets AI models measure the semantic similarity between items.

Full definition

An embedding (or vector embedding) is a fundamental technique in artificial intelligence that transforms complex data, such as words, sentences, images, or documents, into fixed-size numerical vectors. These vectors capture meaning and the semantic relationships between items: two texts with similar meanings have vectors that lie close together in the vector space.

Concretely, an embedding model analyzes a text and produces a list of numbers (e.g., 1536 dimensions for OpenAI's text-embedding-ada-002). These numbers are not individually interpretable by humans, but their arrangement encodes the meaning of the text. One can then calculate the distance or cosine similarity between two vectors to determine how semantically close two texts are.
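To make the comparison step concrete, here is a minimal Python sketch that computes cosine similarity and Euclidean distance by hand. The three-dimensional vectors are made-up toy values standing in for real embeddings, which have far more dimensions (e.g., 1536); in practice you would obtain them from an embedding model and would usually rely on a numerical library rather than pure Python.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors (smaller = closer)."""
    return math.dist(a, b)


# Toy "embeddings" (hypothetical values, not produced by a real model).
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
invoice = [0.0, 0.2, 0.95]

print(cosine_similarity(cat, kitten))   # close to 1: similar meaning
print(cosine_similarity(cat, invoice))  # close to 0: unrelated
print(euclidean_distance(cat, kitten) < euclidean_distance(cat, invoice))
```

Cosine similarity compares only the direction of the vectors, not their length. Some providers return embeddings already normalized to unit length, in which case the plain dot product equals the cosine similarity.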

Embeddings are at the heart of many modern applications: semantic search (finding relevant documents even without exact keyword matches), recommendation systems, text classification, and especially RAG (Retrieval-Augmented Generation), which enriches an LLM's responses with external knowledge.

In prompt engineering, understanding embeddings is essential for designing systems that make efficient use of large knowledge bases. Rather than putting an entire knowledge base into the prompt, you use embeddings to identify the most relevant passages and give the model only the information it needs.
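A minimal sketch of that selection step, assuming the passages have already been embedded and the query has been embedded with the same model: score every passage against the query with cosine similarity and keep only the k best. The passages, vectors, and the function name top_k_passages are hypothetical; a real system would typically delegate this search to a vector database instead of scanning every passage in Python.

```python
import heapq
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat a zero vector as dissimilar


def top_k_passages(
    query_vec: Sequence[float],
    passages: list[tuple[str, list[float]]],
    k: int = 3,
) -> list[tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = ((cosine_similarity(query_vec, vec), text) for text, vec in passages)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Hypothetical pre-computed embeddings (toy 3-dimensional vectors).
passages = [
    ("Refunds are processed within 5 business days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.0, 0.9, 0.2]),
    ("To request a refund, contact support.", [0.8, 0.2, 0.1]),
]
query_vec = [0.85, 0.15, 0.05]  # stand-in for the embedded user question

for score, text in top_k_passages(query_vec, passages, k=2):
    print(f"{score:.3f}  {text}")
```

Only these k passages are then inserted into the prompt, which keeps it short and focused on the question.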

Etymology

The term "embedding" comes from the English verb "to embed" (to insert, to set in). In mathematics, an embedding is an injection of one structure into another that preserves certain properties. In AI, the term was popularized by the Word2Vec work (2013) of Tomas Mikolov and colleagues at Google, which showed that words could be represented in a continuous vector space where semantic relationships are preserved.

Concrete examples

Semantic search in a knowledge base

You are an assistant that answers based solely on the following documents, retrieved via semantic search using embeddings. Relevant documents:
[{retrieved documents}]

User question: {question}

Answer by citing your sources.
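One way to fill this template programmatically is sketched below. Numbering each retrieved document ([1], [2], ...) is an assumption added here so the model has something concrete to cite; the function name build_rag_prompt is hypothetical, and the retrieval step that produces the documents is not shown.

```python
def build_rag_prompt(documents: list[str], question: str) -> str:
    """Insert retrieved documents and the user question into the prompt template."""
    # Number the documents so the answer can cite them as [1], [2], ...
    numbered = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "You are an assistant that answers based solely on the following documents, "
        "retrieved via semantic search using embeddings. Relevant documents:\n"
        f"{numbered}\n\n"
        f"User question: {question}\n\n"
        "Answer by citing your sources."
    )


docs = [
    "Refunds are processed within 5 business days.",
    "To request a refund, contact support.",
]
print(build_rag_prompt(docs, "How long does a refund take?"))
```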

Automatic classification of support tickets

Here is a customer support ticket. Based on its semantic proximity to our predefined categories (whose embeddings are pre-computed), this ticket has been classified into the category '{category}'. Write an appropriate response for this category.
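The classification step that precedes this prompt can be sketched as a nearest-neighbor lookup: embed the ticket, compare it with one pre-computed vector per category, and keep the category with the highest cosine similarity. The category names, vectors, and the classify function are hypothetical toy values for illustration; in practice a category vector might be, for instance, the embedding of a category description.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat a zero vector as dissimilar


def classify(ticket_vec: Sequence[float], category_vecs: dict[str, list[float]]) -> str:
    """Return the category whose pre-computed embedding is closest to the ticket."""
    return max(category_vecs, key=lambda name: cosine_similarity(ticket_vec, category_vecs[name]))


# Hypothetical pre-computed category embeddings (toy 3-dimensional vectors).
categories = {
    "billing": [1.0, 0.0, 0.0],
    "technical issue": [0.0, 1.0, 0.0],
    "shipping": [0.0, 0.0, 1.0],
}
ticket_vec = [0.1, 0.9, 0.2]  # stand-in for the embedded ticket text
category = classify(ticket_vec, categories)
print(category)  # technical issue -> value substituted for {category}
```

Because max always returns some category, you may want to add a minimum-similarity check and route low-scoring tickets to a human; any such threshold would need to be tuned for the embedding model you use.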

Duplicate detection in a FAQ

Here is a new question submitted by a user: '{question}'. The existing questions closest to it, by cosine similarity of their embeddings, are: {list}. Determine whether this question is a duplicate or deserves a new entry.
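The {list} in this prompt could be produced by a sketch like the following, which keeps the existing questions whose cosine similarity with the new one reaches a threshold. The threshold of 0.9, the limit of three candidates, the function name duplicate_candidates, and the toy vectors are all assumptions for illustration: similarity scores are not comparable across embedding models, so the cutoff has to be tuned on your own data. Leaving the final decision to the LLM, as the prompt does, is meant to handle borderline cases that a fixed cutoff judges poorly.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat a zero vector as dissimilar


def duplicate_candidates(
    new_vec: Sequence[float],
    existing: list[tuple[str, list[float]]],
    threshold: float = 0.9,  # assumed cutoff; tune it for your embedding model
    limit: int = 3,
) -> list[str]:
    """Existing questions similar enough to the new one to be possible duplicates."""
    scored = [(cosine_similarity(new_vec, vec), question) for question, vec in existing]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [question for score, question in scored[:limit] if score >= threshold]


# Hypothetical FAQ entries with toy pre-computed embeddings.
faq = [
    ("How do I reset my password?", [0.9, 0.1, 0.0]),
    ("What are your opening hours?", [0.0, 0.2, 0.95]),
]
new_question_vec = [0.88, 0.12, 0.02]  # stand-in for the embedded new question
print(duplicate_candidates(new_question_vec, faq))  # ['How do I reset my password?']
```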

Practical usage

In prompt engineering, embeddings are primarily used to power RAG systems: you vectorize your document base, then for each user query, you retrieve the most relevant passages to inject into the prompt. This allows building specialized assistants capable of answering on private or recent data without fine-tuning. The choice of embedding model, text chunk size, and splitting strategy directly influence the quality of the answers obtained.
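As an illustration of one splitting strategy, here is a minimal sketch of fixed-size chunking with overlap, so that text near a chunk boundary appears in both neighboring chunks. Measuring size in characters and the defaults of 500 and 50 are assumptions chosen for simplicity; many pipelines split by tokens, sentences, or paragraphs instead, and chunk_text is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so content near a
    chunk boundary appears in both neighboring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each chunk is then embedded and stored separately. Smaller chunks tend to make retrieval more precise but give the model less surrounding context, which is the trade-off the chunk size controls.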

Related concepts

RAG (Retrieval-Augmented Generation), Cosine similarity, Vector database, Tokenization

FAQ

What is the difference between an embedding and a token?
A token is a fragment of text (word or subword) split by the model's tokenizer. An embedding is the numerical vector representation of that token or a set of tokens. The tokenizer splits the text, then the embedding model transforms these tokens into vectors that capture their semantic meaning.
Do I need a different embedding model from the LLM used to generate responses?
Yes, they are generally two distinct models. The embedding model (like OpenAI's text-embedding-3-small or a Voyage AI model) specializes in creating semantic vectors. The LLM (like Claude or GPT) specializes in text generation. In a RAG pipeline, the embedding model is used for retrieval and the LLM for generating the response.
How much does using embeddings cost?
Embeddings are significantly cheaper than text generation with an LLM. For example, vectorizing one million tokens typically costs a few euro cents. The main costs lie in storing the vectors (in a vector database such as Pinecone, Weaviate, or pgvector) and computing similarity at scale, but these costs remain modest for most use cases.
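To put the storage side in perspective, the raw size of the vectors is simple arithmetic: number of vectors × dimensions × bytes per value. The sketch below assumes 32-bit floats (4 bytes per value) and ignores index structures, metadata, and the stored text, which a vector database adds on top; actual prices depend on the provider. The function name is hypothetical.

```python
def raw_vector_storage_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of the embedding values alone (float32 by default), no index overhead."""
    return num_vectors * dims * bytes_per_value


# One million chunks embedded with a 1536-dimensional model
size = raw_vector_storage_bytes(1_000_000, 1536)
print(f"{size / 1e9:.2f} GB")  # 6.14 GB
```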

About Prompt Guide

Prompt Guide is a free library of 2500+ ready-to-use prompts for ChatGPT, Claude and other AIs, with guides to learn prompting and tools to build and optimize your own prompts.
