Embedding: Definition and Examples

An embedding is a numerical representation of text, images, or other data as a vector of numbers, which lets AI models measure semantic similarity between items.

Full definition

An embedding (or vector embedding) is a fundamental technique in artificial intelligence that transforms complex data (such as words, sentences, images, or documents) into fixed-size numerical vectors. These vectors capture meaning and semantic relationships: two texts that are close in meaning have vectors that lie close together in the vector space.

Concretely, an embedding model analyzes a text and produces a list of numbers (e.g., 1536 dimensions for OpenAI's text-embedding-ada-002). These numbers are not individually interpretable by humans, but their arrangement encodes the meaning of the text. One can then calculate the distance or cosine similarity between two vectors to determine how semantically close two texts are.
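
As a minimal sketch of the similarity calculation described above, the following computes cosine similarity in plain Python. The three-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumption: invented values, not model output).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Texts close in meaning should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```

A value near 1 means the vectors point in almost the same direction, and a value near 0 means they are nearly orthogonal.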

Embeddings are at the heart of many modern applications: semantic search (finding relevant documents even without exact keyword matches), recommendation systems, text classification, and especially RAG (Retrieval-Augmented Generation), which enriches an LLM's responses with external knowledge.

In prompt engineering, understanding embeddings is essential for designing systems that efficiently leverage large knowledge bases. Rather than sending the entire knowledge base in a prompt, embeddings are used to identify the most relevant passages so that only the necessary information is provided to the model.

Etymology

The term "embedding" comes from the verb "to embed" (to insert, to set in). In mathematics, an embedding denotes an injection of one structure into another that preserves certain properties. In AI, the term was popularized by the work on Word2Vec (2013) by Tomas Mikolov and colleagues at Google, which demonstrated that words could be represented in a continuous vector space where semantic relationships are preserved.

Concrete examples

Semantic search in a knowledge base

You are an assistant that answers based solely on the following documents, retrieved via semantic search using embeddings. Relevant documents:
[{retrieved documents}]

User question: {question}

Answer by citing your sources.
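
One way the {retrieved documents} placeholder in this prompt might be filled is sketched below: rank stored passages by cosine similarity to the query vector, keep the top k, and assemble the prompt. The function names (top_k, build_prompt), the passages, and all vectors are hypothetical; in a real system the vectors would come from an embedding model and usually live in a vector database.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, indexed_docs, k=2):
    """Return the k (text, vector) pairs most similar to the query vector."""
    ranked = sorted(
        indexed_docs,
        key=lambda doc: cosine_similarity(query_vec, doc[1]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, docs):
    """Fill the prompt template with numbered sources and the user question."""
    sources = "\n".join(f"[{i}] {text}" for i, (text, _) in enumerate(docs, start=1))
    return (
        "You are an assistant that answers based solely on the following documents, "
        "retrieved via semantic search using embeddings. Relevant documents:\n"
        f"{sources}\n\n"
        f"User question: {question}\n\n"
        "Answer by citing your sources."
    )

# Hypothetical index: passages paired with invented 3-dimensional vectors.
index = [
    ("Refunds are issued within 14 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on Sundays.", [0.1, 0.9, 0.0]),
    ("Refund requests require an order number.", [0.8, 0.0, 0.2]),
]
question = "How do I get a refund?"
query_vec = [1.0, 0.0, 0.0]  # assumption: stands in for the embedded question

prompt = build_prompt(question, top_k(query_vec, index, k=2))
print(prompt)
```

Numbering the sources gives the model something concrete to cite when it answers.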

Automatic classification of support tickets

Here is a customer support ticket. Using semantic proximity with our predefined categories (whose embeddings are pre-computed), this ticket has been classified into the category '{category}'. Write an appropriate response for this category.
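
The classification step that this prompt assumes could look roughly like the sketch below: compare the ticket's vector with each pre-computed category vector and keep the closest one. The category names and all vectors are invented for illustration, and classify is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify(ticket_vec, category_vecs):
    """Return the name of the category whose vector is closest to the ticket."""
    return max(
        category_vecs,
        key=lambda name: cosine_similarity(ticket_vec, category_vecs[name]),
    )

# Hypothetical pre-computed category embeddings (invented values).
category_vecs = {
    "billing": [0.9, 0.1, 0.0],
    "login": [0.0, 0.9, 0.1],
    "shipping": [0.1, 0.0, 0.9],
}
ticket_vec = [0.7, 0.2, 0.1]  # assumption: stands in for the embedded ticket text

category = classify(ticket_vec, category_vecs)
prompt = (
    "Here is a customer support ticket. Using semantic proximity with our "
    f"predefined categories, this ticket has been classified into the category '{category}'. "
    "Write an appropriate response for this category."
)
print(category)  # billing
```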

Duplicate detection in a FAQ

Here is a new question submitted by a user: '{question}'. The existing questions closest according to cosine similarity of their embeddings are: {list}. Determine if this question is a duplicate or if it deserves a new entry.
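
The {list} of closest questions could be produced along the lines of the sketch below, which scores every existing question against the new one and flags a likely duplicate above a similarity threshold. The threshold of 0.9, the helper name closest_questions, and the vectors are all assumptions for illustration; a suitable threshold depends on the embedding model and the data.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def closest_questions(new_vec, existing, k=3):
    """Return up to k (question, score) pairs, most similar first."""
    scored = [(q, cosine_similarity(new_vec, v)) for q, v in existing.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

DUPLICATE_THRESHOLD = 0.9  # hypothetical cut-off, to be tuned on real data

# Hypothetical FAQ entries with invented embeddings.
existing = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Where is my invoice?": [0.0, 0.2, 0.9],
}
new_vec = [0.85, 0.15, 0.05]  # assumption: embedding of "I forgot my password"

candidates = closest_questions(new_vec, existing, k=2)
best_question, best_score = candidates[0]
likely_duplicate = best_score >= DUPLICATE_THRESHOLD
print(best_question, likely_duplicate)
```

The threshold only pre-filters candidates; the prompt still leaves the final duplicate decision to the model.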

Practical usage

In prompt engineering, embeddings are primarily used to power RAG systems: you vectorize your document base, then for each user query, you retrieve the most relevant passages to inject into the prompt. This makes it possible to build specialized assistants that can answer questions about private or recent data without fine-tuning. The choice of embedding model, text chunk size, and splitting strategy directly influences the quality of the answers obtained.
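
To make the chunk size and splitting strategy concrete, here is a minimal sketch of one common approach: fixed-size chunks of words with an overlap, so that a sentence cut at a boundary still appears whole in a neighbouring chunk. The function name and the default sizes are assumptions; production pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of up to chunk_size words, sharing overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be between 0 and chunk_size - 1")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

Each chunk would then be embedded and stored separately, so the chunk size sets the granularity of what can be retrieved.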

Related concepts

RAG (Retrieval-Augmented Generation)
Cosine similarity
Vector database
Tokenization

FAQ

What is the difference between an embedding and a token?

A token is a fragment of text (word or subword) split by the model's tokenizer. An embedding is the numerical vector representation of that token or a set of tokens. The tokenizer splits the text, then the embedding model transforms these tokens into vectors that capture their semantic meaning.
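
The distinction can be illustrated with a deliberately simplified sketch: tokens are pieces of text, while the embedding is a vector computed from them. Everything here is a toy assumption: the whitespace tokenizer, the lookup table of invented per-token vectors, and the averaging step. Real tokenizers work on subwords, and real embedding models use trained neural networks rather than a fixed table.

```python
def tokenize(text):
    """Toy tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

# Hypothetical per-token vectors; a trained model would learn these.
TOKEN_VECTORS = {
    "reset": [0.8, 0.1],
    "password": [0.9, 0.2],
    "invoice": [0.1, 0.9],
}
UNKNOWN = [0.0, 0.0]  # fallback for tokens missing from the toy table

def embed(text):
    """Average the token vectors to get one fixed-size vector for the text."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("cannot embed empty text")
    vectors = [TOKEN_VECTORS.get(token, UNKNOWN) for token in tokens]
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

print(tokenize("Reset password"))  # ['reset', 'password']
print(embed("Reset password"))     # one 2-dimensional vector, roughly [0.85, 0.15]
```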

Do I need a different embedding model from the LLM used to generate responses?

Yes, they are generally two distinct models. The embedding model (like OpenAI's text-embedding-3-small or one of Voyage AI's models) specializes in creating semantic vectors. The LLM (like Claude or GPT) specializes in text generation. In a RAG pipeline, the embedding model is used for retrieval and the LLM for generating the response.

How much does using embeddings cost?

Embeddings are significantly cheaper than text generation with an LLM. For example, vectorizing one million tokens typically costs a few euro cents. The main cost lies in storing the vectors (vector database like Pinecone, Weaviate, or pgvector) and computing similarity at scale, but these costs remain modest for most use cases.
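
For a rough sense of the storage side, the sketch below estimates the raw size of the vectors alone. It assumes each value is stored as a 4-byte float (float32), which is common but not universal, and it ignores index structures, metadata, and replication, which add to the real footprint.

```python
def raw_vector_storage_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw size of the vectors only; excludes indexes and metadata."""
    return num_vectors * dimensions * bytes_per_value

# One million vectors of 1536 dimensions, stored as float32 (assumption).
size = raw_vector_storage_bytes(1_000_000, 1536)
print(f"{size / 1e9:.2f} GB")  # 6.14 GB
```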


