
SAM (Segment Anything Model): Definition and Examples

SAM (Segment Anything Model) is an image segmentation model developed by Meta AI that can cut out (segment) any object in an image from a simple prompt such as a click or a bounding box.

Full definition

SAM, short for Segment Anything Model, is a foundation model for computer vision published by Meta AI Research in April 2023. It is built for promptable segmentation: rather than being trained to recognize a fixed list of object categories, SAM returns a mask for whatever object or region a prompt points to, including kinds of objects it never saw during training.

The model has three components: an image encoder (a Vision Transformer), a prompt encoder that turns user input (points, boxes, or masks) into embeddings, and a lightweight mask decoder that combines both to produce the final masks. The SAM paper also explored text prompts as a proof of concept, but the publicly released model does not accept them. This split is what enables real-time interaction: the expensive image encoder runs once per image, after which the user can iterate quickly by trying different prompts, each handled only by the fast prompt encoder and mask decoder.
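The encode-once, prompt-many pattern can be illustrated without SAM itself. The sketch below is a hypothetical toy (ToyPromptableSegmenter is not a real library class): copying the pixels stands in for the heavy image encoder, and simple region growing from the clicked point stands in for the mask decoder. In the official segment_anything package the same split appears as SamPredictor.set_image, called once per image, followed by any number of SamPredictor.predict calls.

```python
from collections import deque


class ToyPromptableSegmenter:
    """Hypothetical toy that mimics SAM's split between a heavy per-image step
    and a light per-prompt step. It is NOT SAM: the 'encoder' just copies the
    pixels and the 'decoder' is 4-connected region growing."""

    def __init__(self):
        self.encoder_calls = 0  # how many times the expensive step ran
        self._embedding = None  # cached result of the expensive step

    def set_image(self, image):
        # Expensive step (SAM's ViT image encoder): run once per image, then cache.
        self.encoder_calls += 1
        self._embedding = [list(row) for row in image]

    def predict_point(self, x, y, tolerance=10):
        # Cheap step (SAM's prompt encoder + mask decoder): reuses the cache.
        if self._embedding is None:
            raise RuntimeError("call set_image() before predict_point()")
        grid = self._embedding
        height, width = len(grid), len(grid[0])
        seed = grid[y][x]
        mask = [[False] * width for _ in range(height)]
        mask[y][x] = True
        queue = deque([(x, y)])
        while queue:
            cx, cy = queue.popleft()
            for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if (0 <= nx < width and 0 <= ny < height and not mask[ny][nx]
                        and abs(grid[ny][nx] - seed) <= tolerance):
                    mask[ny][nx] = True
                    queue.append((nx, ny))
        return mask


# Usage: one encoder pass, two different point prompts.
image = [
    [0, 0, 200, 200],
    [0, 0, 200, 200],
    [0, 0, 0, 0],
]
segmenter = ToyPromptableSegmenter()
segmenter.set_image(image)
dark_region = segmenter.predict_point(0, 0)
bright_region = segmenter.predict_point(3, 0)
print(segmenter.encoder_calls)  # 1
```

Both prompts reuse the single cached encoding, which is why SAM can answer new clicks interactively.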

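SAM was trained on the SA-1B dataset, one of the largest segmentation datasets ever created: over one billion masks (about 1.1 billion) on 11 million images, or roughly 100 masks per image on average. Most of these masks were generated automatically by SAM itself through a model-in-the-loop "data engine". This scale of training data, combined with the promptable training task, underpins SAM's strong zero-shot generalization.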

Since its release, SAM has been followed by Meta's own SAM 2 (which adds video) and has inspired third-party variants and alternatives such as HQ-SAM (higher-quality masks) and FastSAM (faster inference). It has become a common building block in computer vision pipelines, from automatic data annotation to photo retouching, robotics, and medical imaging.

Etymology

The acronym SAM stands for "Segment Anything Model." The name reflects the project's ambition: to create a foundation model for segmentation, much as GPT-style models did for text. The name "Segment Anything" was chosen by Meta's AI research team (FAIR) to emphasize that the model is meant to generalize to any object rather than a fixed set of categories.

Concrete examples

Automatic annotation of an image dataset for training an object detection model

Use SAM to automatically segment all objects in this image, then export the masks in COCO format to annotate my vehicle detection dataset.
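SAM outputs one binary mask per object but no class label, so exporting to COCO means attaching a category from your own labeling step. A minimal sketch, assuming each mask is a list of rows of 0/1 values; the function name and the category id are hypothetical. It fills the standard COCO fields: bbox as [x, y, width, height], area as the mask's pixel count, and segmentation as uncompressed RLE, which COCO stores in column-major order with counts that start with a run of zeros.

```python
def mask_to_coco_annotation(mask, image_id, annotation_id, category_id):
    """Build a COCO-style annotation dict from a binary mask (list of rows of 0/1)."""
    height, width = len(mask), len(mask[0])
    pixels = [(x, y) for y in range(height) for x in range(width) if mask[y][x]]
    if not pixels:
        raise ValueError("empty mask")
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    x0, y0 = min(xs), min(ys)
    # COCO bbox is [x, y, width, height] in pixels.
    bbox = [x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1]

    # Uncompressed RLE: scan column by column, alternating runs of 0s and 1s,
    # always starting with a run of 0s (which may have length 0).
    counts, current, run = [], 0, 0
    for x in range(width):
        for y in range(height):
            value = 1 if mask[y][x] else 0
            if value == current:
                run += 1
            else:
                counts.append(run)
                current, run = value, 1
    counts.append(run)

    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": {"size": [height, width], "counts": counts},
        "area": len(pixels),
        "bbox": bbox,
        "iscrowd": 0,
    }


# Usage: category_id=1 stands for a hypothetical "vehicle" category.
mask = [
    [0, 1, 1],
    [0, 1, 0],
]
annotation = mask_to_coco_annotation(mask, image_id=1, annotation_id=1, category_id=1)
print(annotation["bbox"], annotation["area"], annotation["segmentation"]["counts"])
# [1, 0, 2, 2] 3 [2, 3, 1]
```

COCO conventionally uses polygons for single objects (iscrowd 0) and RLE for crowd regions (iscrowd 1), so pairing RLE with iscrowd 0 here is a choice to verify against the loader your detection framework uses.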

Photo retouching and object cutout in a creative application

Apply SAM to this product photo to isolate the main object from the background. I want a precise mask that I can use to change the background.
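Once SAM has produced a mask, replacing the background is a per-pixel blend. A minimal pure-Python sketch, assuming the image and the new background are same-sized grids of (R, G, B) tuples and the mask has been converted to a grid of values between 0 and 1; the function name is hypothetical, and in practice NumPy arrays would do the same in one vectorized expression. A binary SAM mask gives hard edges, while fractional values (for example after blurring the mask edges) soften the transition.

```python
def replace_background(image, alpha, background):
    """Alpha-composite `image` over `background`.

    image, background: same-sized grids (lists of rows) of (R, G, B) int tuples.
    alpha: same-sized grid of values in [0, 1]; 1 keeps the original pixel,
    0 takes the new background. A binary SAM mask (True/False) also works.
    """
    if not len(image) == len(alpha) == len(background):
        raise ValueError("image, alpha and background must have the same height")
    result = []
    for img_row, a_row, bg_row in zip(image, alpha, background):
        if not len(img_row) == len(a_row) == len(bg_row):
            raise ValueError("image, alpha and background must have the same width")
        # Blend each channel: a * foreground + (1 - a) * background.
        result.append([
            tuple(round(a * f + (1 - a) * b) for f, b in zip(fg, bg))
            for fg, a, bg in zip(img_row, a_row, bg_row)
        ])
    return result


# Usage: a 1x3 "product photo" with one half-transparent edge pixel.
photo = [[(200, 10, 10), (200, 10, 10), (200, 10, 10)]]
mask = [[1.0, 0.5, 0.0]]
backdrop = [[(0, 0, 250)] * 3]
print(replace_background(photo, mask, backdrop))
# [[(200, 10, 10), (100, 5, 130), (0, 0, 250)]]
```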

Medical imaging analysis to identify anatomical structures

Use SAM with a point prompt placed on the tumor visible in this brain MRI to generate a segmentation mask of the lesion.
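Point prompts are an easy place for bugs in medical pipelines: image arrays are indexed as [row, column], while SAM's point prompts are (x, y) pixel coordinates, that is (column, row). In the official segment_anything package, SamPredictor.predict takes point_coords as an N x 2 array of (x, y) points and point_labels with 1 for foreground and 0 for background clicks. A minimal sketch of that conversion, with a hypothetical function name and plain lists (wrap them in NumPy arrays before passing them to SAM); it assumes the MRI volume has already been reduced to a single 2D slice, since SAM works on 2D images.

```python
def build_point_prompt(foreground_rc, background_rc, height, width):
    """Convert (row, col) array indices into SAM-style point prompts.

    Returns (point_coords, point_labels): point_coords holds [x, y] pairs
    (x = column, y = row); point_labels holds 1 for foreground, 0 for background.
    """
    point_coords, point_labels = [], []
    for points, label in ((foreground_rc, 1), (background_rc, 0)):
        for row, col in points:
            if not (0 <= row < height and 0 <= col < width):
                raise ValueError(f"point (row={row}, col={col}) is outside the {height}x{width} slice")
            point_coords.append([col, row])  # swap: array (row, col) -> image (x, y)
            point_labels.append(label)
    return point_coords, point_labels


# Usage: one click on the lesion and one on healthy tissue in a 256x256 slice.
coords, labels = build_point_prompt([(120, 88)], [(40, 200)], height=256, width=256)
print(coords, labels)  # [[88, 120], [200, 40]] [1, 0]
```

Adding a background click next to the lesion is a cheap way to stop the mask from spilling into neighboring tissue.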

Practical usage

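In prompt engineering applied to vision, SAM serves as the segmentation building block in multimodal pipelines: a vision-language model (such as GPT-4V or Claude) first locates the areas of interest from a text description, for example by returning approximate box coordinates, and SAM then segments those areas precisely. For developers, the official segment_anything Python package accepts point prompts (x, y pixel coordinates, each labeled foreground or background), bounding boxes, and masks; text-driven segmentation is usually obtained by pairing SAM with a separate open-vocabulary detector, as in Grounded-SAM. This makes SAM easy to integrate into automated annotation, retouching, and image analysis workflows.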
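When a vision-language model supplies the area of interest, its coordinates usually need converting before they can serve as a SAM box prompt, which the segment_anything package's SamPredictor.predict expects in pixels, in XYXY order (x_min, y_min, x_max, y_max). Output formats differ between models, so the sketch below assumes a box [x_min, y_min, x_max, y_max] normalized to the range 0 to scale (for example 1 or 1000); the function name and that input format are assumptions to adapt to the model you use.

```python
def normalized_box_to_xyxy(box, width, height, scale=1.0):
    """Convert a normalized [x_min, y_min, x_max, y_max] box into pixel XYXY.

    `scale` is the value the model uses for the full image extent (assumed
    1.0 or 1000 here). Corners are reordered and clamped to the image, since
    model-generated coordinates are not guaranteed to be well-formed.
    """
    x0, y0, x1, y1 = (value / scale for value in box)
    x0, x1 = sorted((x0 * width, x1 * width))
    y0, y1 = sorted((y0 * height, y1 * height))

    def clamp(value, upper):
        return max(0.0, min(value, float(upper)))

    return [clamp(x0, width), clamp(y0, height), clamp(x1, width), clamp(y1, height)]


# Usage: a box on a 0-1000 scale for a 640x480 image.
print(normalized_box_to_xyxy([250, 500, 750, 1000], width=640, height=480, scale=1000))
# [160.0, 240.0, 480.0, 480.0]
```

Clamping and reordering are defensive choices: a language model's coordinates are a rough localization, and SAM only needs a box that roughly encloses the object.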

Related concepts

Vision Transformer (ViT), Semantic segmentation, Zero-shot learning, Foundation model

FAQ

What is the difference between SAM and SAM 2?
SAM (2023) only works on still images, whereas SAM 2 (2024) extends segmentation capabilities to video. SAM 2 can track a segmented object across video frames in real time, thanks to a temporal memory mechanism. SAM 2 is also faster and more accurate than SAM on still images.
Can SAM be used for free?
Yes. Meta released SAM's code and model weights under the Apache 2.0 license, which allows both commercial and research use. The SA-1B dataset is also publicly available, but under its own research-oriented license rather than Apache 2.0, so check its terms before any commercial use. Many implementations are available, including through the Hugging Face Transformers library.
What are the limitations of SAM?
SAM may lack precision on very fine contours (hair, fur, transparent objects), and it does not semantically understand what it segments: it cuts out visual regions without naming them. For tasks that require classifying the segmented objects, SAM must be paired with a recognition model. Performance can also drop in highly specialized domains (satellite imagery, microscopy) without fine-tuning.


About Prompt Guide

Prompt Guide is a free library of 2500+ ready-to-use prompts for ChatGPT, Claude and other AIs, with guides to learn prompting and tools to build and optimize your own prompts.
