Llama 3: Definition and Examples

Llama 3 is a family of open-source large language models developed by Meta (formerly Facebook), designed to compete with the best proprietary models while remaining freely accessible to the community.

Full definition

Llama 3 (Large Language Model Meta AI, 3rd generation) is a family of language models published by Meta in April 2024. It represents a significant leap over Llama 2, with performance rivaling proprietary models like GPT-4 and Claude on many benchmarks. Llama 3 is available in several sizes — notably 8B and 70B parameters — allowing it to adapt to various use cases, from deployment on a personal computer to the most demanding cloud infrastructures.

Meta's philosophy with Llama 3 is based on open source: the model weights are freely downloadable, allowing anyone to use, fine-tune, or integrate them into commercial applications (under a permissive license). This approach has catalyzed an entire ecosystem of tools, adaptations, and derivative models created by the community, making Llama 3 one of the most widely adopted open-source models in the world.

On the technical side, Llama 3 builds on an optimized Transformer architecture, an improved tokenizer (128K token vocabulary), and training on a massive corpus of over 15 trillion tokens. The model excels in reasoning, code generation, instruction following, and multilingual understanding. Meta has also released Llama 3.1 (with a 405B parameter version) and Llama 3.2 (integrating multimodal vision capabilities and lightweight versions for edge computing), solidifying Llama 3 as an ever-evolving platform.

For prompt engineering practitioners, Llama 3 offers the major advantage of being runnable locally or on private infrastructure, ensuring full data control. Its structured prompt format (with system, user, and assistant role tags) is compatible with advanced prompting techniques such as few-shot, chain-of-thought, and RAG.

Etymology

"Llama" stands for Large Language Model Meta AI. The number 3 denotes the third major generation of this model family. The name also winks at the llama, the animal, which Meta uses as an informal mascot for the project.

Concrete examples

Local deployment for a confidential corporate chatbot

You are a legal assistant specialized in French labor law. Answer precisely and cite the relevant legal articles. Question: what are the conditions for a valid mutually agreed termination?

Fine-tuning Llama 3 for a specific domain

Using the Alpaca format, generate 50 instruction/response pairs to train a model specialized in veterinary medical diagnosis for cattle.

Usage via a compatible API (Ollama, vLLM, Together AI)

Practical usage

In prompt engineering, Llama 3 is mainly used when data confidentiality, deep customization via fine-tuning, or control over inference costs are required. It can be deployed locally with tools like Ollama or llama.cpp, or used via compatible cloud providers. Standard prompting techniques (system prompt, few-shot, chain-of-thought) work effectively, following Llama 3's specific prompt format with its role tags.

Related concepts

Open-source language modelFine-tuningQuantization (GGUF, GPTQ)Local inference

FAQ

What is the difference between Llama 3, Llama 3.1, and Llama 3.2?

Llama 3 (April 2024) introduced the 8B and 70B models. Llama 3.1 (July 2024) added a massive 405B parameter model and extended the context window to 128K tokens. Llama 3.2 (September 2024) brought multimodal capabilities (vision) and ultra-lightweight models (1B and 3B) designed for mobile and edge computing.

Can Llama 3 be used commercially?

Yes. Meta distributes Llama 3 under a permissive community license that allows commercial use, including for companies with fewer than 700 million monthly active users. Beyond this threshold, a special license is required. It is advisable to read the license carefully before any production deployment.

How to run Llama 3 on your own computer?

The easiest way is to use Ollama (ollama run llama3) or LM Studio, which automatically handle downloading and quantization of the model. For a GPU with 8 GB VRAM, the 8B version quantized to 4 bits works well. The 70B version requires at least 40 GB VRAM or can be distributed across multiple GPUs. Optimized formats like GGUF also allow CPU execution, albeit slower.

How to use this prompt

Copy the prompt with the button above.
Paste it into ChatGPT, Claude or your favorite AI assistant.
Replace the bracketed variables with your details, then refine the result.

About Prompt Guide

Prompt Guide is a free library of 2500+ ready-to-use prompts for ChatGPT, Claude and other AIs, with guides to learn prompting and tools to build and optimize your own prompts.

Prompt library Learn prompting Prompt builder Prompt optimizer

More definitions

LlamaIndex: Definition and Examples

LlamaIndex is an open-source framework that connects language models (LLMs) to external data sources to build AI applications

Long Context Model: Definition and Examples

A Long Context Model is a language model capable of processing and reasoning over very large amounts of text in a single interaction, with a window...

LoRA: Definition and Examples

LoRA (Low-Rank Adaptation) is an efficient fine-tuning technique that allows adapting a large language model or image generation model to a specific task.

Loss Function: Definition and Examples

A loss function is a mathematical formula that measures the gap between an AI model's predictions and the expected results. It guides

Machine Translation: Definition and Examples

Machine Translation refers to the use of software and artificial intelligence algorithms to automatically translate a text from one language to another, preserving meaning. This glossary entry explores its definition, history, examples, and practical use in prompt engineering.

MCP Model Context Protocol: Definition and Examples

The Model Context Protocol (MCP) is an open standard that allows AI models to connect to external data sources, tools, and services.

Get new prompts every week

Join our newsletter.