Ollama: Definition and Examples

Ollama is an open source tool that allows you to run large language models (LLMs) locally on your own computer, with no dependency on a cloud service and no internet connection needed once a model has been downloaded.

Full definition

Ollama is an open source platform designed to simplify the execution of large language models (LLMs) directly on a local machine. Where using models like Llama, Mistral, or Gemma previously required complex setups, Ollama offers an intuitive command-line interface that automates downloading, configuration, and running models in a few commands.

One of the main advantages of Ollama is data privacy: all interactions with the model remain on the user's machine, with no data passing through external servers. This makes it a particularly popular tool among developers, researchers, and companies concerned about protecting their sensitive data.

Ollama detects the available hardware (CPU, GPU, RAM) and uses GPU acceleration when a supported GPU is present. It also exposes a local REST API, on port 11434 by default, that many tools and frameworks can connect to. It supports a large catalog of pre-quantized models, allowing powerful LLMs to run even on consumer hardware without a high-end graphics card.

Thanks to its "Modelfile" system inspired by Dockerfiles, Ollama also allows customizing models by defining system prompts, generation parameters, and conversation templates. This approach makes the tool accessible to both beginners and advanced users who want to create customized AI assistants.

Etymology

The name "Ollama" plays on "llama", a reference to Meta's LLaMA model family. The project was created in 2023 and quickly became one of the most widely used tools for running LLMs locally.

Concrete examples

Install and run a model locally for prompt engineering

ollama run llama3 "Explain the concept of chain-of-thought in prompt engineering with concrete examples"

Create a specialized assistant with a custom Modelfile

FROM mistral
SYSTEM You are an expert in SEO copywriting. You always write in French, with a professional tone and structured paragraphs. You include keywords naturally in the text.
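
A Modelfile like the one above is plain text, so it can also be generated from a script when several assistant variants are needed. The following is a minimal sketch, not part of Ollama itself: the helper name `render_modelfile` and the `temperature` value are illustrative assumptions, while `FROM`, `PARAMETER`, and `SYSTEM` are Modelfile instructions.

```python
def render_modelfile(base_model, system_prompt, parameters=None):
    """Return Modelfile text with FROM, optional PARAMETER lines, and SYSTEM.

    Hypothetical helper for illustration; it only builds a string.
    """
    lines = [f"FROM {base_model}"]
    for name, value in (parameters or {}).items():
        lines.append(f"PARAMETER {name} {value}")
    # Triple quotes allow the system prompt to span several lines.
    lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_modelfile(
            "mistral",
            "You are an expert in SEO copywriting. You always write in French.",
            {"temperature": 0.7},
        )
    )
```

Once the text is saved to a file named Modelfile, the assistant can be built with `ollama create seo-writer -f Modelfile` and started with `ollama run seo-writer` (the name `seo-writer` is only an example).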

Use the local Ollama API in a prompt engineering application

curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Generate 5 variants of this prompt to improve creativity of responses: Describe the advantages of generative AI"}'
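
The same request can be sent from Python using only the standard library. This is a sketch under a few assumptions: Ollama is running on its default port, the `llama3` model has already been downloaded, and streaming is turned off with `"stream": false` so the reply arrives as a single JSON object. The function names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def build_request(model, prompt, url=OLLAMA_URL):
    """Build a POST request for /api/generate with streaming disabled."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the request and return the generated text from the reply."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        body = json.loads(resp.read())
    return body["response"]
```

Calling `generate("llama3", "Describe the advantages of generative AI")` returns the model's answer as a string, or raises an error if the local server is not reachable.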

Practical usage

In prompt engineering, Ollama enables rapid testing and iteration on prompts without API costs or network latency. It is ideal for experimenting with different models, comparing their responses to the same prompt, and developing complex prompt systems in complete privacy before deploying them in production.
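
Comparing models on the same prompt can be scripted in a few lines. The sketch below assumes a local Ollama server on the default port and models that have already been downloaded; `compare_models` and `ollama_generate` are hypothetical helper names, and the timing is a rough wall-clock measurement rather than a benchmark.

```python
import json
import time
import urllib.request


def ollama_generate(model, prompt, host="http://localhost:11434"):
    """Call the local /api/generate endpoint and return the response text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())["response"]


def compare_models(models, prompt, generate=ollama_generate):
    """Run one prompt against several models and time each response."""
    results = []
    for model in models:
        start = time.perf_counter()
        text = generate(model, prompt)
        elapsed = time.perf_counter() - start
        results.append({"model": model, "seconds": elapsed, "response": text})
    return results
```

For example, `compare_models(["llama3", "mistral"], "Summarize chain-of-thought prompting in two sentences")` returns one entry per model. Passing a different `generate` function makes it possible to try the loop without a running server.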

Related concepts

LLM (Large Language Model)
Local inference
Model quantization
System prompt

FAQ

Is Ollama free?

Yes, Ollama is completely free and open source. The available models are also free to download, although each model comes with its own license terms that are worth checking before commercial use. There are no API costs since everything runs locally on your machine.

What hardware configuration is needed to use Ollama?

The minimum configuration depends on the chosen model. For lightweight models (7B parameters), at least 8 GB of RAM is recommended. For larger models (70B), 64 GB of RAM or more is typically needed. A supported GPU (for example, one compatible with CUDA or Metal) significantly accelerates generation, but is not mandatory: Ollama also works on CPU only.
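
The RAM figures above follow from simple arithmetic on model size. As a rough sketch, the memory taken by the weights alone is the parameter count multiplied by the bits stored per weight. The 4-bit default below is an assumption about a typical quantized model, and the estimate ignores the context cache and runtime overhead, so real usage is higher.

```python
def estimate_weight_memory_gb(params_billions, bits_per_weight=4):
    """Approximate memory for the model weights only, in GB (10^9 bytes).

    Illustrative estimate: excludes context cache and runtime overhead.
    """
    # billions of parameters * bits per weight / 8 bits per byte = GB
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    for size in (7, 70):
        print(f"{size}B at 4-bit: ~{estimate_weight_memory_gb(size)} GB")
    print(f"7B at 16-bit: ~{estimate_weight_memory_gb(7, 16)} GB")
```

Under these assumptions a 7B model needs about 3.5 GB for its weights and a 70B model about 35 GB, which is why 8 GB and 64 GB of RAM leave reasonable headroom for the rest of the system.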

What is the difference between Ollama and a cloud API like OpenAI?

With Ollama, the model runs on your machine: your data remains private, there is no cost per token, and you can work offline. On the other hand, local models are generally less powerful than the most advanced cloud models (like GPT-4 or Claude), and generation speed depends on your hardware.
