Ollama: Definition and Examples
Ollama is an open-source tool for running large language models (LLMs) locally on your own computer. Once a model has been downloaded, it runs without an internet connection or any dependency on a cloud service.
Full definition
Ollama is an open source platform designed to simplify the execution of large language models (LLMs) directly on a local machine. Where using models like Llama, Mistral, or Gemma previously required complex setups, Ollama offers an intuitive command-line interface that automates downloading, configuration, and running models in a few commands.
One of the main advantages of Ollama is data privacy: all interactions with the model remain on the user's machine, with no data passing through external servers. This makes it a particularly popular tool among developers, researchers, and companies concerned about protecting their sensitive data.
Ollama detects the available hardware (CPU, GPU, RAM), uses GPU acceleration when it is present, and otherwise falls back to the CPU. It also exposes a local REST API (by default at http://localhost:11434) that many tools and frameworks can talk to. Its large catalog of pre-quantized models lets capable LLMs run even on consumer hardware without a high-end graphics card.
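To give a feel for why quantization matters on consumer hardware, here is a rough back-of-the-envelope sketch. It assumes the weights dominate memory use and ignores the context cache and runtime overhead, so real requirements will be higher. The function name and the 7-billion-parameter figure are illustrative assumptions, not values taken from Ollama.

```python
def estimate_weight_memory_gb(parameter_count: float, bits_per_weight: int) -> float:
    """Rough size of the model weights alone, in gigabytes (10^9 bytes)."""
    bytes_total = parameter_count * bits_per_weight / 8
    return bytes_total / 1e9


# Hypothetical 7-billion-parameter model at two precisions.
full_precision_gb = estimate_weight_memory_gb(7e9, 16)  # 16-bit weights
quantized_gb = estimate_weight_memory_gb(7e9, 4)        # 4-bit quantized weights

print(f"16-bit weights: about {full_precision_gb:.1f} GB")
print(f"4-bit weights:  about {quantized_gb:.1f} GB")
```

Under these assumptions, a 4-bit quantized model needs roughly a quarter of the memory of its 16-bit counterpart, which is what brings it within reach of an ordinary laptop.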
Thanks to its "Modelfile" system inspired by Dockerfiles, Ollama also allows customizing models by defining system prompts, generation parameters, and conversation templates. This approach makes the tool accessible to both beginners and advanced users who want to create customized AI assistants.
Etymology
The name "Ollama" is a play on "llama" (a reference to Meta's LLaMA model family), with a sound meant to evoke ease of use. The project was created in 2023 and quickly became one of the most widely used tools for running LLMs locally.
Concrete examples
Download (on first use) and run a model locally for prompt engineering
ollama run llama3 "Explain the concept of chain-of-thought in prompt engineering with concrete examples"
Create a specialized assistant with a custom Modelfile
FROM mistral
SYSTEM You are an expert in SEO copywriting. You always write in French, with a professional tone and structured paragraphs. You include keywords naturally in the text.
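When several assistants share the same structure, a Modelfile like the one above can be generated from code. The sketch below is only an illustration: the helper name render_modelfile and the parameter values are assumptions, while FROM, PARAMETER, and SYSTEM are standard Modelfile instructions.

```python
from typing import Dict, Optional, Union


def render_modelfile(
    base_model: str,
    system_prompt: str,
    parameters: Optional[Dict[str, Union[int, float, str]]] = None,
) -> str:
    """Build Modelfile text from a base model, a system prompt, and generation parameters."""
    lines = [f"FROM {base_model}"]
    for name, value in (parameters or {}).items():
        lines.append(f"PARAMETER {name} {value}")
    # Triple quotes allow the system prompt to span several lines.
    lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"


# Illustrative values; adjust them to the assistant being built.
modelfile_text = render_modelfile(
    base_model="mistral",
    system_prompt="You are an expert in SEO copywriting. You always write in French.",
    parameters={"temperature": 0.7, "num_ctx": 4096},
)
print(modelfile_text)
```

Once the text is saved to a file named Modelfile, a command such as ollama create seo-assistant -f Modelfile builds the custom model (seo-assistant is a placeholder name), and ollama run seo-assistant starts it.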
Use the local Ollama API in a prompt engineering application
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Generate 5 variants of this prompt to improve creativity of responses: Describe the advantages of generative AI"}'
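The same request can be made from application code. The following is a minimal sketch using only the Python standard library. It assumes an Ollama server listening on the default address and a llama3 model already downloaded. Unlike the curl call above, it sets "stream" to false so that the server returns one JSON object instead of a stream of partial responses. The function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Prepare a POST request for the /api/generate endpoint without sending it."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With streaming disabled, the full completion is in the "response" field.
    return body["response"]


# Building the request needs no running server, so it can be inspected safely.
request = build_generate_request("llama3", "Describe the advantages of generative AI")
print(request.full_url)
print(request.data.decode("utf-8"))
```

With the server running, calling generate("llama3", "...") returns the model's answer as a string.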
Practical usage
In prompt engineering, Ollama enables rapid testing and iteration on prompts without API costs or network latency. It is ideal for experimenting with different models, comparing their responses to the same prompt, and developing complex prompt systems in complete privacy before deploying them in production.
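Comparing models on the same prompt can be reduced to a small loop. The sketch below is an assumption about how such a comparison might be organized, not a feature of Ollama itself: compare_models takes any callable that accepts a model name and a prompt, and fake_generate is a hypothetical stand-in so the example runs without a server. In practice the callable would wrap a request to the local API.

```python
from typing import Callable, Dict, Iterable


def compare_models(
    prompt: str,
    models: Iterable[str],
    generate: Callable[[str, str], str],
) -> Dict[str, str]:
    """Send the same prompt to each model and collect the answers by model name."""
    return {model: generate(model, prompt) for model in models}


def fake_generate(model: str, prompt: str) -> str:
    # Hypothetical stand-in for a real call to the local Ollama API.
    return f"[{model}] {len(prompt.split())} words received"


results = compare_models(
    "Describe the advantages of generative AI",
    ["llama3", "mistral", "gemma"],
    fake_generate,
)
for model, answer in results.items():
    print(f"{model}: {answer}")
```

Keeping the model call behind a plain callable makes it easy to swap in a real client later, or to log timing alongside each answer.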