Ollama is the easiest way to run a local LLM, and setting a system prompt is the most important configuration step for any local model. This tutorial covers all four ways to do it: the Modelfile, the /set command, the API, and the Python client. Examples work with Llama 3.x, Mistral, Qwen 3, Phi-4, and any other model in the Ollama library.
The Modelfile is Ollama's equivalent of a Dockerfile: you define a base model, a system prompt, and other parameters, then build a custom model from it.
```
FROM llama3.2

SYSTEM """You are a senior code reviewer for Python projects.

Always:
- Comment on naming, structure, and potential bugs
- Suggest specific line edits with code blocks
- Flag deprecated functions
- Mention security concerns when relevant

Never:
- Praise code without offering at least one improvement
- Use jargon without explaining it
- Suggest rewriting more than necessary"""

PARAMETER temperature 0.3
PARAMETER num_ctx 8192
```
Save this as `Modelfile` and build:
```shell
ollama create code-reviewer -f Modelfile
```
Now run it:
```shell
ollama run code-reviewer
```
Every conversation with this custom model starts with that system prompt baked in. You can share the Modelfile with teammates or commit it to git.
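If you maintain several personas, you can generate Modelfiles from a template instead of hand-editing each one. A minimal sketch in Python (the `write_modelfile` helper and the prompt text are illustrative, not part of Ollama's tooling):

```python
from pathlib import Path

def write_modelfile(path, base_model, system_prompt, temperature=0.3):
    """Render a minimal Ollama Modelfile with a SYSTEM block."""
    # Triple quotes let the system prompt span multiple lines,
    # matching the Modelfile syntax shown above.
    text = (
        f"FROM {base_model}\n\n"
        f'SYSTEM """{system_prompt}"""\n\n'
        f"PARAMETER temperature {temperature}\n"
    )
    Path(path).write_text(text)
    return text

modelfile = write_modelfile(
    "Modelfile",
    base_model="llama3.2",
    system_prompt="You are a Python tutor for beginners.\nExplain code line by line.",
)
print(modelfile)
```

After writing the file, `ollama create python-tutor -f Modelfile` builds the model exactly as before.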
For ad-hoc testing without creating a Modelfile:
```
ollama run llama3.2
>>> /set system You are a Python tutor for absolute beginners. Always explain code line by line.
>>> Write a function to find prime numbers
```
The /set system command updates the system prompt for the current session. This is perfect for trying different prompts quickly without building new models.
Note: changing the system prompt mid-chat clears the conversation history. The new system prompt takes effect from the next user message.
For app integration, use the HTTP API directly:
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "You are a Python tutor for beginners. Explain code line by line."},
    {"role": "user", "content": "Write a function to find prime numbers."}
  ],
  "stream": false
}'
```
The message format deliberately mirrors OpenAI's chat completions API, which makes migration straightforward; Ollama also serves an OpenAI-compatible endpoint at `/v1/chat/completions` for existing OpenAI client code.
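The same request can be made from the Python standard library alone. A sketch under two assumptions: an Ollama server is running at `localhost:11434`, and the `build_payload`/`chat` helper names are ours, not part of any library:

```python
import json
import urllib.request

def build_payload(model, system_prompt, user_prompt):
    """Assemble the /api/chat request body with the system message first."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "stream": False,
    }

def chat(payload, url="http://localhost:11434/api/chat"):
    """POST the payload to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

payload = build_payload(
    "llama3.2",
    "You are a Python tutor for beginners. Explain code line by line.",
    "Write a function to find prime numbers.",
)
# chat(payload)  # uncomment with an Ollama server running locally
```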
Using the official ollama Python package:
import ollama
response = ollama.chat(
model='llama3.2',
messages=[
{
'role': 'system',
'content': 'You are a Python tutor for beginners. Explain code line by line.'
},
{
'role': 'user',
'content': 'Write a function to find prime numbers.'
}
]
)
print(response['message']['content'])
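In a multi-turn app, the usual pattern is to keep the system message pinned at index 0 while user and assistant turns accumulate after it. A sketch of that bookkeeping (the `Conversation` class is our own helper, not part of the `ollama` package):

```python
class Conversation:
    """Keeps the system prompt pinned at index 0 while turns accumulate."""

    def __init__(self, system_prompt):
        self.messages = [{'role': 'system', 'content': system_prompt}]

    def add_user(self, content):
        self.messages.append({'role': 'user', 'content': content})
        # Pass this list to ollama.chat(model=..., messages=...)
        return self.messages

    def add_assistant(self, content):
        # Store the model's reply so follow-up questions have context.
        self.messages.append({'role': 'assistant', 'content': content})

convo = Conversation('You are a Python tutor for beginners.')
history = convo.add_user('Write a function to find prime numbers.')
```

Because the system message never leaves position 0, every request sent to the model carries the persona, no matter how long the history grows.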
| Model | System prompt support | Notes |
|---|---|---|
| Llama 3.x | Excellent | Built-in support, follows instructions reliably |
| Mistral | Good | Follows system prompts, slightly weaker on long lists |
| Qwen 3 | Excellent | Strong instruction following, good for structured outputs |
| Phi-4 | Good | Microsoft's small model, surprisingly capable |
| Gemma 2 | Good | Google's open model, follows persona instructions well |
| CodeLlama | Good | Specialized for code, system prompts shape coding style |
| DeepSeek | Excellent | Strong reasoning, follows complex multi-rule prompts |
Local models are typically smaller than frontier API models, so they need clearer, more explicit system prompts to behave well: keep rules short and concrete, limit how many you stack, and state the desired output format directly.