
How to Add a System Prompt to the OpenAI API — Tutorial With Code

Last updated: April 2026 · 6 min read · AI Tools

The OpenAI API is the most common production endpoint for AI features. Setting a system prompt correctly is the difference between a model that does what you want and one that drifts into off-topic, off-format, or off-brand responses. This tutorial walks through the message format, code examples in Python and Node.js, and best practices for cost and reliability.

Generate the prompt itself in 2 minutes.

Open System Prompt Generator →

The OpenAI message format

OpenAI's chat completions API uses a messages array. The system prompt is the first element with role "system":

{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a senior support agent for Acme. Always confirm the user's plan before discussing pricing."},
    {"role": "user", "content": "Can I get a refund for last month?"}
  ]
}

The system message can appear in any position in the array, but convention (and best practice) is to put it first. Put any few-shot examples after the system message and before the user's actual message.
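That ordering can be sketched as a messages array. The few-shot pair here is illustrative, not part of the tutorial's own examples:

```python
system_prompt = (
    "You are a senior support agent for Acme. "
    "Always confirm the user's plan before discussing pricing."
)

messages = [
    {"role": "system", "content": system_prompt},
    # few-shot example (hypothetical): one user/assistant pair demonstrating tone
    {"role": "user", "content": "How much does the Pro plan cost?"},
    {"role": "assistant", "content": "Happy to help! Which plan are you currently on?"},
    # the live user message always comes last
    {"role": "user", "content": "Can I get a refund for last month?"},
]
```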

Python tutorial

Using the official openai Python SDK:

from openai import OpenAI

client = OpenAI(api_key="sk-...")

system_prompt = """You are a senior customer support agent for Acme SaaS.

Always:
- Confirm the user's plan tier before discussing features or pricing
- Use a warm, helpful tone
- End each response with a yes/no question to keep the conversation moving

Never:
- Promise refunds — escalate to a human
- Mention competitor product names
- Invent feature roadmap items"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "I want a refund for last month."}
    ]
)

print(response.choices[0].message.content)

That's the full pattern. Most production code follows this structure, with the system prompt loaded from a constant or file rather than inlined in the function.
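One way to do that file-based loading, sketched with an illustrative path and helper name:

```python
from pathlib import Path

def load_system_prompt(path: str) -> str:
    """Read a system prompt from a text file, stripped of trailing whitespace.

    Call this once at startup so the prompt stays out of the request code.
    """
    return Path(path).read_text(encoding="utf-8").strip()

# e.g. SYSTEM_PROMPT = load_system_prompt("prompts/support_agent.txt")
```

Keeping the prompt in a file also makes it easy to diff and review prompt changes like any other code change.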

Node.js tutorial

Using the official openai Node.js SDK:

import OpenAI from "openai";

const client = new OpenAI({ apiKey: "sk-..." }); // in production, set OPENAI_API_KEY and call new OpenAI() with no args

const systemPrompt = `You are a senior customer support agent for Acme SaaS.

Always:
- Confirm the user's plan tier before discussing features or pricing
- Use a warm, helpful tone

Never:
- Promise refunds — escalate to a human`;

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: systemPrompt },
    { role: "user", content: "I want a refund for last month." }
  ]
});

console.log(response.choices[0].message.content);

curl tutorial

For quick testing without an SDK:

curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'

Multi-turn conversations

For ongoing conversations, you append each new user message and each new assistant response to the messages array. The system prompt stays at index 0 throughout:

messages = [
    {"role": "system", "content": system_prompt},  # always first
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "I need a refund"},
    {"role": "assistant", "content": "I understand. Which plan are you on?"},
    {"role": "user", "content": "Pro plan, billed monthly"},
    # ... new request goes here
]

Every time you call the API, you send the full array. Because the system prompt is present in every request, it counts toward your input-token cost on every call.
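A minimal conversation helper along these lines (class and method names are illustrative) keeps the system prompt pinned at index 0 while turns accumulate:

```python
class Conversation:
    """Accumulates chat history; the system prompt always stays at index 0."""

    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user(self, content: str):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content: str):
        self.messages.append({"role": "assistant", "content": content})

convo = Conversation("You are a support agent.")
convo.add_user("Hello")
convo.add_assistant("Hi! How can I help?")
convo.add_user("I need a refund")
# convo.messages is what you pass as `messages=` on the next API call
```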

Prompt caching to reduce cost

OpenAI supports automatic prompt caching for prefixes longer than 1024 tokens. When you reuse the same system prompt across many requests, OpenAI caches it and discounts subsequent reads to 50% of normal input price.

Two implementation tips to maximize cache hits:

  1. Keep the static part of the prompt byte-identical across requests — any variation, even an injected timestamp, breaks the prefix match.
  2. Put stable content (system prompt, few-shot examples) first and per-request content (user data, retrieved documents) last, so the cacheable prefix never changes.
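The prefix-stability idea can be sketched with a request builder that reuses a byte-identical static prompt and appends per-request data only at the end (names here are illustrative):

```python
# Reused verbatim on every request so the cached prefix matches
STATIC_SYSTEM_PROMPT = "You are a senior support agent for Acme SaaS."

def build_messages(user_query: str, account_context: str) -> list:
    """Static, cacheable prefix first; per-request details last."""
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        # per-user data goes AFTER the static prefix so it can't break the cache
        {"role": "user", "content": f"Account context: {account_context}\n\n{user_query}"},
    ]

a = build_messages("Can I get a refund?", "Pro plan, billed monthly")
b = build_messages("How do I cancel?", "Free plan")
```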

Token counting

Before you deploy, count how many tokens your system prompt uses. The free token counter shows the count for any text you paste. For a typical chatbot, expect 200-500 tokens for the system prompt. Coding assistants often run 500-1000. Complex agents can hit 2000-5000.
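To count tokens in code rather than by pasting text, OpenAI's tiktoken library gives exact counts; the fallback heuristic below (~4 characters per token for English prose) is an approximation, not an official figure:

```python
def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Exact token count via tiktoken if installed, rough estimate otherwise."""
    try:
        import tiktoken
        enc = tiktoken.encoding_for_model(model)
        return len(enc.encode(text))
    except ImportError:
        # rough heuristic: roughly 4 characters per token for English text
        return max(1, len(text) // 4)

# count_tokens(system_prompt) before deploying tells you the per-call overhead
```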

Testing checklist

Before shipping a system prompt to production:

  1. Test with at least 5 in-scope queries that should work
  2. Test with 5 out-of-scope queries that should be redirected
  3. Test with 3 ambiguous queries that should trigger clarification
  4. Test with 3 adversarial inputs ("ignore your previous instructions," etc.)
  5. Run a multi-turn conversation of at least 10 turns
  6. Check that every response follows your output format
  7. Check that every constraint is respected

If any test fails, refine the prompt and re-run all tests. Iterate until clean.
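Steps 6 and 7 of the checklist can be partially automated with simple string checks. The constraint list below is a hypothetical stand-in for your own rules, and the commented-out loop assumes a `get_reply()` wrapper around the API calls shown earlier:

```python
BANNED_PHRASES = ["we'll refund", "we will refund"]  # hypothetical constraint list

def violates_constraints(reply: str) -> list:
    """Return the banned phrases that appear in a model reply (case-insensitive)."""
    lower = reply.lower()
    return [p for p in BANNED_PHRASES if p in lower]

def ends_with_question(reply: str) -> bool:
    """Check the 'end each response with a yes/no question' format rule."""
    return reply.rstrip().endswith("?")

# for query in test_queries:
#     reply = get_reply(query)  # your API call from the tutorials above
#     assert not violates_constraints(reply)
#     assert ends_with_question(reply)
```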

Generate a tested system prompt in 2 minutes.

Open System Prompt Generator →