The Minimal Prompt Toolkit — What You Actually Need to Know

This is the last article in the series, and it's designed to make most of the previous ones optional. If you read nothing else about prompt engineering — if the phrase itself makes you tired — this is the one piece to internalize. Five techniques, a short list of settings to leave alone, and a handful of anti-patterns to drop. This covers 90% of what changes LLM output quality. The remaining 10% is for people building production systems, and they'll find it in the earlier articles.

The fundamental truth about prompting is that it's not a discipline — it's a communication skill. The same principles that make you good at explaining things to people make you good at explaining things to models. Be specific. Provide context. Show what you want. Say what format you need it in. Check the result and adjust. That's it. Everything else is elaboration on those five ideas.

Technique 1 — Be Specific and Direct

This handles 60% of all prompting needs, and it's the technique that the entire prompt engineering ecosystem undervalues because it's too obvious to sell courses on. Tell the model exactly what you want. Include the context it needs to do the job. Specify the format you want the output in. Don't hedge, don't pad, don't write a paragraph of preamble before getting to the actual request.

The difference between a vague prompt and a specific one is concrete. "Write me something about marketing" is vague. "Write a 500-word blog post about email marketing open rates for B2B SaaS companies, with 3 actionable recommendations and specific benchmarks" is specific. The second prompt isn't fancier or more "engineered" — it just contains the information the model needs to give you what you want. The model can't read your mind. It can only work with what you provide.

Context is the most commonly missing ingredient. When you ask the model a question, it doesn't know your background, your audience, your constraints, or your preferences unless you state them. "How should I structure this presentation" is a different question when the audience is your team versus the board versus a conference of 500 people. "Explain how DNS works" has a different answer for a network engineer than for a product manager. The model defaults to generic — your context makes it specific.

Format specification is the easiest lever and the most underused by casual users. "Respond in bullet points." "Use markdown headers." "Keep each section under 100 words." "Format as a table with columns for X, Y, and Z." These instructions work reliably across every major model because output formatting was a core part of instruction-tuning. If you don't specify a format, the model picks one — and it picks the generic one, which is usually a wall of prose that buries the information you need.
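The specific-prompt recipe above — request, context, format — can be sketched as a trivial string assembler. The function name and layout here are illustrative, not any SDK's API:

```python
def build_prompt(task: str, context: str = "", output_format: str = "") -> str:
    """Assemble a specific prompt: the request itself, the context the
    model needs to do the job, and an explicit format instruction.
    Empty parts are skipped."""
    parts = [task]
    if context:
        parts.append(f"Context: {context}")
    if output_format:
        parts.append(f"Format: {output_format}")
    return "\n\n".join(parts)

prompt = build_prompt(
    task="Write a blog post about email marketing open rates for B2B SaaS companies.",
    context="Audience: marketing managers at B2B SaaS companies. "
            "Include 3 actionable recommendations and specific benchmarks.",
    output_format="About 500 words, markdown headers, bullet points for the recommendations.",
)
```

The point isn't the helper — it's the checklist it encodes: if any of the three parts is empty, you've probably left the model guessing.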

Technique 2 — Give Examples

Few-shot prompting — showing the model 2-3 examples of the input/output pair you want — is the single most reliable technique for getting consistent, correctly formatted results. It works because the model pattern-matches against your examples, picking up format, tone, length, and implicit rules that would take paragraphs to describe in words. This was covered extensively earlier in the series, but the condensed version is simple: when the model's default output isn't in the format or style you need, show it what you want instead of explaining it.

Two or three examples is the sweet spot for most tasks. One example establishes the format. Two examples confirm the pattern and show the model it's not a fluke. Three examples handle edge cases and reinforce implicit rules. Beyond three, you hit diminishing returns for most use cases — the model has already picked up the pattern, and extra examples consume context window space without adding signal.

The quality of your examples matters more than the quantity. Bad examples teach bad patterns. If your example contains formatting inconsistencies, the model will reproduce those inconsistencies. If your example is sloppy, the output will be sloppy in the same ways. The best few-shot examples are ones you've actually polished — real outputs that represent what you want, not rough drafts tossed together to fill the prompt.

Where to use it: data extraction, classification, style matching, report formatting, email templates, code generation in a specific style, content that needs to match an existing voice. Where to skip it: one-off creative tasks, questions, brainstorming, anything where you want the model to surprise you rather than conform to a pattern.
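Mechanically, a few-shot prompt is just worked examples interleaved before the real input. A minimal sketch (names are my own, not a library API):

```python
def few_shot_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt: the instruction, 2-3 worked input/output
    pairs, then the new input. The model pattern-matches format, tone,
    and implicit rules from the examples."""
    blocks = [instruction]
    for inp, out in examples:
        blocks.append(f"Input: {inp}\nOutput: {out}")
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

prompt = few_shot_prompt(
    "Classify the sentiment of each product review as positive or negative.",
    [("Great battery life, fast shipping.", "positive"),
     ("Broke after two days.", "negative")],
    "Does exactly what it says on the box.",
)
```

Ending the prompt with a dangling `Output:` is the key move — it invites the model to continue the pattern rather than comment on it.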

Technique 3 — Ask for Reasoning

Chain of thought — "think through this step by step" or "show your reasoning" — genuinely improves accuracy on tasks that have intermediate steps. Math problems, logical deduction, code debugging, complex analysis where the answer depends on getting intermediate conclusions right. The effect is well-documented in research and holds up in practice. Adding that one instruction can turn a wrong answer into a right one on reasoning-heavy tasks.

The mechanism is not mysterious. When the model generates intermediate steps, each step becomes context for the next step. The model can "see" its own reasoning as it goes, which prevents it from jumping to conclusions that skip important logic. Without chain of thought, the model goes straight from question to answer in one prediction — and for multi-step problems, that single prediction is often wrong because the answer space is too large to land on correctly without working through the pieces.

When to use it: math, logic puzzles, code debugging, multi-step analysis, anything where you'd expect a human to show their work. When to skip it: simple factual questions, creative writing, summarization, translation, formatting tasks — anything that doesn't have intermediate reasoning steps. Using chain of thought on a task that doesn't need it just makes the model write more words without thinking better, and you pay for those tokens in latency and cost.

The distinction between prompt-level CoT ("think step by step") and model-level reasoning (Claude's extended thinking, OpenAI's o-series) matters for hard problems. Prompt-level CoT is free and works for most tasks. Model-level reasoning uses dedicated computation for harder problems and is measurably better on math and logic benchmarks — but it costs more and takes longer. Use prompt-level CoT as your default. Escalate to reasoning models when the task is genuinely hard and accuracy is critical.
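The use-it/skip-it rule above can be encoded as a one-line gate. The task-type labels here are illustrative, assumptions of this sketch rather than anything standardized:

```python
# Task types where chain of thought reliably helps (intermediate steps exist).
REASONING_TASKS = {"math", "logic", "debugging", "analysis"}

def with_cot(prompt: str, task_type: str) -> str:
    """Append a chain-of-thought cue only for reasoning-heavy tasks;
    for everything else the cue just adds latency and token cost."""
    if task_type in REASONING_TASKS:
        return prompt + "\n\nThink through this step by step before giving your final answer."
    return prompt

with_cot("A train leaves at 3pm traveling 60 mph...", "math")       # cue appended
with_cot("Translate to French: good morning", "translation")        # unchanged
```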

Technique 4 — Set the Format

Explicit format requests work reliably across all major models. "Respond in JSON with the following fields." "Use markdown with H2 headers for each section." "Return a bullet list, no prose." "Format as a CSV with columns: name, category, score." These aren't advanced techniques — they're basic instructions that the model follows because formatting compliance was drilled into it during training.

For programmatic use cases — when you're parsing the output in code — use the platform's native structured output features when available. OpenAI's JSON mode and response format parameter, Anthropic's tool use for structured output, Gemini's response schemas. These are more reliable than prompt-only format requests because they enforce the structure at the generation level rather than hoping the model follows your instructions. For interactive use, a format instruction in the prompt is usually sufficient.

The most common format failure is extra text — the model adds a preamble ("Here's the JSON you requested:") or an epilogue ("Let me know if you need any changes") around the structured data you asked for. Addressing this is simple: "Return only the JSON, no additional text." This instruction works on every major model and eliminates the most frequent format compliance issue.
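Even with "return only the JSON" in the prompt, defensive parsing is cheap insurance when you're consuming output in code. This helper is my own sketch, not a platform feature: it extracts the first balanced JSON object from a response that may carry a preamble or epilogue:

```python
import json

def extract_json(response: str) -> dict:
    """Parse the first JSON object in a model response, tolerating
    preamble text before '{' and epilogue text after the matching '}'."""
    start = response.index("{")
    depth = 0
    for i, ch in enumerate(response[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(response[start : i + 1])
    raise ValueError("no complete JSON object found")

extract_json('Here is the JSON you requested: {"score": 7} Let me know!')
# → {"score": 7}
```

Note the naivety: a `{` or `}` inside a string value can fool the depth counter, which is exactly why platform-level structured output beats prompt-plus-parsing for production use.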

Technique 5 — Iterate Instead of Over-Engineering

This is the technique that saves the most time and gets the least respect. Start with a simple, clear prompt. Read the output. If it's good enough, you're done. If it missed something, add one constraint addressing that specific miss. Read again. Repeat.

This beats front-loaded complexity for a structural reason: each constraint you add is based on an observed problem rather than an anticipated one. You're not guessing what the model might get wrong — you're correcting what it actually got wrong. This produces leaner, more effective prompts because every instruction earns its place by solving a real problem.

The typical arc looks like this. First attempt: clear request, minimal constraints. The output is 80% right but too long and uses the wrong tone. Second attempt: same request plus "keep it under 300 words and use a casual, direct tone." The output is now the right length and tone but missing a key point. Third attempt: add "make sure to address X." Done. Three rounds, each taking 30 seconds, produce a better result than a 10-minute prompt-engineering session that tries to anticipate every failure mode upfront.

Two to three rounds covers most tasks. If you're on round five and the output still isn't right, the problem likely isn't your prompt — it's either the wrong model for the task, a task that hits the model's limitations (see the previous article), or a request that's genuinely ambiguous in ways you haven't resolved in your own thinking.
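The iterate-don't-over-engineer arc reduces to a base request plus a list of constraints, each added because of an observed miss. A sketch (the helper is hypothetical, purely to make the shape concrete):

```python
def refine(prompt: str, fixes: list[str]) -> str:
    """Each round appends one constraint addressing a problem actually
    observed in the output, rather than front-loading guesses about
    what might go wrong."""
    if not fixes:
        return prompt
    constraints = "\n".join(f"- {f}" for f in fixes)
    return f"{prompt}\n\nConstraints:\n{constraints}"

base = "Summarize the attached report for the leadership team."
v1 = refine(base, [])                                   # round 1: minimal
v2 = refine(base, ["Keep it under 300 words.",          # round 2: too long
                   "Use a casual, direct tone."])       # round 2: wrong tone
```

Every line in the constraints list is traceable to a real failure — which is exactly the discipline that keeps prompts lean.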

What to Leave at Default

Temperature. Unless you have a specific reason to change it, leave it at the model's default. Low temperature (0.0-0.3) for tasks where consistency matters — classification, extraction, factual questions. High temperature (0.7-1.0) for creative tasks where you want variation. But the default — which varies by model and is usually around 0.7-1.0 — is calibrated for general use and works for most tasks. Adjusting temperature is the last optimization, not the first.

Top-p. Leave it alone. The interaction between temperature and top-p is confusing, adjusting both simultaneously makes behavior unpredictable, and the default works. Of all the parameters available in API settings, top-p is the one least likely to improve your results through manual adjustment.

Frequency and presence penalties. Leave them at 0. These parameters exist to address repetition in long-form generation, and the default of 0 works for most tasks. If the model is being repetitive, addressing it in the prompt ("don't repeat points") is usually more effective than tuning a penalty parameter.

Max tokens. Leave it unset unless you need to cap output length for cost or parsing reasons. The model naturally produces output of appropriate length for most tasks. Setting max tokens too low cuts off responses mid-thought; setting it too high wastes nothing because the model stops when it's done.
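In practice, "leave it at default" means not sending the parameter at all. A sketch of a request payload — field names loosely follow common chat-completion APIs and may differ per provider:

```python
# Only the model and messages are set; everything else is absent,
# which means the provider's calibrated defaults apply.
request = {
    "model": "some-model-name",  # placeholder, not a real model id
    "messages": [{"role": "user", "content": "Classify this support ticket: ..."}],
    # Deliberately not set: temperature, top_p, frequency_penalty,
    # presence_penalty, max_tokens.
}
```

An unset parameter is different from a parameter explicitly set to some remembered "good" value — the former tracks the provider's calibration as models change; the latter silently pins a choice you made for a different model.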

Anti-Patterns to Drop

These are the techniques that populate "ultimate prompt" threads and don't reliably improve output:

"You are the world's leading expert in X." The effect on output quality is minimal for most models and most tasks. If you want domain expertise, provide domain-specific context — don't just claim the model has it.

Emotional manipulation. "This is critical for my career." "My job depends on getting this right." "I'll tip you $200 for a good answer." These don't change the model's computation. Some published results suggest emotional framing can marginally shift output quality on certain benchmarks, but the effect size is small and inconsistent across models. It's not worth the prompt space.

"Take a deep breath." This went viral because one study showed a marginal improvement on math tasks. The improvement, to the extent it existed, was likely an artifact of the phrase appearing in training data contexts where careful reasoning followed. It's not a reliable technique — it's a meme that occasionally correlates with slightly better output.

500-word system prompts for simple tasks. If your task is "summarize this article," a 500-word system prompt defining your persona, values, communication style, and output philosophy is not helping. It's consuming context window space with information the model doesn't need for the task at hand.

Magic keywords and secret syntax. There are no secret words that unlock hidden capabilities. The model doesn't have a cheat code. If someone is selling you secret prompt syntax, they're selling you nothing.

The One-Page Version

Be specific. Give examples when format matters. Ask for reasoning when logic matters. Set the format explicitly. Iterate based on what you see, not what you imagine. Leave the settings alone. Drop the theater.

That's prompt engineering. Everything else is either elaboration for edge cases or marketing.


This is part of CustomClanker's Prompting series — what actually changes output quality.