Build a Local LLM Zero-Shot Classifier You Can Actually Deploy
Zero-shot classification lets you categorize text without training a task-specific model. Instead of building a dedicated classifier for every queue, language, or department, you give a model a candidate label set and ask it to pick the best fit.
For real systems, that only works if output is consistent, auditable, and safe to route automatically. This guide shows a production-ready local setup with a strict JSON contract, confidence gates, and deterministic behavior.
1) What We Are Building
We will build a local classifier that:
- runs on a local LLM served through Ollama,
- accepts arbitrary candidate labels per request,
- returns a single best label plus alternatives,
- applies confidence + margin gates before auto-routing,
- falls back to manual review for ambiguous cases.
| Layer | Responsibility | Failure Mode if Missing |
|---|---|---|
| Prompt contract | Force strict JSON and label constraints. | Unstructured responses that break downstream parsing. |
| Parser + validator | Reject malformed payloads and out-of-taxonomy labels. | Silent misroutes caused by invalid model output. |
| Confidence policy | Auto-route only high-confidence predictions. | Low-quality automation with expensive corrections. |
| Manual review path | Catch uncertain edge cases. | Forced wrong decisions under uncertainty. |
2) Runtime Setup
Install and run Ollama locally, then pull a model suitable for classification-style reasoning.
ollama pull qwen2.5:7b-instruct
The classifier script used in this guide lives in:
scripts/zero-shot-local-ollama.mjs
3) Strict Prompt Contract
The classifier prompt uses explicit rules so the model returns machine-readable JSON only.
function buildPrompt(text, labels) {
  return [
    "You are a strict zero-shot classifier.",
    "Return ONLY valid JSON with this schema:",
    '{"label":"<one label>","score":<number 0..1>,"alternatives":[{"label":"...","score":0.0}]}',
    "Rules:",
    "1) label must be one of the candidate labels exactly.",
    "2) score is confidence between 0 and 1.",
    "3) alternatives must contain at most 3 labels sorted by descending score.",
    "4) No markdown, no extra keys, no prose.",
    `Candidate labels: ${JSON.stringify(labels)}`,
    `Text: ${JSON.stringify(text)}`,
  ].join("\n");
}
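Spelling out the schema and rules reduces format drift and parsing failures, but it does not eliminate them: the model can still ignore a rule, so every response must pass through a parser and validator before it is trusted.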
4) Local Inference Call (Ollama)
The classification call is a standard non-streaming /api/generate request. Setting temperature to 0 makes the output as repeatable as the runtime allows.
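async function classifyWithOllama(text, labels) {
  const response = await fetch("http://127.0.0.1:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5:7b-instruct",
      prompt: buildPrompt(text, labels),
      stream: false, // return one JSON payload instead of a token stream
      options: { temperature: 0, top_p: 0.9 },
    }),
  });
  // Fail loudly on HTTP errors (for example, a model that has not been pulled).
  if (!response.ok) {
    throw new Error(`Ollama request failed with status ${response.status}`);
  }
  const payload = await response.json();
  // payload.response holds the generated text; it may contain more than bare JSON.
  const jsonText = extractJsonObject(payload.response);
  const parsed = JSON.parse(jsonText);
  return normalizeResult(parsed, labels);
}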
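The helper `extractJsonObject` is called above but not shown in this article. A minimal sketch of what such a helper might look like follows, assuming its job is to pull the first balanced `{...}` object out of a response that may be wrapped in stray prose. This is an illustration, not the implementation from the full script.

```javascript
// Hypothetical sketch of extractJsonObject; the version in the full script may differ.
// Returns the first balanced top-level {...} substring, ignoring braces inside strings.
function extractJsonObject(raw) {
  if (typeof raw !== "string") {
    throw new Error("Model response is not a string");
  }
  const start = raw.indexOf("{");
  if (start === -1) {
    throw new Error("No JSON object found in model response");
  }
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < raw.length; i++) {
    const ch = raw[i];
    if (inString) {
      // Inside a JSON string: only track escapes and the closing quote.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return raw.slice(start, i + 1);
    }
  }
  throw new Error("Unbalanced JSON object in model response");
}
```

Throwing on a missing or unbalanced object, rather than returning an empty string, lets the caller send the item to manual review instead of attempting to parse garbage.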
Two non-negotiables for production:
- validate label membership against the input taxonomy,
- reject scores outside [0, 1].
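Both checks can live in `normalizeResult`, which the article calls but does not show. The sketch below is one possible shape for it, under the assumption that a bad top label or score should throw while malformed alternatives are simply dropped. The real function in the full script may behave differently.

```javascript
// Hypothetical sketch of normalizeResult; the version in the full script may differ.
// Throws on any contract violation so the caller can fall back to manual review.
function normalizeResult(parsed, labels) {
  const allowed = new Set(labels);
  const isValidScore = (s) =>
    typeof s === "number" && Number.isFinite(s) && s >= 0 && s <= 1;

  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("Model output is not a JSON object");
  }
  if (!allowed.has(parsed.label)) {
    throw new Error(`Label not in taxonomy: ${JSON.stringify(parsed.label)}`);
  }
  if (!isValidScore(parsed.score)) {
    throw new Error(`Score outside [0, 1]: ${JSON.stringify(parsed.score)}`);
  }

  const rawAlternatives = Array.isArray(parsed.alternatives) ? parsed.alternatives : [];
  const alternatives = rawAlternatives
    // Drop malformed entries instead of failing the whole result (a design choice).
    .filter((a) => a && allowed.has(a.label) && isValidScore(a.score))
    // Copy only the expected keys so extra keys never leak downstream.
    .map((a) => ({ label: a.label, score: a.score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3);

  return { label: parsed.label, score: parsed.score, alternatives };
}
```

Rebuilding the result from known keys, instead of returning `parsed` as-is, keeps unexpected fields out of routing and logging code.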
5) Confidence-Aware Routing
Do not auto-route every model answer. Gate each prediction on its confidence and on the margin between the top two labels.
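function routeDecision(result, minConfidence = 0.6, minMargin = 0.08) {
  // Take the winner from the result itself, so routing stays correct whether or
  // not alternatives repeats the winning label as its first entry.
  const top = { label: result.label, score: result.score };
  // Runner-up: first alternative (sorted by descending score) with a different label.
  const second = result.alternatives.find((a) => a.label !== top.label) ?? { label: "", score: 0 };
  if (top.score < minConfidence) return "manual_review";
  if (top.score - second.score < minMargin) return "manual_review";
  return `auto:${top.label}`;
}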
This policy is simple, explainable, and easy to tune per queue.
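Per-queue tuning could be expressed as a small threshold table. The sketch below applies the same kind of confidence and margin gates, with thresholds looked up by queue name. The queue names, the numbers, and the `routeForQueue` helper are all illustrative assumptions, not part of the full script.

```javascript
// Hypothetical per-queue thresholds; the queue names and numbers are illustrative only.
const QUEUE_THRESHOLDS = new Map([
  ["default", { minConfidence: 0.6, minMargin: 0.08 }],
  ["billing", { minConfidence: 0.75, minMargin: 0.15 }], // costly misroutes: stricter gates
  ["feedback", { minConfidence: 0.5, minMargin: 0.05 }], // cheap misroutes: looser gates
]);

// `result` is assumed to have the shape { label, score, alternatives }.
function routeForQueue(result, queue) {
  const { minConfidence, minMargin } =
    QUEUE_THRESHOLDS.get(queue) ?? QUEUE_THRESHOLDS.get("default");
  // Runner-up: best-scoring alternative that is not the winning label.
  const runnerUp = result.alternatives
    .filter((a) => a.label !== result.label)
    .reduce((best, a) => (a.score > best.score ? a : best), { label: "", score: 0 });

  if (result.score < minConfidence) return "manual_review";
  if (result.score - runnerUp.score < minMargin) return "manual_review";
  return `auto:${result.label}`;
}

const sample = {
  label: "billing",
  score: 0.7,
  alternatives: [
    { label: "billing", score: 0.7 },
    { label: "technical", score: 0.2 },
  ],
};
console.log(routeForQueue(sample, "default")); // auto:billing
console.log(routeForQueue(sample, "billing")); // manual_review (0.7 < 0.75)
```

The same prediction is auto-routed under the default thresholds and held for review under the stricter ones, which is the behavior per-queue tuning is meant to produce.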
6) Full Runnable Script
Use the full implementation at:
scripts/zero-shot-local-ollama.mjs
It supports two modes:
- USE_MOCK=1: offline deterministic test mode,
- default mode: real local inference against Ollama.
Mock validation command (PowerShell)
$env:USE_MOCK="1"
node scripts/zero-shot-local-ollama.mjs
Real local model command (PowerShell)
Remove-Item Env:USE_MOCK -ErrorAction SilentlyContinue
$env:OLLAMA_MODEL="qwen2.5:7b-instruct"
node scripts/zero-shot-local-ollama.mjs
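The article does not show how mock mode is implemented. One way such a switch might work is sketched below, assuming a keyword-overlap mock that returns the same result shape as the real classifier. Both `classifyWithMock` and `selectClassifier` are hypothetical names used for illustration.

```javascript
// Hypothetical sketch of a mock mode; the full script's mock may work differently.
// Scores each label by how many of its words appear in the text, so results are
// deterministic and need no running Ollama server.
function classifyWithMock(text, labels) {
  if (labels.length === 0) throw new Error("At least one candidate label is required");
  const tokenize = (s) => s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  const words = new Set(tokenize(text));
  const counted = labels.map((label) => ({
    label,
    hits: tokenize(label).filter((t) => words.has(t)).length,
  }));
  const total = counted.reduce((sum, c) => sum + c.hits, 0);
  const scored = counted
    // With no keyword hits at all, fall back to a uniform score across labels.
    .map((c) => ({ label: c.label, score: total === 0 ? 1 / labels.length : c.hits / total }))
    .sort((a, b) => b.score - a.score);
  return { label: scored[0].label, score: scored[0].score, alternatives: scored.slice(0, 3) };
}

// Pick the implementation once at startup based on the USE_MOCK environment variable.
// realClassifier would be the Ollama-backed function from section 4.
function selectClassifier(env, realClassifier) {
  return env.USE_MOCK === "1"
    ? async (text, labels) => classifyWithMock(text, labels)
    : realClassifier;
}
```

Passing `process.env` as `env` keeps the selection testable, since a test can hand in a plain object instead of mutating the real environment.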
7) Production Hardening Checklist
- Add per-label precision/recall metrics on a held-out labeled set.
- Track abstain rate (manual_review) to detect taxonomy drift.
- Log top-3 alternatives for postmortem analysis.
- Keep a human override path for all auto-routed actions.
- Version your label taxonomy and threshold config together.
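The first two checklist items can be computed from a labeled evaluation set. The sketch below assumes each record pairs a known label with the routing decision string; the record shape and the `evaluate` function name are assumptions made for illustration. Abstentions are counted separately and excluded from precision and recall.

```javascript
// Sketch of evaluation metrics over a labeled set. Each record is assumed to have
// the shape { expected: "<true label>", decision: "auto:<label>" | "manual_review" }.
function evaluate(records, labels) {
  const stats = new Map(labels.map((l) => [l, { tp: 0, fp: 0, fn: 0 }]));
  // Ignore labels outside the taxonomy rather than crashing on them.
  const bump = (label, key) => {
    if (stats.has(label)) stats.get(label)[key]++;
  };
  let abstained = 0;

  for (const { expected, decision } of records) {
    if (decision === "manual_review") {
      abstained++;
      continue; // abstentions are tracked separately, not counted as errors
    }
    const predicted = decision.slice("auto:".length);
    if (predicted === expected) {
      bump(expected, "tp");
    } else {
      bump(predicted, "fp");
      bump(expected, "fn");
    }
  }

  const perLabel = {};
  for (const [label, { tp, fp, fn }] of stats) {
    perLabel[label] = {
      // null means the metric is undefined (no predictions or no positives).
      precision: tp + fp === 0 ? null : tp / (tp + fp),
      recall: tp + fn === 0 ? null : tp / (tp + fn),
    };
  }
  const abstainRate = records.length === 0 ? 0 : abstained / records.length;
  return { perLabel, abstainRate };
}
```

Because recall here covers only auto-routed items, it should be read alongside the abstain rate: raising thresholds can improve both precision and recall on paper while pushing more work to manual review.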
8) Why This Design Works
The key is separation of concerns:
- local model handles semantic understanding,
- parser enforces data shape,
- policy layer controls operational risk.
That structure makes behavior easier to debug and safer to run at scale than raw prompt-only routing.
When you need higher accuracy, improve in this order:
1. Better label definitions and examples.
2. Better threshold tuning per business queue.
3. Better model or model size.
Most teams skip 1 and 2, then overspend on 3.
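As an illustration of step 1, the prompt could carry a short definition and an optional example for each label. The sketch below extends the prompt from section 3. The `buildPromptWithDefinitions` name and the `name`, `definition`, and `example` fields are assumptions, not part of the full script.

```javascript
// Hypothetical extension of buildPrompt: each label carries a short definition and
// an optional example. The field names (name, definition, example) are assumptions.
function buildPromptWithDefinitions(text, labelSpecs) {
  const names = labelSpecs.map((spec) => spec.name);
  const definitions = labelSpecs.map((spec) => {
    const example = spec.example ? ` Example: ${JSON.stringify(spec.example)}` : "";
    return `- ${JSON.stringify(spec.name)}: ${spec.definition}${example}`;
  });
  return [
    "You are a strict zero-shot classifier.",
    "Return ONLY valid JSON with this schema:",
    '{"label":"<one label>","score":<number 0..1>,"alternatives":[{"label":"...","score":0.0}]}',
    "Rules:",
    "1) label must be one of the candidate labels exactly.",
    "2) score is confidence between 0 and 1.",
    "3) alternatives must contain at most 3 labels sorted by descending score.",
    "4) No markdown, no extra keys, no prose.",
    "Label definitions:",
    ...definitions,
    `Candidate labels: ${JSON.stringify(names)}`,
    `Text: ${JSON.stringify(text)}`,
  ].join("\n");
}

const prompt = buildPromptWithDefinitions("I was charged twice this month.", [
  { name: "billing", definition: "Invoices, charges, refunds.", example: "My card was billed twice." },
  { name: "technical", definition: "Bugs, outages, login problems." },
]);
console.log(prompt);
```

Keeping the definitions in the same versioned config as the taxonomy and thresholds makes it possible to trace a change in routing behavior back to a specific wording change.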