Build a Local LLM Zero-Shot Classifier You Can Actually Deploy

June 7, 2026 · guides

Zero-shot classification lets you categorize text without training a task-specific model. Instead of building a dedicated classifier for every queue, language, or department, you give a model a candidate label set and ask it to pick the best fit.

For real systems, that only works if output is consistent, auditable, and safe to route automatically. This guide shows a production-ready local setup with a strict JSON contract, confidence gates, and deterministic behavior.

1) What We Are Building

We will build a local classifier that:

  • runs on a local LLM served through Ollama,
  • accepts arbitrary candidate labels per request,
  • returns a single best label plus alternatives,
  • applies confidence + margin gates before auto-routing,
  • falls back to manual review for ambiguous cases.

| Layer | Responsibility | Failure Mode if Missing |
| --- | --- | --- |
| Prompt contract | Force strict JSON and label constraints. | Unstructured responses that break downstream parsing. |
| Parser + validator | Reject malformed payloads and out-of-taxonomy labels. | Silent misroutes caused by invalid model output. |
| Confidence policy | Auto-route only high-confidence predictions. | Low-quality automation with expensive corrections. |
| Manual review path | Catch uncertain edge cases. | Forced wrong decisions under uncertainty. |

2) Runtime Setup

Install and run Ollama locally, then pull a model suitable for classification-style reasoning.

ollama pull qwen2.5:7b-instruct

The classifier script used in this guide lives in:

  • scripts/zero-shot-local-ollama.mjs

3) Strict Prompt Contract

The classifier prompt uses explicit rules so the model returns machine-readable JSON only.

function buildPrompt(text, labels) {
  return [
    "You are a strict zero-shot classifier.",
    "Return ONLY valid JSON with this schema:",
    '{"label":"<one label>","score":<number 0..1>,"alternatives":[{"label":"...","score":0.0}]}',
    "Rules:",
    "1) label must be one of the candidate labels exactly.",
    "2) score is confidence between 0 and 1.",
    "3) alternatives must contain at most 3 labels sorted by descending score.",
    "4) No markdown, no extra keys, no prose.",
    `Candidate labels: ${JSON.stringify(labels)}`,
    `Text: ${JSON.stringify(text)}`,
  ].join("\n");
}

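Spelling out the schema and rules in the prompt reduces format drift and parsing failures. It does not eliminate them, so the output is still parsed and validated before use.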

4) Local Inference Call (Ollama)

The classification call is a standard non-streaming /api/generate request. Setting temperature to 0 keeps repeated runs on the same input as consistent as possible.

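async function classifyWithOllama(text, labels) {
  const response = await fetch("http://127.0.0.1:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      // Honor OLLAMA_MODEL (set in the run command later in this guide), else use the pulled model.
      model: process.env.OLLAMA_MODEL ?? "qwen2.5:7b-instruct",
      prompt: buildPrompt(text, labels),
      stream: false, // one JSON payload instead of a token stream
      options: { temperature: 0, top_p: 0.9 },
    }),
  });

  // Fail loudly on HTTP errors (for example, a model that has not been pulled).
  if (!response.ok) {
    throw new Error(`Ollama request failed: ${response.status} ${response.statusText}`);
  }

  const payload = await response.json();
  // payload.response holds the generated text; extract the JSON object, then validate it.
  const jsonText = extractJsonObject(payload.response);
  const parsed = JSON.parse(jsonText);
  return normalizeResult(parsed, labels);
}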

Two non-negotiables for production:

  • validate label membership against the input taxonomy,
  • reject scores outside [0, 1].
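
The call above relies on extractJsonObject and normalizeResult, which this guide does not show. The following is a minimal sketch of what they might look like, not the implementation from the script. It assumes the model reply contains a single JSON object, and it makes one design choice of its own: malformed alternatives are dropped rather than failing the whole result.

```javascript
// Hypothetical helper: pull the outermost {...} span out of raw model text.
// Assumes the reply contains one JSON object, possibly wrapped in stray prose.
function extractJsonObject(raw) {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  return raw.slice(start, end + 1);
}

// Hypothetical helper: enforce label membership and the [0, 1] score range.
function normalizeResult(parsed, labels) {
  const isValidScore = (s) =>
    typeof s === "number" && Number.isFinite(s) && s >= 0 && s <= 1;

  if (!labels.includes(parsed.label)) {
    throw new Error(`Label not in taxonomy: ${JSON.stringify(parsed.label)}`);
  }
  if (!isValidScore(parsed.score)) {
    throw new Error(`Score outside [0, 1]: ${JSON.stringify(parsed.score)}`);
  }

  // Keep only well-formed alternatives, sorted by descending score, at most 3.
  const rawAlternatives = Array.isArray(parsed.alternatives) ? parsed.alternatives : [];
  const alternatives = rawAlternatives
    .filter((alt) => alt && labels.includes(alt.label) && isValidScore(alt.score))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3)
    .map((alt) => ({ label: alt.label, score: alt.score }));

  return { label: parsed.label, score: parsed.score, alternatives };
}
```

Throwing on an invalid top label or score lets the caller treat the request as a failure and send it to manual review instead of routing on bad data.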

5) Confidence-Aware Routing

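Do not auto-route every model answer. Auto-route only when the top score clears a minimum confidence and leads the runner-up by a minimum margin; send everything else to manual review.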

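function routeDecision(result, minConfidence = 0.6, minMargin = 0.08) {
  // The top prediction is always result.label. The runner-up is the best-scored
  // alternative with a different label, whether or not alternatives repeats the top label.
  const top = { label: result.label, score: result.score };
  const second = (result.alternatives ?? [])
    .filter((alt) => alt.label !== top.label)
    .sort((a, b) => b.score - a.score)[0] ?? { label: "", score: 0 };

  if (top.score < minConfidence) return "manual_review";
  if ((top.score - second.score) < minMargin) return "manual_review";
  return `auto:${top.label}`;
}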

This policy is simple, explainable, and easy to tune per queue.
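
One way to tune the gates per queue is to keep the thresholds in a small lookup table. The sketch below is illustrative: the queue names and the non-default values are assumptions, and only the default pair (0.6 and 0.08) comes from this guide.

```javascript
// Defaults taken from the routing function in this guide.
const DEFAULT_THRESHOLDS = { minConfidence: 0.6, minMargin: 0.08 };

// Hypothetical per-queue overrides; names and values are illustrative only.
const QUEUE_THRESHOLDS = new Map([
  ["billing", { minConfidence: 0.75, minMargin: 0.15 }], // costly misroutes: stricter gates
  ["feedback", { minConfidence: 0.5, minMargin: 0.05 }], // cheap misroutes: looser gates
]);

function thresholdsFor(queue) {
  return QUEUE_THRESHOLDS.get(queue) ?? DEFAULT_THRESHOLDS;
}

// True when the top score clears both the confidence gate and the margin gate.
function passesGates(topScore, secondScore, { minConfidence, minMargin }) {
  return topScore >= minConfidence && topScore - secondScore >= minMargin;
}

// Same scores, different outcome depending on the queue's risk tolerance.
console.log(passesGates(0.7, 0.6, thresholdsFor("billing")));  // false
console.log(passesGates(0.7, 0.6, thresholdsFor("feedback"))); // true
```

Keeping thresholds as data rather than code makes it possible to version them alongside the label taxonomy.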

6) Full Runnable Script

Use the full implementation at:

  • scripts/zero-shot-local-ollama.mjs

It supports two modes:

  • USE_MOCK=1: offline deterministic test mode,
  • default mode: real local inference against Ollama.
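
For orientation, a mock mode like this could be as simple as a keyword matcher that returns the same result shape as the real classifier. The sketch below is an assumption about how such a mode might work, not the script's actual code; classifyWithMock and selectClassifier are hypothetical names.

```javascript
// Hypothetical mock classifier: scores a label 0.9 if its text appears in the
// input and 0.1 otherwise. Not a real model; only for exercising the pipeline offline.
function classifyWithMock(text, labels) {
  if (labels.length === 0) {
    throw new Error("At least one candidate label is required");
  }
  const lower = text.toLowerCase();
  const scored = labels
    .map((label) => ({ label, score: lower.includes(label.toLowerCase()) ? 0.9 : 0.1 }))
    .sort((a, b) => b.score - a.score); // stable sort: ties keep input order
  return { label: scored[0].label, score: scored[0].score, alternatives: scored.slice(0, 3) };
}

// Assumed switch: the guide only states that USE_MOCK=1 selects mock mode.
function selectClassifier(realClassifier, env = process.env) {
  return env.USE_MOCK === "1" ? classifyWithMock : realClassifier;
}

const result = classifyWithMock("Question about my billing statement", ["technical", "billing", "sales"]);
console.log(result.label, result.score); // billing 0.9
```

Because the mock is a pure function of its input, parser and routing tests built on it give the same result on every run.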

Mock validation command

$env:USE_MOCK="1"
node scripts/zero-shot-local-ollama.mjs

Real local model command

Remove-Item Env:USE_MOCK -ErrorAction SilentlyContinue
$env:OLLAMA_MODEL="qwen2.5:7b-instruct"
node scripts/zero-shot-local-ollama.mjs

7) Production Hardening Checklist

  • Add per-label precision/recall metrics on a held-out labeled set.
  • Track abstain rate (manual_review) to detect taxonomy drift.
  • Log top-3 alternatives for postmortem analysis.
  • Keep a human override path for all auto-routed actions.
  • Version your label taxonomy and threshold config together.
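
The first two checklist items can be computed from a held-out set with a short helper. This is a sketch under stated assumptions: each record pairs a human-assigned label (expected) with a routing decision in the "auto:<label>" or "manual_review" format used in this guide, and abstentions are reported separately rather than counted as misses. The function name evaluate and the record shape are hypothetical.

```javascript
// Compute per-label precision/recall over auto-routed items, plus the abstain rate.
function evaluate(records, labels) {
  const stats = new Map(labels.map((label) => [label, { tp: 0, fp: 0, fn: 0 }]));
  const bump = (label, key) => {
    const entry = stats.get(label);
    if (entry) entry[key] += 1; // ignore labels outside the taxonomy
  };

  let abstained = 0;
  for (const { expected, decision } of records) {
    if (decision === "manual_review") {
      abstained += 1;
      continue;
    }
    const predicted = decision.slice("auto:".length);
    if (predicted === expected) {
      bump(expected, "tp");
    } else {
      bump(predicted, "fp"); // wrongly routed to this label
      bump(expected, "fn");  // missed for the true label
    }
  }

  // A ratio is null when its denominator is zero (no data for that label).
  const ratio = (num, den) => (den === 0 ? null : num / den);
  const perLabel = {};
  for (const [label, { tp, fp, fn }] of stats) {
    perLabel[label] = { precision: ratio(tp, tp + fp), recall: ratio(tp, tp + fn) };
  }
  return { perLabel, abstainRate: ratio(abstained, records.length) };
}

const report = evaluate(
  [
    { expected: "billing", decision: "auto:billing" },
    { expected: "billing", decision: "auto:tech" },
    { expected: "tech", decision: "auto:tech" },
    { expected: "tech", decision: "manual_review" },
  ],
  ["billing", "tech"],
);
console.log(report.abstainRate); // 0.25
```

A rising abstain rate on fresh traffic, with stable precision on the held-out set, is one signal that incoming text no longer fits the taxonomy.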

8) Why This Design Works

The key is separation of concerns:

  • local model handles semantic understanding,
  • parser enforces data shape,
  • policy layer controls operational risk.

That structure makes behavior easier to debug and safer to run at scale than raw prompt-only routing.

When you need higher accuracy, improve in this order:

  1. Better label definitions and examples.
  2. Better threshold tuning per business queue.
  3. Better model or model size.

It is tempting to skip 1 and 2 and go straight to 3, which is usually the most expensive change.
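
As an illustration of step 1, each label can carry a one-line definition and an example that is rendered into the prompt next to the candidate labels. The labels, wording, and the formatLabelGuide helper below are hypothetical; this is a sketch of the idea rather than part of the script.

```javascript
// Hypothetical label guide: the labels, definitions, and examples are illustrative.
const LABEL_GUIDE = [
  {
    label: "billing",
    definition: "Charges, invoices, refunds, and payment methods.",
    example: "I was charged twice this month.",
  },
  {
    label: "technical",
    definition: "Errors, outages, and product malfunctions.",
    example: "The app crashes when I open settings.",
  },
];

// Render one line per label so the model sees what each label means.
function formatLabelGuide(guide) {
  return guide
    .map(({ label, definition, example }) =>
      `- ${JSON.stringify(label)}: ${definition} Example: ${JSON.stringify(example)}`)
    .join("\n");
}

console.log(formatLabelGuide(LABEL_GUIDE));
```

The rendered text could be added as extra lines in the prompt, while the candidate label list itself stays unchanged so that label validation keeps working.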
