The Developer's Guide to Running Claude Code for Free: Ollama, OpenRouter, and Local Proxies
Claude Code has rapidly emerged as the OS for agentic coding. It doesn't just suggest snippets; it audits your project, executes terminal commands, applies multi-file refactors, and reasons through complex bugs—all from your terminal. However, because it is natively wired to Anthropic’s pay-as-you-go API pricing, heavy professional usage can generate a significant monthly bill. The good news is that Claude Code is built with engineering flexibility in mind. By overriding a single environment variable, you can decouple the CLI interface from the Anthropic backend.
```mermaid
graph TD
    classDef primary fill:#10a37f,stroke:#fff,stroke-width:2px,color:#fff;
    classDef secondary fill:#0b0e14,stroke:#10a37f,stroke-width:2px,color:#fff;
    classDef bridge fill:#0b0e14,stroke:#4d6eff,stroke-width:2px,color:#fff;
    A[Claude Code CLI]:::primary -- "export ANTHROPIC_BASE_URL" --> B{Routing Logic}:::secondary
    B -- "Option 1: Local" --> C[LiteLLM Proxy<br>localhost:4000]:::bridge
    C -- "Translate: Anthropic -> OpenAI" --> D[Ollama Engine<br>localhost:11434]:::primary
    D --> E[Local GPU/CPU<br>Qwen 2.5 Coder]:::secondary
    B -- "Option 2: Cloud" --> F[OpenRouter API<br>api/v1]:::primary
    F -- "Model Selection Flag" --> G[Free Tier Aggregator]:::secondary
    G --> H[Llama 3.3 70B<br>Remote Weights]:::secondary
```
Understanding the Architecture: The Redirection Mechanic
Under the hood, Claude Code is a standard client that sends JSON payloads to an API endpoint. By default, this is Anthropic’s production server. However, the CLI respects the ANTHROPIC_BASE_URL environment variable.
As long as the endpoint you provide speaks a compatible "Messages API" format, Claude Code will function perfectly. This is the core principle that enables us to use any model—local or remote—as the "brain" for the Claude CLI.
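To make that concrete, every Claude Code turn boils down to roughly the request below: a POST to `/v1/messages` under whatever base URL is configured. This is a minimal sketch of the Anthropic Messages format, not the CLI's literal payload; the model name and prompt are placeholders.

```bash
# Any backend that accepts this shape can stand in for Anthropic's servers.
curl "$ANTHROPIC_BASE_URL/v1/messages" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Summarize src/utils.py"}]
      }'
```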
The Two Strategic Approaches
Before committing to a setup, consider the trade-offs between local and cloud-aggregated routing.
| Aspect | Method 1: Local Ollama | Method 2: OpenRouter Free Tier |
|---|---|---|
| Value | Privacy & Offline | S-Tier Reasoning |
| Security | 100% On-Prem | Third-party Egress |
| Latency | GPU-dependent | Network-dependent |
| Ideal Use | Sensitive Code | Architectural Tasks |
Method 1: Local Execution with Ollama & LiteLLM
Ollama allows you to run open weights like Qwen2.5-Coder and DeepSeek-Coder locally. However, Ollama exposes its own native API plus an OpenAI-compatible endpoint—neither speaks the Anthropic Messages format that Claude Code requires. To fix this, we use LiteLLM as a persistent translation proxy.
Step 1: Install and Initialize Ollama
Download the latest binary from ollama.com. Once installed, verify the CLI is on your PATH:

```bash
ollama --version
```
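Ollama normally keeps its server running in the background after installation. A quick way to confirm it is listening on the default port (11434) is to hit the local API:

```bash
# Returns a JSON list of locally pulled models if the server is up
curl http://localhost:11434/api/tags
```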
Step 2: Selecting the Right Model
Coding is a high-precision task. For local execution, we recommend the Qwen2.5-Coder series, which currently leads the open-source benchmarks for instruction-following and code generation.
```bash
# Pull the 14B model (the sweet spot for 16GB-32GB RAM machines)
ollama pull qwen2.5-coder:14b
```
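If 14B is too heavy (or too light) for your machine, the same family ships in other sizes on the Ollama registry; the tags below line up with the hardware tiers discussed later in this guide:

```bash
ollama pull qwen2.5-coder:7b    # fits 8GB RAM machines
ollama pull qwen2.5-coder:32b   # for 16GB+ VRAM GPUs
```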
Step 3: Launching the LiteLLM Bridge
LiteLLM accepts Anthropic-formatted requests from Claude Code and translates them for Ollama in real time.

```bash
# Install the bridge (the proxy extras pull in the server dependencies)
pip install 'litellm[proxy]'

# Start the proxy. Port 4000 is our standard for Claude Code redirection.
litellm --model ollama/qwen2.5-coder:14b --port 4000
```
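Before pointing Claude Code at it, you can smoke-test the bridge directly. This assumes your LiteLLM version exposes the Anthropic-compatible /v1/messages route (the endpoint Claude Code calls); the prompt is just an illustration:

```bash
curl http://localhost:4000/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -d '{
        "model": "ollama/qwen2.5-coder:14b",
        "max_tokens": 64,
        "messages": [{"role": "user", "content": "Say hello in one word."}]
      }'
```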
Step 4: Running Claude Code Locally
Open a new terminal window in your project directory and set the redirection variables.
```bash
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=ollama

# Navigate to your project and run
cd /your/project
claude
```
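Before committing to a long session, a one-shot prompt is a cheap way to confirm the redirection works end to end (the -p/--print flag runs a single non-interactive query; check claude --help if your version behaves differently):

```bash
claude -p "Reply with OK if you can read this."
```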
Hardware Requirements for Local Parity
[!IMPORTANT] GPU Performance Matters: Coding is a high-precision task. Running local models isn't just about RAM; it's about tokens-per-second (TPS).
- 8GB RAM: Limited to 7B or 8B models. Speed ~5-10 TPS (Slow for large file reads).
- 16GB - 32GB RAM: 14B models (Qwen2.5-Coder) become viable. Speed ~20-30 TPS (Smooth).
- 16GB+ VRAM (NVIDIA GPU): 32B+ models at production speeds. 1:1 feel with Sonnet.
Method 2: High-Parameter Routing via OpenRouter
OpenRouter is an aggregator that exposes high-parameter models (like Llama 3.3 70B) via a unified API. Many of these models are available on a free tier, subsidized by specific upstream providers and subject to rate limits.
Because OpenRouter natively supports the Anthropic Messages format, you don't need LiteLLM. You can point Claude Code directly at the OpenRouter endpoint.
Step 1: Setup and API Keys
- Sign up at openrouter.ai.
- Generate a key in the API Keys section.
- Identify free models by looking for the `:free` suffix (e.g., `meta-llama/llama-3.3-70b-instruct:free`). You can also list them programmatically, as shown below.
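If you prefer not to browse the website, the public models endpoint can be filtered for free variants (this sketch assumes jq is installed):

```bash
# Print every model ID currently flagged as free
curl -s https://openrouter.ai/api/v1/models | jq -r '.data[].id' | grep ':free'
```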
Step 2: Configure Redirection
Set your variables to target the OpenRouter V1 endpoint.
```bash
export ANTHROPIC_BASE_URL=https://openrouter.ai/api/v1
export ANTHROPIC_API_KEY=your_openrouter_key_here
export ANTHROPIC_MODEL=meta-llama/llama-3.3-70b-instruct:free
```
Step 3: Launch with Explicit Model Selection
You can also pass the model directly as a flag to ensure the CLI doesn't default back to Sonnet:
```bash
claude --model meta-llama/llama-3.3-70b-instruct:free
```
Managing Gaps in Performance
While this setup is free, it is important to manage expectations. Open-source models, even at 70B, are currently in a different class than Claude 3.5 Sonnet.
[!TIP] The Hybrid Balance: Use the OpenRouter 70B Free Tier for complex reasoning and the Local 14B Ollama setup for repetitive boilerplate and privacy-sensitive file operations.
What works exceptionally well:
- Standard refactoring and linting fixes.
- Writing unit tests for isolated functions.
- Explaining legacy code logic within a single file.
- Searching and navigating a filesystem.
Where you will notice gaps:
- Context Retention: Smaller local models (8B-14B) may lose the thread of a multi-file architectural change.
- Subtle Bug Detection: Claude 3.5 Sonnet is uniquely good at spotting race conditions or logic errors that open-source models often miss.
- Agentic Stability: In complex chains, open-source models are more likely to hallucinate a terminal command or break the chain.
Advanced Configuration: Aliases and Environment Management
To switch between "Free Mode" and "Production Mode" instantly, add these aliases to your .zshrc or .bashrc:
```bash
# Use Ollama Locally
alias claude-local='export ANTHROPIC_BASE_URL="http://localhost:4000" && export ANTHROPIC_API_KEY="local" && claude'

# Use OpenRouter Free Tier
alias claude-free='export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1" && export ANTHROPIC_API_KEY="your_key" && export ANTHROPIC_MODEL="meta-llama/llama-3.3-70b-instruct:free" && claude'

# Use Standard Anthropic (Paid)
alias claude-pro='unset ANTHROPIC_BASE_URL && unset ANTHROPIC_API_KEY && unset ANTHROPIC_MODEL && claude'
```
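After reloading your shell configuration, switching backends becomes a one-word command:

```bash
source ~/.zshrc   # or ~/.bashrc
claude-local      # start Claude Code against the local LiteLLM + Ollama stack
```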
Troubleshooting and FAQ
1. Claude Code says the API key is invalid.
This usually means the ANTHROPIC_BASE_URL isn’t set or isn’t being picked up. Confirm the variable is exported in the same terminal session you’re running Claude Code from.
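A quick check of what the current shell has actually exported usually settles it:

```bash
# Should print the proxy or OpenRouter URL you set; no output means the
# variables never made it into this session
env | grep ANTHROPIC
```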
2. Claude Code doesn’t recognize the model flag
Some older Claude Code versions handle model specification differently. Try setting ANTHROPIC_MODEL as an environment variable instead of using the --model flag.
3. LiteLLM returns a format error.
Ensure that your Ollama model is actually running (ollama ps) and that LiteLLM started cleanly. Restart LiteLLM if needed.
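Two commands usually isolate which half of the bridge failed; the second assumes your LiteLLM proxy serves its OpenAI-compatible /v1/models listing without auth (the default when no master key is configured):

```bash
ollama ps                                   # is a model loaded and the Ollama server reachable?
curl -s http://localhost:4000/v1/models     # is the LiteLLM proxy answering on port 4000?
```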
Final Takeaway: The Hybrid Strategy
For professional engineering, the most cost-effective strategy is a Hybrid Workflow. Use the local Ollama setup for 80% of your day—searching files, writing tests, and refactoring boilerplate. Reserve your Anthropic tokens (or high-parameter OpenRouter models) for the 20% of tasks that require deep architectural reasoning and cross-file stability.
Why Developers are Making the Switch
The primary driver isn't just the monthly token bill—it's Privacy and Sovereignty. In many enterprise environments, piping proprietary source code to a third-party LLM provider is a compliance non-starter. By redirecting the Claude CLI to a local Ollama instance, you maintain a "Zero Egress" policy while still benefiting from the industry's best agentic interface.
Key Takeaways for Your Stack
[!IMPORTANT] Core Redirection Patterns
- CLI Redirection: `ANTHROPIC_BASE_URL` is the ultimate lever for infrastructure control.
- LiteLLM: The essential bridge for Anthropic-to-OpenAI format translation.
- Local Sovereignty: Ollama provides a safe harbor for proprietary codebases.
- Cloud Aggregation: OpenRouter provides a high-reasoning fallback without a subscription.
By mastering these redirection patterns, you ensure that your engineering velocity is limited only by your hardware, not by your API budget.