Running Claude Code Free On Your Laptop

Step-by-step setup to run Claude Code locally with Ollama. No subscription, no API bill, full capability on your own machine. Plus why every serious operator should know this, even if you don't use it as your daily driver.

Steve Tan

June 11, 2026 · 5 min read

TL;DR

This is the setup for running Claude Code locally with Ollama on a Mac: free, on your own laptop, about 10 minutes to install. I'm not personally using this as my daily driver; frontier models are still ahead for what I do. But the operator point is bigger than the install. Companies like Airbnb run a lot of their AI on local and open-source models from China. Customer support, classification, routine tasks: none of it needs frontier-level reasoning. The savings are massive at scale. If you're tied to one model, you're at the mercy of one company's pricing. Knowing how to run local is operator diversification.

Here's the setup that's been going around for running Claude Code completely free on your laptop.

Heads up: this guide is Mac only. The terminal commands and the ~/.zshrc setup are written for macOS. The same install works on Linux with small tweaks, and on Windows via WSL, but neither is covered here. Mac users, you're good. Everyone else, this'll point you in the right direction but you'll need to translate the commands.

Quick honest note up front. I'm not personally using this as my daily driver. Frontier models are still ahead for the kind of work I'm doing right now. But the gap is closing fast, and the operator point here is bigger than whether you swap your daily setup tonight.

The bigger story is this. A lot of serious SaaS companies are already running on local and open-source models. Airbnb runs significant chunks of its AI infrastructure on Alibaba's Qwen models. A growing number of large startups are doing customer support and classification with DeepSeek. None of that work needs frontier-level reasoning. Classification, routine support tickets, internal tooling, automation pipelines: all of it runs perfectly well on cheaper open-source models. The unit economics can be around 50x better, and for those use cases the output quality is effectively the same.

The lesson for any operator: if you're tied to one model, you're at the mercy of one company's pricing, one company's rate limits, one company's roadmap. Knowing how to run local isn't about saving $20 a month. It's about diversifying your AI stack the same way you diversify suppliers, payment processors, or hosting providers. Concentration risk is concentration risk.

This setup takes about 10 minutes if you've never opened a terminal. Less if you have.

Setup, step by step (Mac)

You'll use your terminal once. Press Command + Space, type "Terminal", hit enter. That's it, you're in.

Step 1. Install Ollama

Go to ollama.com, click download, install like any other Mac app. It runs quietly in the background. You won't notice it's there.

Step 2. Pull a coding model

In your terminal, paste one of these and hit enter:

If your Mac is newer (last 3-4 years, with decent RAM):

ollama pull qwen3-coder:30b

If your Mac is older or lower-spec (8GB RAM or less):

ollama pull qwen2.5-coder:1.5b

The 1.5B model takes a few minutes to download. The 30B model is a much larger download, so give it longer. You only do this once. The 30B model is more capable. The 1.5B runs on almost anything.
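Not sure which bucket your Mac falls into? Here's a minimal sketch that reads the installed RAM and applies the 8GB rule above. The function name pick_model is made up for illustration, and treating everything above 8GB as "newer with decent RAM" is an assumption, so a mid-range machine is still a judgment call.

```shell
# Hypothetical helper: apply the guide's 8GB rule to a RAM figure given in GB.
pick_model() {
  if [ "$1" -le 8 ]; then
    echo "qwen2.5-coder:1.5b"
  else
    # Assumption: anything above 8GB counts as "decent RAM".
    echo "qwen3-coder:30b"
  fi
}

# macOS reports physical memory in bytes via sysctl hw.memsize.
# On other systems this check fails quietly and nothing is printed.
if sysctl -n hw.memsize >/dev/null 2>&1; then
  ram_gb=$(( $(sysctl -n hw.memsize) / 1073741824 ))
  echo "RAM: ${ram_gb} GB, suggested model: $(pick_model "$ram_gb")"
fi
```

Treat the suggestion as a starting point. If the 30B model feels sluggish, the smaller one is a one-line pull away.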

Step 3. Point Claude Code at your local model

Paste these three lines into your terminal one at a time, hitting enter after each:

export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""

You just told Claude Code to call your laptop instead of Anthropic's servers.
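If you want to confirm something is actually listening at that address before going further, here's a rough check. It assumes curl is available (it ships with macOS) and uses /api/tags, Ollama's endpoint for listing the models you've pulled. The function name ollama_status is hypothetical.

```shell
# Hypothetical helper: report whether anything answers at the base URL.
ollama_status() {
  url="${ANTHROPIC_BASE_URL:-http://localhost:11434}"
  # /api/tags lists locally pulled models; here we only care that it responds.
  if curl -fsS --max-time 3 "$url/api/tags" >/dev/null 2>&1; then
    echo "OK: Ollama answered at $url"
  else
    echo "No answer at $url (is the Ollama app running?)"
  fi
}

ollama_status
```

If you get the "No answer" line, open the Ollama app and try again before touching anything else.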

Step 4. Give the model working memory

export OLLAMA_CONTEXT_LENGTH=32768

Without this, Claude Code loses track of what it was doing mid-task.

Step 5. Fix the slow responses

Claude Code adds a small invisible label to every message. Your local model uses that label to recognize whether it has already loaded the setup. The label changes per message, so the local model re-reads everything every time. Result: 40 seconds per response. Most people try it, decide it's broken, and quit.

The fix:

export CLAUDE_CODE_ATTRIBUTION_HEADER=0

Speed goes back to normal.
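To see why one small changing label can hurt that much, here's a toy illustration. It assumes the slowdown comes from the model server only reusing work when the start of a new request matches the previous one, which is how the explanation above reads. It is not Ollama's actual caching code, and the strings and the function name common_prefix_len are invented for the example.

```shell
# Toy model of prefix reuse: count how many leading characters two requests share.
common_prefix_len() {
  a="$1"; b="$2"; n=0
  while [ "$n" -lt "${#a}" ] && [ "$n" -lt "${#b}" ]; do
    ca=$(printf '%s' "$a" | cut -c "$((n + 1))")
    cb=$(printf '%s' "$b" | cut -c "$((n + 1))")
    [ "$ca" = "$cb" ] || break
    n=$((n + 1))
  done
  echo "$n"
}

setup="SYSTEM PROMPT AND TOOL DEFINITIONS"   # stands in for the big, unchanging part

# With a label that changes per message, almost nothing at the start matches.
common_prefix_len "id=a1 $setup" "id=b2 $setup"   # prints 3

# Without the label, the whole setup matches and could be reused.
common_prefix_len "$setup" "$setup"               # prints 34
```

Three shared characters versus thirty-four is the whole story in miniature: the setup text never changed, but the label in front of it made it look new every time.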

Step 6. Make it permanent

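The settings vanish when you close the terminal. To make them stick, open ~/.zshrc in a text editor. The touch creates the file if it doesn't exist yet, and open -e opens it in TextEdit:

touch ~/.zshrc && open -e ~/.zshrc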

Paste these at the bottom:

export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export OLLAMA_CONTEXT_LENGTH=32768
export CLAUDE_CODE_ATTRIBUTION_HEADER=0

Save, close terminal, reopen.
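If you'd rather not edit the file by hand, here's a sketch that appends the same five lines and skips any that are already there, so running it twice doesn't leave duplicates. The helper names add_line_once and persist_settings are made up, and the sketch assumes the file ends with a newline.

```shell
# Hypothetical helper: append a line to a file only if that exact line is missing.
add_line_once() {
  touch "$1"
  grep -qxF -e "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Write the five settings from this guide into the given shell config file.
persist_settings() {
  add_line_once "$1" 'export ANTHROPIC_BASE_URL=http://localhost:11434'
  add_line_once "$1" 'export ANTHROPIC_AUTH_TOKEN=ollama'
  add_line_once "$1" 'export ANTHROPIC_API_KEY=""'
  add_line_once "$1" 'export OLLAMA_CONTEXT_LENGTH=32768'
  add_line_once "$1" 'export CLAUDE_CODE_ATTRIBUTION_HEADER=0'
}

# Uncomment to apply it to your real config:
# persist_settings "$HOME/.zshrc"
```

The call is commented out on purpose. Try it against a scratch file first, check the result, then point it at ~/.zshrc.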

Step 7. Run Claude Code

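claude --model qwen3-coder:30b

Use whichever model you pulled in Step 2 (swap in qwen2.5-coder:1.5b on a lower-spec Mac). It's now talking to your laptop. No internet needed, no API billing.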
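If Claude Code still seems to be calling out to Anthropic, the usual suspect is a terminal that never picked up the settings. Here's a small preflight sketch that lists which of the five variables from this guide are missing in the current terminal. The function name missing_settings is hypothetical.

```shell
# Hypothetical helper: list any of the guide's variables that are not set.
# ANTHROPIC_API_KEY is meant to be empty, so this checks "set", not "non-empty".
missing_settings() {
  for name in ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_API_KEY \
              OLLAMA_CONTEXT_LENGTH CLAUDE_CODE_ATTRIBUTION_HEADER; do
    eval "is_set=\${$name+yes}"
    [ "$is_set" = "yes" ] || echo "$name"
  done
}

missing="$(missing_settings)"
if [ -z "$missing" ]; then
  echo "All five settings are present."
else
  echo "Not set in this terminal:"
  echo "$missing"
fi
```

Anything it lists means Step 6 didn't take. Reopen the terminal or re-check ~/.zshrc.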

What "free" actually means here

You still need Claude Code installed and authenticated, which means Claude Pro or API credits to start. This setup replaces the per-message cost that runs forever, not the access gate.

Think of it like a car. You still need the license. But once you're driving, the per-message cost drops to zero.

Local model honesty. What works, what breaks

The viral takes claim local models are "in the same league as Sonnet." They're not, at least not yet.

What the local model handles fine:

  • Bug fixes and small functions
  • Refactoring and cleaning up code
  • "Explain this code" questions
  • Routine tasks where good-enough beats perfect
  • Anything you'd run hundreds of times per day in an automation pipeline

Where you'll feel the gap:

  • Complex multi-file reasoning
  • Large codebases with cross-file dependencies
  • Architectural problems that need frontier-level thinking
  • Anything where one wrong answer cascades into bigger problems

The gap is real. Local handles routine tasks well. Don't expect it to replace Sonnet on hard architectural work. The trade-off is no cost and complete privacy in exchange for a less capable model.

Why operators should know this even if they don't use it daily

This is the part I want to spend a minute on, because it's the bigger lesson.

Airbnb isn't running large parts of its AI infrastructure on Qwen because Qwen is the best model. They're running it on Qwen because the unit economics work. When you process a billion classification calls a year, the difference between $0.001 and $0.05 per call is the difference between a $1M and a $50M annual bill. The frontier model isn't 50x better at classifying a customer ticket. It's the same.
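The arithmetic is worth running yourself with your own volumes. Here's a minimal sketch using the numbers above. Prices are entered in mills (1 mill = $0.001) so the shell can stay in whole numbers, and the function name annual_cost is made up for illustration.

```shell
# annual_cost CALLS_PER_YEAR PRICE_IN_MILLS  (1 mill = $0.001)
# Prints the yearly bill in whole dollars.
annual_cost() {
  echo $(( $1 * $2 / 1000 ))
}

cheap=$(annual_cost 1000000000 1)       # $0.001 per call
frontier=$(annual_cost 1000000000 50)   # $0.05 per call

echo "open-source: \$$cheap  frontier: \$$frontier  ratio: $(( frontier / cheap ))x"
# prints: open-source: $1000000  frontier: $50000000  ratio: 50x
```

Swap in your own call volume and per-call prices. The ratio is what matters, and it only bites once the volume is large.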

Big startups doing customer support are reaching the same conclusion. DeepSeek and similar models can handle 80% of support volume at 5% of the cost. The hard cases get routed to a human or a frontier model. Most of the work doesn't need the expensive option.
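Here's what that split does to the total bill, as a rough sketch. It takes the 80% and 5% figures above at face value and assumes every remaining hard case goes to a frontier model at full price. Routing some of them to a human isn't priced in. The function name blended_cost_pct is hypothetical.

```shell
# blended_cost_pct CHEAP_SHARE_PCT CHEAP_PRICE_PCT
# Prints the blended bill as a percentage of running everything on the frontier model.
blended_cost_pct() {
  cheap_share="$1"                      # percent of volume the cheap model handles
  cheap_price="$2"                      # cheap model's price as a percent of frontier
  hard_share=$(( 100 - cheap_share ))   # the rest goes to the frontier model at 100%
  echo $(( (cheap_share * cheap_price + hard_share * 100) / 100 ))
}

echo "Blended cost: $(blended_cost_pct 80 5)% of the all-frontier bill"
# prints: Blended cost: 24% of the all-frontier bill
```

Notice where the money goes: the 20% of hard cases make up 20 of those 24 points. The routine 80% costs almost nothing once it's on the cheap model.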

The pattern that's emerging:

  • Frontier models for hard reasoning, creative work, anything customer-facing where quality matters
  • Local or open-source models for routine processing, classification, batch work, anything that happens hundreds of times a day

If you're building anything that scales, this matters. The companies that figure out which work needs the expensive model and which work can run cheap are going to have permanent unit-economics advantages over the companies still running everything on the frontier.

This setup is your way in. Even if you don't use it as your daily driver, getting it running gives you the muscle memory for the bigger move. The move is diversification. Same logic as not running your whole business on one supplier or one payment processor. Concentration risk applies to AI providers too. The teams that can switch models like changing a setting are going to outmaneuver the teams that can't.
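On a laptop, "switching models like changing a setting" can be as small as two shell functions. This is a sketch, not an official workflow: the names use_local and use_hosted are made up, and it assumes that clearing the three variables from Step 3 sends Claude Code back to its normal login. If you keep a real API key in ANTHROPIC_API_KEY, restore it instead of unsetting it.

```shell
# Hypothetical helper: point Claude Code at the local Ollama server (same values as Step 3).
use_local() {
  export ANTHROPIC_BASE_URL=http://localhost:11434
  export ANTHROPIC_AUTH_TOKEN=ollama
  export ANTHROPIC_API_KEY=""
}

# Hypothetical helper: clear those overrides for the current terminal.
# Assumption: with them unset, Claude Code uses its usual hosted setup.
use_hosted() {
  unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN ANTHROPIC_API_KEY
}
```

Put both functions in ~/.zshrc and you can run use_local before routine work and use_hosted before the hard stuff, without editing any files in between.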

Who this setup is for

Use it if:

  • You travel and need to code offline (planes, remote spots, no signal)
  • You're hitting API overages on Pro and want a free fallback for routine work
  • You're building automations that would run hundreds of calls a day on the API
  • You want your code completely off the cloud
  • You want to understand local AI inference before it becomes a real business decision

Skip it if:

  • Your work needs frontier-quality multi-file reasoning where accuracy matters more than cost
  • You're not comfortable opening a terminal at all
  • Your Mac is more than 5-6 years old without enough RAM
  • You're on Windows or Linux (this guide assumes Mac)

What this changes

A year ago, "running AI locally" was a hobbyist play. Slow, capability-limited, only useful for tinkering.

That's not where we are anymore. The local models are good enough for a huge category of real work. The companies that lead the next decade aren't the ones who pay the most for the best frontier model. They're the ones who match the right model to the right job and run the rest cheap.

If you're going to operate at any scale, this is the muscle to build. Start by knowing how to run local. The rest follows.

Steve Tan

Builder · Operator · Advisor

20+ years building businesses the hard way across eCommerce, SaaS, agency, education, and supply chain. $200M+ in revenue. Now I help business owners turn AI into their unfair advantage.
