Local AI Models
The Pocket Library
Everything you need — offline, private, free after setup — running entirely on your own machine.
For years, using capable AI models meant internet access and per-token API costs. Ollama changed this: one command and you're running Llama 3, Mistral, or Gemma locally — no internet, no API key, no per-token cost. Your data never leaves your machine, and response times on modern hardware are fast enough for interactive use. For privacy-sensitive work, local AI is increasingly the right choice.
In Plain English
Local AI models run entirely on your own hardware — no internet required. Tools like Ollama make it simple to download and run open source models like Llama, Mistral, or Gemma on your laptop. Free after setup, completely private.
The Technical Picture
Local inference uses tools like Ollama, LM Studio, or llama.cpp to run quantised GGUF models via CPU or local GPU. Performance depends on RAM (7B models need ~8GB, 13B need ~16GB) and GPU VRAM. The OpenAI-compatible API exposed by Ollama allows existing code to switch from cloud to local with a URL change.
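The URL switch can be sketched concretely. Ollama's OpenAI-compatible endpoint lives by default at `http://localhost:11434/v1`, so the same request payload works against the cloud or a local server; only the base URL changes. A minimal stdlib-only sketch, assuming a local Ollama server with a pulled `llama3.1` model (the helper names here are illustrative, not part of any library):

```python
import json
import urllib.request

# Ollama's default OpenAI-compatible endpoint (assumes a local server).
OLLAMA_BASE_URL = "http://localhost:11434/v1"


def build_chat_request(base_url, model, messages):
    """Build the URL and JSON payload for an OpenAI-style chat completion.

    The identical payload works against api.openai.com or local Ollama --
    only base_url changes, and locally no API key is required.
    """
    url = f"{base_url}/chat/completions"
    payload = {"model": model, "messages": messages}
    return url, payload


def chat(base_url, model, prompt):
    """Send the request; requires a server actually running at base_url."""
    url, payload = build_chat_request(
        base_url, model, [{"role": "user", "content": prompt}]
    )
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Point existing OpenAI-client code at local Ollama instead of the cloud.
    url, payload = build_chat_request(
        OLLAMA_BASE_URL, "llama3.1", [{"role": "user", "content": "Hi"}]
    )
    print(url)
```

Swapping back to a cloud provider is the same one-line change in the opposite direction, which is what makes local inference cheap to trial inside existing tooling.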
Real-World Examples
- Ollama running Llama 3.1 8B on a MacBook Pro M3 at 50 tokens/second
- LM Studio providing a GUI for running and testing local models
- Cursor configured to use a local Ollama model instead of Claude or GPT-4
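The RAM guidance above (~8GB for 7B models, ~16GB for 13B) follows from simple arithmetic on quantised weights. A rough back-of-envelope sketch — the bits-per-weight and overhead factors here are assumed rules of thumb, not measured values:

```python
def estimate_ram_gb(params_billion, bits_per_weight=4.5, overhead=2.0):
    """Rough RAM estimate for running a quantised model locally.

    bits_per_weight ~4.5 approximates a 4-bit quantisation with per-block
    scaling metadata; overhead ~2x leaves headroom for the KV cache,
    activations, the runtime, and the OS. Both are assumed rules of
    thumb, not benchmarked figures.
    """
    weight_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb * overhead


for n in (7, 13):
    print(f"{n}B model: ~{estimate_ram_gb(n):.1f} GB recommended")
```

With these assumptions a 7B model lands near 8GB and a 13B near 15GB, consistent with the figures quoted above; more aggressive quantisation lowers the weight term at some cost in output quality.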
Ollama + Llama 3 on your laptop = free, private, offline AI that's good enough for most tasks.