Build guide
Not sure what hardware you need? This guide explains your options in plain English — from Mac Mini to DIY builds — and points you to the right setup for your situation.
Budget: $0
Profile: LOCAL DEV
Target model: Llama 3.1 8B
Running AI models locally means your computer does all the thinking: no subscription fees, no data leaving your machine. The first question isn't 'which GPU?' but 'build or buy?'

If you want zero setup, start with a Mac Mini M4 ($799). Plug it in, install Ollama, and you're running Llama 3.1 8B in 15 minutes. The downside: you can't upgrade individual parts, and you hit a ceiling at 7B–12B models on the base 16GB config.

If you want to run bigger models (34B–70B) without building a PC, the NVIDIA DGX Spark ($3,999) is a desktop appliance with 128GB of unified memory. It supports the full CUDA toolchain out of the box.

If you're comfortable assembling a PC (a 2–4 hour weekend project), a DIY build gives you the most performance per dollar. A $1,800 build with an RTX 4080 SUPER runs 7B models faster than any Mac Mini and can be upgraded later.

Key concepts in plain English:
• 'Model size' (7B, 14B, 70B): bigger = smarter but slower and needs more memory
• 'VRAM': the GPU's dedicated memory; models that don't fit run slowly or not at all
• 'Quantization': how compressed the model is; Q4 is the standard balance of speed vs quality
• 'tok/s': tokens per second; above ~10 tok/s feels like real-time conversation

Use the wizard below to answer 5 quick questions and get a personalized recommendation.
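To see how model size, quantization, and memory relate, here is a rough back-of-the-envelope sketch in Python. It is an approximation and not a sizing tool: it assumes Q4 means about 4 bits per weight (real Q4 formats usually use slightly more), and the 20% headroom for context and runtime overhead is an illustrative guess. The function names are made up for this example. On unified-memory machines the operating system and other apps share the same pool, so the usable amount is lower than the advertised figure.

```python
# Rough sizing sketch. All constants below are assumptions for illustration.

ASSUMED_OVERHEAD = 0.20   # assumed headroom for context cache and runtime buffers
REALTIME_TOK_S = 10.0     # the guide's "feels like real-time" threshold


def estimate_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    One billion parameters at 8 bits each is about 1 GB, so the size
    scales as params_billions * bits_per_weight / 8.
    """
    return params_billions * bits_per_weight / 8


def fits_in_memory(params_billions: float, bits_per_weight: float,
                   memory_gb: float, overhead: float = ASSUMED_OVERHEAD) -> bool:
    """True if the weights plus the assumed overhead fit in memory_gb."""
    needed = estimate_weight_memory_gb(params_billions, bits_per_weight) * (1 + overhead)
    return needed <= memory_gb


def tokens_per_second(tokens_generated: int, seconds: float) -> float:
    """Generation speed: tokens produced divided by wall-clock time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens_generated / seconds


def feels_real_time(tok_s: float) -> bool:
    """Compare a measured speed against the ~10 tok/s rule of thumb."""
    return tok_s > REALTIME_TOK_S


if __name__ == "__main__":
    # Compare an 8B and a 70B model at ~4 bits against 16 GB and 128 GB.
    for size_b in (8, 70):
        for memory_gb in (16, 128):
            verdict = "fits" if fits_in_memory(size_b, 4, memory_gb) else "does not fit"
            print(f"{size_b}B at ~4 bits in {memory_gb} GB: {verdict}")
```

Under these assumptions an 8B model at about 4 bits needs roughly 4 GB for its weights, while a 70B model needs roughly 35 GB, which is why the larger models call for a machine with far more memory.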