Build guide

Start Here: I'm New to Local AI

Not sure what hardware you need? This guide explains your options in plain English — from Mac Mini to DIY builds — and points you to the right setup for your situation.

Profile: LOCAL DEV

Target model: Llama 3.1 8B

Why this build

Running AI models locally means your computer does all the thinking: no subscription fees, no data leaving your machine. The first question isn't 'which GPU?' but 'build or buy?'

If you want zero setup, start with a Mac Mini M4 ($799). Plug it in, install Ollama, and you're running Llama 3.1 8B in 15 minutes. The downside: you can't upgrade individual parts, and you hit a ceiling at 7B–12B models on the base 16GB config.

If you want to run bigger models (34B–70B) without building a PC, the NVIDIA DGX Spark ($3,999) is a desktop appliance with 128GB of unified memory. It supports the full CUDA toolchain out of the box.

If you're comfortable assembling a PC (a 2–4 hour weekend project), a DIY build gives you the most performance per dollar. A $1,800 build with an RTX 4080 SUPER runs 7B models faster than any Mac Mini and can be upgraded later.

Key concepts in plain English:

• 'Model size' (7B, 14B, 70B): bigger = smarter but slower and needs more memory
• 'VRAM': the GPU's dedicated memory; models that don't fit run slowly or not at all
• 'Quantization': how compressed the model is; Q4 is the standard balance of speed vs quality
• 'tok/s': tokens per second; above ~10 tok/s feels like real-time conversation

Use the wizard below to answer 5 quick questions and get a personalized recommendation.
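To make 'model size', 'VRAM', and 'quantization' a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is not from the guide's own tooling. The function names are hypothetical, and two numbers are assumptions for illustration only: that Q4 works out to roughly 4.5 bits per weight, and that runtime overhead (context cache, buffers) adds about 20% on top of the weights. Real usage varies with the runtime and context length.

```python
def estimate_model_memory_gb(params_billions, bits_per_weight=4.5, overhead_fraction=0.2):
    """Rough memory needed to load a model, in GB (1 GB = 1e9 bytes).

    bits_per_weight=4.5 is an assumed average for Q4 quantization.
    overhead_fraction=0.2 is an assumed allowance for cache and buffers.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead_fraction) / 1e9


def fits_in_memory(params_billions, memory_gb, bits_per_weight=4.5):
    """True if the estimated footprint is within the available memory."""
    return estimate_model_memory_gb(params_billions, bits_per_weight) <= memory_gb


if __name__ == "__main__":
    # Memory sizes: 16 GB (base Mac Mini config, or an RTX 4080 SUPER's VRAM)
    # and 128 GB (DGX Spark unified memory).
    for size in (8, 34, 70):
        need = estimate_model_memory_gb(size)
        print(f"{size}B at Q4: ~{need:.0f} GB needed, "
              f"fits in 16 GB: {fits_in_memory(size, 16)}, "
              f"fits in 128 GB: {fits_in_memory(size, 128)}")
```

Under these assumptions an 8B model at Q4 sits comfortably inside 16 GB, while 34B and 70B models do not, which matches the ceiling described above. Treat the result as a first filter rather than a guarantee, since the operating system and other apps also need memory.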