Build guide
Cost-optimized build using a used RTX 3090 for 70B experimentation at Q3 quant.
Budget
$2,200
Profile
LOCAL DEV
Target model
Llama 3.3 70B
35.1 tok/s
29.8–40.4 tok/s decode on Llama 3.3 70B
Value: 15.95 tok/s per $1k
The only way to touch Llama 3.3 70B under $2,200 is a used RTX 3090. With 24 GB of VRAM, it runs 70B at Q3_K_S quantization. Quality is noticeably lower than at Q4, but the output is coherent, and it is a legitimate way to explore large models before committing to a dual-GPU build. If 70B feels too slow, the 3090 also handles 14B at Q8 with ease. We spec 64 GB of system RAM because Q3 70B leaves almost no VRAM headroom for the KV cache: with a large context window the runtime will spill to system memory, and you want that spill to land in fast DDR5 rather than become a bottleneck. The 1000 W PSU leaves margin for the 3090's peak power draw, which can run higher than expected on used cards. This build is for experimentation, not daily production use. Expect 4–8 tok/s on 70B, which is usable for reading long outputs but not for interactive chat.
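To make the KV cache point concrete, here is a rough sizing sketch. It assumes the commonly published Llama 3 70B architecture (80 layers, 8 key/value heads under grouped-query attention, head dimension 128) and an unquantized 16-bit cache. None of these figures come from this guide, and the function name kv_cache_bytes is just illustrative, so treat the output as an estimate rather than a measurement.

```python
# Rough KV cache size estimate for a 70B-class model.
# ASSUMPTIONS (not from the build guide): 80 layers, 8 KV heads (GQA),
# head_dim 128, and a 16-bit (2-byte) cache. Runtimes that quantize the
# cache will use less memory than this.

GIB = 1024 ** 3


def kv_cache_bytes(n_tokens, n_layers=80, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    """Return the bytes needed to hold keys and values for n_tokens."""
    # The leading 2 counts one K tensor and one V tensor per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * n_tokens


if __name__ == "__main__":
    for ctx in (2048, 8192, 32768):
        size_gib = kv_cache_bytes(ctx) / GIB
        print(f"{ctx:>6} tokens: {size_gib:.2f} GiB of KV cache")
```

Under these assumptions the cache costs 320 KiB per token, so an 8,192-token context needs about 2.5 GiB on top of the model weights. That is the headroom a 24 GB card running Q3 70B does not have, and it is why longer contexts end up in system RAM.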