Build guide

Used GPU Budget Build

Cost-optimized build using a used RTX 3090 for 70B experimentation at Q3 quantization.

Budget

$2,200

Profile

LOCAL DEV

Target model

Llama 3.3 70B


Performance

LOW

35.1 tok/s

29.8–40.4 tok/s decode on Llama 3.3 70B

Value: 15.95 tok/s per $1k
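
The value figure appears to be decode throughput divided by the budget in thousands of dollars. A minimal sketch of that arithmetic, assuming the headline 35.1 tok/s and the $2,200 budget are the inputs (the function name is illustrative, not part of any tool used by this guide):

```python
def tokens_per_second_per_1k(tok_per_s: float, budget_usd: float) -> float:
    """Decode throughput per $1,000 of build budget."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    return tok_per_s / (budget_usd / 1000.0)


# Inputs taken from the figures quoted in this guide.
value = tokens_per_second_per_1k(35.1, 2200)
print(f"{value:.2f} tok/s per $1k")
```

The same function can be used to compare this build against other guides by swapping in their throughput and budget.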

Why this build

The only way to touch Llama 3.3 70B under $2,200 is a used RTX 3090. With 24 GB of VRAM, it runs 70B at Q3_K_S quantization. Quality is noticeably lower than at Q4, but the output is coherent, and it is a legitimate way to explore large models before committing to a dual-GPU build. If 70B feels too slow, the 3090 also handles 14B models at Q8 with ease.

We spec 64 GB of system RAM because Q3 70B leaves almost no VRAM headroom for the KV cache. With a large context window the runtime spills to system memory, and you want that memory to be fast DDR5 rather than a bottleneck. The 1000 W PSU leaves margin for the 3090's peak power draw, which can be higher than expected on used cards.

This build is for experimentation, not daily production use. Expect 4–8 tok/s on 70B: usable for reading long outputs, not for interactive chat.
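
To make the KV cache pressure concrete, here is a rough sizing sketch. It assumes architecture values that are not stated in this guide: 80 transformer layers, 8 key/value heads (grouped-query attention), a head dimension of 128, and an unquantized 16-bit cache. Treat the defaults as assumptions to verify against the model's published configuration.

```python
def kv_cache_bytes(
    context_tokens: int,
    n_layers: int = 80,        # assumed layer count for a 70B Llama-style model
    n_kv_heads: int = 8,       # assumed number of key/value heads (GQA)
    head_dim: int = 128,       # assumed per-head dimension
    bytes_per_value: int = 2,  # assumed 16-bit cache entries
) -> int:
    """Estimate KV cache size: keys and values for every layer and token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token


for ctx in (4096, 8192, 32768):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>6} tokens: {gib:.2f} GiB")
```

Under these assumptions the cache grows linearly with context length, so doubling the context window doubles the memory it needs on top of the model weights.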

Parts list

GPU: NVIDIA GeForce RTX 3090 (Used) · Amazon · $546
CPU: AMD Ryzen 7 7800X3D · Amazon · $346
Motherboard: MSI MAG B650 TOMAHAWK WIFI · Amazon · $220