Home Inference Workstation
RTX 4090 powerhouse for 8B–34B models with headroom for agent workflows.
~$4,200
~81.9 tok/s · 19.50 tok/s per $1k (Qwen 2.5 14B)

Build templates
Curated AI workstation templates with throughput (tok/s) and value ratings (tok/s per $1k of build cost). Customize any guide in the builder.

Single RTX 4080 SUPER build for running 7B–14B models locally with llama.cpp or Ollama.
~$2,800
~52 tok/s · 18.57 tok/s per $1k (Llama 3.1 8B)

Twin RTX 4090s for high-throughput 34B–70B inference. The RTX 4090 has no NVLink connector, so the two cards communicate over PCIe.
~$6,500
~68.8 tok/s · 10.58 tok/s per $1k (Llama 3.3 70B)

Team-grade dual 4090 rig targeting Llama 3.3 70B at Q4.
~$12,000
~59 tok/s · 4.92 tok/s per $1k (Llama 3.3 70B)

Dual-GPU workstation tuned for agent workloads with a 16K-token context window.
~$4,500
~130.2 tok/s · 28.93 tok/s per $1k (Qwen 3 Coder)

Minimal-spend path to a solid Llama 3.1 8B daily driver.
~$1,800
~52 tok/s · 28.89 tok/s per $1k (Llama 3.1 8B)

Cost-conscious 14B inference box built around a single modern GPU with enough VRAM for the model.
~$2,000
~52 tok/s · 26.00 tok/s per $1k (Qwen 2.5 14B)

Cost-optimized build using a used RTX 3090 for 70B experimentation at Q3 quantization.
~$2,200
~35.1 tok/s · 15.95 tok/s per $1k (Llama 3.3 70B)

64GB RAM and RTX 4090 for LoRA fine-tuning on 8B–14B models.
~$4,800
~66.3 tok/s · 13.81 tok/s per $1k (Qwen 2.5 14B)

High-memory build for concurrent team inference with vLLM on large models.
~$5,000
~59 tok/s · 11.80 tok/s per $1k (Llama 3.3 70B)
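
Each card's value rating appears to be its throughput divided by the build price in thousands of dollars (for the Home Inference Workstation, 81.9 tok/s ÷ 4.2 = 19.50). The sketch below recomputes and ranks the ratings from the rounded figures shown on the cards. The labels are short paraphrases of each card's description, not official template names, and the Template type and tok_per_kusd helper are hypothetical. Because the inputs are already rounded, a recomputed rating can differ from the printed one by up to about 0.01 (for example, 66.3 ÷ 4.8 = 13.8125, shown as 13.81).

```python
from typing import NamedTuple


class Template(NamedTuple):
    label: str        # short paraphrase of the card description (hypothetical, not an official name)
    price_usd: float  # approximate build cost shown on the card
    tok_per_s: float  # approximate throughput shown on the card
    model: str        # model the throughput figure was measured on


def tok_per_kusd(tok_per_s: float, price_usd: float) -> float:
    """Value rating: tokens per second per $1,000 of build cost (unrounded)."""
    return tok_per_s / (price_usd / 1000)


# Figures copied from the template cards, in page order.
TEMPLATES = [
    Template("Home Inference Workstation (RTX 4090)", 4200, 81.9, "Qwen 2.5 14B"),
    Template("Single RTX 4080 SUPER", 2800, 52, "Llama 3.1 8B"),
    Template("Twin RTX 4090, 34B-70B", 6500, 68.8, "Llama 3.3 70B"),
    Template("Team-grade dual RTX 4090", 12000, 59, "Llama 3.3 70B"),
    Template("Dual-GPU agent workstation", 4500, 130.2, "Qwen 3 Coder"),
    Template("Minimal-spend 8B daily driver", 1800, 52, "Llama 3.1 8B"),
    Template("Cost-conscious 14B box", 2000, 52, "Qwen 2.5 14B"),
    Template("Used RTX 3090, 70B at Q3", 2200, 35.1, "Llama 3.3 70B"),
    Template("RTX 4090 + 64GB RAM, LoRA", 4800, 66.3, "Qwen 2.5 14B"),
    Template("High-memory vLLM team box", 5000, 59, "Llama 3.3 70B"),
]

if __name__ == "__main__":
    # Rank templates by value rating, best first.
    ranked = sorted(
        TEMPLATES,
        key=lambda tpl: tok_per_kusd(tpl.tok_per_s, tpl.price_usd),
        reverse=True,
    )
    for tpl in ranked:
        rating = tok_per_kusd(tpl.tok_per_s, tpl.price_usd)
        print(f"{rating:6.2f} tok/s per $1k  {tpl.label} ({tpl.model})")
```

Ranked this way, the dual-GPU agent workstation (28.93) and the minimal-spend 8B build (28.89) come out on top and the team-grade dual 4090 rig (4.92) comes last. The ratings are measured on different benchmark models, so they are only directly comparable between builds tested on the same model.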