ClankerBuilder

Hardware for LLM Models

VRAM requirements and GPU recommendations for running popular models locally.

Llama 3.1 8B

4.9 GB weights

VRAM Required: 5.4 GB
Model Size: 4.9 GB
KV Cache: 0.5 GB

Mistral 7B

4.4 GB weights

VRAM Required: 4.8 GB
Model Size: 4.4 GB
KV Cache: 0.4 GB

Qwen 2.5 14B

9 GB weights

VRAM Required: 9.8 GB
Model Size: 9 GB
KV Cache: 0.8 GB

Qwen 3 Coder

18 GB weights

VRAM Required: 19.2 GB
Model Size: 18 GB
KV Cache: 1.2 GB

Llama 3.3 70B

40 GB weights

VRAM Required: 43 GB
Model Size: 40 GB
KV Cache: 3 GB

Understanding VRAM Requirements

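VRAM Required is the estimated total memory needed to run the model with Q4_K_M quantization at a typical context length. It covers model weights, KV cache, and runtime overhead. In the figures listed above it equals Model Size plus KV Cache, so runtime overhead is not broken out as a separate number.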
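As a rough illustration of how the three figures on each card relate, the sketch below adds Model Size and KV Cache and compares the total with a GPU's memory. The function names and the `overhead_gb` parameter are hypothetical. The default overhead of zero is an assumption chosen only because it reproduces the listed totals; real runtimes usually need some extra headroom.

```python
# Figures copied from the model cards above: (model size GB, KV cache GB).
MODELS = {
    "Llama 3.1 8B": (4.9, 0.5),
    "Mistral 7B": (4.4, 0.4),
    "Qwen 2.5 14B": (9.0, 0.8),
    "Qwen 3 Coder": (18.0, 1.2),
    "Llama 3.3 70B": (40.0, 3.0),
}


def vram_required_gb(model_size_gb, kv_cache_gb, overhead_gb=0.0):
    """Estimated VRAM: weights + KV cache + optional overhead (assumed 0)."""
    return model_size_gb + kv_cache_gb + overhead_gb


def fits(model_name, gpu_vram_gb, overhead_gb=0.0):
    """Return True if the estimate for model_name fits in gpu_vram_gb."""
    model_size_gb, kv_cache_gb = MODELS[model_name]
    return vram_required_gb(model_size_gb, kv_cache_gb, overhead_gb) <= gpu_vram_gb


if __name__ == "__main__":
    for name, (size_gb, kv_gb) in MODELS.items():
        total = vram_required_gb(size_gb, kv_gb)
        print(f"{name}: {total:.1f} GB")
```

Passing a non-zero `overhead_gb` is a simple way to budget extra headroom when a model sits close to a card's limit.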

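Model Size is the size of the weight data alone, before any KV cache or runtime overhead is added. Quantized models use less VRAM than their full-precision versions but may have slightly reduced output quality.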
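The effect of quantization on weight size can be estimated from parameter count and average bits per weight. This is a back-of-the-envelope sketch, not the method used for the figures on this page. The function name is hypothetical, and the 8.03 billion parameter count and 4.9 bits per weight are assumed values chosen to approximate a Q4_K_M build of an 8B model (Q4_K_M mixes precisions, so its average is somewhat above 4 bits).

```python
def weight_size_gb(n_params, bits_per_weight):
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 8.03e9  # assumed parameter count for an "8B" model

    # 16-bit weights versus an assumed ~4.9-bit quantized average.
    print(f"16-bit:  {weight_size_gb(n_params, 16):.1f} GB")
    print(f"~4.9-bit: {weight_size_gb(n_params, 4.9):.1f} GB")
```

Under these assumptions the quantized weights come to roughly a third of the 16-bit size, which is why quantized models fit on much smaller GPUs.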

KV Cache grows in proportion to context length, so longer conversations or documents need VRAM headroom beyond the figures listed above.
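The growth of the KV cache can be sketched with the standard formula for transformer attention: two tensors (keys and values) per layer, each holding one vector per KV head per token. The function name is hypothetical. The layer count, KV head count, and head dimension below are the published Llama 3.1 8B configuration; the 16-bit cache precision and the 4096-token context are assumptions, since this page does not state which values it uses.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache size in bytes for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes a 16-bit cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


if __name__ == "__main__":
    # Llama 3.1 8B: 32 layers, 8 KV heads, head dimension 128.
    for context_len in (4096, 8192, 32768):
        size_gib = kv_cache_bytes(32, 8, 128, context_len) / 2**30
        print(f"{context_len:>6} tokens: {size_gib:.1f} GiB")
```

Under these assumptions a 4096-token context needs 0.5 GiB, in line with the 0.5 GB listed for Llama 3.1 8B, and each doubling of the context doubles the cache.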