Hardware for LLM Models

VRAM requirements and GPU recommendations for running popular models locally.

Llama 3.1 8B

4.9 GB weights

VRAM Required: 5.4 GB
Model Size: 4.9 GB
KV Cache: 0.5 GB

4.4 GB weights

VRAM Required: 4.8 GB
Model Size: 4.4 GB
KV Cache: 0.4 GB

9 GB weights

VRAM Required: 9.8 GB
Model Size: 9 GB
KV Cache: 0.8 GB

18 GB weights

VRAM Required: 19.2 GB
Model Size: 18 GB
KV Cache: 1.2 GB

40 GB weights

VRAM Required: 43 GB
Model Size: 40 GB
KV Cache: 3 GB

VRAM Required covers model weights, KV cache, and runtime overhead, assuming Q4_K_M quantization at typical context lengths. In the figures above, it equals Model Size plus KV Cache.
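
The arithmetic behind the VRAM Required figure can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual calculator: the function names `required_vram_gb` and `fits_on_gpu` are hypothetical, and the 8 GB card in the usage lines is an assumed example rather than a recommendation from the source. The sketch simply adds the listed Model Size and KV Cache values, which is how the Llama 3.1 8B numbers above appear to combine.

```python
def required_vram_gb(model_size_gb: float, kv_cache_gb: float) -> float:
    """Estimate VRAM as weights plus KV cache (both in GB)."""
    return model_size_gb + kv_cache_gb


def fits_on_gpu(model_size_gb: float, kv_cache_gb: float, gpu_vram_gb: float) -> bool:
    """Return True if the estimate does not exceed the GPU's VRAM."""
    return required_vram_gb(model_size_gb, kv_cache_gb) <= gpu_vram_gb


# Llama 3.1 8B figures taken from the list above.
print(round(required_vram_gb(4.9, 0.5), 1))    # 5.4
# Assumed example card with 8 GB of VRAM (not from the source).
print(fits_on_gpu(4.9, 0.5, gpu_vram_gb=8.0))  # True
```

In practice it is sensible to leave some margin below the card's nominal capacity, since the desktop and other applications may also hold VRAM.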

Model Size is the size of the weight data alone. More aggressive quantization lowers VRAM use but may slightly reduce output quality.
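
To see why quantization changes the weight size, a rough estimate is parameter count times bits per weight, divided by eight to get bytes. The sketch below assumes a round 8 billion parameters (read off the "8B" in the model name), treats 1 GB as 10^9 bytes, and uses illustrative bits-per-weight values. The 4.9 figure is chosen only because it reproduces the 4.9 GB listed above; it is not a number given by the source. The function name `weight_size_gb` is hypothetical.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8e9  # assumed round figure for an "8B" model

# Illustrative precisions: 16-bit, 8-bit, and an assumed ~4.9 bits per weight.
for bits in (16, 8, 4.9):
    print(f"{bits} bits/weight: {weight_size_gb(N_PARAMS, bits):.1f} GB")
```

Real quantized files also carry metadata and mix precisions across tensors, so actual sizes can differ somewhat from this estimate.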

KV Cache grows linearly with context length, so longer conversations or documents need additional VRAM headroom.
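
The relationship between context length and KV cache size can be sketched with the usual per-token formula: two tensors (keys and values) per layer, each holding one vector per KV head. The sketch below is written under assumptions not stated in the source: a 16-bit cache (2 bytes per value) and the published Llama 3.1 8B architecture values of 32 layers, 8 KV heads, and a head dimension of 128. The function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int = 2) -> int:
    """KV cache size in bytes for a single sequence.

    The leading 2 counts the key tensor and the value tensor.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


# Assumed architecture values for Llama 3.1 8B (not given in the source).
LLAMA_8B = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for context_len in (4096, 8192, 32768):
    gib = kv_cache_bytes(context_len=context_len, **LLAMA_8B) / 2**30
    print(f"{context_len:>6} tokens: {gib:.1f} GiB")
```

Under these assumptions a 4096-token context needs about 0.5 GiB, which is in line with the KV Cache figure listed for Llama 3.1 8B, although the source does not say which context length it used. Doubling the context doubles the cache.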