Hardware for LLM Models
VRAM requirements and GPU recommendations for running popular models locally.
Understanding VRAM Requirements
VRAM Required includes model weights, KV cache, and runtime overhead for Q4_K_M quantization at typical context lengths.
Model Size is the raw weight data size. Quantized models use less VRAM but may have slightly reduced quality.
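As a rough illustration of why quantization shrinks the weight footprint, the sketch below estimates weight size from parameter count and average bits per weight. The 8-billion-parameter model is hypothetical, and the figure of about 4.8 bits per weight for Q4_K_M is an assumed average (the real value varies by model architecture), so treat the result as a ballpark rather than a measured number.

```python
def weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical 8-billion-parameter model, used only for illustration.
N_PARAMS = 8e9

fp16 = weight_size_gib(N_PARAMS, 16.0)  # unquantized half precision
q4 = weight_size_gib(N_PARAMS, 4.8)     # ASSUMED average bits/weight for Q4_K_M

print(f"FP16: {fp16:.1f} GiB, Q4_K_M (assumed 4.8 bpw): {q4:.1f} GiB")
```

This covers the weights only; KV cache and runtime overhead come on top of it.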
KV Cache grows with context length. Longer conversations or documents require additional VRAM headroom.
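To make the link between context length and KV cache concrete, here is a minimal sketch of the standard size estimate for a transformer with a 16-bit cache: one key and one value vector per layer, per KV head, per token. The layer count, KV head count, and head dimension are illustrative assumptions, not values taken from any model listed here, and real runtimes may add padding or use a quantized cache.

```python
def kv_cache_gib(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_elem: int = 2,  # 2 bytes = 16-bit cache (assumption)
) -> float:
    """Approximate KV cache size in GiB for a single sequence."""
    # The leading 2 accounts for storing both keys and values.
    total_bytes = (
        2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    )
    return total_bytes / 2**30


# Illustrative architecture: 32 layers, 8 KV heads, head dimension 128.
for context_len in (8192, 32768):
    size = kv_cache_gib(32, 8, 128, context_len)
    print(f"{context_len} tokens: {size:.1f} GiB")
```

The estimate scales linearly with context length, so quadrupling the context quadruples the cache.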