Fine-Tuning LLMs at Home: Hardware Requirements (2026)

June 16, 2026 · 7 min read · By the ClankerBuilder editorial team

What hardware you need to fine-tune LLMs locally with QLoRA — VRAM, system RAM, storage, and which GPU tiers work for 7B, 13B, and 70B models.

Fine-tuning a large language model requires meaningfully more VRAM than running one. During inference, you store only the model weights and the KV cache. During training, you store the weights, the gradients for each parameter, the optimizer state (which can be two to three times the size of the weights alone), and the activation tensors for the forward pass — all simultaneously. A model that runs comfortably on 8GB of VRAM for inference may need 24GB or more to fine-tune.

QLoRA changes this dramatically. By freezing the base model in 4-bit quantization and training only a small set of adapter layers, QLoRA reduces the active memory during fine-tuning to a fraction of what full fine-tuning needs. It is not magic, and you still need more VRAM than inference, but it makes fine-tuning a 7B model possible on a single consumer GPU and 70B feasible on two. This guide covers what "fine-tuning at home" actually means in practice, the hardware tiers that work, and the system RAM and storage requirements that are often underestimated.

Fine-Tuning vs Inference: Why VRAM Requirements Differ

Inference is read-only: you load the model weights into VRAM, generate tokens by passing inputs through the network, and the weights never change. The memory footprint is essentially fixed: weights plus KV cache plus a small runtime overhead.

Training requires writing, not just reading. For every trainable parameter, you must store not just its value (the weight) but also its gradient, which measures how much the loss changes with respect to that weight, so the optimizer can update it. Common optimizers like AdamW maintain two additional per-parameter values (first and second moment estimates), so optimizer state alone takes about twice the memory of the weights at the same precision, and more when the moments are kept in FP32. Then there are the activation tensors from the forward pass, which must be retained until the backward pass completes. Full fine-tuning of a 7B model at FP16 can require 60–80GB of VRAM, which is impractical on consumer hardware. QLoRA's approach of freezing the base model in 4-bit and training only adapter layers collapses this to something manageable.
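The arithmetic behind that 60–80GB figure can be sketched in a few lines. This is a rough estimate only: it assumes every tensor (weights, gradients, and both AdamW moments) is stored at 2 bytes per value, and it leaves out activations, which depend on batch size and sequence length. The function name and defaults are illustrative, not taken from any library.

```python
def full_finetune_gb(params_billion: float,
                     weight_bytes: int = 2,
                     grad_bytes: int = 2,
                     moment_bytes: int = 2) -> dict:
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes), excluding activations.

    Assumption: AdamW keeps two moment tensors per parameter, each stored
    at `moment_bytes` per value. Mixed-precision setups that keep FP32
    moments would use moment_bytes=4 and land noticeably higher.
    """
    n = params_billion * 1e9
    weights = n * weight_bytes / 1e9
    gradients = n * grad_bytes / 1e9
    optimizer = n * 2 * moment_bytes / 1e9  # first + second moment
    return {
        "weights": weights,
        "gradients": gradients,
        "optimizer": optimizer,
        "total": weights + gradients + optimizer,
    }


if __name__ == "__main__":
    for name, gb in full_finetune_gb(7).items():
        print(f"{name:>10}: {gb:.0f} GB")
```

For a 7B model this comes to about 14GB of weights, 14GB of gradients, and 28GB of optimizer state, or roughly 56GB before activations are counted.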

QLoRA: The Practical Approach

QLoRA (Quantized Low-Rank Adaptation) works by loading the base model in 4-bit quantization — too low precision to train directly — and adding a small set of trainable adapter layers (LoRA adapters) on top. The gradients only flow through the adapters, not the frozen base model, so you avoid storing gradients and optimizer state for the base model's billions of parameters. The adapters are small enough that their training overhead is manageable.

In practice, QLoRA lets you fine-tune a 7B model on a single RTX 4090 (24GB) or RTX 3090 (24GB). A 13B model fits on one 24GB card with reduced batch size. A 70B model needs two 24GB cards at minimum, typically two RTX 3090s joined by an NVLink bridge. The model is split across the two cards, so the 48GB of combined VRAM covers the base model plus adapter training overhead. The main community implementations are Axolotl (recommended for flexibility and configuration) and LLaMA-Factory (more UI-friendly). Both support QLoRA out of the box and have active development communities.

One important caveat: the quality of the fine-tune depends on dataset size and quality, not just hardware. QLoRA makes fine-tuning accessible, but a well-prepared dataset is the actual bottleneck for most home fine-tuning projects.

Base model frozen in 4-bit — no gradients through the base weights
LoRA adapters: small, trainable — gradient and optimizer state only for adapters
7B on 1× RTX 4090 or 3090 (24GB) — the sweet spot for home fine-tuning
70B needs 2× 24GB with NVLink minimum — dual 3090 is the budget path
Community tools: Axolotl, LLaMA-Factory — both support QLoRA
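To see why the adapters are so much cheaper to train than the base model, it helps to count their parameters. A LoRA adapter on a linear layer of shape d_in × d_out adds two small matrices of rank r, for r × (d_in + d_out) trainable values. The sketch below assumes an illustrative 7B-class shape (hidden size 4096, 32 layers) with adapters on two square projection matrices per layer at rank 16; these numbers are assumptions for the example, not a recommendation.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable values added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_total(hidden: int, layers: int, rank: int,
                  matrices_per_layer: int = 2) -> int:
    """Total adapter parameters, assuming square hidden x hidden target matrices."""
    return layers * matrices_per_layer * lora_params(hidden, hidden, rank)


if __name__ == "__main__":
    total = adapter_total(hidden=4096, layers=32, rank=16)
    share = total / 7e9  # relative to a 7B-parameter base model
    print(f"adapter parameters: {total:,} ({share:.2%} of the base model)")
```

Under these assumptions the adapters hold about 8.4 million parameters, a small fraction of one percent of the base model, which is why their gradients and optimizer state fit alongside the frozen 4-bit weights.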

VRAM Requirements by Model Size (QLoRA)

The table below shows approximate minimum VRAM for QLoRA fine-tuning. These figures assume a sequence length of 2,048 tokens and small per-device batch sizes (1–4). Larger per-device batches or longer sequences increase VRAM usage; gradient accumulation raises the effective batch size without raising peak VRAM, at the cost of more time per optimizer step. Treat these as starting-point estimates, not guarantees.

Approximate minimum VRAM for QLoRA fine-tuning by model size.
Model Size	QLoRA Min VRAM	Recommended GPU
7B	~12 GB	RTX 3080 Ti / RTX 4070 (12GB)
13B	~18 GB	RTX 4090 or RTX 3090 (24GB)
34B	~36 GB	2× 24GB GPU
70B	~48 GB	2× RTX 3090 with NVLink
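As a quick sanity check, the table can be turned into a lookup. The sketch below hard-codes the approximate minimums from the table and simply adds up VRAM across identical cards, which is an optimistic simplification: sharding a model across GPUs carries overhead, so treat a bare pass as "worth trying", not "will fit". The function names are illustrative.

```python
# Approximate QLoRA minimums from the table above, in GB.
QLORA_MIN_VRAM_GB = {"7B": 12, "13B": 18, "34B": 36, "70B": 48}


def fits(model_size: str, gpu_vram_gb: int, num_gpus: int = 1) -> bool:
    """True if combined VRAM meets the table's approximate minimum."""
    return gpu_vram_gb * num_gpus >= QLORA_MIN_VRAM_GB[model_size]


def largest_model(gpu_vram_gb: int, num_gpus: int = 1):
    """Largest model size in the table that meets the minimum, or None."""
    candidates = [size for size in QLORA_MIN_VRAM_GB
                  if fits(size, gpu_vram_gb, num_gpus)]
    # Rank by required VRAM so the answer does not depend on dict order.
    return max(candidates, key=QLORA_MIN_VRAM_GB.get) if candidates else None


if __name__ == "__main__":
    print(largest_model(24))      # single 24GB card
    print(largest_model(24, 2))   # two 24GB cards
    print(fits("7B", 10))         # 10GB card against the 7B minimum
```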

System RAM: The Often-Ignored Requirement

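VRAM gets all the attention, but system RAM is frequently the bottleneck that people hit first. During fine-tuning, the data loader prefetches training batches into system RAM, tokenized datasets are cached in memory for fast access, and model checkpoints pass through system RAM when they are loaded and saved. If you enable CPU offloading or paged optimizers to relieve VRAM pressure, optimizer state also spills into system RAM. (Gradient checkpointing, another common VRAM-reduction technique, does not use system RAM: it discards most intermediate activations and recomputes them during the backward pass.) The practical rule of thumb: plan for at least 2× your VRAM in system RAM for comfortable fine-tuning without data loading bottlenecks.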

For a 7B fine-tuning setup with 24GB VRAM, 32GB of system RAM is the minimum and 64GB is comfortable. For 13B, plan on 64GB minimum. For 70B on dual 3090s (48GB combined VRAM), 128GB of system RAM is recommended. Storage speed also matters: a fast NVMe SSD shortens dataset loading and speeds up the frequent checkpoint saves that come with iterating on training runs.

7B QLoRA: 32GB system RAM minimum, 64GB recommended
13B QLoRA: 64GB system RAM minimum
70B QLoRA (dual GPU): 128GB system RAM recommended
Fast NVMe SSD: reduces dataset prefetch time on large datasets
Rule of thumb: 2× VRAM in system RAM for comfortable training
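The 2× rule of thumb is easy to apply by hand, but a small helper makes the rounding explicit. This sketch assumes RAM is bought in power-of-two capacities (16, 32, 64, 128, 256GB); that list and the function name are assumptions for illustration, and other kit sizes exist.

```python
# Assumed purchasable capacities in GB; adjust to what your platform supports.
STANDARD_RAM_GB = (16, 32, 64, 128, 256)


def recommended_system_ram_gb(total_vram_gb: float,
                              multiplier: float = 2.0) -> int:
    """Smallest assumed capacity that is at least multiplier x total VRAM."""
    target = total_vram_gb * multiplier
    for size in STANDARD_RAM_GB:
        if size >= target:
            return size
    raise ValueError(f"no listed capacity covers {target:.0f} GB")


if __name__ == "__main__":
    print(recommended_system_ram_gb(24))  # single 24GB card
    print(recommended_system_ram_gb(48))  # dual 24GB cards
```

With these assumptions a single 24GB card maps to 64GB of system RAM and a dual-card 48GB setup maps to 128GB, matching the comfortable figures listed above.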

Build Recommendations by Budget

At $1,500, an RTX 4090 (24GB) paired with 64GB of DDR5 system RAM is the practical home fine-tuning sweet spot. It handles 7B models comfortably and 13B with reduced batch size. This is the build most developers doing domain-specific fine-tuning on modest datasets should target.

At $3,000, a dual RTX 3090 build with an NVLink bridge and 128GB of system RAM opens up 70B QLoRA fine-tuning, which is about as far as a consumer home setup goes. The RTX 4090 has no NVLink connector, so dual 4090s must exchange data over PCIe; when inter-GPU traffic is the bottleneck, the NVLinked 3090 pair can be the better training setup despite the older GPUs. Use the build configurator to spec a compatible motherboard, CPU, and power supply for either setup.

$1,500 path: RTX 4090 + 64GB DDR5 — fine-tunes up to 13B with QLoRA
$3,000 path: dual RTX 3090 (NVLink) + 128GB RAM — fine-tunes up to 70B with QLoRA
NVLink preferred over PCIe for training (gradient sync overhead)
CPU: any modern high-core-count CPU; data loading is CPU-bound

Frequently asked questions

Can I fine-tune a 7B model on an RTX 3080 (10GB)?

With QLoRA and careful settings, it is possible but tight. The minimum for 7B QLoRA is approximately 12GB, so a 10GB card is under the comfortable threshold. You can reduce batch size to 1 and use gradient checkpointing to cut VRAM further, but training will be slow and you will need short sequence lengths. An RTX 3080 Ti (12GB) or higher is a more comfortable starting point for 7B fine-tuning.

How long does QLoRA fine-tuning take on a single RTX 4090?

It is highly variable: it depends on dataset size, model size, number of training steps, and sequence length. A small domain-specific fine-tune of a 7B model on a few thousand examples can complete in an hour or two on an RTX 4090. A larger dataset at 70B scale on dual 3090s can take days. Plan for iteration: the first few runs are usually experiments to find good hyperparameters, not production training runs.

Do I need NVLink for dual-GPU fine-tuning?

For 70B models, NVLink makes a significant difference over PCIe for training. During backpropagation, activations and gradients must be exchanged between GPUs, and NVLink's higher bandwidth reduces that overhead substantially. For inference only, PCIe is usually acceptable. If your primary goal is fine-tuning 70B models on dual 3090s, the NVLink bridge ($120–200) is a worthwhile addition.
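For the training-time question, a back-of-the-envelope estimate follows from the number of optimizer steps. The sketch below is only that: the seconds-per-step value is a hypothetical placeholder that you would measure on your own hardware during a short trial run, and the function names are illustrative.

```python
import math


def optimizer_steps(num_examples: int, micro_batch: int,
                    grad_accum: int, epochs: int = 1) -> int:
    """Optimizer steps for a run; the last partial batch of each epoch counts as a step."""
    effective_batch = micro_batch * grad_accum
    return math.ceil(num_examples / effective_batch) * epochs


def estimated_hours(steps: int, seconds_per_step: float) -> float:
    """Wall-clock estimate from a measured (not assumed) per-step time."""
    return steps * seconds_per_step / 3600


if __name__ == "__main__":
    steps = optimizer_steps(num_examples=3000, micro_batch=2,
                            grad_accum=8, epochs=3)
    # 6.0 s/step is a placeholder; time a few real steps and substitute.
    print(steps, f"{estimated_hours(steps, 6.0):.2f} h")
```

With 3,000 examples, an effective batch of 16, and three epochs, this comes to 564 optimizer steps; the hours figure is only as good as the per-step time you measure.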