
LoRA (Low-Rank Adaptation)

LoRA drastically reduces the computational cost and memory footprint of fine-tuning massive language models. Instead of updating billions of parameters, LoRA freezes the pre-trained weights and injects trainable rank-decomposition matrices into the layers of the Transformer architecture: each weight update is factored as the product of two small matrices of rank r, so the number of trainable parameters scales with r(d + k) rather than d × k for a d × k weight matrix. This lets practitioners fine-tune models like Llama 3 on consumer GPUs, producing adapter weights that are often just a few megabytes in size.
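
To make the mechanism concrete, here is a minimal sketch in PyTorch. The `LoRALinear` wrapper class, the rank of 8, and the scaling factor `alpha` are illustrative choices, not a specific library's API; the zero-initialization of one factor follows the common convention that training starts from the unmodified pre-trained model.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: freeze a pretrained linear layer and
    add a trainable low-rank update B @ A on top of it."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank factors: A projects down to `rank`, B projects back up.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        # B starts at zero so the wrapped layer initially matches the base model.
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen pretrained path plus the scaled low-rank correction.
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)


# Usage: wrap an existing layer; only A and B are trainable.
layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 8 * 4096 = 65,536 vs. ~16.8M frozen base parameters
```

At rank 8 the trainable parameters here are roughly 0.4% of the frozen layer's, which is why the resulting adapters are so small: only A and B need to be saved and shipped.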