AI Engineering Glossary
The definitive dictionary for modern AI architecture, training, and inference terminology.
Prompting
Chain of Thought (CoT)
A prompting strategy that elicits the model's intermediate reasoning steps before it commits to a final answer.
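A minimal sketch of what this looks like in practice: the helper below appends a reasoning cue to a question. The `build_cot_prompt` name and the specific cue phrase are illustrative; any LLM client can consume the resulting string.

```python
def build_cot_prompt(question: str) -> str:
    """Append a reasoning cue so the model emits intermediate steps
    before its final answer (zero-shot Chain of Thought)."""
    return (
        f"Q: {question}\n"
        "A: Let's think step by step."
    )

prompt = build_cot_prompt(
    "If a train travels 60 km in 40 minutes, what is its speed in km/h?"
)
print(prompt)
```

Few-shot variants instead prepend worked examples whose answers spell out their reasoning; the cue-phrase version above is the simplest zero-shot form.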
Inference
KV Cache (Key-Value Cache)
A mechanism used during autoregressive generation to store previously computed Keys and Values, preventing redundant calculations.
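A toy sketch of the idea for a single attention head, assuming random projection matrices `Wk`/`Wv` purely for illustration: each decoding step projects only the new token, while earlier keys and values are reused from the cache rather than recomputed.

```python
import numpy as np

d = 8
rng = np.random.default_rng(0)
Wk = rng.normal(size=(d, d))  # key projection (illustrative weights)
Wv = rng.normal(size=(d, d))  # value projection

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, x):
        # Project only the NEW token; earlier K/V stay cached.
        self.keys.append(x @ Wk)
        self.values.append(x @ Wv)

    def stacked(self):
        # Full K and V matrices for the attention computation.
        return np.stack(self.keys), np.stack(self.values)

cache = KVCache()
for _ in range(5):                    # five decoding steps
    cache.append(rng.normal(size=d))  # embedding of the new token
K, V = cache.stacked()
print(K.shape, V.shape)  # (5, 8) (5, 8)
```

Without the cache, step *t* would re-project all *t* tokens, making generation quadratic in sequence length; with it, each step does constant work per layer at the cost of storing K and V.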
Training
LoRA (Low-Rank Adaptation)
A highly efficient fine-tuning technique that freezes the base model weights and trains a small set of injected low-rank matrices.
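A sketch of a single LoRA layer, with illustrative shapes and scaling: the base weight `W` is frozen, and only the low-rank factors `A` and `B` would receive gradients. Zero-initializing `B` makes the adapter a no-op at the start of training.

```python
import numpy as np

d_in, d_out, r, alpha = 16, 16, 4, 8  # rank r << d (illustrative values)
rng = np.random.default_rng(0)

W = rng.normal(size=(d_in, d_out))    # frozen base weights
A = rng.normal(size=(d_in, r)) * 0.01 # trainable down-projection
B = np.zeros((r, d_out))              # trainable up-projection, zero-init

def lora_forward(x):
    # Frozen base path plus scaled low-rank adapter path.
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(2, d_in))
# With B zero-initialized, the layer starts out identical to the base model:
assert np.allclose(lora_forward(x), x @ W)
```

The payoff is parameter count: `A` and `B` together hold `r * (d_in + d_out)` values versus `d_in * d_out` for `W`, so only a small fraction of weights are trained and stored per task.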
Architecture
MoE (Mixture of Experts)
A neural network architecture that utilizes multiple specialized sub-networks ("experts"), routing tokens only to the most relevant ones.
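A toy top-1 MoE layer under simplifying assumptions: a learned gate scores the experts per token, and each token runs through only its highest-scoring expert. Production systems typically route to the top 2 experts and add load-balancing losses, which this sketch omits.

```python
import numpy as np

n_experts, d = 4, 8
rng = np.random.default_rng(0)
gate = rng.normal(size=(d, n_experts))               # router weights
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_forward(tokens):
    logits = tokens @ gate              # (n_tokens, n_experts) router scores
    choice = logits.argmax(axis=-1)     # top-1 expert per token
    out = np.empty_like(tokens)
    for e in range(n_experts):
        mask = choice == e
        if mask.any():
            # Only the selected expert's weights are used for these tokens.
            out[mask] = tokens[mask] @ experts[e]
    return out, choice

tokens = rng.normal(size=(6, d))
out, choice = moe_forward(tokens)
print(out.shape, choice)
```

Because each token activates one expert rather than all four, compute per token stays roughly constant while total parameter count scales with the number of experts.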
Architecture
RAG (Retrieval-Augmented Generation)
A framework that grounds an LLM by fetching external, up-to-date information from a database before generating a response.
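A minimal sketch of the retrieve-then-generate flow. The `docs` corpus and prompt template are invented for illustration, and word-overlap scoring stands in for the embedding-based retrieval a real system would use; generation itself is left to any LLM call.

```python
docs = [
    "LoRA freezes base weights and trains low-rank adapters.",
    "A KV cache stores keys and values from earlier decoding steps.",
    "MoE layers route each token to a few expert sub-networks.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by word overlap with the query
    # (a stand-in for vector similarity search).
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str) -> str:
    # Ground the model by placing retrieved text ahead of the question.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What does a KV cache store?"))
```

Because the context is fetched at query time, the generator can cite information newer than its training data, and updating the knowledge base requires no retraining.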