Data Augmentation
Techniques that expand training data by creating modified versions of existing examples, such as rotating images or paraphrasing text.
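Data augmentation artificially increases the effective size of a dataset by applying label-preserving transformations to existing examples: changes that alter the input but not its correct label. For images this includes rotations, crops, color jitter, and flips. For text it includes paraphrasing, word substitution, and back-translation. Whether a transformation preserves the label depends on the task; a horizontal flip is safe for most object photos but not for handwritten characters, where it can turn a "b" into a "d".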
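A minimal sketch of the image case, using only the Python standard library and a tiny grayscale image stored as nested lists. The function name augment_image, its parameters, the 4x4 test image, and the "cat" label are all illustrative assumptions; real pipelines normally operate on arrays or tensors through an image library.

```python
import random

def augment_image(image, rng, crop_size=3, max_shift=0.2):
    """Return a randomly flipped, cropped, brightness-shifted copy.

    image: list of rows, each a list of floats in [0, 1] (grayscale).
    The input image is not modified.
    """
    height, width = len(image), len(image[0])
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
    # Random crop of crop_size x crop_size pixels.
    top = rng.randint(0, height - crop_size)
    left = rng.randint(0, width - crop_size)
    image = [row[left:left + crop_size] for row in image[top:top + crop_size]]
    # Brightness jitter: one shift for the whole image, clamped to [0, 1].
    shift = rng.uniform(-max_shift, max_shift)
    return [[min(1.0, max(0.0, px + shift)) for px in row] for row in image]

rng = random.Random(0)  # seeded so runs are repeatable
image = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]
label = "cat"  # hypothetical label; the transforms leave it unchanged
augmented = [(augment_image(image, rng), label) for _ in range(5)]
```

Each call draws fresh random parameters, so one labeled example yields several distinct training inputs that all keep the original label.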
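Augmentation acts as a regularizer: exposing the model to many variations of the same underlying concept pushes it to learn features that are stable across those variations rather than memorize incidental details of individual examples. It is especially valuable when labeled data is limited.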
Modern image training pipelines typically chain several randomized transformations per example. Text augmentation is harder because small edits can change meaning (inserting or dropping "not" can flip a sentiment label), but back-translation and LLM-based paraphrasing are both widely used.
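Word substitution, the simplest of the text techniques mentioned earlier, can be sketched as follows. The SYNONYMS table, the substitute_words function, and the example sentence are hypothetical and exist only for illustration; back-translation and LLM paraphrasing need external models and are not shown.

```python
import random

# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "great": ["excellent", "wonderful"],
    "movie": ["film"],
    "boring": ["dull", "tedious"],
}

def substitute_words(text, rng, prob=0.5):
    """Replace each word that has listed synonyms with probability prob."""
    words = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < prob:
            word = rng.choice(options)
        words.append(word)
    return " ".join(words)

rng = random.Random(0)  # seeded so runs are repeatable
text, label = "a great movie", "positive"
variants = [(substitute_words(text, rng), label) for _ in range(3)]
```

A lookup table like this has no notion of context, so it is only safe when every listed synonym keeps the label intact. That limitation is the reason text augmentation usually calls for more care than image augmentation.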