Adversarial Training
A defense technique where adversarial examples are included in the training data to make the model more robust against attacks. The model learns to handle both normal and adversarial inputs.
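This setup is commonly written as a min-max optimization: an inner maximization finds the worst-case perturbation within a budget, and the outer minimization trains the model against it. A sketch using the conventional notation (these symbols are standard in the literature, not taken from this entry):

```latex
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
\left[ \max_{\|\delta\| \le \epsilon} \mathcal{L}\big(f_{\theta}(x + \delta),\, y\big) \right]
```

Here \(f_{\theta}\) is the model, \(\mathcal{L}\) the loss, and \(\epsilon\) bounds how large the adversarial perturbation \(\delta\) may be.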
Why It Matters
Adversarial training is among the most effective known defenses against adversarial attacks, making models significantly more robust and therefore a standard choice for hardening systems in safety-critical applications.
Example
Generating adversarial versions of training images and including them in training, teaching the classifier to correctly identify objects even when adversarial noise is present.
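The loop described above can be sketched in a few lines. This is a minimal illustration, assuming FGSM-style (fast gradient sign) perturbations and a toy logistic-regression classifier; all function names and hyperparameters here are illustrative, not from any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """Craft an adversarial version of input x by stepping in the
    direction that most increases the loss (the FGSM attack)."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w  # gradient of cross-entropy loss w.r.t. x
    return x + eps * np.sign(grad_x)

def adversarial_train(X, y, eps=0.1, lr=0.1, epochs=200, seed=0):
    """Train a logistic-regression model on both the clean examples
    and FGSM-perturbed copies of them."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        # Generate adversarial versions of the current training data.
        X_adv = np.array([fgsm_perturb(x, t, w, b, eps)
                          for x, t in zip(X, y)])
        # Include them alongside the normal inputs.
        X_all = np.vstack([X, X_adv])
        y_all = np.concatenate([y, y])
        # One gradient-descent step on the combined batch.
        p = sigmoid(X_all @ w + b)
        w -= lr * (p - y_all) @ X_all / len(y_all)
        b -= lr * np.mean(p - y_all)
    return w, b
```

The adversarial examples are regenerated each epoch against the current model, so the defense tracks the model as it trains; a model trained this way should keep its accuracy even on inputs perturbed at the same budget `eps`.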
Think of it like...
Like a martial artist who practices against opponents who use unconventional techniques — the unexpected practice makes them better prepared for real fights.
Related Terms
Adversarial Attack
An input deliberately crafted to fool an AI model into making incorrect predictions. Adversarial examples often look normal to humans but cause models to fail spectacularly.
Robustness
The ability of an AI model to maintain reliable performance when faced with unexpected inputs, adversarial attacks, data distribution changes, or edge cases.
Data Augmentation
Techniques for artificially expanding a training dataset by creating modified versions of existing data. This helps models generalize better, especially when training data is limited.
Regularization
Techniques used to prevent overfitting by adding constraints or penalties to the model during training. Regularization discourages the model from becoming too complex or fitting noise in the training data.
AI Safety
The research field focused on ensuring AI systems operate reliably, predictably, and without causing unintended harm. It spans from technical robustness to long-term existential risk concerns.