Ethical Hacking of AI
The practice of systematically testing AI systems for vulnerabilities, biases, and failure modes with the goal of improving safety and robustness before malicious actors find the same weaknesses.
Why It Matters
Ethical hacking of AI is an emerging discipline that combines traditional security testing with AI-specific attacks like prompt injection and adversarial examples.
Example
A security team probing an AI customer service bot for: prompt injection vulnerabilities, data leakage risks, bias in responses, and harmful output generation.
Think of it like...
Like hiring a locksmith to test your locks — they try everything a burglar would, but they report vulnerabilities to you instead of exploiting them.
Related Terms
Red Teaming
The practice of systematically testing AI systems by attempting to find failures, vulnerabilities, and harmful behaviors before deployment. Red teamers actively try to break the system.
Adversarial Attack
An input deliberately crafted to fool an AI model into making incorrect predictions. Adversarial examples often look normal to humans but cause models to fail spectacularly.
Prompt Injection
A security vulnerability where malicious input is crafted to override or manipulate an LLM's system prompt or instructions, causing it to behave in unintended ways.
AI Safety
The research field focused on ensuring AI systems operate reliably, predictably, and without causing unintended harm. It spans from technical robustness to long-term existential risk concerns.