Edge Inference
Running AI models directly on local devices (phones, IoT sensors, cameras) rather than sending data to the cloud. This reduces latency, preserves privacy, and works without internet connectivity.
Why It Matters
Edge inference enables real-time AI in autonomous vehicles, smart factories, and mobile apps where cloud latency or connectivity is unacceptable.
Example
Face ID on an iPhone runs locally — the neural network processes your face on the phone's chip, never sending your biometric data to Apple's servers.
Think of it like...
Like having a doctor on-site at a factory versus calling one remotely — on-site (edge) means instant response, no communication delays, and no sensitive data leaving the building.
Related Terms
Inference
The process of using a trained model to make predictions on new, previously unseen data. Inference is what happens when an AI model is deployed and actively serving results to users.
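The definition above can be sketched in a few lines. This is a minimal, illustrative example — the "trained" weights are made up, not the result of a real training run — showing that inference is just a forward pass on new input, with no learning involved:

```python
import math

# Parameters of an already-trained logistic-regression model.
# These values are hypothetical, fixed at deployment time.
weights = [0.8, -0.4]
bias = 0.1

def predict(features):
    """Run the trained model forward on new, unseen data (inference)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 / (1 + math.exp(-z))  # probability of the positive class

# Serving a prediction: no weights change here, only a forward pass.
p = predict([2.0, 1.0])
```

The key point is that `predict` only reads the parameters; training (updating `weights` and `bias`) happened earlier and elsewhere.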
Quantization
The process of reducing the precision of a model's numerical weights (e.g., from 32-bit to 8-bit or 4-bit), making the model smaller and faster while accepting a small trade-off in accuracy.
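A common form of this is affine (asymmetric) quantization, where a range of 32-bit floats is mapped onto the 8-bit integer range via a scale and zero point. The sketch below uses a tiny made-up weight tensor to show the round trip and the resulting precision loss (assumes NumPy):

```python
import numpy as np

# A hypothetical 32-bit weight tensor.
weights = np.array([-1.2, 0.0, 0.5, 2.3], dtype=np.float32)

# Affine quantization: map [min, max] of the weights onto int8's [-128, 127].
w_min, w_max = float(weights.min()), float(weights.max())
scale = (w_max - w_min) / 255.0              # float units per integer step
zero_point = round(-128 - w_min / scale)     # offset so that w_min -> -128

# Quantize: 4x smaller storage (int8 vs float32), faster integer math.
q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)

# Dequantize to measure the accuracy trade-off.
restored = (q.astype(np.float32) - zero_point) * scale
max_error = float(np.abs(weights - restored).max())
```

Each value is recovered to within about half a quantization step (`scale / 2`), which is the small accuracy trade-off the definition refers to.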