Edge Inference
Running AI models directly on local devices (phones, IoT sensors, cameras) rather than sending data to the cloud. This reduces latency, preserves privacy, and works without internet connectivity.
Why It Matters
Edge inference enables real-time AI in autonomous vehicles, smart factories, and mobile apps where cloud latency or connectivity is unacceptable.
Example
Face ID on an iPhone runs locally — the neural network processes your face on the phone's chip, never sending your biometric data to Apple's servers.
Think of it like...
Like having a doctor on-site at a factory versus calling one remotely — on-site (edge) means instant response, no communication delays, and no sensitive data leaving the building.
Related Terms
Inference
The process of using a trained model to make predictions on new, previously unseen data. Inference is what happens when an AI model is deployed and actively serving results to users.
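The definition above can be sketched in a few lines. This is a minimal, illustrative example — the "trained" weights are made up, not the result of a real training run — showing that inference is just a forward pass on new input, with no learning involved:

```python
import math

# Parameters of an already-trained logistic-regression model.
# These values are hypothetical, fixed at deployment time.
weights = [0.8, -0.4]
bias = 0.1

def predict(features):
    """Run the trained model forward on new, unseen data (inference)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 / (1 + math.exp(-z))  # probability of the positive class

# Serving a prediction: no weights change here, only a forward pass.
p = predict([2.0, 1.0])
```

The key point is that `predict` only reads the parameters; training (updating `weights` and `bias`) happened earlier and elsewhere.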
Quantization
The process of reducing the precision of a model's numerical weights (e.g., from 32-bit to 8-bit or 4-bit), making the model smaller and faster while accepting a small trade-off in accuracy.
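A common form of this is affine (asymmetric) quantization, where a range of 32-bit floats is mapped onto the 8-bit integer range via a scale and zero point. The sketch below uses a tiny made-up weight tensor to show the round trip and the resulting precision loss (assumes NumPy):

```python
import numpy as np

# A hypothetical 32-bit weight tensor.
weights = np.array([-1.2, 0.0, 0.5, 2.3], dtype=np.float32)

# Affine quantization: map [min, max] of the weights onto int8's [-128, 127].
w_min, w_max = float(weights.min()), float(weights.max())
scale = (w_max - w_min) / 255.0              # float units per integer step
zero_point = round(-128 - w_min / scale)     # offset so that w_min -> -128

# Quantize: 4x smaller storage (int8 vs float32), faster integer math.
q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)

# Dequantize to measure the accuracy trade-off.
restored = (q.astype(np.float32) - zero_point) * scale
max_error = float(np.abs(weights - restored).max())
```

Each value is recovered to within about half a quantization step (`scale / 2`), which is the small accuracy trade-off the definition refers to.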