Contextual Bandits
An extension of multi-armed bandits where the agent observes context (features) before making a decision, enabling personalized choices based on the current situation.
Why It Matters
Contextual bandits power personalized recommendations, dynamic pricing, and adaptive user interfaces — any decision that should consider the current context.
Example
A news app choosing which article to show based on user features (time of day, location, reading history) — different users see different content based on their context.
Think of it like...
Like a bartender who recommends different drinks based on whether you are celebrating, stressed, or just casually visiting — the suggestion depends on context.
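The news-app example above can be sketched as a contextual epsilon-greedy policy. This is a minimal illustration, not a production algorithm: the contexts, arms, and click-through probabilities are all invented for the example.

```python
import random

# A minimal contextual epsilon-greedy sketch. The contexts, arms, and
# click-through probabilities below are invented for illustration.
CONTEXTS = ["morning", "evening"]
ARMS = ["news", "sports", "recipes"]

TRUE_CTR = {  # hypothetical probability that a user clicks each article type
    ("morning", "news"): 0.8, ("morning", "sports"): 0.3, ("morning", "recipes"): 0.2,
    ("evening", "news"): 0.2, ("evening", "sports"): 0.4, ("evening", "recipes"): 0.7,
}

class ContextualEpsilonGreedy:
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        # One reward estimate per (context, arm) pair: this is the "personalization".
        self.values = {(c, a): 0.0 for c in CONTEXTS for a in ARMS}
        self.counts = {(c, a): 0 for c in CONTEXTS for a in ARMS}

    def choose(self, context):
        if random.random() < self.epsilon:  # explore: try a random arm
            return random.choice(ARMS)
        return max(ARMS, key=lambda a: self.values[(context, a)])  # exploit

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # Incremental running mean of observed rewards for this pair.
        self.values[key] += (reward - self.values[key]) / self.counts[key]

random.seed(0)
agent = ContextualEpsilonGreedy()
for _ in range(5000):
    context = random.choice(CONTEXTS)
    arm = agent.choose(context)
    reward = 1 if random.random() < TRUE_CTR[(context, arm)] else 0
    agent.update(context, arm, reward)

best = {c: max(ARMS, key=lambda a: agent.values[(c, a)]) for c in CONTEXTS}
print(best)  # typically {'morning': 'news', 'evening': 'recipes'}
```

The key difference from a plain multi-armed bandit is that estimates are keyed by (context, arm) rather than by arm alone, so the same agent learns a different best choice for each context.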
Related Terms
Multi-Armed Bandit
A simplified reinforcement learning problem where an agent must choose between multiple options (arms) with unknown payoffs, balancing exploration of new options with exploitation of known good ones.
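One classic way to balance that tradeoff is UCB1, which adds a shrinking exploration bonus to each arm's estimated value. A minimal sketch, with three arms whose payout probabilities are invented for illustration:

```python
import math
import random

# A minimal UCB1 sketch for a (non-contextual) multi-armed bandit.
# The arms' payout probabilities are invented for illustration.
TRUE_PROBS = [0.2, 0.5, 0.8]

def ucb1(n_rounds=2000, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(TRUE_PROBS)    # pulls per arm
    values = [0.0] * len(TRUE_PROBS)  # running mean reward per arm
    for t in range(1, n_rounds + 1):
        if t <= len(TRUE_PROBS):
            arm = t - 1  # pull each arm once to initialise its estimate
        else:
            # Upper confidence bound: mean reward plus an exploration
            # bonus that shrinks as an arm accumulates pulls.
            arm = max(range(len(TRUE_PROBS)),
                      key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1 if rng.random() < TRUE_PROBS[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

counts, values = ucb1()
print(counts)  # the 0.8 arm should dominate the pull counts
```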
Reinforcement Learning
A type of machine learning where an agent learns to make decisions by taking actions in an environment and receiving rewards or penalties. The agent aims to maximize cumulative reward over time through trial and error.
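That trial-and-error loop can be sketched with tabular Q-learning on a toy environment. Everything here (the five-state corridor, the reward placement, the hyperparameters) is invented for illustration:

```python
import random

# A minimal tabular Q-learning sketch: an agent in a 5-state corridor
# learns to walk right toward a reward at the far end.
N_STATES = 5         # states 0..4; reaching state 4 ends the episode with reward 1
ACTIONS = [+1, -1]   # step right or step left

def train(n_episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(n_episodes):
        s = 0
        while s != N_STATES - 1:
            # Trial and error: mostly act greedily, occasionally explore.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)
            r = 1.0 if s2 == N_STATES - 1 else 0.0
            # Q-update: nudge the estimate toward reward + discounted future value.
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
            s = s2
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # learned greedy action per state; should step right everywhere
```

Unlike a bandit, the agent's actions here change the state it finds itself in, so it must learn the long-run value of each action, not just its immediate payoff.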
Recommendation System
An AI system that predicts and suggests items a user might be interested in based on their behavior, preferences, and similarities to other users.
Exploration vs Exploitation
The fundamental tradeoff in reinforcement learning between trying new actions (exploration) to discover potentially better strategies and using known good actions (exploitation) to maximize immediate reward.
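A small sketch of why the tradeoff matters, on an invented two-armed bandit: a pure exploiter (epsilon = 0) can lock onto the worse arm forever, while occasional exploration discovers the better one.

```python
import random

# Contrasting pure exploitation with epsilon-greedy exploration on a
# two-armed bandit (reward probabilities invented for illustration).
TRUE_PROBS = [0.4, 0.8]  # arm 1 is better, but the agent doesn't know that

def run(epsilon, n_rounds=2000, seed=1):
    rng = random.Random(seed)
    counts = [0, 0]
    values = [0.0, 0.0]
    total = 0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                    # explore: random arm
        else:
            arm = 0 if values[0] >= values[1] else 1  # exploit: best estimate
        reward = 1 if rng.random() < TRUE_PROBS[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total

greedy = run(epsilon=0.0)  # never explores; stays stuck on arm 0
eps = run(epsilon=0.1)     # occasional exploration finds the better arm
print(greedy, eps)  # the epsilon-greedy agent typically earns noticeably more
```

The pure exploiter never tries arm 1, so its estimate for that arm never improves; a small, fixed exploration rate is enough to escape that trap at a modest cost in short-term reward.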