Notable References
- Reinforcement learning with immediate rewards and linear hypotheses
- Explore/exploit schemes for web content optimization
- Online models for content optimization
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- Just-in-time contextual advertising
- Using confidence bounds for exploitation-exploration trade-offs
- Finite-time analysis of the multi-armed bandit problem
- The non-stochastic multi-armed bandit problem
- Bandit Problems: Sequential Allocation of Experiments
- The Adaptive Web: Methods and Strategies of Web Personalization
- Hybrid systems for personalized recommendations
- Personalized recommendation on dynamic content using predictive bilinear models
- A case study of behavior-driven conjoint analysis on Yahoo!: Front Page Today Module
- Google news personalization: scalable online collaborative filtering
- Bandit processes and dynamic allocation indices
- Efficient bandit algorithms for online multi-class prediction
- Asymptotically efficient adaptive allocation rules
- The epoch-greedy algorithm for contextual multi-armed bandits
- Information Theory, Inference, and Learning Algorithms
- Text-learning and related intelligent agents: A survey
- Naïve filterbots for robust cold-start recommendations
- Simulation studies of multi-armed bandits with covariates
- Eligibility traces for off-policy policy evaluation
- Some aspects of the sequential design of experiments
- Recommender systems in e-commerce
- On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
- Exploring compact reinforcement-learning representations with linear regression