Bandit

1 revision
#1 1 week ago
+6
Auto-generated stub article
+A bandit problem, in [decision theory](/wiki/decision-theory), is a sequential decision problem in which an agent repeatedly chooses among a set of actions whose rewards are initially unknown, with the goal of maximizing cumulative reward over time. The best-known form, the [multi-armed bandit](/wiki/multi-armed-bandit), models the trade-off between exploring options whose payoff is still uncertain and exploiting the option that has performed best so far.
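The exploration and exploitation trade-off can be illustrated with an epsilon-greedy strategy, one common heuristic among several. The sketch below is illustrative only and is not taken from the article: the function name `epsilon_greedy`, its parameters, and the Bernoulli (0 or 1) reward model are all assumptions chosen for the example.

```python
import random


def epsilon_greedy(arm_means, steps=1000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli bandit.

    arm_means: hypothetical success probability of each arm (unknown to the agent).
    Returns (pull counts per arm, estimated mean per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: 1 with probability arm_means[arm], else 0.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        counts[arm] += 1
        # Incremental update of the sample mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8], steps=5000)
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

With `epsilon=0` the agent never explores and can lock onto whichever arm pays off first; a larger `epsilon` spends more pulls on exploration at the cost of short-term reward.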
+## See also
+- [Reinforcement Learning](/wiki/reinforcement-learning)
+- [Algorithm](/wiki/algorithm)
+- [Optimization](/wiki/optimization)
... 1 more lines