+The Multi-Armed Bandit (MAB) problem is a classic scenario in [Decision Theory](/wiki/decision_theory) in which an agent repeatedly chooses among several options ("arms"), each yielding a reward drawn from an unknown probability distribution. The challenge is to balance exploration (trying arms whose rewards are still uncertain, in order to discover better ones) against exploitation (pulling the arms that have paid best so far) so as to maximize cumulative reward over time. This foundational problem is often studied within [Reinforcement Learning](/wiki/reinforcement_learning).
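
The trade-off between exploration and exploitation can be illustrated with an epsilon-greedy strategy, one simple approach among many. The sketch below is illustrative only and makes several assumptions not stated in the text above: rewards are Bernoulli (0 or 1), the function name `epsilon_greedy` and its parameters `true_means`, `steps`, `epsilon`, and `seed` are hypothetical, and the exploration rate is held fixed.

```python
import random


def epsilon_greedy(true_means, steps=1000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (an assumed reward model).

    true_means: success probability of each arm, hidden from the agent.
    Returns (estimates, pulls, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms          # how many times each arm was chosen
    estimates = [0.0] * n_arms    # running mean reward observed per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a 0/1 reward from the chosen arm's hidden distribution.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the sample mean for the chosen arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return estimates, pulls, total_reward
```

Calling `epsilon_greedy([0.2, 0.5, 0.7])` returns the agent's estimated mean reward per arm, the number of pulls per arm, and the cumulative reward. A larger `epsilon` spends more pulls exploring, while a smaller one commits sooner to whichever arm currently looks best.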
+## See also
+- [Algorithm](/wiki/algorithm)
+- [Machine Learning](/wiki/machine_learning)
+- [Optimization](/wiki/optimization)
... 1 more lines