In decision theory, a bandit (more fully, a multi-armed bandit) is a sequential decision problem in which an agent repeatedly chooses among a set of actions, observes a reward only for the action it chose, and aims to maximize cumulative reward over time. Because the reward distributions are unknown at the outset, the problem models the trade-off between exploring less-tried options to learn their value and exploiting the option that has performed best so far.
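
One simple way to make the trade-off between exploring and exploiting concrete is an epsilon-greedy strategy, sketched below. This is an illustrative sketch, not a definitive implementation: the function name `epsilon_greedy`, the Bernoulli (0 or 1) reward model, and the parameters `true_means`, `steps`, `epsilon`, and `seed` are all assumptions introduced for the example. With probability `epsilon` the agent tries a random arm; otherwise it picks the arm with the highest estimated mean reward so far.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on Bernoulli arms.

    true_means: hypothetical success probability of each arm (unknown to the agent).
    Returns (estimates, pulls, total_reward).
    """
    rng = random.Random(seed)          # local RNG so runs are reproducible
    n_arms = len(true_means)
    pulls = [0] * n_arms               # how often each arm was chosen
    estimates = [0.0] * n_arms         # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the sample mean for the chosen arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return estimates, pulls, total_reward


if __name__ == "__main__":
    est, n, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("estimated means:", [round(e, 3) for e in est])
    print("pulls per arm:", n)
    print("cumulative reward:", total)
```

Setting `epsilon` to 0 makes the agent purely greedy, so it may lock onto a mediocre arm it happened to try first; setting it to 1 makes the agent explore forever and never use what it has learned. Values in between trade some immediate reward for better estimates.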