Bomber
AI learns via Thompson Sampling
Leaderboard
You 0 – 0 AI
Why Thompson Sampling works

Thompson Sampling continuously learns and relearns. When conditions shift, it re-explores automatically.

In this game, you move, the bot repositions, and craters reshape terrain. Each is a contextual parameter the AI adapts to.

No rules. No retraining. Just Bayesian updating from sparse signals.
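The loop described above can be sketched in a few lines. This is a minimal Beta-Bernoulli Thompson Sampling example, not the game's actual implementation: the arms, hit rates, and round count are assumptions chosen for illustration. Each arm keeps a Beta posterior over its success rate, the agent samples from every posterior and fires the arm with the highest draw, then updates only from a success/failure signal.

```python
import random

def thompson_step(posteriors):
    """Sample a win-rate estimate from each arm's Beta posterior and
    return the index of the arm with the highest sample."""
    samples = [random.betavariate(a, b) for a, b in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)

def update(posteriors, arm, hit):
    """Bayesian update from a sparse success/failure signal."""
    a, b = posteriors[arm]
    posteriors[arm] = (a + 1, b) if hit else (a, b + 1)

# Three candidate arms, all starting from a uniform Beta(1, 1) prior.
posteriors = [(1.0, 1.0) for _ in range(3)]
true_hit_rates = [0.2, 0.7, 0.4]  # hidden from the agent

random.seed(0)
for _ in range(500):
    arm = thompson_step(posteriors)
    hit = random.random() < true_hit_rates[arm]
    update(posteriors, arm, hit)

# The best arm (index 1) accumulates the most pulls over time.
pulls = [a + b - 2 for a, b in posteriors]
best = max(range(3), key=pulls.__getitem__)
```

Note that exploration falls out of the sampling itself: early on the posteriors are wide, so weaker arms still win some draws; as evidence accumulates, the draws concentrate on the best arm.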

In production systems, Thompson Sampling operates across many more dimensions simultaneously, and many of the levers and context signals cannot be feature-engineered upfront. The challenge is tuning how quickly the system learns and relearns as end-user context changes.

Even in this simple game, a human relying on rules or intuition struggles to match a basic Thompson Sampling bot. Now imagine real-world systems with hundreds of levers: channel, timing, content, tone, frequency. The combinatorial decision space becomes enormous. It is not just the scale that rules cannot handle; it is also the shifting context that makes existing rules irrelevant, and the cadence of observing, deciding, and updating those rules by hand is far too slow.

And the goal is rarely singular. In this game the objective is to hit the opponent. In the real world, systems must balance multiple business priorities and goals simultaneously while navigating the same combinatorial decision space.

Aampe

This is what Aampe solves at scale. Thompson Sampling is one of many tools, alongside multi-armed bandits, contextual weighting, normalization, and decay functions, that help per-user AI agents adapt to changing context. The core principle stays the same: explore, exploit, adapt.
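One of the ingredients named above is a decay function. Aampe's exact mechanics are not shown here, so this is a hedged sketch of the general principle: discounting old pseudo-counts toward the prior keeps the posterior "forgetful", so when conditions shift the agent's uncertainty grows and it re-explores automatically. The `gamma` discount and the scenario below are illustrative assumptions.

```python
def decayed_update(a, b, hit, gamma=0.99, prior=1.0):
    """Discount old evidence toward the Beta(prior, prior) baseline,
    then add the new observation. As gamma -> 1 this approaches
    standard Beta updating; smaller gamma forgets faster."""
    a = prior + gamma * (a - prior)
    b = prior + gamma * (b - prior)
    return (a + 1, b) if hit else (a, b + 1)

a, b = 1.0, 1.0
for _ in range(200):          # old regime: this arm almost always hits
    a, b = decayed_update(a, b, hit=True)
for _ in range(200):          # new regime: the same arm now always misses
    a, b = decayed_update(a, b, hit=False)

mean = a / (a + b)            # posterior mean swings well below 0.5
```

Without decay, 200 early successes would take roughly 200 failures to cancel out; with decay, the effective evidence window is bounded (around 1/(1 - gamma) observations), so the belief flips much sooner.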

repositioning available in next round
Attempt log
AI hasn't fired yet. Move and fire first.
AI posterior beliefs
context: wind calm
hover a bar for details
power: 5% → 100%
As you play, watch this chart converge toward your position. That means the AI has learned your contextual win point. Try moving after a few rounds and you will see how quickly it relearns based on your new location, terrain changes, and its own repositioning.
Play fair
OFF
The AI only sees success/failure signals. It never sees terrain, positions, or trajectories. Toggle this to play with the same blind constraint. You'll only see hit/miss distance after each shot. This mirrors real-world decision-making: sparse feedback, no full picture.
Reward & penalty
Close hit (< 60 px): α + 1
Direct hit: α + 3
Miss: β + 1.0
Off-screen: β + 2.5
Higher = faster learning but more risk of over-correction. The AI may abandon a good arm too quickly after a single unlucky miss.
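The asymmetric scheme in the table above maps each outcome to a different-sized Beta update. A small sketch, using the table's values (the function name and outcome labels are illustrative, not the game's actual API):

```python
def apply_outcome(a, b, outcome):
    """Map a shot outcome to an asymmetric Beta-posterior update,
    per the reward & penalty table."""
    deltas = {
        "direct_hit": (3.0, 0.0),   # strong positive evidence
        "close_hit":  (1.0, 0.0),   # < 60 px counts as a partial success
        "miss":       (0.0, 1.0),
        "off_screen": (0.0, 2.5),   # heavily penalized
    }
    da, db = deltas[outcome]
    return a + da, b + db

a, b = 1.0, 1.0
a, b = apply_outcome(a, b, "direct_hit")   # a jumps by 3
a, b = apply_outcome(a, b, "off_screen")   # b jumps by 2.5
```

Larger deltas mean each observation moves the posterior further, which is exactly the over-correction risk noted above: one off-screen shot adds 2.5 failures' worth of evidence against an arm that may still be good.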