Strategy Writeup: Robust Meta-Aware Dragapult ex Agent
TL;DR
Our final Strategy entry is a deterministic, rule-based Dragapult ex agent. The policy is built around a simple idea: develop Dragapult ex reliably, plan turns around the prize race rather than raw damage, and use matchup-aware disruption only when it improves the current game state.
The best observed Simulation result for Team TomTom was rank #73 with score 1076.4. A later prepared leaderboard snapshot showed rank #86 / 3243 with score 1068.8. This was the strongest prepared Simulation-backed Strategy candidate.
The main lesson from the project was not simply “Dragapult is good.” The important lesson was that local testing and actual leaderboard performance could differ sharply. We therefore treated the public leaderboard, official episode metadata, local gauntlets, meta-proxy tests, and failed submissions as separate pieces of evidence. The final agent is the one that survived that full process best: a conservative, explainable Dragapult policy that adapts to multiple threat families without over-specializing to one local matchup.
Competition context
This Strategy writeup is for the prize competition pokemon-tcg-ai-battle-challenge-strategy (240,000 USD, deadline 2026-09-13 23:59). The Simulation competition pokemon-tcg-ai-battle lists its reward as Knowledge; we use Simulation results as performance evidence, but the prize target is the Strategy submission.
Final agent
Best observed rank/score: rank #73, score 1076.4
Prepared snapshot: rank #86 / 3243, score 1068.8
Primary archetype: Dragapult ex / Budew / Fezandipiti ex / Latias ex / Meowth ex, supported by Dreepy, Drakloak, Rare Candy, Crushing Hammer, Boss's Orders, Crispin, Lillie's Determination, and Team Rocket's Watchtower.
The policy is deterministic. For each legal-option decision, it:
- Parses the board, hand, discard, available attackers, and visible resources.
- Estimates card counts where possible.
- Builds a prize-race plan around Dragapult ex and Phantom Dive.
- Scores legal actions using setup, evolution, attachment, supporter, retreat, Boss target, bench-damage, and disruption modules.
- Avoids dangerous or low-value actions, including damage into immunity/counter targets and unnecessary draw/search at low deck count.
Naming guide for experiment labels
Some labels in our local logs were short internal experiment names. For the external writeup, the important point is not the label itself but what strategic idea it represented.
| Label used in this writeup | External-facing meaning |
|---|---|
| Final Dragapult policy | The selected deterministic Dragapult ex strategy: stable setup first, prize-race planning, conditional disruption, and matchup-aware target selection. |
| Lucario/Riolu policy | A separate Lucario-based strategy engine used as secondary evidence and as a meta-coverage comparison point. |
| Resource-inference Dragapult variants | Dragapult experiments that emphasized visible-resource tracking, deck-count awareness, and remaining-card inference. |
| Mirror / target-energy / no-Hammer-Boss variants | Focused ablations used to test whether narrower targeting or disruption changes improved live performance. |
| Alternate archetype tests | Hop/Dunsparce, Alakazam/Dunsparce, Crustle wall, Iono/Bellibolt, Abomasnow/Kyogre, and similar families used as opponent anchors or meta probes. |
| Failure labels | Human-readable tags for repeated loss patterns, such as late setup, Lucario pressure, wall pressure, or control/stall pressure. |
| Portfolio evidence | Offline meta-coverage analysis comparing Dragapult and Lucario approaches. It is not claimed as a single Kaggle package. |
The raw local file names are kept in the private evidence pack for reproducibility, but the Strategy narrative uses the strategic descriptions above.
Why this agent was the final choice
We tried many directions before settling on this one: Dragapult resource variants, Lucario, Iono/Bellibolt, Crustle wall, Abomasnow/Kyogre, same-deck mirror tuning, target-energy variants, no-Hammer/Boss variants, Hop/Dunsparce, Alakazam/Dunsparce, and early search/router ideas.
Several of those ideas were reasonable. Some even looked promising locally. But the actual leaderboard punished brittle changes. Focused patches could improve one local matchup while lowering broader Simulation performance. Meta-counter decks could be useful as teachers or anchors without being the right final package.
The final Dragapult direction won because it was deliberately boring in the right places:
- It tries to reach Dragapult ex reliably before greedier support lines.
- It values multi-prize Phantom Dive turns over generic damage.
- It maps Boss targets and bench damage by matchup instead of using one global target rule.
- It treats Lucario, Alakazam/Dunsparce, Crustle, Iono/Bellibolt, Abomasnow/Kyogre, mirrors, and low-HP swarm boards as different threat classes.
- It uses Crushing Hammer, Boss, Unfair Stamp, and Watchtower only when they support the current prize plan or prevent a loaded attacker line.
- It rejects broad overfit patches, even when they improve a single smoke test.
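One way to picture matchup-specific targeting is a small bonus table layered on a generic base score. The sketch below is illustrative only: `THREAT_BONUS`, the matchup keys, and every weight are assumptions chosen for the example, not values from the agent's pokemon_score().

```python
from typing import Dict

# Hypothetical per-matchup bonuses for known threat pieces.
THREAT_BONUS: Dict[str, Dict[str, float]] = {
    "lucario": {"Riolu": 2.0, "Mega Lucario": 4.0},
    "crustle": {"Dwebble": 3.0, "Crustle": 1.0},
}


def target_score(matchup: str, name: str, prizes: int, energy_attached: int) -> float:
    # Generic part: prize value dominates, attached energy adds urgency.
    base = 3.0 * prizes + 1.0 * energy_attached
    # Matchup part: zero unless this card is a listed threat in this matchup.
    return base + THREAT_BONUS.get(matchup, {}).get(name, 0.0)
```

With these example weights, a Riolu scores 5.0 in the Lucario matchup but only 3.0 against an unlisted archetype, so the same card is ranked differently depending on the threat class.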
Top-scoring agent breakdown
| Layer | What it does | Why it matters |
|---|---|---|
| Performance anchor | Final Dragapult ex policy, best observed rank #73 / score 1076.4; prepared snapshot score 1068.8 | Establishes the final Dragapult ex policy as our primary Simulation-backed Strategy evidence. |
| Board/resource parser | Tracks active/bench state, hand/discard counts, visible deck resources, prize uncertainty, turn logs, and prior KO/item-lock signals | Lets the agent avoid blind plays and recognize setup/comeback states. |
| Prize planner | main_option_proc() evaluates active targets plus Phantom Dive bench-counter combinations | Makes the policy prize-race first rather than damage-first. |
| Target map | pokemon_score() weights prize value, energy, tools, stage, HP, and known threat pieces | Encodes matchup knowledge for Lucario, Abomasnow/Kyogre, Iono/Bellibolt, Crustle/Dwebble, and swarm boards. |
| Setup engine | hand_score() strongly values Dreepy, Drakloak, Dragapult ex, Rare Candy, Poffin, and search lines when evolution is available | Directly addresses the repeated failure mode of late or missing Dragapult. |
| Support guardrails | Conditional use of Budew, Meowth ex, Latias ex, Fezandipiti ex, and draw/search cards | Prevents over-passive or support-heavy turns when the opponent is already racing. |
| Disruption timing | Boss, Crushing Hammer, Unfair Stamp, and Watchtower are scored by prize plan/comeback/threat state | Keeps disruption conditional instead of blindly spending tempo. |
| Safety guards | Avoids damage into immunity/counter targets and restricts draw/search at low deck count | Reduces wasted Phantom Dive counters and deck-out risk. |
The full local submission pack contains the deck-role table and implementation breakdown used to prepare this writeup.
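The prize-planner row can be illustrated with a small enumeration. This sketch assumes Phantom Dive deals 200 damage to the active Pokémon and places 6 damage counters (10 damage each) freely across the bench; `Target`, `best_phantom_dive`, and the sample numbers are hypothetical, and the real main_option_proc() is not reproduced here.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple

ACTIVE_DAMAGE = 200   # assumed Phantom Dive damage to the active Pokémon
BENCH_COUNTERS = 6    # assumed number of bench damage counters


class Target(NamedTuple):
    name: str
    hp_left: int
    prizes: int


def counters_to_ko(target: Target) -> int:
    # Each counter is 10 damage; round up.
    return -(-target.hp_left // 10)


def best_phantom_dive(active: Target, bench: List[Target]) -> Tuple[int, Tuple[str, ...]]:
    """Return (prizes taken, bench names knocked out) for the best counter split."""
    base = active.prizes if active.hp_left <= ACTIVE_DAMAGE else 0
    best: Tuple[int, Tuple[str, ...]] = (base, ())
    for size in range(1, len(bench) + 1):
        for combo in combinations(bench, size):
            if sum(counters_to_ko(t) for t in combo) > BENCH_COUNTERS:
                continue  # not enough counters to knock out this whole set
            total = base + sum(t.prizes for t in combo)
            if total > best[0]:
                best = (total, tuple(t.name for t in combo))
    return best
```

The point of the enumeration is that the best line is chosen by prizes taken rather than by raw damage: two damaged 30-HP bench targets can be worth more than one 60-HP target even though the counters spent are identical.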
Development process
This was an iterative strategy-engine project, not a neural-network training run. The loop was:
- Build a candidate deck/agent.
- Validate package safety and deck legality.
- Run local gauntlets against archived high-score agents, public agents, and current local candidates.
- Submit only selected packages to Kaggle Simulation.
- Treat live underperformance as negative evidence.
- Convert repeated failures into narrow heuristic changes.
- Re-test; reject changes that improved one matchup while damaging broader robustness.
The most important part of the process was restraint. We did not treat every local improvement as a submission-worthy improvement. We kept asking: does this change survive broader anchors, or did it only exploit yesterday's local test?
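The rejection rule can be written down as a simple gate. This is a sketch under stated assumptions: the per-matchup winrate dictionaries, the `max_regression` tolerance, and the function name `accept_patch` are hypothetical and only illustrate the kind of check described above.

```python
from statistics import mean
from typing import Dict


def accept_patch(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    max_regression: float = 0.05,  # hypothetical tolerance per matchup
) -> bool:
    """Accept only if something improves and no anchor matchup drops too far."""
    improved = any(candidate[m] > baseline[m] for m in baseline)
    worst_drop = max(baseline[m] - candidate[m] for m in baseline)
    mean_gain = mean(candidate[m] for m in baseline) - mean(baseline.values())
    return improved and worst_drop <= max_regression and mean_gain >= 0
```

Under this gate, a patch that lifts one matchup from 50% to 70% but drops the mirror from 60% to 40% is rejected, while a small gain with no regression is accepted.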
Local-vs-leaderboard meta gap
One of the biggest findings was that local performance and actual leaderboard performance could differ sharply.
The local runner was still essential. It caught deck mistakes and obvious tactical regressions, and it gave us enough games to compare ideas cheaply. But it was not a full substitute for the Kaggle leaderboard environment. The live opponent pool was richer, path-dependent, and changing over time.
We saw this gap repeatedly:
- same-deck mirror looked plausible as a targeted idea but scored 710.9 live;
- target-energy scored 723.7;
- no-Hammer-Boss scored 842.6;
- earlier broad meta-teacher / replayfix / metahammer-style ideas produced public-score volatility and stayed below their stronger parent lines;
- the final Dragapult policy itself moved over time: best observed rank #73 / score 1076.4, later prepared snapshot #86 / 3243 / 1068.8, as teams and opponents changed.
This changed how we evaluated agents. We stopped asking only “what wins locally?” and started asking “what is robust to a changing external meta?”
To approximate the external environment, we used a layered process:
- Official/live evidence first when available. Kaggle public score, rank snapshots, and official episode metadata were treated as separate evidence from local games.
- High-score archive as meta proxy. We built local anchor, expanded-pit, and meta-proxy leaderboards using archived high-score agents, public-style agents, and our current candidates.
- Failure labels instead of raw winrate chasing. Repeated losses were grouped into readable threat classes: late Dragapult setup, Lucario pressure, Crustle wall pressure, Alakazam/Dunsparce pressure, and similar patterns. Those labels became target-map and setup-discipline changes.
- Ablation discipline. When a focused patch improved one smoke test but hurt mirror or broader anchors, we rejected it.
- Meta-shift planning. Lucario was kept as a secondary/complementary archetype because it gave coverage information and portfolio evidence, but we did not claim the portfolio as a single-package solution under the deck.csv constraint.
The final design is therefore not simply “the best local gauntlet agent.” It is the agent that survived the local-to-live mismatch best: a conservative Dragapult ex policy that recognized meta threat families without over-specializing to yesterday's leaderboard.
The local evidence pack keeps a more detailed table of these environment findings.
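Grouping losses into failure labels can be sketched with a small rule table and a counter. The label strings, the archetype keys, and the `LATE_SETUP_TURN` cutoff are hypothetical; the actual labeling was done from game logs and is not reproduced here.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical mapping from opponent archetype to a readable failure label.
ARCHETYPE_LABELS = {
    "lucario": "lucario_pressure",
    "crustle": "wall_pressure",
    "alakazam_dunsparce": "control_stall_pressure",
}
LATE_SETUP_TURN = 4  # hypothetical cutoff for "Dragapult came online too late"


def label_loss(opponent_archetype: str, dragapult_turn: Optional[int]) -> str:
    # Setup failures take priority: they explain the loss regardless of opponent.
    if dragapult_turn is None or dragapult_turn > LATE_SETUP_TURN:
        return "late_setup"
    return ARCHETYPE_LABELS.get(opponent_archetype, "unlabeled")


def summarize(losses: Iterable[Tuple[str, Optional[int]]]) -> Counter:
    """Count labels over (opponent_archetype, dragapult_turn) loss records."""
    return Counter(label_loss(arch, turn) for arch, turn in losses)
```

Calling `summarize(...).most_common()` on a batch of losses then gives a ranked list of threat classes, which is easier to act on than a single aggregate winrate.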
Simulation submission trajectory
The live Simulation leaderboard was used as external evidence, not as the final prize submission itself. The most important trajectory was:
| Phase | Strategic idea | Public score / result | What it taught us |
|---|---|---|---|
| Early alternate archetypes | Tested non-Dragapult directions and public-meta counters | Mostly low or unstable public scores | Useful as negative evidence and matchup probes, but not reliable final packages. |
| Resource-focused Dragapult | Improved card/resource awareness and deck-count discipline | Strong prior candidate around 959.0 | Resource tracking mattered, but extra routing/search complexity increased package and robustness risk. |
| Focused Dragapult ablations | Mirror tuning, target-energy logic, and disruption-card ablations | 710.9, 723.7, and 842.6 | Narrow local ideas did not generalize well enough live. |
| Secondary Lucario/Riolu policy | Independent Lucario-based strategy engine | 967.6 | Strong complement and meta probe, but not the top final strategy. |
| Final Dragapult policy | Conservative Dragapult ex setup, prize planning, and matchup-aware targeting | 1068.8 prepared score; best observed rank #73 / 1076.4 | Best balance of live evidence, local robustness, and implementation safety. |
This summary intentionally omits raw local package names from the public narrative. They remain available in the private submission-ready pack if exact reproducibility records are needed.
Local evaluation highlights
| Evaluation | Strategic subject | Record | Winrate | Interpretation |
|---|---|---|---|---|
| Anchor leaderboard | Final Dragapult policy | 15-5-0 | 75.0% | Best local anchor result; selected as the final robust direction. |
| Expanded-pit aggregate | Lucario/Riolu policy | 15-5-0 | 75.0% | Strong independent archetype and useful complement. |
| Portfolio meta-coverage test | Lucario + Dragapult comparison | 245-187-0 | 56.7% | Useful evidence for meta coverage, but not a single-package claim. |
| Control-matchup smoke test | Lucario deck-count guard | 10-2 | 83.3% | Helped one threat but hurt broader reliability, so it was rejected. |
| Anti-Lucario focused patch | Dragapult anti-Lucario overfit test | 6-10 | 37.5% | A targeted patch underperformed the baseline and was rejected. |
| Baseline comparison | Final Dragapult policy vs Lucario pressure | 9-7 | 56.2% | Baseline was stronger than the broad anti-Lucario patch. |
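
The records above come from small samples, so it helps to put an uncertainty band around each winrate. The sketch below uses the standard Wilson score interval at roughly 95% confidence; this is an illustrative addition, not part of the original evaluation pipeline, and `wilson_interval` is a hypothetical helper name.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, losses: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win probability (draws excluded)."""
    n = wins + losses
    if n == 0:
        raise ValueError("need at least one decided game")
    p = wins / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    for label, wins, losses in [
        ("anchor 15-5", 15, 5),
        ("baseline 9-7", 9, 7),
        ("portfolio 245-187", 245, 187),
    ]:
        low, high = wilson_interval(wins, losses)
        print(f"{label}: {wins / (wins + losses):.3f} in [{low:.3f}, {high:.3f}]")
```

Under this approximation the 16-game 9-7 record has an interval that still includes 50%, while the 432-game portfolio record does not, which is consistent with treating short smoke tests as weak evidence.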
Alternatives considered and debated
We did not simply keep the highest local winrate candidate. Several non-final directions were seriously considered because they exposed different parts of the metagame.
| Candidate family | Why it was attractive | What we learned | Final decision |
|---|---|---|---|
| Lucario/Riolu policy | Strong independent archetype and useful counter-pressure into some Dragapult weaknesses; public score 967.6 and local aggregate 15-5 / 75.0% | Helped reveal which losses were caused by prize-race pressure versus setup weakness | Kept as secondary evidence and meta complement, not the final Strategy agent. |
| Dragapult resource/search/routing variants | Tried to improve resource inference, search routing, disruption-card counts, and support-card tempo | Some variants scored respectably, including a resource-inference candidate at 959.0, but the added complexity did not improve the final robustness tradeoff | Folded useful resource discipline back into the simpler guarded Dragapult policy. |
| Mirror / target-energy / disruption ablations | Tested whether focused mirror or target-map changes could outperform the broader policy | Narrow patches underperformed live even when locally plausible | Rejected as overfit or incomplete ablations. |
| Alternate archetype tests | Explored public-meta threats and non-Dragapult approaches | Valuable as opponents, anchors, and threat families even when not strong final packages | Used them to shape matchup-specific target maps rather than switching final archetype. |
| Offline Lucario + Dragapult portfolio analysis | Estimated whether two different archetypes covered different meta pockets | Portfolio management can reduce meta risk, but the Kaggle package normally runs one deck.csv | Reported as strategic evidence only; not overclaimed as a single-package solution. |
The central debate was robustness versus specialization. A specialized patch can look clever when it beats the matchup it was designed for. But the leaderboard is not one matchup. It is a moving field. That pushed us toward a final policy that recognizes threat families, but remains conservative in its core game plan.
Failure-to-patch matrix
| Failure mode | Observed symptom | Strategy response | Decision |
|---|---|---|---|
| Late Dragapult / thin setup | Reached Drakloak/Dragapult too late or not at all | Prioritize evolution line and board stability over slow support/stall when opponent is racing | Final Dragapult setup discipline. |
| Lucario pressure | Mega Lucario pressure raced Dragapult and punished weak prize mapping | Add matchup target logic for Riolu/Mega Lucario and loaded attackers; avoid broad overfit boosts | Broad anti-Lucario patch rejected after underperforming the baseline. |
| Crustle wall pressure | Crustle/no-damage wall disrupted Phantom Dive lines | Score Crustle/Dwebble and wall pieces as strategic targets; preserve alternate pressure | Included as target-map evidence, not a risky final patch. |
| Alakazam/Dunsparce pressure | Control/stall threatened both Dragapult and Lucario | Test deck-count and target discipline; validate locally before submission | Helped one smoke test but hurt broader reliability, so rejected. |
| Overfit counter patches | Candidate improved one local matchup but worsened other important matchups | Prefer conservative robustness and negative evidence over single-matchup optimization | Final writeup frames failures as part of method. |
| Single-package portfolio constraint | Offline Lucario + Dragapult portfolio analysis was stable, but Kaggle package uses one deck.csv | Use portfolio as team/submission-management evidence, not as final package claim | Explicit caveat in the writeup. |
Secondary Lucario evidence
The Lucario/Riolu policy was not the final top scorer, but it was strategically useful. It covered some Dragapult weaknesses and helped isolate threat classes. Its final public Simulation score in the current downloaded submission table is 967.6.
Offline portfolio analysis comparing the Lucario policy with the final Dragapult policy was stable across 432 games at 245-187-0 / 56.7%. This result is not claimed as a single Kaggle package improvement because the package normally reads one deck.csv; it is included as evidence for how we reasoned about meta coverage and submission risk.
Reproducibility
The local submission-ready pack keeps exact package names, hashes, source snippets, and evidence tables for auditability. They are intentionally not front-loaded in this public-facing narrative because the Strategy argument should be readable without decoding local file names.
The reproducibility basis is:
- final selected policy: deterministic Dragapult ex rule-based agent;
- primary source: the agent implementation and matching deck.csv in the local evidence pack;
- validation: Python compilation, deck checks, and Docker/local-engine gauntlets;
- external evidence: Kaggle Simulation public score/rank snapshots and submission history;
- local evidence: anchor, expanded-pit, meta-proxy, ablation, and failure-analysis tables.
If the final Kaggle form allows attachments or appendices, the exact package hash table can be included there rather than in the main narrative.
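A package hash table of this kind can be produced with the standard library. The sketch assumes SHA-256 and a two-column CSV layout; the hash algorithm, the column names, and the helper names `sha256_file` and `write_hash_table` are assumptions, since the writeup does not specify how the existing table was generated.

```python
import csv
import hashlib
from pathlib import Path
from typing import Iterable


def sha256_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large packages do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_hash_table(paths: Iterable[Path], out_csv: Path) -> None:
    """Write one (file, sha256) row per package file, sorted for stable diffs."""
    with out_csv.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "sha256"])  # assumed column names
        for path in sorted(paths):
            writer.writerow([path.name, sha256_file(path)])
```

Recomputing the table before submission and comparing it with the stored copy is a cheap check that the submitted package matches the one that was evaluated.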
Conclusion
The final strategy is a robust, meta-aware Dragapult ex rule-based agent. Its strength came from a deliberate development process: observe live results, compare them against local tests, identify where the local proxy was wrong, convert repeated losses into narrow threat classes, and reject changes that only solved one visible problem.
The most important lesson is methodological. In this environment, reliable board development, prize-race planning, conservative matchup-specific heuristics, and humility about local-vs-live meta mismatch beat brittle complexity. Failed submissions and rejected patches were not wasted attempts; they were the evidence that shaped the final robust policy.
Final Strategy Submission Checklist
Before clicking submit publicly on Kaggle:
- [ ] Open pokemon-tcg-ai-battle-challenge-strategy while logged in.
- [ ] Confirm whether Kaggle expects a Writeup post, Notebook, dataset/file upload, or any additional forms.
- [ ] Paste final-writeup.md into the Strategy writeup editor.
- [ ] Add tables from tables/ where useful.
- [ ] Mention Simulation evidence but make the submission under the Strategy competition.
- [ ] Include package hash from tables/package_hashes.csv.
- [ ] If attachments are supported, attach the primary package/source snippets or link them according to Kaggle rules.
- [ ] Preview formatting.
- [ ] Submit/finalize; do not leave as draft.
Guardrails:
- [ ] Do not fetch more Kaggle replay/history data unless Ivan explicitly re-approves.
- [ ] Do not make public Kaggle submission/post until Ivan explicitly approves the exact final content.
Local audit pack
The local submission-ready folder keeps exact package names, hashes, source snippets, raw submission history, and detailed evidence paths. Those are useful for auditability, but they are intentionally excluded from this external-facing phone preview.
Competition facts
| competition | slug | deadline | category | reward | teamCount | entered | role_in_strategy_submission |
|---|---|---|---|---|---|---|---|
| Strategy | pokemon-tcg-ai-battle-challenge-strategy | 2026-09-13 23:59:00 | Featured | 240,000 USD | 94 | True | Prize target / writeup destination |
| Simulation | pokemon-tcg-ai-battle | 2026-08-16 23:59:00 | Featured | Knowledge | 3243 | True | Performance evidence source |
Leaderboard snapshot
| teamName | teamId | rank | score | rowCount | lastSubmissionDate | snapshotCheckedAt | bestObservedRank | bestObservedScore | bestObservedAt |
|---|---|---|---|---|---|---|---|---|---|
| TomTom | 16379626 | 86 | 1068.8 | 3243 | 2026-06-23 00:49:57 | 2026-06-25T11:50:22.509127+00:00 | 73 | 1076.4 | 2026-06-22T11:12:19.960865+00:00 |
Final agent deck breakdown
| group | card | card_id | count | role |
|---|---|---|---|---|
| Core evolution | Dreepy | 119 | 4 | Primary basic; setup priority via Poffin/bench scoring |
| Core evolution | Drakloak | 120 | 4 | Stage-1 bridge; high evolution/search priority |
| Core attacker | Dragapult ex | 121 | 4 | Primary finisher; Phantom Dive prize-race engine |
| Support Pokémon | Fezandipiti ex | 140 | 1 | Comeback draw/value after KO; avoided when opponent on last prize |
| Support Pokémon | Latias ex | 184 | 1 | Mobility support when active is a setup/support body |
| Support Pokémon | Budew | 235 | 2 | Early setup/stall option; guarded to avoid over-passive play |
| Support Pokémon | Meowth ex | 1071 | 1 | Supporter access line when not already committed |
| Evolution/search | Rare Candy | 1079 | 3 | Direct Dreepy -> Dragapult conversion when legal |
| Disruption | Unfair Stamp | 1080 | 1 | High-priority comeback disruption after KO, especially when opponent has <=3 prizes |
| Search/setup | Buddy-Buddy Poffin | 1086 | 4 | Early Dreepy/Budew setup |
| Recovery | Night Stretcher | 1097 | 2 | Recover Pokémon/energy when hand_score justifies it |
| Disruption | Crushing Hammer | 1120 | 3 | Tempo disruption; targets loaded threats, especially Lucario lines |
| Search/filter | Ultra Ball | 1121 | 4 | Discard outlet/search; gated by hand quality |
| Search/filter | Poké Pad | 1152 | 3 | Find Dreepy/Drakloak/support lines |
| Tool | Lucky Helmet | 1156 | 1 | Low-priority draw/value tool |
| Gust/supporter | Boss’s Orders | 1182 | 4 | Aggressive prize-race gust when plan_a identifies a bench target |
| Energy/supporter | Crispin | 1198 | 4 | Energy acceleration when Dragapult is not online |
| Setup/supporter | Brock’s Scouting | 1210 | 2 | Early setup support, especially missing Budew/Latias on turn 2 |
| Draw/supporter | Lillie’s Determination | 1227 | 4 | Main draw supporter fallback |
| Stadium | Team Rocket’s Watchtower | 1256 | 2 | Stadium replacement / early disruption |
| Energy | Basic Fire Energy | 2 | 4 | One half of Dragapult attack cost |
| Energy | Basic Psychic Energy | 5 | 4 | One half of Dragapult attack cost |