Kaggle Strategy Submission Preview

Robust Meta-Aware Dragapult ex Agent

Phone-friendly preview of the writeup and supporting materials before Ivan approves the final Kaggle Strategy submission.

Best observed: rank #73, score 1076.4
Prepared snapshot: rank #86 / 3243, score 1068.8
Final package: Dragapult ex, guarded deterministic policy
Not submitted yet. This page is a review preview for Ivan. It is not the Kaggle submission, and Kaggle submission remains gated on Ivan approving the exact final content.

Visual evidence

[Figure: Submission score timeline]
[Figure: Local validation summary]
Final writeup

Strategy Writeup: Robust Meta-Aware Dragapult ex Agent

TL;DR

Our final Strategy entry is a deterministic, rule-based Dragapult ex agent. The policy is built around a simple idea: develop Dragapult ex reliably, plan turns around the prize race rather than raw damage, and use matchup-aware disruption only when it improves the current game state.

The best observed Simulation result for Team TomTom was rank #73 with score 1076.4. A later prepared leaderboard snapshot showed rank #86 / 3243 with score 1068.8. The agent behind both results was our strongest prepared Simulation-backed Strategy candidate.

The main lesson from the project was not simply “Dragapult is good.” The important lesson was that local testing and actual leaderboard performance could differ sharply. We therefore treated the public leaderboard, official episode metadata, local gauntlets, meta-proxy tests, and failed submissions as separate pieces of evidence. The final agent is the one that survived that full process best: a conservative, explainable Dragapult policy that adapts to multiple threat families without over-specializing to one local matchup.

Competition context

This Strategy writeup is for the prize competition pokemon-tcg-ai-battle-challenge-strategy (240,000 USD, deadline 2026-09-13 23:59). The companion Simulation competition, pokemon-tcg-ai-battle, lists its reward as Knowledge. We use Simulation results as performance evidence, but the prize target is the Strategy submission.

Final agent

Best observed rank/score: rank #73, score 1076.4

Prepared snapshot: rank #86 / 3243, score 1068.8

Primary archetype: Dragapult ex / Budew / Fezandipiti ex / Latias ex / Meowth ex, supported by Dreepy, Drakloak, Rare Candy, Crushing Hammer, Boss's Orders, Crispin, Lillie's Determination, and Team Rocket's Watchtower.

The policy is deterministic. For each legal-option decision, it:

  1. Parses the board, hand, discard, available attackers, and visible resources.
  2. Estimates card counts where possible.
  3. Builds a prize-race plan around Dragapult ex and Phantom Dive.
  4. Scores legal actions using setup, evolution, attachment, supporter, retreat, Boss target, bench-damage, and disruption modules.
  5. Avoids dangerous or low-value actions, including damage into immunity/counter targets and unnecessary draw/search at low deck count.
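
The decision steps above can be sketched as a small scoring-and-guard loop. This is a minimal illustration, not the agent's actual code: the `Action` fields, the `low_deck_threshold` value, and the function names are hypothetical, and the module scores are collapsed into a single `base_score`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical legal option; the real option format is not shown in this writeup."""
    kind: str                      # e.g. "evolve", "attach", "attack", "draw", "search"
    base_score: float              # assumed sum of the module scores from step 4
    hits_immune_target: bool = False


def is_guarded(action: Action, deck_count: int, low_deck_threshold: int = 8) -> bool:
    """Step 5: block damage into immune targets and draw/search at low deck count."""
    if action.kind == "attack" and action.hits_immune_target:
        return True
    if action.kind in {"draw", "search"} and deck_count <= low_deck_threshold:
        return True
    return False


def choose_action(actions: list[Action], deck_count: int) -> Action:
    """Pick the highest-scoring unguarded action; ties go to the earliest option."""
    allowed = [a for a in actions if not is_guarded(a, deck_count)]
    candidates = allowed or actions  # fall back if every option is guarded
    return max(candidates, key=lambda a: a.base_score)


options = [
    Action("draw", 9.0),
    Action("evolve", 5.0),
    Action("attack", 7.0, hits_immune_target=True),
]
print(choose_action(options, deck_count=3).kind)   # draw and attack are both guarded
print(choose_action(options, deck_count=30).kind)  # only the attack is guarded
```

Because `max` returns the first maximal element, the same inputs always yield the same choice, which is the sense in which a policy like this is deterministic.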

Naming guide for experiment labels

Some labels in our local logs were short internal experiment names. For the external writeup, the important point is not the label itself but what strategic idea it represented.

Label used in this writeup | External-facing meaning
Final Dragapult policy | The selected deterministic Dragapult ex strategy: stable setup first, prize-race planning, conditional disruption, and matchup-aware target selection.
Lucario/Riolu policy | A separate Lucario-based strategy engine used as secondary evidence and as a meta-coverage comparison point.
Resource-inference Dragapult variants | Dragapult experiments that emphasized visible-resource tracking, deck-count awareness, and remaining-card inference.
Mirror / target-energy / no-Hammer-Boss variants | Focused ablations used to test whether narrower targeting or disruption changes improved live performance.
Alternate archetype tests | Hop/Dunsparce, Alakazam/Dunsparce, Crustle wall, Iono/Bellibolt, Abomasnow/Kyogre, and similar families used as opponent anchors or meta probes.
Failure labels | Human-readable tags for repeated loss patterns, such as late setup, Lucario pressure, wall pressure, or control/stall pressure.
Portfolio evidence | Offline meta-coverage analysis comparing Dragapult and Lucario approaches. It is not claimed as a single Kaggle package.

The raw local file names are kept in the private evidence pack for reproducibility, but the Strategy narrative uses the strategic descriptions above.

Why this agent was the final choice

We tried many directions before settling on this one: Dragapult resource variants, Lucario, Iono/Bellibolt, Crustle wall, Abomasnow/Kyogre, same-deck mirror tuning, target-energy variants, no-Hammer/Boss variants, Hop/Dunsparce, Alakazam/Dunsparce, and early search/router ideas.

Several of those ideas were reasonable. Some even looked promising locally. But the actual leaderboard punished brittle changes. Focused patches could improve one local matchup while lowering broader Simulation performance. Meta-counter decks could be useful as teachers or anchors without being the right final package.

The final Dragapult direction won because it was deliberately boring in the right places.

Top-scoring agent breakdown

Layer | What it does | Why it matters
Performance anchor | Final Dragapult ex policy, best observed rank #73 / score 1076.4; prepared snapshot score 1068.8 | Establishes the final Dragapult ex policy as our primary Simulation-backed Strategy evidence.
Board/resource parser | Tracks active/bench state, hand/discard counts, visible deck resources, prize uncertainty, turn logs, and prior KO/item-lock signals | Lets the agent avoid blind plays and recognize setup/comeback states.
Prize planner | main_option_proc() evaluates active targets plus Phantom Dive bench-counter combinations | Makes the policy prize-race first rather than damage-first.
Target map | pokemon_score() weights prize value, energy, tools, stage, HP, and known threat pieces | Encodes matchup knowledge for Lucario, Abomasnow/Kyogre, Iono/Bellibolt, Crustle/Dwebble, and swarm boards.
Setup engine | hand_score() strongly values Dreepy, Drakloak, Dragapult ex, Rare Candy, Poffin, and search lines when evolution is available | Directly addresses the repeated failure mode of late or missing Dragapult.
Support guardrails | Conditional use of Budew, Meowth ex, Latias ex, Fezandipiti ex, and draw/search cards | Prevents over-passive or support-heavy turns when the opponent is already racing.
Disruption timing | Boss, Crushing Hammer, Unfair Stamp, and Watchtower are scored by prize plan/comeback/threat state | Keeps disruption conditional instead of blindly spending tempo.
Safety guards | Avoids damage into immunity/counter targets and restricts draw/search at low deck count | Reduces wasted Phantom Dive counters and deck-out risk.

The full local submission pack contains the deck-role table and implementation breakdown used to prepare this writeup.
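
To make the "prize-race first rather than damage-first" idea concrete, here is a minimal sketch of enumerating bench knockout combinations for a single attack. It is not the implementation of main_option_proc(), whose internals are not shown here. The constants assume Phantom Dive deals 200 damage to the Active Pokémon and spreads 6 damage counters across the Bench; the `Target` type, its fields, and the example boards are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations

ACTIVE_DAMAGE = 200   # assumed Phantom Dive damage to the Active Pokémon
BENCH_COUNTERS = 6    # assumed damage counters (10 HP each) to spread on the Bench


@dataclass(frozen=True)
class Target:
    name: str
    hp_left: int   # remaining HP
    prizes: int    # prize cards taken on knockout


def counters_needed(target: Target) -> int:
    """Damage counters required to knock out a benched target (ceiling of hp / 10)."""
    return -(-target.hp_left // 10)


def best_plan(active: Target, bench: list[Target]) -> tuple[int, list[str]]:
    """Return (prizes taken, benched names knocked out) for the best single attack."""
    active_prizes = active.prizes if active.hp_left <= ACTIVE_DAMAGE else 0
    best: tuple[int, list[str]] = (active_prizes, [])
    for size in range(1, len(bench) + 1):
        for combo in combinations(bench, size):
            if sum(counters_needed(t) for t in combo) <= BENCH_COUNTERS:
                total = active_prizes + sum(t.prizes for t in combo)
                if total > best[0]:  # strict '>' keeps the first plan found on ties
                    best = (total, [t.name for t in combo])
    return best


active = Target("active-ex", hp_left=220, prizes=2)
bench = [Target("a", 30, 1), Target("b", 30, 1), Target("c", 70, 2)]
print(best_plan(active, bench))
```

In this made-up board the active target survives the attack, so a damage-first policy gains nothing, while the enumeration still finds two bench knockouts worth one prize each.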

Development process

This was an iterative strategy-engine project, not a neural-network training run. The loop was:

  1. Build a candidate deck/agent.
  2. Validate package safety and deck legality.
  3. Run local gauntlets against archived high-score agents, public agents, and current local candidates.
  4. Submit only selected packages to Kaggle Simulation.
  5. Treat live underperformance as negative evidence.
  6. Convert repeated failures into narrow heuristic changes.
  7. Re-test; reject changes that improved one matchup while damaging broader robustness.

The most important part of the process was restraint. We did not treat every local improvement as a submission-worthy improvement. We kept asking: does this change survive broader anchors, or did it only exploit yesterday's local test?
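
The rejection rule in step 7 can be expressed as a simple acceptance gate. The sketch below is one possible formulation under stated assumptions, not the team's actual tooling: the function name, the `max_regression` tolerance, the anchor names, and all winrate values are hypothetical and illustrative, not measured results.

```python
def accept_patch(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_regression: float = 0.02,
) -> bool:
    """Accept only if mean winrate improves and no anchor regresses beyond tolerance.

    Both dicts map an anchor (opponent family) name to a winrate in [0, 1].
    """
    anchors = list(baseline)
    mean_gain = sum(candidate[a] - baseline[a] for a in anchors) / len(anchors)
    worst_drop = max(baseline[a] - candidate[a] for a in anchors)
    return mean_gain > 0 and worst_drop <= max_regression


# Illustrative numbers only.
baseline = {"lucario": 0.50, "mirror": 0.60, "wall": 0.55}
overfit = {"lucario": 0.70, "mirror": 0.45, "wall": 0.55}   # one big gain, one big loss
steady = {"lucario": 0.55, "mirror": 0.60, "wall": 0.56}    # small, broad gain

print(accept_patch(baseline, overfit))
print(accept_patch(baseline, steady))
```

The gate deliberately checks the worst per-anchor drop in addition to the mean, so a patch cannot buy one matchup at the expense of another.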

Local-vs-leaderboard meta gap

One of the biggest findings was that local performance and actual leaderboard performance could differ sharply.

The local runner was still essential. It caught deck mistakes, obvious tactical regressions, and gave us enough games to compare ideas cheaply. But it was not a full substitute for the Kaggle leaderboard environment. The live opponent pool was richer, path-dependent, and changing over time.

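We saw this gap repeatedly. For example, the focused Dragapult ablations looked plausible locally but reached public scores of only 710.9, 723.7, and 842.6.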

This changed how we evaluated agents. We stopped asking only “what wins locally?” and started asking “what is robust to a changing external meta?”

To approximate the external environment, we used a layered process:

  1. Official/live evidence first when available. Kaggle public score, rank snapshots, and official episode metadata were treated as separate evidence from local games.
  2. High-score archive as meta proxy. We built local anchor, expanded-pit, and meta-proxy leaderboards using archived high-score agents, public-style agents, and our current candidates.
  3. Failure labels instead of raw winrate chasing. Repeated losses were grouped into readable threat classes: late Dragapult setup, Lucario pressure, Crustle wall pressure, Alakazam/Dunsparce pressure, and similar patterns. Those labels became target-map and setup-discipline changes.
  4. Ablation discipline. When a focused patch improved one smoke test but hurt mirror or broader anchors, we rejected it.
  5. Meta-shift planning. Lucario was kept as a secondary/complementary archetype because it gave coverage information and portfolio evidence, but we did not claim the portfolio as a single-package solution under the deck.csv constraint.
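
As an illustration of step 3, repeated losses can be counted by label so that only recurring patterns drive heuristic changes. This is a minimal sketch: the record layout, the label strings, and the `min_count` cutoff are hypothetical, and the sample data is invented for the example.

```python
from collections import Counter


def rank_failure_labels(losses: list[dict], min_count: int = 2) -> list[tuple[str, int]]:
    """Count loss labels and keep only repeated ones, most frequent first."""
    counts = Counter(loss["label"] for loss in losses)
    return [(label, n) for label, n in counts.most_common() if n >= min_count]


# Invented sample data; labels mirror the threat classes named above.
losses = [
    {"label": "late_dragapult_setup"},
    {"label": "lucario_pressure"},
    {"label": "late_dragapult_setup"},
    {"label": "crustle_wall_pressure"},
    {"label": "lucario_pressure"},
    {"label": "late_dragapult_setup"},
]
print(rank_failure_labels(losses))
```

A one-off loss such as the single wall game here is filtered out, which matches the intent of reacting to repeated patterns rather than to raw winrate noise.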

The final design is therefore not simply “the best local gauntlet agent.” It is the agent that survived the local-to-live mismatch best: a conservative Dragapult ex policy that recognized meta threat families without over-specializing to yesterday's leaderboard.

The local evidence pack keeps a more detailed table of these environment findings.

Simulation submission trajectory

The live Simulation leaderboard was used as external evidence, not as the final prize submission itself. The most important trajectory was:

Phase | Strategic idea | Public score / result | What it taught us
Early alternate archetypes | Tested non-Dragapult directions and public-meta counters | Mostly low or unstable public scores | Useful as negative evidence and matchup probes, but not reliable final packages.
Resource-focused Dragapult | Improved card/resource awareness and deck-count discipline | Strong prior candidate around 959.0 | Resource tracking mattered, but extra routing/search complexity increased package and robustness risk.
Focused Dragapult ablations | Mirror tuning, target-energy logic, and disruption-card ablations | 710.9, 723.7, and 842.6 | Narrow local ideas did not generalize well enough live.
Secondary Lucario/Riolu policy | Independent Lucario-based strategy engine | 967.6 | Strong complement and meta probe, but not the top final strategy.
Final Dragapult policy | Conservative Dragapult ex setup, prize planning, and matchup-aware targeting | 1068.8 prepared score; best observed rank #73 / 1076.4 | Best balance of live evidence, local robustness, and implementation safety.

This summary intentionally omits raw local package names from the public narrative. They remain available in the private submission-ready pack if exact reproducibility records are needed.

Local evaluation highlights

Evaluation | Strategic subject | Record | Winrate | Interpretation
Anchor leaderboard | Final Dragapult policy | 15-5-0 | 75.0% | Best local anchor result; selected as the final robust direction.
Expanded-pit aggregate | Lucario/Riolu policy | 15-5-0 | 75.0% | Strong independent archetype and useful complement.
Portfolio meta-coverage test | Lucario + Dragapult comparison | 245-187-0 | 56.7% | Useful evidence for meta coverage, but not a single-package claim.
Control-matchup smoke test | Lucario deck-count guard | 10-2 | 83.3% | Helped one threat but hurt broader reliability, so it was rejected.
Anti-Lucario focused patch | Dragapult anti-Lucario overfit test | 6-10 | 37.5% | A targeted patch underperformed the baseline and was rejected.
Baseline comparison | Final Dragapult policy vs Lucario pressure | 9-7 | 56.2% | Baseline was stronger than the broad anti-Lucario patch.
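
The winrates in this table follow directly from the records. The short check below assumes winrate is wins divided by all games played, with draws counted in the denominator; that convention and the `winrate` helper are our assumption for illustration, since the evaluation scripts are not reproduced here.

```python
def winrate(wins: int, losses: int, draws: int = 0) -> float:
    """Winrate in percent, counting draws as games played (assumed convention)."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    return 100.0 * wins / games


# Records copied from the table above as (wins, losses, draws).
records = {
    "Anchor leaderboard": (15, 5, 0),
    "Portfolio meta-coverage test": (245, 187, 0),
    "Control-matchup smoke test": (10, 2, 0),
    "Anti-Lucario focused patch": (6, 10, 0),
    "Baseline comparison": (9, 7, 0),
}
for name, record in records.items():
    print(f"{name}: {winrate(*record):.1f}% over {sum(record)} games")
```

Printing the game count alongside the percentage is a reminder that several of these rows rest on 12 to 20 games, which is one reason they were treated as evidence rather than proof.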

Alternatives considered and debated

We did not simply keep the highest local winrate candidate. Several non-final directions were seriously considered because they exposed different parts of the metagame.

Candidate family | Why it was attractive | What we learned | Final decision
Lucario/Riolu policy | Strong independent archetype and useful counter-pressure into some Dragapult weaknesses; public score 967.6 and local aggregate 15-5 / 75.0% | Helped reveal which losses were caused by prize-race pressure versus setup weakness | Kept as secondary evidence and meta complement, not the final Strategy agent.
Dragapult resource/search/routing variants | Tried to improve resource inference, search routing, disruption-card counts, and support-card tempo | Some variants scored respectably, including a resource-inference candidate at 959.0, but the added complexity did not improve the final robustness tradeoff | Folded useful resource discipline back into the simpler guarded Dragapult policy.
Mirror / target-energy / disruption ablations | Tested whether focused mirror or target-map changes could outperform the broader policy | Narrow patches underperformed live even when locally plausible | Rejected as overfit or incomplete ablations.
Alternate archetype tests | Explored public-meta threats and non-Dragapult approaches | Valuable as opponents, anchors, and threat families even when not strong final packages | Used them to shape matchup-specific target maps rather than switching final archetype.
Offline Lucario + Dragapult portfolio analysis | Estimated whether two different archetypes covered different meta pockets | Portfolio management can reduce meta risk, but the Kaggle package normally runs one deck.csv | Reported as strategic evidence only; not overclaimed as a single-package solution.

The central debate was robustness versus specialization. A specialized patch can look clever when it beats the matchup it was designed for. But the leaderboard is not one matchup. It is a moving field. That pushed us toward a final policy that recognizes threat families, but remains conservative in its core game plan.

Failure-to-patch matrix

Failure mode | Observed symptom | Strategy response | Decision
Late Dragapult / thin setup | Reached Drakloak/Dragapult too late or not at all | Prioritize evolution line and board stability over slow support/stall when opponent is racing | Final Dragapult setup discipline.
Lucario pressure | Mega Lucario pressure raced Dragapult and punished weak prize mapping | Add matchup target logic for Riolu/Mega Lucario and loaded attackers; avoid broad overfit boosts | Broad anti-Lucario patch rejected after underperforming the baseline.
Crustle wall pressure | Crustle/no-damage wall disrupted Phantom Dive lines | Score Crustle/Dwebble and wall pieces as strategic targets; preserve alternate pressure | Included as target-map evidence, not a risky final patch.
Alakazam/Dunsparce pressure | Control/stall threatened both Dragapult and Lucario | Test deck-count and target discipline; validate locally before submission | Helped one smoke test but hurt broader reliability, so rejected.
Overfit counter patches | Candidate improved one local matchup but worsened other important matchups | Prefer conservative robustness and negative evidence over single-matchup optimization | Final writeup frames failures as part of method.
Single-package portfolio constraint | Offline Lucario + Dragapult portfolio analysis was stable, but the Kaggle package uses one deck.csv | Use portfolio as team/submission-management evidence, not as final package claim | Explicit caveat in the writeup.

Secondary Lucario evidence

The Lucario/Riolu policy was not the final top scorer, but it was strategically useful. It covered some Dragapult weaknesses and helped isolate threat classes. Its final public Simulation score in the current downloaded submission table is 967.6.

Offline portfolio analysis comparing the Lucario policy with the final Dragapult policy was stable across 432 games at 245-187-0 / 56.7%. This result is not claimed as a single Kaggle package improvement because the package normally reads one deck.csv; it is included as evidence for how we reasoned about meta coverage and submission risk.

Reproducibility

The local submission-ready pack keeps exact package names, hashes, source snippets, and evidence tables for auditability. They are intentionally not front-loaded in this public-facing narrative because the Strategy argument should be readable without decoding local file names.

The reproducibility basis is:

If the final Kaggle form allows attachments or appendices, the exact package hash table can be included there rather than in the main narrative.

Conclusion

The final strategy is a robust, meta-aware Dragapult ex rule-based agent. Its strength came from a deliberate development process: observe live results, compare them against local tests, identify where the local proxy was wrong, convert repeated losses into narrow threat classes, and reject changes that only solved one visible problem.

The most important lesson is methodological. In this environment, reliable board development, prize-race planning, conservative matchup-specific heuristics, and humility about local-vs-live meta mismatch beat brittle complexity. Failed submissions and rejected patches were not wasted attempts; they were the evidence that shaped the final robust policy.

Submission checklist

Final Strategy Submission Checklist

Before clicking submit publicly on Kaggle:

Guardrails:

Local audit pack

The local submission-ready folder keeps exact package names, hashes, source snippets, raw submission history, and detailed evidence paths. Those are useful for auditability, but they are intentionally excluded from this external-facing phone preview.

Public evidence tables

Competition facts

competition | slug | deadline | category | reward | teamCount | entered | role_in_strategy_submission
Strategy | pokemon-tcg-ai-battle-challenge-strategy | 2026-09-13 23:59:00 | Featured | 240,000 USD | 94 | True | Prize target / writeup destination
Simulation | pokemon-tcg-ai-battle | 2026-08-16 23:59:00 | Featured | Knowledge | 3243 | True | Performance evidence source

Leaderboard snapshot

teamName | teamId | rank | score | rowCount | lastSubmissionDate | snapshotCheckedAt | bestObservedRank | bestObservedScore | bestObservedAt
TomTom | 16379626 | 86 | 1068.8 | 3243 | 2026-06-23 00:49:57 | 2026-06-25T11:50:22.509127+00:00 | 73 | 1076.4 | 2026-06-22T11:12:19.960865+00:00

Final agent deck breakdown

group | card | card_id | count | role
Core evolution | Dreepy | 119 | 4 | Primary basic; setup priority via Poffin/bench scoring
Core evolution | Drakloak | 120 | 4 | Stage-1 bridge; high evolution/search priority
Core attacker | Dragapult ex | 121 | 4 | Primary finisher; Phantom Dive prize-race engine
Support Pokémon | Fezandipiti ex | 140 | 1 | Comeback draw/value after KO; avoided when opponent on last prize
Support Pokémon | Latias ex | 184 | 1 | Mobility support when active is a setup/support body
Support Pokémon | Budew | 235 | 2 | Early setup/stall option; guarded to avoid over-passive play
Support Pokémon | Meowth ex | 1071 | 1 | Supporter access line when not already committed
Evolution/search | Rare Candy | 1079 | 3 | Direct Dreepy -> Dragapult conversion when legal
Disruption | Unfair Stamp | 1080 | 1 | High-priority comeback disruption after KO, especially when opponent has <=3 prizes
Search/setup | Buddy-Buddy Poffin | 1086 | 4 | Early Dreepy/Budew setup
Recovery | Night Stretcher | 1097 | 2 | Recover Pokémon/energy when hand_score justifies it
Disruption | Crushing Hammer | 1120 | 3 | Tempo disruption; targets loaded threats, especially Lucario lines
Search/filter | Ultra Ball | 1121 | 4 | Discard outlet/search; gated by hand quality
Search/filter | Poké Pad | 1152 | 3 | Find Dreepy/Drakloak/support lines
Tool | Lucky Helmet | 1156 | 1 | Low-priority draw/value tool
Gust/supporter | Boss’s Orders | 1182 | 4 | Aggressive prize-race gust when plan_a identifies a bench target
Energy/supporter | Crispin | 1198 | 4 | Energy acceleration when Dragapult is not online
Setup/supporter | Brock’s Scouting | 1210 | 2 | Early setup support, especially missing Budew/Latias on turn 2
Draw/supporter | Lillie’s Determination | 1227 | 4 | Main draw supporter fallback
Stadium | Team Rocket’s Watchtower | 1256 | 2 | Stadium replacement / early disruption
Energy | Basic Fire Energy | 2 | 4 | One half of Dragapult attack cost
Energy | Basic Psychic Energy | 5 | 4 | One half of Dragapult attack cost