Not submitted yet. This page is a review preview for Ivan. It is not the Kaggle submission, and Kaggle submission remains gated on Ivan approving the exact final content.

Final writeup

Strategy Writeup: Robust Meta-Aware Dragapult ex Agent

TL;DR

Our final Strategy entry is a deterministic, rule-based Dragapult ex agent. The policy is built around a simple idea: develop Dragapult ex reliably, plan turns around the prize race rather than raw damage, and use matchup-aware disruption only when it improves the current game state.

The best observed Simulation result for Team TomTom was rank #73 with score 1076.4. A later prepared leaderboard snapshot showed rank #86 / 3243 with score 1068.8. This was the strongest prepared Simulation-backed Strategy candidate.

The main lesson from the project was not simply “Dragapult is good.” The important lesson was that local testing and actual leaderboard performance could differ sharply. We therefore treated the public leaderboard, official episode metadata, local gauntlets, meta-proxy tests, and failed submissions as separate pieces of evidence. The final agent is the one that survived that full process best: a conservative, explainable Dragapult policy that adapts to multiple threat families without over-specializing to one local matchup.

Competition context

This Strategy writeup is for the prize competition pokemon-tcg-ai-battle-challenge-strategy (reward 240,000 USD, deadline 2026-09-13 23:59). The Simulation competition pokemon-tcg-ai-battle offers a Knowledge reward only; we use Simulation results as performance evidence, but the prize target is the Strategy submission.

Final agent

Best observed rank/score: rank #73, score 1076.4

Prepared snapshot: rank #86 / 3243, score 1068.8

Primary archetype: Dragapult ex / Budew / Fezandipiti ex / Latias ex / Meowth ex, supported by Dreepy, Drakloak, Rare Candy, Crushing Hammer, Boss's Orders, Crispin, Lillie's Determination, and Team Rocket's Watchtower.

The policy is deterministic. For each legal-option decision, it:

Parses the board, hand, discard, available attackers, and visible resources.
Estimates card counts where possible.
Builds a prize-race plan around Dragapult ex and Phantom Dive.
Scores legal actions using setup, evolution, attachment, supporter, retreat, Boss target, bench-damage, and disruption modules.
Avoids dangerous or low-value actions, including damage into immunity/counter targets and unnecessary draw/search at low deck count.
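
The steps above can be sketched as a small deterministic scoring loop. This is a minimal illustration under stated assumptions, not the agent's actual code: the `Action` and `State` types, the single `base_score` field standing in for the summed module scores, and the `LOW_DECK_THRESHOLD` value are all hypothetical, and the guards are written as hard vetoes for simplicity where the real policy may use softer penalties.

```python
from dataclasses import dataclass

# Hypothetical threshold; the writeup does not state the real value.
LOW_DECK_THRESHOLD = 8


@dataclass(frozen=True)
class Action:
    name: str
    kind: str                    # e.g. "evolve", "attach", "draw", "search", "attack"
    base_score: float            # stand-in for the summed module scores
    target_immune: bool = False  # True if the attack would hit an immunity/counter target


@dataclass(frozen=True)
class State:
    deck_count: int


def guarded_score(action: Action, state: State) -> float:
    """Apply the safety guards on top of the module score."""
    if action.kind == "attack" and action.target_immune:
        return float("-inf")  # do not send damage into an immune target
    if action.kind in {"draw", "search"} and state.deck_count <= LOW_DECK_THRESHOLD:
        return float("-inf")  # avoid deck-out risk at low deck count
    return action.base_score


def choose(actions: list[Action], state: State) -> Action:
    """Pick the best legal action; ties keep the earlier option, so the result is deterministic."""
    best = actions[0]
    best_score = guarded_score(best, state)
    for action in actions[1:]:
        score = guarded_score(action, state)
        if score > best_score:
            best, best_score = action, score
    return best
```

Because there is no randomness and ties break on list order, the same legal-option list and state always produce the same choice, which is what makes the policy explainable after the fact.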

Naming guide for experiment labels

Some labels in our local logs were short internal experiment names. For the external writeup, the important point is not the label itself but what strategic idea it represented.

Label used in this writeup	External-facing meaning
Final Dragapult policy	The selected deterministic Dragapult ex strategy: stable setup first, prize-race planning, conditional disruption, and matchup-aware target selection.
Lucario/Riolu policy	A separate Lucario-based strategy engine used as secondary evidence and as a meta-coverage comparison point.
Resource-inference Dragapult variants	Dragapult experiments that emphasized visible-resource tracking, deck-count awareness, and remaining-card inference.
Mirror / target-energy / no-Hammer-Boss variants	Focused ablations used to test whether narrower targeting or disruption changes improved live performance.
Alternate archetype tests	Hop/Dunsparce, Alakazam/Dunsparce, Crustle wall, Iono/Bellibolt, Abomasnow/Kyogre, and similar families used as opponent anchors or meta probes.
Failure labels	Human-readable tags for repeated loss patterns, such as late setup, Lucario pressure, wall pressure, or control/stall pressure.
Portfolio evidence	Offline meta-coverage analysis comparing Dragapult and Lucario approaches. It is not claimed as a single Kaggle package.

The raw local file names are kept in the private evidence pack for reproducibility, but the Strategy narrative uses the strategic descriptions above.

Why this agent was the final choice

We tried many directions before settling on this one: Dragapult resource variants, Lucario, Iono/Bellibolt, Crustle wall, Abomasnow/Kyogre, same-deck mirror tuning, target-energy variants, no-Hammer/Boss variants, Hop/Dunsparce, Alakazam/Dunsparce, and early search/router ideas.

Several of those ideas were reasonable. Some even looked promising locally. But the actual leaderboard punished brittle changes. Focused patches could improve one local matchup while lowering broader Simulation performance. Meta-counter decks could be useful as teachers or anchors without being the right final package.

The final Dragapult direction won because it was deliberately boring in the right places:

It tries to reach Dragapult ex reliably before greedier support lines.
It values multi-prize Phantom Dive turns over generic damage.
It maps Boss targets and bench damage by matchup instead of using one global target rule.
It treats Lucario, Alakazam/Dunsparce, Crustle, Iono/Bellibolt, Abomasnow/Kyogre, mirrors, and low-HP swarm boards as different threat classes.
It uses Crushing Hammer, Boss, Unfair Stamp, and Watchtower only when they support the current prize plan or prevent a loaded attacker line.
It rejects broad overfit patches, even when they improve a single smoke test.
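
The conditional-disruption point can be illustrated with a simple gate. This is a hedged sketch only: the `Board` fields and the exact conditions are hypothetical simplifications (for example, the Unfair Stamp rule here requires both a recent KO and the opponent being at three or fewer prizes, which is stricter than the "especially when" wording in the deck table), and Watchtower is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    opponent_prizes_left: int
    we_were_knocked_out_last_turn: bool
    opponent_has_loaded_attacker: bool
    boss_target_completes_prize_plan: bool


def should_play(card: str, board: Board) -> bool:
    """Return True only when the disruption card serves the current plan."""
    if card == "Unfair Stamp":
        # Comeback disruption: assumed here to need a recent KO and a late-game opponent.
        return board.we_were_knocked_out_last_turn and board.opponent_prizes_left <= 3
    if card == "Crushing Hammer":
        # Spend tempo on energy removal only against a loaded attacker line.
        return board.opponent_has_loaded_attacker
    if card == "Boss's Orders":
        # Gust only when the bench target advances the prize plan.
        return board.boss_target_completes_prize_plan
    return False  # unknown cards are never treated as free disruption
```

The design choice is that every disruption card defaults to "hold" and must earn its play from the board state, rather than being played whenever it is legal.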

Top-scoring agent breakdown

Layer	What it does	Why it matters
Performance anchor	Final Dragapult ex policy, best observed rank #73 / score 1076.4; prepared snapshot score 1068.8	Establishes the final Dragapult ex policy as our primary Simulation-backed Strategy evidence.
Board/resource parser	Tracks active/bench state, hand/discard counts, visible deck resources, prize uncertainty, turn logs, and prior KO/item-lock signals	Lets the agent avoid blind plays and recognize setup/comeback states.
Prize planner	`main_option_proc()` evaluates active targets plus Phantom Dive bench-counter combinations	Makes the policy prize-race first rather than damage-first.
Target map	`pokemon_score()` weights prize value, energy, tools, stage, HP, and known threat pieces	Encodes matchup knowledge for Lucario, Abomasnow/Kyogre, Iono/Bellibolt, Crustle/Dwebble, and swarm boards.
Setup engine	`hand_score()` strongly values Dreepy, Drakloak, Dragapult ex, Rare Candy, Poffin, and search lines when evolution is available	Directly addresses the repeated failure mode of late or missing Dragapult.
Support guardrails	Conditional use of Budew, Meowth ex, Latias ex, Fezandipiti ex, and draw/search cards	Prevents over-passive or support-heavy turns when the opponent is already racing.
Disruption timing	Boss, Crushing Hammer, Unfair Stamp, and Watchtower are scored by prize plan/comeback/threat state	Keeps disruption conditional instead of blindly spending tempo.
Safety guards	Avoids damage into immunity/counter targets and restricts draw/search at low deck count	Reduces wasted Phantom Dive counters and deck-out risk.
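
To make the prize-planner row concrete, here is a minimal sketch of enumerating bench-counter combinations. It is not a reproduction of `main_option_proc()`. It assumes Phantom Dive deals 200 damage to the Active Pokémon and places 6 damage counters (10 damage each) freely across the opponent's Bench, ignores Weakness, Resistance, and damage modifiers, and uses hypothetical names (`Target`, `best_plan`) throughout.

```python
from itertools import combinations
from typing import NamedTuple

# Assumed card text; see the lead-in above.
ACTIVE_DAMAGE = 200
BENCH_COUNTERS = 6


class Target(NamedTuple):
    name: str
    hp_left: int  # remaining HP
    prizes: int   # prizes taken if knocked out


def counters_to_ko(target: Target) -> int:
    """Damage counters needed to knock the target out (ceiling of hp_left / 10)."""
    return -(-target.hp_left // 10)


def best_plan(active: Target, bench: list[Target]) -> tuple[int, list[str]]:
    """Return (prizes this turn, bench targets to knock out) for the best attack."""
    active_prizes = active.prizes if active.hp_left <= ACTIVE_DAMAGE else 0
    best_prizes, best_kos = active_prizes, []
    for size in range(1, len(bench) + 1):
        for combo in combinations(bench, size):
            if sum(counters_to_ko(t) for t in combo) > BENCH_COUNTERS:
                continue  # not enough counters to finish every target in this combo
            prizes = active_prizes + sum(t.prizes for t in combo)
            if prizes > best_prizes:  # strict ">" keeps the first plan found on ties
                best_prizes, best_kos = prizes, [t.name for t in combo]
    return best_prizes, best_kos
```

Scoring by prizes rather than by total damage is what lets a plan that finishes two damaged one-prize Pokémon beat a plan that puts all six counters on a healthy target.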

The full local submission pack contains the deck-role table and implementation breakdown used to prepare this writeup.

Development process

This was an iterative strategy-engine project, not a neural-network training run. The loop was:

Build a candidate deck/agent.
Validate package safety and deck legality.
Run local gauntlets against archived high-score agents, public agents, and current local candidates.
Submit only selected packages to Kaggle Simulation.
Treat live underperformance as negative evidence.
Convert repeated failures into narrow heuristic changes.
Re-test; reject changes that improved one matchup while damaging broader robustness.

The most important part of the process was restraint. We did not treat every local improvement as a submission-worthy improvement. We kept asking: does this change survive broader anchors, or did it only exploit yesterday's local test?
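
That question can be written down as an explicit acceptance gate. The following is a sketch under assumptions, not our actual tooling: the function name, the per-anchor winrate dictionaries, and the 0.05 regression tolerance are all hypothetical.

```python
def accept_patch(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_regression: float = 0.05,
) -> bool:
    """Accept a patch only if no anchor regresses too far and the mean winrate does not drop.

    Both arguments map anchor name -> winrate in [0, 1] over the same anchors.
    """
    if baseline.keys() != candidate.keys():
        raise ValueError("baseline and candidate must cover the same anchors")
    worst_drop = max(baseline[anchor] - candidate[anchor] for anchor in baseline)
    mean_base = sum(baseline.values()) / len(baseline)
    mean_cand = sum(candidate.values()) / len(candidate)
    return worst_drop <= max_regression and mean_cand >= mean_base
```

Under this rule a patch that gains heavily against one anchor but loses more than the tolerance against another is rejected, which is the pattern described for the focused patches in this writeup.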

Local-vs-leaderboard meta gap

One of the biggest findings was that local performance and actual leaderboard performance could differ sharply.

The local runner was still essential. It caught deck mistakes and obvious tactical regressions, and it gave us enough games to compare ideas cheaply. But it was not a full substitute for the Kaggle leaderboard environment. The live opponent pool was richer, path-dependent, and changing over time.

We saw this gap repeatedly:

the same-deck mirror variant looked plausible as a targeted idea but scored 710.9 live;
the target-energy variant scored 723.7;
the no-Hammer-Boss variant scored 842.6;
earlier broad meta-teacher / replayfix / metahammer-style ideas produced public-score volatility and stayed below their stronger parent lines;
the final Dragapult policy itself moved over time as teams and opponents changed: best observed rank #73 with score 1076.4, then rank #86 of 3243 with score 1068.8 in the later prepared snapshot.

This changed how we evaluated agents. We stopped asking only “what wins locally?” and started asking “what is robust to a changing external meta?”

To approximate the external environment, we used a layered process:

Official/live evidence first when available. Kaggle public score, rank snapshots, and official episode metadata were treated as separate evidence from local games.
High-score archive as meta proxy. We built local anchor, expanded-pit, and meta-proxy leaderboards using archived high-score agents, public-style agents, and our current candidates.
Failure labels instead of raw winrate chasing. Repeated losses were grouped into readable threat classes: late Dragapult setup, Lucario pressure, Crustle wall pressure, Alakazam/Dunsparce pressure, and similar patterns. Those labels became target-map and setup-discipline changes.
Ablation discipline. When a focused patch improved one smoke test but hurt mirror or broader anchors, we rejected it.
Meta-shift planning. Lucario was kept as a secondary/complementary archetype because it gave coverage information and portfolio evidence, but we did not claim the portfolio as a single-package solution under the deck.csv constraint.
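
The failure-label step can be sketched as a small tally over lost games. This is illustrative only: the input fields, the `LATE_SETUP_TURN` cutoff, and the archetype keys are hypothetical, and the real labels were assigned from richer game logs.

```python
from collections import Counter
from typing import Optional

# Hypothetical mapping from opponent archetype to a readable threat class.
ARCHETYPE_LABELS = {
    "lucario": "Lucario pressure",
    "crustle": "Crustle wall pressure",
    "alakazam_dunsparce": "Alakazam/Dunsparce pressure",
}
LATE_SETUP_TURN = 4  # hypothetical cutoff for "late" Dragapult ex


def label_loss(first_dragapult_turn: Optional[int], opponent_archetype: str) -> str:
    """Assign one readable label to a lost game; setup failures take priority."""
    if first_dragapult_turn is None or first_dragapult_turn > LATE_SETUP_TURN:
        return "late Dragapult setup"
    return ARCHETYPE_LABELS.get(opponent_archetype, "unlabeled")


def tally(losses: list[tuple[Optional[int], str]]) -> Counter:
    """Count losses per label so repeated patterns stand out."""
    return Counter(label_loss(turn, archetype) for turn, archetype in losses)
```

Checking setup first means a game lost to Lucario with no Dragapult ex on board is counted as a setup failure, not as a matchup problem, which keeps target-map changes from being blamed for thin openings.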

The final design is therefore not simply “the best local gauntlet agent.” It is the agent that survived the local-to-live mismatch best: a conservative Dragapult ex policy that recognized meta threat families without over-specializing to yesterday's leaderboard.

The local evidence pack keeps a more detailed table of these environment findings.

Simulation submission trajectory

The live Simulation leaderboard was used as external evidence, not as the final prize submission itself. The most important trajectory was:

Phase	Strategic idea	Public score / result	What it taught us
Early alternate archetypes	Tested non-Dragapult directions and public-meta counters	Mostly low or unstable public scores	Useful as negative evidence and matchup probes, but not reliable final packages.
Resource-focused Dragapult	Improved card/resource awareness and deck-count discipline	Strong prior candidate around 959.0	Resource tracking mattered, but extra routing/search complexity increased package and robustness risk.
Focused Dragapult ablations	Mirror tuning, target-energy logic, and disruption-card ablations	710.9, 723.7, and 842.6	Narrow local ideas did not generalize well enough live.
Secondary Lucario/Riolu policy	Independent Lucario-based strategy engine	967.6	Strong complement and meta probe, but not the top final strategy.
Final Dragapult policy	Conservative Dragapult ex setup, prize planning, and matchup-aware targeting	1068.8 prepared score; best observed rank #73 / 1076.4	Best balance of live evidence, local robustness, and implementation safety.

This summary intentionally omits raw local package names from the public narrative. They remain available in the private submission-ready pack if exact reproducibility records are needed.

Local evaluation highlights

Evaluation	Strategic subject	Record	Winrate	Interpretation
Anchor leaderboard	Final Dragapult policy	15-5-0	75.0%	Best local anchor result; selected as the final robust direction.
Expanded-pit aggregate	Lucario/Riolu policy	15-5-0	75.0%	Strong independent archetype and useful complement.
Portfolio meta-coverage test	Lucario + Dragapult comparison	245-187-0	56.7%	Useful evidence for meta coverage, but not a single-package claim.
Control-matchup smoke test	Lucario deck-count guard	10-2	83.3%	Helped one threat but hurt broader reliability, so it was rejected.
Anti-Lucario focused patch	Dragapult anti-Lucario overfit test	6-10	37.5%	A targeted patch underperformed the baseline and was rejected.
Baseline comparison	Final Dragapult policy vs Lucario pressure	9-7	56.2%	Baseline was stronger than the broad anti-Lucario patch.
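
The winrates in this table are wins divided by total games. As a hedged aid for reading them, the sketch below also computes a Wilson score interval; that interval is our illustration here and was not part of the original evaluation, and the function names are hypothetical.

```python
import math


def winrate(wins: int, losses: int, draws: int = 0) -> float:
    """Wins divided by all games played."""
    return wins / (wins + losses + draws)


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a winrate (z = 1.96)."""
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half


# Records taken from the table above.
print(f"{100 * winrate(15, 5, 0):.1f}%")     # anchor leaderboard
print(f"{100 * winrate(245, 187, 0):.1f}%")  # portfolio meta-coverage test
print(wilson_interval(10, 12))               # 12-game smoke test
print(wilson_interval(245, 432))             # 432-game portfolio test
```

The interval for the 12-game smoke test is several times wider than the one for the 432-game portfolio test, which is one way to see why a single strong smoke test was not treated as sufficient evidence on its own.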

Alternatives considered and debated

We did not simply keep the highest local winrate candidate. Several non-final directions were seriously considered because they exposed different parts of the metagame.

Candidate family	Why it was attractive	What we learned	Final decision
Lucario/Riolu policy	Strong independent archetype and useful counter-pressure into some Dragapult weaknesses; public score 967.6 and local aggregate 15-5 / 75.0%	Helped reveal which losses were caused by prize-race pressure versus setup weakness	Kept as secondary evidence and meta complement, not the final Strategy agent.
Dragapult resource/search/routing variants	Tried to improve resource inference, search routing, disruption-card counts, and support-card tempo	Some variants scored respectably, including a resource-inference candidate at 959.0, but the added complexity did not improve the final robustness tradeoff	Folded useful resource discipline back into the simpler guarded Dragapult policy.
Mirror / target-energy / disruption ablations	Tested whether focused mirror or target-map changes could outperform the broader policy	Narrow patches underperformed live even when locally plausible	Rejected as overfit or incomplete ablations.
Alternate archetype tests	Explored public-meta threats and non-Dragapult approaches	Valuable as opponents, anchors, and threat families even when not strong final packages	Used them to shape matchup-specific target maps rather than switching final archetype.
Offline Lucario + Dragapult portfolio analysis	Estimated whether two different archetypes covered different meta pockets	Portfolio management can reduce meta risk, but the Kaggle package normally runs one `deck.csv`	Reported as strategic evidence only; not overclaimed as a single-package solution.

The central debate was robustness versus specialization. A specialized patch can look clever when it beats the matchup it was designed for. But the leaderboard is not one matchup. It is a moving field. That pushed us toward a final policy that recognizes threat families, but remains conservative in its core game plan.

Failure-to-patch matrix

Failure mode	Observed symptom	Strategy response	Decision
Late Dragapult / thin setup	Reached Drakloak/Dragapult too late or not at all	Prioritize evolution line and board stability over slow support/stall when opponent is racing	Final Dragapult setup discipline.
Lucario pressure	Mega Lucario pressure raced Dragapult and punished weak prize mapping	Add matchup target logic for Riolu/Mega Lucario and loaded attackers; avoid broad overfit boosts	Broad anti-Lucario patch rejected after underperforming the baseline.
Crustle wall pressure	Crustle/no-damage wall disrupted Phantom Dive lines	Score Crustle/Dwebble and wall pieces as strategic targets; preserve alternate pressure	Included as target-map evidence, not a risky final patch.
Alakazam/Dunsparce pressure	Control/stall threatened both Dragapult and Lucario	Test deck-count and target discipline; validate locally before submission	Helped one smoke test but hurt broader reliability, so rejected.
Overfit counter patches	Candidate improved one local matchup but worsened other important matchups	Prefer conservative robustness and negative evidence over single-matchup optimization	Final writeup frames failures as part of the method.
Single-package portfolio constraint	Offline Lucario + Dragapult portfolio analysis was stable, but Kaggle package uses one `deck.csv`	Use portfolio as team/submission-management evidence, not as final package claim	Explicit caveat in the writeup.

Secondary Lucario evidence

The Lucario/Riolu policy was not the final top scorer, but it was strategically useful. It covered some Dragapult weaknesses and helped isolate threat classes. Its final public Simulation score in the current downloaded submission table is 967.6.

Offline portfolio analysis comparing the Lucario policy with the final Dragapult policy was stable across 432 games at 245-187-0 / 56.7%. This result is not claimed as a single Kaggle package improvement because the package normally reads one deck.csv; it is included as evidence for how we reasoned about meta coverage and submission risk.

Reproducibility

The local submission-ready pack keeps exact package names, hashes, source snippets, and evidence tables for auditability. They are intentionally not front-loaded in this public-facing narrative because the Strategy argument should be readable without decoding local file names.

The reproducibility basis is:

final selected policy: deterministic Dragapult ex rule-based agent;
primary source: the agent implementation and matching deck.csv in the local evidence pack;
validation: Python compilation, deck checks, and Docker/local-engine gauntlets;
external evidence: Kaggle Simulation public score/rank snapshots and submission history;
local evidence: anchor, expanded-pit, meta-proxy, ablation, and failure-analysis tables.
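
Two of the validation steps, deck checks and package hashing, can be sketched as follows. This is a minimal example under assumptions and not the checks we actually ran: the deck size of 60 and the four-copy limit with basic Energy exempt are the standard Pokémon TCG construction rules and should be confirmed against the competition's own rules, and the function names are hypothetical.

```python
import hashlib
from collections import Counter

DECK_SIZE = 60   # assumed standard deck size; confirm against the competition rules
MAX_COPIES = 4   # assumed standard per-name limit; basic Energy is exempt


def deck_problems(cards: list[str], basic_energy: set[str]) -> list[str]:
    """Return a list of human-readable legality problems (empty if none found)."""
    problems = []
    if len(cards) != DECK_SIZE:
        problems.append(f"deck has {len(cards)} cards, expected {DECK_SIZE}")
    for name, count in sorted(Counter(cards).items()):
        if name not in basic_energy and count > MAX_COPIES:
            problems.append(f"{name}: {count} copies exceeds {MAX_COPIES}")
    return problems


def package_sha256(data: bytes) -> str:
    """Hex SHA-256 digest of the package bytes, for the hash table."""
    return hashlib.sha256(data).hexdigest()
```

Recording the digest next to each submitted package makes it possible to confirm later that a score in the submission history belongs to a specific build.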

If the final Kaggle form allows attachments or appendices, the exact package hash table can be included there rather than in the main narrative.

Conclusion

The final strategy is a robust, meta-aware Dragapult ex rule-based agent. Its strength came from a deliberate development process: observe live results, compare them against local tests, identify where the local proxy was wrong, convert repeated losses into narrow threat classes, and reject changes that only solved one visible problem.

The most important lesson is methodological. In this environment, reliable board development, prize-race planning, conservative matchup-specific heuristics, and humility about local-vs-live meta mismatch beat brittle complexity. Failed submissions and rejected patches were not wasted attempts; they were the evidence that shaped the final robust policy.

Submission checklist

Final Strategy Submission Checklist

Before clicking submit publicly on Kaggle:

[ ] Open pokemon-tcg-ai-battle-challenge-strategy while logged in.
[ ] Confirm whether Kaggle expects a Writeup post, Notebook, dataset/file upload, or any additional forms.
[ ] Paste final-writeup.md into the Strategy writeup editor.
[ ] Add tables from tables/ where useful.
[ ] Mention Simulation evidence but make the submission under the Strategy competition.
[ ] Include package hash from tables/package_hashes.csv.
[ ] If attachments are supported, attach the primary package/source snippets or link them according to Kaggle rules.
[ ] Preview formatting.
[ ] Submit/finalize — do not leave as draft.

Guardrails:

[ ] Do not fetch more Kaggle replay/history data unless Ivan explicitly re-approves.
[ ] Do not make public Kaggle submission/post until Ivan explicitly approves the exact final content.

Local audit pack

The local submission-ready folder keeps exact package names, hashes, source snippets, raw submission history, and detailed evidence paths. Those are useful for auditability, but they are intentionally excluded from this external-facing phone preview.

Public evidence tables

Competition facts

competition	slug	deadline	category	reward	teamCount	entered	role_in_strategy_submission
Strategy	pokemon-tcg-ai-battle-challenge-strategy	2026-09-13 23:59:00	Featured	240,000 USD	94	True	Prize target / writeup destination
Simulation	pokemon-tcg-ai-battle	2026-08-16 23:59:00	Featured	Knowledge	3243	True	Performance evidence source

Leaderboard snapshot

teamName	teamId	rank	score	rowCount	lastSubmissionDate	snapshotCheckedAt	bestObservedRank	bestObservedScore	bestObservedAt
TomTom	16379626	86	1068.8	3243	2026-06-23 00:49:57	2026-06-25T11:50:22.509127+00:00	73	1076.4	2026-06-22T11:12:19.960865+00:00

Final agent deck breakdown

group	card	card_id	count	role
Core evolution	Dreepy	119	4	Primary basic; setup priority via Poffin/bench scoring
Core evolution	Drakloak	120	4	Stage-1 bridge; high evolution/search priority
Core attacker	Dragapult ex	121	4	Primary finisher; Phantom Dive prize-race engine
Support Pokémon	Fezandipiti ex	140	1	Comeback draw/value after KO; avoided when opponent on last prize
Support Pokémon	Latias ex	184	1	Mobility support when active is a setup/support body
Support Pokémon	Budew	235	2	Early setup/stall option; guarded to avoid over-passive play
Support Pokémon	Meowth ex	1071	1	Supporter access line when not already committed
Evolution/search	Rare Candy	1079	3	Direct Dreepy -> Dragapult conversion when legal
Disruption	Unfair Stamp	1080	1	High-priority comeback disruption after KO, especially when opponent has <=3 prizes
Search/setup	Buddy-Buddy Poffin	1086	4	Early Dreepy/Budew setup
Recovery	Night Stretcher	1097	2	Recover Pokémon/energy when hand_score justifies it
Disruption	Crushing Hammer	1120	3	Tempo disruption; targets loaded threats, especially Lucario lines
Search/filter	Ultra Ball	1121	4	Discard outlet/search; gated by hand quality
Search/filter	Poké Pad	1152	3	Find Dreepy/Drakloak/support lines
Tool	Lucky Helmet	1156	1	Low-priority draw/value tool
Gust/supporter	Boss’s Orders	1182	4	Aggressive prize-race gust when plan_a identifies a bench target
Energy/supporter	Crispin	1198	4	Energy acceleration when Dragapult is not online
Setup/supporter	Brock’s Scouting	1210	2	Early setup support, especially missing Budew/Latias on turn 2
Draw/supporter	Lillie’s Determination	1227	4	Main draw supporter fallback
Stadium	Team Rocket’s Watchtower	1256	2	Stadium replacement / early disruption
Energy	Basic Fire Energy	2	4	One half of Dragapult attack cost
Energy	Basic Psychic Energy	5	4	One half of Dragapult attack cost