This glossary contains specific definitions for terms that often come up in discussion of competitive Pokémon artificial intelligence. This list is not comprehensive — terms which are less frequently used / more basic / generally well-defined elsewhere are not included. If a term is missing from this page it may be covered by one of the following glossaries:
A method for solving incomplete information games that are too large to be solved in their original form. The original game is abstracted to create a smaller game that’s strategically similar, such that the new abstract game can be solved and the solution can be mapped back to the original to serve as an approximation to the true solution to the original game. In Pokémon, techniques such as bucketing can be used to reduce the state space, or the original game may be mapped to a variant which is more tractable.
A feasible operation on the state — i.e., a move a player can make at a stage in the game. move is ambiguous in Pokémon, as Pokémon have moves and thus some of the game theoretical moves of a player in a Pokémon battle includes the action of literally using one of their Pokémon’s moves. Pokémon Showdown and pkmn instead use the terms choice and choices instead of action and actions, and PokeSim refers to this as a decision - all of these should be interchangeable, though using move to refer to an action / choice / decision should be avoided. Both player’s choices considered together are called the joint action, and applying the joint action to the battle state causes a transition.
An intelligent agent (IA) is an entity that autonomously acts upon an environment to achieve its goals. In computer chess the individual agents are usually referred to as engines, but in competitive Pokémon artificial intelligence circles agent or bot is preferred as engine is used to refer to the code responsible for the simulation of core Pokémon battle mechanics.
Alpha-beta pruning is a search algorithm that seeks to decrease the number of nodes that are evaluated by the minimax algorithm in its search tree. When applied to a standard minimax tree, it returns the same move as minimax would, but prunes away branches that can’t possibly influence the final decision. Alpha-beta has been shown to also work in simultaneous move games in (Bošanský 2013) and (Saffidine 2021), where it typically gets abbreviated as SMAB.
An influential artificial intelligence agent which achieved superhuman Go performance using a combination of Monte Carlo tree search and neural networks (Silver 2016). Along with AlphaZero (Silver 2017) and MuZero (Silver 2018), the AlphaGo architecture serves as the modern foundation for many perfect information game-playing agents (e.g., Lc0).
An algorithm that can return a valid solution to a problem even if it’s interrupted before it ends.
The multi-armed bandit problem is a classic reinforcement learning problem that codifies the exploration–exploitation tradeoff dilemma. In the problem, a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice’s properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. In terms of competitive Pokémon AIs, bandit problems arise when determining which parts of the search tree to explore, and bandit usually refers to the particular type of algorithm being used to solve the bandit problem (e.g., UCB or Exp3).
- Bayes' rule
The mathematical rule that describes how to update a belief, given some evidence. Fundamental to Bayesian inference used by Pokémon AI in combination with usage statistics to reason about what state they may be in.
A player’s probabilistic assessment of the current state of the game, particularly when they have imperfect information, often represented as probability distributions over possible hidden states. Players update their beliefs based on available observations and the actions of others, and these beliefs crucially influence their decision-making process, as they attempt to choose actions that maximize their expected payoff given their understanding of the game’s true state.
A sequence of two adjacent symbols in particular order. The concept can be generalized to an arbitrary number of symbols (n-gram), though sometimes the term bigram is (incorrectly) used for the general case as well. While bigrams can crop up in many different places within competitive Pokémon artificial intelligence, when referenced in an unqualified manner they usually refer to joint probabilities processed from usage statistics such as . These can also sometimes be called correlations.
- bimatrix game
A simultaneous game for two players in which each player has a finite set of actions. Such a game can be fully described by two matrices, each describing the payoffs of one of the players. In an adversarial game, the two matrices will sum to a constant matrix.
Either refers to a language binding, the application programming interface that provides glue code specifically made to allow a programming language to use a foreign library or operating system service, or a specific class of moves in Pokémon known as binding moves which prevent their target from switching and cause damage for multiple turns (also known as wrapping or partial-trapping moves).
Abbreviation for Best-of-, where refers to the number of games used to decide the winner of a set of battles between two players (usually 1 or 3, e.g., BO3).
Slang term for agent derived from the word robot. Alternatively, may be used to characterize the parts of the client code responsible for connecting and communicating with the server and distinguishing it from the core search / knowledge components of the agent.
Also known as data binning, a generalization of rounding data where the original data that falls within a small interval (known as a bin or bucket) gets replaced by a value representative of that interval. This is a form of abstraction, the canonical examples of bucketing in competitive Pokémon AI research are reducing the range of possible HP values or shrinking the number of damage rolls.
The game of Pokémon as it exists on the original Nintendo console hardware (due to the earliest generations of Pokémon being played on Game Boy ROM cartridges).
Counterfactual Regret Minimization is an iterative algorithm designed to find approximate Nash equilibria in extensive-form games with imperfect information. It works by calculating the counterfactual regret at each decision point, which measures how much a player would regret not having taken a different action, given the information available at that point. CFR then updates its strategy over multiple iterations, minimizing these regrets over time. This process leads to strategies that converge towards a Nash equilibrium, where no player can benefit by unilaterally changing their strategy. Variants include:
Elements within a game that are governed by random events outside the direct control of the players. The Chance player is a conceptual player used to model these elements of uncertainty within the game and allows for the analysis of games where the outcomes depend not only on the choices of the players but also on elements of randomness. The pkmn engine exposes APIs for inspecting or influencing chance when built with certain compile-time flags (
The code that communicates with the server to enable a player to play the game. In standard Pokémon formats the client only has a partial representation of one perspective of the actual battle state stored on the server. Clients connect to the server and exchange public information with the server via a protocol.
Action consolidation is a technique for combining the results of actions with identical outcomes. Consolidation is similar to transpositions in that they both involve symmetry, with the difference being that consolidation allows for skipping similar work by leveraging past results whereas transposition tables allow for skipping work that has already been done. Damage coalition is a common form of consolidation in competitive Pokémon AI.
The contempt factor reflects the estimated superiority / inferiority of the program over its opponent. Programs incorporating contempt may choose to follow different strategies against opponents of varying perceived skill level.
- damage calculator
A program that determines the amount of HP a Pokémon will lose as a result of an opponent’s damaging attack. The damage calculator or calc usually takes the form of a stripped down engine that focuses entirely on the damage formula.
- damage coalition
A common form of action consolidation where damage rolls which produce the same results get combined. Damage rolls often result in the same outcome due to rounding or if the damage gets capped due to causing a target to faint or breaking a Substitute.
The depth of a node is the number of edges from the node to the tree’s root node. A root node will have a depth of 0.
A simplification technique where either a stochastic game is converted into a deterministic one or where various perfect information states get sampled from the set of possible imperfect information states so that methods for solving perfect information games can be applied to games with imperfect information. The latter definition is more common, but unfortunately this technique has several weakness: the computational budget must be shared between the sampled perfect information games and various aspects of the search are often duplicated; strategy fusion, which occurs due to playing different strategies in different worlds when it has to find a unique strategy for all the worlds; and non-locality, which arises is from choosing locally optimal moves that are globally inferior.
In the context of data analysis and machine learning, dimensionality refers to the number of features (variables or attributes) that describe a data point whereas the cardinality refers to the number of values each feature can take on. High dimensionality and/or cardinality can lead to problems such as combinatorial explosion and the curse of dimensionality which renders certain approaches intractable, necessitating techniques such as dimensionality reduction, a form of simplification.
Code which operates / controls the engine, perhaps to provide additional features or to allow for integration within a broader application.
Endless Battle Clause, a rule that exists in Smogon University’s formats to guarantee that all battles end in 1000 turns or fewer, while also seeking to curtail certain specific degenerate strategies.
Extreme equilibrium, a Nash equilibrium where both players’ strategies can’t be described as convex combinations (weighted averages where the weights are non-negative and add up to 1) of other mixed strategies that form equilibria.
An extensive-form game is a specification of a game allowing for the explicit representation of a number of key aspects, like the sequencing of players’ possible moves, their choices at every decision point, the (possibly imperfect) information each player has about the other player’s moves when they make a decision, and their payoffs for all possible game outcomes. Extensive-form games also allow for the representation of incomplete information in the form of chance events modeled as moves by nature. Alternatives to the extensive-form game representation include the normal-form representation that simply boils the game down to a payoff matrix, or the FOSG representation.
The Elo rating system, a method for calculating the relative skill levels of players in zero-sum games. Notably, not “ELO”, as the Elo system is named after its creator, Arpad Elo.
A relatively low-dimensional space into which high-dimensional vectors can be translated, ideally capturing some of the semantics of the input by placing semantically similar inputs close together in the embedding space. Usually learned and then used as input to a neural network, examples of embeddings in Pokémon include poke2vec and pokemb.
Code that’s designed to mimic the outwardly observable behavior of both the hardware and software features of a real device, to allow for playing a Pokémon cartridge on arbitrary platforms.
Also known as the lategame, the final stage of the game, after the midgame. In competitive Pokémon, this stage of the game is far more tractable to solve given enough information has usually been revealed such that the game effectively turns into one of perfect information, and certain common scenarios may have been exhaustively solved and encoded in a precomputed endgame tablebase (EGTB).
Exponential-weight algorithm for Exploration and Exploitation, a bandit algorithm commonly used in tree bandit search. Exp3 provides weaker theoretical guarantees with respect to convergence compared to UCB though is effective in practice and is well-suited to non-stationary environments where the best choice might change over time.
A measure of the advantage an opponent can gain by deviating from their strategy to specifically counter the player’s. Exploitability is important for assessing the quality of strategies — strategies that are highly exploitable allow an opponent to significantly improve their outcome by exploiting the player’s choices. ε-exploitability is a relaxed measure of exploitability that considers the best outcome an opponent can get while being at most ε away from their own Nash equilibrium payoff. ε-exploitability is much less computationally expensive to determine and can be more robust and realistic than traditional exploitability measures.
The set of rules and regulations that define the game being played. In Pokémon, the format determines the generation of play as well as which configurations and combinations of Pokémon are allowed. Format is often used interchangeably with metagame though the latter actually refers to what’s commonly seen in a format at a particular point in time (formats dictate what’s allowed and metagames describe what popular strategies arise under those constraints).
A factored-observation stochastic game is a specification of a game that builds on the concept of a partially observable stochastic game (POSG) from multi-agent reinforcement learning by dividing each agent’s observations based on whether they represent public or private information (Kovařík 2022). Whereas an information set representation is concerned with grouping indistinguishable game histories regardless of whether the hidden information is directly related to the environment or another player’s knowledge, FOSGs explicitly factor observations into public and private components allowing for analysis specifically focused on how shared knowledge and private information shape an agent’s perception of the game state.
Fictitious play is a learning process where each player assumes their opponents are playing stationary (unchanging) strategies. During each round, a player calculates the best response to the historical observed actions of their opponents, believing those actions represent fixed strategies. This iterative process can lead players towards converging on a Nash equilibrium in certain types of games (like zero-sum games), but it can fall short in settings where opponents actively adapt their strategies in response to the fictitious player’s actions. FP is also a common abbreviation for the Pokémon move Focus Punch
Any set of circumstances that has a result dependent on the actions of two or more decision-makers (players). In the context of competitive Pokémon battling, the game or battle can either be viewed as the combination of the team-building and piloting components (and sometimes even featuring an initial drafting component) or simply the piloting aspect. In most cases, without additional qualifiers the game can usually be understood to exclusively refer to the piloting component.
- game tree
A graph representing all possible game states within a game. The complete game tree for a game is the game tree starting at the initial state and containing all possible moves from each state and is equivalent to the tree obtained from the extensive-form game representation). Given that’s usually intractable to enumerate the entire game tree in Pokémon, AI agents usually search over a partial game tree which contains a subset of the nodes from the complete game tree.
A grouping of Pokemon game releases that separates them based on the Pokémon and mechanics they include. Generations are typically referred to by acronyms based on their release titles:
- Generation I: RBY, RB, RBG
- Generation II: GSC, GS
- Generation III: ADV, RSE, RS, RSEFRLG
- Generation IV: DPP, DPPt, DP, HGSS, DPPtHGSS
- Generation V: BW, BW2, B2W2
- Generation VI: XY, ORAS
- Generation VII: SM, USUM
- Generation VIII: SS, SwSh
- Generation IX: SV
- genetic algorithm
Genetic algorithms (GA) are a method for solving both constrained and unconstrained optimization problems that’s based on natural selection. Genetic algorithms are typically useful when the objective function is discontinuous, non-differentiable, stochastic, or highly nonlinear, and are often used to determine the values of hyperparameters.
Can be used to describe an unorthodox and usually suboptimal Pokémon set, but more commonly refers to generational gimmicks which are novel mechanics which have been introduced to certain Pokémon generations which alter which actions are possible. Generational gimmicks include Mega Evolution, Z-Moves, Dynamax, and Terastallization.
An approach for arriving at a solution by trial and error or by rules that are only loosely defined via methods which aren’t optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate, short-term goal or approximation.
A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are derived via training.
The process of drawing conclusions from evidence and reasoning. Inferences go beyond the explicitly stated information, using prior knowledge, experience, and logic to reach a likely or plausible conclusion. Deduction is a form of inference where the conclusions are logically guaranteed given a set of premises.
Hidden information refers to any aspect of the game state that’s not directly observable by all players. Games where some information is hidden from at least one player are considered to have imperfect information, in contrast to perfect information games.
A complete information game is one where all players know the rules, the payoffs of each possible outcome and the strategies available to everyone whereas in an incomplete information game at least one player lacks knowledge about some aspect of the game rules, payoffs, or another player’s strategy space. Complete information deals with game structure and payoffs whereas perfect information deals with game state — it is possible that a game’s information may be perfect but incomplete or complete and imperfect.
Public information is knowledge shared by all players whereas private information is knowledge held by only a single player. Private information is necessarily hidden but not all hidden information is private as it may refer to information known by a subset of more than one player or be completely unknown to all participants.
- information set
An information set (IS) is the set of all possible actions in the game for a given player, built on their observations and a set for a particular player that, given what that player has observed, shows the decision vertices available to the player which are indistinguishable to them at the current point in the game. All nodes within an information set represent situations where the player has observed the same sequence of actions, despite potentially different hidden moves being made by their opponents.
- input log
A log which tracks every input that has been made over the course of a battle, where an input encodes the action taken by a player. However, there are several potential ways this input can be encoded:
- raw input simply encodes the action type and slot (and optionally the target) (e.g.,
move 1 2 mega). This form is ultimately required by an engine to unambiguously implement the battle mechanics of Pokémon.
- logical input is an encoding where move or species names are used in lieu of slots, (e.g.,
move thunderbolt). This encoding is more intuitive and often more useful to work with when utilizing machine learning techniques, though isn’t accepted by all engines and can be ambiguous (e.g., in formats without the Species Clause).
- contextual input is a class of inputs which add additional context to a logical input about the scenario the input was made in to provide necessary information for learning algorithms (e.g.,
Zapdos Thunderbolt vs. Starmieor
Zapdos -> Starmie vs. Rhydon).
Applying anchored subsets of the input log to the initial battle state can be used to recompute past battle states, and competitive Pokémon developers may choose to improve memory overhead by storing the game tree in terms of an initial state and inputs as opposed to storing copies of battle state at each node.
- raw input simply encodes the action type and slot (and optionally the target) (e.g.,
- interior equilibria
A mixed Nash equilibrium where all players play each of their available strategies with a positive probability — i.e., no player puts all their weight on any single strategy. Interior equilibria are often more stable than non-interior equilibria because they’re less likely to be disrupted by small changes in the game’s payoffs. This is because if one player deviates from an interior equilibrium by playing a different strategy with a small probability, the other player won’t have a strong incentive to change their strategy in response.
- joint action
The combination of actions chosen by all players in a simultaneous move game. An individual player’s action is not sufficient to determine an outcome in a simultaneous move game — all players’ actions are required and each possible combination of actions may result in different payoffs for each player.
- killer move
A move ordering heuristic that optimizes search by prioritizing evaluating highly advantageous moves that dramatically limit an opposing player’s response options (e.g., a move which causes an opposing Pokémon to faint, possibly winning the battle).
Knowledge is the possession of information. Knowledge can refer to an a priori understanding of a game’s mechanics and rules, can be encoded as heuristics or within an evaluation function, or can be learned procedurally.
Learning refers to machine learning (ML), where statistical algorithms are leveraged to generalize and perform tasks without explicit instructions. There three main categories of machine learning are:
- supervised learning, where both example inputs and their desired outputs are provided
- unsupervised learning, only the inputs are provided, and
- reinforcement learning, where an agent aims to perform a certain goal and is given feedback along the way that it attempts to maximize
The first computer agent to achieve superhuman performance in no-limit Texas hold ’em poker, using a new variant of counterfactual regret minimization dubbed CFR+ (Brown 2017). Libratus was later followed by Modicum (Brown 2018), Pluribus (Brown 2019), and ReBeL (Brown 2020) — state of the art agents which are similarly foundational as AlphaGo but for imperfect information games.
- linear programming
Linear programming (LP), also called linear optimization, is a method to achieve the best outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements are represented by linear relationships. Typically used in competitive Pokémon AI research to solve for the Nash equilibrium.
The likelihood of superiority denotes how likely it would be for two players of the same strength to reach a certain result — in other fields called a p-value, a measure of statistical significance of a departure from the null hypothesis.
Match-up (or MU) describes the advantage (or disadvantage) of a given team or Pokémon against another team or Pokémon. In the context of teams or the battle as a whole, match-up determines which team is favored to win after team-building, and often the advantage may be strongly in the favor of one side.
- matrix node
A simple competitive Pokémon playing agent that chooses to use whichever
moveaction is expected to do the most damage the next turn to the opponent, commonly seen in the literature alongside the when benchmarking / testing agents due to its simplicity of implementation. Certain variants of this exist depending on how much of the damage formula is implemented, with the simplest versions of perhaps only considering each move’s effective base power after STAB and type effectiveness. A fully featured can achieve a much higher level of play than a , though is extremely exploitable (this can be somewhat mitigated by the addition of some randomness or certain heuristics).
Monte Carlo tree search is a heuristic search algorithm that combines tree search with the Monte Carlo method, a broad class of algorithms which rely on repeated random sampling to obtain results. MCTS is particularly useful for games with large decision spaces and imperfect information. The algorithm works by iteratively building a search tree where each node represents a possible game state, and edges represent actions. MCTS selectively expands the tree, guided by simulations (random playouts) that estimate the value of taking different actions from each state. This combination of strategic exploration and value estimation allows MCTS to find strong moves even in complex games like Go, where it’s impossible to evaluate every possible move.
A Markov decision process is as a stochastic decision-making process that uses a mathematical framework to model the decision-making of a dynamic system in scenarios where the results are either random or controlled by a decision maker, which makes sequential decisions over time. In competitive Pokémon AI research, MDP may also refer to the in certain contexts.
Also know as the meta, describes the most popular strategies and approaches to playing a game. In competitive Pokémon, usage stats provide information to determine which Pokémon define a particular format’s metagame.
A higher-level, problem-independent procedure or framework that outlines strategies to guide the design of heuristic optimization algorithms. Examples include genetic algorithms, simulated annealing, and Tabu search.
Also known as maximin, describes the concept of minimizing the possible loss for a worst case (maximum loss) scenario. Usually refers to the minimax search algorithm, a recursive algorithm for choosing the next move in an n-player game such that each player makes the move that maximizes the minimum value of the position resulting from the opponent’s possible moves. Furthermore, due to competitive Pokémon being a stochastic game, minimax is often actually referring to the expectiminimax algorithm.
The minimal viable product, a version with the bare-mininum number of features to be usable / demonstrate enough of a solution that feedback becomes relevant.
The Nash equilibrium, a stable state of a system involving the interaction of different participants, in which no participant can gain by a unilateral change of strategy if the strategies of the others remain unchanged. A mixed strategy Nash equilibrium or mixed Nash equilibrium (MNE) is a Nash equilibrium which requires a mixed strategy (a probability distribution over pure strategies), and every n-player game has at least one mixed Nash equilibrium. In competitive Pokémon, Nash equilibrium is used interchangeably with mixed Nash equilibrium.
A neural network, is a computer system modeled on the human brain and nervous system, comprised of multiple layers of nodes, containing an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to another and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.
An efficiently updatable neural network, stylized ƎUИИ, an architecture intended to replace the evaluation function of alpha-beta searches running on a CPU via SIMD instructions. Originally developed for Computer Shogi, NNUE has been used successfully in chess engines such as Stockfish to achieve high ratings.
Information produced when the state transitions. In Pokémon, this may include a result which can be used to determine payoff and possible legal actions, logs which detail various events that may have occurred, an account of the chance actions, or a multitude of other possibilities depending on the engine.
An offline algorithm receives all input data upfront and is required to output a solution that addresses the entire problem. If something can be done offline that usually means that it can be precomputed ahead-of-time, as opposed to in the middle of a game (contrast to online).
An online algorithm is one that can process its input piece-by-piece in a serial fashion, i.e., in the order that the input is fed to the algorithm, without having the entire input available from the start (contrast to offline).
The beginning of the game (sometimes referred to as the early game). In Pokémon, this consists of the Team Preview in generations where it exists as well as the first few turns of the initial lead match-up. Followed by the midgame. Many agents leverage an opening book which is a precomputed policy for handling play at this stage of the game.
- opponent modeling
The process of building a representation of an opponent’s strategies, preferences, and potential actions in a competitive environment. This model can be built through observation of the opponent’s behavior, historical data, and by inferring their goals. An opponent model allows a player to proactively adjust strategies based on their understanding of the opponent’s tendencies and weaknesses and is an important factor in predicting an opponent’s actions.
One-turn look ahead. The is the conventional name for an agent which leverages only a single turn of look ahead. TL can be used to generalize this shorthand for describing the depth an agent searches to (e.g., 3TL for three turns worth of look ahead).
The reward a player receives from arriving at a particular outcome. In Pokémon, only terminal states give rewards, where the reward is determined based on the result of the battle (winning, losing or drawing — though pedantically, Pokémon glitches mean that one also needs to support the notion of a battle ending in error which can usually be mapped to a draw) and the utility / reward function (which usually maps W/L/T to
1/-1/0making Pokémon a zero-sum game or
1/0/0.5making Pokémon a constant-sum game, the latter being more common).
A public belief state is a probability distribution over all possible hidden states of the game, based on the information that’s shared by all players. PBSs allow players to reason about what others might know and make decisions based on the likelihood of different scenarios, even when they don’t have complete information about the game’s true state.
Perspective or viewpoint determines which information in Pokémon is available. Perspective can refer to a specific player or side, or can be one of two special perspectives — the omniscient perspective which is aware of all information from both sides and the spectator perspective which is aware of all public information but none of the private information of either player.
The component of competitive Pokémon that involves playing out a battle online after each players’ teams have already been chosen.
A C++ library that builds on top of
lrsliband provides a collection of search and planning utilities for simultaneous move stochastic games. Pinyon includes a state of the art version of simultaneous move alpha-beta search using the double-oracle method (Bošanský 2013) modified to support for stochastic games as well as a generalized implementation of tree-bandit search with Exp3.
A common abbreviation of Pokémon dating back to the earliest releases where the stylized PKMN was included in the cartridge character set. However, this abbreviation has since been adopted as the name for the pkmn collection of projects (notably the pkmn engine,
libpkmn) and is often what’s being referenced (and is always what’s being referred to when prefixed with an @, as @pkmn is the name of the organization).
A strategic decision-maker within the context of the game. In Pokémon these are usually referred to as Player 1 and Player 2 (alternatively, P1 / P2 or Player A / Player B) and are technically not completely symmetrical because certain game mechanics depend on the notion of the host player (conventionally P1). In more general terms, either P1 or P2 may be considered the player vs. the enemy / foe depending on perspective — the P1 vs. P2 distinction is absolute whereas the less specific player vs. foe terminology is relative. When considering Pokémon as a bimatrix game the row player and column player are usually also used to refer to relative perspectives.
The act of playing out the rest of the game from a particular node in the game tree. The term playout or rollout is sometimes confusingly used as a shorthand to describe a single iteration of a tree bandit search algorithm over the tree (i.e., the entire process of selection / expansion / simulation / backpropagation) as opposed to describing the just the simulation phase. Furthermore, playouts may sometimes not actually involve playing out to a terminal node if an evaluation function is used instead.
A state to action mapping — for each action possible from a given state, a policy returns the probability of taking the action. Ultimately, the objective of a competitive Pokémon AI is to determine the optimal policy for any given state. Reinforcement learning aims to accomplish this by learning a policy network.
Partially observable Markov decision process, a generalization of a Markov decision process which models an agent decision process in which it’s assumed that the system dynamics are determined by an MDP, but the agent can’t directly observe the underlying state.
The act of thinking during an opponent’s turn. Given that Pokémon is a simultaneous move game, pondering occurs after a player has submitted their action (or been asked to wait) while they’re waiting for their next opportunity to input an action.
An estimate of something that will happen in the future. In competitive Pokémon, a prediction usually refers to a player’s expectation of their opponent’s next action, though players also make predictions about any and all unknown information.
The format and rules governing how clients communicate with the server. Different simulators / engines have their own protocols and there is no single standard, though the vast majority of competitive Pokémon agents adhere to the Pokémon Showdown protocol.
A name for every heuristic that removes completely certain branches of the search tree, assuming they have no bearing to the search result. Backward pruning where the decision on what to prune happens after finding a refutation while searching (e.g. Alpha-beta) is sound and will never affect the search result — forward pruning which preemptively eliminates branches before searching comes with the risk of potentially overlooking something. Compare to reduction.
A model-free reinforcement learning algorithm to learn the value of an action in a particular state that can handle problems with stochastic transitions and rewards without requiring adaptations. Q-learning was used by DeepMind in their Deep Q-Network (DQN) algorithm along with a technique called experience replay to solve a wide-range of Atari games (Mnih 2015).
Also known as RP, a trivial competitive Pokémon playing agent that leverages some version of a uniform policy, commonly seen in the literature alongside the when benchmarking/ testing agents due to its simplicity of implementation. However, despite this seemingly simple definition, multiple subtly different versions of the exist (e.g., many implementations decide to choose only
moveactions unless forced to consider
switchactions, and may have similar heuristics for decided how to treat generational gimmicks). While all implementations are forced to choose
switchactions when they’re the only option available, nuances between implemntations exist in the common scenario where either type of action is possible.
Formally, the notation should be used to refer to an agent which of the time uniformly chooses to make a
switchaction when available, otherwise choosing uniformly between possible
moveactions. instead always considers available
moveactions but only considers
switchactions of the time unless forced. This distinction of choosing uniformly between a minimal subset of the action space or considering potentially multiple subsets of the space can be extended to gimmicks such as
terastallize, though in those cases a more explicit notation is preferred (e.g., would consider
switch20% of the time and use
zmovewhen available 90% of the time).
An approximation of a difficult problem by a nearby problem that’s easier to solve. A solution of the relaxed problem provides information about the original problem.
- reverse damage calculator
A tool which computes information about a Pokémon’s set (ability, item, spread ranges) based off of the amount of damage its moves inflict on its opponents. The inverse of a damage calculator.
- reward shaping
A technique used in reinforcement learning where small intermediate rewards are given to the algorithm to help reshape sparse reward functions to denser ones to help the algorithm converge more quickly.
A random number generator, though almost always in the context of competitive Pokémon this refers to a pseudo-random number generator (PRNG). The term RNG can also simply refer to random events in general. RNGs usually produce a stream of random numbers which depend on the initial starting state of the generator, known as its seed.
A colloquial term describing a chance event — the result from invoking the random number generator. The roll concept is generalized from that of a dice roll. In Pokémon, a damage roll is one of many possible damage values that result from the step in the damage formula that applies a random factor to the base computed damage of most attacking moves. See also damage coalition.
One of a set of explicit or understood regulations or principles governing conduct within a particular activity or sphere. The rules of competitive Pokémon are dictated by the format — there are a wide variety of possible classes of custom rules which may be apply. Can also refer to a rule from a rule-based agent.
A system that applies human-made rules to store, sort and manipulate data as opposed to rules discovered via machine learning. These systems typically take the forum of
if-thenstatements and involve hand-crafted evaluation functions.
The process of exploring the possible decision paths and outcomes within a game to identify strong strategies or solutions. Search algorithms systematically evaluate different actions and their potential consequences, often building a search tree where nodes represent game states and branches represent possible moves. The goal of search is to find optimal or near-optimal strategies, considering factors like payoffs, uncertainty, and the actions of other players.
A technique where a game-playing agent plays against itself repeatedly in order to generate data and improve the agent’s strategies without the need for a human opponent. During self-play, the agent learns from both its successes and failures, gradually refining its decision-making abilities within the game environment.
Sometimes called serialization, sequentialization refers to the process of transforming a simultaneous move game into a sequential game in order to simplify analysis or to make the game more tractable for finding solutions. Sequentialization involves introducing an artificial order of moves, potentially with information revealed to some players based on the actions of those who moved earlier, even if the original game didn’t have this sequential structure.
A simulator is something that mimics the behavior of something by modelling the underlying state of the target, though doesn’t necessarily attempt to do so through emulation of an external device. In the context of Pokémon, simulator more often describes a particular platform that provides a venue for online Pokémon battling, and usually involves a game server being built around a Pokémon engine. Pokémon Showdown is the most comprehensive and most popular Pokémon simulator.
Short for Single Instruction on Multiple Data, SIMD characterizes a type of CPU instruction which allows for computing operations in parallel on a vector of numbers.
Refers to the 1-indexed position of either a Pokémon in a player’s party (known as a team slot or party slot) or a move in a Pokémon’s moveset (move slot). Required when specifying a player’s action.
A simultaneous move game (also simply simultaneous game or static game) is a game where each player chooses their action without knowledge of the actions chosen by other players. Contrasts an alternating move game (AM) (also known as a sequential game) where play alternates between players and each player is aware of their opponent’s prior action.
SM is also a commonly used abbreviation for Pokémon Sun & Pokémon Moon, the seventh Pokémon generation.
Used to describe something with low informational entropy) — i.e, something which has comparatively low information per element. A matrix that primarily consists of zeros or a game tree where most of the branches can be pruned is considered to sparse.
A spread or EV spread refers to the distribution of components which make up a Pokémon’s stats — primarily EVs (effort values), but also IVs (individual values) and nature. Depending on the context, spread can also refer to Pokémon moves which target more than one Pokémon in certain formats.
An abbreviation for same-type attack bonus, a game mechanic where a damage boost gets applied to attacking moves used by a Pokémon of the same type.
The information required to determine which actions are available to each player. In Pokémon, this is the battle, and the term battle is used as opposed to state as state is a loaded term in programming (programming in general is arguably all about the management of state, so state needs to be qualified as battle state or game state to be unambiguous) despite it being a useful term in papers dealing purely with game theory.
Used to describe a process characterized by randomness. Probabilistic goes further and denotes something where the outcome can be described by a probability distribution. Stochastic and probabilistic processes are non-deterministic, though non-deterministic describes a system or process where the outcome can’t be predicted with certainty from its initial state (and which may have multiple possible outcomes for a single input). Given the subtle distinction between these three terms they’re often used interchangeably.
Used interchangeably used with policy, refers generically to a plan to achieve an overall goal.
A notion used in the solution concept of subgame perfect Nash equilibrium, a refinement of the Nash equilibrium that eliminates non-credible threats. Subgames are a subset of a game that meets a specific set of criteria and when taken in isolation constitutes a game in its own right.
Similarity or exact correspondence between different things. In the context of games, symmetry can result certain states occurring more than once in the game tree, meaning techniques such as transposition tables and action consolidation can be leveraged to speed up search.
Tree-bandit search, the name for the family of search algorithms which are roughly based around Monte Carlo tree search but which make use of various bandit algorithms to greatly improve performance and convergence.
Temporal difference learning refers to a class of model-free reinforcement learning methods which learn by bootstrapping from the current estimate of the value function. These methods sample from the environment and perform updates based on current estimates, adjusting predictions to match later, more accurate, predictions about the future before the final outcome is known.
- Team Preview
A pre-battle phase introduced in Generation V in which players get to see each of the Pokémon on their opponent’s team and choose which Pokémon get brought to battle and in which order.
- Thompson sampling
The de facto standard open source machine learning and scientific computing framework, most often used in Python via PyTorch.
A neural network that learns context and thus meaning by tracking relationships in sequential data.
The change that a state undergoes due to an action. In the pkmn engine, this is implemented by a battle’s
updatefunction. In game theory, turns typically count transitions (though not in Pokémon), and a transition may produce a payoff or some sort of observation.
The entire set of possible transitions from any given state. While every Pokémon game engine must allow for performing a single transition, performing all transitions at once and returning the set of states is a less common feature, and almost always when transitions or
transitionsis used in conversation as a noun it’s referring to this functionality.
A sequence of moves that results in a position that may also be reached by another, more common sequence of moves. Transpositions are a type of symmetry and are often handled via transposition table. True transpositions are uncommon in Pokémon unless some level of abstraction has been applied.
A transposition table is a cache of previously seen positions, and associated evaluations, in a game tree generated by a computer game playing program. If a position recurs via a different sequence of moves, the value of the position is retrieved from the table, avoiding re-searching the game tree below that position.
Game theory uses the term ply to refer to one turn taken by one player in a sequential game which doesn’t apply to Pokémon given that Pokémon is a simultaneous game. A turn in Pokémon is a well-defined concept as certain game mechanics depend on its definition (e.g., residual damage happens at the end of a turn in Pokémon in later generations). Instead, it’s more useful to reason in terms of decision points that occur between updates to the battle state (which may happen multiple times in a single turn, e.g., due to a Pokémon using Baton Pass or fainting) — places where the game requires input from players to proceed. In Pokémon Showdown and the pkmn engine, a decision point always requires a joint action, introducing the concept of passing / waiting to account for situations where on a console one of the players is forced to do nothing while input is collected from their opponent.
Upper Confidence bounds applied to Trees, the contemporary standard implementation of Monte Carlo tree search which uses the UCB algorithm to select promising child nodes to expand. See also tree bandit search.
- uniform policy
A policy which assigns equal (uniform) probability to each of the actions possible — this sort of policy is usually only optimal in the absence of any other information (e.g., at the beginning of a search).
- usage stats
Statistics processed from a corpus of battle logs that detail the usage of various Pokémon and their attributes for a given format. Smogon University produces public statistics calculated from all rated battles on Pokémon Showdown each month which they use to influence the tiering decisions for the formats they maintain, though alternative usage statistics can also be produced for specific use cases.
The expected payoff from a given state, produced by an evaluation function. Evaluation functions are usually difficult to produce accurately in Pokémon and are often approximated with reinforcement learning via a value network that assigns a value to each state of the game by calculating an expected cumulative score.
Software that has been advertised but isn’t yet available to use / buy, either because it’s only a concept or because it’s still being written or designed. See also pkmn.
A variation from the standard rules of competitive Pokémon which usually simplifies / relaxes certain properties / rules of the game to make it more tractable for AI, with the expectation that solutions to a variant usually are useful for generalizing to the original ruleset. In computer poker, Kuhn poker is the most common simplification studied, whereas in Pokémon there are numerous potential options of variants. Note that while the various Pokémon formats can be considered variants, the term variant is reserved to describe simplifications to the game that go beyond the rule changes typical between formats.
A win condition is a Pokémon or scenario that a player needs in order to secure that they win the game.
A zero-sum game is a mathematical representation in game theory of a situation that involves two sides, where the result is an advantage for one side and an equivalent loss for the other, such that the net improvement in benefit of the game is zero. Zero-sum games are a specific example of a constant-sum game, a concept when the sum of the players’ gains and losses is a fixed constant . Constant-sum games can be trivially transformed into a zero-sum game by subtracting from every payoff.