- [Ruoss 2024]
- Grandmaster-Level Chess Without Search. Ruoss, A., Delétang, G., Medapati, S., Grau-Moya, J., Wenliang, L. K., Catt, E., Reid, J. and Genewein, T. (2024)
- [Sokota 2024]
- The Update-Equivalence Framework for Decision-Time Planning. Sokota, S., Farina, G., Wu, D. J., Hu, H., Wang, K. A., Kolter, J. Z. and Brown, N. (2024)
- [McAleer 2023]
- ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret. McAleer, S. M., Farina, G., Lanctot, M. and Sandholm, T. (2023) The Eleventh International Conference on Learning Representations.
- [Sokota 2023]
- Abstracting Imperfect Information Away from Two-Player Zero-Sum Games. Sokota, S., D'Orazio, R., Ling, C. K., Wu, D. J., Kolter, J. Z. and Brown, N. (2023) Proceedings of the 40th International Conference on Machine Learning.
- [Świechowski 2023]
- Monte Carlo Tree Search: a review of recent modifications and applications. Świechowski, M., Godlewski, K., Sawicki, B. and Mańdziuk, J. (2023) Artificial Intelligence Review, 56(3):2497–2562.
- [Sychrovsky 2023]
- Learning not to Regret. Sychrovsky, D., Sustr, M., Davoodi, E., Lanctot, M. and Schmid, M. (2023) arXiv:2303.01074 [cs.GT].
- [Antonoglou 2022]
- Planning in Stochastic Environments with a Learned Model. Antonoglou, I., Schrittwieser, J., Ozair, S., Hubert, T. K. and Silver, D. (2022) International Conference on Learning Representations.
- [Diddigi 2022]
- A Generalized Minimax Q-learning Algorithm for Two-Player Zero-Sum Stochastic Games. Diddigi, R. B., Kamanchi, C. and Bhatnagar, S. (2022)
- [Kovařík 2022]
- Rethinking formal models of partially observable multiagent decision making. Kovařík, V., Schmid, M., Burch, N., Bowling, M. and Lisý, V. (2022) Artificial Intelligence, 303:103645.
- [Nashed 2022]
- A Survey of Opponent Modeling in Adversarial Domains. Nashed, S. and Zilberstein, S. (2022) Journal of Artificial Intelligence Research, 73.
- [Perolat 2022]
- Mastering the game of Stratego with model-free multiagent reinforcement learning. Perolat, J., De Vylder, B., Hennes, D., Tarassov, E., Strub, F., De Boer, V., Muller, P., Connor, J. T., Burch, N., Anthony, T., McAleer, S., Elie, R., Cen, S. H., Wang, Z., Gruslys, A., Malysheva, A., Khan, M., Ozair, S., Timbers, F., Pohlen, T., Eccles, T., Rowland, M., Lanctot, M., Lespiau, J., Piot, B., Omidshafiei, S., Lockhart, E., Sifre, L., Beauguerlange, N., Munos, R., Silver, D., Singh, S., Hassabis, D. and Tuyls, K. (2022) Science, 378(6623):990–996.
- [Czech 2021]
- Improving AlphaZero Using Monte-Carlo Graph Search. Czech, J., Korus, P. and Kersting, K. (2021) Proceedings of the International Conference on Automated Planning and Scheduling, 31(1):103–111.
- [Norelli 2021]
- OLIVAW: Mastering Othello without Human Knowledge, nor a Fortune. Norelli, A. and Panconesi, A. (2021) arXiv:2103.17228 [cs.LG].
- [Perolat 2021]
- From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization. Perolat, J., Munos, R., Lespiau, J., Omidshafiei, S., Rowland, M., Ortega, P., Burch, N., Anthony, T., Balduzzi, D., De Vylder, B., Piliouras, G., Lanctot, M. and Tuyls, K. (2021) Proceedings of the 38th International Conference on Machine Learning, pages 8525–8535.
- [Saffidine 2021]
- Alpha-Beta Pruning for Games with Simultaneous Moves. Saffidine, A., Finnsson, H. and Buro, M. (2021) Proceedings of the AAAI Conference on Artificial Intelligence, 26(1):556–562.
- [Schmid 2021]
- Student of Games. Schmid, M., Moravčík, M., Burch, N., Kadlec, R., Davidson, J., Waugh, K., Bard, N., Timbers, F., Lanctot, M., Holland, Z., Davoodi, E., Christianson, A. and Bowling, M. H. (2021) arXiv:2112.03178 [cs.AI].
- [Yang 2021]
- An Overview of Multi-agent Reinforcement Learning from Game Theoretical Perspective. Yang, Y. and Wang, J. (2021)
- [Brown 2020]
- Combining Deep Reinforcement Learning and Search for Imperfect-Information Games. Brown, N., Bakhtin, A., Lerer, A. and Gong, Q. (2020) Advances in Neural Information Processing Systems, pages 17057–17069.
- [Ganzfried 2020]
- Fictitious Play Outperforms Counterfactual Regret Minimization. Ganzfried, S. (2020) arXiv:2001.11165 [cs.GT].
- [Gray 2020]
- Human-Level Performance in No-Press Diplomacy via Equilibrium Search. Gray, J., Lerer, A., Bakhtin, A. and Brown, N. (2020) arXiv:2010.02923 [cs.AI].
- [Kovařík 2020]
- Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games. Kovařík, V. and Lisý, V. (2020) Machine Learning, 109(1):1–50.
- [Kroer 2020]
- Limited lookahead in imperfect-information games. Kroer, C. and Sandholm, T. (2020) Artificial Intelligence, 283:103218.
- [Lerer 2020]
- Improving Policies via Search in Cooperative Partially Observable Games. Lerer, A., Hu, H., Foerster, J. and Brown, N. (2020) Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):7187–7194.
- [O'Donoghue 2020]
- Matrix games with bandit feedback. O'Donoghue, B., Lattimore, T. and Osband, I. (2020) Conference on Uncertainty in Artificial Intelligence.
- [Steinberger 2020]
- DREAM: Deep Regret minimization with Advantage baselines and Model-free learning. Steinberger, E., Lerer, A. and Brown, N. (2020)
- [Ye 2020]
- Towards Playing Full MOBA Games with Deep Reinforcement Learning. Ye, D., Chen, G., Zhang, W., Chen, S., Yuan, B., Liu, B., Chen, J., Liu, Z., Qiu, F., Yu, H., Yin, Y., Shi, B., Wang, L., Shi, T., Fu, Q., Yang, W., Huang, L. and Liu, W. (2020)
- [Bard 2019]
- The Hanabi challenge: A new frontier for AI research. Bard, N., Foerster, J. N., Chandar, S., Burch, N., Lanctot, M., Song, H. F., Parisotto, E., Dumoulin, V., Moitra, S., Hughes, E., Dunning, I., Mourad, S., Larochelle, H., Bellemare, M. G. and Bowling, M. (2020) Artificial Intelligence, 280.
- [Brown 2019]
- Superhuman AI for multiplayer poker. Brown, N. and Sandholm, T. (2019) Science, 365(6456):885–890. (Includes supplementary text).
- [Brown 2019b]
- Solving Imperfect-Information Games via Discounted Regret Minimization. Brown, N. and Sandholm, T. (2019) Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence.
- [Foerster 2019]
- Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning. Foerster, J., Song, F., Hughes, E., Burch, N., Dunning, I., Whiteson, S., Botvinick, M. and Bowling, M. (2019) arXiv:1811.01458 [cs.MA].
- [Goodman 2019]
- Re-determinizing MCTS in Hanabi. Goodman, J. (2019) 2019 IEEE Conference on Games (CoG), pages 1–8.
- [Kapturowski 2019]
- Recurrent Experience Replay in Distributed Reinforcement Learning. Kapturowski, S., Ostrovski, G., Dabney, W., Quan, J. and Munos, R. (2019) International Conference on Learning Representations.
- [Kovařík 2019]
- Value Functions for Depth-Limited Solving in Zero-Sum Imperfect-Information Games. Kovařík, V. and Lisý, V. (2019)
- [Schrittwieser 2019]
- Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T., Lillicrap, T. P. and Silver, D. (2019)
- [Vinyals 2019]
- Grandmaster level in StarCraft II using multi-agent reinforcement learning. Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J. P., Jaderberg, M., Vezhnevets, A. S., Leblond, R., Pohlen, T., Dalibard, V., Budden, D., Sulsky, Y., Molloy, J., Paine, T. L., Gulcehre, C., Wang, Z., Pfaff, T., Wu, Y., Ring, R., Yogatama, D., Wünsch, D., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Kavukcuoglu, K., Hassabis, D., Apps, C. and Silver, D. (2019) Nature, 575(7782):350–354.
- [Brown 2018]
- Depth-Limited Solving for Imperfect-Information Games. Brown, N., Sandholm, T. and Amos, B. (2018) Advances in Neural Information Processing Systems, pages 7663–7674.
- [Brown 2018b]
- Deep Counterfactual Regret Minimization. Brown, N., Lerer, A., Gross, S. and Sandholm, T. (2018) arXiv:1811.00164 [cs.AI].
- [Espeholt 2018]
- IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., Dunning, I., Legg, S. and Kavukcuoglu, K. (2018)
- [Russo 2018]
- A Tutorial on Thompson Sampling. Russo, D. J., Van Roy, B., Kazerouni, A., Osband, I. and Wen, Z. (2018) Foundations and Trends in Machine Learning, 11(1):1–96.
- [Schmid 2018]
- Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines. Schmid, M., Burch, N., Lanctot, M., Moravčík, M., Kadlec, R. and Bowling, M. (2018)
- [Silver 2018]
- A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K. and Hassabis, D. (2018) Science, 362(6419):1140–1144. (Includes supplementary text).
- [Świechowski 2018]
- Improving Hearthstone AI by Combining MCTS and Supervised Learning Algorithms. Świechowski, M., Tajmajer, T. and Janusz, A. (2018) arXiv:1808.04794 [cs.AI].
- [Xiao 2018]
- Memory-Augmented Monte Carlo Tree Search. Xiao, C., Mei, J. and Müller, M. (2018) Proceedings of the AAAI Conference on Artificial Intelligence, 32(1).
- [Brown 2017]
- Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Brown, N. and Sandholm, T. (2017) Science, 359(6374):418–424. (Includes supplementary text).
- [Brown 2017b]
- Safe and Nested Subgame Solving for Imperfect-Information Games. Brown, N. and Sandholm, T. (2017) Advances in Neural Information Processing Systems.
- [Fortunato 2017]
- Noisy Networks for Exploration. Fortunato, M., Azar, M. G., Piot, B., Menick, J., Osband, I., Graves, A., Mnih, V., Munos, R., Hassabis, D., Pietquin, O., Blundell, C. and Legg, S. (2017) arXiv:1706.10295 [cs.LG].
- [Li 2017]
- Thompson Sampling for Monte Carlo Tree Search and Maximin Action Identification. Li, M. (2017) Master's thesis, Universiteit Leiden, Leiden, Netherlands.
- [Mirsoleimani 2017]
- Structured Parallel Programming for Monte Carlo Tree Search. Mirsoleimani, S. A., Plaat, A., Herik, H. J. and Vermaseren, J. A. M. (2017) arXiv:1704.00325 [cs.AI].
- [Moravčík 2017]
- DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Moravčík, M., Schmid, M., Burch, N., Lisý, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M. and Bowling, M. (2017) Science, 356(6337):508–513.
- [Neller 2017]
- An Introduction to Counterfactual Regret Minimization. Neller, T. W. and Lanctot, M. (2017)
- [Schulman 2017]
- Proximal Policy Optimization Algorithms. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. and Klimov, O. (2017) arXiv:1707.06347 [cs.LG].
- [Silver 2017]
- Mastering the game of Go without human knowledge. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., Driessche, G., Graepel, T. and Hassabis, D. (2017) Nature, 550(7676):354–359.
- [Svenstrup 2017]
- Hash Embeddings for Efficient Word Representations. Svenstrup, D., Hansen, J. M. and Winther, O. (2017)
- [Volkert 2017]
- Confidence Bound Algorithms in Game Trees. Volkert, M. (2017) Master's thesis, Universiteit Leiden, Leiden, Netherlands.
- [Baier 2016]
- Time Management for Monte Carlo Tree Search. Baier, H. and Winands, M. H. M. (2016) IEEE Transactions on Computational Intelligence and AI in Games, 8(3):301–314.
- [Bošanský 2016]
- Algorithms for Computing Strategies in Two-Player Simultaneous Move Games. Bošanský, B., Lisý, V., Lanctot, M., Cermak, J. and Winands, M. H. M. (2016) Artificial Intelligence, 237:1–40.
- [Heinrich 2016]
- Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. Heinrich, J. and Silver, D. (2016)
- [Mirsoleimani 2016]
- A New Method for Parallel Monte Carlo Tree Search. Mirsoleimani, S. A., Plaat, A., Herik, H. J. and Vermaseren, J. (2016) arXiv:1605.04447 [cs.DC].
- [Silver 2016]
- Mastering the game of Go with deep neural networks and tree search. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. and Hassabis, D. (2016) Nature, 529(7587):484–489.
- [Bowling 2015]
- Heads-up limit hold’em poker is solved. Bowling, M., Burch, N., Johanson, M. and Tammelin, O. (2015) Science, 347(6218):145–149.
- [Clark 2015]
- Training Deep Convolutional Neural Networks to Play Go. Clark, C. and Storkey, A. (2015) Proceedings of the 32nd International Conference on Machine Learning, pages 1766–1774.
- [Hausknecht 2015]
- Deep Recurrent Q-Learning for Partially Observable MDPs. Hausknecht, M. J. and Stone, P. (2015) arXiv:1507.06527 [cs.LG].
- [Lisý 2015]
- Online Monte Carlo Counterfactual Regret Minimization for Search in Imperfect Information Games. Lisý, V., Lanctot, M. and Bowling, M. (2015) Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
- [Mnih 2015]
- Human-level control through deep reinforcement learning. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S. and Hassabis, D. (2015) Nature, 518(7540):529–533.
- [Schaul 2015]
- Prioritized Experience Replay. Schaul, T., Quan, J., Antonoglou, I. and Silver, D. (2015) arXiv:1511.05952 [cs.LG].
- [Tian 2015]
- Better Computer Go Player with Neural Network and Long-term Prediction. Tian, Y. and Zhu, Y. (2015) arXiv:1511.06410 [cs.LG].
- [Maddison 2014]
- Move Evaluation in Go Using Deep Convolutional Neural Networks. Maddison, C. J., Huang, A., Sutskever, I. and Silver, D. (2014) arXiv:1412.6564 [cs.LG].
- [Powley 2014]
- Information capture and reuse strategies in Monte Carlo Tree Search, with applications to games of hidden information. Powley, E. J., Cowling, P. I. and Whitehouse, D. (2014) Artificial Intelligence, 217:92–116.
- [Tak 2014]
- Monte Carlo Tree Search Variants for Simultaneous Move Games. Tak, M. J. W., Lanctot, M. and Winands, M. H. M. (2014) 2014 IEEE Conference on Computational Intelligence and Games, pages 1–8.
- [Baier 2013]
- Monte-Carlo Tree Search and minimax hybrids. Baier, H. and Winands, M. H. M. (2013) 2013 IEEE Conference on Computational Intelligence in Games (CIG), pages 1–8.
- [Bošanský 2013]
- Using Double-Oracle Method and Serialized Alpha-Beta Search for Pruning in Simultaneous Move Games. Bošanský, B., Lisý, V., Čermák, J., Vítek, R. and Pěchouček, M. (2013) Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pages 48–54.
- [Burch 2013]
- Solving Imperfect Information Games Using Decomposition. Burch, N., Johanson, M. B. and Bowling, M. (2013) AAAI Conference on Artificial Intelligence.
- [Lanctot 2013]
- Monte Carlo Tree Search for Simultaneous Move Games: A Case Study in the Game of Tron. Lanctot, M., Wittlinger, C., Winands, M. H. M. and Den Teuling, N. G. P. (2013) Proceedings of the Twenty-Fifth Benelux Conference on Artificial Intelligence (BNAIC), pages 104–111.
- [Lisý 2013]
- Convergence of Monte Carlo Tree Search in Simultaneous Move Games. Lisý, V., Kovařík, V., Lanctot, M. and Bošanský, B. (2013) Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, pages 2112–2120.
- [Nayyar 2013]
- Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach. Nayyar, A., Mahajan, A. and Teneketzis, D. (2013) IEEE Transactions on Automatic Control, 58(7):1644–1658.
- [Browne 2012]
- A Survey of Monte Carlo Tree Search Methods. Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S. and Colton, S. (2012) IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43.
- [Burch 2012]
- Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions. Burch, N., Lanctot, M., Szafron, D. and Gibson, R. (2012) Advances in Neural Information Processing Systems.
- [Cowling 2012]
- Information Set Monte Carlo Tree Search. Cowling, P. I., Powley, E. J. and Whitehouse, D. (2012) IEEE Transactions on Computational Intelligence and AI in Games, 4(2):120–143.
- [Cowling 2012b]
- Ensemble Determinization in Monte Carlo Tree Search for the Imperfect Information Card Game Magic: The Gathering. Cowling, P. I., Ward, C. D. and Powley, E. J. (2012) IEEE Transactions on Computational Intelligence and AI in Games, 4(4):241–257.
- [Den Teuling 2012]
- Monte-Carlo Tree Search for the Simultaneous Move Game Tron. Den Teuling, N. G. P. and Winands, M. H. M. (2012)
- [Halpern 2012]
- Iterated regret minimization: A new solution concept. Halpern, J. Y. and Pass, R. (2012) Games and Economic Behavior, 74(1):184–207.
- [Lanctot 2012]
- No-Regret Learning in Extensive-Form Games with Imperfect Recall. Lanctot, M., Gibson, R., Burch, N., Zinkevich, M. and Bowling, M. (2012) arXiv:1205.0622 [cs.GT].
- [Perick 2012]
- Comparison of different selection strategies in Monte-Carlo Tree Search for the game of Tron. Perick, P., St-Pierre, D. L., Maes, F. and Ernst, D. (2012) 2012 IEEE Conference on Computational Intelligence and Games (CIG), pages 242–249.
- [Tak 2012]
- N-Grams and the Last-Good-Reply Policy Applied in General Game Playing. Tak, M. J. W., Winands, M. H. M. and Björnsson, Y. (2012) IEEE Transactions on Computational Intelligence and AI in Games, 4(2):73–83.
- [Tolpin 2012]
- MCTS Based on Simple Regret. Tolpin, D. and Shimony, S. E. (2012) Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, pages 570–576.
- [Auger 2011]
- Multiple Tree for Partially Observable Monte-Carlo Tree Search. Auger, D. (2011) Applications of Evolutionary Computation, pages 53–62.
- [Fern 2011]
- Ensemble Monte-Carlo Planning: An Empirical Study. Fern, A. and Lewis, P. (2011) Proceedings of the International Conference on Automated Planning and Scheduling, 21(1):58–65.
- [Rocki 2011]
- Large-Scale Parallel Monte Carlo Tree Search on GPU. Rocki, K. and Suda, R. (2011) 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pages 2034–2037.
- [Rosin 2011]
- Multi-armed bandits with episode context. Rosin, C. D. (2011) Annals of Mathematics and Artificial Intelligence, 61:203–230.
- [Yoshizoe 2011]
- Scalable distributed Monte-Carlo tree search. Yoshizoe, K., Kishimoto, A., Kaneko, T., Yoshimoto, H. and Ishikawa, Y. (2011) Proceedings of the 4th Annual Symposium on Combinatorial Search, SoCS 2011, pages 180–187.
- [Avis 2010]
- Enumeration of Nash Equilibria for Two-Player Games. Avis, D., Rosenberg, G. D., Savani, R. and Stengel, B. (2010) Economic Theory, 42(1):9–37.
- [Chaslot 2010]
- Monte-Carlo Tree Search. Chaslot, G. M. J. (2010) PhD thesis, Maastricht University, Maastricht, Netherlands.
- [Enzenberger 2010]
- A Lock-Free Multithreaded Monte-Carlo Tree Search Algorithm. Enzenberger, M. and Müller, M. (2010) Advances in Computer Games, pages 14–20.
- [Long 2010]
- Understanding the Success of Perfect Information Monte Carlo Sampling in Game Tree Search. Long, J., Sturtevant, N., Buro, M. and Furtak, T. (2010) Proceedings of the AAAI Conference on Artificial Intelligence, 24(1):134–140.
- [Schiffel 2010]
- Symmetry Detection in General Game Playing. Schiffel, S. (2010) Proceedings of the AAAI Conference on Artificial Intelligence, 24(1):980–985.
- [Bjarnason 2009]
- Lower Bounding Klondike Solitaire with Monte-Carlo Planning. Bjarnason, R., Fern, A. and Tadepalli, P. (2009) Proceedings of the International Conference on Automated Planning and Scheduling, 19(1):26–33.
- [Lanctot 2009]
- Monte Carlo Sampling for Regret Minimization in Extensive Games. Lanctot, M., Waugh, K., Zinkevich, M. and Bowling, M. (2009) Advances in Neural Information Processing Systems.
- [Shafiei 2009]
- Comparing UCT versus CFR in Simultaneous Games. Shafiei, M., Sturtevant, N. R. and Schaeffer, J. (2009)
- [Winands 2008]
- Monte-Carlo Tree Search Solver. Winands, M. H. M., Björnsson, Y. and Saito, J. (2008) Computers and Games, pages 25–36.
- [Coulom 2007]
- Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. Coulom, R. (2007) Computers and Games, pages 72–83.
- [Gilpin 2007]
- Lossless Abstraction of Imperfect Information Games. Gilpin, A. and Sandholm, T. (2007) Journal of the ACM, 54(5), Article 25.
- [Zinkevich 2007]
- Regret Minimization in Games with Incomplete Information. Zinkevich, M., Johanson, M., Bowling, M. and Piccione, C. (2007) Advances in Neural Information Processing Systems.
- [Gelly 2006]
- Exploration exploitation in Go: UCT for Monte-Carlo Go. Gelly, S. and Wang, Y. (2006) NIPS Workshop on On-line Trading of Exploration and Exploitation.