Abstract
Artificial intelligence (AI) is defined as the ability of machines to perform tasks that are usually associated with intelligent beings. Argument and debate are fundamental capabilities of human intelligence, essential for a wide range of human activities, and common to all human societies. The development of computational argumentation technologies is therefore an important emerging discipline in AI research¹. Here we present Project Debater, an autonomous debating system that can engage in a competitive debate with humans. We provide a complete description of the system’s architecture, a thorough and systematic evaluation of its operation across a wide range of debate topics, and a detailed account of the system’s performance in its public debut against three expert human debaters. We also highlight the fundamental differences between debating with humans as opposed to challenging humans in game competitions, the latter being the focus of classical ‘grand challenges’ pursued by the AI research community over the past few decades. We suggest that such challenges lie in the ‘comfort zone’ of AI, whereas debating with humans lies in a different territory, in which humans still prevail, and for which novel paradigms are required to make substantial progress.
Data availability
The full transcripts of the three public debates in which Project Debater participated are available in Supplementary Information section 11, including information that elucidates the system’s operation throughout, and the results of the audience votes. In addition, multiple datasets that were constructed and used while developing Project Debater are available at https://www.research.ibm.com/haifa/dept/vst/debating_data.shtml. Source data are provided with this paper for Fig. 3.
Code availability
Most of the underlying capabilities of Project Debater, including the argument mining components, are freely available for academic research upon request as cloud services via https://early-access-program.debater.res.ibm.com/academic_use (in which the terminology differs: what we call here ‘motion’ and ‘topic’ is denoted as ‘topic’ and ‘concept’, respectively).
References
Lawrence, J. & Reed, C. Argument mining: a survey. Comput. Linguist. 45, 765–818 (2019).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. Preprint at https://arxiv.org/abs/1810.04805 (2018).
Peters, M. et al. Deep contextualized word representations. In Proc. 2018 Conf. North Am. Ch. Assoc. for Computational Linguistics: Human Language Technologies Vol. 1, 2227–2237 (Association for Computational Linguistics, 2018); https://www.aclweb.org/anthology/N18-1202
Radford, A. et al. Language models are unsupervised multitask learners. OpenAI Blog 1, http://www.persagen.com/files/misc/radford2019language.pdf (2019).
Socher, R. et al. Recursive deep models for semantic compositionality over a sentiment treebank. In Proc. Empirical Methods in Natural Language Processing (EMNLP) 1631–1642 (Association for Computational Linguistics, 2013).
Yang, Z. et al. XLNet: generalized autoregressive pretraining for language understanding. In Adv. in Neural Information Processing Systems (NIPS) 5753–5763 (Curran Associates, 2019).
Cho, K., van Merriënboer, B., Bahdanau, D. & Bengio, Y. On the properties of neural machine translation: encoder–decoder approaches. In Proc. 8th Worksh. on Syntax, Semantics and Structure in Statistical Translation 103–111 (Association for Computational Linguistics, 2014).
Gambhir, M. & Gupta, V. Recent automatic text summarization techniques: a survey. Artif. Intell. Rev. 47, 1–66 (2017).
Young, S., Gašić, M., Thomson, B. & Williams, J. POMDP-based statistical spoken dialog systems: a review. Proc. IEEE 101, 1160–1179 (2013).
Gurevych, I., Hovy, E. H., Slonim, N. & Stein, B. Debating Technologies (Dagstuhl Seminar 15512) Dagstuhl Report 5 (2016).
Levy, R., Bilu, Y., Hershcovich, D., Aharoni, E. & Slonim, N. Context dependent claim detection. In Proc. COLING 2014, the 25th Int. Conf. on Computational Linguistics: Technical Papers 1489–1500 (Dublin City University and Association for Computational Linguistics, 2014); https://www.aclweb.org/anthology/C14-1141
Rinott, R. et al. Show me your evidence—an automatic method for context dependent evidence detection. In Proc. 2015 Conf. on Empirical Methods in Natural Language Processing 440–450 (Association for Computational Linguistics, 2015); https://www.aclweb.org/anthology/D15-1050
Shnayderman, I. et al. Fast end-to-end wikification. Preprint at https://arxiv.org/abs/1908.06785 (2019).
Borthwick, A. A Maximum Entropy Approach To Named Entity Recognition. PhD thesis, New York Univ. https://cs.nyu.edu/media/publications/borthwick_andrew.pdf (1999).
Finkel, J. R., Grenager, T. & Manning, C. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proc. 43rd Ann. Meet. Assoc. for Computational Linguistics 363–370 (Association for Computational Linguistics, 2005).
Levy, R., Bogin, B., Gretz, S., Aharonov, R. & Slonim, N. Towards an argumentative content search engine using weak supervision. In Proc. 27th Int. Conf. on Computational Linguistics (COLING 2018) 2066–2081, https://www.aclweb.org/anthology/C18-1176.pdf (International Committee on Computational Linguistics, 2018).
Ein-Dor, L. et al. Corpus wide argument mining—a working solution. In Proc. Thirty-Fourth AAAI Conf. on Artificial Intelligence 7683–7691 (AAAI Press, 2020).
Levy, R. et al. Unsupervised corpus-wide claim detection. In Proc. 4th Worksh. on Argument Mining 79–84 (Association for Computational Linguistics, 2017); https://www.aclweb.org/anthology/W17-5110
Shnarch, E. et al. Will it blend? Blending weak and strong labeled data in a neural network for argumentation mining. In Proc. 56th Ann. Meet. Assoc. for Computational Linguistics Vol. 2, 599–605 (Association for Computational Linguistics, 2018); https://www.aclweb.org/anthology/P18-2095
Gleize, M. et al. Are you convinced? Choosing the more convincing evidence with a Siamese network. In Proc. 57th Conf. Assoc. for Computational Linguistics 967–976 (Association for Computational Linguistics, 2019).
Bar-Haim, R., Bhattacharya, I., Dinuzzo, F., Saha, A. & Slonim, N. Stance classification of context-dependent claims. In Proc. 15th Conf. Eur. Ch. Assoc. for Computational Linguistics Vol. 1, 251–261 (Association for Computational Linguistics, 2017).
Bar-Haim, R., Edelstein, L., Jochim, C. & Slonim, N. Improving claim stance classification with lexical knowledge expansion and context utilization. In Proc. 4th Worksh. on Argument Mining 32–38 (Association for Computational Linguistics, 2017).
Bar-Haim, R. et al. From surrogacy to adoption; from bitcoin to cryptocurrency: debate topic expansion. In Proc. 57th Conf. Assoc. for Computational Linguistics 977–990 (Association for Computational Linguistics, 2019).
Bilu, Y. et al. Argument invention from first principles. In Proc. 57th Ann. Meet. Assoc. for Computational Linguistics 1013–1026 (Association for Computational Linguistics, 2019).
Ein-Dor, L. et al. Semantic relatedness of Wikipedia concepts—benchmark data and a working solution. In Proc. Eleventh Int. Conf. on Language Resources and Evaluation (LREC 2018) 2571–2575 (Springer, 2018).
Pahuja, V. et al. Joint learning of correlated sequence labelling tasks using bidirectional recurrent neural networks. In Proc. Interspeech 548–552 (International Speech Communication Association, 2017).
Mirkin, S. et al. Listening comprehension over argumentative content. In Proc. 2018 Conf. on Empirical Methods in Natural Language Processing 719–724 (Association for Computational Linguistics, 2018).
Lavee, T. et al. Listening for claims: listening comprehension using corpus-wide claim mining. In ArgMining Worksh. 58–66 (Association for Computational Linguistics, 2019).
Orbach, M. et al. A dataset of general-purpose rebuttal. In Proc. 2019 Conf. on Empirical Methods in Natural Language Processing 5595–5605 (Association for Computational Linguistics, 2019).
Slonim, N., Atwal, G. S., Tkačik, G. & Bialek, W. Information-based clustering. Proc. Natl Acad. Sci. USA 102, 18297–18302 (2005).
Ein Dor, L. et al. Learning thematic similarity metric from article sections using triplet networks. In Proc. 56th Ann. Meet. Assoc. for Computational Linguistics Vol. 2, 49–54 (Association for Computational Linguistics, 2018); https://www.aclweb.org/anthology/P18-2009
Shechtman, S. & Mordechay, M. Emphatic speech prosody prediction with deep LSTM networks. In 2018 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP) 5119–5123 (IEEE, 2018).
Mass, Y. et al. Word emphasis prediction for expressive text to speech. In Interspeech 2868–2872 (International Speech Communication Association, 2018).
Feigenblat, G., Roitman, H., Boni, O. & Konopnicki, D. Unsupervised query-focused multi-document summarization using the cross entropy method. In Proc. 40th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval 961–964 (Association for Computing Machinery, 2017).
Daxenberger, J., Schiller, B., Stahlhut, C., Kaiser, E. & Gurevych, I. Argumentext: argument classification and clustering in a generalized search scenario. Datenbank-Spektrum 20, 115–121 (2020).
Gretz, S. et al. A large-scale dataset for argument quality ranking: construction and analysis. In Thirty-Fourth AAAI Conf. on Artificial Intelligence 7805–7813 (AAAI Press, 2020); https://aaai.org/ojs/index.php/AAAI/article/view/6285
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
Samuel, A. L. Some studies in machine learning using the game of checkers. IBM J. Res. Develop. 3, 210–229 (1959).
Tesauro, G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1994).
Campbell, M., Hoane, A. J., Jr & Hsu, F.-h. Deep Blue. Artif. Intell. 134, 57–83 (2002).
Ferrucci, D. A. Introduction to “This is Watson”. IBM J. Res. Dev. 56, 235–249 (2012).
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In 5th Int. Conf. on Computers and Games inria-0011699 (Springer, 2006).
Vinyals, O. et al. Grandmaster level in Starcraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
Acknowledgements
We thank E. Aharoni, D. Carmel, S. Fine, M. Levinger and L. Haas for invaluable help during the early stages of this work. We thank A. Aaron and R. Fernandez for help in developing the Project Debater voice; P. Levin-Slesarev for work on the figures; G. Feigenblat and J. Daxenberger for help in generating baseline results; Y. Katsis for comments on the draft; N. Ovadia, D. Zafrir and H. Natarajan for their sportsmanship; and I. Dagan, I. Gurevych, C. Reed, B. Stein, H. Wachsmuth and U. Zakai for many discussions. We are indebted to the in-house annotators and in-house debaters, and especially to A. Polnarov and H. Goldlist-Eichler, who worked on this project over the years. Finally, we thank the additional researchers and managers from the Haifa, Dublin, India and Yorktown IBM Research labs who contributed to this project over the years; we especially thank J. E. Kelly, A. Krishna, D. Gil, the IBM communications team, Epic Digital and Intelligence Squared for their support and ideas.
Author information
Contributions
N.S. conceived the idea of Project Debater. N.S., Y.B., C.A., R.B.-H., B.B., F.B., L.C., E.C.-K., L.D., L.E., L.E.-D., R.F.-M., A. Gavron, A. Gera, M.G., S.G., D.G., A.H., D.H., R.H., Y.H., S.H., M.J., C.J., Y. Kantor, Y. Katz, D. Konopnicki, Z.K., L.K., D. Krieger, D.L., T.L., R.L., N.L., Y.M., A.M., S.M., G.M., M.O., E.R., R.R., S.S., D.S., E.S., I.S., A. Spector, B.S., A.T., O.T.-R., E.V. and R.A. designed and built Project Debater, with guidance from S.O.-K. and A. Soffer. N.S., Y.B., R.F.-M. and R.A. designed the evaluation framework. N.S., Y.B. and R.A. wrote the paper, with contributions from A. Gera to the In Depth Analysis section. N.S., Y.B., R.B.-H., L.C., L.D., L.E.-D., A. Gera, R.F.-M., S.G., C.J., Y. Kantor, D.L., G.M., M.O., E.S., A.T., E.V. and R.A. wrote the Supplementary Information. Y. Katz led the software engineering of the project. N.S. and R.A. led the team, with D.G. co-leading during the early stages of the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature thanks Claire Cardie and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
This file contains Supplementary Information Sections 1-11, including Supplementary Tables 1-3, Supplementary Figures 1-6 and Supplementary References – see contents pages for details.
Supplementary Information
This file contains additional information, including:
- query_sentiment_lexicon: a lexicon of sentiment words, used as a building block to create queries for sentence retrieval in the claim detection and evidence detection components;
- action_verb_expansions: a mapping between common action verbs and their syntactic and semantic expansions;
- claim_verb_phrases: a list of verb phrases commonly found in sentences containing claims;
- contrastive_expressions: a lexicon of expressions indicating contrast; and
- study_conclusions: a list of phrases (unigrams to 5-grams) that frequently appear in reports of study results and conclusions.
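The supplementary lexicons are described only as building blocks for sentence-retrieval queries. As a hedged illustration of how such lexicons could be combined, the sketch below filters a small corpus for sentences that mention the topic together with a sentiment word and a claim-like verb phrase. The word lists, the function names (`normalize`, `matches_claim_query`, `retrieve_candidates`) and the rule that all three conditions must match are hypothetical. They are not the queries Project Debater actually used, and no particular file format is assumed for the real lexicons.

```python
import re

# Hypothetical, miniature stand-ins for two of the supplementary lexicons
# (query_sentiment_lexicon and claim_verb_phrases). The real files are much
# larger, and their on-disk format is not assumed here.
SENTIMENT_WORDS = {"harmful", "beneficial", "dangerous", "effective"}
CLAIM_VERB_PHRASES = {"leads to", "results in", "reduces", "increases"}


def normalize(text: str) -> str:
    """Lowercase, keep word tokens only, and pad with spaces so that
    ' phrase ' membership tests match whole words rather than substrings."""
    return " " + " ".join(re.findall(r"[a-z']+", text.lower())) + " "


def matches_claim_query(sentence: str, topic: str) -> bool:
    """Illustrative query: the sentence must mention the topic AND contain
    a sentiment word AND contain a claim-like verb phrase."""
    norm = normalize(sentence)
    has_topic = normalize(topic) in norm
    has_sentiment = any(f" {w} " in norm for w in SENTIMENT_WORDS)
    has_claim_verb = any(f" {p} " in norm for p in CLAIM_VERB_PHRASES)
    return has_topic and has_sentiment and has_claim_verb


def retrieve_candidates(sentences: list[str], topic: str) -> list[str]:
    """Keep only the sentences that satisfy the illustrative query."""
    return [s for s in sentences if matches_claim_query(s, topic)]


if __name__ == "__main__":
    corpus = [
        "Social media is harmful because it leads to anxiety among teenagers.",
        "Social media platforms were founded in the 2000s.",
        "Vaccination is effective and reduces mortality.",
    ]
    # Only the first sentence mentions the topic, a sentiment word and a claim verb.
    print(retrieve_candidates(corpus, "social media"))
```

Requiring all three matches favors precision over recall. Loosening the rule, for example by requiring only two of the three conditions, is an equally plausible variant of this sketch.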
Source data
About this article
Cite this article
Slonim, N., Bilu, Y., Alzate, C. et al. An autonomous debating system. Nature 591, 379–384 (2021). https://doi.org/10.1038/s41586-021-03215-w