The YAGO-NAGA approach to knowledge discovery

Published: 20 March 2009

Abstract

This paper gives an overview of the YAGO-NAGA approach to information extraction for building a conveniently searchable, large-scale, highly accurate knowledge base of common facts. YAGO harvests the infoboxes and category names of Wikipedia for facts about individual entities, and it reconciles these with the taxonomic backbone of WordNet to ensure that all entities have proper classes and that the class system is consistent. Currently, the YAGO knowledge base contains about 19 million instances of binary relations for about 1.95 million entities. Based on intensive sampling, its accuracy is estimated to be above 95 percent. The paper presents the architecture of the YAGO extractor toolkit, its distinctive approach to consistency checking, its provisions for maintenance and further growth, and the query engine for YAGO, named NAGA. It also discusses ongoing work on extensions towards integrating fact candidates extracted from natural-language text sources.
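
The abstract does not spell out how the consistency checks work. As a rough illustration only, the sketch below assumes one plausible form: each fact is an instance of a binary relation, each relation has a declared domain and range class, and a candidate fact is accepted only if its entities belong to those classes in the taxonomy. The taxonomy, the entity types, the `bornIn` signature, and all function names are hypothetical and are not taken from the YAGO toolkit.

```python
from dataclasses import dataclass

# Hypothetical toy taxonomy: class -> direct superclass (None at the root).
SUBCLASS_OF = {
    "physicist": "scientist",
    "scientist": "person",
    "person": "entity",
    "city": "location",
    "location": "entity",
    "entity": None,
}

# Hypothetical entity typing, e.g. as it might be derived from category names.
TYPE_OF = {
    "Albert_Einstein": "physicist",
    "Ulm": "city",
}

# Hypothetical relation signatures: relation -> (domain class, range class).
SIGNATURES = {
    "bornIn": ("person", "location"),
}


@dataclass(frozen=True)
class Fact:
    """One instance of a binary relation between two entities."""
    subject: str
    relation: str
    obj: str


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or is a transitive subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def is_consistent(fact):
    """Accept a fact only if both arguments match the relation's signature."""
    signature = SIGNATURES.get(fact.relation)
    if signature is None:
        return False  # unknown relation: reject in this sketch
    domain, range_ = signature
    # Untyped entities yield None, for which is_a returns False.
    return (is_a(TYPE_OF.get(fact.subject), domain)
            and is_a(TYPE_OF.get(fact.obj), range_))


candidates = [
    Fact("Albert_Einstein", "bornIn", "Ulm"),
    Fact("Ulm", "bornIn", "Albert_Einstein"),  # arguments swapped
]
accepted = [f for f in candidates if is_consistent(f)]
```

In this toy setting only the first candidate is accepted, because a city is not a person and a physicist is not a location. Rejecting untyped entities and unknown relations is a design choice of the sketch that favors precision over coverage.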


Published in

ACM SIGMOD Record, Volume 37, Issue 4
December 2008
116 pages
ISSN: 0163-5808
DOI: 10.1145/1519103

Copyright © 2009 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
