research-article

The YAGO-NAGA approach to knowledge discovery

Authors:
Gjergji Kasneci, Maya Ramanath, Fabian Suchanek, Gerhard Weikum
Max Planck Institute for Informatics, Saarbruecken, Germany

ACM SIGMOD Record, Volume 37, Issue 4, December 2008, pp. 41–47
https://doi.org/10.1145/1519103.1519110

Published: 20 March 2009


Abstract

This paper gives an overview of the YAGO-NAGA approach to information extraction for building a conveniently searchable, large-scale, highly accurate knowledge base of common facts. YAGO harvests infoboxes and category names of Wikipedia for facts about individual entities, and it reconciles these with the taxonomic backbone of WordNet in order to ensure that all entities have proper classes and the class system is consistent. Currently, the YAGO knowledge base contains about 19 million instances of binary relations for about 1.95 million entities. Based on intensive sampling, its accuracy is estimated to be above 95 percent. The paper presents the architecture of the YAGO extractor toolkit, its distinctive approach to consistency checking, its provisions for maintenance and further growth, and the query engine for YAGO, called NAGA. It also discusses ongoing work on extensions towards integrating fact candidates extracted from natural-language text sources.
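The abstract says that YAGO harvests Wikipedia category names and reconciles them with WordNet's taxonomy, but it does not say how. The sketch below shows one heuristic in that spirit, assumed here for illustration: a category whose head noun is plural (such as "American physicists") is taken to name a class, provided the singular head is a noun the lexicon knows. The tiny `WORDNET_NOUNS` set stands in for real WordNet lookups, and all function names are hypothetical.

```python
from typing import Optional

# Toy stand-in for WordNet's noun inventory (hypothetical; real use would query WordNet).
WORDNET_NOUNS = {"person", "scientist", "physicist", "city", "country"}

# A few irregular plurals; everything else goes through crude suffix rules.
IRREGULAR_PLURALS = {"people": "person", "men": "man", "women": "woman", "children": "child"}

# Words that end the pre-modifier part of a category name, e.g. "Cities in Germany".
PREPOSITIONS = {"in", "of", "from", "by", "at", "for", "on", "with"}


def head_noun(category: str) -> str:
    """Return the last word before the first preposition (the head of the noun group)."""
    tokens = category.split()
    for i, token in enumerate(tokens):
        if i > 0 and token.lower() in PREPOSITIONS:
            tokens = tokens[:i]
            break
    return tokens[-1] if tokens else ""


def singular_of(word: str) -> Optional[str]:
    """Crudely singularize a plural noun; return None if the word does not look plural."""
    w = word.lower()
    if w in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[w]
    if len(w) > 3 and w.endswith("ies"):
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return None


def category_to_class(category: str) -> Optional[str]:
    """Map a category name to a class if its head is a plural noun known to the lexicon."""
    singular = singular_of(head_noun(category))
    return singular if singular in WORDNET_NOUNS else None


for name in ["American physicists", "Cities in Germany", "People from Ulm", "1879 births"]:
    print(f"{name!r} -> {category_to_class(name)}")
```

The suffix rules are deliberately naive (they would turn "buses" into "buse"); a real system would rely on a proper stemmer or WordNet's own morphological processing. Categories such as "1879 births", whose singular head is not a known class, are simply left unmapped here.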
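The abstract reports an accuracy "above 95 percent" based on sampling, without saying how the estimate was computed. A common way to turn a judged sample of facts into a precision estimate with an error bar is the Wilson score interval for a binomial proportion; the sketch below assumes that estimator, and the sample counts are hypothetical.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95% confidence)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie in [0, total]")
    p_hat = correct / total
    z2 = z * z
    denom = 1.0 + z2 / total
    center = (p_hat + z2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / total + z2 / (4 * total * total))
    return center - half_width, center + half_width


# Hypothetical evaluation: judges marked 190 of 200 randomly sampled facts as correct.
low, high = wilson_interval(190, 200)
print(f"observed precision = {190 / 200:.3f}, 95% Wilson interval = [{low:.3f}, {high:.3f}]")
```

Unlike the plain normal approximation, the Wilson interval always stays within [0, 1] and behaves sensibly when observed precision is close to 1, which is the regime a highly accurate knowledge base operates in. Judging more sampled facts narrows the interval.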
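The abstract credits YAGO with a "distinctive approach to consistency checking" that keeps every entity properly classed within a consistent WordNet-based hierarchy, but it gives no details. One simple check of that kind is type checking: a candidate fact for a binary relation is accepted only if its arguments belong, possibly through subclass links, to the relation's declared domain and range. The sketch below assumes that model; `KnowledgeBase`, its methods, the class names, and the relation `bornIn` with its signature are hypothetical illustrations, not YAGO's actual schema or extractor toolkit.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


class KnowledgeBase:
    """Hypothetical fact store with a class hierarchy and typed binary relations."""

    def __init__(self) -> None:
        self.instance_of: dict[str, set[str]] = defaultdict(set)   # entity -> direct classes
        self.superclasses: dict[str, set[str]] = defaultdict(set)  # class -> direct superclasses
        self.signatures: dict[str, tuple[str, str]] = {}            # relation -> (domain, range)
        self.facts: set[Triple] = set()

    def add_type(self, entity: str, cls: str) -> None:
        self.instance_of[entity].add(cls)

    def add_subclass(self, sub: str, sup: str) -> None:
        self.superclasses[sub].add(sup)

    def declare(self, relation: str, domain: str, range_: str) -> None:
        self.signatures[relation] = (domain, range_)

    def classes_of(self, entity: str) -> set[str]:
        """All classes of an entity, following subclass links transitively."""
        seen: set[str] = set()
        stack = list(self.instance_of.get(entity, ()))
        while stack:
            cls = stack.pop()
            if cls not in seen:
                seen.add(cls)
                stack.extend(self.superclasses.get(cls, ()))
        return seen

    def add_fact(self, subj: str, rel: str, obj: str) -> bool:
        """Store a candidate fact only if it type-checks against the relation's signature."""
        if rel not in self.signatures:
            return False
        domain, range_ = self.signatures[rel]
        if domain in self.classes_of(subj) and range_ in self.classes_of(obj):
            self.facts.add((subj, rel, obj))
            return True
        return False


kb = KnowledgeBase()
kb.add_subclass("physicist", "scientist")
kb.add_subclass("scientist", "person")
kb.add_subclass("city", "location")
kb.add_type("Albert_Einstein", "physicist")
kb.add_type("Ulm", "city")
kb.declare("bornIn", "person", "location")

print(kb.add_fact("Albert_Einstein", "bornIn", "Ulm"))  # True: a person born in a location
print(kb.add_fact("Ulm", "bornIn", "Albert_Einstein"))  # False: a city is not a person
```

Rejecting ill-typed candidates at insertion time keeps the stored facts consistent with the class hierarchy; the trade-off is that correct facts about entities with missing or wrong classes are rejected rather than stored.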
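The abstract names NAGA as YAGO's query engine but does not describe its query language or how it ranks answers. As a minimal illustration of querying a store of binary-relation facts, the sketch below evaluates conjunctive triple patterns with variables. The `$`-prefixed variable syntax, the function names, and the sample facts are assumptions for illustration, and no ranking is attempted.

```python
from typing import Optional

Triple = tuple[str, str, str]


def is_variable(term: str) -> bool:
    """In this sketch, query variables are written with a leading '$'."""
    return term.startswith("$")


def match(pattern: Triple, fact: Triple, binding: dict[str, str]) -> Optional[dict[str, str]]:
    """Extend `binding` so that `pattern` matches `fact`; return None on conflict."""
    extended = dict(binding)
    for p_term, f_term in zip(pattern, fact):
        if is_variable(p_term):
            # Bind the variable, or check that it agrees with an earlier binding.
            if extended.setdefault(p_term, f_term) != f_term:
                return None
        elif p_term != f_term:
            return None
    return extended


def query(facts: set[Triple], patterns: list[Triple]) -> list[dict[str, str]]:
    """Return every variable binding that satisfies all patterns (a conjunctive query)."""
    bindings: list[dict[str, str]] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for fact in facts
            if (extended := match(pattern, fact, binding)) is not None
        ]
    return bindings


facts: set[Triple] = {
    ("Albert_Einstein", "type", "physicist"),
    ("Max_Planck", "type", "physicist"),
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Max_Planck", "bornIn", "Kiel"),
    ("Ulm", "locatedIn", "Germany"),
    ("Kiel", "locatedIn", "Germany"),
}

# "Which physicists were born in a city located in Germany?"
answers = query(facts, [("$x", "type", "physicist"), ("$x", "bornIn", "$city"), ("$city", "locatedIn", "Germany")])
print(sorted(a["$x"] for a in answers))  # prints ['Albert_Einstein', 'Max_Planck']
```

Each pattern is matched by scanning every fact, which is fine for a toy example; an engine over millions of facts would instead index triples by subject, relation, and object.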


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
  2. Machine learning
    1. Machine learning approaches
      1. Rule learning
2. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Data mining

Recommendations

Yago: a core of semantic knowledge
WWW '07: Proceedings of the 16th international conference on World Wide Web

We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts. This includes the Is-A hierarchy as well as non-...
YAGO: A Large Ontology from Wikipedia and WordNet

This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million ...
Timely YAGO: harvesting, querying, and visualizing temporal knowledge from Wikipedia
EDBT '10: Proceedings of the 13th International Conference on Extending Database Technology

Recent progress in information extraction has shown how to automatically build large ontologies from high-quality sources like Wikipedia. But knowledge evolves over time; facts have associated validity intervals. Therefore, ontologies should include ...

Published in

ACM SIGMOD Record Volume 37, Issue 4
December 2008
116 pages
ISSN: 0163-5808
DOI: 10.1145/1519103

Copyright © 2009 Authors
Publisher
Association for Computing Machinery
New York, NY, United States

Article Metrics
- 61 total citations
- 700 total downloads
- Downloads (last 12 months): 12
- Downloads (last 6 weeks): 0