Formulating the KBpedia Knowledge Structure



CORALVILLE, IA, April 30, 2015 -- The fortunate thing when we started the Cognonto venture was that we had substantial platform and API experience under our belts with the Open Semantic Framework. OSF was a proven approach that we could improve on in a second generation by dropping the reliance on Drupal and by re-writing its underlying APIs in Clojure.

The real nub of the Cognonto venture, however, was in its underlying knowledge structure. After its beginnings in Wikipedia, and with promise from Wikidata, we saw that we could do meaningful integration based on our existing Cyc and UMBEL roots. But we also saw flaws with that approach: Cyc was limited in some areas in comparison to Wikipedia, and UMBEL was designed for a purpose other than knowledge-based artificial intelligence (KBAI). Further, were we even capturing the right fundamental building blocks in the Cognonto venture?

Cyc brought us a common-sense view of the world; Wikipedia provided us global scope and crowdsourced review; Wikidata brought us instance data; UMBEL brought us a more tractable way to handle the complexity of the knowledge graph, especially through its emerging, relatively flat "typology" structure; and DBpedia brought us useful structure that we could process. What logic could we point to that would guide us in constructing this new knowledge structure?

Actually, the answer was fairly simple. It resides in the purpose of the venture: to structure existing knowledge to best support artificial intelligence and machine learning. None of the existing sources fully met this purpose. But we had amazing structure and content within our sources to embrace this purpose, even with only portions of the contributing sources. We also saw the importance of geographical information to the endeavor, and thus also promoted GeoNames with its 10 million geo-located entities as one of the core knowledge bases. Our design thus began with six KBs, each contributing schema, instances, relationships or data records. We are supplementing that with a further 20 standard external ontologies and vocabularies, such as (supported by the major search engines), to promote interoperability with the broader Web and content.

In keeping with the heritage of Wikipedia and structured DBpedia, we named this central knowledge structure 'KBpedia', an amalgam of the best features and strengths of the contributing six knowledge bases. Yet we still needed an overriding schema for the entire structure and its linkages. From a coverage standpoint, the concepts and ideas came mostly from Wikipedia and Cyc, supplemented by GeoNames in the geolocational realm. From an instance standpoint, instance records and data were derived from Wikidata, Wikipedia and GeoNames. From a data characterization or attributes standpoint, the major contributors were DBpedia and Wikidata. UMBEL provided a simpler means for organizing all of these concepts and entities.

Still, all of these contributors required some form of integrative mindset. Thus, the final structure of KBpedia, as governed by the KBpedia Knowledge Ontology (KKO) or knowledge graph, follows the triadic logic of Charles Sanders Peirce. Peirce's fundamental idea is that objects are only perceived via icons, indexes or symbols, and even then meaning comes from how these signs are interpreted. Multiple perspectives and interpretations are thus inherent in any thing, and fallible truth comes about by truth-testing and subjecting those "truths" to community review and concurrence. The scientific method is one of the purest expressions of this approach.

The logical mindset of Peirce provides a powerful way to look at any domain, try to explain it, and then generate new ideas and knowledge when observations (the interpretations) do not conform to the current world view. The Peircean mindset also provides a powerful means to organize (categorize) knowledge at every level. For any given topic, there are potentials, realized particulars from those potentials, and generalities and implications that may be drawn from these categorical aspects. Simpler knowledge and structure begets more complicated knowledge and structure, all subjected to truth-testing, in an approach that is quite fractal in nature.

Most things in the world, from fictional things to cameras to people to rivers, can be placed into relatively simple hierarchical structures of "types", which in various aggregates may be represented as mostly distinct "typologies". These typologies, in turn, can be related to the overall structure of knowledge and concepts through this Peircean triadic view. The net result is a logical, simply organized, computable structure that provides a tie-in point for any thing or concept imaginable.
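To make the idea of "types" grouped into mostly distinct "typologies" concrete, here is a minimal sketch in Python. It is an illustration only, not the KKO implementation: the class `Typology`, the helper `are_disjoint`, and the type names (`Product`, `Camera`, `DigitalCamera`, `NaturalFeature`, `River`) are all hypothetical, chosen to echo the cameras and rivers mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Typology:
    """A hypothetical, simple hierarchy of types (illustrative only)."""
    name: str
    # Maps each type to its parent type; a root type maps to None.
    parent_of: Dict[str, Optional[str]] = field(default_factory=dict)

    def add_type(self, type_name: str, parent: Optional[str] = None) -> None:
        # A parent must already exist in this typology before a child is added.
        if parent is not None and parent not in self.parent_of:
            raise KeyError(f"unknown parent type: {parent}")
        self.parent_of[type_name] = parent

    def ancestors(self, type_name: str) -> List[str]:
        # Walk parent links from the given type up to the root.
        chain: List[str] = []
        current = self.parent_of[type_name]
        while current is not None:
            chain.append(current)
            current = self.parent_of[current]
        return chain


def are_disjoint(a: Typology, b: Typology) -> bool:
    # Two typologies are treated as distinct here if they share no types.
    return not (set(a.parent_of) & set(b.parent_of))


# Hypothetical example data, not taken from KBpedia itself.
products = Typology("Products")
products.add_type("Product")
products.add_type("Camera", parent="Product")
products.add_type("DigitalCamera", parent="Camera")

natural = Typology("NaturalFeatures")
natural.add_type("NaturalFeature")
natural.add_type("River", parent="NaturalFeature")

print(products.ancestors("DigitalCamera"))
print(are_disjoint(products, natural))
```

In this sketch each typology is a shallow tree, and the disjointness check is a plain set intersection over type names. A real knowledge graph would express both in an ontology language rather than in application code.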

This simple triadic structure, overlaying these various "supertypes" of aggregate things, is what now forms the KBpedia Knowledge Ontology, or KKO. It is clean and simple and logical, unlike any prior upper-level ontology attempting to capture the breadth of human knowledge. Completion of the KBpedia knowledge structure, its integration of six knowledge bases, and its logical schema as KKO, needs to be a priority in getting Cognonto ready for commercial release.

Note: This release was written for internal documentation purposes and was not publicly released until the inauguration of the formal Cognonto venture.