The richness of the data, structures, and features within KBpedia makes it an effective source for supervised machine learning via distant supervision. Under distant supervision, the existing relationships and structure of KBpedia are used to select and aggregate various slices of its content, which then serve as labeled training data. Entities, attributes, concepts, and relations, along with all of the types by which these are organized, are available for such slicing and aggregation.
There are, for example, more than 33,000 entity types in KBpedia suitable for such purposes, and more than 20 million instances that can be drawn upon.
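To make the slicing idea concrete, here is a minimal sketch in Python. The type names and the in-memory dictionary are hypothetical stand-ins for illustration only; working with KBpedia itself would go through its RDF/OWL distributions or a SPARQL endpoint rather than a hand-built dict.

```python
# Hypothetical in-memory slice of a knowledge base: entity types
# mapped to sets of their member instances. The type names below
# are illustrative, not actual KBpedia identifiers.
KB_TYPES = {
    "kbpedia:Mammal": {"Lion", "Dolphin", "Bat"},
    "kbpedia:Bird": {"Sparrow", "Penguin"},
    "kbpedia:City": {"Paris", "Nairobi"},
}

def slice_instances(types):
    """Aggregate all instances belonging to the given entity types."""
    out = set()
    for t in types:
        out |= KB_TYPES.get(t, set())
    return out

# Candidate positive examples: instances of animal-related types.
positives = slice_instances(["kbpedia:Mammal", "kbpedia:Bird"])
# Candidate negative examples: instances of an unrelated type.
negatives = slice_instances(["kbpedia:City"])
```

The same pattern scales up: because every instance already sits under one or more types, choosing a set of types immediately yields a labeled aggregation without manual annotation.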
These aggregations and their labels can then be used, depending on the nature of the aggregation, as positive or negative training sets for the supervised learners. Other features may be extracted through unsupervised learning to enhance the overall feature pool. Reference standards ("gold standards") should also be created at this stage so that precision and recall can be measured as the learners are tuned and refined.
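The evaluation step above can be sketched with a few lines of Python. The gold-standard and predicted sets here are invented for illustration; in practice the gold standard would be a manually vetted selection from the knowledge base.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall of a predicted set of instances
    against a gold-standard reference set."""
    tp = len(predicted & gold)  # true positives: correct predictions
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Toy example: 4 gold instances, 3 predictions, 2 of them correct.
gold = {"Lion", "Dolphin", "Bat", "Sparrow"}
predicted = {"Lion", "Dolphin", "Paris"}
p, r = precision_recall(predicted, gold)
# p = 2/3 (two of three predictions correct)
# r = 1/2 (two of four gold instances recovered)
```

Tracking these two numbers across iterations is what lets changes to the learners be compared objectively rather than by inspection.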
Fine-grained aggregations, combined with well-labeled training sets and standards for testing performance, make set-up of the machine learners nearly automatic. The inherent structure and rich feature sets of KBpedia enable successful machine learning projects at much lower cost and in far less time. KBpedia removes the bottleneck of conventional manual vetting and training.