Wikidata - Largest Crowdsourced Knowledge Graph - Open Data

Wikidata is one of the many sister projects of the Wikimedia Foundation.

What is a Knowledge Graph

Or a Knowledge Base, to be more generic. Because the data is usually stored in a graph structure, the two terms are often used interchangeably (Google, for instance, calls its own a Knowledge Graph). When we refer to either, these are the concepts we have in mind:

  • Some sort of formalization of how we represent our data (an ontology!)
  • Data - in terms of entities, events, relationships or any other formalization defined
  • Some functions - for maintenance, cleaning and freshness
  • Some engine - a platform that provides these functions and runs them on the data we have

Some examples of these knowledge bases:

  • Wikidata
  • Google’s Knowledge Graph
    • Freebase
  • Microsoft’s Satori
  • Wolfram Alpha

Wikidata

Wikidata is a document-oriented database, focused on items. Each item represents a topic (or an administrative page used to maintain Wikipedia) and is identified by a unique number, prefixed with the letter Q — for example, the item for the topic Douglas Adams is Q42 — known as a “QID”. This enables the basic information required to identify the topic the item covers to be translated without favouring any language.
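
To make the language-neutral QID concrete, here is a small sketch in SPARQL (the query language covered further down this page). It assumes the prefixes that the Wikidata Query Service predefines (`wd:`, `rdfs:`); the four language codes are an arbitrary choice for illustration.

```sparql
# Labels of the item Q42 (Douglas Adams) in a few languages.
# The QID stays the same; only the label differs per language.
SELECT ?lang ?label
WHERE
{
  wd:Q42 rdfs:label ?label.
  BIND(LANG(?label) AS ?lang)
  FILTER(?lang IN ("en", "fr", "de", "ja"))
}
```

Each result row is one label for the same item, which is why nothing in the identifier itself has to favour a language.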

As of the last update of this page, Wikidata has 57,255,752 data items! If you want to get started yourself and learn more about the how, the what and even the why behind Wikidata, I would strongly recommend the Wikidata Tour, which is maintained by the community to help new people get started with Wikidata.

Can we query it?

SPARQL, I mean, yes. Yes, you can query Wikidata using SPARQL.

SPARQL = An RDF Query Language

You could go as far as saying it's like SQL for data that follows the RDF specifications.

Now what is RDF? RDF = Resource Description Framework. RDF data is basically a set of “Subject - Predicate - Object” triples.
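
As a minimal sketch of a single triple, the query below checks whether one specific statement exists in Wikidata. It assumes the `wd:` and `wdt:` prefixes that the Wikidata query endpoint predefines; the choice of Douglas Adams as the subject is only for illustration.

```sparql
# One triple: subject  predicate  object
#   wd:Q42   = Douglas Adams   (subject)
#   wdt:P31  = "instance of"   (predicate)
#   wd:Q5    = human           (object)
# ASK answers true if a matching triple exists in the data.
ASK { wd:Q42 wdt:P31 wd:Q5. }
```

Replacing any of the three parts with a variable such as `?o` turns the fixed triple into a pattern, which is how SELECT queries are written.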

Wikidata provides a beautiful tool for querying, known as the Wikidata Query Service.

Let's build a query now.

  • For the first query, let's simply get a count of people that have a spouse listed (ever married).
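#Query for Wikidata intro
# Count the humans that have at least one spouse listed.
# DISTINCT keeps a person with several spouses from being counted more than once.
SELECT (COUNT(DISTINCT ?item) AS ?count)
WHERE
{
  ?item wdt:P31 wd:Q5.      # instance of (P31): human (Q5)
  ?item wdt:P26 ?spouse.    # spouse (P26)
}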
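
To look at individual results rather than a single number, a variation along these lines may help. It is only a sketch: the `LIMIT 10` is an arbitrary cap chosen here to keep the result small, and the `wikibase:label` service is what fills in the `?itemLabel` and `?spouseLabel` variables with readable names.

```sparql
# List a few people with a spouse listed, with readable labels.
SELECT ?item ?itemLabel ?spouse ?spouseLabel
WHERE
{
  ?item wdt:P31 wd:Q5.      # instance of (P31): human (Q5)
  ?item wdt:P26 ?spouse.    # spouse (P26)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10
```

A person with more than one spouse shows up on several rows here, one per spouse.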

You can do a lot with these queries; here is a list of example queries. A must-visit page of the internet.

< A live query building session to follow >

Query for nuclear plants set up per country in the last 60 years

Thanks and further communication

Thanks for going through this material; I hope I was able to help in some way.

Any help, whether updating this page or something broader, would be highly appreciated. The best places to communicate would be, in order

Usefulness for the audience

  • There is a huge pool of clean, categorized, context-rich data on the web: any ML engineer’s dream. Better data sets to play with lead to better experiences being built.

  • Learning through practice. I would encourage getting involved in the workings of these KGs, including development, maintenance tasks, data curation, discussions and usage.

  • Empower your day to day - use these open data repositories to supercharge your reports, customer experience and operations.