Demystifying the Google Knowledge Graph (Part II)
The contribution of the Knowledge Graph to the advent of the semantic web
In Part 1 of this series, we examined how Google evolved between 2006 and 2012. We also looked at how some of their projects, such as Google Squared and Wonder Wheel, set out to make web queries and search results more intelligent by applying the principles of the Semantic Web. In this way, these projects laid the groundwork for Google’s current efforts.
The Semantic Web is a collaborative movement led by the W3C that promotes the restructuring of the web through the use of common data formats, in an attempt to organize documents into a single framework: a “web of data”. By including semantic content on websites, and by using languages specifically designed for data, such as RDF (Resource Description Framework), the Semantic Web makes data available to be accessed and used by enterprises, applications, and programs.
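To make the “web of data” idea concrete, here is a minimal sketch of the RDF model in plain Python. It does not use a real RDF library, and the `ex:` identifiers are illustrative stand-ins for the URIs a real vocabulary would define; the point is simply that every statement is a subject–predicate–object triple, and a pool of triples from many sources can be queried as one dataset.

```python
# A minimal sketch of the RDF idea: every statement is a
# (subject, predicate, object) triple, and triples merged from
# many sources form one queryable "web of data".
# The "ex:" identifiers below are illustrative, not real vocabularies.

triples = [
    ("ex:TimBernersLee", "ex:invented", "ex:WorldWideWeb"),
    ("ex:TimBernersLee", "ex:bornIn", "ex:London"),
    ("ex:WorldWideWeb", "ex:proposedIn", "1989"),
]

def query(triples, s=None, p=None, o=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]

# What do we know about Tim Berners-Lee?
print(query(triples, s="ex:TimBernersLee"))
# Who invented the World Wide Web?
print(query(triples, p="ex:invented", o="ex:WorldWideWeb"))
```

Real systems express the same pattern matching with SPARQL over RDF stores, but the principle is identical: machines answer questions by matching triple patterns rather than by matching keywords in documents.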
While the term “Semantic Web”, also known as Web 3.0, has experienced a resurgence of sorts, the underlying concept has existed since the beginning of the Web. The term was used as early as 1994 by the inventor of the World Wide Web, Tim Berners-Lee.
“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
The Semantic Web relies on the idea of free and accessible information, something that runs contrary to the common practices of several big players, such as Amazon, Apple, and eBay. Tim Berners-Lee condemned these kinds of practices in his open letter entitled “Long Live the Web,” published in Scientific American in December 2010. He followed this by criticizing what Forrester has dubbed the ‘Splinternet’, blaming, in equal measure, the proprietary platforms offered by Apple and Nokia, such as tablets and other mobile devices. His sights, however, are clearly set on Facebook, which holds information on a billion individuals worldwide (approximately 15% of the world’s total population). Larry Page, during his recent interview with Charlie Rose, didn’t hesitate to lash out at Facebook for hoarding user information. Facebook defends itself by claiming that it values and supports personal privacy, but there seem to be inconsistencies in its position, as it has made user information available to Yahoo!. The concept of the Splinternet is illustrated in the figure below.
To improve the odds of reaching their goal of semantic search, Google chose to rely on a structured database of facts. In addition to leveraging existing information pools such as Wikipedia and the CIA World Factbook, Google acquired Metaweb, the company behind Freebase, in 2010. Freebase is a collaborative knowledge base that relies on its online community to create and maintain metadata about entities, in an effort to support open and free information. Since the acquisition, Freebase has grown from 12 million entities to more than 200 million as of March 15th, 2012, the date on which Amit Singhal published his article on Google+. Furthermore, Google has just invested in Wikidata, the new initiative from the Wikimedia Foundation, the organization behind Wikipedia. Wikidata will be a database of structured information extracted from Wikipedia that will be editable and accessible by any person or machine, and certainly by Google.
Google also places emphasis on semantic search as a means to return search results in real time. In 2009, Google attempted this by developing a real-time search capability under the Caffeine effort, but was unable to establish its efficacy. Despite Google’s partnership with Twitter to include Twitter’s most recent status updates in search results, the search engine still needed additional signals to interpret queries about content that is uploaded almost instantaneously. In the end, the task proved overwhelming, and Google was unable to generate pertinent results for the flood of queries that followed Michael Jackson’s death. See the illustration below.
Google initially misinterpreted the sudden spike in searches: for about 25 minutes, some users searching Google News saw a “We’re sorry” error page. Worse, the answer Google provided as the first result was irrelevant, pointing to a different Michael Jackson with no relation to the users’ queries.
Analysts then spoke about the opportunity for a semantic engine better suited to real-time search. Indeed, during the frenzy that followed the death of Michael Jackson, of all the engines (Google, Yahoo!, and Bing included), only Hakia gave an unequivocal result.
Hakia is a semantic search engine in the same vein as Powerset (acquired by Microsoft in 2008) and Wolfram|Alpha, Bing’s current partner for semantic search. For further details on this episode, I refer you to Nick Burcher’s post, which was published at the time.
In June 2010, Google announced the completion of Caffeine, its new web indexing system, and in November 2011, building on the momentum from Caffeine, they announced a significant improvement to their ranking algorithm that impacts roughly 35 percent of searches by favoring fresher results. For comparison, a typical major update affects around 10 percent of queries, which shows just how important real-time search has become!
Since that time, Google’s agreement with Twitter has expired, and Google has launched Google Plus, which currently counts 170 million users. Thanks to this social platform’s stream of recent posts, Google can rely on its semantic features to give meaning to all of these interactions, and also to interpret social signals from other social networks connected to Google Plus accounts.
For further details on the history of the Semantic Web, as well as key concepts such as ontologies, schemas, RDF, and HTML5 microdata, I would invite you to read my article (in French) titled “Google Knowledge Graph : bienvenue Web sémantique” (“Google Knowledge Graph: welcome to the Semantic Web”).
In Part 3 of this series, we will outline how the advent of the Knowledge Graph impacts search engine optimization and paid search, as well as how to take advantage of the coming changes.
*This article reflects my personal opinion and does not necessarily represent the position of Mediative.