OCT 29th 2008

Cross-Pollinating DBpedia and Freebase

Now that Freebase is available as Linked Data, a big question that comes to mind is whether these two major projects will move to assimilate one another. DBpedia and Freebase are two endeavors primarily focused on curating unstructured and semi-structured data about everything and releasing it back into the wild (with structure). Both get the bulk of their information from Wikipedia, so the amount of topical overlap is assumed to be extremely high. DBpedia gains new information when it extracts data from the latest Wikipedia dump, whereas Freebase, in addition to Wikipedia extractions, gains new information through its userbase of editors.

It is this incredible amount of overlap (in both content and purpose) that creates a sort of paradox: it can be speculated that DBpedia and Freebase would both gain and lose value through efforts to cross-pollinate. Assimilating each other's updates would make both "more complete" (in the same sense that an incrementing number is closer to infinity after each increment), thus gaining value. However, both may also lose value if "value" means the perception of being "the most complete database about everything." Freebase may see a drop in userbase growth and participation if it becomes a mirror of DBpedia (or vice versa), and the popularity once garnered by one project may shift towards the other, or away from both entirely.

This may not be an actual paradox, since we're talking about mixing two different perceptions of value (value from the developer's point of view and value from the point of view of the project itself), but we must still look at it from both vantage points. This may simply be another case of business interest vs. developer interest. All issues regarding popularity and ubiquity aside, cross-pollination is a Good Thing for the Semantic Web and Linked Data in general.

About the author

James Simmons

It's my goal to help bring about the Semantic Web. I also like to explore related topics like natural language processing, information retrieval, and web evolution. I'm the primary author of Semantic Focus and I'm currently working on several Semantic Web projects.


Comments for this entry:

  1. Posted by John Bäckstrand on October 30, 2008 at 2:14am

    I see DBpedia more as an RDF-ified version of Wikipedia, and Freebase as a Wikipedia-with-semantics "fork" that is kept semi-synced. I really see no large benefits from assimilation here. If Freebase users want some specific part of DBpedia in there, they will probably just pull it in. Having the same data as DBpedia has no real value in itself, other than making it easier for people to link to the right things.

    I actually doubt there is much in DBpedia that is not in Freebase, but I could be wrong.

  2. Posted by James Simmons on October 30, 2008 at 5:13pm

    >>If freebase users wants some specific part of dbpedia in there, they will probably just pull it in.

    That is precisely an example of Freebase assimilating DBpedia, although that may have been the point you were making. I think having the same data as DBpedia would be entirely beneficial to Freebase (and vice-versa), as that would mean extending the coverage of all knowledge.

    I would be surprised if DBpedia and Freebase contained exactly the same information (not including Freebase's user contributions), simply because both projects use their own techniques for extracting information from Wikipedia dumps. As an extreme example, if one project began using NLP to extract facts from article text then we would see very different (and much more) data being extracted from Wikipedia, which the other project would certainly benefit from assimilating.

    This benefit becomes very clear if it is Freebase that takes that step first, because its methods are proprietary and DBpedia would not be able to acquire that data on its own. The question on my mind right now, though, is whether any of this will matter once the data becomes somewhat ubiquitous across the Web.
