OCT 30th 2008

The Seesaw Effect of Algorithms vs. Data

Over the years I've noticed that the importance of algorithms and data tends to shift back and forth, depending on which is hardest to duplicate at the time (often from a business perspective). The effect seems to be driven by the availability of, or demand for, one side increasing or decreasing, shifting the balance of importance to the other. At one point the world of software was dominated by the proprietary. The organization with the best software (backend, algorithms, etc.) was the dominant entity, and data (from, say, a Web 2.0 perspective) was generally not the focus. This may have been partly the product of a mindset formed in an era with very little storage space, before mass user activity on the Web.

Things have changed, and the word "proprietary" has become something of a developer faux pas. Open source has caused a paradigm shift away from the old proprietary software models and has allowed organizations to focus their attention on the other side of the equation: data. This shift marked the start of the Web 2.0 era (perhaps with a few years of padding before the phrase started floating around). Now many organizations focus on the data they acquire and how they can leverage it to their advantage, and as a result we see many walled gardens built to preserve that advantage.

However, we may be seeing another shift, this time back to software. The Semantic Web calls for making data open and ubiquitous, a strong paradigm shift away from the walled garden mindset (and most people understand this, especially the business set). After writing about the cross-pollination of DBpedia and Freebase, it occurred to me that the project with the most advanced proprietary information extraction algorithms would in a sense be the "dominant" project, because it could leverage its software in a space where data is becoming a commodity.

Freebase has a secret sauce, and that is probably its biggest advantage over competing projects. In the Semantic Web/Linked Data Web/Web 3.0 (whatever we feel like calling it at the time), data may decrease in value as it spreads and becomes commoditized; at least in the original sense of the value it once had, as an asset that only the walled gardens could leverage.

We are seeing the walls come down, possibly to be replaced once again by proprietary algorithms.

About the author

James Simmons

It's my goal to help bring about the Semantic Web. I also like to explore related topics like natural language processing, information retrieval, and web evolution. I'm the primary author of Semantic Focus and I'm currently working on several Semantic Web projects.

