Webscaled - Dataset Marketplace
NOV 1st 2007

Image credit: Node Gardens

The more we use the Internet, the more we realize that we need new ways to organize the growing mass of information. Today we have a number of tools for adding meaning to the information we drop all over the Web: meaning that computers can process, so that they can help us organize things better. That's the big idea behind the Semantic Web, an idea that seems more and more obvious to us every day. In this field we already have many advanced technologies, starting with those offered by the W3C itself: XML, RDF, OWL, etc.

But I have the feeling that something important is still missing before the Semantic Web can really take off. To understand what it is, we have to take into account that the Web is permanently changing: data is continuously added, modified or deleted. Having semantic information doesn't change this. Search engines still have to crawl the whole Web constantly to detect the slightest change, and the task gets harder every day.

It would be so much more efficient (and simpler) if Web sites could alert the search engines themselves: "Hi Google, I've changed, please visit me!" Furthermore, it would greatly reduce indexing time, until we arrive at "real-time search engines." You might say that this is Google's job, and that they earn enough to endlessly expand their infrastructure. You'd be wrong to think so, because it is not only Google's business. The expansion of the Semantic Web should give rise to various specialized Web sites that need to aggregate a large mass of information focused on one field but liable to appear anywhere on the Web. So what, is everybody going to crawl separately?
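To make the "please visit me" idea concrete, here is a minimal in-memory sketch in Python. Everything in it is an assumption made for illustration: the `ChangeHub` and `Indexer` names, the notion of a "topic," and the example URL are hypothetical and do not describe any existing search engine API. It only shows sites announcing a change once while several interested indexers are told about it, instead of each one crawling separately.

```python
from typing import Callable, Dict, List


class ChangeHub:
    """Hypothetical meeting point where sites announce changes (illustrative only)."""

    def __init__(self) -> None:
        # topic -> callbacks of the indexers interested in that topic
        self._listeners: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._listeners.setdefault(topic, []).append(callback)

    def announce(self, topic: str, url: str) -> int:
        """Tell every interested indexer that `url` changed; return how many were told."""
        listeners = self._listeners.get(topic, [])
        for callback in listeners:
            callback(url)
        return len(listeners)


class Indexer:
    """Stands in for a search engine or a specialized aggregator."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.to_visit: List[str] = []  # pages to re-index, no blind crawling needed

    def on_change(self, url: str) -> None:
        self.to_visit.append(url)


hub = ChangeHub()
general = Indexer("general search")
food = Indexer("restaurant guide")
hub.subscribe("restaurants", general.on_change)
hub.subscribe("restaurants", food.on_change)
hub.subscribe("books", general.on_change)

changed_url = "http://restaurant.example/contact"
notified = hub.announce("restaurants", changed_url)
```

In this sketch the site announces its change once, and both the general indexer and the specialized one learn about it without either of them crawling.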

But beyond the search engine issue, there is so much more. Let's imagine that I want to make an online address book. The easiest way is to store the data myself: I save my addresses in some database and that's it. There is another approach, a little more complicated but so much more interesting: storing links that make it possible to fetch the information from its source at any moment. In my address book, for instance, I will have a record directly connected to my favorite restaurant. This restaurant has a Web site that includes exactly what I'm looking for: the restaurant exposes its details in a format that my address book recognizes, such as hCard. Therefore I will subscribe my address book to this Web site.

Then, thanks to the subscription, the restaurant's details would be available "on demand" and show in my address book. Of course, we can optimize the process by keeping a copy of the information in a local database. But let's make it clear, it is only a temporary copy, a kind of "cache" if you want. The real data stays at the source, on the restaurant's Web site. If a change occurs (e.g. the restaurant moves) changes will be automatically reported in my address book.
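The subscription-plus-cache idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `Source` and `AddressBook` are hypothetical names, the "Web site" is an in-memory object rather than a real page exposing hCard, and the notification is a direct function call. The point it illustrates is that the address book holds a link and a disposable copy, while the authoritative data stays at the source.

```python
from typing import Callable, Dict, List


class Source:
    """Stands in for the restaurant's Web site: the authoritative copy of the data."""

    def __init__(self, url: str, details: Dict[str, str]) -> None:
        self.url = url
        self.details = details
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def update(self, **changes: str) -> None:
        """Change the data at the source, then alert every subscriber."""
        self.details.update(changes)
        for notify in self._subscribers:
            notify(self.url)


class AddressBook:
    """Stores links to sources, plus a temporary local copy (the cache)."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}
        self._cache: Dict[str, Dict[str, str]] = {}

    def subscribe_to(self, source: Source) -> None:
        self._sources[source.url] = source
        source.subscribe(self._invalidate)

    def _invalidate(self, url: str) -> None:
        # The cached copy is stale; drop it so the next lookup refetches.
        self._cache.pop(url, None)

    def lookup(self, url: str) -> Dict[str, str]:
        if url not in self._cache:
            # "Fetching" is simulated by copying the source's current data.
            self._cache[url] = dict(self._sources[url].details)
        return self._cache[url]


restaurant = Source(
    "http://restaurant.example/",
    {"name": "Chez Exemple", "street": "1 Old Street"},
)
book = AddressBook()
book.subscribe_to(restaurant)

before = book.lookup(restaurant.url)["street"]
restaurant.update(street="9 New Avenue")  # the restaurant moves
after = book.lookup(restaurant.url)["street"]
```

Dropping the cached copy on notification, rather than pushing the new data, keeps the source as the only place where the real record lives.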

This "data subscription method" seems to be an interesting way to reach a kind of decentralized database able to work on a worldwide scale. But there is a much more essential aspect: the idea of "backlinks." A subscription actually comes down to weaving a bidirectional link from one piece of information to another. This very small concept has enormous consequences. The computer would now be able to determine how pieces of data are connected to each other, and would suddenly become a lot more intelligent.

Let's take a look at another example to understand exactly what's going on. What if our restaurant wants to collect comments from its customers and show them on its Web site, in a sort of visitors' book for instance? The restaurant would simply add a form to its Web site so that customers could leave their comments.

But the restaurant's owner is a lot more ambitious: he also wishes to show the comments that have been stored somewhere else, on other Web sites. Whether they are reviews by food critics or miscellaneous opinions given on the Web, our restaurant's owner would like to display all of them on his own Web site. Unfortunately, the current Web offers no easy way to accomplish this. There is no easy way to find all the information related to our restaurant.

This is where the concept of "backlinks" could be very useful. Chances are that the miscellaneous comments spread across the Web already include a link to the restaurant's Web site. But unfortunately, in our old Web, those links are one-way. The restaurant's Web site doesn't even know about them, unless it runs a "link:" query on a search engine or inspects the HTTP "Referer" headers, and both remain unsatisfactory (in the world of blogs, there is something called the Pingback protocol). In any case, let's suppose that links are bidirectional: whenever a comment is posted somewhere, the restaurant's Web site is alerted so that it can store the corresponding "backlink." Then you would only have to follow the different backlinks to find all the comments linked to our famous restaurant.
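As a rough illustration of bidirectional links, here is a small Python sketch. It is a toy model, not Pingback or any real protocol: the `ToyWeb` class, its method names and the example URLs are all hypothetical. Publishing a page that links to a target also records a backlink on the target's side, so the target can later walk its backlinks to collect everything written about it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class ToyWeb:
    """Toy model of a Web whose links are recorded in both directions."""

    def __init__(self) -> None:
        self.pages: Dict[str, str] = {}  # url -> page text
        self.backlinks: Dict[str, Set[str]] = defaultdict(set)  # target -> linking urls

    def publish(self, url: str, text: str, links_to: Iterable[str] = ()) -> None:
        self.pages[url] = text
        for target in links_to:
            # The "alert": the target learns that `url` links to it.
            self.backlinks[target].add(url)

    def comments_about(self, target: str) -> List[str]:
        """Follow the backlinks of `target` and return the linking pages' text."""
        linking_urls = sorted(self.backlinks.get(target, ()))
        return [self.pages[url] for url in linking_urls]


web = ToyWeb()
restaurant_url = "http://restaurant.example/"
web.publish("http://critic.example/review", "Excellent duck.", [restaurant_url])
web.publish("http://blog.example/post", "Slow service.", [restaurant_url])
web.publish("http://other.example/", "Nothing to do with food.")

# Results are ordered by the linking page's URL, only to keep the output stable.
comments = web.comments_about(restaurant_url)
```

Here the restaurant never searches the Web: the two pages that link to it are found directly through the stored backlinks, and the unrelated page is ignored.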

Relational databases work no differently: there the "backlink" concept prevails, and we couldn't imagine it any other way. But this isn't how the Web works. Is that good or bad? I cannot say... However, if we want to achieve the World Wide Database dream someday, I think we should seriously consider mechanisms that bring subscriptions and "backlinks" to the Web, thereby allowing semantic information to really "exist."

Trackback URL for this entry:

http://www.semanticfocus.com/blog/tr/id/443621/


Comments for this entry:

  1. Posted by Yihong Ding on November 1, 2007 at 9:18am

    Manuel,

    The idea of Object Oriented Web is interesting, and I am looking forward to the following parts of your series.

    I agree that backlinks are very useful. But I am just a little bit worried about how you are going to implement them. In fact, the idea of backlinks has been considered since the very beginning, when web links were invented. But it has always been a difficult thing to do. It involves two main difficulties.

    First, how could the inbound side know that it is linked? You mentioned a term called "subscription." Although this is a workable method, it is really tedious to link a site while at the same time subscribing, at least with the current technology. Do you have a satisfactory solution to this problem?

    Second, a deeper concern is that backlinks violate the basic structure of web links, i.e. one outbound and one inbound. A normal link has only one target, and it represents one typical meaning in a web document. Backlinks, however, go from one to many, and many of the backlinks have the same meaning, while others do not. This semantic complexity makes the issue of backlinks extremely difficult. It is no longer just about subscription, but also about the complexity of the various subscriptions and the automated methods for disambiguating the semantics. On the global scale, you can see how difficult this knowledge management problem could become.

    I hope to see how you might address these concerns in your future posts. Anyway, this is a good discussion about the future web. Keep going!

    best,

    -- Yihong

  2. Posted by Manuel Vila on November 1, 2007 at 11:15am

    Thanks Yihong for your encouragement! :) You are actually quite right, my vision doesn't fit the existing web at all! My approach is to forget everything that exists today in order to reinvent the web. I am not sure what will emerge, for now I have just the premonition that it might be interesting.

  3. Posted by Elliot Turner on November 1, 2007 at 9:55pm

    Great thoughts; I agree that we need more bidirectional capability within the web. The "web of static documents" mindset can only take us so far. Subscriptions offer many benefits, but significant drawbacks as well. Scalability concerns are hard to overcome, as are many of the legal issues that can complicate hierarchical publish-subscribe systems. Some attempts have been made at defining standardized web-based subscription mechanisms, WS-Notification being one of them. None of these have gotten much traction outside of very narrow applications, however.

    The concept of "back links" is interesting. I personally feel something along these lines will end up 'winning the day' -- lightweight approaches that are easy to adopt, and usable when annotating "dirty" datasets.

  4. Posted by jay flights on December 31, 2008 at 2:38pm

    Well, the beauty of the web is in its grossness, and this is also its main actuality: the gross, combined with an attempt to get that grossness onto a piece of napkin on your dinner table. This is what a search engine attempts to do, but it fails, because it cannot understand. The search engine is a kid, or better, a puppy: it has a basic structure for understanding. It is quick and rapid, but it has very little brain; it does not understand the human tongue. This is where the semantic web comes to the fore. Well, there is Microsoft's acquired Powerset, but it is not semantic at all. Use it and you will know: it searches Wikipedia only. Let this not be a hindrance, but it is not semantic, oh no; it has not yet crossed the threshold of babyhood either.

  5. Posted by oto aksesuar on February 12, 2009 at 4:28am

    Oto Aksesuar, Oto Paspas, Oto Cam Rüzgarl???, Oto Koltuk K?l?f?, Ampul, Xenon, Kokpit Gö?üs Ve Maun Kaplama Tuning Far ve Stop omsa,demircio?lu ,demircio?lu oto ,araba

  6. Posted by James on February 18, 2009 at 2:43am

    Backlinks are a good way to interlink sites with each other. But I agree with the issues raised by Yihong Ding in the previous posts.

    This is my first time on this blog, and I am very interested to see such good material here.

  7. Posted by get backlink on February 18, 2009 at 7:34pm

    To get backlink is really a big deal from high PR websites.
    better to increase your backlink by regular directory submission and blog submission.

  8. Posted by febin on April 3, 2009 at 4:42am

    Hey,

    Have you guys had a chance to look into Semandeks (http://semandeks.com)?

  9. Posted by Mark on May 11, 2009 at 3:59am

    I agree that backlinking is an imperative part of website SEO right now. Quality backlinks will quickly put your site on the map, provided you at least still provide some good content. Here is something I saw on a forum that might help out: http://marketplace.sitepoint.com/listings/66256

  10. Posted by sohbet on May 31, 2009 at 8:32pm

    thanks good site.

    ----
    Sohbet

    chat

  11. Posted by BackLinkStat on June 9, 2009 at 2:25am

    Try BackLinkStat.com - Get detailed backlinks report of your site for FREE!

  12. Posted by backlinks on June 23, 2009 at 5:18pm

    nice backlinks is that

  13. Posted by backlinks on October 6, 2009 at 4:23pm

    Thanks a lot for this meaningful article...
