NOV 1st 2007

Image credit: Node Gardens

The more we use the Internet, the more we realize the necessity of finding new solutions to better organize the growing mass of information. Today we have a number of tools for adding meaning to the information that we drop all over the Web: meaning that computers can comprehend, so that they can help us organize things better. That's the big idea behind the Semantic Web, an idea which appears more and more obvious to us every day. In this field, we already have many advanced technologies, starting with those offered by the W3C itself: XML, RDF, OWL, etc.

But I have the feeling that something important is still missing to allow the Semantic Web to really take off. To understand what it is, we have to take into account the fact that the Web is permanently changing: data is continuously added, modified or deleted. Having semantic information doesn't change that; it just challenges the search engines more every day to crawl the whole Web constantly in order to detect the slightest change.

It would be so much more efficient (and simple) if Web sites could alert the search engines themselves: "Hi Google, I've changed, please visit me!". It would also sharply reduce indexing time, bringing us closer to "real-time search engines." You might say that this is Google's problem, and that they earn enough to keep expanding their infrastructure endlessly. You'd be wrong, because it is not only Google's business. The expansion of the Semantic Web should give rise to various specialized Web sites that need to aggregate a large mass of information focused on one field but liable to appear anywhere on the Web. So what, is everybody going to crawl separately?
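To make the "please visit me" idea concrete, here is a minimal sketch in Python. It is only an illustration: the `ChangeNotifier` class, its `watch` and `ping` methods, and the `restaurant.example` URL are all hypothetical names, and everything happens in memory, whereas a real system would carry these alerts over HTTP.

```python
from typing import Callable, Dict, List


class ChangeNotifier:
    """Hypothetical in-memory stand-in for a site-to-crawler alert channel."""

    def __init__(self) -> None:
        # Maps a site URL to the callbacks of everyone watching it.
        self._watchers: Dict[str, List[Callable[[str], None]]] = {}

    def watch(self, url: str, on_change: Callable[[str], None]) -> None:
        """Register a search engine or aggregator that cares about `url`."""
        self._watchers.setdefault(url, []).append(on_change)

    def ping(self, url: str) -> int:
        """Called by the site itself when it changes; returns the number of watchers alerted."""
        watchers = self._watchers.get(url, [])
        for on_change in watchers:
            on_change(url)
        return len(watchers)


# Two independent services share one notification instead of crawling separately.
search_engine_queue: List[str] = []
food_aggregator_queue: List[str] = []

notifier = ChangeNotifier()
notifier.watch("http://restaurant.example/", search_engine_queue.append)
notifier.watch("http://restaurant.example/", food_aggregator_queue.append)

# The restaurant's site announces its own change.
alerted = notifier.ping("http://restaurant.example/")
```

Each watcher ends up with exactly one URL to revisit, rather than having to re-crawl the whole Web to find out what moved.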

But the issue goes well beyond search engines. Let's imagine that I want to make an online address book. The easiest way consists of storing the data myself; I will save my addresses in any database and that's it. There is another approach, a little bit more complicated but so much more interesting: you can store some kind of link that makes it possible to fetch the information from its source at any moment. In my address book, for instance, I will have a record directly connected to my favorite restaurant. This restaurant has a Web site which includes exactly what I'm looking for: the restaurant exposes its details in a format that my address book recognizes, such as hCard. Therefore I will subscribe my address book to this Web site.

Then, thanks to the subscription, the restaurant's details would be available "on demand" and shown in my address book. Of course, we can optimize the process by keeping a copy of the information in a local database. But let's make it clear: it is only a temporary copy, a kind of "cache" if you want. The real data stays at the source, on the restaurant's Web site. If a change occurs (e.g. the restaurant moves), it will be automatically reflected in my address book.
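The address book example can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real protocol: `Source` and `AddressBook` are hypothetical classes, the restaurant's details are invented, and only the keys `fn` and `adr` are borrowed from hCard property names.

```python
from typing import Callable, Dict, List

Details = Dict[str, str]


class Source:
    """Stands in for the restaurant's Web site, which holds the real data."""

    def __init__(self, details: Details) -> None:
        self.details = details
        self._subscribers: List[Callable[[Details], None]] = []

    def subscribe(self, callback: Callable[[Details], None]) -> None:
        self._subscribers.append(callback)
        callback(dict(self.details))  # deliver the current state right away

    def update(self, **changes: str) -> None:
        """Change the data at the source and push a fresh copy to every subscriber."""
        self.details.update(changes)
        for callback in self._subscribers:
            callback(dict(self.details))


class AddressBook:
    """Keeps only a temporary copy (a cache) of data owned by each source."""

    def __init__(self) -> None:
        self._cache: Dict[str, Details] = {}

    def subscribe_to(self, name: str, source: Source) -> None:
        def refresh(details: Details) -> None:
            self._cache[name] = details

        source.subscribe(refresh)

    def lookup(self, name: str) -> Details:
        return self._cache[name]


# Invented example data; "fn" and "adr" follow hCard property names.
restaurant = Source({"fn": "Chez Example", "adr": "1 Old Street"})
book = AddressBook()
book.subscribe_to("favorite restaurant", restaurant)

address_before = book.lookup("favorite restaurant")["adr"]
restaurant.update(adr="2 New Avenue")  # the restaurant moves
address_after = book.lookup("favorite restaurant")["adr"]
```

The address book never edits the record itself; its copy is simply replaced whenever the source announces a change.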

This "data subscription method" seems to be an interesting way to reach a kind of decentralized database able to work on a worldwide scale. But there is a much more essential aspect: the idea of "backlinks." A subscription comes down to weaving a bidirectional link from one piece of information to another. This very small concept has enormous consequences. The computer would now be able to determine how pieces of data are connected to one another, and would suddenly become a lot more intelligent.

Let's take a look at another example to understand exactly what's going on. What if our restaurant wants to collect comments from its customers and show them on its Web site, in a sort of visitors' book for instance? The restaurant would simply add a form to its Web site so that customers could leave their comments.

But the restaurant's owner is a lot more ambitious: he also wishes to show the comments that have been stored somewhere else, on other Web sites. Whether they are reviews by food critics or miscellaneous opinions given on the Web, our restaurant's owner would like to display all of them on his own Web site. Unfortunately, the current Web offers no easy solution for this. There is no easy way to find all the information related to our restaurant.

This is when the concept of "backlinks" could be very useful. Chances are that the miscellaneous comments spread across the Web already include a link to the restaurant's Web site. But unfortunately, in our old Web, those links are one-way. The restaurant's Web site doesn't even know about them, unless it makes a "link:" request to a search engine or inspects the "HTTP Referers," and neither option is satisfactory (in the world of blogs, there is something called the Pingback protocol). In any case, let's say that the links are bidirectional: when a comment is posted somewhere, the restaurant's Web site is alerted so that it can store the corresponding "backlink." Then you would only have to follow the different backlinks to find all the comments linked to our famous restaurant.
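Here is a minimal sketch of such bidirectional links in Python. It is a toy model, not an implementation of Pingback or of any existing protocol: the `ToyWeb` class, its method names and all URLs are hypothetical, and the "alert" to the restaurant's site is reduced to recording the source URL in a dictionary.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set


class ToyWeb:
    """Toy model of a Web in which every link is recorded in both directions."""

    def __init__(self) -> None:
        self.pages: Dict[str, str] = {}  # url -> page text
        self.backlinks: DefaultDict[str, Set[str]] = defaultdict(set)  # target -> sources

    def publish(self, url: str, text: str, links_to: Iterable[str] = ()) -> None:
        """Publish a page; each link target is 'alerted' and stores a backlink."""
        self.pages[url] = text
        for target in links_to:
            self.backlinks[target].add(url)

    def comments_about(self, target: str) -> List[str]:
        """Follow the backlinks of `target` and collect the text found there."""
        sources = sorted(self.backlinks.get(target, set()))
        return [self.pages[source] for source in sources]


web = ToyWeb()
restaurant_url = "http://restaurant.example/"
web.publish(restaurant_url, "Welcome to our restaurant")
web.publish("http://blog-a.example/review", "Great soup", links_to=[restaurant_url])
web.publish("http://blog-b.example/post", "Slow service", links_to=[restaurant_url])
web.publish("http://blog-c.example/misc", "Unrelated page")

comments = web.comments_about(restaurant_url)
```

The restaurant's owner gets the two comments that link to his site, and nothing else, without searching the rest of the Web.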

Relational databases work exactly this way: there, the "backlink" concept is the norm, and we couldn't imagine them working differently. But this isn't how the Web works. Is that good or bad? I cannot say... However, if we want to someday achieve the World Wide Database dream, I think we should seriously consider mechanisms that provide subscriptions and "backlinks," thereby allowing semantic information to really "exist."
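The comparison with relational databases can be illustrated with Python's built-in `sqlite3` module. The table and column names and the sample rows below are invented for the example; the point is only that a single reference from a comment to a restaurant can be followed in either direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        restaurant_id INTEGER REFERENCES restaurant(id),
        body TEXT
    );
    INSERT INTO restaurant VALUES (1, 'Chez Example');
    INSERT INTO comment VALUES (1, 1, 'Great soup');
    INSERT INTO comment VALUES (2, 1, 'Slow service');
    """
)

# Forward direction: from a comment to the restaurant it points at.
forward = conn.execute(
    "SELECT r.name FROM comment c JOIN restaurant r ON r.id = c.restaurant_id "
    "WHERE c.id = ?",
    (1,),
).fetchone()

# Backward direction: from the restaurant to every comment pointing at it.
backward = conn.execute(
    "SELECT body FROM comment WHERE restaurant_id = ? ORDER BY id",
    (1,),
).fetchall()

conn.close()
```

The second query is the database equivalent of asking for the backlinks of a page, a question the Web cannot answer on its own today.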


Comments for this entry:

  1. Posted by Yihong Ding on November 1, 2007 at 9:18am

    Manuel,

    The idea of Object Oriented Web is interesting, and I am looking forward to the following parts of your series.

    I agree that backlinks are very much useful. But I am just a little bit worried about how you are going to implement these backlinks. In fact, the idea of backlinks has been thought from the very beginning when web links were invented. But it is always a difficult thing to do. It involves two main difficulties.

    First, how could the inbound side know that it is linked? You mentioned a term called "subscription." Although this is a workable method, it is really tedious to link a site while at the same time subscribing, at least with the current technology. Do you have a satisfactory solution to this problem?

    Second, a deeper concern is that backlinks violate the basic structure of web links, i.e. one outbound and one inbound. A normal link only has one target and it represents one typical meaning in a web document. Backlinks, however, are from one to many, and many of the backlinks have the same meaning, while some others do not. This complexity of semantics makes the issue of backlinks extremely difficult. It is no longer just about subscription, but also about the complexity of various subscriptions and the automated methods for disambiguating the semantics. On a global scale, you can see how difficult this knowledge management problem could become.

    I wish to see how you might adjust these concerns in your future posts. Anyway, this is a good discussion about the future web. Keep on going!

    best,

    -- Yihong

  2. Posted by Manuel Vila on November 1, 2007 at 11:15am

    Thanks Yihong for your encouragement! :) You are actually quite right, my vision doesn't fit the existing web at all! My approach is to forget everything that exists today in order to reinvent the web. I am not sure what will emerge, for now I have just the premonition that it might be interesting.

  3. Posted by Elliot Turner on November 1, 2007 at 9:55pm

    Great thoughts; I agree that we need more bidirectional capability within the web. The "web of static documents" mindset can only take us so far. Subscriptions offer many benefits, but significant drawbacks as well. Scalability concerns are hard to overcome, as are many of the legal issues that can complicate hierarchical publish-subscribe systems. Some attempts have been made at defining standardized web-based subscription mechanisms, WS-Notification being one of them. None of these have gotten much traction outside of very narrow applications, however.

    The concept of "back links" is interesting. I personally feel something along these lines will end up 'winning the day' -- lightweight approaches that are easy to adopt, and usable when annotating "dirty" datasets.
