FEB 15th 2008

The Web as we know it today is an ecosystem of people, documents, machines, and an exponentially increasing amount of unstructured information. Everyone is free to change the landscape of the Web, and millions of us (people, that is) have taken a crack at it, shaping it as we see fit. This usually means creating our own Web sites, but anyone who contributes in any way is actively changing the structure of the Web. Such changes will only become more obvious and pervasive as we approach the full-scale vision of the Semantic Web.

The Web has survived (perhaps even thrived) because it is the most fault-tolerant public information system on the planet. We can remove a Web site, a node in the global Web graph, and its removal is unlikely to change the overall structure of the Web in a way that harms its usefulness or availability. In the Semantic Web, however, a small change to a single RDF document can have cascading (possibly disruptive) effects on the people, Web sites, and even machine agents that reference it. The Semantic Web must therefore retain this inherent fault tolerance or be doomed to failure.

We cannot forget one important fact about the Web: It's made by people, for people. People do dumb things, make mistakes, deceive, spitefully do wrong, and the list goes on. The Web has continued to thrive because it accepts all forms of input, good and bad. Neither can disrupt the state of the Web because the Web is not a central database; as stated earlier, it's more like an ecosystem of free agents (you and me). The Semantic Web, although often thought of as "the Web as a database," will still need to behave like the current Web.

Problems will arise. A few come to mind off the top of my head, and some very bright people may already have come up with solutions for some of them. Here are the problems I foresee:

  • An ontology changes without the applications that use it being aware of the change, so they keep operating on the old version of the ontology.
  • A person, Web site, or machine agent relies on a piece of RDF-encoded information being available at a URI, but the information can no longer be retrieved (HTTP 404 Not Found).
  • Relied-upon information is compromised, spammed, or even made dangerously erroneous (perhaps the worst-case scenario).

Don't be discouraged into thinking these problems are insurmountable. Swap out some of the terminology and all three already exist on today's Web. We must deal with them in the future the same way we deal today with an RSS feed that is no longer available or is full of spam.

Is there a viable solution to the ever-changing nature of the Semantic Web? I believe so, perhaps through some kind of versioning system. Ontology versioning could be accomplished either formally, with the ontology's creator publishing versions, or automatically, by caching and versioning ontologies according to the date they were used. The author of an ontology could create "version maps" that let an agent "walk up or down" the version timeline, ensuring that the meaning of the information it relies on is preserved across versions. Will we start seeing ontology/vocabulary caches spring up? Probably, but you heard it first.
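The post doesn't say what a version map would look like, so here is a minimal sketch under hypothetical assumptions: a version map is an ordered list of ontology versions plus, for each adjacent pair of versions, a table of renamed terms. The `VersionMap` class, its `translate` method, and the `ex:` terms are illustrative inventions, not part of RDF, OWL, or any existing library. "Walking up" applies the renames forward; "walking down" applies them in reverse.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VersionMap:
    """Hypothetical version map for a single ontology.

    versions: version identifiers, oldest first.
    renames: for each adjacent pair (older, newer), a dict mapping a term's
             name in the older version to its name in the newer one.
             Terms that are not listed are assumed to be unchanged.
    """
    versions: list[str]
    renames: dict[tuple[str, str], dict[str, str]] = field(default_factory=dict)

    def translate(self, term: str, src: str, dst: str) -> str:
        """Walk up (toward newer) or down (toward older) from src to dst."""
        i, j = self.versions.index(src), self.versions.index(dst)
        step = 1 if j > i else -1
        while i != j:
            current, nxt = self.versions[i], self.versions[i + step]
            if step == 1:
                # Walking up: apply the forward renames for this step.
                mapping = self.renames.get((current, nxt), {})
            else:
                # Walking down: invert the forward renames from nxt to current.
                # Assumes renames are one-to-one; splits/merges are not handled.
                forward = self.renames.get((nxt, current), {})
                mapping = {new: old for old, new in forward.items()}
            term = mapping.get(term, term)
            i += step
        return term


if __name__ == "__main__":
    vm = VersionMap(
        versions=["1.0", "1.1", "2.0"],
        renames={
            ("1.0", "1.1"): {"ex:author": "ex:creator"},
            ("1.1", "2.0"): {"ex:creator": "ex:maker"},
        },
    )
    print(vm.translate("ex:author", "1.0", "2.0"))  # ex:maker
    print(vm.translate("ex:maker", "2.0", "1.0"))   # ex:author
```

An agent written against version 1.0 could use `translate` to find what its terms are called in the current version, and a newer agent could walk down to read older data. Real ontology evolution also involves terms being split, merged, or redefined, which a simple rename table cannot express.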

The Semantic Web will likely have a funny way of just working out, much like the current Web has. You can't break it, and you couldn't spam the whole thing even if you tried. You also cannot create a pure, error-free Semantic Web. The sooner we embrace the beauty of the Web's chaos, the closer we will be to understanding how we can get there. For further reading along these lines, check out Uche Ogbuji's recent article titled "The Semantic Web's Controlled Chaos."

About the author

James Simmons

It's my goal to help bring about the Semantic Web. I also like to explore related topics like natural language processing, information retrieval, and Web evolution. I'm the primary author of Semantic Focus and I'm currently working on several Semantic Web projects.

