Some People Will Never Support the Semantic Web
Published 2 years ago by James Simmons
This entry is a response to I will never support the Semantic Web by Brian of d'bug.
I'm getting tired of reading about how the Semantic Web is some kind of pipe dream that will never be realized. The Semantic Web is completely and entirely within our technological reach. People may have been given the impression that we cannot create the Semantic Web because of its complexity, the number of years it has been in development, or even the unanswered questions that still exist for certain problems we will face. These are valid reasons to doubt our progress, but progress is certainly what we are making.
I've read Brian's blog before and find most of what he writes to be interesting. However, I'm singling out his post because I feel it may be particularly damaging to the credibility of the Semantic Web among users who don't know enough about it to form an educated opinion. I am also calling him out because he said that he is not a pessimist, but rather a realist who has "done the research" in order to come to his conclusions.
"Supposedly, the assemblage and delivery of this information can be accomplished when responsible bodies not unlike the W3C, can decide on a semantic language."
The Semantic Web relies on more than the semantic language (RDF) created by the W3C. RDF is undoubtedly the core of the Semantic Web, but merely deciding on a language does not bring us to the Semantic Web, nor does throwing all of our information into RDF statements and hoping it will all just come together and "work."
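For readers who have not seen one, an RDF statement is a triple of subject, predicate, and object. Here is a minimal sketch in plain Python of what a handful of statements and a lookup over them might look like. The `http://example.org/` URIs and the `objects` helper are made up for illustration and are not part of any W3C vocabulary or library.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace, used only for this example.
EX = "http://example.org/"

graph = [
    Triple(EX + "shopper", EX + "livesIn", EX + "Houston"),
    Triple(EX + "Houston", EX + "locatedIn", EX + "Texas"),
    Triple(EX + "shopper", EX + "wants", EX + "Jeans"),
]


def objects(statements: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object asserted for the given subject and predicate."""
    return [t.obj for t in statements
            if t.subject == subject and t.predicate == predicate]


city = objects(graph, EX + "shopper", EX + "livesIn")
```

The statements alone do nothing. Something still has to query them, connect them to other people's statements, and decide what to trust, which is why settling on a language is only the first step.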
"The Semantic Web will be built to compliment, and eventually replace some current languages and technologies."
Complement, yes; however, the new languages are not meant to replace existing languages. The Semantic Web is a layer atop the current Web in many ways, including its reliance on the standards we are already using.
"Developers creating Web sites will be responsible for implementing the proper semantic tags, user agents will interpret these tags, infer that certain relationships exist, and provide a set of results."
Yes and no. This is a dangerous statement to throw out there without providing the arguments against that. Does your average blog owner (or any Website owner for that matter) know the inner workings of their content management system? Most do not. In fact, most people do not know basic HTML. It's true that the Webmaster will be responsible for "semantifying" his or her Website, but we shouldn't assume that the content management system would not take care of this. As I've previously suggested, content management systems will help usher in the Semantic Web by doing the legwork for the average site owner.
People are always going to be people. It doesn't matter if we have a Semantic Web that relies on some degree of knowledge engineering (for advanced users) or we have the current Web which relies on the HTML/CSS/JS stack. You don't have to have a degree in Computer Science to have a node in the Web, and that cannot change when transitioning between the current Web and the Semantic Web. Part of the reason the Web was successful is its low barrier to entry, so we must preserve that.
"It sounds rather nice. Search engines will no longer be necessary, and brilliant personal user agents could even replace browsers altogether."
By no means will search engines no longer be necessary. Search engines will likely always have a place on the Web. Information retrieval is not going to be abolished by the Semantic Web. Don't forget, while the Semantic Web is a Web of interconnected data, not all data is intended to be marked up in triple form. We will still have Websites, and they will still have normal content built by today's standards. This is an often overlooked fact about the Semantic Web.
Why would intelligent user agents replace Web browsers? It almost sounds as though his vision of the Semantic Web is a Web without Webpages.
His first argument is that the Semantic Web will be unable to identify relationships from natural language.
"In other words, there is more than one way to describe something. Aside from strict rules that govern some data formats, like a phone number or mailing address, a description can vary significantly."
Natural language processing and the Semantic Web are two totally different fields of study, and while I have always believed that NLP will help usher in the Semantic Web in many ways (by providing a top-down approach) it has nothing to do with the Semantic Web itself.
He goes on to say that he is considered to be of average size in his area (Houston, Texas) but in some other part of the world (the Philippines was his example) he would be considered a large individual. Interpretation based upon experience and environment will naturally be an issue, as it is an issue today. The difference is that with standards like RDF and OWL you can map one ontological meaning to another, and you can map the Houston "average" to the Philippine "large." User agents of the Semantic Web will be able to apply such mappings for people of various locales to give them the correct perspective.
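To make the idea of mapping one context's label onto another's concrete, here is a rough sketch in Python. It is not OWL and not how a real ontology mapping would be expressed. It only assumes that both vocabularies can be tied to a shared measurement, and every threshold in it is invented purely for illustration.

```python
# Hypothetical size vocabularies keyed by locale. Each entry is
# (label, lower bound, upper bound) in centimetres. The numbers are
# invented for this example and are not real statistics.
SIZE_VOCAB = {
    "houston": [("small", 0, 170), ("average", 170, 185), ("large", 185, 300)],
    "philippines": [("small", 0, 155), ("average", 155, 170), ("large", 170, 300)],
}


def label_for(locale: str, height_cm: float) -> str:
    """Find the label a locale's vocabulary gives to a measurement."""
    for label, low, high in SIZE_VOCAB[locale]:
        if low <= height_cm < high:
            return label
    raise ValueError(f"{height_cm} is outside the ranges known for {locale}")


def translate(label: str, source: str, target: str) -> str:
    """Re-express a label from the source locale in the target locale's terms.

    The midpoint of the source range stands in for "a typical value".
    """
    for name, low, high in SIZE_VOCAB[source]:
        if name == label:
            return label_for(target, (low + high) / 2)
    raise KeyError(f"{label!r} is not a label in the {source} vocabulary")


seen_from_philippines = translate("average", "houston", "philippines")
```

With these made-up ranges, a Houston "average" comes out as "large" when read in the Philippine vocabulary, which is the kind of translation a user agent could apply on a reader's behalf.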
"The Semantic Web will not bring us any closer to this reality because machines are unable to interpret data that is subjective unless complex algorithms are built around it. These are algorithms that already exist today, and are built by humans to search the Web as it exists."
Machines will interpret the data as it is written. If something is labeled as large, then it is large within the context in which it's being read. Part of the purpose of the new standards is to establish correct context for information and avoid unnecessary ambiguity. If you establish an ontological agreement with another node on the Semantic Web then you have taken into account the differences and can map meanings accordingly. This sounds like a long and arduous process, but much of the mapping will be handled by "the crowds" working together much in the way Web 2.0 has brought more power to data by allowing the community to interact with it.
His second and third arguments are that the Semantic Web will be just as susceptible to information manipulation as the current Web, and that it will fail miserably at identifying trusted sources. This is, in my opinion, probably the only valid argument in his entire post. Much like today, there will be spam and made-for-AdSense sites, and new problems will arise such as information poisoning (passing bad or malicious semantic data off as good semantic data).
"Even if the Semantic Web could manage the inferred relationships between two sources of information, who is to say that the content presented is accurate? It may be formatted properly, and communicate details and specifications based upon recognized standards, but what "weight" will it be given? Google has a PageRank, and Alexa an Alexa Rating, just to name two. What will the Semantic Web use to disseminate and aggregate data when almost identical relationships are available for consideration?"
PageRank and Alexa Rating are not systems for establishing the trustworthiness of information or its accuracy. New systems must be devised in order to accommodate this. My theory of how the trust layer will work for the Semantic Web is that we will have multiple ranking systems, or multiple layers within one ranking system to accommodate the need to find trusted sources of information. Here are just a few things I think we will need to take into consideration with the trust ranking system:
- Is this source legitimately bringing information to the table and not just spamming?
- What neighborhood does this source of information live in?
- Do a lot of people cite this source as useful?
- Is the source linked to or does it link to bad neighborhoods?
- Does this source have authority in its domain?
- Does the source contain duplicated content from other sources?
If we have a global knowledge database established (likely proprietary, but not necessarily) we can do fact checking and validation as well to ensure that the semantic data is not misleading based on what is already known about certain topics. The ranking and trust systems we see today are not part of the Web itself, but a mechanism created for the Web to keep it sane. In many ways this is how the Semantic Web will work as well.
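As a rough illustration of how the questions listed above might feed a single ranking, here is a sketch in Python that combines them into a weighted score. The field names, the weights, and the two sample sources are all assumptions made up for this example. A real trust layer would need far more than a weighted sum.

```python
from dataclasses import dataclass


@dataclass
class SourceSignals:
    """Hypothetical per-source signals, each scaled to the range 0..1."""
    originality: float   # brings real information rather than spam
    neighborhood: float  # quality of sources it links to and is linked from
    citations: float     # how widely others cite it as useful
    authority: float     # standing within its own domain
    duplication: float   # share of content copied from elsewhere (higher is worse)


# Invented weights that sum to 1.0, so the score also stays within 0..1.
WEIGHTS = {
    "originality": 0.25,
    "neighborhood": 0.25,
    "citations": 0.20,
    "authority": 0.20,
    "duplication": 0.10,
}


def trust_score(s: SourceSignals) -> float:
    """Weighted sum of the signals; duplication is inverted because it counts against a source."""
    return (
        WEIGHTS["originality"] * s.originality
        + WEIGHTS["neighborhood"] * s.neighborhood
        + WEIGHTS["citations"] * s.citations
        + WEIGHTS["authority"] * s.authority
        + WEIGHTS["duplication"] * (1.0 - s.duplication)
    )


reference_site = SourceSignals(0.9, 0.8, 0.7, 0.9, 0.1)
spam_site = SourceSignals(0.1, 0.2, 0.0, 0.1, 0.9)

ranked = sorted(
    [("spam_site", spam_site), ("reference_site", reference_site)],
    key=lambda pair: trust_score(pair[1]),
    reverse=True,
)
```

Keeping the signals separate until the final step is deliberate: it leaves room for the multiple ranking systems, or multiple layers within one, described above, each weighting the same signals differently.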
He goes on to give an example of a shopper who wants to buy a pair of jeans and gives us this information about what the shopper wants:
- Desired product: Jeans
- Price range: $25-$50
- Retailer's physical location: No more than 15 miles from the shopper
- In-store pickup: Required
These requirements are easily met if the information about each retailer in the shopper's area (and the products they carry) is stored as RDF. This is usually called parametric search. From a data standpoint, this kind of narrowing-down is simple to accomplish and can be done without semantic technologies. His issues, however, are with the trustworthiness of the source.
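To show how little machinery the narrowing-down itself requires, here is a minimal sketch of the shopper's query in Python. The `Offer` record, the retailer names, and the prices are hypothetical. In practice the records would come from each retailer's published data rather than a hard-coded list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Offer:
    """One product listing from one retailer (all sample data is invented)."""
    retailer: str
    product: str
    price: float
    distance_miles: float
    in_store_pickup: bool


def parametric_search(offers: List[Offer], product: str, min_price: float,
                      max_price: float, max_distance: float,
                      require_pickup: bool) -> List[Offer]:
    """Keep only the offers that satisfy every parameter of the query."""
    return [
        o for o in offers
        if o.product == product
        and min_price <= o.price <= max_price
        and o.distance_miles <= max_distance
        and (o.in_store_pickup or not require_pickup)
    ]


offers = [
    Offer("Denim Depot", "jeans", 39.99, 4.2, True),
    Offer("Mall Outfitters", "jeans", 64.00, 3.0, True),   # over the price range
    Offer("Far Away Jeans", "jeans", 30.00, 40.0, True),   # more than 15 miles away
    Offer("Web Only Wear", "jeans", 28.50, 10.0, False),   # no in-store pickup
    Offer("Corner Clothing", "jeans", 25.00, 14.9, True),
]

matches = [o.retailer for o in parametric_search(
    offers, "jeans", 25.0, 50.0, 15.0, require_pickup=True)]
```

Note that nothing here checks whether a retailer's listing is honest. The filter answers the query exactly as the data is written, which is why the trust question has to be handled separately.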
"Theoretically, a Semantic Web accompanied by a user agent will be able to assist her with this seemingly precise purchase. That is, until she discovers not all merchants are actually selling the item she wants. They are advertising products that competitors have in stock, but do not inform her that she must special order the item."
This example, like most, has no context. What is the starting point of this search? Is it starting from a node such as Yahoo Shopping where all merchants are verified? If we wish to freely query the Semantic Web, we can expect to find information that cannot be trusted. The Semantic Web does not inherently try to reason about the trustworthiness of information. It is the job of the trust layer (in whatever manifestation) to allow some information to pass through while blocking bad information. As technology advances we will develop more sophisticated systems of establishing the trust rank we require.
"Presumably, in the Semantic Web, she will be able to use semantics to tag Web sites as untrusted. When her friend (with whom she has a semantic relationship), decides to purchase an identical pair of jeans, he will be assisted with this additional information. Unfortunately, in her frustration, she has mistakenly tagged several of her friend's favorite retailers as untrusted."
Information about the trustworthiness of a source cannot be based solely on people's opinions and labels. We have learned from the past that people alone cannot be a trusted source of information about trusted sources. We learned this by watching search engines evolve their algorithms to rely less and less on a Webmaster's input about his own site and more and more on its connections and relationships to other sites of authority.
Google cares very little about your meta tags because the person benefiting from them (you) can put anything you want to in them to inflate your own popularity. This makes meta tags an untrustworthy source for ranking pages by search engines, and is the same logic that will be applied to the Semantic Web and Joe User's input about my favorite retailer.
Finally, there is one more thing I want to bring up:
"The Semantic Web only serves to diminish the factors that make us human, and it will characterize our uniqueness through a series of predefined tags. The current Web offers us the ability to express ourselves in an unbound context, and the interpretations of those expressions can never be duplicated by a computer."
The "unbound context" he speaks of is called a lack of structure. That is not a pro, it's a con. We can characterize our uniqueness any way we wish, because we can create any vocabulary we want in order to express anything we can think of. The Semantic Web is not about using one vocabulary or one ontology, but about the cooperation of nodes to establish ontological agreements. If you feel like you can't express yourself in the Semantic Web, you are given the tools to change that.


Posted by Brian Reindel on September 13, 2007 at 5:23pm
Hi James, Thank you for taking the time to respond. This is a wonderfully constructed rebuttal with some very valid points. I would like to address a few of these here for your readers. Regarding the personal user agents in my post -- the concept is derived from an illustration by Tim Berners-Lee in Scientific American. Whether or not the access is to Web pages, or simply data (utilizing a specification such as RDF), is unknown from the example he gives. I am still not convinced that an ontology, or subsets of communicating ontologies, would provide an accurate evolving map of terms across a spectrum as broad as the Web. In limited contexts, yes, ontological meaning can be attached to language, which is why the Semantic Web works so well with intranets and intra-applications. It can be confined and maintained, and business rules can be written around it. My fear is that in order for a Semantic Web to be successful, a governing body will need to police how the technology is used, and how information is distributed. At this impasse, I agree it might sound somewhat like a doomsday device, but it is definitely worth raising the concern. It could be that we are not close enough to a Semantic Web to truly understand all the requirements, but the trusted source scenario is also a bit too nebulous for my taste. In my reading, it appears that defining a "trust rank" will not be the sole responsibility of software or a user agent. It will not simply be an interpretation of the data available, but the assessment of trust will in some part belong to the semantic language itself. This may be a misunderstanding on my part, but there is some confusion regarding this portion of the discussion. Especially when discussing semantics with developers who believe that it is almost entirely the responsibility of the language, and not the user agents. I really enjoyed reading your post, and I hope to hear from you again.
Posted by OJ on September 13, 2007 at 8:53pm
It's so good to see two people have a solid argument/discussion about a given topic from different sides and NOT get personal. What a breath of fresh air!
Posted by James on September 14, 2007 at 8:58am
Brian, glad you enjoyed my post :) I noticed you used the word evolving, I assume because we will constantly need to update and change the ontologies we work with. Some will not have a high volume of change, but ontologies covering topics like computer science or any other rapidly changing and expanding topic will definitely need to continue to expand and change. My question is where do you think ontologies will fall short? Do you have an example of a concept found on the Web (or any concepts) that would be difficult? I do believe you're right that we are not close enough to the Semantic Web to truly understand all of its requirements. The core problems are very well written about and understood, such as getting your information into RDF statements. But as you climb the stack, the layer at the very top, the trust layer, is the fuzziest and least talked about. Even if we don't have a solid, tangible solution to establishing trusted sources of information I believe it is all within our grasp. I am hoping that the Semantic Web does not get reduced to a bunch of commercial entities "cooperating" to create the Commercial Semantic Web, or any kind of governing body controlling the big picture for that matter ;) Cheers!
Posted by Mihai Campean on September 14, 2007 at 9:39am
Guys, your discussion is remarkable, and it was the best thing I read today. I am also on the Semantic Web side, and I commented on Brian's post also. It is great when such a debate leads to something constructive in the end. IMO, if the Semantic Web starts to get adopted, even if it has flaws initially, it will represent a leap in how we build our web applications, and it will be the step towards closing the gap between computers and us. I believe it stands in our power to make the best of it and a good standard for representing knowledge in a way that it can be easily processed will be a good evolution.
Posted by James on September 14, 2007 at 10:05am
Thanks Mihai :) I have to say it's refreshing to throw ideas back and forth with someone of an entirely different perspective! I agree that the Semantic Web can forever change how we interact with computers, and I hope we get there sooner rather than later.
Posted by Cody Burleson on September 14, 2007 at 8:50pm
The Semantic Web promises to infuse the existing Web with a combination of metadata, structure, and various emerging technologies so that machines can derive meaning from information, make more intelligent choices, and complete complex tasks with significantly reduced human intervention. It is a dramatic vision because it will transform the existing Web in devastatingly powerful ways. Yet it is also realistic and obtainable. The vision and examples that Tim Berners-Lee has expressed are not at all far-fetched. I think people have a tendency to take the vision too far into the sci-fi realm, too soon. They imagine the Semantic Web as one vast integrated 'brain'. They equate reasoning or inferencing to human intelligence and they suggest that machines must be able to interpret subjective chatter in an unbounded context. I have to ask what the value of that would be anyway. Who would write a software program that did not have a bound context, that did not integrate to other bound contexts, and that did not work to achieve the end of helping its users gain or keep a known value? The Semantic Web is a broad abstraction that encompasses several derivative concepts. There can and will be many webs within it that use different technologies and different ontologies or information models. They will be integrated in common concepts and contexts based on shared values. These networks will be linked to foreign networks at and because of the value they seek to gain or keep. If someone wants to write an agent that can scrub through all of these interconnections it will be upon that designer to understand and link the models to the extent that it serves their own value. What is so complicated or overly complex about that? 
All we're talking about here is a new set of standards that people may follow if they choose to do so - and they will choose to do so because it improves their ability to find the content (knowledge they need), to work with people who can achieve mutual benefit in a similar context, to improve applications, and automate processes. We're talking about technologies that improve the current challenges to system integration. We're talking about inferencing capabilities that allow agents to make assumptions based on assertions that have already been made ("that have already been made" is the operative statement). The computer is reasoning, not thinking. That is to say - it is performing logic based on instructions that it has been given. None of this is new to computer science. What will be new is broadening the scale so that solutions are more universal and interoperable across domains. Instead of imagining use-cases about some woman wanting to buy a pair of jeans in a store not 15 miles away or whatever, why not imagine an entire enterprise, an integration of several companies and systems responsible for making those jeans available and getting them on the shelf or to her door? That is not an unbounded context; it is a broad system of interconnection all under one general context aiming for a common value. What is the cotton yield for the year? How does the weather affect that? How does it differ in the regions of different suppliers? From which trusted supplier do we buy our raw materials and based on what demand? Or better still is the promise for medical science. If we can scrub the symptoms of diseases and match them to various possibilities across a broader range of data that has been structured, we may discover more cures. Who stands to gain from structuring this data? Pharmaceutical companies. Governments. Academic and scientific institutions. Corporations large and small. All that being said, I think both the original article and the rebuttal are important. 
Together, they illustrate where we are with all of this. What I think is most important is that there is a vision to follow at all. Unless and until someone provides a better vision, or until I have better ideas, I am following the leader. Personally I will always support the best idea on the table at any given time unless and until I have a better one. Instead of listing the reasons why the vision will not work, I am listing questions which deserve a hypothesis, a set of experiments, a conclusion - all designed to disprove or prove in the spirit of improving. I leave you with a quote and my conclusion: "Unlike so many of the inventions that have moved the world, this one truly was the work of one man...the World Wide Web is Berners-Lee's alone. He designed it. He loosed it on the world. And he more than anyone else has fought to keep it open, non-proprietary, and free...It's hard to overstate the impact of the global system he created. It's almost Gutenbergian. He took a powerful communications system that only the elite could use and turned it into a mass medium." - Time Magazine Whether one supports the Semantic Web or not will not erase the force or vision behind it. It is not something that may become. It is. The question is - how will we continue to shape it together? How will we evolve it and overcome its challenges? How can we improve upon the original vision and the underlying technologies? I am glad that we have a vision at all. More than anything, it simply guides us towards evolving the Web to a next level where it can be. TBL did not define that as a blueprint written in stone. He identified a logical evolution, a direction, and gave it a name. As an architect, I can say that it has already helped me tremendously. I have leveraged the vision and the technology to improve my systems and deliver increased value to my clients. As for the problems and challenges that remain? We've been to the moon. We've mapped the human genome. 
We've built the Internet and networked the entire world. All we're talking about here is making the existing Web more like a database (instead of just a bunch of pages) so that machines can do a better job. The "Semantic Web" is already working for me. And I am working to improve it.
Posted by Cody Burleson on September 14, 2007 at 8:53pm
P.S... please forgive the unformatted text above. I thought my line breaks would hold as they do in some wikis. Sorry :-(
Posted by James on September 15, 2007 at 12:24am
@Cody: Thanks for the incredible response, you said some pretty inspiring things. It's good to find someone that can defend the vision of the Semantic Web that well ;) Sorry for the text formatting, I'm rolling out a new version of the site within the next week (big big changes) and there will be proper text formatting, including markup.
Posted by Andreas on September 19, 2007 at 5:56am
I think the Semantic Web is already here: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData Two billion triples is quite a good start to work with, I think.
Posted by James on September 19, 2007 at 8:36am
I'm interested to see what the LinkingOpenData project leads to. They are forming a lot of connections between a lot of data, more than has ever been linked before on such a massive scale and between so many providers (at least in the realm of semantics).