5 Problems of the Semantic Web
Published 3 years ago by James Simmons
I like to consider myself fair and balanced when speaking about most topics. To educate the uneducated and to balance things out a bit I have compiled a list of 5 problems we will likely run into when we reach the Semantic Web. Each problem is a side-effect of advances in technology, rushes to fill new niches, or the previous two plus the desire to make a quick dollar.
1. Reduced anonymity on the Web
Unless you're already taking active measures to keep yourself non-indexed, you may find that in the Semantic Web information about your identity, interests, and habits is trivial to discover. When you sign up for an account on sites like MySpace, Digg, Slashdot, etc., you feed them information about yourself both during registration and afterward, through your activities and contributions.
As the amount of available personal information increases, we could begin seeing Websites that rely on querying the "Web as a database" for information about their visitors for mission-critical functionality. If (once?) this change takes place, having personal information on the Web may become the comfortable norm. One day we may see a shift in the importance of anonymity. Openness and transparency may become the "in thing."
2. Increased invasion of privacy
This problem stems from the issue of reduced anonymity on the Semantic Web. A Web that exposes vast amounts of information about everyone has its drawbacks. One downside to having so much information easily accessible to anyone is there will always be someone ready to abuse that information to make a quick dollar.
We may find ourselves in a new era of unwanted personalization. Contextual ads that examine a Website's content for hints of interests may be replaced with ads that target specific visitors based on their personal preferences, behaviors, lifestyle, friends, income, etc. In a similar way we will likely notice that e-commerce Websites will become better at figuring out just what it is we are going to want next.
Invasion of privacy brought about by the abuse of personal information — which would be more accessible than ever — will prove itself quite annoying. But we already have privacy issues now, don't we? If you've ever gotten a spam email then you know the answer is "yes," and there is so much more room for the problem to get worse before things will get better.
3. Intelligent content scraping
The content scrapers of today are really quite simple compared to what we will have to deal with in the Semantic Web. Essentially, today's scraper accesses a Website or feed, then extracts and stores the desired content. In most situations the content scraper must be customized or otherwise manually configured for the Website or feed (less so with feeds, as they follow a standard format).
Content scrapers of the Semantic Web and beyond will be equipped with the ability to read the content within Web documents and feeds. Through natural language processing a semantic content scraper can read a blog entry (or several entries by different authors covering the same topic) and return a brand-new blog entry. The scraper would do so by extracting the facts and statements from the entries and regurgitating the information in another order or in entirely different wording.
The technology to do what I described above does not yet fully exist. The bottom-up approach to semantic content scraping, however, would be to scrape metadata written in RDF / OWL. A "bottom-up" scraper would not be able to extract information from the content itself in the way that a top-down content scraper (using an NLP agent) could, but I expect to begin seeing bottom-up scraping soon, if it hasn't already started.
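To make the "bottom-up" idea concrete, here is a minimal sketch of reading machine-readable metadata instead of the prose itself. The RDF/XML snippet, the example.org URLs, and the function name `extract_triples` are all made up for illustration; the sketch only handles flat `rdf:Description` blocks with literal values and is not a general RDF parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical RDF/XML metadata, as a blog might publish alongside an entry.
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/blog/entry-1">
    <dc:title>An example entry</dc:title>
    <dc:creator>Example Author</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def extract_triples(rdf_xml):
    """Return (subject, predicate, literal) triples from flat rdf:Description blocks."""
    triples = []
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; joining the two
            # parts gives the full predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            triples.append((subject, predicate, (prop.text or "").strip()))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```

Note that nothing here needs per-site configuration: the scraper relies only on the shared metadata format, which is what makes this kind of scraping cheap.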
4. Value paradigm shifts
In "The value of current datasets in the Semantic Web" I suggested that the ability to easily mine new and non-obvious types of data from the Semantic Web will turn information into more of a commodity than any past advancement in technology. Mix that with how simple it will become to access any kind of information and we may find that information is no longer the bottleneck in our development.
Where do we draw the line between commoditized information and information that would be better served as non-commoditized? Does such a distinction matter? Will simply publishing content make it subject to commoditization? Most Websites earn money through visitors clicking on advertisements and generally attract those visitors with their content. If in the Semantic Web any document published is essentially merged into the big picture, will content publishers continue to try to earn money in this way?
Issues of commoditization are already springing up as we continue to explore the usage of feeds to deliver content to readers. Currently there are really only two solutions to the issue: embedding advertisements in feeds, and publishing partial content to encourage readers to click through to your Website and continue reading. It's possible that if we do not develop ways to generate revenue from commoditized content we will never see the Semantic Web come to fruition, because it would receive little commercial backing.
5. Vocabulary incompatibilities
The vocabularies we use to classify information are the backbone of the new information frontier. I say this because with these vocabularies we will classify and apply meaning to otherwise meaningless data (meaningless to a machine that is). One problem we're going to run into is when two different people are using two different vocabularies which happen to use the same terms to describe different meanings.
The problem with multiple vocabularies that contain the same terms but apply different meanings to them is that we destroy the author-intended meaning of the information if we attempt to merge it. For that reason, it is unsafe to assume that identically named terms in different vocabularies carry compatible meanings. There will be a great need for an open, unified vocabulary in the Semantic Web.
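A small sketch of the merge problem, assuming two invented vocabularies that both happen to use the bare term "title". The records and the helper names `naive_merge` and `find_collisions` are hypothetical and only illustrate how the author-intended meaning gets lost.

```python
# Two hypothetical vocabularies that share the term "title".
book_record = {"title": "An Example Book"}   # here "title" names a work
person_record = {"title": "Dr."}             # here "title" is an honorific


def naive_merge(*records):
    """Merge records keyed only by bare term; later values overwrite earlier ones."""
    merged = {}
    for record in records:
        merged.update(record)
    return merged


def find_collisions(a, b):
    """List terms that appear in both records with different values."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


print(naive_merge(book_record, person_record))      # the book's title is silently lost
print(find_collisions(book_record, person_record))  # the shared term is flagged
```

Detecting the clash is easy; deciding what each author meant is the hard part, and that is the part a merge keyed on bare terms cannot do.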
Wrapping it all up
Most of these issues exist today to a lesser extent, and I doubt any of them will prevent us from reaching the Semantic Web. After all, each issue comes from the development of new and innovative technologies altering the landscape of the Internet. We'll get through it.
Posted by Yihong Ding on February 28, 2007 at 8:44pm
Hi James, Good points. The emergence of the Semantic Web may truly affect so-called "privacy." Software is going to get smarter at detecting the "context" of a person. It is going to automatically analyze individual web users' behaviors. Although the intent of all these technologies is to help people search the web more effectively, they might be intentionally abused by bad guys. Will more semantics make our world better or worse? This is a question out of the hands of computer scientists. Yihong
Posted by James Simmons on February 28, 2007 at 11:23pm
Hello Yihong and thanks! My hope is that the Semantic Web and related technologies will only serve to better humankind. It's true that privacy issues abound, but we will see this in any situation where information can be abused. Will the Semantic Web improve our lives? I think it will.
Posted by tim finin on March 1, 2007 at 5:42am
> One problem we're going to run into is when two different people are using two different vocabularies which happen to use the same terms to describe different meanings.
The semantic web vision grounded in RDF avoids (most of) this problem by using URIs for terms which denote concepts, objects, relations, properties, etc. There is still some room for ambiguity, but no more than in human languages where we might have somewhat different internal mental concepts for a term we have in common, like chair.
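As a rough sketch of the point about URIs: if every term is a full URI rather than a bare word, two vocabularies can both define "title" without clashing when their data is merged. The namespaces, the subject URI, and the helper `values_for` below are hypothetical, and triples are modeled as plain Python tuples rather than with an RDF library.

```python
# Hypothetical namespaces; the local name "title" is shared, the URIs are not.
BOOKS = "http://example.org/vocab/books#"
PEOPLE = "http://example.org/vocab/people#"
SUBJECT = "http://example.org/thing/1"

graph_a = {(SUBJECT, BOOKS + "title", "An Example Book")}
graph_b = {(SUBJECT, PEOPLE + "title", "Dr.")}

# Merging two graphs is a set union; nothing is overwritten because
# the predicates are different URIs.
merged = graph_a | graph_b


def values_for(graph, subject, predicate):
    """Return the sorted objects stated for a subject/predicate pair."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(values_for(merged, SUBJECT, BOOKS + "title"))
print(values_for(merged, SUBJECT, PEOPLE + "title"))
```

This removes accidental name clashes, but it does not settle whether two authors who use the same URI mean exactly the same thing by it.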
Posted by James Simmons on March 1, 2007 at 7:49am
My concern is that unique URIs won't solve the problem of two identical terms (with different URIs) that carry different author-intended meanings, or that URIs weren't intended for that type of disambiguation. On the other hand, I may just be off my rocker and reading too much into it. What do you think?
Posted by naught101 on April 15, 2008 at 2:50am
Something else to worry about that you missed is the possibility that the semantic web would lead to a simplification of information. There are many ideas, theoretical and philosophical, that RDF triplification, for example, cannot deal with in a useful way (I'm still learning about the ideas behind the semantic web, but it seems like this method of object>information>subject is a strong part of it).
I would also imagine that there's a lot of qualitative information that could very precisely be described with triplification, but might be completely inaccurate, i.e. one person says "racism is good" (there really are people who say this. Scary, huh?). Simple triplifiable sentence. But obviously biased in a way that a computer couldn't yet pick up, and, when combined with the opposite viewpoint ("racism is bad"), completely useless. There are probably lots of less spurious examples too. One I can think of is that two people will often argue over whether an object is blue or green. Although I guess this kind of non-polarised disagreement could be dealt with by averaging the information (i.e. this object is blue-green).
Posted by Danny on April 15, 2008 at 4:11am
While the points you raise are generally valid, few are specific to Semantic Web technologies. Most are essentially social problems that are inevitable as communication systems get more sophisticated - the telephone made invasion of privacy easier, copy & paste made copyright violations easier. Data integration on a global scale is a pretty big step forward. But given the flexibility in what you can say, Semantic Web technologies can generally offer solutions to the new problems (as well as a lot of the old ones :-)
1. Reduced anonymity on the Web
2. Increased invasion of privacy
In the short term there may well be the perception that Semantic Web technologies are causing this kind of problem, by simplifying access to information that's already public. But search engine indexers, spammers' scrapers and so on can already get at anything out there. Longer term, Semantic Web technologies can help with both of these problems by enabling better control of data. One small existing example is the comment whitelisting on the DIG group blog http://dig.csail.mit.edu/breadcrumbs/node/206 which is using FOAF and OpenID in Drupal.
3. Intelligent content scraping
Again, while automatically reassembling content without respect for copyright would be simpler with richer metadata, it'll also be simpler to put controls in place. Right now, with the minimalistic metadata found in feeds it's possible to recombine and republish material, but it's not easy to make explicit statements regarding what the publisher is willing to allow. The work around Creative Commons is a good step in the right direction, and I think the work on content labelling around POWDER should help too.
4. Value paradigm shifts
Pretty much anything can be treated as a commodity; that's fairly orthogonal to any technology. Traditional content publishing has already been hit hard by the reduction in friction Web publishing provides. Right now advertising is still mostly done with a broadcast kind of approach, and does tend to be intrusive. But accurately targeted advertising reduces noise and increases signal, ideally to the point where it's no longer advertising but relevant, required information. Having said all that, bear in mind that the Semantic Web is about a lot more than content publication - it can bring benefits to the core of many different kinds of enterprise. Whatever happens, commerce will have to adapt, and business models are bound to change as economic selection takes place. Incidentally, Tim Berners-Lee has a good example of where opening up data can be in a business's interests - go to this transcript: http://talis-podcasts.s3.amazonaws.com/twt20080207_TimBL.html and search for "bookshop".
A related point which is likely to have interesting consequences is the commodification of services. If standards are followed, system components become interchangeable - to everyone's benefit (except makers of closed proprietary systems). One Apache-based Web host is much like another, so vendors have to compete in other areas. As Semantic Web services become commodities that you can get off the shelf (which is already happening, e.g. http://talis.com/platform ), the developer's job gets easier and they're free to focus on their features.
5. Vocabulary incompatibilities
The use of URIs makes it possible to avoid the kind of naming clashes that can occur elsewhere, but as you say incompatibilities may be a problem. However, rather than expecting everyone to agree on a single global ontology, a far more promising approach is to use the distributed nature of the Web. Producers of data can choose the definitions they deem appropriate, and consumers can treat this data locally as they wish. For one given application it may be appropriate to treat terms X, Y and Z as identical, for another just terms X and Z. Because data is likely to be merged and republished, provenance tracking is becoming more and more important. The only answer to this is more, better metadata and more, better tools to manage it.
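One way to picture "consumers can treat this data locally as they wish" is a per-application mapping from terms to whichever term that application considers canonical. The URIs standing in for X, Y and Z, the sample data, and the helpers `canonicalize` and `predicates` are all invented for this sketch; real systems might express the same thing with ontology-level equivalence statements instead.

```python
# Hypothetical term URIs standing in for X, Y and Z.
X = "http://example.org/vocab/a#author"
Y = "http://example.org/vocab/b#creator"
Z = "http://example.org/vocab/c#writer"

DATA = {
    ("http://example.org/doc/1", X, "Alice"),
    ("http://example.org/doc/2", Y, "Bob"),
    ("http://example.org/doc/3", Z, "Carol"),
}


def canonicalize(triples, equivalences):
    """Rewrite predicates using a consumer-chosen mapping of term to canonical term."""
    return {(s, equivalences.get(p, p), o) for s, p, o in triples}


def predicates(triples):
    """Return the set of distinct predicates used in the triples."""
    return {p for _, p, _ in triples}


# Application one treats X, Y and Z as identical.
app_one = canonicalize(DATA, {Y: X, Z: X})
# Application two treats only X and Z as identical and keeps Y distinct.
app_two = canonicalize(DATA, {Z: X})

print(len(predicates(app_one)), len(predicates(app_two)))
```

The published data is never altered; each consumer applies its own mapping at read time, so two applications can disagree about equivalence without either one breaking the other.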