Revisiting: Does the World Need a Metadata Extraction Service?
Published 6 months ago by James Simmons
Eleven months ago I posted a short entry asking whether the world needed a metadata extraction service. I stated that the service could quickly become the largest repository of metadata (in the form of named entities and facts) on the Web if it stored the resulting metadata from each request. Open Calais seems to me to be the "metadata extraction service" I had in mind; it is a Web service that allows you to automatically annotate content and extract information like facts and named entities (people, places, organizations, and much more) from unstructured text. If that weren't enough of a good thing, Open Calais returns the metadata in RDF.
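As a rough illustration of what "named entities as RDF" can look like, the sketch below serializes a few entities as N-Triples. It does not call Open Calais or reproduce its output format; the `Entity` type, the `to_ntriples` helper, and the example.org namespaces are assumptions made up for this example.

```python
from typing import Iterable, NamedTuple

# Standard RDF and RDFS predicate IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical namespaces, used for illustration only.
ENTITY_NS = "http://example.org/entity/"
TYPE_NS = "http://example.org/type/"


class Entity(NamedTuple):
    """A named entity as an extraction service might report it (assumed shape)."""
    name: str
    kind: str  # e.g. "Person", "Place", "Organization"


def slug(name: str) -> str:
    """Turn a display name into a safe IRI path segment."""
    return "".join(ch if ch.isascii() and ch.isalnum() else "_" for ch in name)


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(entities: Iterable[Entity]) -> str:
    """Emit one rdf:type triple and one rdfs:label triple per entity."""
    lines = []
    for entity in entities:
        subject = f"<{ENTITY_NS}{slug(entity.name)}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{TYPE_NS}{entity.kind}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(entity.name)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples([Entity("Tim Berners-Lee", "Person"),
                       Entity("Geneva", "Place")]))
```

A repository of the kind described above would, in this simplified picture, just accumulate such triples from every request.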
Although the question of whether we need it still hasn't been answered, I believe this service could be a catalyst for change towards Semantic Web standards if it is integrated into (or used to create plugins for) the many open source blogging platforms and other CMS software. Open Calais opens the door to the possibility of lowering the barrier enough for everyday users to publish semantic content.


Posted by Carver on February 18, 2008 at 4:01pm
I think a public repository for semantic tags is a great idea, but it also raises the issue of agreed-upon taxonomies/ontologies. I have just started learning about the Semantic Web, so I apologize if I am getting this wrong, but before a public repository can be created doesn't a larger system of taxonomies and ontologies have to be created so people don't begin creating conflicting hierarchies and associations? Also, isn't the ultimate idea of a Semantic Web to have engines that parse data for us so humans don't have to spend the time tagging data themselves? Thank you,
- Carver
Posted by James Simmons on February 18, 2008 at 5:42pm
Hi Carver,
>>...before a public repository can be created doesn't a larger system of taxonomies and ontologies have to be created...
The idea behind the Semantic Web is that there won't be "one true ontology to rule them all," so the metadata in the repository would use existing ontologies from the Web.
>>...so people don't begin creating conflicting hierarchies and associations?
My honest answer to the problem you've just stated is "I don't know." I'm simply unsure how we will deal with conflicting ontologies. The closest thing I believe we have going for us is a system for mapping the semantics between disparate ontologies, and I am not sure what the latest progress is on that front.
This is definitely one of THE biggest issues of the Semantic Web, IMO. If there is another reader out there who can point to a plan of action for this scenario, I would like to see it.
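To make the mapping idea a little more concrete, here is a minimal sketch of translating triples from one vocabulary into another through an explicit table of equivalent predicates. It is only an illustration under stated assumptions: the two example.org ontologies, the `PREDICATE_MAP` table, and the `translate` function are all hypothetical, and real ontology alignment involves far more than one-to-one renaming.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Two hypothetical ontologies that describe the same things with different terms.
ONTO_A = "http://example.org/ontoA#"
ONTO_B = "http://example.org/ontoB#"

# Hand-written equivalences between predicates (an assumption for this sketch).
PREDICATE_MAP: Dict[str, str] = {
    ONTO_A + "author": ONTO_B + "creator",
    ONTO_A + "locatedIn": ONTO_B + "place",
}


def translate(triples: Iterable[Triple],
              mapping: Dict[str, str]) -> Tuple[List[Triple], List[Triple]]:
    """Rewrite predicates that have a known equivalent.

    Returns (translated, unmapped) so that triples with no known
    equivalent are reported rather than silently dropped.
    """
    translated: List[Triple] = []
    unmapped: List[Triple] = []
    for subject, predicate, obj in triples:
        if predicate in mapping:
            translated.append((subject, mapping[predicate], obj))
        else:
            unmapped.append((subject, predicate, obj))
    return translated, unmapped


if __name__ == "__main__":
    data = [
        ("http://example.org/post/1", ONTO_A + "author", "James"),
        ("http://example.org/post/1", ONTO_A + "mood", "curious"),
    ]
    ok, leftover = translate(data, PREDICATE_MAP)
    print(ok)
    print(leftover)
```

Keeping the unmapped triples visible is a deliberate choice here: conflicts and gaps between ontologies are exactly the cases that need human attention.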
>>Also, isn't the ultimate idea of a Semantic Web to have engines that parse data for us so humans don't have to spend the time tagging data themselves?
Yes, you are quite correct. In this case, Open Calais would run on the backend of, say, a blog, and the extracted information would be passed back to the blog software to expose as it sees fit. Under no circumstances should laymen be required to learn RDF. :)
Posted by Stuart Robinson on February 27, 2008 at 3:50pm
Nice article. One minor complaint, though. You write: "Open Calais opens the door to the possibility of lowering the barrier enough for everyday users to publish semantic content." The phrasing here is odd. Why? Because nearly everything we publish has semantic content. It's just that it doesn't have formally marked-up semantic content. But I'm splitting hairs, of course...