Does the World Need a Metadata Extraction Service?
Published 1 year ago by James Simmons
The other day I was thinking: wouldn't it be interesting to see a site that acts as a broker or mirror of metadata from other sites? You could go to this site, enter a URL, and have the metadata from that page presented to you in clean, crisp XML. It would be even better if this were turned into a Web service with an API that was free for anyone to use. I would imagine there would be quite a bit of mashup potential!
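To make the idea a little more concrete, here is a rough sketch of the core step: take a page's HTML and return its metadata as XML. Everything here is an assumption for illustration, including the `metadata_to_xml` name, the `<metadata>` output format, and the choice to read only `<title>` and `<meta>` tags. A real service would also have to fetch the page and would probably handle more formats.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class MetaCollector(HTMLParser):
    """Collects the <title> text and <meta> name/content pairs from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = []  # (name, content) tuples, in document order
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Accept name= (classic meta tags) or property= (e.g. Open Graph).
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta.append((key, attrs["content"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def metadata_to_xml(url, html):
    """Return the page's metadata as an XML string (hypothetical format)."""
    collector = MetaCollector()
    collector.feed(html)
    root = ET.Element("metadata", {"url": url})
    ET.SubElement(root, "title").text = collector.title.strip()
    for name, content in collector.meta:
        ET.SubElement(root, "meta", {"name": name}).text = content
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    page = (
        "<html><head><title>Demo</title>"
        '<meta name="description" content="A test page">'
        "</head><body></body></html>"
    )
    print(metadata_to_xml("http://example.com/", page))
```

Passing the HTML in as a string keeps the sketch independent of any particular way of fetching pages.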
The site could quickly become the largest repository of metadata on the Web if it stored the URL and resulting metadata from each request. This would allow it to gain traction quickly because it would not rely solely on crawling the Web for more URLs to feed upon. I think this has potential, but who knows, it may just qualify as a feature of a larger system. Let me know what you think.
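The "store every request" part could look something like the sketch below. It is only an illustration: `MetadataRepository` and its methods are made-up names, the store is a plain in-memory dictionary rather than a real database, and the extractor is passed in as a function so that nothing is assumed about how the metadata is actually produced.

```python
from typing import Callable, Dict, List


class MetadataRepository:
    """Remembers the metadata for every URL that has been requested."""

    def __init__(self, extract: Callable[[str], Dict[str, str]]):
        # `extract` is a hypothetical function: URL in, metadata dict out.
        self._extract = extract
        self._store: Dict[str, Dict[str, str]] = {}

    def lookup(self, url: str) -> Dict[str, str]:
        # Extract on the first request only; later requests reuse the
        # stored result, so the repository grows with each new URL.
        if url not in self._store:
            self._store[url] = self._extract(url)
        return self._store[url]

    def known_urls(self) -> List[str]:
        """All URLs collected so far, without any crawling."""
        return sorted(self._store)


if __name__ == "__main__":
    repo = MetadataRepository(lambda url: {"title": "Page at " + url})
    repo.lookup("http://example.com/")
    repo.lookup("http://example.org/")
    print(repo.known_urls())
```

Every lookup from a visitor doubles as a contribution to the repository, which is the point made above about not relying on crawling alone.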


Posted by Patrice on March 6, 2007 at 5:18am
Hello James,
We should just start such a site ;-) Indeed, the other day I was thinking about something that might fit within this concept. First of all, I think that Technorati Tags, for instance, are a basic implementation of such a concept, with the following limitations:
1. The metadata is just a simple tag
2. The metadata is not fixed; you can invent new Technorati Tags without any constraints
3. The association between the page and the metadata is stored/configured in the page
On my side, I was thinking about something similar, but instead of a simple tag, the metadata would be another URL. The power of the Web 1.0 was the idea of (hyper)linking pages with each other (it is still working today, by the way!). But the author of a page does not include all the possible hyperlinks, of course...
In my daily surfing, when I read a page, besides clicking on existing hyperlinks, I often have to cut and paste a (sequence of) word(s) into the Google Search Box in order to complete my understanding of the subject. Or I go back to my history, because what I am reading is related to something I read a couple of days ago… But once I have done that, this work, i.e. these new connections, is lost for others :-(
It would be nice if we could all share the “additional” links we are discovering between existing pages. What do you think?
/Patrice
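As a rough illustration of the idea in this comment, where the metadata for a page is another URL contributed by readers, a shared store might be sketched as follows. The `SharedLinkStore` name, its methods, and the ranking by number of contributors are all assumptions made for the example and are not part of the original proposal.

```python
from collections import defaultdict


class SharedLinkStore:
    """Reader-contributed links between pages, kept outside the pages."""

    def __init__(self):
        # source URL -> {target URL -> set of contributors who suggested it}
        self._links = defaultdict(lambda: defaultdict(set))

    def add(self, source, target, contributor):
        """Record that `contributor` found `target` relevant to `source`."""
        self._links[source][target].add(contributor)

    def related(self, source):
        """Targets suggested for `source`, most-suggested first, ties by URL."""
        targets = self._links.get(source, {})
        return sorted(targets, key=lambda t: (-len(targets[t]), t))


if __name__ == "__main__":
    store = SharedLinkStore()
    store.add("http://example.com/post", "http://example.org/background", "patrice")
    store.add("http://example.com/post", "http://example.org/background", "james")
    store.add("http://example.com/post", "http://example.net/other", "patrice")
    print(store.related("http://example.com/post"))
```

Because the associations live in the store rather than in the page, anyone can add a connection without the page's author being involved, which is the difference from limitation 3 above.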
Posted by James Simmons on March 6, 2007 at 9:25am
That's an interesting thought and I wish we could easily share the links (semantic links, not hyperlinks) we make between Web documents. I think it would prove to be a much more effective method of discovering connections. The system would allow people to find out more information about a topic than they might have through a traditional keyword search. I'm interested in writing more on this topic to develop a clear idea of how the system would work.