Tag: Information Extraction

The Calais Initiative is almost one month old, and it has already received a large and welcoming response from the development community (1,113 early adopters)! When the team members weren't busy doing interviews or answering hundreds of emails and forum posts, they were coming up with ways to help spread the technology. They will soon be releasing a WordPress plugin, followed by plugins for Drupal, Plone and other content management systems. They also point out that Calais is good not only for named entity extraction but also for extracting other facts from documents. An example they give is "what technologies are associated with what company in a document?" Good luck, Calais team!

Tags:

Eleven months ago I posted a short entry asking whether the world needed a metadata extraction service. I argued that such a service could quickly become the largest repository of metadata (in the form of named entities and facts) on the Web if it stored the resulting metadata from each request. Open Calais seems to me to be the "metadata extraction service" I had in mind; it is a Web service that allows you to automatically annotate content and extract information such as facts and named entities (people, places, organizations, and much more) from unstructured text. If that weren't enough of a good thing, Open Calais returns the metadata in RDF.

Although the question of whether we need it still hasn't been answered, I believe this service could be a catalyst for change towards Semantic Web standards if it is integrated into (or used to create plugins for) the many open source blogging platforms and other CMS software. Open Calais opens the door to lowering the barrier enough for everyday users to publish semantic content.

Tags:

Open Calais - a new and smart API from Reuters - finally tackles what critics call the greatest obstacle to the Semantic Web: it takes the metadata burden off the end user by providing an automatic meta-tagging tool. The principle behind Open Calais is simple: put in some unstructured text and get nicely structured RDF data in return. Backed by powerful text mining and machine learning techniques, the API automatically detects entities such as persons, events and countries, along with other facts.
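To make the "text in, RDF out" principle concrete, here is a minimal sketch in Python. It is not the Open Calais API and says nothing about how Calais works internally: the tiny gazetteer stands in for a real text mining model, and the `http://example.org/` namespace, the type names and the function names are all assumptions made up for illustration. Only the `rdf:type` and `rdfs:label` URIs are standard.

```python
import re
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/"  # hypothetical namespace, not a Calais URI

# Toy gazetteer standing in for a trained extraction model (assumption).
GAZETTEER = {
    "Reuters": "Organization",
    "London": "City",
    "France": "Country",
}


def extract_entities(text):
    """Return (surface form, type) pairs for gazetteer names found in text."""
    found = []
    for name, kind in GAZETTEER.items():
        # Whole-word match so that "Londoner" does not count as "London".
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, kind))
    return found


def to_ntriples(entities):
    """Serialize entities as N-Triples: a type triple and a label triple each."""
    lines = []
    for name, kind in entities:
        subject = f"<{EX}entity/{quote(name)}>"
        # Escape backslashes and quotes so the label is a valid literal.
        label = name.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"{subject} <{RDF_TYPE}> <{EX}type/{kind}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    text = "Reuters opened a new office in London."
    print(to_ntriples(extract_entities(text)))
```

Each detected entity becomes a subject with two statements about it, which is roughly what "structured" means here: the same facts a reader would pick out of the sentence, in a form other programs can query.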

Open Calais takes account of the fact that the added value of content is hidden in its structure. Uncovering that structure and representing it in an interoperable format makes existing resources more programmable and reusable.

But what is in it for Reuters? Nothing less than the biggest structured content repository on the web. Shouldn't we talk about this little fact as well?

Tags:

For just about every area of research, there exist documents online that describe background information or techniques for accomplishing a task in that domain. These documents are often referred to as white papers, provided their content is technical or research oriented. The information held within white papers is essentially accessible to humans only, because machines are not able to read and comprehend text the way humans can. If machines could read white papers and extract information the way humans do, we would be able to store each fact and piece of knowledge from the documents. This method of indexing would facilitate much more detailed searches, allowing users to search by topic, theory, conclusion, methods, citations, references, etc.

Continue reading Extracting Information from White Paper Text

Tags:

The other day I was thinking, wouldn't it be interesting to see a site come out that essentially acts as a broker or mirror of metadata from other sites? You could go to this site, enter a URL and have the metadata from that page presented to you in clean, crisp XML. It would be even better if this were turned into a Web service and the API were free for anyone to use. I would imagine there would be quite a bit of mashing potential!
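As a rough illustration of the idea, the core of such a service could be sketched in Python with nothing but the standard library. This is only a sketch under assumptions: the `<page>` and `<meta>` output element names, the class name and the function name are invented for this example, and the HTML is passed in as a string rather than fetched from the URL, so the sketch stays self-contained.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class MetaCollector(HTMLParser):
    """Collects the <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = []  # list of (name, content) tuples
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Accept both name="..." and property="..." style meta tags.
            name = attrs.get("name") or attrs.get("property")
            content = attrs.get("content")
            if name and content:
                self.meta.append((name, content))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def metadata_as_xml(html, url):
    """Return the page's title and meta tags as a small XML document."""
    collector = MetaCollector()
    collector.feed(html)
    root = ET.Element("page", {"url": url})  # hypothetical output format
    ET.SubElement(root, "title").text = collector.title.strip()
    for name, content in collector.meta:
        ET.SubElement(root, "meta", {"name": name}).text = content
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = """<html><head><title>Example Page</title>
    <meta name="description" content="A short demo page">
    <meta name="keywords" content="metadata, extraction">
    </head><body>Hello</body></html>"""
    print(metadata_as_xml(sample, "http://example.com/"))
```

A real service would add the fetching step, caching and error handling on top of this, and storing each result is what would turn it into the kind of metadata mirror described above.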

Continue reading Does the World Need a Metadata Extraction Service?

Tags: