NOV 15th 2007

The Curse of Knowledge and the Semantic Web

The Curse of Knowledge: the more you know, the more difficult it is for you to communicate knowledge. When we know something, we can hardly imagine not knowing it, and the more we learn about it, the harder that becomes. Experts (who know much) generally find it difficult to explain their expertise to laymen (who know little) because they have to work hard to imagine what it was like before they were experts. This is the Curse of Knowledge.

NOV 16th 2007

Image credit: Node Gardens

To begin with, there is a very simple idea: websites should themselves notify search engines of their changes. I've already touched upon the subject in the previous part of this series. Right now search engines take the reverse approach: they crawl the Web constantly, looking for the slightest modification. Don't you think it's silly? Think about the number of Web pages to visit, and imagine the cost of keeping the interval between visits as short as possible. Consequently, it seems difficult to envision new search engines being developed today. Nevertheless, the advent of the Semantic Web should lead to their multiplication, in a vertical way, as search engines become more and more specialized in specific fields.
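To make the cost difference concrete, here is a minimal Python sketch comparing the two approaches. The `Site` and `Index` classes and their method names are hypothetical, invented only for this illustration; they do not describe how any real search engine or notification protocol works.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A toy website that records which of its own pages have changed."""
    pages: dict
    changed: set = field(default_factory=set)

    def update(self, url, content):
        # The site itself notes the change, so nobody has to discover it.
        self.pages[url] = content
        self.changed.add(url)

    def pop_changes(self):
        # Hand over the pending change list and start a fresh one.
        changes, self.changed = self.changed, set()
        return changes


class Index:
    """A toy search index that counts how many page fetches it performs."""

    def __init__(self):
        self.copies = {}
        self.fetches = 0

    def fetch(self, site, url):
        self.fetches += 1
        self.copies[url] = site.pages[url]

    def crawl(self, site):
        # Crawling: revisit every page, whether or not it changed.
        for url in site.pages:
            self.fetch(site, url)

    def apply_notifications(self, site):
        # Notification: fetch only the pages the site reported as changed.
        for url in site.pop_changes():
            self.fetch(site, url)
```

With a 100-page site, one full crawl costs 100 fetches, while picking up a single edit through `apply_notifications` costs one. The sketch ignores everything that makes the real problem hard (trust, spam, sites that never notify), so treat it as an illustration of the idea only.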

FEB 13th 2008

While I am still waiting for an invitation from Twine (probably you too?), I have received one from Powerset, the natural language search engine. Powerset is obviously a promising company (and is promising a lot), so I was excited when I started playing around with this new tool, which still isn't available to the public.

True Knowledge

True Knowledge is a natural language search engine and question answering site, but to leave it at that would not do the site justice. What sets it apart from similar-sounding services like Powerset and Freebase? True Knowledge tackles natural language search and question answering (much like Powerset and Hakia), and it also maintains a knowledge base of facts about the world (similar to DBpedia and Freebase). What makes it stand out is that it combines these features and encourages its userbase to contribute facts and add new knowledge.

OCT 30th 2008

Cross-Pollinating DBpedia and Freebase

Now that Freebase is available as Linked Data, a big question that comes to mind is whether these two major projects will move to assimilate one another. DBpedia and Freebase – two endeavors primarily focused on curating unstructured and semi-structured data about everything and releasing it back into the wild (with structure) – get the bulk of their information from Wikipedia, so the amount of topical overlap is assumed to be extremely high. DBpedia gains new information when it extracts data from the latest Wikipedia dump, whereas Freebase, in addition to Wikipedia extractions, gains new information through its userbase of editors.
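Because both projects draw on Wikipedia, one simple way to picture the overlap is to join the two datasets on the Wikipedia article title and emit `owl:sameAs` statements for the matches. The Python sketch below does just that over two small in-memory dictionaries. The function name `same_as_links` and the sample identifiers are assumptions made for illustration; this is not how either project actually publishes or computes its links.

```python
def same_as_links(dbpedia, freebase):
    """Link entries from two datasets that share a Wikipedia article title.

    Each argument maps a Wikipedia title to that dataset's own identifier.
    Returns (subject, predicate, object) triples, sorted by title.
    """
    shared_titles = sorted(dbpedia.keys() & freebase.keys())
    return [(dbpedia[t], "owl:sameAs", freebase[t]) for t in shared_titles]


# Illustrative sample data only; the identifiers are placeholders.
dbpedia_sample = {
    "Berlin": "http://dbpedia.org/resource/Berlin",
    "Paris": "http://dbpedia.org/resource/Paris",
}
freebase_sample = {
    "Berlin": "/en/berlin",
    "Tokyo": "/en/tokyo",
}

for triple in same_as_links(dbpedia_sample, freebase_sample):
    print(triple)
```

Only topics present in both dictionaries produce a link, so the number of triples returned is a crude measure of the topical overlap discussed above.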

DEC 9th 2008

Freebase

Freebase stores millions of entities and assertions about nearly every topic one can ponder (thanks are owed to their seed dataset – Wikipedia – and their amazing community). The amount of information that Freebase stores is incredible, and is a testament to what can be accomplished with the help of a dedicated community and a little (or a lot) of clever software engineering.

JAN 9th 2007

The journey from now to the Semantic Web is a long one. What we have on our hands with the current version of the Web are billions of documents totaling terabytes of data. This data is usually found within HTML pages composed mainly of non-validating markup and very little, if any, metadata.

While there are billions of documents on the Web that contain no metadata whatsoever, there is one shining star of hope: Natural Language Processing. NLP can be used to sift through the "garbage" data to extract coherent statements about the information held within.

JAN 20th 2007

Weekend Brain Dump of Ideas

Published 10 years ago by James Simmons

  • Semantic metadata for video and other multimedia?
  • Will a new platform away from the browser have huge success? (à la Joost)
  • How can video games benefit from what we're doing with the Semantic Web?
  • Is Wikipedia the best playground for natural language processors to test their ability?
  • Does the World Wide Web as we know it need to be replaced?
  • Is HTTP inadequate for the future of the Web where streaming and maintaining state are becoming increasingly important?
  • Are we entering another brutal browser war? Maybe this one will be different because we know the importance of compatibility.
  • Will RDF or RDFa be adopted by mainstream Web developers to mark up semantic metadata?
  • ...Or will something come along that's better suited and easier for beginners to pick up?
  • Are we making any progress as-is towards our goal, or do we need to look for a different approach?
  • Is the best course bottom-up (building the Semantic Web from the ground up by using semantic markup, microformats, RDF, etc.) or is it top-down (using natural language processors to read the Web and make sense of it for us)?
  • With the freedom to create any RDF vocabulary or any ontology for that matter, will the real power be in mapping my meaning to your meaning?

JAN 26th 2007

A mashup is a hybrid Web application that combines complementary elements from two or more sources to create one integrated experience. Content used in mashups is generally sourced from a third party via an API or from Web feeds (e.g. RSS or Atom). Basically, the point is to take multiple data sources or Web services and turn them into something useful. The idea of combining Web services is not a new one, but it has gained immense traction recently and will likely continue to grow in popularity. In this entry I will discuss both the promising future mashups offer and their potential pitfalls.
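As a minimal sketch of the idea, the Python below combines two made-up data sources, a list of photos and a list of places with coordinates, into map-ready records. The function name `mashup`, the field names, and the sample data are all assumptions for this example; a real mashup would fetch the two sources from third-party APIs or feeds instead of using in-memory lists.

```python
def mashup(photos, places):
    """Join photos with place coordinates to produce map-ready records.

    photos: list of dicts with "title" and "place" keys (source one).
    places: list of dicts with "name", "lat" and "lon" keys (source two).
    Photos whose place is unknown to the second source are skipped.
    """
    by_name = {place["name"]: place for place in places}
    combined = []
    for photo in photos:
        place = by_name.get(photo["place"])
        if place is None:
            continue  # the second source has no data for this photo
        combined.append(
            {"title": photo["title"], "lat": place["lat"], "lon": place["lon"]}
        )
    return combined


# Illustrative sample data standing in for two third-party sources.
photos = [
    {"title": "Sunrise", "place": "Lisbon"},
    {"title": "Harbor", "place": "Atlantis"},
]
places = [{"name": "Lisbon", "lat": 38.72, "lon": -9.14}]

print(mashup(photos, places))
```

The skipped "Atlantis" photo hints at one of the pitfalls: a mashup is only as complete as the least complete source it depends on.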

FEB 22nd 2007

The value of a dataset may be determined by any number of factors, but it is generally agreed that the data's accuracy, its source, and how difficult it is to re-create all affect that value. However, as technology evolves to allow easier access to the information we require, the value of a dataset may decrease over time.

