Published 6 years ago by James Simmons
True Knowledge is a natural language search engine and question answering site, but to leave it at that would not do the site justice. What makes it stand out from similar-sounding services like Powerset and Freebase? True Knowledge tackles natural language search and question answering (much like Powerset and Hakia), and it also maintains a knowledge base of facts about the world (similar to DBpedia and Freebase). The difference is that they've combined these features in one product and encourage their userbase to contribute facts and add new knowledge.
A brief overview of True Knowledge
True Knowledge has combined their technologies to create something that doesn't easily fall into any one category. In fact, you can categorize it as all of the following:
- Question-Answering site
- You can ask questions about any subject and get a direct response. Unlike human-powered Q&A sites, you don't need to wait for someone to respond. The computer answers your question using knowledge stored in a form it can comprehend, and isn't just regurgitating text that it doesn't understand. For this reason it can answer questions it hasn't seen before and can combine knowledge through a process of inference and cross-referencing stored information to produce a reasoned answer.
- Natural language search engine
- True Knowledge also returns search results like a standard search engine, but only after passing your query through their natural language technology. Your query may be a standard question; even if it isn't, they may be able to work out what you are looking for and give you the answer directly. Because of the way facts are assessed, you can enjoy a high degree of confidence that any information they retrieve will be accurate (unlike information on any single Web page). You aren't limited to properly constructed questions; you can also use the typical two- and three-word "keywordese" queries that many search engine users are accustomed to. When the query is just the name of an entity, their technology can produce a small information screen giving core information about the entity (as well as search engine results).
- Wikipedia for facts
- The knowledge in their system comes from two main sources: information they import themselves from various sources (such as the CIA Factbook) and facts added by their userbase. A big part of their technology is enabling users to add knowledge without needing any technical understanding of the underlying computer processes. Unlike Wikipedia, where the knowledge in each entry is buried in natural language, True Knowledge stores each piece of knowledge as a discrete fact that can be reasoned on. Once a fact has been established with enough evidence it can't be easily changed. Furthermore, new facts that contradict this established knowledge are automatically rejected, which helps the system deal with vandalism.
- "Universal database"
- With a typical database-driven application the developers sit down and create a schema. They then write code which manipulates and processes the data in that schema and when the application is finished this code is run by users. The knowledge that such a system can process is extremely narrow and remains so because nothing that happens after launch expands the scope of the application. Users may add data to the tables but the schema remains fixed. True Knowledge is like a database application except that everything in it is amenable to expansion by users. The scope of the knowledge that it can store expands every time a user adds a new class, relation or attribute; and knowledge about every conceivable entity can be put into the system and be used to answer questions.
In short, they've created a platform for representing the world's knowledge in a form that is clear and accessible to humans, as well as being comprehensible to computers.
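To make the "universal database" idea a bit more concrete, here is a minimal Python sketch of a store where facts are (subject, relation, object) triples and a new relation comes into existence the first time someone uses it. This is only my illustration of the concept: the FactStore class, its methods and the relation names are all made up, and I have no idea how True Knowledge actually stores its facts.

```python
from collections import defaultdict


class FactStore:
    """Toy store of (subject, relation, object) facts with no fixed schema."""

    def __init__(self):
        # relation name -> set of (subject, object) pairs
        self._facts = defaultdict(set)

    def add(self, subject, relation, obj):
        # A relation that has never been seen before is created on first use,
        # so adding data can also widen what the store is able to describe.
        self._facts[relation].add((subject, obj))

    def relations(self):
        """Return the names of all relations currently in use."""
        return sorted(self._facts)

    def objects(self, subject, relation):
        """Return every object linked to `subject` by `relation`."""
        pairs = self._facts.get(relation, ())
        return sorted(o for s, o in pairs if s == subject)


store = FactStore()
store.add("France", "is a", "country")
store.add("Paris", "is the capital of", "France")
store.add("France", "has currency", "euro")  # brand-new relation, no schema change
```

Contrast this with a fixed-schema application, where adding "has currency" would mean a developer changing the tables before any user could enter that kind of fact.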
Information about their architecture
At the heart of the True Knowledge system is the Knowledge Base - a huge database of facts on any topic represented in a form that can be processed by computer. Facts are also inferred by the Knowledge Generator, either using Knowledge Base facts, other generated facts or external feeds of knowledge.
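As a rough illustration of what "inferring facts from other facts" can look like, here is a small Python sketch that repeatedly applies one rule: if X is a C, and C is a kind of D, then X is also a D. The rule, the relation names and the function are my own assumptions for the sake of example, not the Knowledge Generator's real rules or representation.

```python
def infer_class_membership(facts):
    """Derive new (x, "is a", C) facts by following "is a kind of" links.

    `facts` is a set of (subject, relation, object) tuples. The relation
    names are invented for this sketch.
    """
    known = set(facts)
    changed = True
    while changed:  # keep going until a full pass adds nothing new
        changed = False
        for s, r, o in list(known):
            if r != "is a":
                continue
            for s2, r2, o2 in list(known):
                if r2 == "is a kind of" and s2 == o:
                    new_fact = (s, "is a", o2)
                    if new_fact not in known:
                        known.add(new_fact)
                        changed = True
    return known - set(facts)  # only the generated facts


stored = {
    ("Jurassic Park", "is a", "big screen movie"),
    ("big screen movie", "is a kind of", "movie"),
    ("movie", "is a kind of", "thing that is published"),
}
generated = infer_class_membership(stored)
```

Here the generated set holds two facts that were never entered directly: Jurassic Park is a movie, and Jurassic Park is a thing that is published. Note that the second one is built on top of a generated fact rather than a stored one.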
Users can ask questions through a browser interface and those questions are translated via Natural Language Translation into queries expressed in the True Knowledge query language. Their technology has the ability to disambiguate ambiguous questions, including removing interpretations of questions that are unlikely. Questions can also be abbreviated to two or three ("keywordese") words and still be understood - similar to typical keyword search terms.
Their question answering system uses the Knowledge Base and generated facts to answer queries. The API provides an alternative interface to the question answering system from remote computers.
System Assessment further processes existing facts in order to maintain semantic consistency of knowledge. For example, facts can be marked as untrue if they are contradicted by other facts. The browser interface provides a means for users to assess the validity of facts (User Assessment), enabling them to endorse or contradict particular facts. A user's reputation and track record are used to automatically weight this information. In combination with System Assessment this prevents the back-and-forth battles that are common on Wikis.
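One simple way to picture reputation-weighted assessment is a weighted vote. The Python sketch below is purely hypothetical: the scoring formula, the threshold and the verdict labels are assumptions of mine, since True Knowledge doesn't say how it actually combines user assessments.

```python
def assess_fact(votes, threshold=1.0):
    """Combine endorsements and contradictions into a verdict.

    `votes` is a list of (reputation, endorses) pairs, where reputation is a
    non-negative weight and endorses is True or False. The weighting scheme
    and the threshold are invented for illustration.
    """
    score = sum(rep if endorses else -rep for rep, endorses in votes)
    if score >= threshold:
        return "believed true"
    if score <= -threshold:
        return "believed false"
    return "undecided"


# One well-established user endorses; two brand-new accounts contradict.
votes = [(2.0, True), (0.1, False), (0.1, False)]
verdict = assess_fact(votes)
```

With these made-up weights the two low-reputation contradictions barely move the score, which is the sort of behavior that would make edit wars unrewarding.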
The Knowledge Base grows through Knowledge Addition, either from users via the browser interface, or imported in volume from external sources.
A key design decision is that all components are extendable by users. In addition to users adding facts, they can also extend the questions that can be translated into whole new areas and even provide new inference rules (and even executable code for steps that involve calculation) for the Knowledge Generator.
True Knowledge API
No service such as this would be complete without an API! They say their API can execute any query you supply it with; however, they are also in the process of releasing a series of API services. These simple services encapsulate areas of knowledge which are well served by their current Knowledge Base. All these services can be accessed via the same query interface using a single account. Click on the names of the services below to test each one!
- IP Geolocation
- Converts an IP address to a probable geographical location of an internet user (e.g. the user of a website). This geographic knowledge can then be used in subsequent queries to retrieve further relevant facts about the location from the Knowledge Base: including the user's likely language, preferred currency, local time etc.
- Local Time
- Identifies a place either from an IP address obtained automatically or from a supplied string denoting the place and obtains a local time either now or at some past or future time. Possible applications include an online or phone conferencing system wanting to inform the participants about the date/time of the meeting in their local time zone.
- Name-to-Gender
- Takes a personal name (first name or full name) and returns the gender inferred by the system for that name. The system applies certain heuristics to a string representing a person's name in an attempt to judge the gender of the person. If the gender can be determined with reasonable probability, then it will be returned. This service would be useful to, for example, a social networking site wishing to use gender-specific language about a user whose name, but not gender, was known.
- Takes an email address and returns the forename inferred from its local-part (if a name can safely be inferred). Businesses with access to users' email addresses but not names could use this to address emails more personally. This service can be combined with the Name-to-Gender service to infer a person's gender from his/her email address.
- Trading Day
- Takes a point in time and a geographical location and returns 'no' if it is a weekend day or a public holiday in the location and 'yes' otherwise.
- Returns a language which can be read by a significant number of people at a location. True Knowledge has complete coverage at the national level and partial coverage for smaller areas. This can be used in combination with the IP Geolocation service to decide which language(s) are appropriate when displaying websites to international users, for example.
- Telephone Number-to-Location
- Returns the geographical location of the specified landline telephone number.
Don't worry, the road doesn't end there. True Knowledge says they are currently working on even more services to add to this list.
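To give a feel for how small some of these services are conceptually, here is a local Python sketch of the logic the Trading Day service describes: 'no' for a weekend day or public holiday, 'yes' otherwise. This does not call the True Knowledge API. The function name, the hard-coded holiday set and the assumption that the weekend is Saturday and Sunday are all mine; the real service presumably pulls holidays and weekend conventions for each location from the Knowledge Base.

```python
from datetime import date


def is_trading_day(day, public_holidays):
    """Return 'no' for weekends and public holidays, 'yes' otherwise.

    `public_holidays` is a set of dates for one location. Assumes a
    Saturday/Sunday weekend, which is not true everywhere.
    """
    if day.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return "no"
    if day in public_holidays:
        return "no"
    return "yes"


# Hypothetical holiday list for a single location.
holidays = {date(2024, 12, 25)}
```

For example, is_trading_day(date(2024, 12, 24), holidays) gives 'yes', while the 25th (a holiday in this list) and the 28th (a Saturday) both give 'no'. The hard part, and the part the service is really selling, is knowing the holidays for every location.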
Adding knowledge to True Knowledge
Time for some hands-on stuff!
What do True Knowledge and Jurassic Park have in common? Nothing, as far as I'm aware. However, I am going to show you step-by-step how I taught True Knowledge something it didn't know. To be more specific, I'm going to show you how to add new knowledge from start to finish and then how to expand on it. Because True Knowledge seems to update itself in real-time, I was able to see the fruits of my labor right away. Not having to wait for an index to be rebuilt made the task of adding knowledge feel more worthwhile.
After playing with a few test queries I tried to find something it didn't know anything about. I asked "who is the author of jurassic park?", which returned the response "I don't know" and a more detailed explanation:
It sounds like "jurassic park" may be a thing that is published that I don't currently know about. If you want, you can add the thing that is published called "jurassic park" to the Knowledge Base.
Incidentally, the search results that appear alongside the answer are pretty relevant. The first result contains the answer to my question. By chance, the title is exactly my answer.
Clicking the link took me to a screen that asked me to enter the most common name for "a thing that is published." I entered "Jurassic Park." They do ask that you don't enter information about fictional things (e.g., unicorns). I had to think for a moment if Jurassic Park is considered a fictional thing in this context. I came to the conclusion that Jurassic Park is not fictional in the sense that it is both a literary work and the title of several movies so I clicked Submit.
After a quick look at the confirmation page I was ready to proceed. I should note that there are several confirmation pages along the way. If you're comfortable enough with the process you can disable each confirmation page individually by checking the box that says "Don't show me this confirmation page again."
Next I was presented with a possible Wikipedia match and a helpful extract from the page. I was satisfied that the Wikipedia entry presented to me was indeed talking about the very same Jurassic Park so I clicked continue.
The next screen asked me if I knew anything that Jurassic Park is that is more specific than a "thing that is published." It was trying to figure out the name of the class of things Jurassic Park belonged to. I clicked yes, entered "movie" and clicked submit.
True Knowledge is already aware of what a movie is and asks me specifically if what I meant was "movie (connected cinematic narrative)." Satisfied that I had my match I clicked submit and continued on.
This is where I thought things got interesting. The next screen asked me to be more specific about what kind of movie Jurassic Park is and gave me the following options to choose from:
- Made for TV movie
- Made for video movie
- Big screen movie
Since we all know Jurassic Park was a major motion picture I chose "big screen movie" and clicked select. Alternatively if I didn't want to choose any of those refinements (e.g., if they didn't apply) I could simply click Yes and proceed with Jurassic Park labeled as a "movie."
The next screen asked me to enter a phrase that could be used instead of Jurassic Park in all circumstances. Basically they were asking for a short but descriptive phrase that makes it absolutely clear what Jurassic Park is. They give a few examples such as "France, the Republic of France" and "Star Wars, the 1977 adventure action sci-fi movie Star Wars." Going off the Star Wars example I entered "Jurassic Park, the 1993 movie about dinosaurs" and clicked submit.
I was then asked to confirm that the phrase I entered was an unambiguous way of saying Jurassic Park, which would be recognized by anyone wanting to say something about that big screen movie. After confirming a few points about the ambiguity of my phrase I clicked Yes.
I was then asked to enter a few alternate names. I entered "JP" (the US promotional title) and "Jurassic Park 1" (a common way of referring to the original movie after the sequels were released).
Next I had to enter a unique, human readable ID. The page informed me that [jurassic park] was available and auto-populated that value for me. I certainly couldn't think of a better ID so I clicked submit.
After submitting the ID I was presented with a list of facts that the system had gathered from the information I entered. Reading through the list of facts you can see how each step along the way input the information into True Knowledge. I am listed as the source for each fact because I have not specified any other sources. Luckily I am able to do that at the bottom of the page. As I want this information to be trustworthy, I included a trustworthy source: The IMDB entry for Jurassic Park.
I entered the URL for the entry on IMDB and clicked add new source. This took me to a mini-process of adding a document stored in a remote system (i.e., a Web page). I clicked OK to start the process.
The next screen asked me to verify that the contents below were what I was expecting. Everything checked out so I clicked confirm.
Now that I have a new source available to me (the IMDB page) I changed the source where appropriate. Once I had the sources set I clicked add these facts to finish up the process of adding new knowledge.
All done! Clicking on OK will take you to a page with your new entry.
The page has a few links for adding more information that would be relevant to the entry.
I wasn't done yet since I still couldn't answer the question "who is the author of jurassic park?" Of course, now I have a whole new problem: I told the system that Jurassic Park was a movie, not a literary work. We'll see how the system handles this. On the add knowledge page I selected "add a new fact."
On the add a fact page I was given three textboxes to enter a (subject, predicate, object) tuple about anything. Since I want to enter the author information for Jurassic Park I entered "Michael Crichton" -> "is the author of" -> "Jurassic Park" and clicked submit.
The next screen actually informs me that the system is already aware of Michael Crichton, the American author born in 1942. Since we're both talking about the same person I clicked submit.
On the fact confirmation page that followed I was given the option to go ahead and add the fact as-is or to change the left or right part of the fact (the subject or object). Although the proper course of action would have probably been to create a new entry in True Knowledge for the literary work Jurassic Park, I wanted to see if the property "author" could be applied to an instance of class "movie." I also wanted to determine whether or not something can belong to multiple classes ("book" and "movie"). I chose to add Michael Crichton as the author of Jurassic Park (the movie), and clicked Yes.
When it came time to list sources I told it that I was not the source, and I listed the Wikipedia entry for Jurassic Park and went through the two-step process of adding a Web page.
Now True Knowledge knows about Jurassic Park (the 1993 movie about dinosaurs) and Michael Crichton, the author of Jurassic Park (the literary work). It should be noted that True Knowledge is under the impression that Michael Crichton is actually the author of the movie Jurassic Park.
I tried my original question and this time I got a direct answer, including how it came to that conclusion. So you can apply an author to a movie. It feels weird to me that you can do that, because I don't feel you can be the "author" of a movie (rather, the movie's script and screenplay).
Back on the add a fact page I tell True Knowledge that "Jurassic Park" -> "is a" -> "book."
This time around I'm given three options of what a book might be. I chose the last option, "book (a written work intended to be published as a set of pages bound together on one side)" because I felt it was the best definition of what a book is.
After confirming the fact and adding my source (Wikipedia again) I am informed that "Jurassic Park is a book" contradicts previously inserted facts. In this case, it is apparent that a movie cannot also be a book.
In the end the fact did not get added because it contradicts an existing fact in the system. Today was just my first day, so I'm sure I'll get better at this.
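For the curious, the rejection I ran into can be pictured as a check against pairs of classes that are declared mutually exclusive. The Python sketch below is my guess at the general shape of such a check, not True Knowledge's actual mechanism; the DISJOINT set, the function and the dictionary layout are all invented.

```python
# Pairs of classes assumed to be mutually exclusive (invented for this sketch).
DISJOINT = {frozenset({"movie", "book"})}


def try_add_class_fact(class_facts, entity, new_class):
    """Add entity -> new_class unless it clashes with an existing class.

    `class_facts` maps an entity name to the set of classes it belongs to.
    Returns True if the fact was added, False if it was rejected.
    """
    for existing in class_facts.get(entity, set()):
        if frozenset({existing, new_class}) in DISJOINT:
            return False  # contradicts a previously stored fact
    class_facts.setdefault(entity, set()).add(new_class)
    return True


kb = {"Jurassic Park": {"movie"}}
added = try_add_class_fact(kb, "Jurassic Park", "book")
```

In this toy version `added` is False and the knowledge base is left untouched, which matches what I saw: the existing "movie" fact wins and the new "book" fact is turned away.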
My first impression of True Knowledge
I found my first experience with True Knowledge very satisfying! The user interface is simple and it's hard to get lost trying to do something new. They are still in beta, and as such they still have some polish to apply before the general public is let in, but the product is solid and I can't wait until more users are let in the gates.
I'm interested to see how it will fare against similar services. Components of True Knowledge compete with many semantic services (Freebase, Hakia, Powerset, DBpedia, etc.) and even non-services like Cyc. I am of the opinion that True Knowledge has a winning combination of these approaches.