Friday, November 19, 2010
NELL: The Computer That Learns
Tom Mitchell’s two daughters are grown, but now he’s watching his newest ‘baby’ learn to read, and it’s an unprecedented achievement.
Professor Mitchell leads the team that developed the Never-Ending Language Learner – NELL – a computer system that, over time, is teaching itself to read and understand the web.
“I’ve been interested for many, many years in how machines learn because I’m also interested in how humans learn,” explained Mitchell, who heads Carnegie Mellon’s Machine Learning department – the first and only department of its kind in the world. “NELL comes naturally out of that. The current machine-learning algorithms are very different in style than how you and I learn. They analyze a single data set, output an answer, and then you turn them off. That’s not like us at all! The idea of NELL is to capture a style more like the on-going learning of humans.”
Understanding language – the way humans do – depends on both context and background knowledge gained over time. So NELL scans the web – attempting to “read” hundreds of millions of web pages on a fact-finding mission.
For example, the repeated appearance of a phrase like “New York City Marathon” in combination with other words has taught NELL that it’s a “race” and a “sports event.”
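The idea behind this kind of learning can be sketched in a few lines. This is not NELL’s actual code; the context patterns, category names, and threshold below are invented for illustration of how repeated co-occurrence can suggest a category label:

```python
from collections import Counter

# Illustrative patterns only: each category is associated with context
# templates a phrase ("X") might repeatedly appear in on web pages.
CATEGORY_PATTERNS = {
    "race": {"runners in the X", "finished the X"},
    "sports event": {"tickets to the X", "watched the X", "X was held in"},
}

def infer_categories(contexts, min_hits=2):
    """Return every category whose patterns match at least
    `min_hits` of the observed contexts for a phrase."""
    hits = Counter()
    for category, patterns in CATEGORY_PATTERNS.items():
        for ctx in contexts:
            if ctx in patterns:
                hits[category] += 1
    return [cat for cat, n in hits.items() if n >= min_hits]

# Contexts "read" from many pages mentioning "New York City Marathon":
observed = [
    "runners in the X", "finished the X", "tickets to the X",
    "watched the X", "finished the X", "X was held in",
]
print(infer_categories(observed))  # ['race', 'sports event']
```

Seeing the phrase once in a racing context proves little; seeing it there again and again, across millions of pages, is what lets the system commit to a belief.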
Since January, NELL has been running 24/7, extracting new examples of categories and relations each day, expanding its ‘knowledge base’ and continually improving accuracy and efficiency.
Unlike other systems, NELL not only learns over time, but does so nearly autonomously.
Nearly. Just like a child, NELL needs a little human help. A few months back, the researchers noticed a few categories lagging the others in accuracy. For example, NELL had labeled “internet cookies” as “baked goods,” triggering a domino effect of mistakes in that category.
Now, every two weeks, the team spends a few minutes scanning for errors to correct, then sets NELL back to learning. As of October, NELL’s knowledge base contained nearly 440,000 beliefs. By January, it should reach 1 million.
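The “domino effect” and its fix can be pictured with a toy sketch. The belief strings and the provenance structure here are invented for illustration, assuming each belief records which earlier belief it was derived from, so that deleting one bad label also prunes the mistakes that cascaded from it:

```python
# Each belief maps to the earlier belief it was derived from (None = root).
derived_from = {
    "internet cookies IS baked good": None,   # the human-flagged error
    "internet cookies HAS ingredient": "internet cookies IS baked good",
    "oven BAKES internet cookies": "internet cookies HAS ingredient",
    "NYC Marathon IS race": None,             # an unrelated, correct belief
}

def prune(bad, kb):
    """Remove `bad` and every belief transitively derived from it."""
    doomed = {bad}
    changed = True
    while changed:
        changed = False
        for belief, parent in kb.items():
            if parent in doomed and belief not in doomed:
                doomed.add(belief)
                changed = True
    return {b: p for b, p in kb.items() if b not in doomed}

kb = prune("internet cookies IS baked good", derived_from)
print(sorted(kb))  # ['NYC Marathon IS race']
```

A few minutes of human review every two weeks, amplified by this kind of cascade, can undo weeks of compounding error.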
Mitchell hopes that NELL’s growing capability can one day be used as a basis for new research and improved computer ‘reading’ ability. Imagine searching your computer with a question and receiving not a webpage address but an actual answer.
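What “an actual answer” might look like can be sketched against a tiny belief store. The beliefs and the query helper below are made up for the example, assuming facts are stored as entity–relation pairs rather than pages to be searched:

```python
# A toy knowledge base of extracted beliefs (entity, relation) -> values.
beliefs = {
    ("New York City Marathon", "is a"): {"race", "sports event"},
    ("New York City Marathon", "held in"): {"New York City"},
    ("Carnegie Mellon", "located in"): {"Pittsburgh"},
}

def answer(entity, relation):
    """Return stored answers directly, instead of a list of web pages."""
    return sorted(beliefs.get((entity, relation), set()))

# "Where is the New York City Marathon held?"
print(answer("New York City Marathon", "held in"))  # ['New York City']
```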
“We want to use NELL’s knowledge base as a starting point for building computers that really can understand individual sentences,” said Mitchell, who is the Fredkin University Professor of Artificial Intelligence and Machine Learning. “This will change the game for trying a new approach to natural language understanding. It will open up numerous possibilities for people to communicate with computers in a much more natural way.”
He sees Carnegie Mellon as the ideal place to develop NELL, which is supported by the Defense Advanced Research Projects Agency, Google and Yahoo.
“CMU has a ‘Well, don’t just talk about it – really do it’ approach to projects,” Mitchell noted. “CMU also has one of the largest computer science organizations of any university in the world. We have a large number of very high quality researchers across many areas. For something like NELL, on this kind of scale, you need a lot of expertise.”
NELL site: http://rtw.ml.cmu.edu/rtw/
Twitter feed: http://twitter.com/cmunell
TechCrunch interview: http://techcrunch.com/2010/10/09/nell-computer-language-carnegie-tctv/
NYTimes article: http://www.nytimes.com/2010/10/05/science/05compute.html?_r=2&src=twt&twt=nytimesscience
Machine Learning Department: http://www.ml.cmu.edu/
Mitchell’s bio: http://www.cs.cmu.edu/~tom/