Research-Machine Learning Department - Carnegie Mellon University

Machine Learning Research

Below is a sampling of active ML research projects and labs.  Additional research projects are described on the home pages of individual faculty. 

The AUTON Lab

Our main research is into useful data structures and algorithms for making interesting statistical and learning approaches tractable on large volumes of data. We are very interested in the underlying computer science, mathematics, statistics, and in practical applications of our work. We collaborate closely with food safety analysts, public health agencies, nuclear safety experts, managers of fleets of equipment, social networkers, astrophysicists, biologists, drug companies, exploration companies and roboticists.
www.autonlab.org

Brain Image Analysis Research Group 

Our group develops statistical machine learning algorithms to analyze fMRI data. We are specifically interested in algorithms that can learn to identify and track the cognitive processes that give rise to observed fMRI data.
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-73/www/index.html

Cell Organizer

A team led by Bob Murphy, Director of the Lane Center for Computational Biology and a faculty member in the Machine Learning department, is combining image-derived modeling methods with active learning to build a continuously updating, comprehensive model of protein localization. Obtaining a complete picture of the localization of all proteins in cells and how it changes under various conditions is an important but daunting task, given that there are on the order of a hundred cell types in a human body, tens of thousands of proteins expressed in each cell type, and over a million conditions (which include the presence of potential drugs or disease-causing mutations). Automated microscopy can help by allowing large numbers of images to be acquired rapidly, but even with automation it will not be possible to directly determine the localization of all proteins in all cell types under all conditions.
http://murphylab.cbi.cmu.edu/CellOrganizer/

 

Databases Group

The databases group at Carnegie Mellon University focuses on high performance database architectures, multimedia, and data mining. We participate in a number of cross-disciplinary efforts, and closely collaborate with a number of other groups at CMU.
http://www.db.cs.cmu.edu/db-site/

GraphLab

Designing and implementing efficient and provably correct parallel machine learning (ML) algorithms can be very challenging. Existing high-level parallel abstractions like MapReduce are often insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance.
http://graphlab.org/
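
As a rough illustration of what an "asynchronous iterative algorithm with sparse computational dependencies" can look like, the sketch below runs a PageRank-style update in which each vertex reads only its in-neighbors and, when its value changes, reschedules only its out-neighbors. This is a serial toy written for this page, not the GraphLab API: the function name, its parameters, and the unnormalized rank formula are all assumptions made for illustration.

```python
from collections import deque


def dynamic_pagerank(out_edges, damping=0.85, tol=1e-6):
    """Serial sketch of a vertex-update loop with dynamic scheduling.

    out_edges maps each vertex to the list of vertices it links to;
    every link target must also appear as a key.
    Hypothetical helper for illustration only, NOT the GraphLab API.
    """
    vertices = list(out_edges)
    # Reverse adjacency, so an update can read a vertex's in-neighbors.
    in_edges = {v: [] for v in vertices}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)

    rank = {v: 1.0 for v in vertices}
    queue = deque(vertices)      # vertices waiting for an update
    scheduled = set(vertices)    # avoids queueing a vertex twice

    while queue:
        v = queue.popleft()
        scheduled.discard(v)
        # Sparse dependency: the update reads only v's in-neighbors.
        incoming = sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
        new_rank = (1.0 - damping) + damping * incoming
        changed = abs(new_rank - rank[v]) > tol
        rank[v] = new_rank
        if changed:
            # Reschedule only the vertices whose input just changed.
            for w in out_edges[v]:
                if w not in scheduled:
                    scheduled.add(w)
                    queue.append(w)
    return rank


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
    print(dynamic_pagerank(graph))
```

A parallel system would additionally have to keep concurrent updates to neighboring vertices consistent, which is the part this serial sketch leaves out.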

Querendipity

Working scientists need to track an enormous amount of information. In addition to the scientific literature, which is currently growing at a rate of a million articles a year, biologists need to know when new high-throughput experimental results have been obtained that might affect their work. The model traditionally used in biology to solve this problem is the creation of a manually curated community database of experimental results and literature. The Querendipity project aims to create a new model for managing and distributing scientific data. Querendipity is a personalized adaptive information system that works by loosely integrating data of many sorts (including unstructured text) into a single structure that can be queried using "schema-free similarity queries", which are similar to keyword queries but can be applied to structured data with few text annotations as well as to text.
http://www.cs.cmu.edu/~wcohen/querendipity/

Read the Web

Can computers learn to read? We think so. "Read the Web" is a research project that attempts to create a computer system that learns over time to read the web. Since January 2010, our computer system called NELL (Never-Ending Language Learner) has been running continuously, attempting to perform two tasks each day:
First, it attempts to "read," or extract facts from text found in hundreds of millions of web pages (e.g., playsInstrument(George_Harrison, guitar)). Second, it attempts to improve its reading competence, so that tomorrow it can extract more facts from the web, more accurately.
http://rtw.ml.cmu.edu/rtw/
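
To make the fact format above concrete, here is a toy sketch that turns sentences into triples shaped like playsInstrument(George_Harrison, guitar) using a single hand-written text pattern. It is only an illustration of the output shape: the pattern, the function name, and the tuple representation are assumptions made here and are not taken from NELL, whose learning methods are not described on this page.

```python
import re

# Hypothetical surface pattern for one relation (an assumption for
# illustration; not a pattern taken from NELL).
PLAYS = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)*) plays the ([a-z]+)")


def extract_plays_instrument(sentences):
    """Return a set of (relation, entity, value) triples found in the text."""
    facts = set()
    for sentence in sentences:
        for person, instrument in PLAYS.findall(sentence):
            # Join multi-word names with underscores, as in the example fact.
            facts.add(("playsInstrument", person.replace(" ", "_"), instrument))
    return facts


if __name__ == "__main__":
    print(extract_plays_instrument(["George Harrison plays the guitar."]))
```

A fixed pattern like this never improves, whereas the second task described above is precisely about getting better at extraction over time.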

SAILING Lab

Laboratory for Statistical Artificial InteLligence & INtegrative Genomics

Projects include graphical models, Bayesian approaches, inference algorithms, and learning theories for analyzing and mining high-dimensional, longitudinal, and relational data; computational and comparative genomic analysis of biological sequences, systems biological investigation of gene regulation, and statistical analysis of genetic variation, demography and linkage (to diseases); and applications of statistical learning in text/image mining, vision, and machine translation.
http://www.sailing.cs.cmu.edu/

SELECT Lab

Our main long-term research goal is developing efficient algorithms and methods for designing, analyzing, and controlling complex real-world systems. To achieve this goal, our research spans the entire spectrum from theoretical foundations to real-world applications.
http://www.select.cs.cmu.edu/

Systems Biology Group

Our group develops computational methods for understanding the dynamics, interactions and conservation of complex biological systems. As new high-throughput biological data sources become available, they hold the promise of revolutionizing molecular biology by providing a large-scale view of cellular activity. However, each type of data is noisy, contains many missing values and measures only a single aspect of cellular activity. Our computational focus is on methods for large-scale data integration. We primarily rely on machine learning and statistical methods. Most of our work is carried out in close collaboration with experimentalists. Many of the computational tools we develop are publicly available and widely used.

http://www.sb.cs.cmu.edu/