Machine Learning Department Research

Below is a sampling of active ML research projects and labs.  Additional research projects are described on the home pages of individual faculty. Examples of projects open to undergraduates are on our ML Minor Senior Projects page. In-depth descriptions of selected current MLD research projects are available on the ML@CMU blog.


Brain Image Analysis Research Group 

Our group develops statistical machine learning algorithms to analyze fMRI data. We are specifically interested in algorithms that can learn to identify and track the cognitive processes that give rise to observed fMRI data.

Cell Organizer


A team led by Bob Murphy, Department Head for Computational Biology and a faculty member in the Machine Learning department, is combining image-derived modeling methods with active learning to build a continuously updating, comprehensive model of protein localization. Obtaining a complete picture of the localization of all proteins in cells, and of how it changes under various conditions, is an important but daunting task: there are on the order of a hundred cell types in the human body, tens of thousands of proteins expressed in each cell type, and over a million conditions (which include the presence of potential drugs or disease-causing mutations). Automated microscopy can help by allowing large numbers of images to be acquired rapidly, but even with automation it will not be possible to directly determine the localization of all proteins in all cell types under all conditions.


Databases Group

The databases group at Carnegie Mellon University focuses on high-performance database architectures, multimedia, and data mining. We participate in several cross-disciplinary efforts and collaborate closely with other groups at CMU.

Delphi Research Group (Epidemiological Forecasting)

Epidemiological forecasting is critically needed for decision making by public health officials, commercial and non-commercial institutions, and the general public. We have developed multiple award-winning forecasting technologies based on statistical machine learning and other techniques. Our long-term vision is to make epidemiological forecasting as universally accepted and useful as weather forecasting is today. We have participated, and done very well, in all epidemiological forecasting challenges organized by the U.S. government to date: Influenza 2013–2014 (CDC); Chikungunya 2015 (DARPA); Dengue 2009–2014 (White House OSTP); Influenza 2014–2015 (CDC, winner); Influenza 2015–2016 (CDC, winner); Influenza 2016–2017 (CDC, winner); Influenza 2017–2018 (CDC, triple winner).



Querendipity

Working scientists need to track an enormous amount of information. In addition to the scientific literature, which is currently growing at a rate of a million articles a year, biologists need to know when new high-throughput experimental results have been obtained that might affect their work. The model traditionally used in biology to solve this problem is the creation of a manually curated community database of experimental results and literature. The Querendipity project aims to create a new model for managing and distributing scientific data. Querendipity is a personalized adaptive information system that works by loosely integrating data of many sorts (including unstructured text) into a single structure that can be queried using "schema-free similarity queries", which are similar to keyword queries but can be run against structured data with few text annotations as well as against text.


Read the Web

Can computers learn to read? We think so. "Read the Web" is a research project that attempts to create a computer system that learns over time to read the web. Since January 2010, our computer system called NELL (Never-Ending Language Learner) has been running continuously, attempting to perform two tasks each day:
First, it attempts to "read," or extract facts from text found in hundreds of millions of web pages (e.g., playsInstrument(George_Harrison, guitar)). Second, it attempts to improve its reading competence, so that tomorrow it can extract more facts from the web, more accurately.



Laboratory for Statistical Artificial InteLligence & INtegrative Genomics

Projects in: graphical models, Bayesian approaches, inference algorithms, and learning theories for analyzing and mining high-dimensional, longitudinal, and relational data; computational and comparative genomic analysis of biological sequences, systems biological investigation of gene regulation, and statistical analysis of genetic variation, demography, and linkage (to diseases); and application of statistical learning in text/image mining, vision, and machine translation.

Select Lab


Our main long-term research goal is developing efficient algorithms and methods for designing, analyzing, and controlling complex real-world systems. To achieve this goal, our research spans the entire spectrum from theoretical foundations to real-world applications.


Systems Biology Group

Our group develops computational methods for understanding the dynamics, interactions, and conservation of complex biological systems. As new high-throughput biological data sources become available, they hold the promise of revolutionizing molecular biology by providing a large-scale view of cellular activity. However, each type of data is noisy, contains many missing values, and measures only a single aspect of cellular activity. Our computational focus is therefore on methods for large-scale data integration, relying primarily on machine learning and statistical methods. Most of our work is carried out in close collaboration with experimentalists. Many of the computational tools we develop are publicly available and widely used.



Our main research is into useful data structures and algorithms for making interesting statistical and learning approaches tractable on large volumes of data. We are very interested in the underlying computer science, mathematics, and statistics, and in practical applications of our work. We collaborate closely with food safety analysts, public health agencies, nuclear safety experts, managers of fleets of equipment, social networkers, astrophysicists, biologists, drug companies, exploration companies, and roboticists.