Research Projects

This page lists some of the projects for which ML faculty may be interested in recruiting students. Each project contains lines of research ranging in size from a semester's work to an entire thesis (or beyond). The page is therefore intended as a resource for students looking for a thesis advisor, a KDD project, or collaborators for any other reason.

Linking Human and Machine Learning
Ken Koedinger (koedinger@cmu.edu), William Cohen (wcohen@cs.cmu.edu), or Richard Scheines (scheines@andrew.cmu.edu), PhD and KDD project opportunities

A number of projects within the Pittsburgh Science of Learning Center are pursuing links between machine learning and human learning research. These include creating "simulated students" that learn from demonstrations, problem-solving practice, and instruction; applying machine learning theory, such as co-training or inductive logic programming, to predict or explain human learning and to drive new theory in both areas; and mining the large volumes of data generated by student interactions with intelligent tutoring systems and online courses. If you are interested in getting involved in a Pittsburgh Science of Learning Center project, contact any of the faculty listed above.
[Date posted: August, 2007]

Anomalous Pattern Detection
Daniel B. Neill (neill@cs.cmu.edu), looking for 1 Ph.D. student and/or shorter projects
We plan to investigate a variety of large-scale anomaly detection problems, including network intrusion detection, terrorist group detection, environmental monitoring of water quality, and tumor detection in medical images. Rather than searching for individual data points that are anomalous, interesting, or unexpected, these problems require us to detect groups of data points with interesting patterns or relationships. Building on our prior work in spatial cluster detection, we are working to develop general and powerful statistical methods, and fast algorithms, for anomalous pattern detection in massive, high-dimensional datasets. [Date posted: August, 2007]

Machine Learning for Disease Surveillance
Daniel B. Neill (neill@cs.cmu.edu), looking for 1 Ph.D. student and/or shorter projects
Automatic disease surveillance systems are essential for early detection of public health threats such as bird flu or bioterrorism. We have developed a system which monitors nationwide public health data (including hospital visits and pharmacy sales) and automatically detects emerging outbreaks of disease. The current system uses new statistical machine learning techniques and fast, scalable algorithms to rapidly detect anomalous disease clusters in massive real-world datasets. We plan to extend this system in a variety of ways, including:

  • Continued improvement of the underlying statistical and algorithmic framework.

  • Bayesian methods for combining multiple data streams.

  • Incorporating new data sources, such as search engine queries.

  • Active model learning, using human relevance feedback to model and distinguish between different outbreak types and other potential causes of a disease cluster.

  • Providing automated tools for public health investigation, characterization, and tracking of discovered outbreaks. [Date posted: August, 2007]

Computational Models of Molecular Evolution
Roni Rosenfeld (roni@cmu.edu), looking for 1 new PhD student in this area
Molecular evolution is a stochastic computational process that has been running on massively parallel hardware for some 10^17 seconds now, and which has resulted in many amazing local maxima along the way. The rapidly growing DNA and protein databases present a historic opportunity to model evolution at an unprecedented quantitative level, with enormous impact on medicine as well as on our fundamental understanding of life. In this project we combine statistical and computational methods to derive biological explanations and pharmacological predictions. [Date posted: August, 2007]

Viruses, Vaccines, and Digital Life
Roni Rosenfeld (roni@cmu.edu), looking for 1 new PhD student in this area
Viruses are the simplest known self-replicating computational systems. They also happen to be the leading emerging threat to humanity in the 21st century. Fortunately, the new understanding of life in general, and of viruses in particular, as digital programs opens the door to computational methods of defending against these threats. This new project, launched in collaboration with leading virologists at the University of Pittsburgh, aims to combine biological analysis with statistical learning methods to better understand viral evolution and accelerate vaccine development.
[Date posted: August, 2007]

Machine Learning for Identifying and Detecting Existing and Emerging Patterns
Jeff Schneider (jeff.schneider@cs.cmu.edu), Artur Dubrawski (awd@cs.cmu.edu), looking for one or more Ph.D. students
The Auton Lab is looking for one or more PhD students interested in finding patterns in high-dimensional, multivariate time series data.
Application areas include:
- disease surveillance, early detection of outbreaks
- homeland security, identification of dangerous cargo containers
- food safety, detection of unsafe processing plants and tainted food
- aircraft maintenance, recognizing changing maintenance patterns and identifying underlying causes

The algorithms to be created will need some or all of these features:
- ability to recognize patterns across multivariate time series
- ability to identify newly emerging patterns in the data
- ability to incorporate human feedback into the pattern learning and identification process
- new data structures for efficient computation
- active learning to identify causes of patterns that are detected

The Auton Lab also has projects in astrophysics and drug discovery.
Students may apply their algorithms in those domains as appropriate. [Date posted: August, 2007]