Carnegie Mellon University
Data Analysis Projects

Student Data Analysis Projects

Students are required to demonstrate their grasp of fundamental data analysis and machine learning concepts and techniques in the context of a focused project. The project should focus on a substantive problem involving the analysis of one or more data sets and the application of state-of-the-art machine learning and data mining methods, or on suitable simulations where this is deemed appropriate. Alternatively, the project may focus on machine learning methodology and demonstrate its applicability to substantial examples from the relevant literature. The project may involve the development of new methodology or extensions to existing methodology.

Word Sense Disambiguation Using Semi-Supervised Naive Bayes with Ontological Constraints [.pdf] - Jakob Bauer, 12/16

Recurrent Neural Network Embedding for Knowledge-base Completion [.pdf] - Yuxing Zhang, 12/16

Matching Multifrequency Clinical Time Series [.pdf] - Yi Wei, 12/16

Predicting the Onset of Tachycardia for Patients in Intensive Care Units [.pdf] - Lidan Mu, 12/16

The Influence of the Sinking Strike Zone on Major League Baseball's Strikeout Epidemic [.pdf] - Adam Brodie, 12/16

A Penalized Regression Model for the Joint Estimation of eQTL Associations and Gene Network Structure [.pdf] - Micol Marchetti-Bowick, 12/16

The Role of Syntax in Semantic Processing: a Study of Active and Passive Voicings [.pdf] - Nicole Rafidi, 4/16

Noise-Robust Spectral Clustering - Carlton Downey, 4/16

Scalable Gaussian Processes for Characterizing Multidimensional Change Surfaces [.pdf] - William Herlands, 4/16

Canonical Autocorrelation Analysis for Radiation Threat Detection [.pdf] - Maria De Arteaga, 4/16

Establishing a Statistical Link Between Network Oscillations and Neural Synchrony [.pdf] - Pengcheng Zhou, 4/16

Point Type Inference in Heating, Ventilation and Air Conditioning Systems - Jingkun Gao, 4/16

Source Identification in H1N1 Flu Infection [.pdf] - Bin Deng, 4/16

Informative Student Models for Personalized Education - Joseph Runde, 4/16

Investment Manager Discussions and Stock Returns: a Word Embedding Approach [.pdf] - Lili Gao, 4/16

Exploring consumer spending behavior for prepaid cards using latent topic models and dynamical systems [.pdf] - Su Zhou, 4/16

Functional Linear Models for Brain Data [.pdf] - Junier Oliva, 4/16

Identifying Influential Users in Social Network with Review Data [.pdf] - Yilin He, 4/16

Linear Time Samplers for Supervised Topic Models using Compositional Proposals [.pdf] - Xun Zheng, 4/16

Acoustic Scene Recognition with Deep Learning [.pdf] - Wei Dai, 4/16

Influenza Trend Prediction using Kalman Filters and Particle Filters [.pdf] - Ying Zhang, 4/16

Understanding the relationship between Functional and Structural Connectivity of Brain Networks [.pdf] - Sashank Jakkam Reddi, 11/15

Cost-Effective Feature Selection and Ordering for Personalized Energy Estimates [.pdf] - Kirstin Early, 11/15

Automated Coding of Open-Ended Survey Responses [.pdf] - Dallas Card, 11/15

Predicting Structure in Handwritten Algebra Data from Low Level Features [.pdf] - James Duyck, 11/15

Sample Efficient Learning to Make Decisions with a Focus on Education [.pdf] - Min Hyung Lee, 11/15

Target Predictions of Small Molecules using LINCS Data [.pdf] - Yan Xia, 11/15

Local analysis of traveling waves in the monkey primary motor cortex [.pdf] - Jessica Chemali, 5/15

Machine Learning Methods for Interatomic Potentials: Application to Boron Carbide [.pdf] - Qin Gao, 4/15

Spectral Method for Topic Modeling with Hierarchical Structure - Hsiao-Yu Tung, 4/15

Exploring Spatio-temporal Neural Correlates of Face Learning [.pdf] - Ying Yang, 4/15

Jointly Modeling Aspects, Ratings and Sentiments for Movie Recommendation [.pdf] - Chao-Yuan Wu, 4/15

A New View of Predictive State Methods for Dynamical System Learning [.pdf] - Ahmed Hefny, 4/15

Estimating Accuracy from Unlabeled Data [.pdf] - Anthony Platanios, 4/15

An Analysis of Cross-Device Search [.pdf] - George Montañez, 3/15

Graph Structure Learning from Unlabeled Data for Event Detection [.pdf] - Sriram Somanchi, 3/15

Tumor phylogenetic lineage separation by medoidshift clustering with non-positive kernel [.pdf] - Lu Xie, 3/15

Consistency in Extending Problem-solving Procedures Indicates Expertise [.pdf] - Qiong Zhang, 11/14

A General Approach to Prediction and Forecasting Crime Rates with Gaussian Processes [.pdf] - Seth Flaxman, 5/14

Time-varying Linear Regression with Total Variation Regularization [.pdf] - Matthew Wytock, 5/14

An integrated approach to validating ChIP-Seq using A* Lasso for Sparse Bayesian Network Learning [.pdf] - Jing Xiang, 4/14

Robust Data-Driven State Estimation for Smart Grid [.pdf] - Yang Weng, 3/14

Prioritizing Malware Analysis [.pdf] - Zhen Tang, 2/14

Automated Discovery of Novel Anomalous Patterns [.pdf] - Edward McFowland III, 12/13

Dynamic Pattern Detection with Temporal Consistency and Connectivity Constraints [.pdf] - Skyler Speakman, 11/13

Sigma-Optimality for Active Learning on Gaussian Random Fields [.pdf] - Yifei Ma, 11/13

Beyond Poisson: Modeling Inter-Arrival Times of Requests in a Datacenter [.pdf] - Da-Cheng Juan, 11/13

Experimental Evaluation of Feature Selection Methods for Clustering [.pdf] - Martin Azizyan, 10/13

Latent Session Model for Web User Clustering: A case study on modeling users of an online real estate website [.pdf] - Haijie Gu, 8/13

Semi-supervised Data Clustering with Coupled Non-negative Matrix Factorization: Sub-category Discovery of Noun Phrases in NELL's Knowledge Base [.pdf] - Chunlei Liu, 8/13

Analysis of Crime in Pittsburgh [.pdf] - Aaditya Ramdas, 5/13

Fast and Effective Similarity Searching in Large MIDI databases [.pdf] - Guangyu Xia, 5/13

What Makes Paris Look Like Paris? [.pdf] - Carl Doersch, 5/13

Semi-supervised context-aware discovery of unknown audio concepts [.pdf] - Antonio Juarez, 3/13

Semi-Supervised Classification for Intracortical Brain-Computer Interfaces [.pdf] - Will Bishop, 12/12

Learning Frames from Text with an Unsupervised Latent Variable Model [.pdf] - Brendan O'Connor, 10/12

TREEGL: Reverse Engineering Tree-Evolving Gene Networks Underlying Developing Biological Lineages [.pdf] - Ankur Parikh, 10/12

Automated Learning of Subcellular Location Patterns in Confocal Fluorescence Images from Human Protein Atlas [.pdf] - Jieyue Li, 10/12

Conditional Sparse Coding and Multiple Regression for Grouped Data [.pdf] - Min Xu, 5/12

Active, semi-supervised learning to utilize human oracles [.pdf] - Robert Fisher, 5/12

Decoding Word Semantics from Magnetoencephalography Time Series Transformations [.pdf] - Alona Fyshe, 5/12

Layered Timeseries Analysis for Smart Grid Agents [.pdf] - Prashant Reddy, 5/12

Learning Global Properties of Scene Images Based on Their Correlational Structures [.pdf] - Wooyoung Lee, 5/12

Understanding the Interaction between Interests, Conversations and Friendships in Facebook [.pdf] - Qirong Ho, 4/12

Trade-offs in Explanatory Model Learning [.pdf] - Madalina Fiterau, 3/12

Dynamics of Visual Category Learning with Magnetoencephalography [.pdf] - Yang Xu, 12/11

Online Detection of Unusual Events in Videos via Dynamic Sparse Coding [.pdf] - Bin Zhao, 11/11

Valid Statistical Inference on Automatically Matched Files [.pdf] - Rob Hall, 11/11

Modeling Correlated Purchase Behavior in Large-Scale Networks: A Markov Random Field (MRF) Approach [.pdf] - Liye Ma, 4/11

Multi-factor Analysis for Classifying fMRI Brain Images [.pdf] - Sung Won Park, 4/11

Learning the Sparsity Parameter in a Generalized Fast Subset Sums Framework for Bayesian Event Detection [.pdf] - Kan Shao, 4/11

CANTINA+: A Feature-rich Machine Learning Framework for Detecting Phishing Web Sites [.pdf] - Guang Xiang, 4/11

Comparing Data Sources in High Dimensions [.pdf] - Di Liu, 3/11

Cross-Species Queries of Large Gene Expression Databases [.pdf] - Hai-Son Le, 3/11

Clustering Under Natural Stability Assumptions [.pdf] - Pranjal Awasthi, 3/11

Automated Unmixing of Complex Protein Subcellular Location Patterns [.pdf] - Tao Peng, 2/11

Extracting Subpopulations from Large Social Networks [.pdf] - Bin Zhang, 2/11

Inferring Rates of Domain Shuffling Using a Birth-Death and Gain Model - Maureen Stolzer, 1/11

Anomaly Detection for Astronomical Data [.pdf] - Liang Xiong, 12/10

An Efficient Proximal Gradient Method for General Structured Sparse Learning [.pdf] - Xi Chen, 11/10

Learning Dynamic Models from Non-sequenced Data [.pdf] - Tzu-Kuo Huang, 11/10

Multiple Domain User Personalization [.pdf] - Yucheng Low, 11/10

Learning Opponent's Strategies In the RoboCup Small Size League [.pdf] - Felipe Trevizan, 10/10

Parallel Splash Belief Propagation [.pdf] - Joseph Gonzalez, 5/10

Trends and Differences in Industrial Safety Perception survey results: Subset Selection and Minimax Bound based Hypothesis Testing - Liu Yang, 5/10

Learning to Tag using Noisy Labels [.pdf] - Edith Law, 4/10

Polonium: Tera-Scale Graph Mining for Malware Detection [.pdf] - Duen Horng Chau, 4/10

Data Mining with MapReduce: Graph and Tensor Algorithms with Applications [.pdf] - Charalampos Tsourakakis, 4/10

Learning Directed Graphical Models from Nonlinear and Non-Gaussian Data [.pdf] - Robert Tillman, 3/10

Semi-parametric Methods for Estimating Time-varying Graph Structure [.pdf] - Mladen Kolar, 2/10

Genetic Population Structure in Pacific Islanders [.pdf] - Suyash Shringarpure, 2/10

Discovery of Student Strategies using Hidden Markov Model Clustering [.pdf] - Benjamin Shih, 1/10

Grasping in Primates: Mechanics and Neural Basis [.pdf] - Lucia Castellanos, 12/09

Parallel WalkSAT with Clause Learning [.pdf] - Austin McDonald, 12/09

Semi-Supervised Discovery of Named Entities and Relations from the Web [.pdf] - Sophie Wang, 11/09

Learning Stable Linear Dynamical Systems [.pdf] - Byron Boots, 6/09

Structured Correspondence Topic Models for Mining Captioned Figures in Biological Literature [.pdf] - Amr Ahmed, 5/09

FilterBoost: Regression and Classification on Large Datasets [.pdf] - Joseph Bradley, 5/09

Learning Compressible Models [.pdf] - Yi Zhang, 5/09

Center-Piece Subgraphs: Problem Definition and Fast Solutions [.pdf] - Hanghang Tong, 8/08

Graph-Based Semi-Supervised Learning as a Generative Model [.pdf] - Jingrui He, 8/08

Information Propagation on the Web: Patterns and a Model [.pdf] - Mary McGlohon, 11/07

Maximum Likelihood Estimation in Latent Class Models for Contingency Table Data [.pdf] - Yi Zhou, 11/07

A Comparison of Methods for Transductive Transfer Learning [.pdf] - Andrew Arnold, 5/07

Learning Factors Analysis - A General Method for Cognitive Model Evaluation and Improvement [.pdf] - Hao Cen, 5/07

The Complexity of Interactive Machine Learning [.pdf] - Stephen Hanneke, 5/07

T-cube: Fast Extraction of Time Series from Large Datasets [.pdf] - Maheshkumar Sabhnani, 5/07

Learning Selectively conditioned Forest Structures with Applications to DBNs and Classification [.pdf] - Brian Ziebart, 5/07

Large-Scale Automated Analysis of Location Patterns in Randomly-Tagged 3T3 Cells [.pdf] - Juchang Hua, 4/07

Continuous Hidden Process Model for Time Series [.pdf] - Yanxin Shi, 4/07

Modeling Networks Using Kronecker Multiplication [.pdf] - Jurij Leskovec, 4/07

Gene Family Classification using a Semi-Supervised Learning Method [.pdf] - Nan Song, 1/07

Feature Reduction for Improved Recognition of Subcellular Location Patterns in Fluorescence Microscope Images [.pdf] - Kai Huang, 11/06

Intelligent Light Control using Sensor Networks [.pdf] - Vipul Singhvi, 9/06

Incremental Hierarchical Clustering of Text Documents [.pdf] - Nachiketa Sahoo, 5/06

Using Customer's Reported Forecasts to Predict Future Sales [.pdf] - Nihat Altintas, 5/06

Data Mining in Macroeconomic Data Sets [.pdf] - Ping Chen, 4/06

Dynamic Social Network Analysis using Latent Space Models [.pdf] - Purnamrita Sarkar, 4/06

Active Learning for Identifying Function Threshold Boundaries [.pdf] - Brent Bryan, 4/06

Anomaly Detection in Multivariate Time Series [.pdf] - Kustav Das, 3/06

On the Number of Experiments Sufficient and in the Worst Case Necessary to Identify All Causal Relations Among N Variables [.pdf] - Frederick Eberhardt, 9/05

N-1 Experiments Suffice to Determine the Causal Relations Among N Variables [.pdf] - Frederick Eberhardt, 9/05

Conditional Density Estimation using Finite Mixture Models with an Application to Astrophysics [.pdf] - Alex Rojas-Pena, 7/05

Location proteomics - Building subcellular location trees from high resolution 3D fluorescence microscope images of randomly-tagged proteins [.pdf] - Xiang Chen, 5/05

Tabu Search Enhanced Markov Blanket Classifier for High Dimensional Data Sets [.pdf] - Xue Bai, 1/05

Clustering Short Time Series Gene Expression Data [.pdf] - Jason Ernst, 11/04

A Hierarchical Graphical Model for Record Linkage [.pdf] - Pradeep Ravikumar, 5/04

Learning Robust Rules from Data: The GenTree Algorithm [.pdf] - Yiheng Li, 4/04

Advances in Network Tomography [.pdf] - Edoardo Airoldi, 10/03

Improved Recognition of Protein Subcellular Location Patterns via Feature Selection and Classifier Ensembles [.pdf] - Kai Huang, 8/03

Fractal Dimension for Data Mining [.pdf] - Sree Krishna Kumaraswamy, 7/03

Tools for Graph Mining [.pdf] - Yiping Zhan, 6/03

Using Machine Learning to Detect Cognitive States across Multiple Subjects [.pdf] - Xuerui Wang, 5/03

People Tracking Using Many Simple Sensors [.pdf] - Daniel Wilson, 5/03

Simultaneous Localization and Mapping using Sparse Extended Information Filters [.pdf] - Yufeng Liu, 4/03

Multi-agent Learning in Extensive Games with Complete Information [.pdf] - Pu Huang, 1/03

Compromising Privacy with Trail Re-Identification: The REIDIT Algorithms [.pdf] - Bradley Malin, 12/02

Mining Computer Tutor-Student Interaction Data to Assess Students Reading and Predict Future Behavior [.pdf] - Peng Jia, 10/02

A Method for Automatically Finding Interpretations of Reduced Dimension Representations [.pdf] - Marc Fasnacht, 9/02

A Method for Automatically Finding Structural Motifs in Proteins [.pdf] - Marc Fasnacht, 9/02

Learning Rich Neural Network Topologies [.pdf] - Matteo Matteucci, 7/02

The Structure of the Unobserved [.pdf] - Ricardo Silva, 6/02

Learning from Labeled and Unlabeled Data with Label Propagation [.pdf] - Xiaojin Zhu, 6/02

The "DGX" Distribution for Mining Massive, Skewed Data [.pdf] - Zhiqiang Bi, 5/02

Diffusion Kernels on Graphs and Other Discrete Input Spaces [.pdf] - Risi Imre Kondor, 4/02

Large-scale Automated Forecasting using Fractals [.pdf] - Deepayan Chakrabarti, 4/02

Planning for Single and Multiple Actors in Markov Decision Processes with Deterministic Hidden State [.pdf] - Jamie Schulte, 12/01

Boosting and Maximum Likelihood for Exponential Models [.pdf] - Guy Lebanon, 9/01

Framework for using grocery data for early detection of Bio-terrorism attacks [.pdf] - Anna Goldenberg, 9/01

Causal Inference [.pdf] and Additive Models [.pdf] - Tianjiao Chu, 5/01

Using Error-Correcting Codes for Efficient Text Categorization with a Large Number of Categories [.pdf] - Rayid Ghani, 5/01

A Boosting Approach to Topic Spotting on Subdialogues [.pdf] - Kary Myers, 2000