Carnegie Mellon University
Data Analysis Projects

Student Data Analysis Projects

Students are required to demonstrate their grasp of fundamental data analysis and machine learning concepts and techniques in the context of a focused project. The project should focus on a substantive problem involving the analysis of one or more data sets and the application of state-of-the-art machine learning and data mining methods, or on suitable simulations where this is deemed appropriate. Alternatively, the project may focus on machine learning methodology and demonstrate its applicability to substantial examples from the relevant literature. The project may involve the development of new methodology or extensions to existing methodology.

Truck Traffic Monitoring with Satellite Images [.pdf] - Lynn H. Kaack, 11/18

Empirical Performance of Approximate Algorithms for Low Rank Approximation [.pdf] - Dimitris Konomis, 8/18

Weakly Supervised Stance Learning Using Social-Media Hashtags [.pdf] - Sumeet Kumar, 12/18

Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection [.pdf] - Lisa Lee, 12/18

Estimating Latent Structure of Acquisitions from Text [.pdf] - Jiayi Li, 12/18

Quantile Regression for Final Hospitalization Rate Prediction [.pdf] - Nuoyu Li, 12/18

Generating Activity Schedules for Human Agents Used in Simulations [.pdf] - Yu Liu, 11/18

Machine Learning-Aided Modeling of Fixed Income Instruments [.pdf] - Daniel Martin, 12/18

Exploiting Labeling Bias in Learning from Positive and Unlabeled Data [.pdf] - Naji Shajarisales, 12/18

Consumer Heterogeneity Modeling by Hierarchical Gaussian Process [.pdf] - Xiaoting Sun, 12/18

Medical Missing Data Imputation by Stackelberg GAN [.pdf] - Hongyang Zhang, 12/18

Learning-based Power and Runtime Modeling for Convolutional Neural Networks [.pdf] - Ermao Cai, 5/18

Semi-Supervised Code Generation by Editing Intents [.pdf] - Edgar Chen, 5/18

Optimization Improving Tomography Compared with HEDM Image Using Data Mining Methods [.pdf] - Rulin Chen, 5/18

A Graph-Based Model to Discover Preference Structure from Choice Data [.pdf] - Cristóbal De La Maza, 5/18

Unsupervised Learning of Procedures from Task Videos [.pdf] - Karan Goel, 5/18

Recovering Developmental Dynamics from Single-Cell Data via Penalized Principal Curves [.pdf] - Slav Kirov, 5/18

Tuning the Molecular Weight Distribution from Atom Transfer Radical Polymerization Using Actor Critic Methods [.pdf] - Haichen Li, 5/18

Computational Modeling of Human Multimodal Language: The MOSEI Dataset and Interpretable Dynamic Fusion [.pdf] - Paul Pu Liang, 5/18

Likelihood-free Inference of Fornax Dark Matter Density Profile [.pdf] - Mao-Sheng Liu, 5/18

Segmenting the Brain via Sparse Inverse Covariance Estimation and Graph-Based Clustering on High-Dimensional fMRI data [.pdf] - Alnur Ali, 11/17

Estimating Heterogeneous Treatment Effects of a Fractions Tutor [.pdf] - Christoph Dann, 11/17

Generative Adversarial Image Refinement for Handwriting Recognition [.pdf] - Deepak Dilipkumar, 11/17

Automated Phenotyping System for Energy Crops [.pdf] - Simon Shaolei Du, 11/17

Ongoing Influenza Activity Inference with Real-time Digital Surveillance Data [.pdf] - Lisheng Gao, 12/17

Canonical Least Squares Clustering on Sparse Medical Data [.pdf] - Igor Gitman, 12/17

Learning Deep Generative Models With Discrete Latent Variables [.pdf] - Hengyuan Hu, 11/17

Learning Object States from Videos [.pdf] - Liang-Kang Huang, 12/17

Bus Transit Time Prediction using GPS Data with Artificial Neural Networks [.pdf] - Fan Jiang, 11/17

Prediction of Viral Symptoms Through Temporal Data - Mu-Chu Lee, 12/17

MMD GAN: Towards Deeper Understanding of Moment Matching Network [.pdf] - Chun-Liang Li, 11/17

Feature Selection for Real-time Estimates of Influenza-like Illness [.pdf] - Jun Li, 12/17

DAP: LSTM-CRF Auto-encoder [.pdf] - Yuan Liu, 12/17

Neuron Session Alignment in Calcium Imaging Data [.pdf] - Yangyi Lu, 11/17

Dimensionality Reduction of Astronomical Spectroscopic Data using Autoencoders [.pdf] - Quanbin Ma, 12/17

Understanding Cell Reprogramming Through Changes in Gene Expression Networks - Vivek Nangia, 12/17

FarmView: Regression Analysis of 2016 Sorghum Composition [.pdf] - Ben Parr, 12/17

Predicting High-order Chromatin Interactions from Human Genomic Sequence using Deep Neural Networks [.pdf] - Rui Peng, 12/17

Thinking Outside the Bins: Evolution of Galaxy Morphology Over Cosmic Time [.pdf] - Jining Qin, 4/16 ADA paper

Wireless Mesh Network design by Optimization of Spectral Properties and Max-k-Cut [.pdf] - Veeranjaneyulu Sadhanala, 11/17

Understanding the Neural Basis of Speech Production using Machine Learning [.pdf] - Otilia Stretcu, 11/17

Spontaneously Emerging Object Part Segmentation [.pdf] - Yijie Wang, 12/17

Experimental Design Approaches to Maximum Stress Prediction for Lightweight Structure Designs [.pdf] - Yining Wang, 12/17

CASE-QA: Context and Syntax Embeddings for Question Answering on Stack Overflow [.pdf] - Ezra Winston, 12/17

Medical Diagnosis from Laboratory Tests by Combining Generative and Discriminative Learning [.pdf] - Pengtao Xie, 12/17

On the Prediction of Risk for Autism from Common Variants [.pdf] - Lingxue Zhu, 9/15 ADA paper

Greedy Algorithms for Sparse Dictionary Learning [.pdf] - Varun Joshi, 5/17

Robust Detection of Radiation Threat [.pdf] - Eric Lei, 5/17

Using Neural Networks to Improve Single Cell RNA-Seq Data Analysis [.pdf] - Chieh Lin, 5/17

Efficient Fusion of Aggregated Historical Data [.pdf] - Zongge Liu, 5/17

A Co-Evolving Correlated Dynamic Topic Model for Time Series Clickstream and Purchase Data [.pdf] - Hayden Luse, 5/17

Characterization of Crowd Behavior in Institutional and Industrial Settings - Michael Muehl, 5/17

Differential Parameter Learning [.pdf] - Adarsh Prasad, 5/17

A Probabilistic Generative Grammar for Semantic Parsing [.pdf] - Abu Saparov, 5/17

Identifying Air Conditioners Using Monthly Household Electricity Consumption [.pdf] - Evan Sherwin, 5/17 (revised 6/19)

Ordinal Data Analysis via Graphical Models [.pdf] - Arun Sai Suggala, 5/17

Design of Weight-Learning Efficient Convolutional Modules in Deep Convolutional Neural Networks and its Application to Large-Scale Visual Recognition Tasks [.pdf] - Felix Juefei Xu, 5/17

Adaptive Depth Computational Policies for Efficient Visual Tracking [.pdf] - Chris Ying, 5/17

Learning to Skim Text [.pdf] - Adams Wei Yu, 5/17

Expert-Guided Machine Learning for Synthesizing TiO2 Polymorphs [.pdf] - Chiqun Zhang, 5/17

Domain Adaptation with Adversarial Neural Networks and Auto-Encoders [.pdf] - Han Zhao, 5/17

Word Sense Disambiguation Using Semi-Supervised Naive Bayes with Ontological Constraints [.pdf] - Jakob Bauer, 12/16

Recurrent Neural Network Embedding for Knowledge-base Completion [.pdf] - Yuxing Zhang, 12/16

Matching Multifrequency Clinical Time Series [.pdf] - Yi Wei, 12/16

Predicting the Onset of Tachycardia for Patients in Intensive Care Units [.pdf] - Lidan Mu, 12/16

The Influence of the Sinking Strike Zone on Major League Baseball's Strikeout Epidemic [.pdf] - Adam Brodie, 12/16

A Penalized Regression Model for the Joint Estimation of eQTL Associations and Gene Network Structure [.pdf] - Micol Marchetti-Bowick, 12/16

Batch Policy Gradient Methods for Improving Seq2Seq Conversion Models [.pdf] - Kirthevasan Kandasamy, 12/16

Nonparanormal Distributions & Causal Inference with Single-Cell RNA-Seq Data [.pdf] - Elizabeth Silver, 12/16

The Role of Syntax in Semantic Processing: a Study of Active and Passive Voicings [.pdf] - Nicole Rafidi, 4/16

Noise-Robust Spectral Clustering - Carlton Downey, 4/16

Scalable Gaussian Processes for Characterizing Multidimensional Change Surfaces [.pdf] - William Herlands, 4/16

Canonical Autocorrelation Analysis for Radiation Threat Detection [.pdf] - Maria De Arteaga, 4/16

Establishing a Statistical Link Between Network Oscillations and Neural Synchrony [.pdf] - Pengcheng Zhou, 4/16

Point Type Inference in Heating, Ventilation and Air Conditioning Systems - Jingkun Gao, 4/16

Source Identification in H1N1 Flu Infection [.pdf] - Bin Deng, 4/16

Informative Student Models for Personalized Education - Joseph Runde, 4/16

Investment Manager Discussions and Stock Returns: a Word Embedding Approach [.pdf] - Lili Gao, 4/16

Exploring consumer spending behavior for prepaid cards using latent topic models and dynamical systems [.pdf] - Su Zhou, 4/16

Functional Linear Models for Brain Data [.pdf] - Junier Oliva, 4/16

Identifying Influential Users in Social Network with Review Data [.pdf] - Yilin He, 4/16

Linear Time Samplers for Supervised Topic Models using Compositional Proposals [.pdf] - Xun Zheng, 4/16

Acoustic Scene Recognition with Deep Learning [.pdf] - Wei Dai, 4/16

Influenza Trend Prediction using Kalman Filters and Particle Filters [.pdf] - Ying Zhang, 4/16

Understanding the relationship between Functional and Structural Connectivity of Brain Networks [.pdf] - Sashank Jakkam Reddi, 11/15

Cost-Effective Feature Selection and Ordering for Personalized Energy Estimates [.pdf] - Kirstin Early, 11/15

Automated Coding of Open-Ended Survey Responses [.pdf] - Dallas Card, 11/15

Predicting Structure in Handwritten Algebra Data from Low Level Features [.pdf] - James Duyck, 11/15

Sample Efficient Learning to Make Decisions with a Focus on Education [.pdf] - Min Hyung Lee, 11/15

Target Predictions of Small Molecules using LINCS Data [.pdf] - Yan Xia, 11/15

Local analysis of traveling waves in the monkey primary motor cortex [.pdf] - Jessica Chemali, 5/15

Machine Learning Methods for Interatomic Potentials: Application to Boron Carbide [.pdf] - Qin Gao, 4/15

Spectral Method for Topic Modeling with Hierarchical Structure - Hsiao-Yu Tung, 4/15

Exploring Spatio-temporal Neural Correlates of Face Learning [.pdf] - Ying Yang, 4/15

Jointly Modeling Aspects, Ratings and Sentiments for Movie Recommendation [.pdf] - Chao-Yuan Wu, 4/15

A New View of Predictive State Methods for Dynamical System Learning [.pdf] - Ahmed Hefny, 4/15

Estimating Accuracy from Unlabeled Data [.pdf] - Anthony Platanios, 4/15

An Analysis of Cross-Device Search [.pdf] - George Montañez, 3/15

Graph Structure Learning from Unlabeled Data for Event Detection [.pdf] - Sriram Somanchi, 3/15

Tumor phylogenetic lineage separation by medoidshift clustering with non-positive kernel [.pdf] - Lu Xie, 3/15

Consistency in Extending Problem-solving Procedures Indicates Expertise [.pdf] - Qiong Zhang, 11/14

A General Approach to Prediction and Forecasting Crime Rates with Gaussian Processes [.pdf] - Seth Flaxman, 5/14

Time-varying Linear Regression with Total Variation Regularization [.pdf] - Matthew Wytock, 5/14

An integrated approach to validating ChIP-Seq using A* Lasso for Sparse Bayesian Network Learning [.pdf] - Jing Xiang, 4/14

Robust Data-Driven State Estimation for Smart Grid [.pdf] - Yang Weng, 3/14

Prioritizing Malware Analysis [.pdf] - Zhen Tang, 2/14

Automated Discovery of Novel Anomalous Patterns [.pdf] - Edward McFowland III, 12/13

Dynamic Pattern Detection with Temporal Consistency and Connectivity Constraints [.pdf] - Skyler Speakman, 11/13

Sigma-Optimality for Active Learning on Gaussian Random Fields [.pdf] - Yifei Ma, 11/13

Beyond Poisson: Modeling Inter-Arrival Times of Requests in a Datacenter [.pdf] - Da-Cheng Juan, 11/13

Experimental Evaluation of Feature Selection Methods for Clustering [.pdf] - Martin Azizyan, 10/13

Latent Session Model for Web User Clustering: A case study on modeling users of an online real estate website [.pdf] - Haijie Gu, 8/13

Semi-supervised Data Clustering with Coupled Non-negative Matrix Factorization: Sub-category Discovery of Noun Phrases in NELL's Knowledge Base [.pdf] - Chunlei Liu, 8/13

Analysis of Crime in Pittsburgh [.pdf] - Aaditya Ramdas, 5/13

Fast and Effective Similarity Searching in Large MIDI databases [.pdf] - Guangyu Xia, 5/13

What Makes Paris Look Like Paris? [.pdf] - Carl Doersch, 5/13

Semi-supervised context-aware discovery of unknown audio concepts [.pdf] - Antonio Juarez, 3/13

Semi-Supervised Classification for Intracortical Brain-Computer Interfaces [.pdf] - Will Bishop, 12/12

Learning Frames from Text with an Unsupervised Latent Variable Model [.pdf] - Brendan O'Connor, 10/12

TREEGL: Reverse Engineering Tree-Evolving Gene Networks Underlying Developing Biological Lineages [.pdf] - Ankur Parikh, 10/12

Automated Learning of Subcellular Location Patterns in Confocal Fluorescence Images from Human Protein Atlas [.pdf] - Jieyue Li, 10/12

Conditional Sparse Coding and Multiple Regression for Grouped Data [.pdf] - Min Xu, 5/12

Active, semi-supervised learning to utilize human oracles [.pdf] - Robert Fisher, 5/12

Decoding Word Semantics from Magnetoencephalography Time Series Transformations [.pdf] - Alona Fyshe, 5/12

Layered Timeseries Analysis for Smart Grid Agents [.pdf] - Prashant Reddy, 5/12

Learning Global Properties of Scene Images Based on Their Correlational Structures [.pdf] - Wooyoung Lee, 5/12

Understanding the Interaction between Interests, Conversations and Friendships in Facebook [.pdf] - Qirong Ho, 4/12

Trade-offs in Explanatory Model Learning [.pdf] - Madalina Fiterau, 3/12

Dynamics of Visual Category Learning with Magnetoencephalography [.pdf] - Yang Xu, 12/11

Online Detection of Unusual Events in Videos via Dynamic Sparse Coding [.pdf] - Bin Zhao, 11/11

Valid Statistical Inference on Automatically Matched Files [.pdf] - Rob Hall, 11/11

Modeling Correlated Purchase Behavior in Large-Scale Networks: A Markov Random Field (MRF) Approach [.pdf] - Liye Ma, 4/11

Multi-factor Analysis for Classifying fMRI Brain Images [.pdf] - Sung Won Park, 4/11

Learning the Sparsity Parameter in a Generalized Fast Subset Sums Framework for Bayesian Event Detection [.pdf] - Kan Shao, 4/11

CANTINA+: A Feature-rich Machine Learning Framework for Detecting Phishing Web Sites [.pdf] - Guang Xiang, 4/11

Comparing Data Sources in High Dimensions [.pdf] - Di Liu, 3/11

Cross-Species Queries of Large Gene Expression Databases [.pdf] - Hai-Son Le, 3/11

Clustering Under Natural Stability Assumptions [.pdf] - Pranjal Awasthi, 3/11

Automated Unmixing of Complex Protein Subcellular Location Patterns [.pdf] - Tao Peng, 2/11

Extracting Subpopulations from Large Social Networks [.pdf] - Bin Zhang, 2/11

Inferring Rates of Domain Shuffling Using a Birth-Death and Gain Model - Maureen Stolzer, 1/11

Anomaly Detection for Astronomical Data [.pdf] - Liang Xiong, 12/10

An Efficient Proximal Gradient Method for General Structured Sparse Learning [.pdf] - Xi Chen, 11/10

Learning Dynamic Models from Non-sequenced Data [.pdf] - Tzu-Kuo Huang, 11/10

Multiple Domain User Personalization [.pdf] - Yucheng Low, 11/10

Learning Opponent's Strategies In the RoboCup Small Size League [.pdf] - Felipe Trevizan, 10/10

Parallel Splash Belief Propagation [.pdf] - Joseph Gonzalez, 5/10

Trends and Differences in Industrial Safety Perception survey results: Subset Selection and Minimax Bound based Hypothesis Testing - Liu Yang, 5/10

Learning to Tag using Noisy Labels [.pdf] - Edith Law, 4/10

Polonium: Tera-Scale Graph Mining for Malware Detection [.pdf] - Duen Horng Chau, 4/10

Data Mining with MapReduce: Graph and Tensor Algorithms with Applications [.pdf] - Charalampos Tsourakakis, 4/10

Learning Directed Graphical Models from Nonlinear and Non-Gaussian Data [.pdf] - Robert Tillman, 3/10

Semi-parametric Methods for Estimating Time-varying Graph Structure [.pdf] - Mladen Kolar, 2/10

Genetic Population Structure in Pacific Islanders [.pdf] - Suyash Shringarpure, 2/10

Discovery of Student Strategies using Hidden Markov Model Clustering [.pdf] - Benjamin Shih, 1/10

Grasping in Primates: Mechanics and Neural Basis [.pdf] - Lucia Castellanos, 12/09

Parallel WalkSAT with Clause Learning [.pdf] - Austin McDonald, 12/09

Semi-Supervised Discovery of Named Entities and Relations from the Web [.pdf] - Sophie Wang, 11/09

Learning Stable Linear Dynamical Systems [.pdf] - Byron Boots, 6/09

Structured Correspondence Topic Models for Mining Captioned Figures in Biological Literature [.pdf] - Amr Ahmed, 5/09

FilterBoost: Regression and Classification on Large Datasets [.pdf] - Joseph Bradley, 5/09

Learning Compressible Models [.pdf] - Yi Zhang, 5/09

Center-Piece Subgraphs: Problem Definition and Fast Solutions [.pdf] - Hanghang Tong, 8/08

Graph-Based Semi-Supervised Learning as a Generative Model [.pdf] - Jingrui He, 8/08

Information Propagation on the Web: Patterns and a Model [.pdf] - Mary McGlohon, 11/07

Maximum Likelihood Estimation in Latent Class Models for Contingency Table Data [.pdf] - Yi Zhou, 11/07

A Comparison of Methods for Transductive Transfer Learning [.pdf] - Andrew Arnold, 5/07

Learning Factors Analysis - A General Method for Cognitive Model Evaluation and Improvement [.pdf] - Hao Cen, 5/07

The Complexity of Interactive Machine Learning [.pdf] - Stephen Hanneke, 5/07

T-cube: Fast Extraction of Time Series from Large Datasets [.pdf] - Maheshkumar Sabhnani, 5/07

Learning Selectively conditioned Forest Structures with Applications to DBNs and Classification [.pdf] - Brian Ziebart, 5/07

Large-Scale Automated Analysis of Location Patterns in Randomly-Tagged 3T3 Cells [.pdf] - Juchang Hua, 4/07

Continuous Hidden Process Model for Time Series [.pdf] - Yanxin Shi, 4/07

Modeling Networks Using Kronecker Multiplication [.pdf] - Jurij Leskovec, 4/07

Gene Family Classification using a Semi-Supervised Learning Method [.pdf] - Nan Song, 1/07

Feature Reduction for Improved Recognition of Subcellular Location Patterns in Fluorescence Microscope Images [.pdf] - Kai Huang, 11/06

Intelligent Light Control using Sensor Networks [.pdf] - Vipul Singhvi, 9/06

Incremental Hierarchical Clustering of Text Documents [.pdf] - Nachiketa Sahoo, 5/06

Using Customer's Reported Forecasts to Predict Future Sales [.pdf] - Nihat Altintas, 5/06

Data Mining in Macroeconomic Data Sets [.pdf] - Ping Chen, 4/06

Dynamic Social Network Analysis using Latent Space Models [.pdf] - Purnamrita Sarkar, 4/06

Active Learning for Identifying Function Threshold Boundaries [.pdf] - Brent Bryan, 4/06

Anomaly Detection in Multivariate Time Series [.pdf] - Kustav Das, 3/06

On the Number of Experiments Sufficient and in the Worst Case Necessary to Identify All Causal Relations Among N Variables [.pdf] - Frederick Eberhardt, 9/05

N-1 Experiments Suffice to Determine the Causal Relations Among N Variables [.pdf] - Frederick Eberhardt, 9/05

Conditional Density Estimation using Finite Mixture Models with an Application to Astrophysics [.pdf] - Alex Rojas-Pena, 7/05

Location proteomics - Building subcellular location trees from high resolution 3D fluorescence microscope images of randomly-tagged proteins [.pdf] - Xiang Chen, 5/05

Tabu Search Enhanced Markov Blanket Classifier for High Dimensional Data Sets [.pdf] - Xue Bai, 1/05

Clustering Short Time Series Gene Expression Data [.pdf] - Jason Ernst, 11/04

A Hierarchical Graphical Model for Record Linkage [.pdf] - Pradeep Ravikumar, 5/04

Learning Robust Rules from Data: The GenTree Algorithm [.pdf] - Yiheng Li, 4/04

Advances in Network Tomography [.pdf] - Edoardo Airoldi, 10/03

Improved Recognition of Protein Subcellular Location Patterns via Feature Selection and Classifier Ensembles [.pdf] - Kai Huang, 8/03

Fractal Dimension for Data Mining [.pdf] - Sree Krishna Kumaraswamy, 7/03

Tools for Graph Mining [.pdf] - Yiping Zhan, 6/03

Using Machine Learning to Detect Cognitive States across Multiple Subjects [.pdf] - Xuerui Wang, 5/03

People Tracking Using Many Simple Sensors [.pdf] - Daniel Wilson, 5/03

Simultaneous Localization and Mapping using Sparse Extended Information Filters [.pdf] - Yufeng Liu, 4/03

Multi-agent Learning in Extensive Games with Complete Information [.pdf] - Pu Huang, 1/03

Compromising Privacy with Trail Re-Identification: The REIDIT Algorithms [.pdf] - Bradley Malin, 12/02

Mining Computer Tutor-Student Interaction Data to Assess Students' Reading and Predict Future Behavior [.pdf] - Peng Jia, 10/02

A Method for Automatically Finding Interpretations of Reduced Dimension Representations [.pdf] - Marc Fasnacht, 9/02

A Method for Automatically Finding Structural Motifs in Proteins [.pdf] - Marc Fasnacht, 9/02

Learning Rich Neural Network Topologies [.pdf] - Matteo Matteucci, 7/02

The Structure of the Unobserved [.pdf] - Ricardo Silva, 6/02

Learning from Labeled and Unlabeled Data with Label Propagation [.pdf] - Xiaojin Zhu, 6/02

The "DGX" Distribution for Mining Massive, Skewed Data [.pdf] - Zhiqiang Bi, 5/02

Diffusion Kernels on Graphs and Other Discrete Input Spaces [.pdf] - Risi Imre Kondor, 4/02

Large-scale Automated Forecasting using Fractals [.pdf] - Deepayan Chakrabarti, 4/02

Planning for Single and Multiple Actors in Markov Decision Processes with Deterministic Hidden State [.pdf] - Jamie Schulte, 12/01

Boosting and Maximum Likelihood for Exponential Models [.pdf] - Guy Lebanon, 9/01

Framework for using grocery data for early detection of Bio-terrorism attacks [.pdf] - Anna Goldenberg, 9/01

Causal Inference [.pdf] and Additive Models [.pdf] - Tianjiao Chu, 5/01

Using Error-Correcting Codes for Efficient Text Categorization with a Large Number of Categories [.pdf] - Rayid Ghani, 5/01

A Boosting Approach to Topic Spotting on Subdialogues [.pdf] - Kary Myers, 2000