Introduction
The extraordinary spread of computers and online data is changing
forever the way that important decisions are made in many organizations.
Hospitals now analyze online medical records to decide which treatments
to apply to future patients, banks analyze past financial records
to learn to spot future fraud, and factories analyze past operations
to learn to produce higher quality goods. Scientific research in
many fields, notably the biological sciences, is also undergoing
significant change as a result of dramatic increases in online
data. Understanding the most effective ways of using the vast amounts
of data that are now being stored is a significant challenge to
society, and therefore to science and technology, as it seeks to
obtain a return on the huge investment that is being made in computerization
and data collection. Advances in the development of automated techniques
for data analysis and decision making require interdisciplinary
work in areas such as machine learning algorithms, the statistical
and computational principles that underlie these algorithms, database
and data warehousing methods, complexity analysis, data visualization,
privacy and security issues, and application areas such as business,
marketing, and public policy.
Carnegie Mellon University's doctoral program in Machine Learning
is designed to train students to become
tomorrow's leaders in this rapidly growing area. The program is
part of CMU's Machine Learning Department, which is made up of a
multi-disciplinary team of faculty and students across several
academic departments. Machine Learning is dedicated to furthering
scientific understanding of automated learning and to producing
the next generation of tools for data analysis and decision making
based on that understanding.
Today's demand for expertise in machine learning far exceeds
the supply, and this imbalance will become
more severe over the coming decade. Through a combination of interdisciplinary
coursework, hands-on applications, and cutting-edge research, graduates
of the Ph.D. program in Machine Learning
will be uniquely positioned to pioneer new developments in this
field, and to be leaders in both industry and academia.
Overview of Ph.D. Program Requirements
- Completion of required courses within 3 years
- Completion of the Data Analysis Project and MS degree within 2 years
- Mastery of proficiencies in Programming, Teaching, Conference
Presentation, and Research skills
- Successful defense of a Ph.D. thesis
Course Requirements
The curriculum for the Machine Learning Ph.D. is built on a foundation
of five core courses and three electives (plus the Data Analysis Project
requirement). The five core courses are also the required
courses for the MS degree. Together with the Data Analysis Project requirement,
these should be completed during the first two years of study.
A typical full-time graduate course load during the first two
years consists of two classes per term (at 12 graduate units per
class) plus 24 units of advanced research. Thus, during the first
two years, a student has the opportunity to take several elective
classes in addition to the five required courses.
The ML curriculum joins courses with a Computer Science main theme and those with
a Probability and Statistics main theme. These may be grouped as follows:
In CS, relevant sub-fields include Databases; Machine Learning, Data Mining,
and Algorithms; and applications in areas such as Robotics,
Information Retrieval, and AI.
In Statistics (including Philosophy), the sub-fields include Statistical modeling (e.g.,
hierarchical and time series); Bayes' Nets, Causation, and experimental design.
The curriculum is based
on core academic courses on Intermediate Statistics, Machine Learning,
Statistical Approaches for Learning & Discovery, Multimedia
Databases, and Algorithms.
The five core courses provide, respectively: a secure foundation in mathematical
statistics; a survey of basic machine learning techniques with numerous applications;
the statistical and probabilistic theoretical underpinnings for these techniques;
an introduction to databases for data mining; and a study of advanced algorithms.
Possible electives
10-910 Data Analysis Project Requirement, in the second year, which serves
in lieu of an MS thesis.
Here is a typical schedule for the first two years of study.
| Fall 1 | Spring 1 |
| 10-701 Machine Learning | 10-702 Statistical Machine Learning |
| 10-705 Intermediate Statistics | 15-750 Algorithms |
| 10-920 Research | 10-920 Research |
| Fall 2 | Spring 2 |
| Elective | 10-910 Independent Study for the Data Analysis Project |
| Elective | 15-826 Databases |
| 10-920 Research | 10-920 Research |
The Data Analysis Project requirement: 10-910
During the second year a Ph.D. student is required to demonstrate data mining
skills in the context of a focused project. The Data Analysis Project may be carried
out either at Carnegie Mellon or at a sponsoring corporate institution under
the joint supervision of the sponsor and an ML faculty member. It concludes
with a written report (in lieu of a Master's thesis) in which the student demonstrates
an ability to approach data mining problems in a way that cuts across existing
disciplinary boundaries. The requirement includes giving an ML colloquium
on the Data Analysis Project report. Whether the requirement has been passed is
decided by the ML faculty, on the advice of the faculty advisor for the
project.
The Third Year
During the third year, a Ph.D. student completes the elective course requirements.
One of these three electives is taken from the offerings in Statistics. The
other two advanced electives, chosen in consultation with the student's advisor,
form a concentration in one of the allied disciplines within SCS, Biology,
Philosophy, or the Tepper School of Business. For those candidates seeking an
academic position after completing the ML Ph.D. degree, the thoughtful selection
of these three elective courses is particularly important. As in each of the first two
years, coursework is supplemented by 24 units/term of research.
The Fourth Year and Beyond
A Ph.D. student typically presents a thesis proposal no later than the start
of the fourth year, and then spends the fourth and sometimes fifth year working
on their thesis research.
Research
It is expected that all Ph.D. students engage in active research from their
first semester. Moreover, advisor selection occurs within the first
month after entering the Ph.D. program, with the option to change
at a later time. Roughly half of a student's time should be allocated
to research and lab work, and half to courses until the course
requirements are completed.
Financial Support
Machine Learning is committed to providing full tuition and stipend support for
the academic year, for each full-time ML Ph.D. student, for a period
of 5 years. Research opportunities are constrained by funding availability.
ML's funding commitments assume that the student is making satisfactory
progress in the program, as reported to the student at the end of
each academic term. Students are strongly encouraged to compete for
outside fellowships and other sources of financial support. ML will
supplement these outside awards in order to fulfill its obligations
for tuition and stipend support.
Application Information