Carnegie Mellon University
September 30, 2016

ML Team wins 1st Place Innovation Award on Data Science

During the Data for Policy 2016 Conference, the 1st Place Innovation Award on Data Science was presented to Maria De Arteaga Gonzalez and Artur Dubrawski for their paper, "Discovery of Complex Anomalous Patterns of Sexual Violence in El Salvador." Maria is a PhD student in the Machine Learning & Public Policy Joint PhD program.

Their research focuses on the discovery of anomalous patterns of sexual violence in El Salvador. When sexual violence is a product of organized crime or social imaginary, as previous literature has shown to be the case in El Salvador, links between sexual violence episodes can be understood as a latent structure. With this assumption in place, and using data on all officially reported rapes in El Salvador over a period of nine years, we aim to uncover complex anomalous patterns. To the best of our knowledge, this is the first use of pattern detection methods to gain understanding of sexual violence. We conduct bivariate analyses of conditional distributions and visualize the results through pivot table heat maps that are easy for practitioners to understand. We also perform spatiotemporal anomaly detection using T-Cube, an efficient data structure developed at the Auton Lab. This allows us to screen all the data quickly and find points in time when a given subregion shows a statistically significant increase in a specific type of crime. The features defining the type of crime that has increased can include the type of location where the crime occurred (e.g., empty lot), the relationship between the victim and the aggressor (e.g., acquaintance), and the age range of the victim, among others.
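To make the bivariate step concrete, the following is a minimal sketch of how a table of conditional distributions might be computed before it is rendered as a pivot table heat map. It is not the authors' code: the function name `conditional_table`, the feature values, and the records are all hypothetical and synthetic, chosen only to illustrate the idea of estimating P(relationship | location type) from counts.

```python
from collections import Counter, defaultdict

# Synthetic, made-up records for illustration only (not from the study):
# each pair is (location_type, relationship_to_aggressor).
records = [
    ("empty lot", "stranger"),
    ("empty lot", "stranger"),
    ("empty lot", "acquaintance"),
    ("home", "acquaintance"),
    ("home", "acquaintance"),
    ("home", "stranger"),
    ("home", "acquaintance"),
]


def conditional_table(rows):
    """Return {given: {outcome: P(outcome | given)}} from (given, outcome) pairs."""
    counts = defaultdict(Counter)
    for given, outcome in rows:
        counts[given][outcome] += 1

    table = {}
    for given, outcome_counts in counts.items():
        total = sum(outcome_counts.values())
        # Normalize each row so that it sums to 1.
        table[given] = {o: c / total for o, c in outcome_counts.items()}
    return table


table = conditional_table(records)
for location in sorted(table):
    print(location, table[location])
```

Each row of the resulting table is one conditional distribution; a heat map would color the cells by these proportions so that unusually concentrated rows stand out.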

Using the proposed framework, we find evidence of patterns that should be addressed by policy makers. One of our most relevant findings is a pattern in the east of the country where girls between 12 and 14 years old report being raped by their boyfriends at anomalous rates, with a peak in the first half of 2008. Sexual violence among teenagers in this region of the country has been informally discussed in the past, but, without evidence, the conversation is often dismissed and attributed to a stereotype rather than a reality. We believe our results provide evidence of a phenomenon that calls for the design of efficient policies.

Finally, we propose that the anomaly detection model used in our research could be implemented as a monitoring device for real-time detection of emerging patterns of sexual violence, which could enable the development of effective policies and responses.
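As a rough illustration of what such monitoring could look like, the sketch below compares the latest count in each (region, crime type) cell against its historical mean using a Poisson upper-tail test. This is a deliberately simplified stand-in, not T-Cube and not the model used in the paper; the function names, the cell labels, the counts, and the threshold `alpha` are all assumptions made for the example.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    cdf = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - cdf)


def screen(history, current, alpha=0.01):
    """Flag cells whose current count is unusually high for their history.

    history: {cell: [past counts per period]}
    current: {cell: count in the latest period}
    Returns (cell, count, baseline, p_value) tuples, smallest p-value first.
    """
    flagged = []
    for cell, count in current.items():
        past = history.get(cell, [])
        if not past:
            continue  # no baseline available for this cell
        baseline = sum(past) / len(past)
        if baseline <= 0:
            continue  # assumption: zero-baseline cells need separate handling
        p_value = poisson_upper_tail(count, baseline)
        if p_value < alpha:
            flagged.append((cell, count, baseline, p_value))
    return sorted(flagged, key=lambda item: item[3])


# Synthetic counts for illustration only.
history = {("east", "type A"): [2, 3, 2, 3], ("west", "type A"): [5, 4, 6, 5]}
current = {("east", "type A"): 12, ("west", "type A"): 5}

alerts = screen(history, current)
for cell, count, baseline, p_value in alerts:
    print(cell, count, baseline, p_value)
```

A deployed system would screen many cells at once, so it would also need a correction for multiple comparisons and a baseline that accounts for seasonality and reporting delays; the sketch omits both.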

Data for Policy, hosted by the University of Cambridge, UK, is an interdisciplinary conference focused on the potential of data science for government and policy-making. This year's main topic was 'Frontiers of Data Science for Government: Ideas, Practices and Projections'. The award was funded by the London Innovation Society (LIS). All early career research submissions to the Data for Policy conference were considered for the award.