Interview with Vassilis Christophides

08.06.2020 - 30.09.2020

Can you present your research during your stay at CY|AS?

Machine Learning (ML) algorithms typically operate by learning patterns in historical data and generalizing them to unseen data. There is growing recognition that even ML models developed with the best of intentions may exhibit discriminatory biases, perpetuate inequality, or perform less well for historically disadvantaged groups. Harms to particular individuals or groups are essentially caused by “biased data”, a notion that encompasses many forms of “bugs” specific to data-driven decision systems. In this talk, we present statistical and causal analysis approaches to unveil discriminatory practices in high-stakes applications such as criminal justice and predictive policing, creditworthiness and loans, etc. During my stay at CY|AS, we surveyed existing research efforts on ethical ML that balance fairness against the accuracy of predictions. We acknowledge that “biased data” are often due to various imperfections arising during data collection or data processing. We then started investigating how to detect, report and prevent data ethics issues at the earliest possible stage of the pipelines used to build ML models. This calls for tools that diagnose whether a given fairness issue might be addressed by collecting more training data from a particular subpopulation or by better cleaning the existing training data, and that predict how much more data needs to be gathered or repaired. We are currently working on how to actively guide upstream data cleaning so as to jointly optimize the fairness and accuracy of downstream ML models.

Does the Fellows-in-Residence program meet your objectives in terms of research and scientific collaboration?

The Fellows-in-Residence program of CY|AS gave me a great opportunity to focus on novel research issues lying at the intersection of Data Engineering and Machine Learning. The ongoing collaboration of my hosting lab with PJGN (pôle judiciaire pour la gendarmerie nationale) of Cergy Pontoise provided a strong motivation to start investigating data ethics issues in ML applications such as criminal justice and predictive policing. The multidisciplinary exchanges with my fellow colleagues (in physics or health) at CY|AS allowed me to consolidate the methodological standpoint of my work, in particular the need to make explicit the cause-effect relationships in multivariate data in order to address fairness issues in automated decisions.

Describe your impressions of your experience at CY|AS

Although the lockdown significantly restricted our exchanges, I really enjoyed my stay at CY|AS this academic year. First, the scientific program of the institute, led by Arnaud Lefranc and Flora Koukiou, was really exciting: I feel lucky to have been able to attend guest lectures and seminars covering so many different research areas and topics (from sociology and political science to high-energy physics, chemistry and neuroscience). Next, I am really impressed by the commitment of all the administrative staff (Karine Gambier-Leroy, Ratana Pok, Francesca Rondina) to creating a competitive and friendly hosting environment for foreign researchers. Finally, working in my office at the Maison internationale de la recherche (MIR) was really inspiring.

What will this research period bring to you and to your home university?

A brand-new research activity on data ethics that will be put to use in new graduate courses and research projects.

Do you have other plans for the future, other destinations in mind?

Next year I would prefer to spend more time with my family.