
C2ENS---13 | Data mining and modeling for behavioral sciences and beyond

30
English
ENS - Département d'Etudes Cognitives, 29 rue d'Ulm, 75005 Paris (access without an ENS badge via 24 rue Lhomond);
Monday 4 March: Salle Langevin (Bâtiment Jaurès, 1st floor);
Tuesday 5 March: Salle Langevin (Bâtiment Jaurès, 1st floor);
Wednesday 6 March: Salle Berthier U207 (Bâtiment Jaurès, 2nd floor);
Thursday 7 March: Salle Ribot (Bâtiment Jaurès, ground floor);
Friday 8 March: Salle Ribot (Bâtiment Jaurès, ground floor)
We are currently facing an explosion of data across domains and disciplines.
In this context, the ability to manipulate and understand large amounts of complex, rich, multidimensional data has become critical both in science and in many applications outside academia. Unfortunately, strong initial training in quantitative sciences (math, physics, computer science) is often thought to be required to understand complex data using tools from statistics and machine learning.

The objective of this PSL week is to give students with broader academic backgrounds knowledge of, and first-hand experience with, several tools used to manipulate and understand large amounts of multidimensional data, ranging from statistical analyses, multivariate regressions and dimensionality reduction techniques to the modeling of the hidden generative processes that give rise to observed data.

Behavioral data are a prime example of complex, rich data that are increasingly collected and used in academic research and beyond to better understand human psychology, to predict specific patterns of real-life behavior, and to rethink the relations between mental health disorders.
Students will use different types of behavioral data to get first-hand experience with the main tools seen in class. They will also get guidelines for choosing between tools to address specific research questions.

Specific aims:
- learn about widely-applied tools for manipulating and understanding large amounts of data;
- develop an understanding of how these tools work and what they can provide in terms of answers without strong initial training in quantitative sciences;
- get first-hand experience with these tools by applying them to different types of data;
- study in detail a set of use cases from recent research papers.

1/ Introduction to data mining and modeling;
- Concepts and approaches to the description of data;
- Important statistical measures of data (mean, variance, covariance);
- Conceptual differences in data science: correlation vs causation, explanation vs prediction;
- Dealing with uncertainty in data (sample size, confidence intervals, bootstrapping);
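
As an illustration of the bootstrapping idea listed above, here is a minimal sketch of a percentile confidence interval for a sample mean, obtained by resampling the data with replacement. It is not taken from the course material: the helper name `bootstrap_ci`, the measurements and the number of resamples are assumptions made for the example.

```python
import random
import statistics

def bootstrap_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical measurements, for illustration only.
heights_cm = [162.0, 171.5, 168.2, 175.0, 159.8, 181.3, 166.4, 173.9, 170.1, 164.7]
ci_low, ci_high = bootstrap_ci(heights_cm)
print(f"mean = {statistics.fmean(heights_cm):.1f} cm, "
      f"95% CI = [{ci_low:.1f}, {ci_high:.1f}] cm")
```

With a larger sample the interval would be expected to narrow, which is one way to see how sample size and uncertainty are linked.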

2/ Statistical modeling of data using regression-based techniques;
- Building and fitting regression models to data;
- Understanding the effect of covariates on regression models;
- Improving the robustness of regression models using regularization;
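
To make the regularization item above concrete, the following sketch fits a linear regression with an L2 (ridge) penalty using its closed-form solution, and compares it with the unpenalized fit. The simulated data, the helper name `ridge_fit` and the penalty value are assumptions chosen for illustration, not the course's own example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
n_samples, n_features = 30, 10
X = rng.normal(size=(n_samples, n_features))
true_w = np.zeros(n_features)
true_w[:2] = [1.5, -2.0]            # only two covariates matter in this toy example
y = X @ true_w + rng.normal(scale=1.0, size=n_samples)

w_ols = ridge_fit(X, y, lam=0.0)    # lam = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)
print("OLS coefficient norm:  ", np.linalg.norm(w_ols))
print("ridge coefficient norm:", np.linalg.norm(w_ridge))
```

The penalty shrinks the coefficients toward zero, which trades a little bias for lower sensitivity to noise in small samples.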

3/ Dimensionality reduction using Principal Component Analysis;
- Understanding the curse of dimensionality in data mining;
- Applying feature extraction using Principal Component Analysis (PCA);
- Understanding and selecting Principal Components (PCs);
- Improving the robustness of regression models using PCA;
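
The PCA steps listed above (centering, extracting components, selecting how many to keep) can be sketched as follows. This is an illustrative example on simulated data, assuming a 95% explained-variance threshold; the data, the threshold and the variable names are not from the course.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 observations of 5 variables driven by 2 hidden factors plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

X_centered = X - X.mean(axis=0)               # PCA operates on centered data
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance carried by each principal component, as a fraction of the total.
explained_variance = S**2 / (X.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Keep the smallest number of components reaching 95% of the variance.
n_keep = int(np.searchsorted(np.cumsum(explained_ratio), 0.95) + 1)

scores = X_centered @ Vt[:n_keep].T           # data projected onto the kept PCs
print("explained variance ratio:", np.round(explained_ratio, 3))
print("components kept for 95% of variance:", n_keep)
```

The resulting `scores` could then serve as a smaller set of regressors in place of the original, correlated variables.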

4/ Generative modeling of data;
- Differences between descriptive and generative modeling of data;
- Building generative models of observed data;
- Fitting, comparing and validating generative models of the same data;
- Particle filtering methods: how they work and when to use them;
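
As a rough illustration of how particle filtering works, the sketch below tracks a hidden random walk from noisy observations using a basic bootstrap particle filter (propagate, weight, resample). The generative model, the noise levels and the number of particles are assumptions chosen for the example, not the models used in class.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_particles = 100, 500
process_sd, obs_sd = 0.1, 1.0      # assumed noise levels for the toy model

# Simulate a hidden random walk and noisy observations of it.
hidden = np.cumsum(rng.normal(scale=process_sd, size=n_steps))
observed = hidden + rng.normal(scale=obs_sd, size=n_steps)

particles = rng.normal(scale=1.0, size=n_particles)   # samples from a broad prior
estimates = np.empty(n_steps)
for t in range(n_steps):
    # 1. Propagate each particle through the generative (transition) model.
    particles = particles + rng.normal(scale=process_sd, size=n_particles)
    # 2. Weight particles by the likelihood of the current observation.
    log_w = -0.5 * ((observed[t] - particles) / obs_sd) ** 2
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()
    # 3. Estimate the hidden state, then resample in proportion to the weights.
    estimates[t] = np.sum(weights * particles)
    particles = rng.choice(particles, size=n_particles, p=weights)

rmse_filter = np.sqrt(np.mean((estimates - hidden) ** 2))
rmse_raw = np.sqrt(np.mean((observed - hidden) ** 2))
print(f"RMSE of filtered estimate: {rmse_filter:.2f}; of raw observations: {rmse_raw:.2f}")
```

For this linear-Gaussian toy model a Kalman filter would give the exact answer; particle filters become useful when the generative model is nonlinear or non-Gaussian.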

5/ Making predictions from data;
- Differences between explaining and predicting data;
- Understanding classification- and regression-based predictions;
- When a good fit can be bad: Occam's razor, overfitting and cross-validation techniques;
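
The point that a good fit can be bad can be illustrated with polynomial regression: the error on the fitted data can only go down as the model gets more flexible, whereas the cross-validated error on held-out data typically stops improving and then worsens. The sketch below uses simulated data and 5-fold cross-validation; the data, the polynomial degrees and the function names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 40)
y = np.sin(np.pi * x) + rng.normal(scale=0.3, size=x.size)  # noisy toy data

indices = rng.permutation(x.size)
folds = np.array_split(indices, 5)       # 5 disjoint held-out sets

def train_mse(degree):
    """Mean squared error on the same data used for fitting."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

def cv_mse(degree):
    """Mean squared error on held-out folds (5-fold cross-validation)."""
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(indices, test_idx)
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        predictions = np.polyval(coeffs, x[test_idx])
        errors.append(np.mean((predictions - y[test_idx]) ** 2))
    return float(np.mean(errors))

for degree in (1, 3, 6, 12):
    print(f"degree {degree:2d}: training MSE = {train_mse(degree):.3f}, "
          f"cross-validated MSE = {cv_mse(degree):.3f}")
```

Choosing the degree with the lowest cross-validated error, rather than the lowest training error, is one simple way to apply Occam's razor in practice.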

Examples of data types used in class:
- human body dimensions;
- psychometric scores of personality using the Big-5 model;
- interindividual variations in mental health in the general population;
- human decision rates across >10,000 risky choice problems;
- electromagnetic brain responses to oriented stimuli in primary visual cortex;
- activations of artificial neural networks trained to play slot machines;

Note that this is the first edition of this PSL week: the precise program may therefore still change slightly before the start of the PSL week.

 
Short presentation describing and motivating the use of some of the tools seen in class, applied to a specific dataset that will be introduced in class and provided to students

 
Basic knowledge of (but strong interest in) linear algebra, probability and statistics

 
WYART Valentin

 
Valentin Wyart (Inserm team leader, LNC2/DEC/ENS-PSL) - teaching supervisor;
Junseok K. Lee (postdoctoral affiliate, LNC2/DEC/ENS-PSL) - teaching assistant