The IASD Master's program begins with a core-curriculum semester devoted to the fundamental disciplines of AI and data science. At the end of the first semester, students choose six advanced courses for the second semester from the list of electives. Students also have the opportunity to attend two PSL intensive weeks, which offer thematic openings toward other disciplines or applications. The year continues with an internship in an academic or industrial research laboratory and concludes in September with the writing of a thesis and its public defense.

Core curriculum

Acquisition, extraction et stockage de données (IASD)
Data acquisition, extraction, and storage
Hours
27h
ECTS
4 credits
Course URL
https://moodle.psl.eu/course/view.php?id=27287
Instructor(s)
Pierre Senellart

The aim of this course is to present the principles and techniques used to acquire, extract, integrate, clean, preprocess, store, and query datasets, which can then be used as input data to train various artificial intelligence models. The course combines lectures and practical sessions. We will cover the following topics:

  • Web data acquisition (Web crawling, Web APIs, open data, legal issues)
  • Information extraction from semi-structured data
  • Data cleaning and data deduplication
  • Data formats and data models
  • Storing and processing data in databases, in main memory, or in plain files
  • Introduction to large-scale data processing with MapReduce and Spark
  • Introduction to uncertain data management
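
To make the MapReduce item above more concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce phases applied to word counting. It is only an illustration in plain Python, not course material: the function names and the two sample documents are assumptions, and a real deployment would rely on a framework such as Hadoop or Spark.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word of one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases sequentially on a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, for illustration only.
docs = ["open data on the Web", "data cleaning and data deduplication"]
counts = word_count(docs)
```

In a distributed setting, the map and reduce calls would run in parallel on different nodes, and the shuffle would move data across the network.
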
Projet science des données (IASD)
Data science lab
Hours
27h
ECTS
4 credits
Course URL
https://www.lamsade.dauphine.fr/~bnegrevergne/ens/ProjetDataScience/
Instructor(s)
Benjamin Negrevergne

Note: this course description is not available in French.

The goal of this module is to give students hands-on experience with a novel data-science/AI challenge, using state-of-the-art tools and techniques discussed in other classes of this master's program. Students enrolled in this class will form groups and choose one topic from a list of proposed topics in the core areas of the program, such as supervised or unsupervised learning, recommendation, game AI, distributed or parallel data science, etc. The topics will generally consist of applying a well-established technique to a novel data-science challenge, or applying recent research results to a classical data-science challenge. Either way, each topic will come with its own novel scientific challenge to address. At the end of the module, the students will give an oral presentation to demonstrate their methodology and their findings. Strong scientific rigor as well as very good engineering and communication skills will be necessary to complete this module successfully.

Apprentissage profond pour l'analyse d'images (IASD)
Deep learning for image analysis
Hours
27h
ECTS
4 credits
Course URL
http://cours.cmm.mines-paristech.fr/wiki/doku.php/deep/start
Instructor(s)
Étienne Decencière

Note: this course description is not available in French.

Deep learning has achieved formidable results in the image analysis field in recent years, in many cases exceeding human performance. This success opens paths for new applications, entrepreneurship and research, while making the field very competitive. This course aims at providing the students with the theoretical and practical basis for understanding and using deep learning for image analysis applications.

The course will be composed of lectures and practical sessions. Moreover, experts from industry will present practical applications of deep learning.

Lectures will include:

• Artificial neural networks, back-propagation algorithm
• Convolutional neural network
• Design and optimization of a neural architecture
• Successful architectures (AlexNet, VGG, GoogLeNet, ResNet)
• Analysis of neural network function
• Image classification and segmentation
• Auto-encoders and generative networks
• Vision transformers
• Current research trends and perspectives.

During the practical sessions, the students will code in Python, using Keras and TensorFlow. They will be confronted with the practical problems linked to deep learning: architecture design, optimization schemes and hyper-parameter selection, and analysis of results.
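
As a small illustration of the operation at the core of the convolutional networks listed above, the following sketch computes a "valid" 2D cross-correlation (the operation that deep learning libraries usually call convolution) in plain Python. It is an assumption-laden toy, not the Keras or TensorFlow code used in the practical sessions: the function name, the tiny image, and the edge-detection kernel are all made up for the example.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation of two nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch whose top-left corner is (i, j).
            output[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
    return output


# Hypothetical 3x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]  # horizontal finite difference
feature_map = conv2d(image, edge_kernel)
```

The resulting feature map is non-zero only where the intensity changes from left to right, which is the kind of local pattern a learned convolutional filter can respond to.
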

Prerequisites: Linear algebra, basic probability and statistics

Fondamentaux de l'apprentissage automatique (IASD)
Foundations of machine learning
Hours
27h
ECTS
4 credits
Instructor(s)
Francis Bach

Note: this course description is not available in French.

The goal of this class is to present old and recent results in learning theory for the most widely used learning architectures. This class is geared towards theory-oriented students as well as students who want to acquire a basic mathematical understanding of algorithms used throughout the master's program.

A particular effort will be made to prove many results from first principles, while keeping the exposition as simple as possible. This will naturally lead to a choice of key results that showcase, in simple but relevant instances, the important concepts in learning theory. Some general results will also be presented without proofs.

The class will be organized in nine three-hour sessions, each with a precise topic (a chapter from the book in preparation "Learning theory from first principles").

Prerequisites: We will prove results in class so a good knowledge of undergraduate mathematics is important, as well as basic notions in probability. Having followed an introductory class on machine learning is beneficial.

Modèles de langage de grande taille (IASD)
Large language models
Hours
27h
ECTS
4 credits
Instructor(s)
Alexandre Allauzen

Note: this course description is not available in French.

Large language models play an important role in accessing and generating different kinds of content, as well as in communicating across languages. This course focuses on deep learning methods for natural language processing (NLP), from the basics to large language models. The goal is to introduce some important concepts of deep learning and NLP, along with the theoretical and empirical components of the following topics:

  • Natural Language Processing and deep learning basics
  • Sequence models in NLP (n-grams, convolution, and recurrent models)
  • The Transformer architecture
  • Text classification
  • Generative models for NLP

Lab sessions in PyTorch complement the course.

Prerequisites:

  • Working knowledge of Python
  • Understanding of basic machine learning concepts such as logistic regression, loss functions, and optimization by gradient descent

Optimisation pour l'apprentissage automatique (IASD)
Optimization for machine learning
Hours
48h
ECTS
6 credits
Instructor(s)
Gabriel Peyré

Note: this course description is not available in French.

This course will review the mathematical foundations of machine learning and the underlying algorithmic methods, and will showcase some modern applications of a broad range of optimization techniques.

Optimization is at the heart of most recent advances in machine learning. This of course includes the most basic methods (linear regression, SVM, and kernel methods). It is also key to the recent explosion of deep learning methods, which are state-of-the-art approaches for solving supervised and unsupervised problems in imaging, vision, and natural language processing.

The course will be composed of both classical lectures and numerical sessions in Python. The first part covers the basic methods of smooth optimization (gradient descent) and convex optimization (optimality conditions, constrained optimization, duality). The second part will feature more advanced methods (non-smooth optimization, semidefinite programming (SDP), interior-point and proximal methods). The last part will cover large-scale methods (stochastic gradient descent), automatic differentiation (using modern Python frameworks), and their application to neural networks (shallow and deep nets).
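
As an illustration of the first method named above, here is a minimal sketch of gradient descent with a constant step size. The quadratic objective, the step size, and the number of iterations are assumptions chosen for the example, not taken from the course.

```python
def gradient(point):
    """Gradient of the illustrative objective f(x, y) = (x - 3)^2 + 2 * (y + 1)^2."""
    x, y = point
    return (2.0 * (x - 3.0), 4.0 * (y + 1.0))


def gradient_descent(grad, start, step_size=0.1, n_steps=200):
    """Repeat the update point <- point - step_size * grad(point)."""
    point = tuple(start)
    for _ in range(n_steps):
        g = grad(point)
        point = tuple(p - step_size * gi for p, gi in zip(point, g))
    return point


# The unique minimizer of the illustrative objective is (3, -1).
minimizer = gradient_descent(gradient, start=(0.0, 0.0))
```

With this step size, the distance to the minimizer shrinks by a constant factor at every iteration along each coordinate, which is the linear convergence typical of smooth, strongly convex problems.
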

Further information on the course page.

Apprentissage par renforcement (IASD)
Reinforcement learning
Hours
27h
ECTS
4 credits
Instructor(s)
Olivier Cappé

Note: this course description is not available in French.

Reinforcement Learning (RL) refers to scenarios where the learning algorithm operates in closed-loop, simultaneously using past data to adjust its decisions and taking actions that will influence future observations. Algorithms based on RL concepts are now commonly used in programmatic marketing on the web, robotics or in computer game playing. All models for RL share a common concern that in order to attain one's long-term optimality goals, it is necessary to reach a proper balance between exploration (discovery of yet uncertain behaviors) and exploitation (focusing on the actions that have produced the most relevant results so far).

The methods used in RL draw ideas from control, statistics and machine learning. This introductory course will provide the main methodological building blocks of RL, focussing on probabilistic methods in the case where both the set of possible actions and the state space of the system are finite. Some basic notions in probability theory are required to follow the course.

  • Probabilistic and statistical tools for RL: Markov chains and conditioning, importance sampling, stochastic approximation, Bayesian modelling, hypothesis testing, concentration inequalities
  • Models: Markov decision processes (MDP), multiarmed bandits and other models
  • Planning: finite and infinite horizon problems, value functions, Bellman equations, dynamic programming, value and policy iteration
  • Basic learning tools: Monte Carlo methods, temporal-difference learning, policy gradient
  • Optimal exploration in multiarmed bandits: the explore vs exploit tradeoff, pure exploration, lower bounds, the UCB algorithm, Thompson sampling
  • Extensions: Contextual bandits, pure exploration, optimal exploration for MDP
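
The planning item above (Bellman equations, value iteration) can be sketched on a toy finite MDP. Everything specific here is an assumption made for the example: the two-state MDP, its rewards, the discount factor, and the data layout `P[state][action] = [(probability, next_state, reward), ...]`.

```python
def q_value(P, V, state, action, gamma):
    """Expected return of taking `action` in `state`, then following values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[state][action])


def value_iteration(P, gamma=0.9, tol=1e-10):
    """Apply the Bellman optimality update until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


# Hypothetical MDP: staying in state 1 pays 1 per step, everything else pays 0.
mdp = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 1.0)], "move": [(1.0, 0, 0.0)]},
}
values, policy = value_iteration(mdp)
```

For this toy problem the optimal values can be checked by hand: staying in state 1 forever is worth 1 / (1 - 0.9) = 10, and moving from state 0 to state 1 is worth 0.9 × 10 = 9.
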

Options

Apprentissage automatique avancé
Advanced machine learning
Hours
24h
ECTS
3 credits
Instructor(s)
Yann Chevaleyre

Note: this course description is not available in French.

This research-oriented module will focus on some selected advanced topics in machine learning, including

  • Kernel methods
  • Losses and algorithms beyond classification
  • Bayesian Machine Learning
  • Generative machine learning

The evaluation consists of an individual presentation of a research paper on one of the above topics.

Choix social computationnel
Computational social choice
Hours
24h
ECTS
3 credits
Instructor(s)
Jérôme Lang, Dominik Peters

Note: this course description is not available in French.

The aim of this course is to give an overview of the problems, techniques and applications of computational social choice, a multidisciplinary topic at the crossing point of computer science (especially artificial intelligence, operations research, theoretical computer science, multi-agent systems, computational logic, web science) and economics. The course consists of the analysis of problems arising from the aggregation of preferences of a group of agents from a computational perspective. On the one hand, it is concerned with the application of techniques developed in computer science, such as complexity analysis or algorithm design, to the study of social choice mechanisms, such as voting procedures or fair division algorithms. On the other hand, computational social choice is concerned with importing concepts from social choice theory into computing. For instance, social welfare orderings originally developed to analyse the quality of resource allocations in human society are equally well applicable to problems in multi-agent systems or network design. The course will focus on normative aspects, computational aspects, and real-world applications (including some case studies).

Program:

1. Introduction to social choice
2. Computing hard voting rules and preference aggregation functions. Application to aggregating web page rankings
3. Strategic issues: manipulation, control, game-theoretic analyses of voting. Short introduction to algorithmic mechanism design
4. Preference aggregation on combinatorial domains
5. Communication issues in voting: voting with incomplete preferences, elicitation protocols, communication complexity, low-communication mechanisms
6. Fair division of indivisible goods
7. Cake cutting algorithms
8. Matching under preferences
9. Coalition formation
10. Specific applications and case studies (varying every year): rent division, kidney exchange, school assignment, group recommendation systems…
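
To give a flavor of the voting procedures mentioned above, here is a minimal sketch of one classical rule, the Borda count, in which a candidate ranked in position k out of m receives m - 1 - k points per ballot. The choice of this rule, the function name, and the seven-voter profile are assumptions made for illustration; the course itself may focus on other rules.

```python
def borda(ballots):
    """Return (sorted list of winners, scores) under the Borda count.

    Each ballot is a list of candidates ordered from most to least preferred.
    """
    scores = {}
    for ranking in ballots:
        m = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (m - 1 - position)
    top = max(scores.values())
    winners = sorted(c for c in scores if scores[c] == top)
    return winners, scores


# Hypothetical profile: 3 voters rank a > b > c, 2 rank b > c > a, 2 rank c > b > a.
profile = 3 * [["a", "b", "c"]] + 2 * [["b", "c", "a"]] + 2 * [["c", "b", "a"]]
winners, scores = borda(profile)
```

On this profile, candidate a has the most first-place votes, yet candidate b wins under Borda: different aggregation rules can disagree on the same preferences, which is one reason their normative properties are studied.
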

Bibliography, recommended reading

Handbook of Computational Social Choice (F. Brandt, V. Conitzer, U. Endriss, J. Lang, A. Procaccia, eds.), Cambridge University Press, 2016.
Algorithmics of Matching Under Preferences (D. Manlove), World Scientific, 2013.

Cours de sciences des données au Collège de France
Data science lectures at Collège de France
Hours
24h
ECTS
3 credits
Course URL
https://www.college-de-france.fr/fr/chaire/stephane-mallat-sciences-des-donnees-chaire-statutaire/events
Instructor(s)
Stéphane Mallat

Processing data to validate a hypothesis or estimate parameters was long the exclusive domain of statistics. However, the growth in data dimension has caused a combinatorial explosion of possibilities. This curse of dimensionality is a central difficulty of data analysis, whether for images, sounds, texts, or experimental measurements such as those in physics, biology, or economics. Modeling and representing the hidden structures of data draws on various branches of mathematics, but also on computer science. Statistical learning algorithms, such as neural networks, are configured to optimize data analysis from examples. They are behind the spectacular results of artificial intelligence. The scientific, industrial, and societal applications are considerable, and their performance is progressing much faster than our grasp of their mathematical properties.

Stéphane Mallat's chair at the Collège de France offers a course in applied mathematics that attempts to bridge the gap between the jungle of new algorithmic developments and the understanding of the underlying general principles. The applications cover all aspects of signal processing and statistical learning. Beyond statistics and probability, this draws on harmonic analysis, optimization, and geometry. The study of applications and new algorithms is offered through data challenges organized by the chair.

Stéphane Mallat's research team at ENS studies the principles that make it possible to structure data analysis so as to escape the curse of dimensionality. In particular, it develops neural network models based on principles of scale separation by wavelets, sparsity, and invariance. Applications range from image or sound recognition to the estimation of physical measurements. For more information, see the research team's website.

Apprentissage par renforcement profond et applications
Deep reinforcement learning and applications
Hours
24h
ECTS
3 credits
Instructor(s)
Eric Benhamou

Note: this course description is not available in French.

What will you learn in this class?

  • Intro and Course Overview
  • Supervised learning of behaviours
  • Intro to Reinforcement Learning
  • Policy Gradients
  • Actor-Critic Algorithms (A2C, A3C and Soft AC)
  • Value Function Methods
  • Deep RL with Q-functions
  • Advanced Policy Gradient (DDPG, Twin Delayed DDPG)
  • Trust Region & Proximal Policy Optimization (TRPO, PPO)
  • Optimal Control and Planning
  • Model-Based Reinforcement Learning
  • Model-Based Policy Learning
  • Exploration and Stochastic Bandit in RL
  • Exploration with Curiosity and Imagination
  • Offline RL and Generalization issues
  • Offline RL and Policy constraints

Why DRL?

  • A very promising type of learning, as it does not need to know the solution
  • Only needs the rules and good rewards
  • Combines the best aspects of deep learning and reinforcement learning
  • Can lead to impressive results in games, robotics, and finance

References

  • Goodfellow, Bengio & Courville, Deep Learning
  • Sutton & Barto, Reinforcement Learning: An Introduction
  • Szepesvari, Algorithms for Reinforcement Learning
  • Bertsekas, Dynamic Programming and Optimal Control, Vols I and II
  • Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming
  • Powell, Approximate Dynamic Programming
Analyse de graphes
Graph analytics
Hours
24h
ECTS
3 credits
Instructor(s)
Daniela Grigori

Note: this course description is not available in French.

The objective of this course is to give students an overview of the field of graph analytics. Since graphs form a complex and expressive data type, we need methods for representing graphs in databases, manipulating, querying, analyzing and mining them. Moreover, graph applications are very diverse and need specific algorithms.
The course presents new ways to model, store, retrieve, mine and analyze graph-structured data and some examples of applications.
Lab sessions are included allowing students to practice graph analytics: modeling a problem into a graph database and performing analytical tasks over the graph in a scalable manner.

Program

1. Introduction to graph management and mining
2. Graph databases – Neo4J
3. Query language for graphs – Cypher
4. Graph processing frameworks (Pregel, ..., GraphX)
5. Graph applications: mining social-network graphs, mining logs, fraud detection, ...
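
The vertex-centric model behind frameworks such as Pregel can be sketched on a single machine: in each superstep, every vertex sends messages along its outgoing edges, then updates its own state from the messages it received. The sketch below uses PageRank as the example computation. It is plain Python rather than Pregel or GraphX code, and the function name, the three-vertex graph, and the assumption that every vertex has at least one outgoing edge are all made up for the illustration.

```python
def pagerank_supersteps(out_edges, damping=0.85, n_supersteps=100):
    """PageRank written as a sequence of message-passing supersteps.

    Assumes every vertex has at least one outgoing edge (no dangling vertices).
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(n_supersteps):
        # Message phase: each vertex splits its rank among its out-neighbors.
        inbox = {v: 0.0 for v in out_edges}
        for v, targets in out_edges.items():
            share = rank[v] / len(targets)
            for t in targets:
                inbox[t] += share
        # Update phase: each vertex combines the messages it received.
        rank = {v: (1.0 - damping) / n + damping * inbox[v] for v in out_edges}
    return rank


# Hypothetical directed graph: a -> b, a -> c, b -> c, c -> a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_supersteps(graph)
```

In a distributed framework, the vertices would be partitioned across workers and the message phase would involve network communication, but the per-vertex logic would keep the same shape.
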

Bibliography, recommended reading

Ian Robinson, Jim Webber, Emil Eifrem, Graph Databases, Publisher: O’Reilly (June 4, 2013), ISBN-10: 1449356265
Eric Redmond, Jim R. Wilson, Seven Databases in Seven Weeks – A Guide to Modern Databases and the NoSQL Movement, Publisher: Pragmatic Bookshelf
Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing, SIGMOD ’10, ACM, New York, NY, USA, 135-146
Reynold Xin, Daniel Crankshaw, Ankur Dave, Joseph Gonzalez, Michael J. Franklin, and Ion Stoica. 2014. GraphX: Unifying Data-Parallel and Graph-Parallel Analytics.
Michael S. Malak and Robin East, Spark GraphX in Action, Manning, June 2016

Apprentissage incrémental, théorie des jeux et applications
Incremental learning, game theory, and applications
Hours
24h
ECTS
3 credits
Instructor(s)
Rida Laraki

Note: this course description is not available in French.

This course will focus on the behavior of learning algorithms when several agents are competing against one another: specifically, what happens when an agent that follows an online learning algorithm interacts with another agent doing the same? The natural language to frame such questions is that of game theory, and the course will begin with a short introduction to the topic, such as normal form games (in particular zero-sum, potential, and stable games), solution concepts (such as dominated/rationalizable strategies, Nash, correlated and coarse equilibrium notions, ESS), and some extensions (Blackwell approachability). Subsequently, we will examine the long-term behavior of a wide variety of online learning algorithms (fictitious play, regret-matching, multiplicative/exponential weights, mirror descent and its variants, etc.), and we will discuss applications to generative adversarial networks (GANs), traffic routing, prediction, and online auctions.
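
As an illustration of the central question above (what happens when two online learners face each other), here is a minimal sketch of exponential weights (Hedge) in self-play on a zero-sum matrix game. The payoff matrix, the horizon, and all function names are assumptions made for the example. Payoffs are taken in [-1, 1] and rescaled to losses in [0, 1]; under that assumption, the standard Hedge regret analysis with learning rate sqrt(8 ln K / T) suggests that the duality gap of the time-averaged strategies is at most 4 sqrt(ln 2 / (2T)) when each player has two actions.

```python
import math


def hedge_weights(cum_losses, eta):
    """Exponential-weights distribution computed from cumulative losses."""
    base = min(cum_losses)  # shift for numerical stability
    w = [math.exp(-eta * (c - base)) for c in cum_losses]
    total = sum(w)
    return [v / total for v in w]


def self_play(A, T):
    """Both players run Hedge; returns their time-averaged mixed strategies.

    A[i][j] is the payoff to the row player (in [-1, 1]); the column player
    receives -A[i][j].
    """
    n, m = len(A), len(A[0])
    eta_row = math.sqrt(8 * math.log(n) / T)
    eta_col = math.sqrt(8 * math.log(m) / T)
    cum_row, cum_col = [0.0] * n, [0.0] * m
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(T):
        x = hedge_weights(cum_row, eta_row)
        y = hedge_weights(cum_col, eta_col)
        row_payoffs = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        col_payoffs = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
        for i in range(n):
            cum_row[i] += (1.0 - row_payoffs[i]) / 2.0  # row player maximizes
        for j in range(m):
            cum_col[j] += (1.0 + col_payoffs[j]) / 2.0  # column player minimizes
        avg_x = [a + v / T for a, v in zip(avg_x, x)]
        avg_y = [a + v / T for a, v in zip(avg_y, y)]
    return avg_x, avg_y


def duality_gap(A, x, y):
    """Best-response payoff against y minus worst-case payoff of x."""
    n, m = len(A), len(A[0])
    best_row = max(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    worst_col = min(sum(x[i] * A[i][j] for i in range(n)) for j in range(m))
    return best_row - worst_col


# Hypothetical zero-sum game with payoffs in [-1, 1].
game = [[1.0, -1.0], [-1.0, 0.5]]
horizon = 10000
avg_x, avg_y = self_play(game, horizon)
gap = duality_gap(game, avg_x, avg_y)
```

A duality gap of zero would mean that the averaged strategies form a Nash equilibrium; a small gap means they form an approximate one.
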

[1] Nicolò Cesa-Bianchi and Gábor Lugosi, Prediction, learning, and games, Cambridge University Press, 2006.
[2] Drew Fudenberg and David K. Levine, The theory of learning in games, Economic learning and social evolution, vol. 2, MIT Press, Cambridge, MA, 1998.
[3] Sergiu Hart and Andreu Mas-Colell, Simple adaptive strategies: from regret matching to uncoupled dynamics, World Scientific Series in Economic Theory – Volume 4, World Scientific Publishing, 2013.
[4] Vianney Perchet, Approachability, regret and calibration: implications and equivalences, Journal of Dynamics and Games 1 (2014), no. 2, 181–254.
[5] Shai Shalev-Shwartz, Online learning and online convex optimization, Foundations and Trends in Machine Learning 4 (2011), no. 2, 107–194.

Graphes de connaissance, logiques de description, raisonnement sur les données
Knowledge graphs, description logics, reasoning on data
Hours
24h
ECTS
3 credits
Instructor(s)
Michaël Thomazo

Note: this course description is not available in French.

Introduction to Knowledge Graphs, Description Logics and Reasoning on Data. Knowledge graphs are a flexible tool to represent knowledge about the real world. After presenting some of the existing knowledge graphs (such as DBPedia, Wikidata or Yago), we focus on their interaction with semantics, which is formalized through the use of so-called ontologies. We then present some central logical formalisms used to express ontologies, such as Description Logics and Existential Rules. A large part of the course will be devoted to studying the associated reasoning tasks, with a particular focus on querying a knowledge graph through an ontology. Both theoretical aspects (such as the tradeoff between the expressivity of the ontology language and the complexity of the reasoning tasks) and practical ones (efficient algorithms) will be considered.

Program:

1. Knowledge Graphs (history and uses)
2. Ontology Languages (Description Logics, Existential Rules)
3. Reasoning Tasks (Consistency, classification, Ontological Query Answering)
4. Ontological Query Answering (Forward and backward chaining, Decidability and complexity, Algorithms, Advanced Topics)
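
Forward chaining, mentioned in the last program item, can be sketched for a deliberately restricted rule language: Datalog-style rules without existential variables, which is simpler than the Description Logics and Existential Rules covered in the course. The fact and rule encodings, the `?x` variable convention, and the tiny ontology are all assumptions made for this example (it requires Python 3.8 or later).

```python
def match(atom, fact, binding):
    """Try to extend `binding` so that `atom` equals `fact`; None on failure."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):  # variable: bind it, or check consistency
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return binding


def forward_chain(facts, rules):
    """Apply all rules until no new fact can be derived (saturation)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            bindings = [{}]
            for atom in body:
                bindings = [
                    b2
                    for b in bindings
                    for fact in facts
                    if (b2 := match(atom, fact, b)) is not None
                ]
            for b in bindings:
                new_fact = tuple(b.get(term, term) for term in head)
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


# Hypothetical knowledge graph and ontology rules: (body atoms, head atom).
data = {("Professor", "alice"), ("teaches", "alice", "iasd")}
ontology = [
    ([("Professor", "?x")], ("Teacher", "?x")),
    ([("Teacher", "?x")], ("Person", "?x")),
    ([("teaches", "?x", "?c")], ("Course", "?c")),
]
closure = forward_chain(data, ontology)
```

Once the facts are saturated, a query such as "is alice a Person?" can be answered by a simple lookup, even though that fact was never stated explicitly in the data.
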

References:

  • The description logic handbook: theory, implementation, and applications. Baader et al., Cambridge University Press
  • Foundations of Semantic Web Technologies, Hitzler et al., Chapman&Hall/CRC
  • Web Data Management, Abiteboul et al., Cambridge University Press

Prerequisites:

  • first-order logic;
  • complexity (Turing machines, classical complexity classes) is a plus.

Apprentissage automatique sur Big Data
Machine learning on big data
Hours
24h
ECTS
3 credits
Instructor(s)
Dario Colazzo

Note: this course description is not available in French.

Nowadays there is an ever-increasing demand for machine learning algorithms that scale over massive data sets.
In this context, this course focuses on the typical, fundamental aspects that need to be dealt with in the design of machine learning algorithms that can be executed in a distributed fashion, typically on Hadoop clusters, in order to deal with big data sets, by taking into account scalability and robustness. The course will first focus on a selection of mainstream, sequential machine learning algorithms, and then take into account the following crucial and complex aspects. The first one is the re-design of algorithms by relying on programming paradigms for distribution and parallelism based on map-reduce (e.g., Spark, Flink, ...). The second aspect is the experimental analysis of the map-reduce based implementation of the designed algorithms in order to test their scalability and precision. The third aspect concerns the study and application of optimisation techniques in order to overcome lack of scalability and to improve the execution time of the designed algorithms.

The focus will be on machine learning techniques for dimension reduction, clustering and classification, whose underlying implementation techniques are transversal and find application in a wide range of other machine learning algorithms. For some of the studied algorithms, the course will present techniques for a from-scratch map-reduce implementation, while for other algorithms packages like Spark ML will be used and end-to-end pipelines will be designed. In both cases algorithms will be analysed and optimised on real-life data sets, by relying on a local Hadoop cluster, as well as on a cluster on the Amazon Web Services (AWS) cloud.

References:

– Mining of Massive Datasets
http://www.mmds.org

– High Performance Spark – Best Practices for Scaling and Optimizing Apache Spark
Holden Karau, Rachel Warren
O’Reilly

Méthodes à noyau pour l'apprentissage
Machine learning with kernel methods
Hours
24h
ECTS
3 credits
Instructor(s)
Julien Mairal
Mathématiques de l'apprentissage profond
Mathematics of deep learning
Hours
24h
ECTS
3 credits
Instructor(s)
Bruno Loureiro

Note: this course description is not available in French.

The past decade has witnessed a surge in the development and adoption of deep learning algorithms to solve day-to-day computational tasks. Surprisingly, modern deep learning practice defies some of the established textbook intuition provided by traditional statistical learning theory and convex optimisation. For example, deep networks defy the classical bias-variance tradeoff by generalising and interpolating at the same time, and descent-based methods are successfully employed in the optimisation of highly non-convex objectives. These empirical observations translate to exciting theoretical challenges which require new mathematical ideas.

The goal of this course is two-fold. First, to familiarise the student with some of the major mathematical challenges posed by modern machine learning. Second, to give an overview of some of the progress made over the past few years in understanding some of these problems. This includes the large-width limit of neural networks in both the lazy and feature-rich regimes, the analysis of benign overfitting in overparametrised models, and tools from non-convex optimisation theory, among others.

Recherche Monte-Carlo et jeux
Monte-Carlo search and games
Hours
24h
ECTS
3 credits
Course URL
https://www.lamsade.dauphine.fr/~cazenave/MonteCarloSearch.html
Instructor(s)
Tristan Cazenave

Monte-Carlo search has revolutionized game programming. It combines well with deep learning to create systems that play better than the best human players at games such as Go, chess, Hex, or shogi. It also makes it possible to tackle hard optimization problems. In this course we will cover various Monte-Carlo search algorithms such as UCT, GRAVE, and nested Monte-Carlo search, as well as playout policy learning. We will also see how to combine Monte-Carlo search and deep learning. The course is assessed through a project on a game or a hard optimization problem.

Bibliography, recommended reading
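
UCT applies a bandit selection rule, UCB1, at each node of the search tree in order to decide which move to explore next. The sketch below shows that rule on a single node only, not a full tree search. The three candidate moves, their hypothetical win probabilities, the random playout model, and the function name are all assumptions made for illustration.

```python
import math
import random


def ucb1(win_probabilities, n_playouts, c=math.sqrt(2), seed=0):
    """Allocate playouts among moves with the UCB1 rule; returns visit counts.

    Each playout of move `a` is simulated as a win (reward 1) with probability
    win_probabilities[a], otherwise a loss (reward 0).
    """
    rng = random.Random(seed)
    k = len(win_probabilities)
    counts = [0] * k
    totals = [0.0] * k
    for t in range(1, n_playouts + 1):
        if t <= k:
            move = t - 1  # try every move once before using the formula
        else:
            # Empirical mean plus an exploration bonus that shrinks with visits.
            move = max(
                range(k),
                key=lambda a: totals[a] / counts[a]
                + c * math.sqrt(math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < win_probabilities[move] else 0.0
        counts[move] += 1
        totals[move] += reward
    return counts


# Hypothetical win rates of three candidate moves.
visit_counts = ucb1([0.2, 0.5, 0.8], n_playouts=5000)
```

The most visited move is typically the one returned by the search: promising moves receive most of the playouts, while weaker ones are still sampled occasionally.
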

Intelligence Artificielle Une Approche Ludique, Tristan Cazenave, Editions Ellipses, 2011.

Problèmes inverses non-convexes
Non-convex inverse problems
Hours
18h
ECTS
3 credits
Instructor(s)
Irène Waldspurger

Note: this course description is not available in French.

An inverse problem is a problem where the goal is to recover an unknown object (typically a vector with real coordinates, or a matrix), given a few “measurements” of this object, and possibly some information on its structure. In this course, we will discuss examples of such problems, motivated by applications as diverse as medical imaging, optics and machine learning. We will especially focus on the questions: which algorithms can we use to numerically solve these problems? When and how can we prove that the solutions returned by the algorithms are correct? These questions are relatively well understood for convex inverse problems, but the course will be on non-convex inverse problems, whose study is much more recent, and a very active research topic.

The course will be at the interface between real analysis, statistics and optimization. It will include theoretical and programming exercises.

Bases de données NoSQL
NoSQL databases
Hours
24h
ECTS
3 credits
Instructor(s)
Paul Boniol

The aim of this course is to review the data management technologies that can be used to handle large volumes of data, beyond classical relational database management systems. These systems, generically grouped under the name "NoSQL", in fact cover very heterogeneous approaches: different data models (for example, based on XML, graph, or triple representations), different trade-offs to achieve scalability (in particular giving up ACID properties), distribution of data over a very large set of nodes, etc. The course will present the technology choices and research issues of some of these systems, and explain in which situations they are useful.

Nuages de points et modélisation 3D
Point clouds and 3D modeling
Hours
24h
ECTS
3 credits
Course URL
http://caor-mines-paristech.fr/fr/cours-npm3d/
Instructor(s)
François Goulette

This course gives an overview of the concepts and techniques for acquiring, processing, and visualizing 3D point clouds, and of their mathematical and algorithmic foundations. The course covers in particular the following topics:

  • 3D perception systems
  • Processing and operators
  • Registration
  • Point cloud segmentation
  • Curve and surface reconstruction
  • Primitive-based modeling
  • Rendering of point clouds and meshes

Some sessions are complemented by a lab session.

Confidentialité pour l'apprentissage
Privacy for machine learning
Hours
24h
ECTS
3 credits
Course URL
https://moodle.ens.psl.eu/course/view.php?id=2772
Instructor(s)
Muni Pydi

Note: this course description is not available in French.

This course covers the basics of Differential Privacy (DP), a framework that has become, in the last ten years, a de facto standard for enforcing user privacy in data processing pipelines. DP methods seek to reach a proper trade-off between protecting the characteristics of individuals and guaranteeing that the outcomes of the data analysis stay meaningful.

The first part of the course is devoted to the basic notion of epsilon-DP and to understanding the trade-off between privacy and accuracy, from both the empirical and statistical points of view. The second half of the course will cover more advanced aspects, including the different variants of DP and their use to allow for privacy-preserving training of large and/or distributed machine learning models.

  • Motivations, traditional approaches, randomized response
  • Definition and properties of differential privacy
  • Mechanisms for discrete/categorical data
  • Mechanisms for continuous data
  • Alternative notions of differential privacy
  • Differential privacy for statistical learning
  • Attacks and connections with robustness
  • Local differential privacy and federated learning
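
Randomized response, listed in the first item above, is a classical mechanism that can serve as a small worked example of epsilon-DP for a single yes/no attribute. In the sketch below, each respondent reports the true bit with probability e^ε / (1 + e^ε) and the flipped bit otherwise, and the analyst debiases the observed proportion. The function names, the value ε = ln 3, and the synthetic population are assumptions made for illustration.

```python
import math
import random


def truth_probability(epsilon):
    """Probability of reporting the true bit under epsilon-DP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(true_bit, epsilon, rng):
    """Report the true bit with probability p, the flipped bit otherwise."""
    if rng.random() < truth_probability(epsilon):
        return true_bit
    return 1 - true_bit


def debiased_mean(reports, epsilon):
    """Unbiased estimate of the true proportion of 1s from noisy reports.

    Since E[report] = pi * p + (1 - pi) * (1 - p), solving for pi gives
    pi = (observed - (1 - p)) / (2p - 1).
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Synthetic population in which 30% of individuals hold the sensitive attribute.
rng = random.Random(0)
epsilon = math.log(3)  # each answer is truthful with probability 3/4
true_bits = [1] * 30000 + [0] * 70000
reports = [randomized_response(bit, epsilon, rng) for bit in true_bits]
estimate = debiased_mean(reports, epsilon)
```

The ratio between the probabilities of any given report under the two possible true values is at most e^ε, which is what the epsilon-DP definition requires. A smaller ε gives stronger privacy but a noisier estimate, illustrating the privacy versus accuracy trade-off.
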

This course has no prerequisites apart from basic knowledge of probability, statistics, and Python programming.

Validation is through homework assignments (in Python notebooks) and the defense of a group project based on a research paper.

Semaine intensive PSL (Data@PSL)
PSL Intensive Week (Data@PSL)
Hours
30h
ECTS
2 credits
Course URL
https://data-psl.github.io/intensive-week/
Instructor(s)
Alexandre Allauzen

Internship

Stage de recherche IASD (IASD)
IASD research internship
ECTS
10 credits