Syllabus – Master IASD

The IASD Master’s program begins with a core semester devoted to the fundamental disciplines of AI and data science, consitsing of four common courses and three courses specific to the two tracks, Computer Science and Mathematics. At the end of the first semester, students choose six additional courses for the second semester, including the opportunity to follow an intensive PSL week, allowing them to open up to other disciplines or applications. The year continues with an internship in an academic or industrial research laboratory, ending in September with the writing of the master thesis and its public defense.

Mandatory courses

Data science lab

Projet science des données

Benjamin Negrevergne

Duration: 24h
ECTS: 4 credits
In charge: Benjamin Negrevergne

The goal of this module is to provide students with a hands-on experience on a novel data-science/AI challenge using state-of-the-art tools and techniques discussed during other classes of this master. Students enrolled in this class will form groups and choose one topic among a list of proposed topics in the core areas of the master such as supervised or unsupervised learning, recommendation, game AI, distributed or parallel data-science, etc. The topics will generally consist in applying a well-established technique on a novel data-science challenge or in applying recent research results on a classical data-science challenge. Either way, each topic will come with its own novel scientific challenge to address. At the end of the module, the students will give an oral presentation to demonstrate their methodology and their findings. Strong scientific rigor as well as very good engineering and communication skills will be necessary to complete this module successfully.

Foundations of machine learning

Fondamentaux de l'apprentissage automatique

Francis Bach

Duration: 24h
ECTS: 4 credits
In charge: Francis Bach

The goal of this class is to present old and recent results in learning theory, for the most widely-used learning architectures. This class is geared towards theory-oriented students as well as students who want to acquire a basic mathematical understanding of algorithms used throughout the masters program.

A particular effort will be made to prove many results from first principles, while keeping the exposition as simple as possible. This will naturally lead to a choice of key results that show-case in simple but relevant instances the important concepts in learning theory. Some general results will also be presented without proofs.

The class will be organized in eight three-hour sessions, each with a precise topic (a chapter from the book "Learning theory from first principles").

Prerequisites: We will prove results in class so a good knowledge of undergraduate mathematics is important, as well as basic notions in probability. Having followed an introductory class on machine learning is beneficial.

Optimization for machine learning

Optimisation pour l'apprentissage automatique

Clément Royer

Duration: 24h
ECTS: 4 credits
In charge: Clément Royer

Optimization has long been a fundamental component for modeling and solving classical machine learning problems such as linear regression and SVM classification. It also plays a key role in the training of neural networks, thanks to the development of efficient numerical tools tailored to deep learning.

This course is concerned with developing optimization algorithms for learning tasks, and will consist of both lectures and hands-on sessions in Python. The course will begin by an introduction to the various problem formulations arising in machine and deep learning, together with a refresher on key mathematical concepts (linear algebra, convexity, smoothness). The course will then describe the main algorithms for optimization in data science (gradient descent, stochastic gradient) and their theoretical properties. Finally, the course will focus on the challenges posed by implementing these methods in a deep learning and large-scale environment (automatic differentiation, distributed calculations, regularization).

References:
L. Bottou, F. E. Curtis and J. Nocedal. Optimization Methods for Large-Scale Machine Learning. SIAM Review, 2018.
S. J. Wright and B. Recht. Optimization for Data Analysis. Cambridge University Press, 2022.

Reinforcement learning

Apprentissage par renforcement

Olivier Cappé

Duration: 24h
ECTS: 4 credits
In charge: Olivier Cappé

Reinforcement Learning (RL) refers to scenarios where the learning algorithm operates in closed-loop, simultaneously using past data to adjust its decisions and taking actions that will influence future observations. Algorithms based on RL concepts are now commonly used in programmatic marketing on the web, robotics or in computer game playing. All models for RL share a common concern that in order to attain one's long-term optimality goals, it is necessary to reach a proper balance between exploration (discovery of yet uncertain behaviors) and exploitation (focusing on the actions that have produced the most relevant results so far).

The methods used in RL draw ideas from control, statistics and machine learning. This introductory course will provide the main methodological building blocks of RL, focussing on probabilistic methods in the case where both the set of possible actions and the state space of the system are finite.

Probabilistic and statistical tools for RL: Markov chains and conditioning, importance sampling, stochastic approximation, Bayesian modelling, hypothesis testing, concentration inequalities
Models: Markov decision processes (MDP), multiarmed bandits and other models
Planning: finite and infinite horizon problems, value functions, Bellman equations, dynamic programming, value and policy iteration
Basic learning tools: Monte Carlo methods, temporal-difference learning, policy gradient
Optimal exploration in multiarmed bandits: the explore vs exploit tradeoff, pure exploration, lower bounds, the UCB algorithm, Thompson sampling

References

Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, Second Edition, MIT Press, 2018 http://incompleteideas.net/book/the-book.html
Bandit Algorithms, Tor Lattimore and Csaba Szepesvári, Cambridge University Press, 2020 https://banditalgs.com/

Data acquisition, extraction, and storage (CS)

Acquisition, extraction et stockage de données

Pierre Senellart

Duration: 24h
ECTS: 4 credits
Course URL: https://moodle.psl.eu/course/view.php?id=34943
In charge: Pierre Senellart

The objective of this course is to present the principles and techniques used to acquire, extract, integrate, clean, preprocess, store, and query datasets, that may then be used as input data to train various artificial intelligence models. The course will consist on a mix of lectures and practical sessions. We will cover the following aspects:

Web data acquisition (Web crawling, Web APIs, open data, legal issues)
Information extraction from semi-structured data
Data cleaning and data deduplication
Data formats and data models
Storing and processing data in databases, in main memory, or in plain files
Introduction to large-scale data processing with MapReduce and Spark
Introduction to the management of uncertain data

Deep learning for image analysis (CS)

Apprentissage profond pour l'analyse d'images

Étienne Decencière

Duration: 24h
ECTS: 4 credits
In charge: Étienne Decencière

Deep learning has achieved formidable results in the image analysis field in recent years, in many cases exceeding human performance. This success opens paths for new applications while making the field very competitive. This course aims at providing the students with the theoretical and practical basis for understanding and using deep learning for image analysis applications.

The course will be composed of lectures and practical sessions. Moreover, experts will present practical applications of deep learning.

Lectures will include:

Artificial neural networks, back-propagation algorithm
Convolutional neural network
Design and optimization of a neural architecture
Successful architectures (AlexNet, VGG, GoogLeNet, ResNet)
Image classification and segmentation
Auto-encoders and generative networks
Vision transformers
Current research trends and perspectives

During the practical sessions, the students will code in Python, using Keras and Tensorflow. They will be confronted with the practical problems linked to deep learning: architecture design; optimization schemes and hyper-parameter selection; analysis of results.

Prerequisites: Linear algebra, basic probability and statistics

Large language models (CS)

Modèles de langage de grande taille

Alexandre Allauzen

Duration: 24h
ECTS: 4 credits
In charge: Alexandre Allauzen

Large language models play an important role to access and generate different kind of contents as well as to communicate in different languages. This course focuses on deep learning methods for natural language processing (NLP), from basics to large language models. The goal is to introduce important concepts of deep learning and NLP, including:

Natural Language Processing and deep learning basics
The attention mechanism and transformer architecture
Training of large language models
Applications (text classification, dialogue, etc.)

Lab sessions in pytorch complements the course.

Prerequisites:

Regular python skills
Understanding of basic concepts in machine learning like: logistic regression, loss function, optimization by gradient descent

Bayesian statistics (Maths)

Inférence bayésienne

Judith Rousseau

Duration: 24h
ECTS: 4 credits
In charge: Judith Rousseau

The course will cover different aspects of Bayesian statistics with an emphasis on the theoretical properties of Bayesian methods. The course starts with an introduction Bayesian decision theory from point estimation, to credible regions, testing and model selection and some notion on Bayesian predictive inference. The second part will cover the most important results on Bayesian asymptotics.

Part I. Bayesian decision theory : an Introduction

Prior / Posterior , risks and Bayesian estimators.
Credible regions.
Model selection and tests.

Part II: Bayesian asymptotics; in this part, both well and mis-speciﬁed models will be considered.

Asymptotic posterior distribution: in this part we will study asymptotic normality of the posterior, the penalization induced by the prior and the Bernstein von - Mises theorem. Regular and nonregular models will be treated.
marginal likelihood and consistency of Bayes factors/model selection approaches.
Empirical Bayes methods. This part will review some results on the asymptotic posterior distribution for parametric empirical Bayes methods.
Bayesian bootstrap.
Posterior consistency and posterior convergence rates. This part will ﬁrst cover the case of statistical loss functions using the theory introduced by L. Schwartz and developed by Ghosal and Van der Vaart.

Bibliography

C. P. Robert (2021). The Bayesian Choice.
S. Ghosal and A. van der Vaart (2017): Fundamentals of Bayesian Nonparametrics.
A. van der Vaart (1998): Asymptotic Statistics.

High-dimensional statistics (Maths)

Statistiques en grande dimension

Vincent Rivoirard

Duration: 24h
ECTS: 4 credits
In charge: Vincent Rivoirard

The objective of this course is to deal with statistical problems where the studied data are high dimensional, meaning that the number of parameters to infer is very large, and in some situations much larger than the number of observations. In this course, we shall present different statistical frameworks adapted to the high-dimensional paradigm, the statistical problems that arise and the specific methodologies for solving them. More precisely, for simple regression models or for more elaborate settings modelling functional data, we shall consider methods based on penalized criteria (AIC, BIC, Ridge...), with a special focus on Lasso-type approaches and their variations. Wavelet thresholding techniques and FDR approaches for multiple testing will be also topics at the core of this course. Finally, the course includes an introduction to statistics for functional data, which is a branch of statistics that studies data that can be modeled as random curves.

Prerequisites for this course:

statistics
probability
linear algebra and functional analysis (Fourier analysis, Sobolev spaces, etc.)

Optimal transport (Maths)

Transport optimal

Gabriel Peyré

Duration: 24h
ECTS: 4 credits
In charge: Gabriel Peyré

Optimal Transport (OT) is a key mathematical theory bridging optimization, partial differential equations, and probability. It provides a robust framework for comparing probability distributions and has recently gained prominence as a versatile tool for addressing a wide array of challenges in machine learning, particularly in the context of learning and evaluating generative models. This course explores the core mathematical concepts of OT, alongside recent advancements in scalable numerical solvers. Special emphasis will be placed on modern regularized approaches, which are crucial for addressing high-dimensional learning tasks. Course materials, including a concise textbook, lecture slides, and computational resources, are available online at optimaltransport.github.io.

Topics Covered:

Monge’s Formulation and Brenier’s Theorem
Applications in 1-D and Gaussian Distributions
Kantorovich’s Formulation, Linear Programming, and the Birkhoff-von Neumann Theorem
Metric Properties of OT: Convergence in Law and the Central Limit Theorem
Wasserstein Gradient Flow and Diffusion Models
Entropic Regularization and the Curse of Dimensionality

Optional courses

Computational statistics and Markov Chain Monte Carlo methods

Méthodes de Monte Carlo par chaînes de Markov

Christian Robert

Duration: 24h
ECTS: 4 credits
In charge: Christian Robert

This course covers most principles of Bayesian computing, that is, methods that are used to approximate Bayesian posterior distributions by pseudo-random simulation algorithms. While the outcome of the course comprises an ability to code and implement such methods, the core of the contents is mathematically-based and exploits properties of Markov chains.

Introduction to Bayesian computing challenges
Random Variable Generation
Monte Carlo Integration
Monte Carlo Optimization
Markov Chains
The Metropolis—Hastings Algorithm
The Two-Stage and multi-stage Gibbs Samplers
Variable Dimension Models and Reversible Jump Algorithms
Diffusion based simulation methods and other continuous time extensions
Iterated and Sequential Importance Sampling
Approximate Bayesian Computation

Reference: Robert, C.P., and Casella, G. (2004) Monte Carlo Statistical Methods, Springer

PSL Intensive week (Data@PSL)

Semaine intensive PSL (Data@PSL)

Alexandre Allauzen

ECTS: 2 credits
Course URL: https://data-psl.github.io/intensive-week/
In charge: Alexandre Allauzen

Advanced machine learning

Apprentissage automatique avancé

Yann Chevaleyre

Duration: 24h
ECTS: 4 credits
In charge: Yann Chevaleyre

This research-oriented module will focus on some selected advanced topics in machine learning, including

Online learning algorithms
Losses and algorithms beyond classification
Variational methods for generative machine learning

The evaluation consists in an individual presentation of a research paper in one of the above topics.

Bayesian case studies

Études de cas bayésiennes

Julien Stoehr

Duration: 24h
ECTS: 4 credits
In charge: Julien Stoehr

We shall put in practice classical models for statistical inference in a Bayesian setting, and implement computational methods. Using real data, we shall study various models such as linear regression, capture-recapture, and a hierarchical model. We shall discuss issues of model building and validation, the impact of the choice of prior, and model choice via Bayes Factors. The implementation shall use several algorithms: Markov Chain Monte Carlo, importance sampling, Approximate Bayesian Computation. The course is based on the free software R. Practical information: Large portions of the course are devoting to students coding. Students should bring their own laptop, which must have R installed before the first session; It is recommended to install RStudio (free) as well.

Reference: Bayesian Essentials with R, JM Marin & CP Robert, Springer, 2013

Bayesian machine learning

Apprentissage bayésien

Guillaume Kon Kam King

Duration: 24h
ECTS: 4 credits
In charge: Guillaume Kon Kam King

This advanced course explores two pivotal areas in modern Bayesian methodologies: Bayesian Nonparametrics and Bayesian Deep Learning. Students will delve into foundational concepts such as the Dirichlet Process, infinite mixture models, and Gaussian Processes, along with their applications in clustering, regression, and classification. The course also emphasizes the importance of quantifying uncertainty in Deep Learning and presents the Bayesian approach to do so, covering priors for neural networks, posterior inference methods, and recent ideas like martingale posteriors. Assessment involves a student seminar and homework to solidify understanding. Prerequisites include a strong foundation in Bayesian Statistics and familiarity with Markov Chain Monte Carlo methods. This course equips students with cutting-edge tools to advance their expertise in statistical and machine learning approaches.

Bayesian Nonparametrics

The Dirichlet Process
Infinite Mixture models
Posterior Sampling
Models beyond the Dirichlet Process
Gaussian Processes
Selected applications

Bayesian Deep Learning

Why do we want parameter uncertainty
Priors for Bayesian neural networks
Posterior inference
Martingale Posteriors and generalised Bayesian Inference

Computational social choice

Choix social computationnel

Jérôme Lang Dominik Peters

Duration: 24h
ECTS: 4 credits
In charge: Jérôme Lang Dominik Peters

The aim of this course is to give an overview of the problems, techniques and applications of computational social choice, a multidisciplinary topic at the crossing point of computer science (especially artificial intelligence, operations research, theoretical computer science, multi-agent systems, computational logic, web science) and economics. The course consists of the analysis of problems arising from the aggregation of preferences of a group of agents from a computational perspective. On the one hand, it is concerned with the application of techniques developed in computer science, such as complexity analysis or algorithm design, to the study of social choice mechanisms, such as voting procedures or fair division algorithms. On the other hand, computational social choice is concerned with importing concepts from social choice theory into computing. For instance, social welfare orderings originally developed to analyse the quality of resource allocations in human society are equally well applicable to problems in multi-agent systems or network design. The course will focus on normative aspects, computational aspects, and real-world applications (including some case studies).

Program:
1. Introduction to social choice.
2. Computing hard voting rules and preference aggregation functions. Application to aggregating web page rankings.
3. Strategic issues: manipulation, control, game-theoretic analyses of voting. Short introduction to algorithmic mechanism design.
4. Preference aggregation on combinatorial domains.
5. Communication issues in voting: voting with incomplete preferences, elicitation protocols, communication complexity, low-communication mechanisms.
6. Fair division of indivisible goods.
7. Cake cutting algorithms.
8. Matching under preferences.
9. Coalition formation.
10. Specific applications and case studies (varying every year): rent division, school assignment, group recommendation systems…

Recommended reading:
Handbook of Computational Social Choice (F. Brandt, V. Conitzer, U. Endriss, J. Lang, A. Procaccia, eds.), Cambridge University Press, 2016.
Algorithmics of Matching Under Preferences (D. Manlove), World Scientific, 2013.

Dimension reduction and manifold learning

Réduction de dimension et apprentissage sur variétés

Eddie Aamari

Duration: 24h
ECTS: 4 credits
In charge: Eddie Aamari

Modern machine learning typically deals with high-dimensional data. The fields concerned are very varied and include genomics, image, text, time series, or even socioeconomic data where more and more unstructured features are routinely collected. As a counterpart of this tendency towards exhaustiveness, understanding these data raises challenges in terms of computational resources and human understandability. Manifold Learning refers to a family of methods aiming at reducing the dimension of data while preserving certain of its geometric and structural characteristics. It is widely used in machine learning and experimental science to compress, visualize and interpret high-dimensional data. This course will provide a global overview of the methodology of the field, while focusing on the mathematical aspects underlying the techniques used in practice.

Prerequisites: Linear algebra, basic probability theory, statistics, Python coding

Learning outcomes:

Curse of dimensionality, manifold hypothesis and intrinsic dimension(s)
Multidimensional scaling
Linear dimension reduction (random projections, principal component analysis)
Non-linear spectral methods (kernel PCA, ISOMAP, MVU, Laplacian eigenmaps)
Ad-hoc distance-preserving methods (diffusion maps, LLE)
Probabilistic dimension reduction and clustering (SNE, UMAP)
Neural network-based dimensionality reduction

Bibliography:

Ghojogh, B., M. Crowley, F. Karray, and A. Ghodsi (2023). Elements of dimensionality reduction and manifold learning
Lee, J. A., M. Verleysen, et al. (2007). Nonlinear dimensionality reduction

Graph analytics

Analyse de graphes

Daniela Grigori

Duration: 24h
ECTS: 4 credits
In charge: Daniela Grigori

The objective of this course is to give students an overview of the field of graph analytics for massive graphs. Since graphs form a complex and expressive data type, we need methods for representing graphs in databases, manipulating, querying, analyzing and mining them. Graph applications are very diverse and need specific algorithms (link analysis, community detection, graph clustering, graph similarity, ..). Moreover, considering the underlying relational structures of data modelled as graphs can improve machine learning tasks.
The course presents new ways to model, store, retrieve, mine and analyze graph-structured data and some examples of applications.
Lab sessions are included allowing students to practice graph analytics: modeling a problem into a graph database and performing analytical tasks over the graph in a scalable manner.

Program

1. Introduction to graph management and mining
2. Graph databases – Neo4J
3. Query language for graphs – Cypher
4. Graph Processing Frameworks (Pregel, GraphX, ...)
5. Graph algorithms: Link Analysis, Community detection, Graph clustering, ..
6. Machine learning with graphs : Graph Neural Networks
7. Graph applications: analyzing social-network graphs, mining logs, fraud detection, ..

References

Ian Robinson, Jim Weber, Emil Eifrem, Graph Databases, O’Reilly (4 juin 2013), ISBN-10: 1449356265
Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing, SIGMOD ’10, ACM, New York, NY, USA, 135-146
Xin, Reynold & Crankshaw, Daniel & Dave, Ankur & Gonzalez, Joseph & J. Franklin, Michael & Stoica, Ion. (2014). GraphX: Unifying Data-Parallel and Graph-Parallel Analytics.
Michael S. Malak and Robin East, Spark GraphX in Action, Manning, June 2016

Incremental learning, game theory, and applications

Apprentissage incrémental, théorie des jeux et applications

Yannick Viossat

Duration: 24h
ECTS: 4 credits
In charge: Yannick Viossat

This course will focus on the behavior of learning algorithms when several agents are competing against one another: specifically, what happens when an agent that follows an online learning algorithm interacts with another agent doing the same? The natural language to frame such questions is that of game theory, and the course will begin with a short introduction to the topic, such as normal form games (in particular zero-sum, potential, and stable games), solution concepts (such as dominated/rationalizable strategies, Nash, correlated and coarse equilibrium notions, ESS), and some extensions (Blackwell approachability). Subsequently, we will examine the long-term behavior of a wide variety of online learning algorithms (fictitious play, regret-matching, multiplicative/exponential weights, mirror descent and its variants, etc.), and we will discuss applications to generative adversarial networks (GANs), traffic routing, prediction, and online auctions.

References
[1] Nicolò Cesa-Bianchi and Gábor Lugosi, Prediction, learning, and games, Cambridge University Press, 2006.
[2] Drew Fudenberg and David K. Levine, The theory of learning in games, Economic learning and social evolution, vol. 2, MIT Press, Cambridge, MA, 1998.
[3] Sergiu Hart and Andreu Mas-Colell, Simple adaptive strategies: from regret matching to uncoupled dynamics, World Scientific Series in Economic Theory – Volume 4, World Scientific Publishing, 2013.
[4] Vianney Perchet, Approachability, regret and calibration: implications and equivalences, Journal of Dynamics and Games 1 (2014), no. 2, 181–254.
[5] Shai Shalev-Shwartz, Online learning and online convex optimization, Foundations and Trends in Machine Learning 4 (2011), no. 2, 107–194.

Introduction to causal inference

Introduction à l'inférence causale

Fabrice Rossi

Duration: 24h
ECTS: 4 credits
In charge: Fabrice Rossi

This course provides an introduction to causal inference. It covers both the Neyman–Rubin potential outcomes framework and Pearl’s do-calculus. The former is used to introduce the fundamental problem of causal inference and the notion of counterfactuals. The core hypotheses needed for causal identification of average treatment effects are presented: (conditional) exchangeability, positivity, and consistency. Estimation based on generalised linear models and on machine learning approaches is explored, including the double-machine learning approach.

The second part of the course covers Pearl’s do-calculus. The course introduces graphical models, with a focus on directed models, followed by structural causal models. The simple Markovian case is used to link this framework to the potential outcomes one and to derive classical techniques such as the back-door criterion. The semi-Markovian case is then explored as the general way of representing causal hypotheses in the presence of unobserved confounding variables. Identification is revisited in the light of the do-calculus and of the IDC algorithm.

The final part of the course reviews causal discovery algorithms and open research questions.

Knowledge graphs, description logics, reasoning on data

Graphes de connaissance, logiques de description, raisonnement sur les données

Michaël Thomazo

Duration: 24h
ECTS: 4 credits
In charge: Michaël Thomazo

Knowledge graphs are a flexible tool to represent knowledge about the real world. After presenting some of the existing knowledge graphs (such as DBPedia, Wikidata or Yago) , we focus on their interaction with semantics, which is formalized through the use of so-called ontologies. We then present some central logical formalism used to express ontologies, such as Description Logics and Existential Rules. A large part of the course will be devoted to study the associated reasoning tasks, with a particular focus on querying a knowledge graph through an ontology. Both theoretical aspects (such as the tradeoff between the expressivity of the ontology language versus the complexity of the reasoning tasks) and practical ones (efficient algorithms) will be considered.

Program:
1. Knowledge Graphs (history and uses)
2. Ontology Languages (Description Logics, Existential Rules)
3. Reasoning Tasks (Consistency, classification, Ontological Query Answering)
4. Ontological Query Answering (Forward and backward chaining, Decidability and complexity, Algorithms, Advanced Topics)

References:
- The description logic handbook: theory, implementation, and applications. Baader et al., Cambridge University Press
- Foundations of Semantic Web Technologies, Hitzler et al., Chapman&Hall/CRC
- Web Data Management, Abiteboul et al., Cambridge University Press

Prerequisites:
- first-order logic;
- complexity (Turing machines, classical complexity classes) is a plus.

LLM for code and proof

Grands modèles de langage pour le code et la preuve

Marc Lelarge

Duration: 24h
ECTS: 4 credits
In charge: Marc Lelarge

Recent advances in large language models (LLMs) have enabled remarkable progress in program synthesis and code generation. This course explores the foundations and methodologies behind modern neural code generation, with a particular focus on Transformer-based architectures and LLM techniques. The course has two main objectives: (1) to provide students with a deep understanding of the core techniques for training and fine-tuning neural models for code generation, including inference strategies and evaluation metrics specific to code, and (2) to introduce current research in neural program synthesis, highlighting applications in software engineering, reasoning, and formal verification.

Topics covered include:

Transformer architectures, attention mechanisms, and KV-cache for efficient inference.
Tokenization strategies for linguistic and code-based datasets.
Fine-tuning techniques such as LoRA for task-specific adaptation.
Scaling laws for optimizing LLM performance.
Decoding strategies for code generation, including sampling-based and greedy methods.
Retrieval-augmented generation (RAG) for incorporating external knowledge.
Structured generation techniques for syntax-constrained outputs.
Applications of LLMs in formal verification and automated theorem proving.

By the end of the course, students will gain both theoretical insights and hands-on experience in building and evaluating neural models for code generation.

Machine learning on big data

Apprentissage automatique sur Big Data

Dario Colazzo

Duration: 24h
ECTS: 4 credits
In charge: Dario Colazzo

This course focuses on the typical, fundamental aspects that need to be dealt with in the design of machine learning algorithms that can be executed in a distributed fashion, typically on Hadoop clusters, in order to deal with big data sets, by taking into account scalability and robustness. So the course will first focus on a bunch of main-stream, sequential machine learning algorithms, by taking then into account the following crucial and complex aspects. The first one is the re-design of algorithms by relying on programming paradigms for distribution and parallelism based on map-reduce (e.g., Spark, Flink, ….). The second aspect is experimental analysis of the map-reduce based implementation of designed algorithms in order to test their scalability and precision. The third aspect concerns the study and application of optimisation techniques in order to overcome lack of scalability and to improve execution time of designed algorithm.

The attention will be on machine learning technique for dimension reduction, clustering and classification, whose underlying implementation techniques are transversal and find application in a wide range of several other machine learning algorithms. For some of the studied algorithms, the course will present techniques for a from-scratch map-reduce implementation, while for other algorithms packages like Spark ML will be used and end-to-end pipelines will be designed. In both cases algorithms will be analysed and optimised on real life data sets.

References:
- Mining of Massive Datasets http://www.mmds.org
- High Performance Spark; Best Practices for Scaling and Optimizing Apache Spark, Holden Karau, Rachel Warren, O’Reilly

Machine learning with kernel methods

Méthodes à noyau pour l'apprentissage

Julien Mairal

Duration: 24h
ECTS: 4 credits
In charge: Julien Mairal

Many problems in real-world applications of machine learning can be formalized as classical statistical problems, e.g., pattern recognition, regression or dimension reduction, with the caveat that the data are often not vectors of numbers. For example, protein sequences and structures in computational biology, text and XML documents in web mining, segmented pictures in image processing, or time series in speech recognition and finance, have particular structures which contain relevant information for the statistical problem but can hardly be encoded into finite-dimensional vector representations.

Kernel methods are a class of algorithms well suited for such problems. Indeed they extend the applicability of many statistical methods initially designed for vectors to virtually any type of data, without the need for explicit vectorization of the data. The price to pay for this extension to non-vectors is the need to define a so-called positive definite kernel function between the objects, formally equivalent to an implicit vectorization of the data. The “art” of kernel design for various objects have witnessed important advances in recent years, resulting in many state-of-the-art algorithms and successful applications in many domains

The goal of this course is to present the mathematical foundations of kernel methods, as well as the main approaches that have emerged so far in kernel design. We will start with a presentation of the theory of positive definite kernels and reproducing kernel Hilbert spaces, which will allow us to introduce several kernel methods including kernel principal component analysis and support vector machines. Then we will come back to the problem of defining the kernel. We will present the main results about Mercer kernels and semigroup kernels, as well as a few examples of kernel for strings and graphs, taken from applications in computational biology, text processing and image analysis. Finally we will touch upon topics of active research, such as large-scale kernel methods and deep kernel machines.

Mathematics of deep learning

Mathématiques de l'apprentissage profond

Bruno Loureiro

Duration: 24h
ECTS: 4 credits
In charge: Bruno Loureiro

The past decade has witnessed a surge in the development and adoption of deep learning algorithms to solve day-a-day computational tasks. Surprisingly, the modern deep learning practice defies some of the established textbook intuition provided by traditional statistical learning theory and convex optimisation. For example, deep networks defy the classical bias-variance tradeoff by generalising and interpolating at the same time, and descent-based methods are successfully employed in the optimisation of highly non-convex objectives. These empirical observations translate to exciting theoretical challenges which require new mathematical ideas.

The goal of this course is two-fold. First, to familiarise the student with some of the major mathematical challenges posed by modern machine learning, as well as current research topics in theoretical machine learning. Second, to give an overview of some of the progress made over the past few years in understanding some of these problems. For example, we will cover:

Universal approximation theorems for neural networks.
The curse of dimensionality.
The lazy limit of large-width networks and the neural tangent kernel.
Introduction to random matrix theory tools for machine learning.
The double descent phenomena and benign overfitting.
Implicit bias of GD/SGD.

Monte-Carlo search and games

Recherche Monte-Carlo et jeux

Tristan Cazenave

Duration: 24h
ECTS: 4 credits
In charge: Tristan Cazenave

Introduction to Monte Carlo for computer games. Monte Carlo Search has revolutionized computer games. It works well with Deep Learning so as to create systems that have superhuman performances in games such as Go, Chess, Hex or Shogi. It is also appropriate to address difficult optimization problems. In this course we will present different Monte Carlo search algorithms such as UCT, GRAVE, Nested Monte Carlo and Playout Policy Adaptation. We will also see how to combine Monte Carlo Search and Deep Learning. The validation of the course is a project involving a game or an optimization problem.

References
Intelligence Artificielle Une Approche Ludique, Tristan Cazenave, Editions Ellipses, 2011.

Non-convex inverse problems

Problèmes inverses non-convexes

Irène Waldspurger

Duration: 24h
ECTS: 4 credits
In charge: Irène Waldspurger

An inverse problem is a problem where the goal is to recover an unknown object (typically a vector with real coordinates, or a matrix), given a few “measurements” of this object, and possibly some information on its structure. In this course, we will discuss examples of such problems, motivated by applications as diverse as medical imaging, optics and machine learning. We will especially focus on the questions:

Which algorithms can we use to numerically solve these problems?
When and how can we prove that the solutions returned by the algorithms are correct?

These questions are relatively well understood for convex inverse problems, but the course will be on non-convex inverse problems, whose study is much more recent, and a very active research topic.

The course will be at the interface between real analysis, statistics and optimization. It will include theoretical and programming exercises.

NoSQL databases

Bases de données NoSQL

Paul Boniol

Duration: 24h
ECTS: 4 credits
In charge: Paul Boniol

The purpose of this course is to review data management technologies that can be used to manage large volumes of data, beyond traditional relational database management systems. These systems, grouped generically under the name “NoSQL”, actually bring together very heterogeneous approaches: different data models (e.g., based on representations in XML, graph or triple form), different compromises to ensure scalability (in particular, abandoning ACID properties), distribution of data over a very large set of nodes, etc.. This course will present the choices of technologies and the research problems implied by some of these systems, and will explain in which situations they are useful.

Point clouds and 3D modeling

Nuages de points et modélisation 3D

Jean-Emmanuel Deschaud

Duration: 26h
ECTS: 4 credits
In charge: Jean-Emmanuel Deschaud

Topics in trustworthy machine learning

Introduction à l'apprentissage de confiance

Olivier Cappé

Duration: 24h
ECTS: 4 credits
In charge: Olivier Cappé

Trustworthy machine learning refers to the study of machine learning systems that are not only technically proficient but also socially responsible and acceptable to users. The goal is to ensure that these systems are trustworthy to users by addressing key challenges such as:

Privacy: Protecting user data and complying with privacy regulations.
Robustness: Ensuring resilience against adversarial attacks and unexpected inputs.
Fairness: Treating all users and groups equitably, avoiding biases.

A significant portion of the course will cover the Differential Privacy (DP) framework, which has become a standard for enforcing user privacy in data processing over the past decade. By properly calibrated use of randomization, DP aims to reach a trade-off between the protection of individual characteristics and the utility of the learned models.

The remainder of the course will explore key models and concepts in robustness and fairness.

Building on machine learning concepts from the first semester and basic probability and statistics, the course combines lectures, exercise sessions, and Python hands-on labs. Validation is through homework assignments and the defense of a group project based on a research paper.

Internship

IASD research internship

Stage de recherche IASD

ECTS: 10 credits