Title: Using structure to select features in high dimension
Abstract: Many problems in genomics require the ability to identify relevant features in data sets containing orders of magnitude more features than samples. This setup poses distinct statistical and computational challenges, and traditional feature selection methods fall short. In my talk, I will present several ways to incorporate prior knowledge of the structure of the features to address this problem.
Title: On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Abstract: Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weights and positions. This is an idealization of the usual way to train neural networks with a large hidden layer. We show that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers. The proof involves Wasserstein gradient flows, a by-product of optimal transport theory. Numerical experiments show that this asymptotic behavior is already at play for a reasonable number of particles, even in high dimension. (Joint work with Lénaïc Chizat)
Title: Learned image reconstruction for high-resolution tomographic imaging
Abstract: Recent advances in deep learning for tomographic reconstruction have shown great promise for creating accurate, high-quality images from subsampled measurements in considerably less time than established nonlinear regularisation methods such as total variation (TV). This new paradigm also offers a new implicit way of expressing prior knowledge through training on a class of images with expected characteristics. In this talk we discuss two common approaches to combining deep learning, here convolutional neural networks (CNNs), with model-based reconstruction techniques, using photoacoustic tomography as an example. We also address particular challenges of learned reconstruction for such computationally intensive applications.
Title: Deep Inversion, Autoencoders for Learned Regularization of Inverse Problems
Abstract: This talk will highlight how deep learning, inverse problems theory and the calculus of variations can profit from each other. Data-driven deep learning methods have revolutionized many application fields in imaging and data science. Recently, classical methods from the calculus of variations and inverse problems have begun to be combined with deep learning to effectively estimate hidden parameters. Such variational networks with learned regularization and unrolled gradient flow optimization have enabled deep convolutional neural networks (CNNs) to tackle inversion tasks with strongly improved performance. However, even in the context of very basic CNN inversion methods, one fundamental aspect of inverse problems theory is still largely missing: understandable regularization scales addressing ill-posedness, i.e. stability properties of the learned inversion process. In machine learning, the lack of such stability is often discussed in terms of adversarial attacks. In this talk, we present a latent space analysis of autoencoding networks to learn the regularization of inverse problems in a controlled way. This offers new mathematical tools and insights for addressing the above limitation. Basic deconvolution problems and realistic inversion in photoacoustic tomography illustrate the gain of deep autoencoding networks in inverse problems.
Title: Designing multimodal deep architectures for Visual Question Answering
Abstract: Multimodal representation learning for text and image has been extensively studied in recent years. Currently, one of the most popular tasks in this field is Visual Question Answering (VQA). I will introduce this complex multimodal task, which aims at answering a question about an image. To solve this problem, visual and textual deep network models are required, and high-level interactions between these two modalities have to be carefully designed into the model in order to provide the right answer. This projection from the unimodal spaces to a multimodal one is supposed to extract and model the relevant correlations between the two spaces. In addition, the model must be able to understand the full scene, focus its attention on the relevant visual regions, and discard information that is irrelevant to the question.
Title: Random Matrix Advances in Machine Learning
Abstract: Machine learning algorithms, starting from elementary yet popular ones, are difficult to analyze theoretically as (i) they are data-driven, and (ii) they rely on non-linear tools (kernels, activation functions). These theoretical limitations are exacerbated in large dimensional datasets, where standard algorithms behave quite differently than predicted, if they do not fail completely. In this talk, we will show how random matrix theory (RMT) answers all these problems. We will show precisely that RMT provides a new understanding and various directions of improvement for kernel methods, semi-supervised learning, SVMs, community detection on graphs, spectral clustering, etc. In addition, we will show that RMT can explain observations made on real complex datasets with methods as advanced as deep neural networks.
Title: Combinatorial Solutions to Elastic Shape Matching
Abstract: In my presentation, I will focus on four different shape matching problems, namely the matching between two planar shapes, the matching between two 3D shapes, the matching between a shape and an image and the matching between a planar and a 3D shape. In all cases, I will discuss combinatorial formulations for elastic shape matching and show how optimal or near-optimal solutions can be computed using dynamic programming or integer linear programming. The formulation is highly related to optimal transport, yet different.
Title: On the several ways to regularize optimal transport
Abstract: After briefly introducing the optimal transport problem, I will show in this talk how regularization, either explicitly carried out as a statistical procedure or implicitly carried out and presented as a computational trick, is fundamental for optimal transport to work in applications to data science. I will present two such regularizations: regularizing the transport plan by entropy, or projecting measures on maximally informative subspaces. (Presentation based on joint work with G. Peyré, A. Genevay, F. Bach and F.P. Paty)
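As a minimal sketch of the first regularization mentioned above (entropic regularization of the transport plan), the following pure-Python snippet runs Sinkhorn's matrix-scaling iterations on a toy problem. The histograms `a` and `b`, the squared-distance cost, and the values of `eps` and `n_iter` are illustrative assumptions, not taken from the talk.

```python
import math

def sinkhorn(a, b, cost, eps=1.0, n_iter=500):
    """Entropy-regularized optimal transport between histograms a and b."""
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows to match a, then columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Regularized transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy example: two histograms supported on the points 0, 1, 2.
points = [0.0, 1.0, 2.0]
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
cost = [[(x - y) ** 2 for y in points] for x in points]

plan = sinkhorn(a, b, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(3)) for j in range(3)]
```

At convergence the row and column sums of `plan` match `a` and `b`. Smaller values of `eps` bring the plan closer to the unregularized optimum but typically require more iterations.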
Title: Roto-Translation Covariant Convolutional Networks for Medical Image Analysis
Abstract: We propose a framework for rotation and translation covariant deep learning using SE(2) group convolutions (see the reference below). The group product of the special Euclidean motion group SE(2) describes how a concatenation of two roto-translations results in a net roto-translation. We encode this geometric structure into convolutional neural networks (CNNs) via SE(2) group convolutional layers, which fit into the standard 2D CNN framework and which make it possible to deal generically with rotated input samples without the need for data augmentation. We introduce three layers: a lifting layer which lifts a 2D image to an SE(2)-image, i.e., 3D data whose domain is SE(2); a group convolution layer from and to an SE(2)-image; and a projection layer from an SE(2)-image to a 2D image. The lifting and group convolution layers are SE(2)-covariant (i.e. the output roto-translates with the input). The projection layer, a maximum intensity projection over rotations, makes the full CNN rotation invariant. A typical SE(2)-CNN consists of a rotationally invariant feature encoding part (a sequence of a lifting layer, group convolution layers and a projection layer) followed by (fully connected) output layers in which the outputs are mapped to class probabilities via a sigmoid activation function, a scalar bias and weights. In this work we consider two-class problems where we use cross-entropy as a loss function. We address three different medical imaging problems: (1) mitosis detection in histopathology, (2) vessel segmentation in retinal imaging, (3) cell boundary detection in electron microscopy images. Each time we achieve state-of-the-art performance with our SE(2)-CNNs, without the need for data augmentation by rotation and with increased performance compared to standard CNNs that do rely on augmentation.
We show the advantage of including multiple orientations and group convolutions in a fair comparison where the total network capacity is kept constant, and where the training and testing data are kept the same. Reference: Erik J Bekkers, Maxime W Lafarge, Mitko Veta, Koen A J Eppenhof, Josien P W Pluim, and Remco Duits, Roto-Translation Covariant Convolutional Networks for Medical Image Analysis, MICCAI 2018.
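The lifting and projection layers described above can be illustrated with a deliberately simplified sketch. It assumes only the four rotations by multiples of 90 degrees (so no interpolation is needed), a single hand-picked kernel, and no group convolution layer or training; the function names are hypothetical and this is not the authors' implementation.

```python
def rot90(x):
    """Rotate a square 2D list by 90 degrees counter-clockwise."""
    n = len(x)
    return [[x[j][n - 1 - i] for j in range(n)] for i in range(n)]

def correlate(img, ker):
    """'Valid' 2D cross-correlation of a square image with a square kernel."""
    n, k = len(img), len(ker)
    m = n - k + 1
    return [[sum(img[i + p][j + q] * ker[p][q] for p in range(k) for q in range(k))
             for j in range(m)] for i in range(m)]

def lift(img, ker):
    """Lifting layer: one response map per kernel orientation (0, 90, 180, 270 degrees)."""
    maps, k = [], ker
    for _ in range(4):
        maps.append(correlate(img, k))
        k = rot90(k)
    return maps

def project(maps):
    """Projection layer: pointwise maximum over the orientation axis."""
    m = len(maps[0])
    return [[max(mp[i][j] for mp in maps) for j in range(m)] for i in range(m)]

image = [[0, 1, 2, 0],
         [3, 0, 1, 1],
         [0, 2, 0, 4],
         [1, 0, 5, 0]]
kernel = [[1, -1],
          [2, 0]]

out = project(lift(image, kernel))
out_rot = project(lift(rot90(image), kernel))

# Covariance: rotating the input rotates the projected feature map.
covariant = out_rot == rot90(out)
# Invariance of a global statistic computed from the projected map.
invariant = max(map(max, out_rot)) == max(map(max, out))
```

In this toy setting the projected map rotates with the input, so any rotation-insensitive statistic of it (here the global maximum) is unchanged, without any rotated copies of the image being used.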
Title: Robust nonnegative matrix factorisation with the beta-divergence and applications in imaging
Abstract: Data is often available in matrix form, in which columns are samples, and processing of such data often entails finding an approximate factorisation of the matrix into two factors. The first factor (the “dictionary”) yields recurring patterns characteristic of the data. The second factor (“the activation matrix”) describes in which proportions each data sample is made of these patterns. Nonnegative matrix factorisation (NMF) is a popular technique for analysing data with nonnegative values, with applications in many areas such as text information retrieval, user recommendation, audio signal processing, and hyperspectral imaging. In the first part, I will present a general majorisation-minimisation framework for NMF with the beta-divergence, a continuous family of loss functions that takes the quadratic loss, KL divergence and Itakura-Saito divergence as special cases. In the second part, I will present applications to hyperspectral unmixing in remote sensing and factor analysis in dynamic PET, introducing robust variants of NMF that account for outliers, nonlinear phenomena or specific binding. Joint work with Nicolas Dobigeon.
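The majorisation-minimisation framework mentioned above is commonly implemented as multiplicative updates. The sketch below assumes the standard (non-robust) updates with the usual beta-dependent exponent, applied to a small random positive matrix; the sizes, seed and iteration count are arbitrary choices for illustration, and none of the robust variants of the talk are covered.

```python
import math
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def beta_divergence(V, Vh, beta):
    """Sum of d_beta(v | vh); beta = 2, 1, 0 give quadratic, KL and Itakura-Saito."""
    total = 0.0
    for v, vh in zip(sum(V, []), sum(Vh, [])):
        if beta == 1:
            total += v * math.log(v / vh) - v + vh
        elif beta == 0:
            total += v / vh - math.log(v / vh) - 1
        else:
            total += (v ** beta + (beta - 1) * vh ** beta
                      - beta * v * vh ** (beta - 1)) / (beta * (beta - 1))
    return total

def mm_update(V, W, H, beta):
    """One pass of multiplicative updates: first H, then W."""
    # Exponent under which the multiplicative update is a genuine MM step.
    gamma = 1 / (2 - beta) if beta < 1 else (1.0 if beta <= 2 else 1 / (beta - 1))

    def num_den(Vh):
        num = [[v * vh ** (beta - 2) for v, vh in zip(rv, rh)] for rv, rh in zip(V, Vh)]
        den = [[vh ** (beta - 1) for vh in rh] for rh in Vh]
        return num, den

    num, den = num_den(matmul(W, H))
    up, down = matmul(transpose(W), num), matmul(transpose(W), den)
    H = [[h * (n / d) ** gamma for h, n, d in zip(rh, rn, rd)]
         for rh, rn, rd in zip(H, up, down)]

    num, den = num_den(matmul(W, H))
    up, down = matmul(num, transpose(H)), matmul(den, transpose(H))
    W = [[w * (n / d) ** gamma for w, n, d in zip(rw, rn, rd)]
         for rw, rn, rd in zip(W, up, down)]
    return W, H

random.seed(0)
n_rows, n_cols, rank = 6, 8, 2
V = [[random.uniform(0.5, 2.0) for _ in range(n_cols)] for _ in range(n_rows)]

losses = {}
for beta in (2, 1, 0):
    W = [[random.uniform(0.5, 1.5) for _ in range(rank)] for _ in range(n_rows)]
    H = [[random.uniform(0.5, 1.5) for _ in range(n_cols)] for _ in range(rank)]
    trace = [beta_divergence(V, matmul(W, H), beta)]
    for _ in range(50):
        W, H = mm_update(V, W, H, beta)
        trace.append(beta_divergence(V, matmul(W, H), beta))
    losses[beta] = trace
```

Each entry of `losses` records the objective after every pass; with these updates the sequence should be non-increasing for each of the three special cases, and the factors stay positive because the updates only multiply by positive ratios.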
Title: Optimization meets machine learning for neuroimaging
Abstract: Electroencephalography (EEG), Magnetoencephalography (MEG) and functional MRI (fMRI) are noninvasive techniques that make it possible to image the active brain. Yet to do so, challenging computational and statistical machine learning problems need to be solved. As data are acquired every day in both clinical and cognitive neuroscience contexts, computations can become a bottleneck. In this talk I will present statistical inference problems relevant for neuroimaging (matrix factorization, sparse regression) and show how novel optimization strategies improve on the state of the art.
Title: Multigrain: a unified image embedding for classes and instances
Abstract: This talk will tackle several problems related to image classification and large-scale image retrieval. While these tasks are both treated with convolutional neural networks, noticeable differences in the training and architectures exist. We propose a network architecture producing compact vector representations that are suited both for image classification and particular object retrieval. It is trained with a simple multi-task objective: we minimize a cross-entropy loss for classification and a ranking loss that determines if two images are identical up to data augmentation, with no need for additional labels. I will then discuss the subsequent coding stage, in which we want to compress such an image representation so that it remains suited for retrieval and classification. This is joint work with Maxim Berman, Matthijs Douze, Iasonas Kokkinos, Andrea Vedaldi.
Title: Revisiting non-linear PCA with progressively grown autoencoders
Abstract: In this talk I will revisit the old problem of nonlinear dimensionality reduction with hierarchical representations, that is, representations where the first n components induce the n-dimensional manifold (with some degree of smoothness) that best approximates the data points, as in standard PCA. I will introduce a method that makes it possible to progressively grow the latent dimension of an autoencoder without losing the hierarchy condition. Experimental results using real data in both unsupervised and supervised scenarios will be shown.
Title: A Kernel Perspective for Regularizing Deep Neural Networks
Abstract: We propose a new point of view for regularizing deep neural networks by using the norm of a reproducing kernel Hilbert space (RKHS). Even though this norm cannot be computed, it admits upper and lower approximations leading to various practical strategies. Specifically, this perspective (i) provides a common umbrella for many existing regularization principles, including spectral norm and gradient penalties, or adversarial training, (ii) leads to new effective regularization penalties, and (iii) suggests hybrid strategies combining lower and upper bounds to get better approximations of the RKHS norm. We experimentally show this approach to be effective when learning on small datasets, or to obtain adversarially robust models. This is joint work with Alberto Bietti, Gregoire Mialon and Dexiong Chen.
Title: Autoencoder Image Generation with Multiscale Sparse Deconvolutions
Abstract: Autoencoders and GANs can synthesize remarkably complex images, although we still do not understand the mathematical properties of the generated random processes. We introduce a mathematical and algorithmic framework to analyze the principles of such image syntheses. In Wasserstein autoencoders, the coder is trained to transform the input random vector into a lower-dimensional nearly white noise. Images are synthesized from white noise with an inverse deep convolutional generator. We show that the encoder can be computed with a multiscale scattering transform, which mixes input variables at multiple scales. We prove that generating an image model then amounts to solving a sequence of linear deconvolutions at different scales. A deep convolutional generator regularizes this deconvolution by sparsity in dictionaries learned at each scale. Numerical image synthesis will be shown. Joint work with Tomas Anglès.
Title: Predicting aesthetic appreciation of images
Abstract: Image aesthetics has become an important criterion for visual content curation on social media sites and media content repositories. Previous work on aesthetic prediction models in the computer vision community has focused on aesthetic score prediction or binary image labeling. However, raw aesthetic annotations are in the form of score histograms and provide richer and more precise information than binary labels or mean scores. In this talk I will present recent work at Naver Labs Europe on the rarely-studied problem of predicting aesthetic score distributions. The talk will cover the large-scale dataset we collected for this problem, called AVA, and will describe the novel deep architecture and training procedure for our score distribution model. Our model achieves state-of-the-art results on AVA for three tasks: (i) aesthetic quality classification; (ii) aesthetic score regression; and (iii) aesthetic score distribution prediction, all while using one model trained only for the distribution prediction task. I will also discuss our proposed method for modifying an image such that its predicted aesthetics changes, and describe how this modification can be used to gain insight into our model.
Title: Understanding geometric attributes with autoencoders
Abstract: Autoencoders are neural networks which project data to and from a lower dimensional latent space, the projection being learned via training on the data. While these networks produce impressive results, there is as yet little understanding of the internal mechanisms which allow autoencoders to produce such results. In this work, we aim to describe how an autoencoder is able to process certain fundamental image attributes. We analyse two of these attributes in particular: size and position. For the former, we study the case of binary images of disks and describe the encoding and decoding processes; in particular, we show that the optimal decoder in the case of a network without biases can be described precisely. In the case of position, we describe how the encoder can extract the position of a Dirac impulse. Finally, we present ongoing work on an approach to create a PCA-like autoencoder, that is to say an autoencoder which presents similar characteristics to PCA in terms of the interpretability of the latent space. We shall show preliminary experimental results on synthetic data.
Title: An SDCA-powered inexact dual augmented Lagrangian method for fast CRF learning
Abstract: I'll present an efficient dual augmented Lagrangian formulation to learn conditional random field (CRF) models. The algorithm, which can be interpreted as an inexact gradient method on the multiplier, does not require performing exact inference iteratively, and requires only a fixed number of stochastic clique-wise updates at each epoch to obtain a sufficiently good estimate of the gradient w.r.t. the Lagrange multipliers. We prove that the proposed algorithm enjoys global linear convergence for both the primal and the dual objective. Our experiments show that the proposed algorithm outperforms state-of-the-art baselines in terms of speed of convergence. (Joint work with Shell Xu Hu)
Title: Bayesian inversion for tomography through machine learning
Abstract: The talk will outline recent approaches for using (deep) convolutional neural networks to solve a wide range of inverse problems, such as tomographic image reconstruction. Emphasis is on learned iterative schemes that use a neural network architecture for reconstruction that includes physics based models for how data is generated. The talk will also discuss recent developments in using generative adversarial networks for uncertainty quantification in inverse problems.
Title: Unsupervised domain adaptation with application to urban scene analysis
Abstract: In numerous real-world applications, no matter how much energy is devoted to building real and/or synthetic training datasets, there remains a large distribution gap between these data and those met at run-time. This gap results in severe, possibly catastrophic, performance loss. This problem is especially acute for automated and autonomous driving systems, where generalizing well to diverse testing environments remains a major challenge. One promising tool to mitigate this issue is unsupervised domain adaptation (UDA), which assumes that un-annotated data from the "test domain" are available at training time, along with the annotated data from the "source domain". We will discuss different ways to approach UDA, with application to semantic segmentation and object detection in urban scenes. We will introduce a new approach, called AdvEnt, that relies on combining adversarial training with minimization of decision entropy (seen as a proxy for uncertainty).
Title: Scalable hyperparameter transfer learning
Abstract: Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization. Typically, BO relies on conventional Gaussian process (GP) regression, whose algorithmic complexity is cubic in the number of evaluations. As a result, GP-based BO cannot leverage large numbers of past function evaluations, for example, to warm-start related BO runs. We propose a multi-task adaptive Bayesian linear regression model for transfer learning in BO, whose complexity is linear in the number of function evaluations: one Bayesian linear regression model is associated with each black-box function optimization problem (or task), while transfer learning is achieved by coupling the models through a shared neural network. A first set of experiments shows that the neural network learns a representation suitable for warm-starting the black-box optimization problems and that BO runs can be accelerated when the target black-box function (e.g., validation loss) is learned together with other related signals (e.g., training loss). The proposed method was found to be at least one order of magnitude faster than methods recently published in the literature. A second set of experiments shows that our approach can further be combined with Hyperband, replacing the uniform random sampling of hyperparameter candidates by an adaptive non-uniform sampling procedure. Our extension not only improves the precision resolution of Hyperband but also supports transfer learning, both within a Hyperband run and across previous hyperparameter tuning tasks. This is joint work with R. Jenatton, L. Valkov, F. Winkelmolen, C. Archambeau and M. Seeger.
Title: Optimal machine learning with stochastic projections and regularization
Abstract: Projecting data in low dimensions is often key to scaling machine learning to large high-dimensional datasets. In this talk we will take a statistical learning tour of classic as well as recent projection methods: from classical principal component analysis to sketching and random subsampling. We will show that, perhaps surprisingly, there are a number of settings where it is possible to substantially reduce data dimensions, and hence computational costs, without losing statistical accuracy. As a byproduct we derive a massively scalable kernel/Gaussian process solver with optimal statistical guarantees and excellent performance in a number of large-scale problems.
Title: Structured prediction via implicit embeddings
Abstract: In this talk we analyze a regularization approach for structured prediction problems. We characterize a large class of loss functions that allow structured outputs to be naturally embedded in a linear space. We exploit this fact to design learning algorithms using a surrogate loss approach and regularization techniques. We prove universal consistency and finite sample bounds characterizing the generalization properties of the proposed methods. Experimental results are provided to demonstrate the practical usefulness of the proposed approach.
Title: Learning Representations for Information Obfuscation and Inference
Abstract: Data collection and sharing are pervasive aspects of modern society. This process can either be voluntary, as in the case of a person taking a facial image to unlock his/her phone, or incidental, such as traffic cameras collecting videos on pedestrians. An undesirable side effect of these processes is that shared data can carry information about attributes that users might consider as sensitive, even when such information is of limited use for the task. It is therefore desirable for both data collectors and users to design procedures that minimize sensitive information leakage. Balancing the competing objectives of providing meaningful individualized service levels and inference while obfuscating sensitive information is still an open problem. In this work, we take an information theoretic approach that is implemented as an unconstrained adversarial game between Deep Neural Networks in a principled, data-driven manner. This approach enables us to learn domain-preserving stochastic transformations that maintain performance on existing algorithms while minimizing sensitive information leakage. Joint work with M. Bertran, N. Martinez, Q. Qiu, A. Papadaki, G. Reeves, and M. Rodrigues.
Title: Towards demystifying over-parameterization in deep learning
Abstract: Many modern learning models, including deep neural networks, are trained in an over-parameterized regime where the number of parameters of the model exceeds the size of the training dataset. Training these models involves highly non-convex landscapes, and it is not clear how methods such as (stochastic) gradient descent provably find globally optimal models. Furthermore, due to their over-parameterized nature, these neural networks in principle have the capacity to (over)fit any set of labels, including pure noise. Despite this high fitting capacity, somewhat paradoxically, neural network models trained via first-order methods continue to predict well on yet unseen test data. In this talk I will discuss some results aimed at demystifying such phenomena in deep learning and other domains such as matrix factorization by demonstrating that gradient methods enjoy a few intriguing properties: (1) when initialized at random, the iterates converge at a geometric rate to a global optimum, (2) among all global optima of the loss, the iterates converge to one with a near minimal distance to the initial estimate and do so by taking a nearly direct route, (3) they are provably robust to noise/corruption/shuffling on a fraction of the labels, with these algorithms only fitting to the correct labels and ignoring the corrupted labels. (This talk is based on joint work with Samet Oymak)
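Property (2) above can be checked exactly in the simplest over-parameterized model, linear least squares with more unknowns than equations. The sketch below is only this toy linear case, chosen here for illustration and not the neural network setting of the talk; the matrix `A`, the data `b`, the starting point `x0` and the step size are arbitrary assumptions.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def gradient_descent(A, b, x0, step=0.1, n_iter=500):
    """Plain gradient descent on f(x) = 0.5 * ||A x - b||^2."""
    x = list(x0)
    for _ in range(n_iter):
        r = [p - bi for p, bi in zip(matvec(A, x), b)]
        grad = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]
        x = [xj - step * g for xj, g in zip(x, grad)]
    return x

# Two equations, four unknowns: infinitely many global minimizers.
A = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, -1.0]]
b = [1.0, 2.0]
x0 = [0.5, -1.0, 2.0, 0.0]
x_gd = gradient_descent(A, b, x0)

# Closed form of the interpolating solution closest to x0:
#   x0 + A^T (A A^T)^{-1} (b - A x0), with a hand-written 2x2 solve.
r0 = [bi - p for bi, p in zip(b, matvec(A, x0))]
G = [[sum(u * v for u, v in zip(ri, rj)) for rj in A] for ri in A]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
lam = [(G[1][1] * r0[0] - G[0][1] * r0[1]) / det,
       (G[0][0] * r0[1] - G[1][0] * r0[0]) / det]
x_closest = [x0[j] + A[0][j] * lam[0] + A[1][j] * lam[1] for j in range(4)]
```

In this linear case the gradient always lies in the row space of `A`, so the iterates never move in directions that leave the residual unchanged, and `x_gd` agrees with `x_closest` up to numerical precision.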
Title: Statistical inference in high-dimension and application to medical imaging
Abstract: Medical imaging involves high-dimensional data, yet these data are acquired for a limited number of samples. Multivariate predictive models have become popular in the last decades to fit some external variables from imaging data, and standard algorithms yield point estimates of the model parameters. It is however challenging to attribute confidence to these parameter estimates, which makes solutions hardly trustworthy. In this talk, I will present a new algorithm that assesses the statistical significance of parameters and that can scale even when the number of predictors p ≥ 10^5 is much higher than the number of samples n ≤ 10^3, by leveraging structure among features. Our algorithm combines three main ingredients: a powerful inference procedure for linear models (the so-called Desparsified Lasso), feature clustering, and an ensembling step. We first establish that Desparsified Lasso alone cannot handle n << p regimes; then we demonstrate that the combination of clustering and ensembling provides an accurate solution, whose specificity is controlled. We also demonstrate stability improvements on two brain imaging datasets.
Title: Contextual Bandit: from Theory to Applications
Abstract: Trading off exploration versus exploitation is a key problem in computer science: it is about learning how to make decisions in order to optimize a long-term cost. While many areas of machine learning aim at estimating a hidden function given a dataset, reinforcement learning is rather about optimally building a dataset of observations of this hidden function that contains just enough information to guarantee that the maximum is being properly estimated. The first part of this talk reviews the main techniques and results known on the contextual linear bandit. We'll mostly rely on the recent book of Lattimore and Szepesvári (2019) (first reference below). Yet real-world problems often don't behave as the theory would like them to. In the second part of this talk, we want to share our experience in applying bandit algorithms in industry (second reference below). In particular, it appears that while the system is supposed to be interacting with its environment, the customers' feedback is often delayed or missing and does not allow the necessary updates to be performed. We propose a solution to this issue, propose some alternative models and architectures, and finish the presentation with open questions on sequential learning beyond bandits.
Lattimore, Tor, and Csaba Szepesvári. Bandit algorithms. Preprint (2018).
Vernade, Claire, et al. Contextual bandits under delayed feedback. arXiv preprint arXiv:1807.02089 (2018).
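For readers unfamiliar with the contextual linear bandit reviewed in the first part, here is a minimal sketch of one classical algorithm for it, LinUCB, chosen here as an example and not necessarily the variant discussed in the talk. The simulated environment (`theta_true`, five random contexts per round, Gaussian noise), the exploration weight `alpha` and the ridge parameter `reg` are all assumptions, and feedback is immediate, so the delayed-feedback issue raised above is not modelled.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, x):
    return [dot(row, x) for row in M]

class LinUCB:
    """Ridge estimate of the reward parameter plus an optimism bonus per context."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha
        self.reg = reg
        # Inverse of the design matrix A = reg * I + sum_t x_t x_t^T
        self.A_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum_t reward_t * x_t

    def select(self, contexts):
        theta = matvec(self.A_inv, self.b)

        def ucb(x):
            width = math.sqrt(max(dot(x, matvec(self.A_inv, x)), 0.0))
            return dot(theta, x) + self.alpha * width

        return max(range(len(contexts)), key=lambda k: ucb(contexts[k]))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A_inv after adding x x^T to A.
        Ax = matvec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        d = len(x)
        self.A_inv = [[self.A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(d)]
                      for i in range(d)]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

random.seed(1)
theta_true = [0.6, -0.3, 0.8]  # hypothetical parameter, unknown to the agent
agent = LinUCB(dim=3)
history = []
for t in range(200):
    contexts = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
    k = agent.select(contexts)
    reward = dot(theta_true, contexts[k]) + random.gauss(0, 0.1)
    agent.update(contexts[k], reward)  # immediate feedback (an assumption)
    history.append(contexts[k])
```

The rank-one update keeps each round at quadratic cost in the dimension instead of re-inverting the design matrix. Handling delayed or missing rewards would require changing when `update` is called, which this sketch does not attempt.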
Title: Learning from permutations
Abstract: Changes in image quality or illumination may affect the pixel intensities without affecting the relative intensities, i.e., the ranking of pixels in an image by decreasing intensity. In order to learn a model robust to such changes, it is therefore of interest to develop machine learning tools to learn from permutations. In this talk I will discuss several approaches to embed the set of permutations into vector spaces, allowing computationally efficient learning of linear models, and relate these embeddings to the classical representations of the symmetric group.
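One simple way to embed a ranking into a vector space is through pairwise comparisons, often called the Kendall embedding. The abstract does not say which embeddings are covered, so this choice is an assumption made here for illustration, as are the toy pixel values; ties are not handled.

```python
from itertools import combinations

def kendall_embedding(values):
    """Encode the ranking of `values` as +1/-1 signs, one per pair (i, j) with i < j."""
    return [1 if values[i] > values[j] else -1
            for i, j in combinations(range(len(values)), 2)]

pixels = [12, 200, 37, 90, 151]

# Monotone increasing intensity changes leave the ranking unchanged.
brighter = [2 * p + 10 for p in pixels]
gamma_corrected = [p ** 0.5 for p in pixels]
# Inverting intensities reverses the ranking.
inverted = [255 - p for p in pixels]

e = kendall_embedding(pixels)
same_after_brightening = e == kendall_embedding(brighter)
same_after_gamma = e == kendall_embedding(gamma_corrected)
negated_after_inversion = kendall_embedding(inverted) == [-s for s in e]
```

A linear model on top of such a vector depends on the image only through the ordering of its pixels. The embedding has one coordinate per pair, so its size grows quadratically with the number of items being ranked.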
Title: Iterative regularization via dual diagonal descent
Abstract: In this talk I will consider iterative regularization methods for solving linear inverse problems. An advantage of iterative regularization strategies with respect to Tikhonov regularization is that they are developed in conjunction with an optimization algorithm, adapted to the structure of the problem at hand, and the number of iterations plays the role of a regularization parameter. I will show that dual proximal gradient algorithms can be used as iterative regularization procedures, both in the standard and accelerated version. Theoretical findings are complemented with numerical experiments showing state-of-the-art performance.
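To illustrate how the number of iterations can act as a regularization parameter, the sketch below uses the classical Landweber iteration (gradient descent on the least-squares residual, started at zero). This is a simpler stand-in chosen here, not the dual proximal gradient schemes of the talk, and the ill-conditioned matrix, the perturbation `noise` and the step size are illustrative assumptions.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def landweber(A, y, step, n_iter):
    """Return all iterates of x <- x + step * A^T (y - A x), starting from x = 0."""
    x = [0.0] * len(A[0])
    iterates = [list(x)]
    for _ in range(n_iter):
        r = [yi - p for yi, p in zip(y, matvec(A, x))]
        x = [xj + step * sum(A[i][j] * r[i] for i in range(len(A)))
             for j, xj in enumerate(x)]
        iterates.append(list(x))
    return iterates

# Nearly singular forward operator and slightly perturbed data.
A = [[1.0, 1.0, 1.0],
     [1.0, 1.01, 1.0],
     [1.0, 1.0, 1.02]]
x_true = [1.0, -1.0, 2.0]
noise = [0.01, -0.02, 0.015]
y = [p + e for p, e in zip(matvec(A, x_true), noise)]

# step = 0.1 is below 1 / ||A||^2, since ||A||^2 <= ||A||_F^2 (about 9.06 here)
iterates = landweber(A, y, step=0.1, n_iter=200)
residuals = [norm([yi - p for yi, p in zip(y, matvec(A, x))]) for x in iterates]
sizes = [norm(x) for x in iterates]
```

Along the iterations the data residual shrinks while the norm of the iterate grows, so stopping early returns a smaller, more strongly regularized solution, much as a large Tikhonov parameter would.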
Title: Rank optimality for the Burer-Monteiro factorization
Abstract: In the last decades, semidefinite programs have emerged as a powerful way to solve difficult combinatorial optimization problems in polynomial time. Unfortunately, they are difficult to solve numerically in high dimensions. This talk will discuss a classical heuristic used to speed up the solving, namely the Burer-Monteiro formulation. We will review the main correctness guarantees that have been established for this heuristic, and study their optimality.
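The idea of the Burer-Monteiro formulation is to replace the n x n positive semidefinite variable X by a factorization X = V V^T, where V has only p columns. The sketch below applies it to a semidefinite program with unit-diagonal constraint (of the kind that arises as a relaxation of Max-Cut) on a 4-cycle graph. The graph, the rank p = 3, the shifted power-type iteration and the value of `shift` are assumptions made here for illustration, not the setting analysed in the talk.

```python
import math
import random

def objective(W, V):
    """<W, V V^T> = sum_ij W_ij <v_i, v_j>, where v_i is the i-th row of V."""
    n = len(W)
    return sum(W[i][j] * sum(a * b for a, b in zip(V[i], V[j]))
               for i in range(n) for j in range(n))

def bm_step(W, V, shift):
    """V <- normalize_rows((shift * I - W) V); every row stays on the unit sphere."""
    n, p = len(V), len(V[0])
    new_V = []
    for i in range(n):
        row = [shift * V[i][c] - sum(W[i][j] * V[j][c] for j in range(n))
               for c in range(p)]
        nrm = math.sqrt(sum(c * c for c in row))
        new_V.append([c / nrm for c in row] if nrm > 0 else list(V[i]))
    return new_V

# Adjacency matrix of a 4-cycle; minimize <W, X> s.t. diag(X) = 1, X PSD.
W = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]

random.seed(0)
V = []
for _ in range(4):
    g = [random.gauss(0, 1) for _ in range(3)]  # rank p = 3 factor
    s = math.sqrt(sum(c * c for c in g))
    V.append([c / s for c in g])

values = [objective(W, V)]
for _ in range(300):
    # shift = 3 exceeds the largest eigenvalue of W (at most the max degree, 2)
    V = bm_step(W, V, shift=3.0)
    values.append(objective(W, V))

row_norms = [math.sqrt(sum(c * c for c in row)) for row in V]
```

Because X = V V^T is positive semidefinite by construction and the rows of V have unit norm, every iterate is feasible for the semidefinite program while only n x p numbers are stored. With `shift` above the largest eigenvalue of `W`, each step cannot increase the objective; whether such low-rank schemes reach the global optimum is exactly the kind of guarantee the talk reviews.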
Title: Learning high-level reasoning in and from images
Abstract: Humans are able to infer what happened in a video given only a few sample frames. This faculty is called reasoning and is a key component of human intelligence. A detailed understanding requires reasoning over semantic structures, determining which objects were involved in interactions, of what nature, and what the results of these interactions were. To compound problems, the semantic structure of a scene may change and evolve. In this talk we present research in high-level reasoning from images and videos, with the goals of understanding visual content (scene comprehension), making predictions of probable future outcomes, or acting in simulated environments based on visual observations. We present neural models addressing these goals through explicit modeling of object relationships. We learn these models from data or from interactions between an agent and an environment.