Title: Deep Learning Theory from Statistics to Optimization
Abstract In this tutorial, I will explain the theories of deep learning, especially from the viewpoint of statistical learning theory and optimization theory. While deep learning is used in a variety of fields, there are still many theoretical aspects that remain unexplored, and vigorous research is currently being conducted from a variety of perspectives to clarify them. Deep learning theory can be roughly divided into three aspects: "approximation theory," "generalization theory," and "optimization theory," and those research is being conducted using various mathematical tools such as functional analysis and probability theory. In this tutorial, we will provide an intuitive explanation of these topics, from the basics of statistical learning theory to recent trends in theoretical research such as double descent, neural tangent kernels and mean-field analysis. In particular, we will provide theoretical analyses of questions such as "Why does deep learning produce high prediction performance?" and "Why does it generalize despite the huge number of parameters?". The limitations of current theories are also discussed. Program Schedule First Part (1h45m)     ● 1. Introduction to statistical learning theory and deep learning theory     ● 2. Representation ability of deep neural network         ○ 2.1 Universal approximator         ○ 2.2 Depth separation         ○ 2.3 Approximation error of function spaces with smoothness: Holder space, Sobolev space, Besov space         ○ 2.4 Benefit of adaptivity of deep learning: separation between kernel method and deep learning     ● 3. Generalization error on over-parameterized models         ○ 3.1 Generalization gap of deep neural network models         ○ 3.2 Double descent and benign overfitting Coffee Break (15m) Second Part (1h15m)     ● 1. Optimization and generalization         ○ 1.1 Introduction to Neural tangent kernel (NTK) and mean-field regime         ○ 1.2 Convergence analysis of SGD on NTK         ○ 1.3 Optimization on mean-field regime: McKean-Vlasov process         ○ 1.4 Theories of gradient Langevin dynamics Lecturers Taiji Suzuki Associate Professor, the University of Tokyo, Japan Taiji Suzuki is currently an Associate Professor in the Department of Mathematical Informatics at the University of Tokyo. He also serves as the team leader of "Deep learning theory" team in AIP-RIKEN. He received his Ph.D. degree in information science and technology from the University of Tokyo in 2009. He has a broad research interest in statistical learning theory on deep learning, kernel methods and sparse estimation, and stochastic optimization for large-scale machine learning problems. He served as area chairs of premier conferences such as NeurIPS, ICML, ICLR, AISTATS and a program chair of ACML. He received the Outstanding Paper Award at ICLR in 2021, the MEXT Young Scientists' Prize, Outstanding Achievement Award in 2017 from the Japan Statistical Society, Outstanding Achievement Award in 2016 from the Japan Society for Industrial and Applied Mathematics, and Best Paper Award in 2012 from IBISML. | |||||