Geometric Analysis of Deep Learning
Background Modern deep neural networks, especially those in the overparameterized regime, where the number of parameters far exceeds the amount of training data, perform impressively well. These empirical results are at odds with traditional learning theory, which fails to explain the phenomenon and has motivated new approaches that seek to understand why deep learning generalizes. A common belief is that flat minima [1] of the loss in parameter space correspond to models with good generalization properties. For instance, such models may learn to extract high-quality features, or representations, from the data. However, it has also been shown that models with equivalent performance can lie at sharp minima [2, 3]. These contradictory findings motivate us to study optimization, learned representations, and their impact on generalization from a geometric perspective. ...
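Since the discussion above hinges on the notion of flat versus sharp minima, the sketch below illustrates one common way such flatness is quantified in practice: estimating the largest eigenvalue of the training-loss Hessian by power iteration on Hessian-vector products (a larger value indicates a sharper minimum). The toy model, data, and iteration count are illustrative assumptions, not part of the original text, and the measure shown is only one of several sharpness proxies used in the literature.

```python
# Minimal sketch (assumed setup, not from the original text): estimate the top
# Hessian eigenvalue of a training loss as a sharpness proxy for a minimum.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and data stand in for a trained network and its training set.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x, y = torch.randn(256, 10), torch.randn(256, 1)
loss_fn = nn.MSELoss()

params = [p for p in model.parameters() if p.requires_grad]
dim = sum(p.numel() for p in params)

def hvp(vec):
    """Hessian-vector product of the loss w.r.t. the parameters (double backprop)."""
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    grad_dot_v = torch.dot(flat_grad, vec)
    hv = torch.autograd.grad(grad_dot_v, params)
    return torch.cat([h.reshape(-1) for h in hv])

# Power iteration: converges to the largest-magnitude Hessian eigenvalue,
# a standard sharpness measure (larger value = sharper minimum).
v = torch.randn(dim)
v /= v.norm()
eigenvalue = 0.0
for _ in range(50):
    hv = hvp(v)
    eigenvalue = torch.dot(hv, v).item()  # Rayleigh quotient estimate
    v = hv / (hv.norm() + 1e-12)

print(f"estimated top Hessian eigenvalue (sharpness proxy): {eigenvalue:.4f}")
```

In this illustrative setup the eigenvalue would typically be evaluated at a trained parameter vector, so that minima found by different optimizers or hyperparameters can be compared by their sharpness.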