Generalization

Background. Modern deep neural networks perform impressively well, especially in the overparameterized regime, where the number of parameters is very large. Classical learning theory fails to explain this phenomenon and is at odds with these empirical results, which has motivated new approaches to understanding why deep learning generalizes. A common belief is that flat minima [1] in parameter space lead to models that generalize well. For instance, such models may learn to extract high-quality features from the data, known as representations. However, it has also been shown that models with equivalent performance can exist at sharp minima [2, 3]. In this project, we will study the learned representations of deep learning models from a geometric perspective, together with their relationship to the sharpness of the loss landscape. Within the same framework, we will also consider additional aspects of training that enhance generalization. ...
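
To make the notion of sharpness concrete, the sketch below compares two toy losses that reach the same minimum value but differ in curvature. It uses the largest eigenvalue of the Hessian at the minimum as a sharpness proxy, which is one common choice and not necessarily the measure used in this project. The losses `flat_loss` and `sharp_loss` and the helpers `numerical_hessian` and `sharpness` are hypothetical names introduced only for illustration.

```python
import numpy as np


def numerical_hessian(loss, theta, eps=1e-4):
    """Central finite-difference Hessian of a scalar loss at theta."""
    n = theta.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n)
            e_i[i] = eps
            e_j = np.zeros(n)
            e_j[j] = eps
            hess[i, j] = (
                loss(theta + e_i + e_j)
                - loss(theta + e_i - e_j)
                - loss(theta - e_i + e_j)
                + loss(theta - e_i - e_j)
            ) / (4 * eps**2)
    return hess


def sharpness(loss, theta):
    """Largest Hessian eigenvalue at theta (an assumed sharpness proxy)."""
    hess = numerical_hessian(loss, theta)
    hess = 0.5 * (hess + hess.T)  # symmetrize against round-off
    return float(np.linalg.eigvalsh(hess)[-1])  # eigenvalues are ascending


# Toy quadratic losses (illustrative only): both are minimized at the
# origin with loss 0, so they perform identically at the minimum.
def flat_loss(t):
    return 0.5 * (0.1 * t[0] ** 2 + 0.2 * t[1] ** 2)


def sharp_loss(t):
    return 0.5 * (10.0 * t[0] ** 2 + 50.0 * t[1] ** 2)


theta_star = np.zeros(2)
flat_sharpness = sharpness(flat_loss, theta_star)
sharp_sharpness = sharpness(sharp_loss, theta_star)
print(f"flat minimum:  {flat_sharpness:.3f}")
print(f"sharp minimum: {sharp_sharpness:.3f}")
```

Both minima have zero loss, yet their sharpness values differ by orders of magnitude. For a real network the full Hessian is too large to form explicitly, so in practice the top eigenvalue would have to be estimated by other means, for example with Hessian-vector products.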