COLLOQUIUM Benoît Dherin « Why neural networks find simple solutions »
December 12 @ 17:00 - 18:00
Speaker: Benoît Dherin (Google, Sunnyvale)
Despite their ability to model very complicated functions, and despite having enough parameters to grossly overfit the training dataset, overparameterized neural networks seem instead to learn simpler functions that generalize well. In this talk, we present the notions of Implicit Gradient Regularization (IGR) and Geometric Complexity (GC), which shed light on this perplexing phenomenon. IGR helps guide the learning trajectory towards flatter regions in parameter space for any overparameterized differentiable model. This effect can be derived mathematically using Backward Error Analysis, a powerful and flexible method borrowed from the numerical analysis of ODEs. For neural networks, we explain how IGR translates into a simplicity bias measured by the neural network's GC. We will also show how various common training heuristics put pressure on the GC, creating a built-in geometric Occam's razor in deep learning.
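For orientation, here is a minimal sketch of the two central quantities, stated up to constants; the normalizations below are assumptions drawn from the published IGR and GC papers rather than part of this announcement. Backward Error Analysis shows that gradient descent with learning rate $h$ on a loss $L$ follows, to higher order, the gradient flow of a modified loss carrying an implicit penalty on the loss-gradient norm (the IGR term):
\[
  \tilde{L}(\theta) \;=\; L(\theta) \;+\; \frac{h}{4}\,\bigl\|\nabla_{\theta} L(\theta)\bigr\|^{2}.
\]
The Geometric Complexity of a network $f_{\theta}$ over training inputs $x_1,\dots,x_n$ is (in this sketch) the mean squared norm of its input-output Jacobian:
\[
  \mathrm{GC}(f_{\theta}) \;=\; \frac{1}{n} \sum_{i=1}^{n} \bigl\|\nabla_{x} f_{\theta}(x_i)\bigr\|_{F}^{2},
\]
so a smaller GC corresponds to a function that varies less over the data, i.e. a simpler learned solution.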
https://indico.math.cnrs.fr/event/12893/