
Publications of the Department


Franchini, Giorgia (2021) - Hyperparameter selection in stochastic optimization methods (Selezione degli iperparametri nei metodi di ottimizzazione stocastica) - Doctoral thesis (Tesi di dottorato), Università degli studi di Modena e Reggio Emilia

Abstract: In the context of deep learning, the computationally most expensive phase is the full training of the learning algorithm. Indeed, the design of a new learning algorithm requires an extensive numerical investigation, with the execution of a significant number of experimental trials. A fundamental aspect in designing a suitable learning algorithm is the selection of the hyperparameters (parameters that are not trained during the learning process), in a static or adaptive way. The aim of this thesis is to investigate the hyperparameter selection strategies on which standard machine learning algorithms are designed. In particular, we are interested in the different techniques for selecting the parameters of the stochastic gradient methods used for training machine learning methodologies. The main purposes motivating this study are the improvement of the accuracy (or of other metrics suitable for evaluating the inspected methodology) and the acceleration of the convergence rate of the iterative optimization schemes.

To achieve these purposes, the analysis mainly focuses on the choice of the fundamental parameters (hyperparameters) of the stochastic gradient methods: the steplength, the minibatch size and the potential adoption of variance reduction techniques. In a first approach we consider the choice of steplength and minibatch size separately; then we present a technique that combines the two choices. Regarding the steplength selection, we propose to tailor to the stochastic gradient iteration the steplength selection adopted in the full-gradient method known as the Limited Memory Steepest Descent method. This strategy, based on the Ritz-like values of a suitable matrix, provides a local estimate of the inverse of the local Lipschitz constant. Regarding the minibatch size, the idea is to increase this size dynamically, in an adaptive manner based on suitable validation tests. The experiments show that this training strategy is more efficient, in terms of time and cost, than the approaches available in the literature. We then combine the two parameter choices (steplength and minibatch size) in an adaptive scheme without introducing line search techniques, while the possible increase of the size of the subsample used to compute the stochastic gradient enables control of the variance of this direction.

In the second part of the thesis, we introduce an Automatic Machine Learning (AutoML) technique to set these parameters. In particular, we propose a low-cost strategy to predict the accuracy of the learning algorithm based only on its initial behavior. The initial and final accuracies observed during this preliminary process are stored in a database and used as the training set of a Support Vector Machine learning algorithm. The aim is to predict the accuracy of a learning methodology given its accuracy in the initial iterations of its learning process. In other words, by a probabilistic exploration of the hyperparameter space, we are able to find the setting providing optimal accuracies at quite a low cost. An extensive numerical experimentation was carried out, involving convex and non-convex functions (in particular Convolutional Neural Networks). The numerical experiments use several datasets well known in the literature, for different problems such as classification, segmentation and regression. Finally, a computational study is carried out to extend the proposed approaches to other methods, such as Momentum, ADAM and SVRG.
In conclusion, the contribution of this thesis consists in providing effective and inexpensive strategies for selecting the hyperparameters in the class of stochastic gradient methods.
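
To make the adaptive minibatch idea described in the abstract more concrete, the following Python snippet shows a minimal stochastic gradient loop on a synthetic least-squares problem in which the minibatch is enlarged whenever a simple variance ("norm") test signals that the gradient estimate is too noisy. It is an illustrative sketch only, not the method of the thesis: the fixed steplength alpha stands in for the Ritz-value-based steplength, and the threshold theta and all other names are invented for this example.

# Sketch: stochastic gradient with adaptive minibatch growth (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: min_x 0.5 * ||A x - b||^2 / n
n, d = 1000, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

def grad_samples(x, idx):
    """Per-sample gradients of the squared residuals on the chosen indices."""
    res = A[idx] @ x - b[idx]                # residuals on the minibatch
    return A[idx] * res[:, None]             # shape (batch, d)

x = np.zeros(d)
batch = 16                                   # initial minibatch size
alpha = 0.05                                 # stand-in steplength (not Ritz-based)
theta = 1.0                                  # variance-test threshold (hypothetical)

for k in range(200):
    idx = rng.choice(n, size=min(batch, n), replace=False)
    g_i = grad_samples(x, idx)
    g = g_i.mean(axis=0)                     # stochastic gradient estimate
    # If the sample variance of the gradient dominates its norm, the
    # estimate is too noisy -> enlarge the minibatch for the next steps.
    var = g_i.var(axis=0).sum() / len(idx)
    if var > theta * np.linalg.norm(g) ** 2 and batch < n:
        batch = min(2 * batch, n)
    x = x - alpha * g                        # plain gradient step

print("final batch size:", batch)
print("distance to x_true:", np.linalg.norm(x - x_true))

Doubling the batch only when the test fails keeps early iterations cheap while controlling the variance of the gradient direction later on, which is the trade-off the abstract refers to.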
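The accuracy-prediction idea of the second part can be illustrated in the same spirit: record, for a set of hyperparameter configurations, the accuracies observed in the first epochs together with the final accuracy, fit a Support Vector regression model on this database, and use it to predict the final accuracy of new configurations after only a few cheap iterations. The sketch below uses scikit-learn's SVR on synthetic data; the features, the database and the model settings are placeholders, not the ones used in the thesis.

# Sketch: predicting final accuracy from early-training behavior (illustrative only).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Pretend database of past runs: each row = [log10(steplength), log2(minibatch),
# accuracy after epoch 1, accuracy after epoch 2]; target = final accuracy.
n_runs = 200
log_lr = rng.uniform(-4, -1, n_runs)
log_bs = rng.integers(4, 9, n_runs).astype(float)
early1 = 0.5 + 0.1 * (log_lr + 2.5) + 0.02 * rng.normal(size=n_runs)
early2 = early1 + 0.05 + 0.02 * rng.normal(size=n_runs)
final = np.clip(early2 + 0.2 - 0.05 * np.abs(log_lr + 2.5), 0.0, 1.0)

X = np.column_stack([log_lr, log_bs, early1, early2])
y = final

model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, y)

# Predict the final accuracy of a new, partially trained configuration
# (hypothetical numbers) without running its full training.
candidate = np.array([[-2.3, 6.0, 0.55, 0.61]])
print("predicted final accuracy:", model.predict(candidate)[0])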