Smooth and Nonsmooth Large Scale Optimization with Inexact Evaluations

Nathan Allaire

Thèse de doctorat (2025)

Document en libre accès dans PolyPublie

Affichage préliminaire

Libre accès au plein texte de ce document
Conditions d'utilisation: Tous droits réservés
Télécharger (2MB)

Afficher le résumé

Cacher le résumé

Résumé

Dans cette thèse, nous étudions deux grandes classes de problèmes d’optimisation rencontrés en apprentissage automatique et en science des données, en nous concentrant sur les situations où certaines informations essentielles du problème ne sont disponibles qu’approximativement. La première classe concerne l’optimisation à grande échelle, et plus particulièrement l’entraînement de modèles d’apprentissage profond, où la fonction objectif s’exprime comme une espérance sur un vaste ensemble de données. Notre première contribution (Chapitre 4) est d’avoir démontré la faisabilité d’entraîner des modèles de langage au moyen de méthodes d’ordre zéro, posant ainsi les bases de l’optimisation à grande échelle sans rétropropagation du gradient, et permettant une réduction significative de la consommation mémoire. L’article correspondant a été publié dans les Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods et est disponible dans [4]. Dans la continuité de ces travaux, notre seconde contribution est KronZO (Chapitre 5), une méthode d’ordre zéro dans laquelle nous apportons des améliorations méthodologiques significatives. L’article correspondant a été soumis dans la revue Springer Nature Computer Science (SNCS) et est disponible dans [6]. Nous en établissons la convergence sous des hypothèses classiques et montrons, à travers des expériences numériques, que KronZO atteint des performances de pointe parmi les méthodes d’ordre zéro, tout en réduisant la consommation de mémoire. La seconde classe de problèmes étudiés relève de l’optimisation non lisse, où la fonction objectif se compose d’un terme lisse et d’un terme de régularisation non-lisse. Une approche courante pour résoudre ce type de problème est la méthode du gradient proximal, qui repose sur le calcul d’opérateurs proximaux. Nous considérons ici le cas où la fonction objectif, son gradient et l’opérateur proximal sont évalués de manière inexacte. Nous introduisons iR2N (Chapitre 6), une méthode quasi-Newton modifiée pour les problèmes régularisés, spécifique-ment conçue pour traiter ces évaluations inexactes. L’article correspondant a été soumis dans le SIAM Journal on Scientific Computing (SISC) et est disponible dans [5]. Nous établissons la convergence de iR2N sous un ensemble d’hypothèses adaptées et développons un cadre général pour construire des opérateurs proximaux inexacts garantissant la convergence théorique de iR2N. L’efficacité de la méthode est évaluée au moyen d’expériences numériques sur divers problèmes et mettent en évidence les gains significatifs liés à l’utilisation d’informations inexactes. L’implémentation de iR2N est disponible publiquement [14], et les opérateurs proximaux inexacts utilisés dans les expériences sont détaillés dans [2, 3].

Abstract

In this thesis, we study two major classes of optimization problems that arise in machine learning and data science, focusing on situations where key problem information is only available approximately. The first class concerns large-scale optimization, and more specifically the training of deep learning models, where the objective function is expressed as an expectation over a large dataset. Our first contribution (Chapter 4) demonstrates the feasi-bility of training language models using zeroth-order methods, thereby laying the foundations of large-scale optimization without backpropagation and enabling a significant reduction in memory consumption. The corresponding paper was published in the Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods and is available in [4]. Building upon this work, our second contribution, KronZO (Chapter 5), introduces significant methodological improvements. The corresponding paper has been submitted to Springer Nature Computer Science (SNCS) and is available in [6]. We establish its convergence under standard assumptions and show through numerical experiments that KronZO achieves state-of-the-art performance among zeroth-order methods while reducing memory consumption. The second class of problems studied falls within nonsmooth optimization, where the objec-tive function consists of a smooth term and a nonsmooth regularization term. A common approach to such problems is the proximal gradient method, which relies on the computation of proximal operators. We consider here the case where the objective function, its gradi-ent, and the proximal operator are evaluated inexactly. We introduce iR2N (Chapter 6), a quasi-Newton method for regularized problems specifically designed to handle such inexact evaluations. The corresponding manuscript was submitted to the SIAM Journal on Scientific Computing (SISC) and will be available in [5]. We establish the convergence of iR2N under suitable assumptions and develop a general framework for constructing inexact prox-imal operators that preserve its theoretical convergence. The effectiveness of the method is assessed through numerical experiments on various problems, which highlight the substantial efficiency gains obtained by leveraging inexact information. The implementation of iR2N is publicly available at [14], and the inexact proximal operators we use are detailed in [2, 3].

Département:	Département de mathématiques et de génie industriel
Programme:	Doctorat en mathématiques
Directeurs ou directrices:	Sébastien Le Digabel , Dominique Orban et Vahid Partovi Nia
URL de PolyPublie:	https://publications.polymtl.ca/71183/
Université/École:	Polytechnique Montréal
Date du dépôt:	25 mars 2026 14:18
Dernière modification:	25 mars 2026 17:28

Citer en APA 7:	Allaire, N. (2025). Smooth and Nonsmooth Large Scale Optimization with Inexact Evaluations [Thèse de doctorat, Polytechnique Montréal]. PolyPublie. https://publications.polymtl.ca/71183/

Statistiques

Total des téléchargements à partir de PolyPublie

Téléchargements par année

Provenance des téléchargements

Actions réservées au personnel

Afficher document