Zeroth order optimization for pretraining language models

Nathan Allaire, Mahsa Ghazvini Nejad, Sébastien Le Digabel and Vahid Partovi Nia

Paper (2025)

Open Access document in PolyPublie and at the official publisher
Published Version
Terms of Use: Creative Commons Attribution Non-commercial No Derivatives

Abstract

The physical memory required for training Large Language Models (LLMs) grows with the model size and is limited by the available GPU memory. In particular, back-propagation, which requires the computation of first-order derivatives, adds to this memory overhead. Training extremely large language models with memory-efficient algorithms remains a challenge with theoretical and practical implications. Back-propagation-free training algorithms, also known as zeroth-order methods, have recently been examined to address this challenge. Their usefulness has been proven in the fine-tuning of language models. However, so far, there has been no study of language model pretraining using zeroth-order optimization, where the memory constraint is manifested more severely. We build the theoretical connection between second-order, first-order, and zeroth-order methods. Then, we apply zeroth-order optimization to pretraining light-weight language models and discuss why it cannot be readily applied. We show in particular that the curse of dimensionality is the main obstacle, and pave the way towards modifications of zeroth-order methods for pretraining such models.
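
For readers unfamiliar with back-propagation-free training, the sketch below illustrates the general idea of a two-point (SPSA-style) zeroth-order gradient estimate of the kind examined in the paper: the loss is evaluated at two symmetric perturbations of the parameters along a random direction, and the resulting directional estimate replaces the back-propagated gradient. This is a minimal illustrative example on a toy model, not the authors' exact algorithm; all names and hyperparameters here are assumptions for illustration.

    # Minimal sketch of a two-point zeroth-order (SPSA-style) update.
    # Illustrative only; not the authors' exact procedure.
    import torch

    torch.manual_seed(0)

    # Toy "model": a single linear layer standing in for a light-weight LM.
    model = torch.nn.Linear(16, 4)
    x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
    loss_fn = torch.nn.CrossEntropyLoss()

    def loss():
        # Only forward passes are needed; no backward graph is ever built.
        with torch.no_grad():
            return loss_fn(model(x), y).item()

    eps, lr = 1e-3, 1e-2          # perturbation scale and learning rate (assumed values)
    params = list(model.parameters())

    for step in range(100):
        # Sample one random direction z per parameter tensor.
        zs = [torch.randn_like(p) for p in params]

        # Evaluate L(theta + eps * z).
        with torch.no_grad():
            for p, z in zip(params, zs):
                p.add_(eps * z)
        loss_plus = loss()

        # Evaluate L(theta - eps * z) by moving back 2 * eps * z.
        with torch.no_grad():
            for p, z in zip(params, zs):
                p.sub_(2 * eps * z)
        loss_minus = loss()

        # Scalar directional-derivative estimate: (L+ - L-) / (2 * eps).
        g = (loss_plus - loss_minus) / (2 * eps)

        # Restore theta and take a zeroth-order SGD step along z.
        with torch.no_grad():
            for p, z in zip(params, zs):
                p.add_(eps * z)
                p.sub_(lr * g * z)

Because each step probes only a single random direction, the variance of the estimate grows with the number of parameters, which is one way the curse of dimensionality discussed in the abstract manifests when moving from fine-tuning to pretraining.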

Department: Department of Mathematics and Industrial Engineering
Research Center: GERAD - Research Group in Decision Analysis
ISBN: 9789897587306
PolyPublie URL: https://publications.polymtl.ca/64441/
Conference Title: 14th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2025)
Conference Location: Porto, Portugal
Conference Date(s): 2025-02-23 - 2025-02-25
Publisher: Scitepress
DOI: 10.5220/0013261100003905
Official URL: https://doi.org/10.5220/0013261100003905
Date Deposited: 07 Apr 2025 11:27
Last Modified: 03 Feb 2026 21:17
Cite in APA 7: Allaire, N., Ghazvini Nejad, M., Le Digabel, S., & Partovi Nia, V. (2025, February). Zeroth order optimization for pretraining language models [Paper]. 14th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2025), Porto, Portugal. https://doi.org/10.5220/0013261100003905
