
Tracing and profiling machine learning dataflow applications on GPU

Pierre Zins and Michel R. Dagenais

Article (2019)

Accepted Version
Terms of Use: All rights reserved.
Cite this document: Zins, P. & Dagenais, M. R. (2019). Tracing and profiling machine learning dataflow applications on GPU. International Journal of Parallel Programming, 47(5-6), pp. 973-1013. doi:10.1007/s10766-019-00630-5

Abstract

In this paper, we propose a profiling and tracing method for dataflow applications with GPU acceleration. Dataflow models can be represented as graphs and are widely used in many domains, such as signal processing and machine learning. Within the graph, data flows along the edges, and the nodes correspond to the computing units that process the data. To accelerate execution, co-processing units such as GPUs are often used for compute-intensive nodes. This work aims to provide useful information about the execution of the dataflow graph on the available hardware, in order to understand and possibly improve performance. The collected traces include low-level information about the CPU from the Linux kernel (system calls), as well as mid-level and high-level information about intermediate libraries such as CUDA, HIP, or HSA, and about the dataflow model itself. Post-mortem analysis and visualization steps then enrich the trace and present useful information to the user. To demonstrate the effectiveness of the method, we evaluated it on TensorFlow, a well-known machine learning library that represents algorithms as dataflow computational graphs. We present several examples of machine learning applications that can be optimized with the help of the information provided by the proposed method. For example, we reduce the execution time of a face recognition application by a factor of 5, we suggest a better placement of the computation nodes on the available hardware components for a distributed application, and we improve the memory management of an application to speed up its execution.
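To illustrate the dataflow model and the kind of per-node timing information the abstract describes, here is a minimal sketch in plain Python. This is a toy illustration of the general technique, not the authors' tooling or the TensorFlow API: a graph of named nodes (callables) connected by edges, executed on demand with a start timestamp and duration recorded for each node, analogous to a high-level trace that could be analyzed post-mortem.

```python
import time


class DataflowGraph:
    """Toy dataflow graph: nodes are callables, edges carry data between them."""

    def __init__(self):
        self.nodes = {}   # name -> (function, list of input node names)
        self.trace = []   # (node name, start timestamp, duration) records

    def add_node(self, name, fn, inputs=()):
        self.nodes[name] = (fn, list(inputs))

    def run(self, sink):
        """Execute the subgraph feeding `sink`, recording per-node timing."""
        cache = {}

        def eval_node(name):
            if name in cache:          # each node runs at most once
                return cache[name]
            fn, inputs = self.nodes[name]
            args = [eval_node(i) for i in inputs]
            start = time.perf_counter()
            out = fn(*args)
            self.trace.append((name, start, time.perf_counter() - start))
            cache[name] = out
            return out

        return eval_node(sink)


# Build a tiny graph: two constant sources feeding a multiply node.
g = DataflowGraph()
g.add_node("a", lambda: 2)
g.add_node("b", lambda: 3)
g.add_node("mul", lambda x, y: x * y, inputs=("a", "b"))

result = g.run("mul")
print(result)  # 6
for name, start, dur in g.trace:
    print(f"{name}: {dur * 1e6:.1f} us")
```

A real implementation, as described in the paper, would attach tracepoints at several levels (kernel system calls, CUDA/HIP/HSA library calls, and the framework's graph executor) rather than timing Python callables, but the resulting trace has the same shape: an ordered list of (node, timestamp, duration) events suitable for post-mortem analysis and visualization.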

Open Access document in PolyPublie
Subjects: 2700 Information Technology > 2700 Information Technology
2700 Information Technology > 2706 Software Engineering
2700 Information Technology > 2715 Optimization
2800 Artificial Intelligence > 2805 Theories of Learning and Inference
Department: Département de génie informatique et génie logiciel
Research Center: Not applicable
Funders: CRSNG/NSERC, Google, Ciena, EfficiOS, Prompt
Grant number: CRDPJ468687-14
Date Deposited: 09 Mar 2020 12:52
Last Modified: 10 Mar 2020 01:20
PolyPublie URL: https://publications.polymtl.ca/4213/
Document issued by the official publisher
Journal Title: International Journal of Parallel Programming (vol. 47, no. 5-6)
Publisher: Springer
Official URL: https://doi.org/10.1007/s10766-019-00630-5
