Wait analysis of distributed systems using kernel tracing

Francis Giraldeau and Michel Dagenais

Journal article (2016)

Open-access document in PolyPublie and on the official publisher's site
Open access to the full text of this document
Publisher's official version
Terms of use: IEEE OA Publishing Agreement

Abstract

We propose a new class of profiler for distributed and heterogeneous systems. In these systems, a task may wait for the result of another task, either locally or remotely. Such wait dependencies are invisible to instruction profilers. We propose a precise, host-based method that recursively recovers wait causes across machines, using blocking as the fundamental mechanism to detect changes in the control flow. It relies solely on operating system events, namely scheduling, interrupt and network events. It can therefore observe interactions between kernel threads and is independent of the user-space runtime. Given a task, the algorithm computes its active path from the trace; the path is then presented in an interactive viewer for inspection. We validated the method with workloads representing the major architectures and operating conditions found in distributed programs. We then used it to analyze the execution behavior of five different distributed systems. We found that the worst-case tracing overhead for a distributed application is 18 percent and that the typical average overhead is about 5 percent. The running time of the analysis implementation is linear in the trace size.
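
To make the idea of an active path concrete, the following is a minimal sketch and not the authors' implementation. It assumes the kernel trace has already been reduced to per-task timelines of running and blocked intervals, and that each blocked interval records which task woke the blocked one. The names `Interval`, `waker`, `active_path` and the task labels `client`, `server` and `disk` are hypothetical and chosen only for illustration. Whenever the task of interest is blocked, the sketch recursively substitutes the activity of the waking task over the same time window.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """One state interval of a task (hypothetical, pre-digested trace data)."""
    start: int
    end: int
    blocked: bool
    waker: Optional[str] = None  # task that woke this one up, if known


Segment = Tuple[str, int, int]


def active_path(timelines: Dict[str, List[Interval]],
                task: str, start: int, end: int) -> List[Segment]:
    """Return the chain of (task, start, end) segments explaining [start, end).

    Assumes a consistent trace: a blocked task never appears as the waker
    of another task during the same window, so the recursion terminates.
    """
    path: List[Segment] = []
    for iv in timelines[task]:
        # Clip the interval to the window being explained.
        lo, hi = max(iv.start, start), min(iv.end, end)
        if lo >= hi:
            continue
        if not iv.blocked:
            path.append((task, lo, hi))
        elif iv.waker is None:
            # Wait cause not recorded: keep the gap visible in the result.
            path.append(("unknown", lo, hi))
        else:
            # Follow the wait dependency into the task that ended the wait.
            path.extend(active_path(timelines, iv.waker, lo, hi))
    return path


# Illustrative data: a client waits on a server, which waits on a disk.
timelines = {
    "client": [Interval(0, 2, False),
               Interval(2, 8, True, "server"),
               Interval(8, 10, False)],
    "server": [Interval(0, 2, True, "client"),
               Interval(2, 4, False),
               Interval(4, 7, True, "disk"),
               Interval(7, 8, False),
               Interval(8, 10, True, None)],
    "disk": [Interval(0, 10, False)],
}

path = active_path(timelines, "client", 0, 10)
for segment in path:
    print(segment)
```

In this toy data the client's six-unit wait is attributed to two server segments and one disk segment, which is the kind of attribution an instruction profiler running on the client alone cannot provide. A real analysis would have to derive the intervals and wake-up sources from scheduling, interrupt and network events, and match network events across machines.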

Keywords

Performance measurement, operating systems, tracing, reverse engineering

Subject(s): 2700 Information technology > 2700 Information technology
2700 Information technology > 2715 Optimization
2700 Information technology > 2720 Computer systems software
Department: Department of Computer Engineering and Software Engineering
Funders: CRSNG/NSERC, EfficiOS, Ericsson Software Research
PolyPublie URL: https://publications.polymtl.ca/3078/
Journal title: IEEE Transactions on Parallel and Distributed Systems (vol. 27, no. 8)
Publisher: IEEE
DOI: 10.1109/tpds.2015.2488629
Official URL: https://doi.org/10.1109/tpds.2015.2488629
Deposit date: 04 May 2018 16:35
Last modified: 19 May 2023 13:59
Cite in APA 7: Giraldeau, F., & Dagenais, M. (2016). Wait analysis of distributed systems using kernel tracing. IEEE Transactions on Parallel and Distributed Systems, 27(8), 2450-2461. https://doi.org/10.1109/tpds.2015.2488629