
Wait analysis of distributed systems using kernel tracing

Francis Giraldeau and Michel Dagenais

Article (2016)

Open Access document in PolyPublie and at the official publisher
Open Access to the full text of this document
Published Version
Terms of Use: IEEE OA Publishing Agreement

Abstract

We propose a new class of profiler for distributed and heterogeneous systems. In these systems, a task may wait for the result of another task, either locally or remotely. Such wait dependencies are invisible to instruction profilers. We present a precise, host-based method that recursively recovers wait causes across machines, using blocking as the fundamental mechanism to detect changes in the control flow. It relies solely on operating system events, namely scheduling, interrupts and network events. It can therefore observe interactions between kernel threads and is independent of any user-space runtime. Given a task, the algorithm computes its active path from the trace, which is presented in an interactive viewer for inspection. We validated our new method with workloads representing the major architectures and operating conditions found in distributed programs. We then used our method to analyze the execution behavior of five different distributed systems. We found that the worst-case tracing overhead for a distributed application is 18 percent and that the typical average overhead is about 5 percent. The analysis implementation runs in time linear in the trace size.
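The abstract describes the analysis only at a high level. As an illustration of the general idea (replacing a task's blocked intervals with the activity of whatever woke it, recursively and across hosts), here is a minimal Python sketch. It is not the authors' algorithm. The `Interval` record, the `active_path` function, the task names, and the assumption that each blocked interval already carries the identity of its waker (for a network wakeup, the remote sending task, found by matching send and receive events) are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    """One state interval of a task, as it might be rebuilt from scheduling events (hypothetical)."""
    start: int
    end: int
    state: str                    # "running" or "blocked"
    waker: Optional[str] = None   # assumption: task whose event ended the block, if known
    reason: str = ""              # e.g. "disk" or "network" when blocked


Trace = Dict[str, List[Interval]]       # task name -> time-ordered intervals
Segment = Tuple[str, int, int, str]     # (task, start, end, label)


def active_path(trace: Trace, task: str, start: int, end: int,
                _stack: Tuple[str, ...] = ()) -> List[Segment]:
    """Sketch: explain what `task` was waiting for during [start, end]."""
    path: List[Segment] = []
    stack = _stack + (task,)
    for iv in trace.get(task, []):
        # Clip the interval to the window being explained.
        s, e = max(iv.start, start), min(iv.end, end)
        if s >= e:
            continue
        if iv.state == "running":
            path.append((task, s, e, "running"))
        elif iv.waker is not None and iv.waker not in stack:
            # Blocked: substitute what the waker was doing meanwhile (recursive step).
            path.extend(active_path(trace, iv.waker, s, e, stack))
        else:
            # No waker task, or the waker is already being explained: keep the wait.
            path.append((task, s, e, "blocked:" + (iv.reason or "unknown")))
    return path


# Hypothetical example: a client on host A waits for a server on host B.
trace: Trace = {
    "client@A": [
        Interval(0, 2, "running"),
        Interval(2, 10, "blocked", waker="server@B", reason="network"),
        Interval(10, 12, "running"),
    ],
    "server@B": [
        Interval(0, 3, "blocked", waker="client@A", reason="network"),
        Interval(3, 5, "running"),
        Interval(5, 8, "blocked", reason="disk"),
        Interval(8, 10, "running"),
        Interval(10, 12, "blocked", reason="network"),
    ],
}

path = active_path(trace, "client@A", 0, 12)
for segment in path:
    print(segment)
```

In this toy trace the client's wait from 2 to 10 is attributed to the server: request transit, server computation, a disk wait, then more computation. A real analysis would also have to deal with interrupt contexts, incomplete traces and clock differences between hosts, none of which this sketch models.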

Uncontrolled Keywords

Performance measurement, operating systems, tracing, reverse engineering

Subjects: 2700 Information technology > 2700 Information technology
2700 Information technology > 2715 Optimization
2700 Information technology > 2720 Computer systems software
Department: Department of Computer Engineering and Software Engineering
Funders: CRSNG/NSERC, EfficiOS, Ericsson Software Research
PolyPublie URL: https://publications.polymtl.ca/3078/
Journal Title: IEEE Transactions on Parallel and Distributed Systems (vol. 27, no. 8)
Publisher: IEEE
DOI: 10.1109/tpds.2015.2488629
Official URL: https://doi.org/10.1109/tpds.2015.2488629
Date Deposited: 04 May 2018 16:35
Last Modified: 26 Sep 2024 17:54
Cite in APA 7: Giraldeau, F., & Dagenais, M. (2016). Wait analysis of distributed systems using kernel tracing. IEEE Transactions on Parallel and Distributed Systems, 27(8), 2450-2461. https://doi.org/10.1109/tpds.2015.2488629
