Title

Profiling Resource Utilization of Bioinformatics Workflows

Publication Date

5-23-2020

Document Type

Article

Abstract

We present a software tool, the Container Profiler, that measures and records the resource usage of any containerized task. The tool profiles the CPU, memory, disk, and network utilization of a containerized job by collecting Linux operating system metrics at the virtual machine, container, and process levels. It can produce utilization snapshots at multiple time points, allowing continuous monitoring of the resources consumed by a containerized workflow. To investigate the utility of the Container Profiler, we profiled the resource utilization of a multi-stage bioinformatics analytical workflow (RNA sequencing using unique molecular identifiers). We examined the collected metrics and confirmed that they were consistent with the expected CPU, disk, and network utilization patterns for the different stages of the workflow. We also quantified the profiling overhead and found it to be negligible. The Container Profiler can be used to continuously monitor the resource consumption of long and complex containerized workflows that run locally or in the cloud. Such monitoring can identify bottlenecks where more resources are needed to improve performance.
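
As a rough illustration of the kind of Linux operating system metrics the abstract refers to, the sketch below parses the text of /proc/stat and /proc/meminfo and compares two CPU snapshots. It is not the Container Profiler's implementation. The function names are hypothetical, and the choice of these two files is an assumption about what virtual-machine-level collection might read. Container-level and process-level sources are not shown.

```python
from typing import Dict

# Hypothetical helpers for illustration only; not part of the Container Profiler.
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_proc_stat_cpu(text: str) -> Dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat text into named counters."""
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = map(int, parts[1:1 + len(CPU_FIELDS)])
            return dict(zip(CPU_FIELDS, values))
    raise ValueError("no aggregate cpu line found")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into field name -> value (kB for most fields)."""
    result: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        tokens = rest.split()
        if sep and tokens:
            result[name.strip()] = int(tokens[0])
    return result


def cpu_busy_fraction(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Fraction of CPU time spent non-idle between two snapshots."""
    total = sum(after.values()) - sum(before.values())
    idle = (after["idle"] + after["iowait"]) - (before["idle"] + before["iowait"])
    return 0.0 if total <= 0 else (total - idle) / total
```

On a Linux host, the text could be obtained with open("/proc/stat").read() at two time points. Taking the difference between snapshots, rather than reading a single one, is what turns the cumulative counters into a utilization figure for an interval.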

Publication Title

arXiv

Publisher Policy

No SHERPA/RoMEO policy available

Open Access Status

OA Disciplinary Repository
