Profiling Resource Utilization of Bioinformatics Workflows
We present the Container Profiler, a software tool that measures and records the resource usage of any containerized task. The tool profiles the CPU, memory, disk, and network utilization of a containerized job by collecting Linux operating system metrics at the virtual machine, container, and process levels. It can produce utilization snapshots at multiple time points, allowing continuous monitoring of the resources consumed by a containerized workflow. To investigate its utility, we profiled the resource utilization of a multi-stage bioinformatics analytical workflow (RNA sequencing using unique molecular identifiers). We examined the collected profile metrics and confirmed that they were consistent with the expected CPU, disk, and network utilization patterns for the different stages of the workflow. We also quantified the profiling overhead and found it to be negligible. The Container Profiler can continuously monitor the resource consumption of long and complex containerized workflows that run locally or in the cloud, which helps identify bottlenecks where more resources are needed to improve performance.
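As a rough illustration of snapshot-based profiling from Linux operating system metrics, the sketch below parses the aggregate `cpu` line of `/proc/stat` and estimates CPU utilization between two snapshots. It is not the Container Profiler's implementation: the names `CpuSnapshot`, `parse_cpu_line`, and `cpu_utilization` are hypothetical, only VM-level CPU counters are covered, and the irq, softirq, and steal fields are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuSnapshot:
    """Cumulative CPU time counters (clock ticks) from the aggregate 'cpu' line.

    Hypothetical helper type for illustration only.
    """
    user: int
    nice: int
    system: int
    idle: int
    iowait: int

    @property
    def busy(self) -> int:
        # Simplification: irq, softirq, and steal time are not counted.
        return self.user + self.nice + self.system

    @property
    def total(self) -> int:
        return self.busy + self.idle + self.iowait


def parse_cpu_line(proc_stat_text: str) -> CpuSnapshot:
    """Extract the first five counters of the aggregate 'cpu' line."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        # Match exactly "cpu" so per-core lines such as "cpu0" are skipped.
        if fields and fields[0] == "cpu":
            user, nice, system, idle, iowait = (int(v) for v in fields[1:6])
            return CpuSnapshot(user, nice, system, idle, iowait)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(before: CpuSnapshot, after: CpuSnapshot) -> float:
    """Fraction of elapsed CPU time spent busy between two snapshots."""
    total_delta = after.total - before.total
    if total_delta <= 0:
        return 0.0
    return (after.busy - before.busy) / total_delta
```

On a Linux host, the two snapshots could be taken by reading `/proc/stat` before and after a workflow stage and passing the file contents to `parse_cpu_line`. Because the counters are cumulative, the difference between snapshots attributes utilization to the interval between them.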
Open Access Status: OA Disciplinary Repository
Deng, H., Hung, L.-H., Schooley, R., Perez, D., Arumilli, N., Yeung, K. Y., & Lloyd, W. (2020). Profiling Resource Utilization of Bioinformatics Workflows. ArXiv:2005.11491 [Cs]. http://arxiv.org/abs/2005.11491