Per job isolation of temporary file storage in High Performance Computing (HPC) environments provide benefits in security, efficiency, and administration. HPC system administrators can use the mount_isolation Slurm task plugin to improve security by isolating temporary files where no isolation previously existed. The mount_isolation plugin also increases efficiency by removing obsolete temporary files immediately after each job terminates. This frees valuable disk space in the HPC environment to be used by other jobs. These two improvements reduce the amount of work system administrators must expend to ensure temporary files are removed in a timely manner.Previous temporary file removal solutions were removal on reboot, manual removal, or removal through a Slurm epilog script. The epilog script was the most effective of these, allowing files to be removed in a timely manner. However, HPC users can have multiple supercomputing jobs running concurrently. Temporary files generated by these concurrent or overlapping jobs are only deleted by the epilog script when all jobs run by that user on the compute node have completed. Even though the user may have only one running job, the temporary directory may still contain temporary files from many previously executed jobs, taking up valuable temporary storage on the compute node. The mount_isolation plugin isolates these temporary files on a per job basis allowing prompt removal of obsolete files regardless of job overlap.
College and Department
Ira A. Fulton College of Engineering and Technology; Technology
BYU ScholarsArchive Citation
Satchwell, Steven Tanner, "Isolation of Temporary Storage in High Performance Computing via Linux Namespacing" (2018). Theses and Dissertations. 6873.
bind mounts, mount namespaces, Slurm, supercomputing, HPC, temporary storage