Change to Multi-user Shared Node Access

February 4, 2019 8:00am - 5:00pm EST
Announcements
Brown, Halstead, Rice, Snyder

For some time, Research Computing has been assessing how to approach the Meltdown and Spectre security vulnerabilities discovered in Intel processors. Unfortunately, applying the existing patches to all cluster nodes could significantly reduce overall performance on our clusters. Since high performance is paramount for most of our cluster users, this is of great concern to us. However, doing nothing is not an option either.

The vulnerabilities are only an issue when multiple people are actively sharing a given node, so the best way for us to preserve the performance you have come to rely on is to remove the option of sharing a single node among multiple users concurrently. Sharing is the default only on a couple of queues, by request, but is currently available by specifying "naccesspolicy=shared" in PBS. On Monday, February 4th, 2019, we will put code in place that changes any job requesting this policy to "naccesspolicy=singleuser" instead. This will still allow multiple jobs from the same person to share a node, but will not allow jobs from different people to share a single node. No change to your jobs is needed, although we encourage you to request singleuser rather than shared to avoid a warning message about this change.
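As a rough illustration, a job script that requests the singleuser policy explicitly might look like the sketch below. Only the "naccesspolicy=singleuser" setting comes from this announcement; the node count, core count, walltime, and the command in the script body are placeholder assumptions that you would replace with your own values.

```shell
#!/bin/bash
# Sketch of a PBS job script requesting the singleuser node access policy.
# nodes=1:ppn=1 and the walltime are placeholder values, not recommendations.
#PBS -l nodes=1:ppn=1,naccesspolicy=singleuser
#PBS -l walltime=00:10:00

# Start in the directory the job was submitted from (falls back to the
# current directory when run outside PBS).
cd "${PBS_O_WORKDIR:-.}"

# Placeholder workload: report which node the job landed on.
echo "Running on $(hostname)"
```

The same resource string can also be given on the command line when submitting, for example with qsub -l nodes=1:ppn=1,naccesspolicy=singleuser followed by the name of your script.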

Packing multiple single-core jobs on a single node is still possible, and you will still be able to utilize the full number of cores in your cluster queue. The Moab scheduler has always preferred to spread jobs from different users across different physical nodes; this node access policy change will simply enforce that preference more strongly.

Front-end submission hosts, which are always shared, have already been patched to avoid the vulnerability; the performance loss there should not be as damaging. Some clusters with specific use cases, such as Scholar, which is reserved for instructional classroom use, will instead be patched, as raw CPU performance on those nodes is not as critical. Likewise, Gilbreth, which is designed for GPU use, is patched to avoid this issue, since GPU performance will not be affected.

While we're confident that the impact of this change should be minimal in practice, please let us know if you have use cases or access patterns that you're concerned about. If you would like to talk to someone on our team about this change, please email us at rcac-help@purdue.edu or stop by one of our open Coffee Hour Consultations. Thank you!

Originally posted: January 7, 2019 11:37am EST

