Bell

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with its scheduler database around 11:35am EST. The problem manifests as freezing and/or "socket timed out" and "Unable to contact slurm controller" error messages upon the usual Slurm comman...

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with its scratch filesystem around 6:30pm EST. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will prov...

  • Research Computing Holiday Break

    Research Computing personnel will observe the university winter break from 5:00pm EST on Wednesday, December 22nd, 2021, and will resume normal business hours on Monday, January 3rd, 2022. During this time, Research Computing services will conti...

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with high load and sluggish performance on the scratch filesystem around 1:20pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this...

  • Unscheduled multiple clusters and Data Depot outage

    The Bell, Brown, Gilbreth, Halstead, Hammer, Scholar, and Workbench clusters and Data Depot began experiencing issues with intermittent high load on the Data Depot servers around 4:30pm EDT. Engineers are currently diagnosing the issue and are working to...

  • Unscheduled multiple clusters and Data Depot outage

    The Bell, Brown, Gilbreth, Halstead, Hammer, Scholar, and Workbench clusters and Data Depot servers began experiencing issues with Data Depot mounting on Wednesday, September 29th, 2021 around 4:40pm EDT. Engineers are currently diagnosing the issue and...

  • Unscheduled Bell, Brown, Gilbreth, Halstead, Hammer, Scholar, and Data Depot outage

    The Bell, Brown, Gilbreth, Halstead, Hammer, and Scholar clusters and Data Depot began experiencing issues with Data Depot mounting around 7:00am EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been...

  • Unscheduled Data Depot and community clusters outage

    At about 9:30am EDT, Data Depot servers started experiencing a ramping high load. Coupled with an ongoing scaling issue with the metadata subsystem, this caused Data Depot to become increasingly unresponsive for both community clusters and network d...

  • Unscheduled Data Depot outage on multiple clusters

    The Bell, Brown, Gilbreth, Halstead, Scholar, and Workbench clusters began experiencing issues with mounting the old Data Depot filesystem around 12:30am EDT. Multiple nodes are flagged offline by an automatic check, and bioinformatics application suite...

  • RCAC Whole-Floor Downtime and Power Work

    The majority of the Research Computing computational resources will be unavailable from July 30, 2021 at 7:00am to August 1, 2021 at 12:00pm EDT for a whole-floor downtime due to electrical power work in the MATH and POD data centers. Along with a required preven...

  • Scheduling Paused on Multiple Clusters

    At about 4:00pm today (Wednesday, July 21, 2021), system engineers found an issue with the schedulers on the Bell, Brown, Gilbreth, Halstead, and Scholar clusters. Job scheduling has been paused while this is being investigated. Symptoms of this pro...

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with its home and scratch directories filesystem around 12:40pm EDT. Problems manifest as hanging new logins and unresponsive established sessions. Engineers are currently diagnosing the issue and are workin...

  • Intermittent Access Failures on Data Depot

    As of Thursday, June 17th, 2021 at 11:00am EDT, users of community clusters may experience intermittent "permission denied" errors while trying to access their files on Data Depot. Errors may come and go, and may appear on both login and c...

  • Whole-Floor Cluster Maintenance

    The majority of Research Computing computational resources (Bell, Brown, Gilbreth, Halstead, Hammer, Scholar, WCERES, Workbench, and WSC Hadoop clusters) will be unavailable Tuesday, May 11, 2021 at 5:00pm EDT for Data Depot migration work. The clust...

  • Unscheduled outage on multiple clusters

    Due to problems with the cooling system in the MATH datacenter, the CMS, Bell, Brown, Gilbreth, Halstead, WCERES, and WSC Hadoop clusters began experiencing issues around 4:00pm EDT. Multiple front-end, compute, and storage services are affected. Engineer...

  • ANSYS Fluent software unavailable on Bell

    We have received multiple reports that ANSYS Fluent software is unavailable on the Bell cluster. We are currently diagnosing the issue and are working to identify a fix. We will provide an update by 6pm tonight.

  • Bell Cluster Maintenance

    The Bell cluster will be unavailable Tuesday, February 23, 2021 at 8:00am EST for scheduled maintenance. The cluster will return to full production by %enddatetime%. During this time, Bell will have a maintenance upgrade performed on the software com...

  • Unscheduled Data Depot outage

    The Data Depot storage server began experiencing issues around 3:00pm EST on Thursday, February 4th, 2021. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused on all clusters while this issue...

  • Unscheduled Data Depot outage

    The Bell, Brown, Gilbreth, Halstead, Rice, Scholar, and Snyder clusters began experiencing issues with their Data Depot mounts around 10:00pm EST. Engineers are currently diagnosing the issue and are working to identify a fix. To avoid job losses for...

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with its scratch filesystem around 4:00pm EST. Engineers are currently diagnosing the issue and have opened a ticket with the vendor to identify a fix. Job scheduling has been paused while this issue is bein...