Scheduled maintenance

18/08, 2017

On Friday, 25 August, maintenance will be carried out on the electric lines in the server room, so Hexagon must be switched off. All related file systems (/work, /work-common) will also be unavailable.

The maintenance will start at 07:00 and is planned to last until 13:00.

During this time /work-common will not be available on Grunch.

Update:
  • 25.08.2017 07:00: Maintenance has started.
  • 25.08.2017 12:50: Storage controller issues are delaying the startup of the machine. We are working on a fix.
  • 25.08.2017 15:05: The storage controller issues have been resolved. Some disks of the /work-common filesystem are rebuilding, so reduced performance can be expected for a couple of days.
  • 25.08.2017 15:20: Hexagon is up again.

26/09, 2016

The /work and /work-common filesystems will be unavailable on Grunch on 18 October, starting from 09:00. This downtime is part of the scheduled maintenance announced at
http://syslog.hpc.uib.no/2016/09/21/hexagon-planned-maintenance-18-10-19-10/.

The downtime is expected to last up to 8 hours for /work-common and up to 2 days for /work.

Please make sure that no jobs are using /work or /work-common by that time, to avoid data loss or data corruption.

We will keep you updated here.

Update: 2016-10-19 11:07 /work-common is back online and remounted on Grunch.

21/09, 2016

We will have a two-day planned maintenance on Hexagon, starting on 18 October at 09:00.

During the maintenance we will carry out a filesystem upgrade and firmware upgrades, as well as service the hardware.

The job submission system has a reservation in place, so jobs that cannot finish before the maintenance starts will not be started.
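
The effect of such a reservation can be sketched as a simple time comparison. This is only an illustration under stated assumptions, not the scheduler's actual logic: the function name `fits_before_maintenance` and the example start time are hypothetical, and only the maintenance start (18 October 2016, 09:00) is taken from this announcement.

```python
from datetime import datetime, timedelta

# Maintenance start as announced above: 18 October 2016, 09:00.
MAINTENANCE_START = datetime(2016, 10, 18, 9, 0)


def fits_before_maintenance(start_time, walltime, maintenance_start=MAINTENANCE_START):
    """Return True if a job starting at start_time with the requested
    walltime would end no later than the maintenance start."""
    return start_time + walltime <= maintenance_start


# Hypothetical example: a job that could start on 17 October at 21:00,
# i.e. 12 hours before the maintenance.
candidate_start = datetime(2016, 10, 17, 21, 0)
print(fits_before_maintenance(candidate_start, timedelta(hours=10)))  # True
print(fits_before_maintenance(candidate_start, timedelta(hours=24)))  # False
```

Under this assumption, a job that requests a shorter walltime may still be able to start before the maintenance, while a job with a longer request stays queued until the system is back.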


Update: 2016-10-18 09:35 Maintenance has started, slightly delayed due to a traffic jam.

Update: 2016-10-19 12:00 Maintenance has finished. Hexagon is up and accessible again.

12/02, 2016

There is a scheduled maintenance on the UPS and UPS power lines in the HPC server room on Saturday, 20 February. All HPC resources will be stopped at 8:30. We expect the maintenance to finish before 17:00 the same day.

Hexagon, Fimm, Grunch and other resources connected to them will be unavailable. The Hexagon queuing system has a reservation in place, so jobs that cannot finish before the maintenance will not be started.

Update:
2016-02-20 07:45 System maintenance has started.
2016-02-20 16:30 The /work-common filesystem storage was damaged; recovery is in progress.
2016-02-21 07:30 System maintenance has finished. The HPC systems are functional again.

20/10, 2015

During the maintenance we have:

  • applied various firmware updates and patches
  • installed newer libraries, compilers and tools

Please note that all libraries compiled with the previous version of PGI will have to be recompiled.

Below you will find the complete list of the newly installed software:

  • CCE 8.4.0
  • Chapel 1.12.0
  • Craype 2.4.2
  • GCC 5.1.0
  • FFTW 3.3.4.5
  • HDF5 1.8.14
  • PGI 15.3.0
  • PerfTools 6.3.0
  • MPI 7.2.5
  • NetCDF 4.3.3.1
  • Totalview 8.15.7

02/10, 2015

There will be a maintenance on Hexagon on 20 October from 9:00. We are planning to finish by the end of the same day.
The queue system has a reservation in place. It will not start jobs that cannot finish before the maintenance starts.
During this maintenance slot we will:
  • Apply Cray SW patches to improve stability, especially of the /work filesystem.
  • Add a qsub filter. It will replace the email notifications sent when a job cannot start or has suboptimal parameters, and will instead print output to the terminal when the job is submitted.
P.S. During the maintenance /work-common will not be available on Grunch and Fimm.


Update: 2015-10-20 09:00 - Scheduled maintenance has started.

Update: 2015-10-20 17:57 - Maintenance has finished. Please see the changes at
http://syslog.hpc.uib.no/2015/10/20/hexagon-updated-software/

26/05, 2015

There will be a scheduled maintenance on Hexagon on 16 June, starting from 9:00. We expect to finish in the evening of the same day.

During this maintenance slot we are going to upgrade the queue system and perform some extra tasks, including replacing an IO card on the metadata server.

Access to the machine will be closed and all running jobs will be terminated during this maintenance window. The queuing system has a reservation in place, so jobs that cannot finish before the maintenance will not start. We expect that idle jobs in the scheduler will not be affected.

Update: 2015-06-16 09:15 - Scheduled maintenance has started.

Update: 2015-06-16 23:48 - Maintenance has finished. We had to clear all jobs from the queue system, including idle and blocked ones. Please resubmit your jobs.