SAN problems - update #3
We have resolved the problem with the SAN node. However, services will remain unavailable for the next two to three hours while we perform an emergency upgrade on the SAN.
Details
The logs for the failed node show no hardware issues. This is the same node that caused us to abort the scheduled SAN maintenance two weeks ago. That maintenance window was intended to let us install critical updates recommended by Lefthand. We aborted it because this node did not correctly release control of its HA IP address before the upgrade started. We believe tonight's problem may be related. After careful analysis of the log files by both us and a senior Lefthand engineer, we rebooted the node, which brought all volumes back online without further issue.
Because of the problem with the HA IP address not being released, we could not risk rebooting the node without first shutting down all servers using the SAN. Since services are therefore already inaccessible, and because this is our quietest time of the day, Lefthand strongly encouraged us to perform the upgrade tonight to rule out any further problems possibly related to older software versions. We expect to finish at around 3am BST and will post a further update then.
Thank you for your patience. We are deeply sorry for this unexpected downtime, but we hope you can see that we are taking the best and most appropriate actions given the circumstances.