UPDATE - 7/5/2015
This is an update on the continuing cloud disruptions that Melbourne IT customers are facing. Our internal technical teams have worked in collaboration with the vendor, and we have identified the root cause of the issue.
Frustratingly, although the 'why' has been identified, we can't at this point determine the trigger point, which is the 'how'.
Now that we have confirmed the root cause, we are altering the architecture of our cluster to accommodate the limitation. This work will progress over the next 72 hours and offers a medium-term solution for platform stability while we evaluate a long-term resolution.
Tomorrow we will be testing the medium-term solution. At this stage, our focus is on building and testing it to the required standard before committing it to production.
Again, we know this has been a frustrating experience, and we appreciate the impact it has had on you and your business. It has been trying at this end as well, and we are optimistic that the current re-engineering will give us breathing space to work towards a long-term solution.
The Melbourne IT cloud hosting environment is currently undergoing severe disruptions. End users will experience this as hosted websites being 'down'.
We understand the fundamental importance of having your sites available online. We are feeling the pain ourselves because we host our core business sites on the same platforms that we offer to our customers and the vast majority of our business is done online. We get it and we are very sorry. We have committed all available resources to identifying and resolving the root cause of the issue.
For those wanting a more technical explanation, please read on. Our cloud hosting platform is 'clustered', which means that many servers share responsibility for serving webpages. Traffic to those webservers is managed by load balancers that run in active/passive mode: there is a primary load balancer, with a redundant one available to fail over to.
There is currently an issue that causes our primary load balancer to fail. The nature of this failure then forces the secondary to restart rather than come online.
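For readers who find it easier to follow in code, the difference between a normal failover and the failure described above can be sketched roughly as follows. This is a simplified illustration only and does not represent the vendor's software or our actual configuration. All names here (`LoadBalancer`, `State`, `fail_over`, `secondary_restarts`) are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    ACTIVE = "active"          # currently handling traffic
    STANDBY = "standby"        # idle, ready to take over
    FAILED = "failed"          # no longer handling traffic
    RESTARTING = "restarting"  # rebooting, cannot take traffic yet


@dataclass
class LoadBalancer:
    name: str
    state: State


def fail_over(primary, secondary, secondary_restarts=False):
    """Simulate the primary failing; return True if traffic is still served.

    secondary_restarts=True models the fault described above, where the
    standby restarts instead of coming online. This flag is an assumption
    made for illustration, not a real product setting.
    """
    primary.state = State.FAILED
    if secondary_restarts:
        # Faulty case: the standby is forced to restart, so nothing is active.
        secondary.state = State.RESTARTING
    else:
        # Expected case: the standby is promoted and takes over the traffic.
        secondary.state = State.ACTIVE
    return secondary.state is State.ACTIVE


def demo():
    """Run both scenarios and return (expected_case, faulty_case)."""
    expected = fail_over(
        LoadBalancer("primary", State.ACTIVE),
        LoadBalancer("secondary", State.STANDBY),
    )
    faulty = fail_over(
        LoadBalancer("primary", State.ACTIVE),
        LoadBalancer("secondary", State.STANDBY),
        secondary_restarts=True,
    )
    return expected, faulty
```

In the expected case the secondary becomes active and sites stay up. In the faulty case neither load balancer is active while the secondary restarts, which is when hosted sites appear 'down'.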
We are actively working with the vendor to resolve the issue, as it is something neither we nor the vendor have seen in a production environment previously. A hotfix supplied last night was applied but failed immediately, and it was rolled back just after midnight.
We are presently working internally on a short-term fix that may mitigate the immediate issues, or at least limit the impact on our customer base. We expect a new hotfix to be tested and supplied today, which we will install overnight to try to fix the root cause of this problem.
We know we've let you down. We unreservedly apologize and are doing all that we can to fix it as soon as possible.
If you have questions or comments, please get in touch. We are listening at email@example.com.
Chief Customer Officer
MelbourneIT Group Ltd.