Maintenance | Phil (aka MP3Monster)'s Blog

Tags

Cloud, HA, Maintenance, Oracle, PaaS, patching, SOA, SOACS

When you’re using SOA Suite to run round-the-clock services, you need to give a fair bit of thought to your deployment configuration so that rolling patches and other maintenance tasks can be performed, not only on SOA itself but all the way down to the hardware, and at the lower levels you have no control over the maintenance process. Although it is very easy to think that the moment you’re using PaaS these problems are taken care of for you, life isn’t as simple as that.

Oracle cloud services typically go through a patching process once a month, usually within a defined 8 hour period on a Friday night. During this period you may lose the use of your servers as the maintenance is performed within a particular availability zone. In an ideal world this would be a rolling process so you don’t lose everything at once. If the maintenance window is used to deploy SOA Suite patches then, although you will be told of the maintenance window, you won’t actually have an outage; after the maintenance window your cloud dashboard will offer the option to apply the patches at a time that best suits you. Not only that, the patch application process is smart enough to apply them in a rolling manner, as the WebLogic nodes in the cluster hold information on each other which the patch mechanism can utilise.

So where is the problem? It is very easy to forget that the PaaS platform is virtual. The virtualization platform, being software, will inevitably need patching, whether that is for bug fixing, addressing security requirements or adding new capabilities. These kinds of changes today will trigger a service shutdown. Let’s be honest, trying to balance a rolling change against maximising PaaS client density is going to create a monumentally complex problem, so simplicity and speed of roll-out suggest a small outage is easier. So how do I assure that I can maintain a quality of service if I accept this as a necessity?

Well, the answer is pretty much the same as for an on-premise reference architecture. Have SOA with its supporting databases running in a second availability zone that has a different patch time. This is going to push up the cost as you’ll need a database with Data Guard. Assuming an active-passive model across your centres, as you approach the maintenance window you’ll get your load balancer to route the workload to the second location and let the existing workload run dry on the servers due to go through the maintenance process. Then after the maintenance window you’ll reverse the process.
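To make the order of events concrete, here is a minimal Python sketch of the switch-over described above. It is a toy model only: `Site`, `LoadBalancer`, `swap` and `drain` are names I have made up for illustration and are not part of any Oracle or WebLogic API, and a real load balancer would drain connections through its own tooling.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One availability zone running SOA plus its database (hypothetical model)."""
    name: str
    in_flight: int = 0  # requests accepted but not yet completed

    def complete_one(self) -> None:
        """Finish one outstanding request, if there is one."""
        if self.in_flight > 0:
            self.in_flight -= 1


class LoadBalancer:
    """Toy active-passive balancer: all new work goes to exactly one site."""

    def __init__(self, active: Site, passive: Site) -> None:
        self.active = active
        self.passive = passive

    def route(self) -> Site:
        """Accept one new request and hand it to the active site."""
        self.active.in_flight += 1
        return self.active

    def swap(self) -> Site:
        """Send new work to the other site; return the site left to drain."""
        draining = self.active
        self.active, self.passive = self.passive, self.active
        return draining


def drain(site: Site) -> int:
    """Let the existing workload run dry; return how many requests finished."""
    finished = 0
    while site.in_flight > 0:
        site.complete_one()
        finished += 1
    return finished


primary = Site("zone-a")
secondary = Site("zone-b")
lb = LoadBalancer(active=primary, passive=secondary)

# Normal running: three requests are in progress on the primary.
for _ in range(3):
    lb.route()

# Approaching the maintenance window: redirect new work, then drain.
to_patch = lb.swap()
drained = drain(to_patch)
during_window = lb.route()  # new work now lands on the second location

# After the maintenance window: reverse the process.
lb.swap()
after_window = lb.route()
```

The point of the ordering is that new work is redirected before the old site is drained, so nothing new arrives on the servers that are about to be patched.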

The current gotcha with this is that you pay for SOA by the month, so you in effect have to run two clusters, although hourly and daily models are coming. With the hourly model you can have the second availability zone ready for use by keeping the DB alive there, but only start up the SOA instances on the hourly rate when you know the maintenance window is going to occur and it is clear there will be an infrastructure impact.

The other sticking point is that, as the period allocated is presently up to eight hours, your second centre needs to be running in a timezone with at least 8 hours difference (allowing time to fail back). This would mean that if you are using the Amsterdam or Slough locations, your second location is going to be West coast US, or Asia Pacific once it goes live later this year, or Japan. All of these will present serious issues regarding personal data.
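The timezone arithmetic can be checked with a short Python sketch. The assumptions here are mine: that each location's window opens at the same local time (22:00 on a Friday is a made-up value), that it runs the full eight hours, and that the UTC offsets are the standard-time ones (Amsterdam +1, Slough 0, US West coast -8, Japan +9).

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=8)  # "up to eight hours", taken as the worst case


def window_utc(utc_offset_hours: int, local_start: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) of a site's maintenance window in UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = local_start.replace(tzinfo=tz).astimezone(timezone.utc)
    return start, start + WINDOW


def windows_overlap(offset_a: int, offset_b: int, local_start: datetime) -> bool:
    """True if both sites could be under maintenance at the same instant."""
    a_start, a_end = window_utc(offset_a, local_start)
    b_start, b_end = window_utc(offset_b, local_start)
    return a_start < b_end and b_start < a_end


# Hypothetical: the window opens at 22:00 local time on a Friday.
friday_night = datetime(2016, 1, 8, 22, 0)

amsterdam_vs_slough = windows_overlap(1, 0, friday_night)
amsterdam_vs_us_west = windows_overlap(1, -8, friday_night)
amsterdam_vs_japan = windows_overlap(1, 9, friday_night)
```

Under these assumptions Amsterdam and Slough overlap, while Amsterdam paired with US West coast or Japan does not. With an offset of exactly eight hours the two windows run back to back, which leaves no slack for failing back, so in practice you would want a little more than the bare minimum.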

I have been told that some significant customers have accepted the situation on the basis that the downtime in reality isn’t frequent and correlates to low business periods. But I suspect competition and customer demand will force this to change.