Oracle High Availability on Azure – What & Why

Tags

Azure, Cloud, dataguard, Microsoft, Oracle, rac

Many organisations come to the cloud with a ‘not my computer’ mindset. This happens for a number of reasons, including considerations such as:

OPEX (operational spend) over CAPEX (capital spend): converting significant upfront expenditure into smaller, regular payments. Some years ago, once you got into the server space, this might have been achieved through lease agreements.
Flexibility in sizing (although many forget that this flexibility does come at a premium).
Ability to host the kit: many organisations won’t have the physical infrastructure needed to house servers to a standard that offers the desired levels of security and assurance for always-on capabilities.

But cloud, by which I mean IaaS (Infrastructure as a Service), does not really equate to someone housing my computer, or even to something as simple as virtualising it. This is down to several factors:

The really big cloud providers, such as Amazon with AWS, Microsoft with Azure, Google and Dropbox, are not using run-of-the-mill servers; they build their own so they can optimise the design for the best VM-to-server densities.
The ability to make hardware very cost effective; for example, Google is well known for buying commodity storage and using data distribution techniques to provide performance and failure resilience.

So how does this relate to Oracle and High Availability? When you want the data tier of an Oracle solution to be both highly available and able to scale out, you end up using Real Application Clusters (RAC) at the database. Simply providing VM resilience will not give sufficient availability for always-on conditions: you need the software tier to continuously pick up demand, while the virtualisation tier looks after the availability of the servers that do so. That way, if a node fails, at least one node remains in service whilst the virtualisation tier launches another instance.

The problems start because RAC has some platform requirements (shared disk, either virtual or physical) that are typically met on-premises with hardware such as a SAN, but that not every cloud (IaaS) provider can offer. Microsoft Azure has exactly this issue, meaning it presently can’t run RAC (see here). Amazon doesn’t have this issue (details here), and it is obviously not a problem for Oracle Cloud (see here).

The second consideration, which tends to get overlooked, is data-centre-level DR. It is very easy to forget that, however good a data centre’s precautions and redundancy are, some events can bring a whole centre down. Even the most sophisticated monitoring and live VM movement can’t protect against data-centre-level problems. There are well-published examples of such issues; the best known are those Amazon has had (probably because they hit so many customers; Amazon’s own analysis of one event is here). So if you want a truly resilient, always-on solution, you need Data Guard replicating to another data centre where possible. You can of course also use Data Guard within a data centre to compensate for not having RAC, but it does mean scaling is limited to what you can do vertically (i.e. more CPU cores, more memory or more disk). It will also place different demands on the design of your application tiers.
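To make the data-centre argument a little more concrete, here is a rough availability sketch. It is a simplification under stated assumptions: failures are treated as independent, the availability figures (NODE, DC) are made up purely for illustration rather than measured, and the second site is assumed to take over instantly, ignoring failover time and Data Guard replication lag. The helper names are hypothetical. What it shows is that adding nodes inside one data centre can never lift availability above that data centre’s own availability, whereas a second site can.

```python
def parallel(availabilities):
    """Availability of a group that is up if at least one member is up.
    Assumes members fail independently of each other."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


def site(dc_availability, node_availability, nodes):
    """A site is up only if its data centre is up AND at least one node is up."""
    return dc_availability * parallel([node_availability] * nodes)


# Hypothetical figures, chosen for illustration only (not measured values).
NODE = 0.99   # availability of a single database node/VM
DC = 0.999    # availability of a single data centre

one_node = site(DC, NODE, 1)
two_nodes_one_dc = site(DC, NODE, 2)  # e.g. clustered nodes in one data centre
two_dcs = parallel([site(DC, NODE, 1), site(DC, NODE, 1)])  # primary + standby site

for label, value in [
    ("single node", one_node),
    ("two nodes, one data centre", two_nodes_one_dc),
    ("one node in each of two data centres", two_dcs),
]:
    print(f"{label}: {value:.6f}")
```

Whatever value is chosen for NODE, the single-site results stay capped by DC, which is the point made above about data-centre-level events. Real failures are often correlated, so treat the numbers as an upper bound rather than a forecast.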