accreditation, architecture, autonomics, BCS, Chaos Monkey, ITSO, Kubernetes, monitoring, Netflix, Open Group, semantics, TOGAF, TRM
So I have an objective to get myself certified as an Oracle Technical Architect. Although the training is only open to Oracle staff and partners, the exam is open to all. As you may have guessed from my blog posts, I use a lot of Oracle technology. However, the Technical Architect examination is based largely on the IT Strategies from Oracle library, usually referred to as ITSO. Before non-Oracle users switch off: ITSO is actually built around presenting solid, solution-agnostic good practice, and only once that is laid out does the material overlay Oracle products. So at least 75% of the material applies regardless of the vendor (yes, cynics will say the practices naturally lead you to the products, but hey, someone has to be the bad guy). This actually makes it a worthwhile accreditation, as far as any accreditation can go (no, I’ve not done a detailed comparison against the Open Group’s Certified Architect, which is very expensive, or the BCS accreditation, which is bound to BCS membership). TOGAF gives you a framework, processes and a means to communicate, while ITSO does well at explaining the technical considerations and could be mapped onto the TOGAF Technical Reference Model (TRM) and Standards Information Base (SIB).
The point I wanted to get across is that ITSO includes an element on Management and Monitoring (E16583-03, if you want the document reference on the Oracle Technology Network). It makes a lot of really good points about monitoring challenges, such as the bottom-up approach, where people monitor the parts of the full capability that they’re responsible for rather than developing monitoring from a business perspective. The rationale for adopting the business-based approach is explained. This is not to say you don’t go into the technical measures and monitors of your infrastructure, databases, services and so on. But from the business approach you will capture the information needed to understand reporting from a user perspective, which is how you’ll hear about issues. Through detailed decomposition of your monitoring to get the right specific data points, you can then correlate monitoring data for root cause analysis, but also see and …
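To make the top-down idea a little more concrete, here is a minimal sketch, not taken from ITSO, of judging a request first against a business-level threshold and only then drilling into the components behind it. The `Span` and `BusinessTransaction` types, the component names and the 500 ms threshold are all hypothetical, and it assumes the component timings run one after another so the end-to-end time is simply their sum.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    component: str      # hypothetical component name, e.g. "web" or "database"
    duration_ms: float  # time this component spent on the request


@dataclass
class BusinessTransaction:
    name: str  # the action as the user sees it, e.g. "place order"
    spans: List[Span] = field(default_factory=list)

    @property
    def total_ms(self) -> float:
        # Assumes the spans run sequentially, so end-to-end time is their sum
        return sum(s.duration_ms for s in self.spans)


def check_transaction(tx: BusinessTransaction, threshold_ms: float) -> Optional[str]:
    """Judge the transaction against a business-level threshold first; only if
    it is breached, drill down to the component that contributed most time."""
    if tx.total_ms <= threshold_ms:
        return None
    worst = max(tx.spans, key=lambda s: s.duration_ms)
    return worst.component


order = BusinessTransaction(
    "place order",
    [Span("web", 120), Span("orders-service", 80), Span("database", 900)],
)
print(check_transaction(order, threshold_ms=500))   # database
print(check_transaction(order, threshold_ms=2000))  # None
```

The business view decides *whether* there is a problem; the component breakdown only comes into play once the user-facing measure says something is wrong.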
What I think the document misses, or at least underemphasises, is the ever-increasing importance of monitoring and logging what is happening as systems and environments become more elastic and self-managing, with what IBM calls autonomics: self-healing and self-scaling characteristics. Consider trying to diagnose a problem when a user complains of intermittent performance issues, but you have Kubernetes or another tool scaling your environment up for a period and then back down. Only by measuring from a business context will you be able to understand when the user might perceive performance as an issue. Then, with excellent logging and audit data on what components are doing at all levels, you can spot cases where the services are behaving perfectly but your scaling mechanisms are scaling back too soon.
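As an illustration of that correlation, the sketch below pairs user-facing requests that breached a business latency target with scale-down events that happened shortly before them. The record types, the 1000 ms target and the 120 s window are assumptions for the example; in practice the scaling events would come from wherever your scaler records its decisions (with Kubernetes, for instance, its event stream), not from these hand-built objects.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScalingEvent:
    timestamp: float  # seconds since the epoch
    direction: str    # "up" or "down"
    replicas: int     # replica count after the event


@dataclass
class UserRequest:
    timestamp: float  # when the user-facing request completed
    latency_ms: float


def slow_after_scale_down(
    requests: List[UserRequest],
    events: List[ScalingEvent],
    slo_ms: float,
    window_s: float,
) -> List[Tuple[UserRequest, ScalingEvent]]:
    """Pair each request that breached the business target with any
    scale-down that happened in the window_s seconds before it."""
    downs = [e for e in events if e.direction == "down"]
    matches = []
    for r in requests:
        if r.latency_ms <= slo_ms:
            continue  # the user would not have noticed this one
        for e in downs:
            if 0 <= r.timestamp - e.timestamp <= window_s:
                matches.append((r, e))
    return matches


events = [ScalingEvent(100.0, "up", 6), ScalingEvent(400.0, "down", 2)]
requests = [
    UserRequest(150.0, 200),
    UserRequest(420.0, 1800),  # slow, 20 s after scaling down to 2 replicas
    UserRequest(900.0, 250),
]
matches = slow_after_scale_down(requests, events, slo_ms=1000, window_s=120)
for req, ev in matches:
    gap = req.timestamp - ev.timestamp
    print(f"{req.latency_ms} ms at t={req.timestamp}, {gap:.0f} s after scale-down to {ev.replicas}")
```

A pattern of slow requests clustering just after scale-downs, while each service's own metrics look healthy, is exactly the "scaling back too soon" case described above.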
This leads to another consideration for those organisations that are absolutely committed to the idea of self-healing and to proving resilience in production, as the famous Netflix Chaos Monkey does. You need to be able to correlate the monkey’s activities with what is happening in your environment. Has the monkey uncovered an issue that manifests in a way you hadn’t expected, so that your users see intermittent issues?
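One simple way to do that correlation, sketched here under assumptions, is to compare the user-facing error rate just before and just after each chaos action. The `ChaosAction` and `RequestOutcome` records are hypothetical stand-ins (this does not use any real Chaos Monkey log format or API), and the 60 s window and 10-point threshold are arbitrary choices for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChaosAction:
    timestamp: float  # when the instance was terminated
    target: str       # hypothetical instance identifier


@dataclass
class RequestOutcome:
    timestamp: float
    ok: bool          # did the user-facing request succeed?


def _error_rate(outcomes: List[RequestOutcome], start: float, end: float) -> float:
    """Fraction of failed requests in the half-open window [start, end)."""
    in_window = [o for o in outcomes if start <= o.timestamp < end]
    if not in_window:
        return 0.0
    return sum(1 for o in in_window if not o.ok) / len(in_window)


def chaos_impact(
    actions: List[ChaosAction],
    outcomes: List[RequestOutcome],
    window_s: float,
    max_increase: float,
) -> List[Tuple[ChaosAction, float, float]]:
    """Compare the error rate just before and just after each chaos action and
    return the actions after which it rose by more than max_increase."""
    suspicious = []
    for a in actions:
        before = _error_rate(outcomes, a.timestamp - window_s, a.timestamp)
        after = _error_rate(outcomes, a.timestamp, a.timestamp + window_s)
        if after - before > max_increase:
            suspicious.append((a, before, after))
    return suspicious


actions = [ChaosAction(1000.0, "instance-a"), ChaosAction(5000.0, "instance-b")]
outcomes = (
    [RequestOutcome(t, True) for t in range(940, 1000, 6)]           # healthy before instance-a goes
    + [RequestOutcome(t, t % 2 == 0) for t in range(1000, 1060, 3)]  # half fail afterwards
    + [RequestOutcome(t, True) for t in range(4940, 5060, 6)]        # instance-b's loss goes unnoticed
)
suspicious = chaos_impact(actions, outcomes, window_s=60, max_increase=0.1)
for action, before, after in suspicious:
    print(f"{action.target}: error rate {before:.0%} -> {after:.0%}")
```

Comparing before and after windows, rather than looking at absolute error counts, helps separate the monkey's effect from whatever background error rate the system already has.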
This all leads me to a rather good presentation from Jimmi Dyson at Red Hat, who showed the simple value of ensuring you can get semantic meaning from logging. That means you can slice and dice the information to understand what is happening and get to the root cause. In Oracle land, Oracle Enterprise Manager (OEM) provides that semantic understanding when it comes to known products.
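As a generic illustration (not the tooling from that presentation, nor how OEM works), the sketch below uses Python's standard `logging` module to emit each entry as JSON with named fields, then filters the entries by those fields. The field names `business_txn`, `component` and `latency_ms`, and the convention of passing them in a single `fields` dict via `extra`, are assumptions made for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object so its fields stay queryable."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Keys passed via `extra` become attributes on the record; this sketch
        # assumes a single "fields" dict carries the semantic attributes.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


class ListHandler(logging.Handler):
    """Keeps formatted lines in memory; stands in for a real log shipper."""

    def __init__(self) -> None:
        super().__init__()
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


handler = ListHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("shop")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# Every entry carries the business transaction it served plus technical detail
logger.info("order placed", extra={"fields": {
    "business_txn": "place order", "component": "orders-service", "latency_ms": 220}})
logger.warning("slow query", extra={"fields": {
    "business_txn": "place order", "component": "database", "latency_ms": 1400}})
logger.info("basket viewed", extra={"fields": {
    "business_txn": "view basket", "component": "web", "latency_ms": 90}})


def slice_logs(lines, **criteria):
    """Parse JSON log lines and keep those whose fields match every criterion."""
    records = [json.loads(line) for line in lines]
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


slow_orders = [r for r in slice_logs(handler.lines, business_txn="place order")
               if r["latency_ms"] > 1000]
print(slow_orders)
```

Because every line carries the same named fields, the question "which component made *place order* slow?" becomes a filter rather than a regular-expression hunt through free text.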
I’ve meandered a bit, so the key points: consider ITSO, or any other vendor’s equivalent, as a source of good practice. Monitor and measure from a business perspective, but still ensure you’re collecting detailed, semantically meaningful metrics.