As a result of my involvement with the UK Oracle User Group I have been given the opportunity to review Oracle Press’ Oracle Big Data Handbook. I have to admit that I am not a Big Data expert, so reviewing this book was also an opportunity to build my knowledge a bit more.
So, Chapter 1 starts by providing a succinct history of Big Data, from Google’s work with MapReduce and lesser-known technologies such as Sawzall and Dremel to the rise of Hadoop. The primary value proposition of Big Data is briefly explored (highlighting the point that an RDBMS such as Oracle can actually accommodate lots of data when it is in a structured form), but Big Data is the nexus of volume, speed and variety (multiple structures, semi-structured and unstructured). The book suggests that, in addition to these factors, there is the Value of the data: a structured transaction has a lot more value than the same quantity of unstructured data, which delivers its value only when in context with other data.
From here, the chapter takes a brief look at the Oracle Big Data landscape, which leads nicely into the layout of the book’s chapters. These range from the Oracle Engineered Systems idea, to its adoption of Hadoop through Cloudera, to NoSQL, and on to how this becomes a joined-up solution with the likes of OBIEE, passing through Oracle’s extended version of the R language along the way.
In all, a brief, succinct and informative intro.
Chapter 2 takes us on a journey through the business value of Big Data ideas, with examples such as MCI’s campaign in the 1990s to develop insight by mining for friends and family information. In its day we called this sort of thing data mining; now it’s another aspect of Big Data. The chapter moves on to describe the idea of an Information Chain Reaction (ICR), where output from one stage produces a response in the next, with communication, change and connection being the primary triggers.
The authors make an interesting point about taking the metrics for volumes of traffic on social sites with a pinch of salt. This is not because of the possibility of overstatement (although that is a possibility; after all, user numbers are an easy measure for investors) but because of how and when the measurement is done, and even just changes in an API or user process. For example, adopting an approach that drives users to reverify their details regularly could create more user activity while delivering no more real information. Most importantly, what is the value of the information/traffic to you?
I also love the fact that the book uses quotes from famous individuals to emphasise points, for example:
The temptation to form premature theories upon insufficient data is the bane of our profession.
– Sherlock Holmes