
Phil (aka MP3Monster)'s Blog

~ from Technology to Music


Tag Archives: Big Data

eBook(lets) from O’Reilly

Tuesday 27 Sep 2016

Posted by mp3monster in Books, General, Technology


Tags

architecture, Big Data, booklet, books, creative commons, Design, development, ebook, ebooks, free, open books, oreilly, vJUG

I received an email through the virtual Java User Group highlighting the availability of a couple of eBooks around Java published by O'Reilly; the details are below. The books are more booklets (nothing wrong with that), the key difference being that they are shorter and focused on one or two subjects (in this case Java 8's lambdas and Streams). That's great, because I don't want a whole Java book again – I just want to get a handle on the key changes and language innovations. It is worth highlighting that these aren't just 'free chapters', which is what you sometimes see: the goal of each book is described, the material doesn't depend on prior chapters, and each is structured with the appropriate cover material, contents, index etc., so it works as a discrete entity.
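For anyone who, like me, just wants a feel for the headline Java 8 changes these booklets cover, here is a minimal sketch of a lambda and a Stream pipeline (my own illustration, not taken from either booklet):

```java
import java.util.Arrays;
import java.util.List;

public class Java8Taster {
    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "Introducing Java 8",
                "Object Oriented vs Functional Programming",
                "Software Architecture Patterns");

        // A lambda replaces the anonymous Comparator class we'd have written in Java 7
        titles.sort((a, b) -> Integer.compare(a.length(), b.length()));

        // A Stream lets us filter and transform the collection declaratively
        titles.stream()
              .filter(t -> t.contains("Java"))
              .map(String::toUpperCase)
              .forEach(System.out::println);
    }
}
```

Nothing clever, but it shows why a short, focused booklet is enough to get going with these features.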

This approach seems to be becoming more common, at O'Reilly at least, as a marketing device; we have seen the same done with the Dummies brand, where the booklets have then been printed as conference giveaways.

Some may argue that this is a reflection of our ever-shortening attention span with books. That may be the case for some, but I suspect it is more about providing something that is more digestible than a 'free chapter'. More importantly, it reflects the recognition that for books providing guides (as opposed to reference books, among which I'd include patterns books) people don't want to buy the latest edition of a book where the first chapters are exactly the same as the previous edition and the only significant change is, for example, a new section on lambdas.

Anyway, the latest book details received are:

  • Introducing Java 8 by Raoul-Gabriel Urma – offers a practical tutorial to some of the core Java 8 features and gets you programming quickly with Java 8.
  • Object Oriented vs Functional Programming by Richard Warburton (http://insightfullogic.com, @RichardWarburto) – explains the similarities and differences between functional programming and object-oriented programming with Java-focused examples.

The other book(lets) that have drawn my attention to the trend include:

  • Static Site Generators
  • Migrating to Cloud-Native Application Architectures
  • Software Architecture Patterns
  • Little Book of HTML/CSS Coding Guidelines
  • Java the Legend – history of Java
  • Designing Great Web APIs
  • Modern JavaScript
  • Hadoop with Python
  • Release Engineering: How Google Builds and Delivers Software
  • Functional Programming in Python
  • 20 Python Libraries You Aren’t Using (But Should)
  • Monitoring Distributed Systems
  • Little Book of HTML/CSS Frameworks

In addition to these Book(let)s O’Reilly offer a range of ‘reports’ such as:

  • Mapping Big Data, Evaluating Machine Learning Models, Data Driven  (the full range can be seen at Free Data Reports)
  • Open by Design
  • Design for Voice Interfaces (Siri, Cortana etc)

 

In addition, O'Reilly have a page on 'Open Books' (here) – covering significant texts O'Reilly have had some involvement in but which are published under licenses such as Creative Commons.

 

  • [18-01-16 Update] New booklet added for Modern JavaScript
  • [13-03-16 Update] New booklet Continuous Delivery with Windows and .NET
  • [14-03-16 Update] New booklet Modern Java EE Design Patterns
  • [13-08-16 Update] New booklets Release Engineering: How Google Builds and Delivers Software and Functional Programming in Python
  • [16-09-16 Update] New booklet 20 Python Libraries You Aren't Using (But Should)
  • [19-09-16 Update] New booklet Monitoring Distributed Systems
  • [28-09-16 Update] New booklet Little Book of HTML/CSS Frameworks


Mitigating Risks of Cloud Services

Monday 08 Dec 2014

Posted by mp3monster in General, Technology


Tags

Big Data, data, Data Warehouse, RPO, RTO, SaaS, service, Splunk, XaaS

As previously blogged, there are risks with using cloud that differ from self-hosted solutions. SaaS, PaaS and all the other XaaS offerings aren't a panacea. Hopefully you won't become the next Sony, as the provider keeps you patched etc. But what if the SaaS provider you're using goes bust, or you get into litigation with your provider and as a result lose access to your data? It could potentially be months whilst the lawyers sort things out – a horrible situation that no one wants to find themselves in. So how do you mitigate such risks?

Any half-decent SaaS provider should directly give you the means to get a view of all your data through generic or custom report(s), or should make available the means to export your data. The latter approach may well come with a cost. If your SaaS solution has a lot of data in place – for example a multinational's HR solution – you may want to target the extraction of deltas only. This means extra donkey work, and someone to ensure it is happening. How frequently should depend upon your business needs, expressed through an agreed Recovery Point Objective and your tolerance for potential data loss, as you should assume you'll lose everything since the last snapshot. If you have middleware in front of your SaaS service you can use a wiretap to reduce the risk here.
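As a rough sketch of that donkey work, a scheduled delta extract driven by a last-run timestamp could look something like the following. The endpoint, parameter name and token handling are hypothetical – every provider's export or reporting API differs – but the shape (read the marker, pull everything modified since, store the snapshot, advance the marker) is the point:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

public class SaasDeltaExtract {

    // Hypothetical export endpoint - substitute your provider's report/export API
    private static final String EXPORT_URL = "https://saas.example.com/api/records?modifiedSince=";
    private static final Path LAST_RUN_FILE = Path.of("last-extract-timestamp.txt");

    public static void main(String[] args) throws Exception {
        // Recover the point we extracted up to last time; how often this job runs
        // should be driven by the agreed Recovery Point Objective, not convenience
        Instant since = Files.exists(LAST_RUN_FILE)
                ? Instant.parse(Files.readString(LAST_RUN_FILE).trim())
                : Instant.EPOCH;

        HttpRequest request = HttpRequest.newBuilder(URI.create(EXPORT_URL + since))
                .header("Authorization", "Bearer " + System.getenv("SAAS_API_TOKEN"))
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Keep the snapshot somewhere under your own control (file share, object store, etc.)
        Path snapshot = Path.of("extracts", "delta-" + Instant.now().toEpochMilli() + ".json");
        Files.createDirectories(snapshot.getParent());
        Files.writeString(snapshot, response.body());

        // Only move the marker forward once the snapshot is safely stored
        Files.writeString(LAST_RUN_FILE, Instant.now().toString());
    }
}
```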

Your net position, in the event of a loss or possibly a prolonged service outage (remember even Amazon have had multi-day failures, and not all SaaS solutions follow the good cloud practice of being able to fail over to secondary centres), is that you have your data and can at least cobble something together to bridge the gap. Unless your SaaS vendor is offering you something very unique, they will probably have competitors who are more than likely to be glad to help you import the data into their solution.

All this for a case of paranoia? Well, actually you can harvest a raft of other benefits from taking full data extracts – for example reconciliation, with a view to managing data quality; statistics from Experian show the value of resolving discrepancies. That is to say, you might find data errors between systems as a result of things like edge scenarios, such as handling errors in the integration layer. To illustrate the point, let's assume that your web sales channel is via a SaaS provider and you're receiving the sales into your on-premise ERP for fulfilment and accounting. By taking all transactions in the SaaS solution every week you can identify any discrepancies and reconcile any issues between the sales solution and your finance and fulfilment capabilities, ensuring what you have sold is what you have accounted for. If we're talking about solutions that impact your financial accounting, then at least for US declarations it may be necessary to perform such reconciliation in support of Sarbanes-Oxley (SOX) requirements.
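To make the reconciliation concrete, the weekly check can be as simple as comparing the transaction identifiers in the SaaS extract with those booked in the ERP; anything that appears on only one side needs investigating. The class and identifiers below are hypothetical – the point is the set comparison, not the plumbing:

```java
import java.util.HashSet;
import java.util.Set;

public class SalesReconciliation {

    // Return the transaction IDs present in one system but not the other
    public static Set<String> findDiscrepancies(Set<String> saasTransactionIds,
                                                Set<String> erpTransactionIds) {
        Set<String> onlyInSaas = new HashSet<>(saasTransactionIds);
        onlyInSaas.removeAll(erpTransactionIds);   // sold, but never reached fulfilment/accounting

        Set<String> onlyInErp = new HashSet<>(erpTransactionIds);
        onlyInErp.removeAll(saasTransactionIds);   // booked, but with no matching sale

        Set<String> discrepancies = new HashSet<>(onlyInSaas);
        discrepancies.addAll(onlyInErp);
        return discrepancies;
    }

    public static void main(String[] args) {
        Set<String> saas = Set.of("TX-1001", "TX-1002", "TX-1003");
        Set<String> erp = Set.of("TX-1001", "TX-1003", "TX-1004");
        // TX-1002 and TX-1004 need investigating
        System.out.println("Needs investigation: " + findDiscrepancies(saas, erp));
    }
}
```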

Add to this that a richer data set can be fed into your Big Data or Data Warehouse environments, potentially allowing you to gain further insights into your activities.

When you are running a hybrid of on-premise and cloud solutions, or even just cloud but with a mix of vendors, don't just think about your application data; consider whether audit and web traffic information can be retrieved from the vendor. There may be value in feeding that data into a solution such as Splunk, which may then find a pattern of misuse or attack that would not show up with just the monitoring data from your on-premise solutions.

The final point I should make is: don't assume your service provider will let you at the data as described – look at your contracts before any payment or act of agreement. Ideally such checks should be part of your service due diligence activities (along with escrow arrangements etc.). There are SaaS providers who will consider the data their property, not yours, even when the data might be about your employees.


Oracle Big Data Handbook – summary review

Monday 05 May 2014

Posted by mp3monster in Book Reviews, Books, Oracle, Oracle Press, Technology


Tags

Big Data, Big Data Appliance, book, Endeca, Enterprise R, Hadoop, NoSQL, ODI, Oracle, Oracle Big Data Handbook, Oracle Press

Having written several detailed reviews of Oracle Press' Oracle Big Data Handbook (links below) I thought it useful to produce a summary. Overall it is a very insightful and informative book covering the range of technologies that Oracle offers to address the 'Big Data' space, from a number of viewpoints: hardware with the Big Data Appliance (BDA); software with NoSQL, Enterprise R and Hadoop along with the various adapters (e.g. ODI); and the features that existing products make available to support the big data story and contribute to a cohesive ecosystem. The book looks beyond the technologies classically linked to the 'Big Data' term to explore products such as Endeca. I like the fact that the book tries to explain the rationale behind some of the approaches adopted and the associated value propositions. Finally the book looks at governance, maturity and architectural capabilities. All of which makes for an informative and insightful book.

The book isn't flawless; a few challenges can make the reading a little frustrating occasionally (at least for me, as I went cover to cover). For example, looking at the Big Data Appliance we seem to revisit the hardware specifications multiple times. The data governance perspective is data governance in general, not specific to big data, in my opinion. Occasionally the book seems to jump about when explaining a number of related areas, which means that using the book as more of a reference isn't so easy. Don't get me wrong – these issues are hugely outweighed by the value the book brings.

My detailed reviews:

  • Part 1
  • Part 2
  • Part 3


Oracle Big Data Handbook – part 3 reviewed

Monday 05 May 2014

Posted by mp3monster in Book Reviews, Books, General, Oracle, Oracle Press, Technology


Tags

BDA, Big Data, book, CRAN, data mining, Endeca, ODM, Oracle, Oracle Press, Oracle R Enterprise, review, Revolution analytics, RStudio, spatial, warehouse

This is the third and final part of the review of Oracle Press' Oracle Big Data Handbook (and the last part of our detailed review – previous parts can be seen at Part 1 here and Part 2 here). With the first sections having introduced the Big Data Appliance and the case for adopting an appliance, followed by an in-depth look at the technologies provided on the BDA for storing data, we move into the section that really delivers the payoff, namely the mechanics of converting data to information, i.e. analytics. This means this section of the book concentrates on the likes of Oracle Data Mining (ODM), Oracle R Enterprise and Endeca.

The first of the chapters in this part of the book looks at the different types of analytics you might need to perform – for example data mining, predictive analytics, text mining and so on. The result is that the chapter does seem to flip-flop between more classic data warehousing (still Big Data in terms of sheer data volumes) and the more contemporary, hip and trendy side of 'Big Data' in the form of Hadoop and R. This may work nicely for a DBA or Data Scientist, but as a technologist and enterprise architect it isn't so easy, as personally I'd prefer to get a sense of each product stack and then look at how they complement and overlap. That said, after the first couple of sections where both the tools and ideas are introduced, the flip-flopping is quicker, making it easier to cope with; but it also makes for a sizeable opening chapter for this section of the book. So let me show you the kinds of insights that can be gained from the book.

ODM extensions are built around the common Oracle toolkits of the RDBMS, SQLDeveloper and additional packages to provide powerful visual paradigms and pre-canned analytics functionality. Not being a data warehouse expert, I like the fact that the book takes time to describe the processes for building a data model and predictive engine and the likely paths through these steps. The book goes on to explain the available Excel tooling. Most of this is helped along with the context of a scenario. Given the claim of a realtime capability to take a transaction and use a predictive model against the transaction's values to ascertain whether the transaction is likely to be indicative of the characteristics being sought, it would have been nice for the book to provide some outline benchmarks for the scenario. Realtime could be interpreted as a second or two, which, when you're running millions of transactions with small profit margins per transaction, means knowing what to expect from such capabilities really matters. Still, this doesn't take away from the clarity of the information that is explained.

From ODM the chapter moves on to introducing the R language. What really got my attention in the book is the apparent willingness to engage with an open source model (given the other major players in the evolution of R – Google, Facebook, LinkedIn etc – you might argue there is no choice). The book upfront addresses the fact that Oracle hasn't (or hasn't yet) incorporated an R editor into SQLDeveloper or JDeveloper, and suggests a specific tool, RStudio. Then there is the engagement with the library of R extensions (CRAN – the Comprehensive R Archive Network, with over 5000 extensions).

Google Trends view of Analytics Tools

All of which begs the question: what is Oracle's value proposition in this space? The book answers this by describing the challenges of using the open source edition of R (memory consumption and single-threaded characteristics) and how they have been addressed by extending R into Oracle R Enterprise. In addition to addressing these constraints, Oracle's extension recognises and works properly with the database governance and security layers. It is at this point we're brought back to the earlier focus on the BDA, as the extensions allow the BDA Hadoop deployment to be used as a data source (along with the Oracle RDBMS). In many respects it feels like a similar proposition to Revolution Analytics (other than the RDBMS emphasis being different). As with the data mining, the example scenario is used to illustrate the applications of R in conjunction with Hadoop and the Oracle RDBMS. To support the illustration the different additional libraries are explained (such as the Hadoop connector, RDBMS connector etc).

R Enterprise doesn't stop there: it has been integrated with PL/SQL, OBIEE and BI Publisher, meaning that although some of the tools and the core solution are open source, Oracle has achieved a rather rich ecosystem – a point not really called out by the book, but the presentation of the details really makes this jump out.

Still in Chapter 9, we move on to text mining, for activities such as sentiment analysis, and jump back to ODM with an explanation of the product's capability in this space and the challenges that this kind of analysis presents; this is followed by a view of the support R offers. The chapter then moves on to things like spatial analytics and so on. These latter forms of analytics don't really confront the ideas of 'Big Data' based on the book's opening definition of the term. That isn't to say that a brief overview of how Oracle Spatial works and its capabilities to support ideas such as Location Intelligence isn't interesting, but I don't see any differentiation between big data and normal patterns of use for Oracle Spatial. The examples provided, such as knowing whether there are patterns of location-based usage, can be addressed by ensuring a consistent representation of location from which you can select by range – either a postal/zip code or latitude and longitude, for example – for which there are more cost-effective tools that don't necessitate pulling data out of a Hadoop cluster. I would concede that Oracle Spatial has an information-rich data set that could be very effective, but to explore such ideas should we not be looking at something like ESRI's integration with Hadoop (and more here), if Oracle offers such a capability?

Having crossed a range of technologies, Chapter 10 briefly talks about IDEs, but then goes for a deeper dive into R, covering the supported Open Source edition and the Enterprise Edition (ORE). The differences between the two versions and the licensing issues are well explained. Based on the description, what Oracle has done with the optimisation and the ability to transparently leverage the database seems pretty impressive. The only thing to remember is that by transparently moving R's computational load into the database, you need to consider the impact on other processes. Oracle has also enabled ORE to access the predictive analytics capabilities that reside within an Enterprise Database, which are also illustrated here.

After looking at ORE's capabilities the book moves on to its connection to Hadoop for R – the Oracle R Connector for Hadoop (ORCH). ORCH provides the means to interact with HDFS, along with the local file system and the RDBMS. The connector allows for the creation of MapReduce jobs using the R language and interaction with the job scheduler. To fully leverage these capabilities you do want to pull in CRAN libraries. The book then walks through a detailed example of using ORCH with MapReduce (including R script elements), followed by a similar set of examples demonstrating direct interaction with HDFS.
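For readers who haven't seen the MapReduce model expressed as code, the shape of a job is the same whichever language submits it. Below is a minimal word-count mapper and reducer using the standard Hadoop Java API – purely to illustrate the map/reduce split that ORCH lets you express in R instead; it is not an ORCH example:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emit (word, 1) for every word in the input split
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reducer: sum the counts produced by the mappers for each word
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

ORCH's value, as the book describes, is letting an analyst express that same map/reduce split as R functions and have the connector handle submission to the cluster.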

Chapter 11 gives us a focus change again, this time to Endeca for Information Discovery. The book takes us through the history of Endeca and Oracle, explaining the component naming – before and after the acquisition – and the two dimensions of the Endeca product stack: eCommerce-specific and more general BI.

The chapter looks at the Endeca data model, which is a faceted or tagged model (i.e. all values are represented as label and value). The book emphasises the benefits of this model – but not the downsides (needing to use the label to determine semantic meaning can have performance implications). This is important as it has implications for the flexibility to enrich data that Endeca can then leverage. Once the basic product and technicalities are examined the book actually steps back to explain the differences between BI and Information Discovery, and therefore the approaches to using these tools. Then it is on to the tooling, such as the studio, engine and integration capabilities. The book continues to build on the technical side with the classic NFRs of how to make the technology scale. We then flip back to look at a number of example use cases, before a final jump to the mechanics of deploying Endeca and getting some development work underway. The sequencing of the chapter sections does seem a little odd, but it does work; trying to dive into just the technical dimensions alone is probably not a practical proposition here.

 

Big Data governance is taken on in Chapter 12. The emphasis here is to look at the definition of Data Management (e.g. the definition by the Data Management Association – DAMA) and how Big Data relates to this. So the chapter walks through the key data governance factors – many of which are characterised in the diagram above – for example focusing on common legislative considerations such as HIPAA and the Patriot Act's KYC (Know Your Customer) requirements, through to EU Data Protection and the UK Financial Services Act. Having established a breadth-wise view of data governance, the book starts to look at how Big Data scenarios differ from raw data and day-to-day data sets.

The problem I have with the chapter is that all the points being made are valid, but they're not Big Data issues – they are any-data governance issues. What Big Data does is introduce technology to capture and use data in ways previously not considered, so using the technologies in this way may impact declarations that you have made to the data protection registrar; e.g. declaring that you keep customer data to enable order fulfilment, but then using the data to determine the effectiveness of sales channels, would be an issue. But you don't have to have big data technology to create such an issue (the book itself acknowledges that you could do the analysis with older approaches; the difference is that it is easier and quicker now). Having described some 'any data' guidance for your big data scenarios, the book goes into a raft of big data scenarios in different domains and references some of the relevant legislation. If you read the chapter as just data governance it is a good reminder of the different considerations in data governance.

The final chapter takes on architectural and road-mapping considerations – a good way to conclude, as this sort of thing draws on all the preceding chapters' points; and this is precisely how the chapter starts, recapping the value proposition described up front, followed by the infrastructure considerations – data needs to be processed in parallel to handle the volumes in timeframes that mean the insights can be assessed and reacted to in a meaningful manner. After the recap the book moves into a maturity model, although the origins of the model aren't clear (I'd have thought that the basis of what is presented is rooted in a wider model). This naturally leads into looking at the Oracle Architecture Design Process (OADP). The details of OADP walk through the goals and mechanics of developing your 'As-Is' and 'To-Be' architectures so you can develop a transitional roadmap. The final step is, obviously, enabling the journey by developing the human skills necessary to perform such a journey.

 

Previous Reviews:

  • Part 1
  • Part 2


Oracle Big Data Handbook – part 2 reviewed

Tuesday 22 Apr 2014

Posted by mp3monster in Book Reviews, General, Oracle, Oracle Press, Technology


Tags

adaptors, BDA, BerkeleyDB, Big Data, book, connectors, Hadoop, NoSQL, ODI, Oracle, Oracle Press, review, Sleepycat, sqoop

After an excellent start in Part 1 of Oracle Press' Oracle Big Data Handbook (reviewed here), Part 2 moves on to look at Apache Hadoop, Oracle's Big Data Appliance and Oracle's NoSQL offerings.

Chapter 3 provides a brilliant overview of Hadoop and the ecosystem that has developed around it, addressing the divergent versions of MapReduce leading to the likes of YARN, and touching on how commercialised versions of Hadoop (such as Cloudera's) have taken this forward.

 

Apache Hadoop ecosystem

The chapter moves on to describe the core solution components, such as Node Managers, and the relationship to hardware – making use of commodity kit rather than nice, expensive SAN technology.

Hadoop Structure

So now we have a good (pretty much uncoloured by Oracle) view of Hadoop, which leads into the next chapter (Chapter 4), looking at why Oracle has taken the approach of an appliance (which could be seen as contrary to the previously stated adoption of commodity kit).

Oracle Big Data

So, as you can see, Oracle has woven together a set of technologies into an Exadata-based platform which will not only deal with Big Data analytics but ideally support other volume scenario needs, so you're not adding another data silo – all of which fits with Oracle's Engineered Systems viewpoint. The book then explains the other factors involved in the BDA design – those of commercial considerations and value propositions in relation to its customer base – which is very refreshing to see (rather than rationalisation through technical arguments alone).

The book addresses the challenge of 'why should I go to Oracle for big data?', which is well argued on the basis of the experience of very large relational deployments, Oracle's contributions to Hadoop via Cloudera, and so on. The chapter finishes with an argument around cost, comparing buying the appliance against buying comparable hardware to build your own cluster. Taking just list prices compared to HP, the hardware costs come in more or less the same – and that's before you account for the fact that the Oracle price includes all the software.

 

Chapter 5 addresses the deployment of the BDA, explaining the configuration process, which with the combination of a tool called Mammoth (appropriate really) and the likes of Puppet seems pretty simple, as a lot of the solution is preconfigured on the box ready to go – all of which is reasonably well explained. My only grumble is that we do seem to revisit the details of the hardware fairly regularly, as they are presented again here, although we do go into a deeper dive on the configuration. One surprise that I'd not picked up on is that Oracle have made their NoSQL solution available as open source, although a little digging might explain why, as it has links back to Sleepycat's BerkeleyDB that Oracle acquired (more here). As the chapter moves through the physical aspects of the deployment it also highlights in clear terms any constraints Oracle imposes to ensure that the whole appliance is supportable, the most significant of these being the advanced networking that is set up.

As Chapter 5 moves through deployment considerations it also addresses the means of knowing that the appliance is running properly – so we're talking about system monitoring, not just of the hardware but of the distributed nature of Hadoop and MapReduce. A brief view of the products deployed is given; obviously this centres on the Enterprise Manager extensions, but also the component-level tooling such as Cloudera's Hadoop manager.

Chapter 6 in many respects continues building out the view of Hadoop, briefly describing the analytics tooling in the Oracle RDBMS, the R language and the data mining/discovery of Endeca. The interesting points in the chapter are about the relationship with the RDBMS, particularly as an enterprise data warehouse – something I've not really seen addressed elsewhere, as the common world view seems to put Hadoop in the same camp as NoSQL, which seems to be gaining the zeal and polarisation that Linux vs Windows used to have when it comes to the RDBMS. But I think the book makes a good case for the right tool for the right job.

Oracle’s Strategic Product View

Chapter 7 starts to drill into what the connector package offers, covering Oracle database data transfer, combining the R language with MapReduce, and ODI.

The database connectors aim to provide efficiency in transferring data between Hadoop and the Oracle RDBMS compared to, say, using Sqoop to transfer data to and from an Oracle database (via ODI connectors, JDBC, direct OCI etc). To fully understand the explanation of how this works you do need to understand the basics of MapReduce, although the relevant MapReduce operations are elaborated upon as the chapter progresses. We then start being shown configuration fragments for the different connection approaches.

The final chapter of this section of the book looks at the NoSQL database in detail, starting with high-level ideas such as how NoSQL relates to ACID and BASE, then dropping down into significant (but valuable) detail by describing how clients are kept in sync through the use of separate threads picking up data about the data partitioning (sharding). Once the key components have been well described, the chapter moves on to explain how Oracle has optimised the process to make the NoSQL database as performant as possible, whilst providing a solution that is elastic in nature and highly resilient but still predictable in its dynamics.

The chapter finishes off with considerations such as installation, how it integrates with Hadoop and OBIEE.

Overall, this is a very informative chapter. Occasionally it feels like some of the information is being repeated, but in a different structure – it isn't the end of the world, although if you're reading from cover to cover you need to just press on.

 

Part 1 of the book is reviewed here.


Oracle Big Data Handbook – Part 1 Reviewed

Monday 07 Apr 2014

Posted by mp3monster in Book Reviews, Books, General, Oracle, Oracle Press, Technology


Tags

analytics, Big Data, book, Cloudera, NoData, Oracle, Oracle Press, press, RDBMS, UK Oracle User Group, unstructured data

As a result of my involvement with the UK Oracle User Group I have been given the opportunity to review Oracle Press’ Oracle Big Data Handbook.  I have to admit that I am not a Big Data expert (and reviewing this book was an opportunity to build my knowledge a bit more).

So, Chapter 1 starts by providing a brief but succinct history of Big Data (from Google's work with MapReduce and lesser-known technologies such as Sawzall and Dremel) and the rise of Hadoop. The primary value proposition of Big Data is briefly explored (highlighting the point that an RDBMS such as Oracle can actually accommodate lots of data when it is in a structured form); rather, Big Data is the nexus of volume, speed and variety (multiple structures, semi-structured and unstructured). The book does suggest that in addition to these factors there is data value (a structured transaction has a lot more value than the same quantity of unstructured data, which delivers its value only in context with other data).

From here, a brief look at the Oracle Big Data landscape leads nicely to a layout for the chapters of the book – ranging from the Oracle Engineered Systems idea, to its adoption of Hadoop through Cloudera and NoSQL, through Oracle's extended version of the R language, and on to how this becomes a joined-up solution with the likes of OBIEE.

In all a brief, succinct and informative intro.

 

Chapter 2 takes us on the journey of the business value of Big Data ideas, taking us through some examples such as MCI's campaign in the 1990s to develop insight by mining friends-and-family information. In its day we called this sort of thing data mining; now it's another aspect of big data. The chapter moves on to describe the idea of an Information Chain Reaction (ICR) – where the output from one stage produces a response in the next, with communication, change and connection being the primary triggers.

The authors make an interesting point in the book about taking the metrics for volumes of traffic on social sites with a pinch of salt – not because of the possibility of overstatement (although that is a possibility; after all, user counts are an easy measure for investors) but because of how and when the measurement is done, and even just changes in the API or user process. For example, adopting an approach that drives users to regularly re-verify their details could create more user activity while delivering no more real information. Most importantly, what is the value of the information/traffic to you?

Big Data infographic covering key points

 

 

I also love the fact that the book uses quotes from famous individuals to emphasise points, for example:

The temptation to form premature theories upon insufficient data is the bane of our profession.

– Sherlock Holmes

 

