Data Integration with Oracle

15 Tuesday Sep 2020

Posted by mp3monster in Cloud, General, Oracle, Technology

Tags

Cloud, data, DIPC, integration, marketplace, OCI, ODI, ODI CS, Oracle, Spark

Oracle’s data integration product landscape outside of GoldenGate has with the arrival of Oracle Cloud been confusing at times. This has meant finding the right product documentation can be challenging, and knowing which product to use in your own technology road-map can be harder to formulate. I believe the landscape is starting to settle now. But to understand the position, let’s look at the causes of disturbance and the changes that have occurred.

Why the complexity?

This has come from I think a couple of key factors. The organizational challenges triggered by Thomas Kurian’s departure which has resulted in rather than the product organization being essentially in three parts aligning roughly to Infrastructure, Platform and Applications to being two Infrastructure and Apps. Add to this Oracle’s cloud has gone through two revolutions. Generation 1 now called Classic was essentially a recognition that they needed an answer to Microsoft, Google and AWS quickly (Oracle are now migrating customers off classic). Then came Generation 2, which is a more considered strategy which is leveraging not just the lowest level stack of virtualization (network and compute), but driving changes all the way through the internals of applications by having them leverage common technologies such as microservices along with a raft of software services such as monitoring, logging, metering, events, notifications, FaaS and so on. Essentially all the services they offer are also integral to their own offerings. The nice thing about Gen2 is you can see a strong alignment to CNCF (Cloud Native Compute Foundation) along with other open public standards (formal or de-facto such as Microprofile with Helidon and Apache). As a result despite the perceptions of Oracle, modern apps are standard a better chance of portability.

Impact on ODI

Oracle’s Data Integration capabilities, cloud or otherwise have been best known as Oracle Data Integrator or ODI. The original ODI was the data equivelent of SOA Suite implementing Extract Load Transform (ELT) rather than ETL as it meant the Oracle DB was fully leveraged. This was built on the WebLogic server.

Along Came Cloud

Oracle cloud came along, and there is a natural need for ODI capabilities. Like SOA Suite, the first evolution was to provide ODI Cloud Service just like SOA Suite had SOA Cloud Service. Both are essentially the same on-premises product with UIs to manage deployment and configuration.

ODI’s cloud transformation for the cloud lead ODI CS evolving into DIPC (Data Integration Platform Cloud). Very much an evolution, but with a more web centered experience for designing the integrations. However, DIPC is no longer available (except possible to customers already using it).

Whilst DIPC had been evolving the requirement to continue with on-premises ODI capabilities is needed. Whilst we don’t know for sure, we can speculate that there was divergent development happening creating overhead as ODI as an on-prem solution remained. We then saw the arrival of ODI Marketplace, this provides an easier transition, including taking into account licensing considerations. DIPC has been superseded by ODI Marketplace.

Marketplace

Oracle has developed a Marketplace just like the other major players so that 3rd party vendors could offer their technologies on the Oracle cloud, just as you can with Azure and AWS. But Oracle have also exploited it to offer their traditional products normally associated with on-premise deployments in the cloud. As a result we saw ODI Marketplace. A smart move as it offers the possibility of exploiting on-prem licensing into the cloud along with portability.

So far the ODI capabilities in its different forms continued to leverage its WebLogic foundations. But by this time the Gen2 Oracle Cloud and the organizational structures behind it has been well established, and working up the value stack. Those products in the middleware space have been impacted by both the technology strategies and organization. As a result API for example have been aligned to the OCI native space, but Integration Cloud has been moved towards the Apps space. in many respects this reflects a low code vs code native model.

OCI ODI

Earlier this year (2020) Oracle launched a brand new ODI product, to use its full name Oracle Cloud Infrastructure Data Integration. This is an OCI native (i.e. Gen2 solution leveraging microservices technologies).

This new product appears to be a very much ground up build as it exploits Apache Spark and Function as a Service (FaaS) as foundational elements. As a ground up build, it doesn’t inherit all the adapters the original ODI can offer. This does mean as a solution today it is very focused on some specific needs around supporting the data movement between the various Oracle Cloud storage and Database as a Service solutions rather than general ingestion and extraction processes.

Conclusion

Products are evolving, but the direction of travel appears to be resolving. But we are still in that period where there are capability gaps between the Gen2 native solution and the traditional ODI via Marketplace solution. As a result the question becomes less which product, but when and if I have to invest in using ODI Marketplace how to migrate when the native product catches up.

Packt go free with Oracle again for next 24 hours

25 Tuesday Aug 2015

Posted by mp3monster in Books, Oracle

≈ Leave a comment

Tags

11g, ebook, free, ODI, Oracle, Packt, Packt Publishing

Packt have made another Oracle book available for free today as part of their Free Learning initiative. For the next 24 hours you can get a book on Oracle Data Integrator 11g Cookbook from https://www.packtpub.com/packt/offers/free-learning

Oracle Big Data Handbook – summary review

05 Monday May 2014

Posted by mp3monster in Books, Oracle, Technology

≈ Leave a comment

Tags

Big Data, Big Data Appliance, book, Endeca, Enterprise R, Hadoop, NoSQL, ODI, Oracle, Oracle Big Data Handbook, Oracle Press

Having written several detailed reviews of Oracle Press’ Oracle Big Data Handbook (links below) I thought it useful to produce a summary. Over all is a very insightful and informative book covering the range of technologies that Oracle offers to address the ‘Big Data’ space from a number of view points such hardware with the Big Data Appliance (BDA), software with NoSQL, Enterprise R and Hadoop along with the various adapters (e.g. ODI) and existing product features that existing products make available to support the big data story and contribute to make a cohesive ecosystem. The book looks beyond the technologies classically linked to the ‘Big Data’ term to explore products such as Endeca. I like the act that the book tries to explain the rational behind some of the approaches adopted and the associated value propositions. Finally book looks at governance, maturity and architectural capabilities. All of which makes for an informative and insightful book.

The book isn’t flawless a few challenges that can make the reading a little frustrating occasionally (at least for me as I went cover to cover), for example,looking at the Big Data Appliance we seem to revisit the hardware specifications multiple times. The data governance perspective is data governance not specific to big data in my opinion. Occasionally the book seems to jump about when explaining a number of related areas which means that using the book as more a reference isn’t so easy. Don’t get me wrong these issues are hugely out weighed by the value it brings.

my detailed reviews:

Oracle Big Data Handbook

Oracle Big Data Handbook – part 2 reviewed

22 Tuesday Apr 2014

Posted by mp3monster in General, Oracle, Technology

≈ 2 Comments

Tags

adaptors, BDA, BerkeleyDB, Big Data, book, connectors, Hadoop, NoSQL, ODI, Oracle, Oracle Press, review, Sleepycat, sqoop

After an excellent start in Part 1 of Oracle Press’ Oracle Big Data Handbook (reviewed here). Part 2 moves on to looking at Apache Hadoop, Oracle’s Big Data Appliance and Oracle’s NoSQL offerings.

So chapter 3 provides a brilliant overview of Hadoop and the echo system that has been developed around it. Addressing the divergent versions of Map Reduce leading to the likes of YARN. Touching on how commericalised versions of Hadoop have been taken forward with this (such as Cloudera).

Apache Hadoop echo system

Moving onto to describe the core solution components such as Node Managers and the relationship to hardware and the use of more commodity kit rather than using nice expensive SAN technology.

Hadoop Structure

So now we have good (pretty much uncoloured by Oracle) view of Hadoop. Which leads into the the next chapter (chapter 4) which looks at why Oracle have taken the approach of an Appliance (which could be seen as contrary to the previous stated adoption of commodity kit).

Oracle Big Data

So as you can see Oracle woven together a set of technologies into an Exadata based platform which would not only deal with Big Data Analytics but ideally support other volume scenario needs so you’re not adding another data silo. all of which fits with Oracle’s Engineered Solutions view point. The book takes on a explains the other factors involved in the BDA design – those of commercial considerations and value propositions in relation to its customer base – very refreshing to see (rather than rationalisation through technical arguments alone).

The book addresses the challenge of why should I go to Oracle for big data? Which is well argued on the experience of very large relational deployments. Oracle’s contributions to Hadoop via Cloudera and so on. The chapter finishes with the argument around cost comparison between buying a comparable hardware solution to build your own cluster. Taking just list prices compared to HP and the hardware costs come in more or less the same, that’s before you account for the fact the Oracle price includes all the software.

Chapter 5 addresses the deployment of the BDA, explaining the configuration process, which with the combination of a tool called Mammoth (appropriate really) and the lies of Puppet seems pretty simple as a lot of the solution is preconfigured on the box ready. all of which is reasonably well explained. my only grumble is that we do seem to revisit the details of the hardware fairly regularly as the details are again presented here, although we go into a deeper dive in the configuration. One surprise that I’d not picked up on is that Oracle have made their NoSQL solution available as open source, although a little digging might contribute to why as it has links back to Sleepycat’s BerkeleyDB that Oracle acquired (more here). As the chapter move through the physical aspects of the deployment it also highlights in clear terms any constraints Oracle imposes to ensure that the whole appliance is supportable, the most significant of these areas is the advanced networking that is setup.

Chapter 5 as it moves through deployment considerations addresses the means to know that the appliance is running properly – so we’re talking about system monitoring not just of the hardware but the distributed nature of Hadoop and Map Reduce. So a brief view of the products deployed is given. Obviously this centres on the Enterprise Manager extensions, but also the component level tooling such as Cloudera’s Hadoop Manager.

Chapter 6 in many respects continues building out the view of Hadoop to describe briefly the analytics tooling both in the Oracle RDBMS, R language and data mining/discovery of Endeca. The interesting points in the chapter are about the relationship with RDBMS particularly as an enterprise data warehouse – something I’ve not seen really addressed elsewhere as the common world view seems to put Hadoop in the same camp as NoSQL which seems to be gaining the zeal and polarity that Linux vs Windows used to have when it comes to RDBMS. But I think the book makes a good case for right tool for the right job.

Oracle’s Strategic Product View

Chapter 7 starts to drill in to how the connector package offers which consider Oracle database data transfer, combining the R language with MapReduce and ODI.

The database connector aim to provide efficiency in transferring data between Hadoop and the Oracle RDBMS over say using Sqoop to transfer data to and from an Oracle database (ODI connectors, JDBC, direct OCI etc). To fully understand the explanation of how this works you do need to understand the basics of MapReduce although as the chapter progresses the relevant MapReduce operations are elaborated upon. As the chapter progresses we start being shown configuration fragments for the different connection approaches.

The final chapter of this section of the book looks at the NoSQL database in detail, starting with high level ideas such as how NoSQL relates to ACID and BASE ideas, dropping down into significant (but valuable) detail by describing how clients are kept in sync through the use of separate threads picking up data about the data partitioning (sharding). Once the key components have been well described the chapter moves onto explain how Oracle has optimized the process to make the NoSQL as performant as is possible whilst providing a solution that is elastic in nature and highly resilient but still predictable in its dynamics.

The chapter finishes off with considerations such as installation, how it integrates with Hadoop and OBIEE.

Overall, this a very informative chapter, occasionally it feels like some of the information is being repeated but in a different structure but it isn’t the end of the world, although if you’re reading from cover to cover you need to just press on.

Part 1 of the book is reviewed here.