Case Study Hero Background

Case Study

Dow Jones Oracle19c Standby to AWS Aurora PostgreSQL11 Data Migration

Customer: Dow Jones – Global business and financial news provider headquartered in New York

Industry: Media & Internet, Publishing

Business Pain:

More and more companies are embarking on re-platforming their legacy Oracle databases to PostgreSQL. PostgreSQL’s popularity has increased as it offers cost savings with no licensing fees, removes vendor lock-in, and provides the same results, response time, stability, and reliability offered by proprietary vendor solutions.

Dow Jones, a global business and financial news provider embarked on such a journey by re-platforming their Oracle19c Data Warehouse, with over 12 terabytes of data, to AWS Aurora PostgreSQL.

To meet their goals of minimal downtime, low cost, and reliability, the company focused on the development and testing of AWS Database Migration Service (DMS). Tasks were executed to prove out a DMS solution to meet the data migration downtime window. The team initially leveraged DMS task types to perform full data loads only to answer the “how long, how fast” questions to determine the migration downtime run duration. This phase of the project executed iterative scenarios of DMS task configurations and multiple DMS replication server instance types and quantities.

How we fixed it:

TekStream utilized the Oracle19c Standby database as the source to eliminate any data replication workload impact on the Production database and application environment.

The AWS DMS solution was able to migrate and replicate data from the Oracle19c Standby Database to Aurora PostgreSQL to provide for a minimal downtime duration for application workload cut-over to use Aurora PostgreSQL as the primary database.

In addition to AWS DMS, TekStream implemented Change Data Capture (CDC) to provide for a near-zero data sync lag between the source Oracle database and target Aurora PostgreSQL database, reducing the database cut-over downtime and shifting more focus on application cut-over timelines to minimize the overall Production cutover downtime duration. We implemented and tested “failback” procedures, consisting of implementing AWS DMS configurations for Change Data Capture on Aurora PostgreSQL to replicate back to Oracle19c.

The DMS Multi-AZ failover was successfully tested by rebooting the replication instance and monitoring the DMS tasks to validate tasks properly restarted and resumed on the new primary replication instance. TekStream met the production downtime requirement by completing work in under 30 hours.

Data type conversions and mismatches have a direct impact on the effort to test, debug, triage failed DMS tasks due to truncated data or data type mismatches.

Migrating data only is the simplest use case. Implementing Change Data Capture adds another layer of complexity since it involves the understanding of the application workload, change data volume and rate, transactional consistency, and understanding database system change numbers, logical system number, and checkpoint/recovery/restart procedures. TekStream leveraged CloudWatch alarms for monitoring disk space consumption on the DMS replication server.

As a shared server/resource running multiple large parallel data loads, tasks can cause low space issues on the DMS replication server that ultimately lead to DMS tasks failing and the Aurora PostgreSQL database log reporting data truncation, EOF data record not found, and CSV file and their data rows mismatching the target table columns. For example, source ‘date’ data being loaded into target ‘number’ data.

Data validation is a manual effort and a significant time commitment. The DMS API does support ‘validation only’ tasks, but these perform full data counts on both source and target tables, incur additional load and require significant time to run against a 6 billion row source and target table (i.e. several hours). Enabling ‘data validation’ in the DMS task to migrate the data was not a viable option since it performs a full data count of both the source and target table “*after*” the task has been completed. (i.e. double the work). Leveraging the DMS API to describe the task and pull data row counts of the target tables was an option, but you still need to compare to source table row counts.