Estimated to cost avoid $500K annually for the DHA by switching from Teradata to a hybrid solution including Cloud Native Services (CNS), Amazon Redshift, and Simple Storage Service (S3), the Enterprise Data Warehouse (EDW) team was tasked by EIDS PMO to migrate the Military Health System (MHS) Information Platform (MIP) Data Warehouse in September 2020 with the effort to begin in November 2020. This task included migrating 45 TB of data from Teradata to Amazon Redshift and 70 Extract, Transform, Load (ETL) jobs from Informatica to Talend in less than a year. Immediately, the EDW team implemented a “team of teams” orchestration approach and worked hand in hand with EIDS, NIWC, Amazon Web Services (AWS) and Talend support, Datacenter Operations (DCOPS), Information Assurance (IA)/Cybersecurity, and Shared Services teams to re-architect and implement this next generation data warehouse for the MIP. Funded under sustainment dollars, the same team responsible for maintaining an operational data warehouse platform on Teradata and Informatica were the same resources responsible for the data warehouse migration.
Data Warehouse Migration
During this time, the team partnered with AWS and EIDS to provide Redshift training for over 120 users, developing training materials and a post assessment training course. The EDW team also collaborated with the AWS technical team to leverage the AWS S3 copy command method to import data from S3 buckets into Redshift. This effort reduced migration time by approximately 20 minutes per one million rows of data compared to moving data through Java Database Connection (JDBC) alternatives. The Multiplatform Data Acquisition, Collection and Analytics (MDACA) Cloud Storage Explorer was implemented to expedite data processing into Redshift from S3 buckets.
Many challenges surfaced while undertaking this historic effort. Approximately 200 user trouble tickets related to the migration were resolved. The team implemented several workarounds to mitigate and overcome several unexpected risks and obstacles over the duration of the migration effort to meet project deadlines while sustaining the production environment. One early workaround occurred when the Database Administration (DBA) Team manually migrated 150 database schemas after a multitude of bugs were identified with the Schema Conversion Tool (SCT). Another workaround to leverage linked servers between SQL Farm and Redshift was creating additional service accounts for analytic Business Intelligence (BI) tools such as Tableau and SAS to mitigate a bug identified with kerberos authentication in the native Other Database Connection (ODBC) driver. During this effort, we discovered a tool available in Teradata that parsed the HL7 messages for MHS GENESIS data was not available in Redshift. Our mitigation leveraged the MDACA Big Data Virtualization solution to parse the messages, enabling continued support of the high priority files for COVID reporting and continued successful support of the production requirement.
Though several resources were pulled during this migration effort for higher priorities, such as COVID Vax Reporting and the Pandemic Response Registry, the team remained flexible and adapted accordingly to ensure all tasks were completed on time. With the team of teams approach and close stakeholder engagement, the migration was completed in September 2021, taking approximately ten months to complete while simultaneously supporting the production environment without any downtime.