Vendor Data Migration
This document outlines the process for migrating Vendor master data from an on-premises system to a PostgreSQL database. The migration is facilitated using Azure Data Factory (ADF) and involves an intermediary step of storing data in a data lake. The migration process is divided into two main categories based on the type of tables: Reference Data Tables and Base Tables.
Migration Workflow
The migration process comprises the following main steps:
- Extraction of Data from On-Premises System to Data Lake
- Loading Data from Data Lake to PostgreSQL Database.
Each of these steps is handled separately for Reference Data Tables and BO Tables due to differences in the extraction methods.
1. Extraction of Data from On-Premises System to Data Lake Reference Data Tables: -> Pipeline Design:
- ADF Pipeline: Create a dedicated ADF pipeline to extract reference data tables.
- Source: On-premises database (e.g., SQL Server, Oracle).
- Destination: Azure Data Lake Storage (ADLS).
-> Steps:
- Views: Create views in onprem DB in MD schema with same name as table names in cloud.
- Linked Services: Configure linked services for both the on-premises database and ADLS.
- Datasets: Define datasets for the source (on-premises tables) and sink (ADLS).
- Copy Activity: Use the Copy Data activity in forEach loop (to make it dynamic) to transfer data from the on-premises system to ADLS.
- Schedule: Set up a trigger to schedule the pipeline run as required. Configuration Example:
BO Tables:
--TODO
2. Loading Data from Data Lake to PostgreSQL Database.
Reference Data Tables
- Pipeline Design:
- ADF Pipeline: Create a dedicated ADF pipeline for loading reference data tables into PostgreSQL.
- Source: Azure Data Lake Storage (ADLS).
- Destination: PostgreSQL Database.
- Steps:
- Linked Services: Configure linked services for both ADLS and PostgreSQL.
- Datasets: Define datasets for the source (ADLS files) and sink (PostgreSQL tables).
- Data flow: Use the Data flow in a forEach loop (to make it dynamic) with a lookup activity(to get the primary keys of each tables) to transfer data from ADLS to PostgreSQL. It will be an upsert dataflow.
- Mapping: Ensure proper mapping of columns between the source and destination.
- Schedule: Set up a trigger to schedule the pipeline run as required.
BO Tables:
--TODO
Was this page helpful?