About #

This article shows how to run an event-driven pipeline in Azure Data Factory (ADF) to process SAP data that Xtract Universal has extracted into Azure Storage.

Xtract Universal is a universal SAP extraction platform that is used in this example to extract and upload SAP customer master data to Azure Storage.
An event then triggers an ADF pipeline to process the SAP parquet file, e.g., with Databricks. Xtract Universal supports different file formats for Azure Storage. This example uses Apache Parquet, a columnar file format that provides optimizations to speed up queries and is more efficient than CSV or JSON.

Target audience: Customers who utilize Azure Data Factory (ADF) as a platform for orchestrating data movement and transformation.

Note: The following sections describe the basic principles for triggering an ADF pipeline. Keep in mind that this is not a best-practice document or recommendation.

Prerequisites and Assumptions #

General overview #

Azure Storage

Xtract Universal extracts SAP data and loads it into Azure Storage as a parquet file. An Azure Storage event trigger then runs an ADF pipeline for further processing of the SAP file.

ADF Pipelines and Storage Event Triggers

The Master pipeline has an event trigger based on Azure Storage and calls a Child pipeline for further processing.

The Master pipeline has 2 activities:

  • write a log to an Azure SQL database. This step is optional.
  • call a Child pipeline to process the parquet file with Databricks.

This article focuses on the Master pipeline.

The Child pipeline processes the parquet file, e.g., with Databricks. The Child pipeline in this example is a placeholder.

Use Azure SQL for logging (optional)

In our scenario, the ADF pipeline has an activity that runs a stored procedure to log entries about the pipeline run in an Azure SQL table.

Step 1: Define the SAP Extraction in Xtract Universal #

Define an SAP extraction and set the destination to Azure Storage.
XU_Extraction

XU_Extraction_AzureDest1

In this example we use the storage account xtractstorage and a container called ke-container:
XU_Extraction_AzureDest1

XU_Extraction_AzureDest1

Step 2: Define the ADF pipelines #

Define 2 pipelines in ADF:

  • a master pipeline called ProcessBlogStorageFile and
  • a child pipeline called ProcessWithDataBricks

The Master pipeline contains two activities:
ADF_Pipeline

The first activity, sp_pipelinelog, executes an SQL stored procedure to write a log entry to an Azure SQL table. The second activity runs a dummy subpipeline. Both activities are out of the scope of this article and are not described in further detail.

Define the following parameters:

  • fileName: contains the file name in Azure Storage.
  • folderPath: contains the file path in Azure Storage.
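As a rough illustration of how the two pipelines and their parameters fit together, the following Python sketch assembles an ADF-style pipeline definition as a plain dictionary and prints it as JSON. It is not the downloadable template from this article. The linked service name AzureSqlLog, the stored procedure name, the activity name RunChild, the assumption that the child pipeline accepts the same two parameters, and the assumption that the child runs after the log step succeeds are all hypothetical.

```python
import json


def build_master_pipeline(name="ProcessBlogStorageFile",
                          child="ProcessWithDataBricks"):
    """Return an ADF-style pipeline definition as a plain dict (illustrative only)."""

    def param_ref(param):
        # ADF stores dynamic content as an Expression object.
        return {"value": f"@pipeline().parameters.{param}", "type": "Expression"}

    log_activity = {
        "name": "sp_pipelinelog",
        "type": "SqlServerStoredProcedure",
        # Hypothetical linked service and stored procedure names.
        "linkedServiceName": {"referenceName": "AzureSqlLog",
                              "type": "LinkedServiceReference"},
        "typeProperties": {"storedProcedureName": "[dbo].[sp_pipelinelog]"},
    }
    child_activity = {
        "name": "RunChild",  # hypothetical activity name
        "type": "ExecutePipeline",
        # Assumption: the child pipeline starts only after logging succeeded.
        "dependsOn": [{"activity": "sp_pipelinelog",
                       "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            "pipeline": {"referenceName": child, "type": "PipelineReference"},
            "waitOnCompletion": True,
            # Assumption: the child pipeline declares the same two parameters.
            "parameters": {"fileName": param_ref("fileName"),
                           "folderPath": param_ref("folderPath")},
        },
    }
    return {
        "name": name,
        "properties": {
            "parameters": {"fileName": {"type": "string"},
                           "folderPath": {"type": "string"}},
            "activities": [log_activity, child_activity],
        },
    }


master_pipeline = build_master_pipeline()
print(json.dumps(master_pipeline, indent=2))
```

Comparing the printed JSON with the code view of your own pipeline in ADF is a quick way to check that both parameters are declared and passed on to the child pipeline.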

Step 3: Define the Storage Event Trigger in the ADF Pipeline #

Define the trigger as follows:
ADF_Pipeline_Trigger00

Use the Storage account name and Container name defined in the Xtract Universal Azure Storage destination:
ADF_Pipeline_Trigger01

ADF_Pipeline_Trigger02

ADF_Pipeline_Trigger03
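For orientation, the settings made in the trigger dialog correspond roughly to a trigger definition like the one built by the following Python sketch. It is an assumption-laden illustration, not the downloadable template: the trigger name, the subscription and resource group placeholders, and the .parquet suffix filter are hypothetical. The storage account, container, and pipeline names are taken from this example.

```python
import json


def build_storage_event_trigger(subscription_id, resource_group,
                                storage_account="xtractstorage",
                                container="ke-container",
                                pipeline="ProcessBlogStorageFile"):
    """Return an ADF-style storage event trigger definition (illustrative only)."""
    # Resource ID of the storage account whose events the trigger listens to.
    scope = (f"/subscriptions/{subscription_id}"
             f"/resourceGroups/{resource_group}"
             f"/providers/Microsoft.Storage/storageAccounts/{storage_account}")
    return {
        "name": "trg_sap_parquet_created",  # hypothetical trigger name
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                "blobPathBeginsWith": f"/{container}/blobs/",
                # Assumption: only react to parquet files.
                "blobPathEndsWith": ".parquet",
                "ignoreEmptyBlobs": True,
                "scope": scope,
                "events": ["Microsoft.Storage.BlobCreated"],
            },
            "pipelines": [{
                "pipelineReference": {"referenceName": pipeline,
                                      "type": "PipelineReference"},
                # Map the trigger output to the pipeline parameters.
                "parameters": {
                    "fileName": "@triggerBody().fileName",
                    "folderPath": "@triggerBody().folderPath",
                },
            }],
        },
    }


trigger = build_storage_event_trigger("<subscription-id>", "<resource-group>")
print(json.dumps(trigger, indent=2))
```

Restricting the trigger to a container prefix and a file suffix keeps unrelated uploads to the same storage account from starting the pipeline.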

The event trigger provides the following parameters:

  • @triggerBody().fileName and
  • @triggerBody().folderPath

They are used as input parameters for the Master Pipeline.
ADF_Pipeline_Trigger03
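A downstream step usually needs the full location of the file, not the two values separately. The following minimal Python sketch combines them into a blob URL, assuming that the first segment of folderPath is the container name, as ADF storage event triggers report it. The function name blob_url and the file name KNA1.parquet are hypothetical.

```python
from urllib.parse import quote


def blob_url(storage_account, folder_path, file_name):
    """Build the HTTPS URL of a blob from the trigger's folderPath and fileName."""
    # Assumption: the first path segment is the container, the rest is a subfolder.
    container, _, subfolder = folder_path.strip("/").partition("/")
    blob_name = "/".join(part for part in (subfolder, file_name) if part)
    # quote() keeps "/" as a separator and escapes characters such as spaces.
    return (f"https://{storage_account}.blob.core.windows.net/"
            f"{container}/{quote(blob_name)}")


print(blob_url("xtractstorage", "ke-container", "KNA1.parquet"))
# prints https://xtractstorage.blob.core.windows.net/ke-container/KNA1.parquet
```

The same split into container and relative path applies if the child pipeline addresses the file through a different endpoint or URI scheme.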

Publish the pipeline.

Step 4: Run the SAP Extraction in Xtract Universal #

Run the extraction in Xtract Universal, e.g., using the Run dialog.
Run_Extraction

Step 5: Check the Azure Blob storage #

When the extraction is finished, check the Azure Storage.
Azure_Storage_Parquet

Step 6: Check the log table in Azure SQL #

The log table shows one entry each for the master and the child pipeline.
SQL_log

Step 7: Check Trigger and Pipeline Runs in ADF #

In ADF, you can also check the trigger and pipeline runs:
ADF_Trigger_Run

ADF_Pipeline_Run

Download JSON templates #

Here you can download the code of the master pipeline and the trigger in JSON format:
Download Trigger as json
Download MASTER pipeline as json