How To Copy Multiple Files From S3 To Azure Data Lake Storage
In this tutorial, I'm going to show you how to copy multiple files automatically from AWS S3 to Azure Data Lake Storage using JSCAPE MFT Server.
I'm going to assume you already have an Azure Data Lake trading partner and an Amazon S3 trading partner. If you don't know how to create these, read the blog posts (or watch the videos on those posts):
How To Push Files From Local To Azure Data Lake Based On An Event
How To Connect and Upload Files To an Amazon S3 Trading Partner
Once you have those two trading partners ready, you can add a trigger that copies multiple files from the S3 trading partner to the Azure Data Lake trading partner.
Go to the Triggers module and click the Add button to add a new trigger.
You'll then be given the option to choose a trigger template that best describes your desired workflow. Let's just skip this part for now and click OK.
Give this trigger a name, say, 'copy multiple files from s3 to azure data lake storage'. After that, choose an event type that you want this trigger to listen for. We want this trigger to fire at a particular time, so we just choose the Current Time event type.
Click Next to proceed.
We want this trigger to fire at 11:30 PM every day, so we create the expression for that using the Expression Builder. If you don't know how to use the Expression Builder yet, read the post:
Introducing the New Trigger Conditions Expression Builder
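If it helps to think of the schedule in code, the condition boils down to a simple clock check. The snippet below is a Python sketch of that logic only. It is not the Expression Builder's syntax, and the function name should_fire is hypothetical.

```python
from datetime import datetime


def should_fire(now: datetime) -> bool:
    """Return True when the clock reads 11:30 PM (23:30), on any date."""
    return now.hour == 23 and now.minute == 30


# The date is irrelevant; only the hour and minute are checked.
print(should_fire(datetime(2024, 1, 15, 23, 30)))  # True
print(should_fire(datetime(2024, 1, 15, 11, 30)))  # False (11:30 AM)
```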
Once you have your expression ready, click the Next button.
We're now ready to add the trigger action that would ultimately copy multiple files from your S3 trading partner to your Azure Data Lake trading partner.
Click the Add button to add a new trigger action.
Next, expand the Action drop-down list and then select Trading Partner Synchronization. After that, click OK.
Let me now walk you through the key settings for this Trading Partner Synchronization action.
The first ones you'll encounter are PartnerA and PartnerB. PartnerA is the source: the trading partner that files are copied from. In our case, that's the AWS S3 trading partner, 'tp-s3'.
PartnerB, on the other hand, is the target: the trading partner that files are copied to. In our case, that's the Azure Data Lake trading partner, 'tp - azure data lake'.
Next up are PathA and PathB. PathA is the relative directory path on PartnerA that files are copied from, and PathB is the relative directory path on PartnerB that files are copied to.
In our case, PathA is 'jscapejohn/folder1' and PathB is 'jscape1/folder1', wherein:
- jscapejohn is a bucket in AWS S3 and folder1 is a folder inside that bucket
- jscape1 is a data lake storage in Azure Data Lake and folder1 is a folder inside it
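To make the path layout concrete, here is a small Python sketch that splits each path into its first segment (the bucket or data lake storage) and the folder inside it. The helper name split_path is hypothetical and only illustrates how the two example paths are read.

```python
def split_path(path: str) -> tuple[str, str]:
    """Split 'container/folder' into (container, folder) at the first slash."""
    container, _, folder = path.partition("/")
    return container, folder


print(split_path("jscapejohn/folder1"))  # ('jscapejohn', 'folder1')
print(split_path("jscape1/folder1"))     # ('jscape1', 'folder1')
```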
The next setting we need to specify is the Copy Condition. This is the condition JSCAPE MFT Server will use to determine whether to commence copying (or synchronizing) files each time the predefined schedule of this trigger is up.
If you select:
- different time, JSCAPE MFT Server will commence copying if it sees that file timestamps on A are different from the ones on B;
- different size, JSCAPE MFT Server will commence copying if it sees that file sizes on A are different from the ones on B;
- different content, JSCAPE MFT Server will commence copying if it sees that the content in A is different from the content in B.
Let's just choose different time for now.
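The three copy conditions can be pictured as three different comparisons between a file on A and its counterpart on B. The following Python sketch is only an illustration of that idea. The FileInfo class and the differs function are hypothetical, and the tutorial does not say how JSCAPE MFT Server compares content internally, so a plain byte comparison is assumed here.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    modified: float  # last-modified timestamp, in seconds since the epoch
    size: int        # size in bytes
    content: bytes   # file content (assumed small enough to hold in memory)


def differs(a: FileInfo, b: FileInfo, condition: str) -> bool:
    """Return True if the file on A differs from the file on B under the condition."""
    if condition == "different time":
        return a.modified != b.modified
    if condition == "different size":
        return a.size != b.size
    if condition == "different content":
        return a.content != b.content  # assumption: direct byte comparison
    raise ValueError(f"unknown copy condition: {condition}")


# Same content and size, but the copy on B has a different timestamp.
on_a = FileInfo(modified=1700000000.0, size=5, content=b"hello")
on_b = FileInfo(modified=1700000500.0, size=5, content=b"hello")

print(differs(on_a, on_b, "different time"))     # True
print(differs(on_a, on_b, "different size"))     # False
print(differs(on_a, on_b, "different content"))  # False
```

Under these assumptions, a file with identical content but a different timestamp would be copied with different time and skipped with the other two conditions.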
Another setting you need to specify is the Synchronization Mode. There are four options:
- mirror - New and modified files from A are copied to B; redundant files in B will be deleted
- synchronize - New and modified files from both paths are copied to each other
- backup - All files from A are copied to B
- contribute - New and modified files from A are copied to B
Based on the descriptions given and your particular use case, you'll likely have to choose between backup and contribute. I'm going to choose contribute, so my trigger is only going to transfer files from A that are new and/or modified.
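To compare the four modes side by side, here is a Python sketch that models each directory as a dictionary of file names and timestamps and returns what each mode would do. This is a simplified model built from the descriptions above, not JSCAPE MFT Server's implementation. The plan_sync function and the sample file names are hypothetical, "redundant" is read as "present in B but not in A", and for synchronize the sketch assumes the newer timestamp wins when a file exists on both sides.

```python
def plan_sync(a: dict[str, int], b: dict[str, int], mode: str) -> dict[str, list[str]]:
    """Plan a sync run. a and b map file name -> last-modified timestamp."""
    new_or_modified_in_a = sorted(n for n, t in a.items() if n not in b or t != b[n])
    plan = {"copy_a_to_b": [], "copy_b_to_a": [], "delete_from_b": []}

    if mode == "backup":
        plan["copy_a_to_b"] = sorted(a)  # every file on A, changed or not
    elif mode == "contribute":
        plan["copy_a_to_b"] = new_or_modified_in_a
    elif mode == "mirror":
        plan["copy_a_to_b"] = new_or_modified_in_a
        plan["delete_from_b"] = sorted(n for n in b if n not in a)
    elif mode == "synchronize":
        # Assumption: when a file exists on both sides, the newer copy wins.
        plan["copy_a_to_b"] = sorted(n for n, t in a.items() if n not in b or t > b[n])
        plan["copy_b_to_a"] = sorted(n for n, t in b.items() if n not in a or t > a[n])
    else:
        raise ValueError(f"unknown synchronization mode: {mode}")
    return plan


path_a = {"report.csv": 200, "new.csv": 100, "same.csv": 50}
path_b = {"report.csv": 150, "same.csv": 50, "old.csv": 10}

for mode in ("mirror", "synchronize", "backup", "contribute"):
    print(mode, plan_sync(path_a, path_b, mode))
```

In this model, contribute copies only new.csv and report.csv, backup also recopies the unchanged same.csv, and mirror additionally deletes old.csv from B.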
Lastly, you need to specify the Result Directory. This is where the results of the copying process will be written.
Click OK and then drag an arrow from the Start output of the Workflow node to the Trading Partner Synchronization Action node.
Recommended read: Introducing the Redesigned Trigger Action Workflow
Click OK to finalize the trigger creation process.
That's it. Now you know how to configure JSCAPE MFT Server so that you can copy multiple files from AWS S3 to Azure Data Lake Storage.
Try this yourself
Would you like to try this yourself? Download the FREE, fully-functional Starter Edition of JSCAPE MFT Server now.
Download JSCAPE MFT Server Trial