
Connecting to Azure Data Lake Storage Gen2

Import data

Overview

Azure Data Lake Storage Gen2 makes Azure Storage the foundation for building enterprise data lakes on Azure. Skypoint AI provides a built-in connector for importing data from Azure Data Lake Storage Gen2. This document guides you through connecting Azure Data Lake Storage Gen2 to Skypoint AI.

Prerequisite

You need the following details to configure and import data using the Azure Data Lake Storage Gen2 connector:
  • Storage account name
  • Account key
  • Storage path

Import data using Azure Data Lake Storage Gen2 connector

Follow the steps below to create a new dataflow for the Azure Data Lake Storage Gen2 import connector:
  1. Go to Dataflow > Imports.
  2. Click New dataflow.
The Set dataflow name page appears.
  3. On the Set dataflow name page, type the dataflow name in the Name text area.
  4. Click Next.
The Choose connector page appears.
:::note
  1. Azure Data Lake Storage Gen2: Use this to import each CSV file as an entity.
  2. Azure Data Lake Storage Gen2 v2: Use this to import multiple CSV files in a folder as a single entity.
:::
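
The difference between the two connectors in the note above can be illustrated with a small sketch. This is not Skypoint AI's implementation. The function names, the sample paths, and the choice to name entities after the file stem or the folder are assumptions made for illustration; the sketch only models how file paths would map to entities under each behavior the note describes.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def entities_per_file(paths):
    """Gen2 connector behavior from the note: each CSV file is its own entity.

    Entity names are taken from the file stem (an assumption for this sketch).
    """
    return {
        PurePosixPath(p).stem: [p]
        for p in paths
        if p.lower().endswith(".csv")
    }


def entities_per_folder(paths):
    """Gen2 v2 behavior from the note: CSV files in a folder form one entity.

    Entity names are taken from the folder name (an assumption for this sketch).
    """
    grouped = defaultdict(list)
    for p in paths:
        if p.lower().endswith(".csv"):
            grouped[PurePosixPath(p).parent.name].append(p)
    return dict(grouped)


# Hypothetical storage paths, for illustration only.
sample_paths = [
    "sales/2023.csv",
    "sales/2024.csv",
    "customers/customers.csv",
]

per_file = entities_per_file(sample_paths)      # three entities
per_folder = entities_per_folder(sample_paths)  # two entities
```

With the sample paths, the first function yields one entity per file, while the second groups the two files under `sales` into a single entity.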

To choose the Azure Data Lake Storage Gen2 connector

  1. On the Choose connector page, select the Azure Data Lake Storage Gen2 connector.
:::note
You can use the Search feature to find the connector. The Azure Data Lake Storage Gen2 connector is available under both the Analytics and Cloud categories.
:::
The Set dataflow name page appears.
  2. Type a Display Name for your dataflow in the text area.
  3. Type a Description for your dataflow in the text area.
  4. Click Next.
The Configuration page appears.

To configure Azure Data Lake Storage Gen2

Follow the steps below to configure the connection to Azure Data Lake Storage Gen2:
  1. Type the Storage account name in the text area.
  2. Type the Account key in the text area.
  3. Click the Folder icon in the Storage path text area.
Once you select your Storage path, the Table Details columns appear.
:::note
If you selected Azure Data Lake Storage Gen2 v2, the Table Details also show the Folder Name column.
:::
  4. Enter the Table Details to process the data.

| Item | Description |
| --- | --- |
| Purpose | Option to assign a purpose (Data or Metadata) for each table. Data loads customer data. Metadata loads metadata. |
| File Name | Displays the name of the file that you imported. |
| Folder Name | Displays the names of the folders present in the selected folder. (Applicable only for Azure Data Lake Storage Gen2 v2.) |
| Table Name | Displays the imported table name. |
| Datetime format | Lists the available datetime formats. By default, Skypoint AI detects the format automatically. |
| Delimiter | Displays the available separators for the fields in the imported data. |
| First Row as Header | Select the check box so that the system reads the first row as headers and collects the data accordingly. |
| Advanced Settings | Select options to fine-tune the import process. |

  5. If necessary, apply the Advanced settings to modify the default settings.
The Advanced settings pop-up appears.
:::note
Advanced settings allow you to modify the default settings and give you more flexibility for advanced use cases. However, the default settings are adequate to perform the task.
:::

| If you want to | Then |
| --- | --- |
| Modify data types such as fixed or variable data types | Select from Compression type. Compression reduces the size of the data by reducing the number of bits. |
| Change the delimiter | Click Row delimiter. By default, a column delimiter is selected, and each row is separated with a comma. |
| Change the character encoding | Choose from the Encoding list. By default, UTF-8 encoding is selected. |
| Modify the escape character, such as backslash (\) or slash (/) | Select from Escape character. |
| Apply a different quote character, such as single quote (') or double quote (") | Select from Quote character. |

  6. Click Save.
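
To make the Table Details and Advanced settings more concrete, the following sketch shows how a delimiter, quote character, escape character, encoding, and the First Row as Header option affect the way delimited text is parsed. It uses only Python's standard `csv` module and is not how Skypoint AI processes files. The function `read_table`, its parameter names, and the sample data are hypothetical.

```python
import csv
import io


def read_table(raw_bytes, *, encoding="utf-8", delimiter=",", quotechar='"',
               escapechar=None, first_row_as_header=True):
    """Parse delimited text with settings similar to those on the Configuration page.

    Returns a (header, rows) tuple. All names here are illustrative assumptions.
    """
    # Encoding: how the raw bytes are turned into text (UTF-8 by default).
    text = raw_bytes.decode(encoding)
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,    # separator between fields
        quotechar=quotechar,    # character that wraps fields containing the delimiter
        escapechar=escapechar,  # optional character that escapes special characters
    )
    rows = list(reader)
    if not rows:
        return [], []
    if first_row_as_header:
        return rows[0], rows[1:]
    # Without a header row, generate placeholder column names.
    header = [f"column_{i + 1}" for i in range(len(rows[0]))]
    return header, rows


# Hypothetical sample: semicolon-delimited data with single-quoted values.
sample = "id;name\n1;'Doe; Jane'\n2;'Roe; Rick'\n".encode("utf-8")
header, rows = read_table(sample, delimiter=";", quotechar="'")
```

In this sample the quote character keeps `Doe; Jane` together as one field even though it contains the delimiter. This is the kind of case where changing the default quote or escape character matters.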

Run, edit, and delete the imported data

After you save the connector, the Azure Data Lake Storage Gen2 connector appears in the list of tables on the Dataflow page.

| Item | Description |
| --- | --- |
| Name | Displays the name of the imported dataflow. |
| Type | Displays the connector type symbol. |
| Status | Indicates whether the data is imported successfully. |
| Tables Count | Displays the number of tables. |
| Created Date | Displays the date of creation. |
| Last refresh type | Displays the type of the last data refresh: Full or Incremental. |
| Updated Date | Displays the last modified date. |
| Last Refresh | Displays the latest refresh date, which updates each time you refresh the data. |
| Group by | Option to view the items in a specific group (for example, name, type, or status). |

  • Select the horizontal ellipsis in the Actions column and do the following:

| If you want to | Then |
| --- | --- |
| Modify the dataflow | Select Edit and modify the dataflow. Click Save to apply your changes. |
| Execute the dataflow | Select Run. |
| Bring the data to its previous state | Select Rollback. |
| Delete the dataflow | Select Remove and then click the Delete button. All tables in the data source are deleted. |
| See the run history of the dataflow | Select Run history. |

Setup guide

Overview

This document will help you gather all credentials for connecting Azure Data Lake Storage Gen2 with Skypoint AI.

Prerequisite

You must have an Azure account.

To create a storage account

Follow the steps below to create a storage account:
  1. Log in to Microsoft Azure.
  2. Go to Azure services > Storage accounts.
  3. Click Create.

Creating Storage account name

  1. Go to Create a storage account > Basics.
  2. Fill in the information to create a new storage account.
    • Type the desired Storage account name in the text area.
:::note
The name must be unique among all existing storage account names in Azure. It must be between 3 and 24 characters long and can contain only lowercase letters and numbers.
:::
    • On the Advanced tab, select Enable hierarchical namespace under the Data Lake Storage Gen2 section to create your ADLS Gen2 storage account.
    • Fill in the text areas under each tab to configure and create your storage account.
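
The naming rule in the note above (3 to 24 characters, lowercase letters and numbers only) can be checked locally before you submit the form. The sketch below is a minimal example; the function name is hypothetical, and it cannot verify that the name is unique across Azure, which only Azure itself can confirm.

```python
import re

# Rule from the note: 3 to 24 characters, lowercase letters and digits only.
_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if the name satisfies the length and character rules.

    Global uniqueness is not checked here.
    """
    return _NAME_PATTERN.fullmatch(name) is not None


# Hypothetical candidate names, for illustration only.
candidates = ["skypointdata01", "My-Account", "ab"]
results = {name: is_valid_storage_account_name(name) for name in candidates}
```

Here `My-Account` fails because of the uppercase letters and the hyphen, and `ab` fails because it is shorter than 3 characters.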

Finding the Account Key

Follow the steps below to find the account key for your storage account:
  1. Go to Storage account > Access keys.
  2. Click Show for the Key text area.
:::note
Copy this key and paste it into the Account key text area of the Azure Data Lake Storage Gen2 configuration form in Skypoint AI.
:::
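
For context on what the storage account name and account key represent on the Azure side, the sketch below builds the Data Lake Storage Gen2 endpoint URL and a standard Azure Storage connection string from them. Skypoint AI only asks for the name and key in its form, so this is illustrative rather than a required step. The sketch assumes the Azure public cloud endpoint suffix (`core.windows.net`), and the function names, environment variable names, and placeholder values are hypothetical.

```python
import os


def dfs_endpoint(account_name: str) -> str:
    """Data Lake Storage Gen2 (dfs) endpoint, assuming the Azure public cloud."""
    return f"https://{account_name}.dfs.core.windows.net"


def connection_string(account_name: str, account_key: str) -> str:
    """Azure Storage connection string built from the account name and key."""
    return (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        "EndpointSuffix=core.windows.net"
    )


# Read credentials from (hypothetical) environment variables instead of
# hard-coding them; placeholders are used when the variables are not set.
account_name = os.environ.get("ADLS_ACCOUNT_NAME", "examplestorage")
account_key = os.environ.get("ADLS_ACCOUNT_KEY", "<account-key>")

endpoint = dfs_endpoint(account_name)
```

The account key grants full access to the storage account, so treat it like a password: avoid committing it to source control or pasting it into shared documents.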