Azure Load Delta Lake

    Article Summary

    This article is specific to the following platform: Delta Lake.

    Azure Blob Storage Load

    The Azure Blob Storage Load component lets users load data into an existing table from objects stored in Azure Blob Storage.

    Azure Blob Storage is used for storing large amounts of unstructured object data, such as text or binary data.

    To learn more, read Blob storage.

    Properties

    Delta Lake Properties

    Name (String): A human-readable name for the component.
    Storage Account (Select): Select an Azure Blob Storage account. An Azure storage account contains all of your Azure Storage data objects: blobs, files, queues, tables, and disks. For more information, read Storage account overview.
    Blob Container (Select): Select a Blob Storage location. The available blob containers depend on the selected storage account.
    Pattern (String): A string that will partially match all filenames that are to be included in the load. Defaults to .*, indicating all files within the Azure Storage location.
    Catalog (Select): Select a Databricks Unity Catalog. The special value [Environment Default] will use the catalog specified in the Matillion ETL environment setup. The selected catalog determines which databases are available in the next property.
    Database (Select): Select the Delta Lake database. The special value [Environment Default] will use the database specified in the Matillion ETL environment setup.
    Target Table (Select): Select the table into which data will be loaded from Azure Blob Storage.
    Load Columns (Column Select): Select which of the target table's columns to load. Move columns to the right using the arrow buttons to include them in the load; columns on the left are excluded from the load.
    Recursive File Lookup (Boolean): When enabled, disables partition inference. To control which files are loaded, use the Pattern property instead.
    File Type (Select): Select the file type. Available types are AVRO, CSV, JSON, and PARQUET. The properties below change to reflect the selected file type.
    Skip Header (Boolean): (CSV only) When True, uses the first line as the column names. Default is False.
    Field Delimiter (Delimiting Character): (CSV only) Specify a delimiter to separate columns. The default is a comma (,). A TAB character can be specified as "\t".
    Date Format (String): (CSV and JSON only) Manually set a date format. If none is set, the default is yyyy-MM-dd.
    Timestamp Format (String): (CSV and JSON only) Manually set a timestamp format. If none is set, the default is yyyy-MM-dd'T'HH:mm:ss.[SSS][XXX].
    Encoding Type (String): (CSV and JSON only) Decodes the files using the given encoding type. If none is set, the default is UTF-8.
    Mode (Select): Select the mode for handling corrupted records during parsing:
        DROPMALFORMED: ignores corrupted records.
        FAILFAST: throws an exception when it meets a corrupted record.
        PERMISSIVE: when a corrupted record is met, the malformed string is placed into a field configured by columnNameOfCorruptRecord, and malformed fields are set to null. This is the default setting.
    Ignore Leading White Space (Boolean): (CSV only) When True, skips any leading whitespace. Default is False.
    Ignore Trailing White Space (Boolean): (CSV only) When True, skips any trailing whitespace. Default is False.
    Infer Schema (Boolean): (CSV only) When True, infers the input schema automatically from the data. Default is False.
    Multi Line (Boolean): When True, parses records that may span multiple lines. Default is False.
    Null Value (String): (CSV only) Sets the string representation of a null value. The default value is an empty string.
    Empty Value (String): (CSV only) Sets the string representation of an empty value. The default value is an empty string.
    Primitive as String (Boolean): (JSON only) When True, primitive data types are inferred as strings. Default is False.
    Prefers Decimal (Boolean): (JSON only) When True, infers all floating-point values as a decimal type. If the values do not fit in decimal, they are inferred as doubles. Default is False.
    Allow Comments (Boolean): (JSON only) When True, allows Java/C++-style comments in JSON records. Default is False.
    Allow Unquoted Field Names (Boolean): (JSON only) When True, allows unquoted JSON field names. Default is False.
    Allow Single Quotes (Boolean): (JSON only) When True, allows single quotes in addition to double quotes. Default is True.
    Allow Numeric Leading Zeros (Boolean): (JSON only) When True, allows leading zeros in numbers, e.g. 00019. Default is False.
    Allow Backslash Escaping Any Character (Boolean): (JSON only) When True, accepts the quoting of all characters using the backslash quoting mechanism (\). Default is False.
    Allow Unquoted Control Chars (Boolean): (JSON only) When True, allows JSON strings to include unquoted control characters (ASCII characters with a value less than 32, including tab and line feed characters). Default is False.
    Drop Field If All Null (Boolean): (JSON only) When True, ignores columns of all-null values or empty arrays/structs during schema inference. Default is False.
    Merge Schema (Boolean): (AVRO and PARQUET only) When True, merges schemata from all part-files. Default is False.
    Path Glob Filter (String): An optional glob pattern, used to include only files with paths matching the pattern.
    Force Load (Boolean): When True, idempotency is disabled and files are loaded regardless of whether they have been loaded before. Default is False.
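
    The Pattern property takes a regular expression, while Path Glob Filter takes a glob pattern, so the two select files differently. A minimal Python sketch of the distinction (the filenames and patterns here are illustrative, not the component's internals):

```python
import re
from fnmatch import fnmatch

# Hypothetical filenames in a blob container.
files = ["sales_2023.csv", "sales_2024.csv", "inventory.json", "notes.txt"]

# Pattern: a regular expression. The default ".*" matches every filename.
pattern = r"sales_.*\.csv"
regex_matches = [f for f in files if re.fullmatch(pattern, f)]

# Path Glob Filter: a glob pattern, where "*" is the wildcard.
glob_matches = [f for f in files if fnmatch(f, "*.csv")]

print(regex_matches)  # ['sales_2023.csv', 'sales_2024.csv']
print(glob_matches)   # ['sales_2023.csv', 'sales_2024.csv']
```

    Note that a regex wildcard is `.*` while a glob wildcard is `*`; using one syntax in the other property will silently match the wrong set of files.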
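
    The three Mode values mirror Spark's parse modes for CSV and JSON sources. A simplified Python sketch of their behavior (the parser and records below are illustrative stand-ins; a "corrupt" record is modeled as one whose field count does not match the schema):

```python
# Illustrative sketch of DROPMALFORMED / FAILFAST / PERMISSIVE handling.
def parse_rows(rows, num_fields, mode="PERMISSIVE"):
    parsed = []
    for raw in rows:
        fields = raw.split(",")
        if len(fields) == num_fields:
            parsed.append({"fields": fields, "_corrupt_record": None})
        elif mode == "DROPMALFORMED":
            continue                      # silently ignore corrupted records
        elif mode == "FAILFAST":
            raise ValueError(f"Malformed record: {raw!r}")
        else:                             # PERMISSIVE (the default)
            parsed.append({"fields": [None] * num_fields,
                           "_corrupt_record": raw})
    return parsed

rows = ["1,alice", "2,bob", "3,charlie,EXTRA"]
print(parse_rows(rows, 2, "DROPMALFORMED"))  # keeps only the two valid rows
```

    Under PERMISSIVE, the bad row survives with its fields nulled and the raw text preserved in the corrupt-record field, which is why PERMISSIVE is the safe default for exploratory loads.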
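
    By default the load is idempotent: files that were loaded in a previous run are skipped, and Force Load disables that bookkeeping. A minimal Python sketch of the idea (the tracking set is purely illustrative, not how Matillion actually records load state):

```python
def files_to_load(candidates, already_loaded, force_load=False):
    """Return the files that should be loaded this run.

    With force_load=False (idempotent, the default), files seen in a previous
    run are skipped; with force_load=True every candidate is loaded again.
    """
    if force_load:
        return list(candidates)
    return [f for f in candidates if f not in already_loaded]

loaded = {"day1.csv"}
print(files_to_load(["day1.csv", "day2.csv"], loaded))  # ['day2.csv']
print(files_to_load(["day1.csv", "day2.csv"], loaded, force_load=True))
```

    Enabling Force Load is therefore the option to reach for when re-running a job over files that need to be reloaded, at the cost of possible duplicate rows in the target table.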