Azure Load Delta Lake

    Article Summary

    This article is specific to the following platform: Delta Lake.

    Azure Blob Storage Load

    The Azure Blob Storage Load component lets users load data into an existing table from objects stored in Azure Blob Storage.

    Azure Blob Storage is used for storing large amounts of unstructured object data, such as text or binary data.

    To learn more, read Blob storage.

    Properties

    Delta Lake Properties

    Name (String): A human-readable name for the component.
    Storage Account (Select): Select an Azure Blob Storage account. An Azure storage account contains all of your Azure Storage data objects: blobs, files, queues, tables, and disks. For more information, read Storage account overview.
    Blob Container (Select): Select a Blob Storage location. The available blob containers depend on the selected storage account.
    Pattern (String): A string that will partially match all filenames that are to be included in the load. Defaults to .*, indicating all files within the Azure Storage location.
    Catalog (Select): Select a Databricks Unity Catalog. The special value [Environment Default] will use the catalog specified in the Matillion ETL environment setup. The selected catalog determines which databases are available in the next property.
    Database (Select): Select the Delta Lake database. The special value [Environment Default] will use the database specified in the Matillion ETL environment setup.
    Target Table (Select): Select the table into which data will be loaded from Azure Blob Storage.
    Load Columns (Column Select): Select which of the target table's columns to load. Move columns to the right using the arrow buttons to include them in the load; columns on the left are excluded from the load.
    Recursive File Lookup (Boolean): When enabled, disables partition inference. To control which files are loaded, use the Pattern property instead.
    File Type (Select): Select the file type. Available types are AVRO, CSV, JSON, and PARQUET. The properties below change to reflect the selected file type.
    Skip Header (Boolean): (CSV only) When True, uses the first line as the column names. Default is False.
    Field Delimiter (Delimiting Character): (CSV only) Specify a delimiter to separate columns. The default is a comma (,). A TAB character can be specified as "\t".
    Date Format (String): (CSV and JSON only) Manually set a date format. If none is set, the default is yyyy-MM-dd.
    Timestamp Format (String): (CSV and JSON only) Manually set a timestamp format. If none is set, the default is yyyy-MM-dd'T'HH:mm:ss.[SSS][XXX].
    Encoding Type (String): (CSV and JSON only) Decodes the files using the given encoding type. If none is set, the default is UTF-8.
    Mode (Select): Select the mode for handling corrupted records during parsing:
        DROPMALFORMED: ignores corrupted records.
        FAILFAST: throws an exception when it meets a corrupted record.
        PERMISSIVE: when a corrupted record is met, the malformed string is placed into a field configured by columnNameOfCorruptRecord, and malformed fields are set to null. This is the default setting.
    Ignore Leading White Space (Boolean): (CSV only) When True, skips any leading whitespace. Default is False.
    Ignore Trailing White Space (Boolean): (CSV only) When True, skips any trailing whitespace. Default is False.
    Infer Schema (Boolean): (CSV only) When True, infers the input schema automatically from the data. Default is False.
    Multi Line (Boolean): When True, parses records that may span multiple lines. Default is False.
    Null Value (String): (CSV only) Sets the string representation of a null value. The default value is an empty string.
    Empty Value (String): (CSV only) Sets the string representation of an empty value. The default value is an empty string.
    Primitive as String (Boolean): (JSON only) When True, primitive data types are inferred as strings. Default is False.
    Prefers Decimal (Boolean): (JSON only) When True, infers all floating-point values as a decimal type. If the values do not fit in decimal, they are inferred as doubles. Default is False.
    Allow Comments (Boolean): (JSON only) When True, allows Java/C++-style comments in JSON records. Default is False.
    Allow Unquoted Field Names (Boolean): (JSON only) When True, allows unquoted JSON field names. Default is False.
    Allow Single Quotes (Boolean): (JSON only) When True, allows single quotes in addition to double quotes. Default is True.
    Allow Numeric Leading Zeros (Boolean): (JSON only) When True, allows leading zeros in numbers, e.g. 00019. Default is False.
    Allow Backslash Escaping Any Character (Boolean): (JSON only) When True, accepts the quoting of all characters using the backslash quoting mechanism (\). Default is False.
    Allow Unquoted Control Chars (Boolean): (JSON only) When True, allows JSON strings to include unquoted control characters (ASCII characters with a value less than 32, including tab and line feed characters). Default is False.
    Drop Field If All Null (Boolean): (JSON only) When True, ignores columns of all-null values or empty arrays/structs during schema inference. Default is False.
    Merge Schema (Boolean): (AVRO and PARQUET only) When True, merges schemata from all part-files. Default is False.
    Path Glob Filter (String): An optional glob pattern, used to include only files with paths matching the pattern.
    Force Load (Boolean): When True, idempotency is disabled and files are loaded regardless of whether they have been loaded before. Default is False.
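
    The Pattern property takes a regular expression, while Path Glob Filter takes a glob pattern, so the two select files differently. A minimal Python sketch of the distinction (the filenames and patterns here are illustrative, not the component's internals):

```python
import re
from fnmatch import fnmatch

# Hypothetical filenames in a blob container.
files = ["sales_2023.csv", "sales_2024.csv", "inventory.json", "notes.txt"]

# Pattern: a regular expression. The default ".*" matches every filename.
pattern = r"sales_.*\.csv"
regex_matches = [f for f in files if re.fullmatch(pattern, f)]

# Path Glob Filter: a glob pattern, where "*" is the wildcard.
glob_matches = [f for f in files if fnmatch(f, "*.csv")]

print(regex_matches)  # ['sales_2023.csv', 'sales_2024.csv']
print(glob_matches)   # ['sales_2023.csv', 'sales_2024.csv']
```

    Note that a regex wildcard is `.*` while a glob wildcard is `*`; using one syntax in the other property will silently match the wrong set of files.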
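
    The three Mode values mirror Spark's parse modes for CSV and JSON sources. A simplified Python sketch of their behavior (the parser and records below are illustrative stand-ins; a "corrupt" record is modeled as one whose field count does not match the schema):

```python
# Illustrative sketch of DROPMALFORMED / FAILFAST / PERMISSIVE handling.
def parse_rows(rows, num_fields, mode="PERMISSIVE"):
    parsed = []
    for raw in rows:
        fields = raw.split(",")
        if len(fields) == num_fields:
            parsed.append({"fields": fields, "_corrupt_record": None})
        elif mode == "DROPMALFORMED":
            continue                      # silently ignore corrupted records
        elif mode == "FAILFAST":
            raise ValueError(f"Malformed record: {raw!r}")
        else:                             # PERMISSIVE (the default)
            parsed.append({"fields": [None] * num_fields,
                           "_corrupt_record": raw})
    return parsed

rows = ["1,alice", "2,bob", "3,charlie,EXTRA"]
print(parse_rows(rows, 2, "DROPMALFORMED"))  # keeps only the two valid rows
```

    Under PERMISSIVE, the bad row survives with its fields nulled and the raw text preserved in the corrupt-record field, which is why PERMISSIVE is the safe default for exploratory loads.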
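
    By default the load is idempotent: files that were loaded in a previous run are skipped, and Force Load disables that bookkeeping. A minimal Python sketch of the idea (the tracking set is purely illustrative, not how Matillion actually records load state):

```python
def files_to_load(candidates, already_loaded, force_load=False):
    """Return the files that should be loaded this run.

    With force_load=False (idempotent, the default), files seen in a previous
    run are skipped; with force_load=True every candidate is loaded again.
    """
    if force_load:
        return list(candidates)
    return [f for f in candidates if f not in already_loaded]

loaded = {"day1.csv"}
print(files_to_load(["day1.csv", "day2.csv"], loaded))  # ['day2.csv']
print(files_to_load(["day1.csv", "day2.csv"], loaded, force_load=True))
```

    Enabling Force Load is therefore the option to reach for when re-running a job over files that need to be reloaded, at the cost of possible duplicate rows in the target table.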