DynamoDB Load
    • Dark
      Light

    DynamoDB Load

    • Dark
      Light

    Article Summary

    This article is specific to the following platforms - Redshift.

    DynamoDB Load Component

    Load data into an existing table from objects stored in Amazon DynamoDB. This uses an innate ability of DynamoDB to push to Redshift, unlike the similar DynamoDB Query component that goes via a 3rd party driver. It is generally advised that for pushing full data sets, this component should be used whereas DynamoDB Query should be used for more selective data loads.

    Note: This component requires working AWS Credentials with read access to the DynamoDB tables you want to read from. This is easily achieved by attaching an IAM role to the instance when launching Matillion ETL for Redshift, however it can also be managed manually by editing an Environment.


    Properties

    Redshift Properties

    PropertySettingDescription
    For more information on all the settings in this component, see the Amazon Redshift COPY from Amazon DynamoDB syntax.
    NameTextA human-readable name for the component.
    SchemaSelectSelect the table schema. The special value, [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see this article.
    Target TableSelectSelect an existing table to load data into.
    Load ColumnsSelect MultipleOne or more columns that exist in the target table. Since Redshift will attempt to match the fields found in the DynamoDB records to existing columns names, this parameter exists so that you can ignore target fields, even if the incoming data had populated them.
    DynamoDB TableTextEnter the name of the DynamoDB table to read from.
    Read RatioNumberThe percentage of the DynamoDB table's provisioned throughput to use for the data load. To understand the trade-offs in changing this number, please see the Amazon Redshift documentation.
    Explicit IDsSelectWhether or not to load data from the S3 Objects into an IDENTITY column. See the Redshift documentation for more information.
    RegionSelectThe Amazon S3 region hosting the S3 bucket. This is not normally required and can be left as "None" if the bucket is in the same region as your Redshift cluster.
    Maximum ErrorsTextThe maximum number of individual parsing errors that cause the whole load to fail. Values up to this will be substituted as null values.This value defaults to 0, but the Amazon default is 1000.
    Date FormatTextDefaults to 'auto' - this can be used to manually specify a date format.
    Time FormatTextDefaults to 'auto' - this can be used to manually specify a time format.
    Accept Any DateSelectIf this is enabled, invalid dates such as '45-65-2018' are not considered an error, but will be loaded as the null value.
    Truncate ColumnsSelectIf this is set, any instance of data in the input file that is too long to fit into the specified target column width will be truncated to fit instead of causing an error.
    Trim BlanksSelectRemoves trailing and leading whitespace from the input data.
    Empty As NullSelectIf this is set, empty columns in the input file will become NULL.
    Blanks As NullSelectIf this is set, blank columns in the input file will become NULL.
    Comp UpdateSelectControls whether compression encodings are automatically applied during a COPY. This is usually a good idea to optimise the compression used when storing the data.
    Stat UpdateSelectGoverns automatic computation and refresh of optimizer statistics at the end of a successful COPY command.
    Round DecimalsSelectIf this option is set, round any decimals to fit into the column when the number of decimal places in the input data is larger than defined for the target column.

    What's Next