Git Integration with Matillion ETL

    Overview

    This article explores the architecture of Matillion ETL's Git integration feature and documents actions including commit, create branch, merge, push, fetch, and more.


    This article is part of a series of technical documentation covering the Git integration feature within Matillion ETL.

    Git is an Enterprise Mode feature within Matillion ETL. To learn more, read Enterprise Mode.


    Getting started

    When using the Git version control feature in Matillion ETL, it helps to understand the underlying architecture and the associated technical terms. Six components are involved:

    1. A Matillion ETL project. The project is the top-level structure containing jobs and other collateral within Matillion ETL. Each project is isolated, and user access can be granted or denied on a per-project basis.
    2. A Matillion ETL version. A project can contain more than one version. When used with Git, think of a version as an independent working area. Each version points to a single Git commit in the local Git repository.
    3. The local Git repository (repo). The local repository is created automatically when a project is Git-enabled, and stores its files on the Matillion ETL instance's filesystem.
    4. The remote Git repository (repo). A self-hosted or cloud-hosted Git repository that is external to Matillion ETL, and which you must set up prior to Git-enablement in Matillion ETL. Users can push local repository commits to their remote repository. Users can also fetch newer commits from the remote repository into the local repository.
    5. Commit. A commit is a point-in-time copy of a Matillion ETL version, including the collateral stored in that version, such as orchestration and transformation jobs.
    6. Branch. A branch is a collection of one or more commits in Git. Typically, a Git project will have a master branch, from which other branches will be created to develop and test code. A branch model typically allows users to develop new code without adding questionable code to the master branch. Once the code has undergone testing, it can be merged safely into the master branch.

    Git integration model

    The above diagram includes an example project, project_Dev. Within this project are three separate versions. Each of these versions might, for example, belong to an individual developer in a development team.

    Shown to the right of the project is the local repository, which contains two branches. There is the master branch (Branch Master), and an additional branch (Branch Feature_1). Within each of these branches are three commits, and the diagram shows via the shorter white arrows which version is pointing at which commit.

    Shown to the right of the local repository is the remote repository, which contains a backup copy of the local repository. However, note that the remote repository is missing Commit 3 from Branch Feature_1. This simply means that the local repository's changes require a push to the remote repository, at which point Commit 3 of Branch Feature_1, which is being developed in Version ver_z, will be backed up in the cloud-hosted remote repository.
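
    The state described above, where the local repository holds a commit that the remote repository lacks, can be reproduced with the Git command line. The following is a minimal sketch for illustration only, not a description of how Matillion ETL works internally. It assumes git is installed, uses a temporary bare repository as a stand-in for the remote, and the file name job.json and the identity values are hypothetical.

```shell
# Illustrative only: a throwaway bare repository stands in for the remote.
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"
git init --quiet "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master        # name the first branch master
git config user.name "Alice"                   # hypothetical identity
git config user.email "alice@example.com"
git remote add origin "$work/remote.git"

echo "v1" > job.json                           # hypothetical job file
git add job.json
git commit --quiet -m "Commit 1"
git checkout --quiet -b Feature_1
echo "v2" > job.json
git commit --quiet -am "Commit 2"
git push --quiet origin master Feature_1       # remote now matches local

echo "v3" > job.json
git commit --quiet -am "Commit 3"              # exists only in the local repository

# Count the commits on Feature_1 that the remote-tracking branch lacks.
unpushed=$(git rev-list --count origin/Feature_1..Feature_1)
echo "Unpushed commits on Feature_1: $unpushed"
```

    A non-zero count indicates that a push is still needed before the remote repository holds a full backup of the branch.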

    Note

    Matillion ETL's Git integration feature does not support multi-factor authentication (MFA) at this time.


    User actions

    The rest of this article documents what actions a user can take when using Git in Matillion ETL.

    For many of the sections below, the screenshots reference an example workflow for a fictitious development team, focusing on a master branch and one branch each for two developers, Alice and Bob. The master, Alice_Branch, and Bob's_Branch branches each have their own Matillion ETL version. Alice's and Bob's versions serve as independent working areas for their development work, which is merged once it has been tested and approved.


    Initializing Git

    In the top-left of the Matillion ETL user interface, click Project, then navigate down and click Git.

    When performing this action for the first time, you will have two options:

    • Init local repository: Select this option to initialize a local Git repository and connect a Matillion ETL project to Git for the first time.
    • Clone remote repository: Select this option to connect a new Matillion ETL project to an existing remote Git repository, copying the commits and branches from the remote repository into a local repository. See Clone remote repository, below, for how to perform this action.

    For the purposes of this example, we select Init local repository. We click OK to confirm this action and commit the current state of this Matillion ETL project to what will become our master branch.

    Init local repository

    After this, the Git Integration dialog opens. Because this is our first interaction with this screen since we initialized the local repository, we currently only have one commit, labelled "Initial commit". This first commit belongs to our master branch, which is currently our only branch. This dialog also shows Author and Date for the commit. Various actions can be performed in the dialog, described in the following sections of this article.

    Git Integration dialog
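
    For readers more familiar with the Git command line, the result of Init local repository resembles the sketch below: a new repository whose master branch holds a single commit with an author and a date. This is an approximation under stated assumptions, not Matillion ETL's own implementation. The directory name, file name, and identity values are hypothetical.

```shell
# Illustrative only: initialize a repository and record its first commit.
work=$(mktemp -d)
git init --quiet "$work/project_Dev"
cd "$work/project_Dev"
git symbolic-ref HEAD refs/heads/master             # make master the initial branch
git config user.name "Alice"                        # hypothetical identity
git config user.email "alice@example.com"

echo '{"name": "example_job"}' > example_job.json   # hypothetical job file
git add .
git commit --quiet -m "Initial commit"

commit_count=$(git rev-list --count HEAD)
current_branch=$(git rev-parse --abbrev-ref HEAD)
git log --format='%an | %ad | %s'                   # author, date, and message
```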


    Clone remote repository

    You can clone a repository for use with Matillion ETL using either SSH or HTTPS for the connection. Follow the steps for the chosen method:

    SSH

    1. Click Project and then click Git.
    2. Click Clone remote repository.
    3. In the Clone Repository dialog, paste the SSH URL of your remote repository into the Remote URI field. Matillion ETL will automatically update the dialog to display the appropriate input fields for the SSH connection.
    4. Paste a valid SSH private key into the Private Key field.
    5. Input the passphrase associated with the private key into the Passphrase field.
    6. Select an encryption type from the Encryption Type field.
    7. If you selected KMS as the encryption type, you must also select a KMS Master Key.
    8. Click OK.

    Clone repository using SSH

    9. The Remove existing jobs? dialog will ask whether you are ready to confirm the cloning of your remote repository. If you click Yes, all existing jobs on the Matillion ETL instance will be deleted.

    HTTPS

    1. Click Project and then click Git.
    2. Click Clone remote repository.
    3. In the Clone Repository dialog, paste the HTTPS URL of your remote repository into the Remote URI field. Matillion ETL will automatically update the dialog to display the appropriate input fields for the HTTPS connection.
    4. Input the Username and Password of your repository host account (for example, Bitbucket).
    Note
    • If you're using GitHub, use the Password text field for a GitHub personal access token, as Matillion ETL does not support GitHub passwords. To learn how to create a new token, read Creating a personal access token.
    • If you're using Bitbucket, you must create an app password as the connection password. To create an app password, follow these instructions, taken directly from Bitbucket.org/blog:
      1. From your profile and settings avatar, select Personal settings.
      2. Select App passwords under Access management.
      3. Select Create app password.
      4. Give the app password a name related to the application that will use the password.
      5. Select the specific access and permissions you want to assign to this application password.
      6. Copy the generated password and either record or paste it into the application you want to give access. The password is only displayed this one time.
    5. Select an encryption type from the Encryption Type field.
    6. If you selected KMS as the encryption type, you must also select a KMS Master Key.
    7. Click OK.

    Clone repository using HTTPS

    8. The Remove existing jobs? dialog will ask whether you are ready to confirm the cloning of your remote repository. If you click Yes, all existing jobs on the Matillion ETL instance will be deleted.
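
    Cloning copies the commits and branches of the remote repository into a new local repository. The sketch below shows the same idea with the Git command line. It is illustrative only: a temporary bare repository stands in for the remote so that the example runs without network access, and all names are hypothetical. In practice the URI would be the SSH or HTTPS address of your repository host.

```shell
# Illustrative only: build a small stand-in "remote", then clone it.
work=$(mktemp -d)
git init --quiet "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"                   # hypothetical identity
git config user.email "alice@example.com"
echo "v1" > job.json                           # hypothetical job file
git add job.json
git commit --quiet -m "Initial commit"
git branch Feature_1
git clone --quiet --bare "$work/seed" "$work/remote.git"

# Clone the remote; every remote branch becomes visible locally.
git clone --quiet "$work/remote.git" "$work/clone"
cd "$work/clone"
remote_branches=$(git branch -r | grep -c -v 'HEAD')
echo "Remote branches copied: $remote_branches"
```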

    Commit

    A commit captures a snapshot of currently staged changes to the Git integration project.

    The example in this section will add a commit featuring work added to the project by another team member.

    1. Create a new version by clicking Project then Manage Versions to open the Manage Versions dialog, and then clicking the + button. To learn more, read Manage Versions.
    2. Assign the new version a Name and ensure the Lock Version option is not selected. Click OK to create the version.

    Create Version

    3. In the Manage Versions dialog, click the icon in the Switch Version column to switch to the new version. Click OK to confirm.
    4. Perform another commit by clicking Project then Git to open the Git Integration dialog.
    5. In the Git Integration dialog, click the commit button, which is located at the bottom-left of the dialog as shown in the following image. This opens the Commit dialog.

    Git integration - commit

    6. In the Commit dialog, you can select your branch from the Branch Name drop-down, or create a new branch by typing a new, unique, branch name into the field. In our current example, the only available branch is master so we will enter a new branch name, Alice_Branch. The changes we are going to commit will sit under this branch.
    7. All changes that are available to commit are listed in the Changes box. Use the checkboxes to select any or all of the changes to include in the current commit. In this example there are three changes and we have selected all of them. Note that you can commit changes to version information as well as changes to jobs:

    Commit - select changes

    8. Enter a commit message in the Commit Information field, and your username and email address in the fields below it. These are mandatory fields. By default your Matillion ETL username is used, but you can change this to any other name if required. Then click OK to confirm the commit.
    Note

    You can't check out a commit while a related orchestration or transformation job is running in the same version. You must switch to a different version to run the job.

    9. Upon confirmation, you are returned to the Git Integration dialog, and you will see a second commit in your structure, as shown below. In our example, the commit is on the Alice_Branch branch. The hollowed-out commit circle, which in this case belongs to our newest commit, highlights the currently active commit. To switch commit, click the arrow to the right of the branch you want to switch to, and click Yes to confirm.

    Commit - commit shown on new branch

    Note

    When you switch commit, Matillion ETL will change the current version's Description value to whatever the value was at the time of the Git commit.
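
    The commit steps above have close counterparts on the Git command line: ticking checkboxes in the Changes box is comparable to staging selected files, and the Username and Email fields are comparable to setting the commit author. The sketch below is an analogy under those assumptions rather than a description of Matillion ETL's internals. The file names and identities are hypothetical.

```shell
# Illustrative only: commit a chosen subset of changes to a new branch.
work=$(mktemp -d)
git init --quiet "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Admin"                   # hypothetical default identity
git config user.email "admin@example.com"
echo "v1" > job_a.json                         # hypothetical job files
echo "v1" > job_b.json
git add .
git commit --quiet -m "Initial commit"

# Three changes are available, but only two are selected for the commit.
git checkout --quiet -b Alice_Branch
echo "v2" > job_a.json
echo "v2" > job_b.json
echo "v1" > job_c.json
git add job_a.json job_c.json                  # comparable to ticking two checkboxes

# Override the default author, as the Username and Email fields allow.
git commit --quiet -m "Add Alice's changes" --author="Alice <alice@example.com>"

author=$(git log -1 --format='%an')
left_over=$(git status --porcelain | wc -l | tr -d ' ')
echo "Author: $author, changes still uncommitted: $left_over"
```

    The change that was not selected (job_b.json) stays in the working area and remains available for a later commit.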


    Create branch

    To create a new branch from the Git Integration dialog:

    1. Click the button indicated in the following image:

    Create branch

    2. Name the new branch, and click OK.

    The following image illustrates all three Git branches in our example Git project (Note: a commit has been made on Bob's_Branch).

    Branch illustration
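
    As a rough command-line comparison, creating a branch adds a new name that points at an existing commit, without changing which branch is active. The sketch below assumes git is installed and uses hypothetical file and identity values. It illustrates the Git concept only.

```shell
# Illustrative only: create two branches from master.
work=$(mktemp -d)
git init --quiet "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"                   # hypothetical identity
git config user.email "alice@example.com"
echo "v1" > job.json                           # hypothetical job file
git add job.json
git commit --quiet -m "Initial commit"

git branch Alice_Branch
git branch "Bob's_Branch"

branch_count=$(git for-each-ref refs/heads | wc -l | tr -d ' ')
active=$(git rev-parse --abbrev-ref HEAD)
echo "Branches: $branch_count, active branch: $active"
```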


    Checkout branch

    The hollowed-out circle to the left of the branch indicates the currently active branch. In this example, Alice_Branch is active:

    Currently active branch

    To switch to a different branch, click the Checkout arrow to the right of the branch you want to switch to, and click Yes to confirm.

    Note

    If you create and commit to a new branch, and then check out the master branch, the version Description is changed to Default Version. You can change this description in the Manage Versions dialog, if required. If you do change it, you can commit the change as illustrated in step 7 of Commit, above.
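
    In command-line terms, checking out a branch replaces the contents of the working area with the state recorded in that branch's latest commit. The sketch below demonstrates this with a hypothetical file, job.json, whose content differs between two branches. It is an illustration of the Git concept, not of Matillion ETL's internal behavior.

```shell
# Illustrative only: the working area follows the checked-out branch.
work=$(mktemp -d)
git init --quiet "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"                   # hypothetical identity
git config user.email "alice@example.com"
echo "v1" > job.json                           # hypothetical job file
git add job.json
git commit --quiet -m "Initial commit"

git checkout --quiet -b Alice_Branch
echo "v2" > job.json
git commit --quiet -am "Update job on Alice_Branch"

git checkout --quiet master
on_master=$(cat job.json)
git checkout --quiet Alice_Branch
on_alice=$(cat job.json)
active=$(git rev-parse --abbrev-ref HEAD)
echo "master: $on_master, Alice_Branch: $on_alice, active: $active"
```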


    Merge

    Matillion ETL allows users to merge branches. When performing a merge, one branch's commit is merged into the current branch. Performing this action creates a new commit, and will switch the current Matillion ETL version to the new commit.

    To learn more about a merge in Git, read git-merge.

    Note

    Normally, you can't merge changes and switch to another branch while a job is still running on the current branch. However, you can change this default behavior if you wish. See Switch commits while a job is running, below, for details.

    1. To begin performing a merge, click the merge symbol next to the branch you want to merge, as indicated in the following image.

    Merge

    2. In the Merge dialog, complete the following fields:
      • Merge to: Select which branch to merge the commit into. You can choose to merge to a branch that is not the currently selected branch.
      • Ours: This field displays the latest commit of the branch to merge into, including commit ID, branch name, author name, author email address, and timestamp. You can't edit this field.
      • Theirs: This field displays the latest commit of the branch to merge from, including commit ID, branch name, author name, author email address, and timestamp. You can't edit this field.
      • Commit Information: Enter a message for the new commit. An automatically generated default message will be displayed, which you can overwrite with a different message if preferred.
      • Username: By default your Matillion ETL username is used, but you can change this to any other name if required.
      • Email: Enter an email address.
      • Checkout After Merge: When selected, Git will perform the switch commit action. This is selected by default.
    3. Click OK to perform the merge.

    Merge dialog

    In the following image, the merged commit shows that the branch has been joined back to the master branch.

    Merge results

    Note

    Remember, a Matillion ETL version points at a specific Git commit. The currently selected branch is determined by which commit the current version is pointing at.
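
    The merge described in this section can be compared to the following Git command-line sketch, in which "ours" is the latest commit of the branch being merged into and "theirs" is the latest commit of the branch being merged from. The --no-ff option is used here so that a new merge commit is always created. Whether Matillion ETL uses the same strategy is an assumption, and the file names and identities are hypothetical.

```shell
# Illustrative only: merge Alice_Branch into master and inspect the result.
work=$(mktemp -d)
git init --quiet "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"                   # hypothetical identity
git config user.email "alice@example.com"
echo "v1" > job_a.json                         # hypothetical job file
git add job_a.json
git commit --quiet -m "Initial commit"

git checkout --quiet -b Alice_Branch
echo "v1" > job_b.json
git add job_b.json
git commit --quiet -m "Add job_b"

git checkout --quiet master
ours=$(git rev-parse master)                   # latest commit of the branch merged into
theirs=$(git rev-parse Alice_Branch)           # latest commit of the branch merged from

git merge --quiet --no-ff -m "Merge Alice_Branch into master" Alice_Branch

# A merge commit records both "ours" and "theirs" as its parents.
parents=$(git log -1 --format='%P')
parent_count=$(echo "$parents" | wc -w | tr -d ' ')
echo "Merge commit has $parent_count parents"
```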


    Configure remote

    Matillion ETL can use a remote Git repository such as GitHub, AWS CodeCommit, or Bitbucket.

    To configure a remote repository:

    1. Click the Configure Remote icon in the bottom-right of the Git Integration dialog, as shown in the image below.

    Configure remote

    2. In the Configure Remote dialog, enter a Remote URI. This can be an https://, ssh://, or git@ format URI.
    3. Click OK.
    4. Click the Configure Default Credentials (padlock) icon in the bottom-right of the Git Integration dialog, as shown in the image below.

    Configure default credentials

    5. The Configure Default Credentials dialog will display different fields depending on the remote URI format. Complete these fields as follows, and then click OK.

    SSH or git@

    • Private Key: Paste a valid SSH private key.
    • Passphrase: Enter the passphrase associated with the private key.
    • Encryption Type: Select an encryption type, KMS or Encoded.
    • Master Key: Only required if you select the KMS Encryption Type. Select a KMS Master Key from the drop-down list.

    HTTPS

    • Username: The username of your repository host account.
    • Password: The password of your repository host account.
    Note

    If you are using GitHub, please use the Password field for a GitHub personal access token, as Matillion ETL does not support GitHub passwords. To learn how to create a new token, read Creating a personal access token.

    • Encryption Type: Select an encryption type, KMS or Encoded.
    • Master Key: Only required if you select the KMS Encryption Type. Select a KMS Master Key from the drop-down list.
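
    The three URI formats mentioned above look as follows when a remote is configured with the Git command line. This sketch only records the URIs in a temporary repository and does not contact any server. The host and repository names are hypothetical, and credential handling is left out because Matillion ETL manages credentials through its own dialog.

```shell
# Illustrative only: record a remote URI in each supported format.
work=$(mktemp -d)
git init --quiet "$work/repo"
cd "$work/repo"

# git@ format (hypothetical host and path)
git remote add origin "git@example.com:team/etl-project.git"
scp_style=$(git config --get remote.origin.url)

# ssh:// format
git remote set-url origin "ssh://git@example.com/team/etl-project.git"
ssh_style=$(git config --get remote.origin.url)

# https:// format
git remote set-url origin "https://example.com/team/etl-project.git"
https_style=$(git config --get remote.origin.url)

printf '%s\n' "$scp_style" "$ssh_style" "$https_style"
```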

    Fetch

    A fetch retrieves branches and their commits from another repository, in this case the remote repository, into the local repository. Remote repositories are an effective way to keep a backup "master copy" of code.

    Note

    A remote repository must be configured before the fetch action can be used.

    To fetch from a remote repository:

    1. Click the icon at the bottom-right of the Git Integration dialog, as illustrated below.

    Fetch

    2. In the Fetch dialog, you can select either or both of the following options. Neither option is selected by default:
    • Remove old refs: Removes remote branches that no longer exist in the remote repository.
    • Thin Fetch: Minimizes the number of objects to be received, which is useful for slower connections.
    3. Click OK.
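
    The effect of Remove old refs can be illustrated with the Git command line, where the --prune option of git fetch plays a comparable role. Treating the two as equivalent is an assumption made for this sketch. The example uses a temporary bare repository as a stand-in for the remote, and the branch names and identity are hypothetical.

```shell
# Illustrative only: fetch new branches and prune ones deleted on the remote.
work=$(mktemp -d)
git init --quiet "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
git config user.name "Bob"                     # hypothetical identity
git config user.email "bob@example.com"
echo "v1" > job.json                           # hypothetical job file
git add job.json
git commit --quiet -m "Initial commit"
git branch Old_Branch
git clone --quiet --bare "$work/seed" "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/local"

# Meanwhile, on the remote: Old_Branch is deleted and New_Branch appears.
git --git-dir="$work/remote.git" branch --quiet -D Old_Branch
git --git-dir="$work/remote.git" branch New_Branch master

cd "$work/local"
git fetch --quiet --prune origin

if git show-ref --verify --quiet refs/remotes/origin/Old_Branch; then
  old_ref=present
else
  old_ref=pruned
fi
if git show-ref --verify --quiet refs/remotes/origin/New_Branch; then
  new_ref=present
else
  new_ref=missing
fi
echo "Old_Branch: $old_ref, New_Branch: $new_ref"
```

    Without --prune, the reference to Old_Branch would remain in the local repository even though the branch no longer exists remotely.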

    Push

    Use the push action to send the branches of a local repository to a remote repository. This action pushes all changes.

    Note

    A remote repository must be configured before the push action can be used.

    To push to a remote repository:

    1. Click the Push All icon at the bottom-right of the Git Integration dialog, as illustrated below.

    Push

    2. In the Push All dialog, select the type of push to perform:
    • Atomic Push: Guarantees that either all references will be pushed to the remote, or none of them will; this option avoids partial pushes.
    • Force Push: Forces the local revision to be pushed into the remote repository. This action can cause the remote repository to lose commits, and should be used with caution.
    • Thin Push: Reduces the data sent when the sender and receiver share many of the same objects. This option is selected by default.
    3. Click OK.
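
    The push options above have similarly named counterparts in the Git command line (--atomic, --force, and --thin). The sketch below pushes all local branches atomically to a temporary bare repository that stands in for the remote. Mapping the dialog options to these flags is an assumption, and the names used are hypothetical.

```shell
# Illustrative only: push all local branches to a stand-in remote.
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"
git init --quiet "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"                   # hypothetical identity
git config user.email "alice@example.com"
git remote add origin "$work/remote.git"
echo "v1" > job.json                           # hypothetical job file
git add job.json
git commit --quiet -m "Initial commit"
git branch Alice_Branch

# Push every local branch; --atomic makes the update all-or-nothing.
git push --quiet --atomic --all origin

# A forced push would look like the line below. It can discard commits
# on the remote, so it is shown as a comment and not executed here.
#   git push --force origin master

pushed=$(git --git-dir="$work/remote.git" for-each-ref refs/heads | wc -l | tr -d ' ')
echo "Branches on the remote: $pushed"
```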

    Local and remote branch divergence

    A divergence occurs when the remote branch has commits that the local branch doesn't have, and the local branch has commits that the remote branch doesn't have.

    This can happen when merges are performed in the remote repository outside Matillion ETL (which creates a commit that exists remotely but not locally) while work continues on the local branch in Matillion ETL.

    An example of how a divergence looks in the Git Integration dialog is illustrated below:

    Local and remote branch divergence

    Key:

    Color           Definition
    Green branch    A local branch.
    Blue branch     A remote branch.
    Pink branch     A branch that includes both local and remote branches pointing at the same commits.

    In the above image, the blue master branch (remote) has a commit that the green master branch (local) does not have, and vice versa.

    To fix this divergence, merge the blue branch into the green branch. This will create a pink branch. A pink branch includes both remote and local branches pointing at the same commits.

    As a best practice, aim to have only pink branches where possible, so that remote and local branches stay in sync.

    Warning

    If the blue branch (remote) contains commits that the green branch (local) does not, you stand to lose work when pushing the green branch, as work may be overwritten.
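
    A divergence and its resolution can be reproduced with the Git command line, as sketched below. The example is illustrative only: a temporary bare repository stands in for the remote, a second clone stands in for work done outside Matillion ETL, and all file names and identities are hypothetical.

```shell
# Illustrative only: create a divergence, detect it, and resolve it by merging.
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master
git init --quiet "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"                   # hypothetical identity
git config user.email "alice@example.com"
git remote add origin "$work/remote.git"
echo "v1" > job_a.json
git add .
git commit --quiet -m "Initial commit"
git push --quiet origin master

# A second clone adds a commit that exists remotely but not locally.
git clone --quiet "$work/remote.git" "$work/other"
cd "$work/other"
git config user.name "Bob"
git config user.email "bob@example.com"
echo "v1" > job_b.json
git add .
git commit --quiet -m "Remote-only commit"
git push --quiet origin master

# Meanwhile, the local branch gains a commit the remote does not have.
cd "$work/local"
echo "v1" > job_c.json
git add .
git commit --quiet -m "Local-only commit"
git fetch --quiet origin

# First number: commits only on local master. Second: only on remote master.
counts=$(git rev-list --left-right --count master...origin/master | tr '\t' ' ')
echo "local-only / remote-only commits: $counts"

# Merge the remote branch into the local branch, then push the result.
git merge --quiet -m "Merge origin/master into master" origin/master
git push --quiet origin master
local_tip=$(git rev-parse master)
remote_tip=$(git rev-parse origin/master)
```

    After the merge and push, the local and remote branches point at the same commit, which corresponds to the pink branch state described above.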


    Resolving a merge conflict

    When performing a merge, Matillion ETL will return an error message if a merge conflict is found. Merge conflicts are a common aspect of using a version control system such as Git, and are often easy to resolve. In Matillion ETL, a merge conflict arises when there is a conflict between the current local branch and the branch being merged.

    Read about MergeManager to learn how to efficiently resolve merge conflicts in Matillion ETL.


    SSH authentication

    Matillion ETL's Git source control management feature currently supports the following private key formats:

    • DSA
    • RSA
    • ECDSA
    • Ed25519
    Note

    From version 1.55, Matillion ETL supports Ed25519 for SSH key generation.

    Private keys in OpenSSH format aren't currently supported, and will produce an error message when used as a private key for performing a push to a remote repository.

    However, you can convert your OpenSSH format private key to a supported key format using this command:

    ssh-keygen -p -f YOUR_PRIVATE_KEY -m pem
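
    Before converting, it can help to check which format a key file uses. OpenSSH-format private keys begin with the line "-----BEGIN OPENSSH PRIVATE KEY-----", whereas PEM-style keys begin with a line such as "-----BEGIN RSA PRIVATE KEY-----". The sketch below checks that first line. The key_format function and the sample files are hypothetical and exist only for this illustration. Note also that ssh-keygen -p rewrites the key file in place, so working on a copy is a reasonable precaution.

```shell
# Report whether a private key file starts with an OpenSSH or a PEM-style header.
key_format() {
  case "$(head -n 1 "$1")" in
    "-----BEGIN OPENSSH PRIVATE KEY-----") echo "openssh" ;;
    "-----BEGIN "*" PRIVATE KEY-----")     echo "pem" ;;
    *)                                     echo "unknown" ;;
  esac
}

# Hypothetical sample files containing only a header line, for demonstration.
work=$(mktemp -d)
printf '%s\n' "-----BEGIN OPENSSH PRIVATE KEY-----" > "$work/new_key"
printf '%s\n' "-----BEGIN RSA PRIVATE KEY-----" > "$work/old_key"

key_format "$work/new_key"
key_format "$work/old_key"
```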
    

    Disconnecting Git from a Matillion ETL project

    1. SSH onto your Matillion ETL instance using the private key with the CentOS username and the Matillion ETL instance's IP address or DNS name.
    2. Stop Tomcat using the command sudo service tomcat stop.
    3. Remove records from the SCM-related tables in the PostgreSQL repo using the following commands:
    sudo su - postgres
    psql

    select id, rec::json->'name' as projectname
    from projects;
    

    This will return the project_id to use in the following steps.

    DELETE FROM scm_filesync where namespace = project_id;
    

    Ctrl-d to exit psql.
    Ctrl-d to go back to the CentOS user.

    4. Remove any subdirectories under the /usr/share/tomcat/scm directory using the following command: sudo rm -rf /usr/share/tomcat/scm/project_id.
    5. Restart Tomcat: sudo service tomcat start.
    6. Wait for approximately 60 seconds and then log back in to Matillion ETL.

    You should now have the option to re-initialize your local repository and connect to a remote repository when you click Project and then Git.


    Switch commits while a job is running

    By default, Matillion ETL will prevent you from merging changes and switching to another branch while a job is still running on the current branch.

    However, in a busy production environment, you may find that this behavior interferes with your work schedule and that you need to switch commits without waiting for jobs to complete. If this issue affects you, you can configure Matillion ETL to allow you to switch commits while jobs are still running.

    To change the default behavior:

    1. SSH to your Matillion ETL instance and locate and edit the Emerald.properties file:
    /usr/share/emerald/WEB-INF/classes/Emerald.properties
    
    2. Add the following line to the file:
    ALWAYS_ALLOW_REPO_UPDATE=true

    3. Save the file.
    4. Restart Tomcat using the following command:
    sudo service tomcat restart
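
    If you prefer to script the property change, the sketch below appends the line only when it is not already present, so that running it twice does not create a duplicate entry. It is illustrative only: a temporary file stands in for Emerald.properties, and the set_property helper and the EXISTING_SETTING line are hypothetical. On a real instance the file path given above would be used, most likely with elevated permissions.

```shell
# Illustrative only: a temporary file stands in for Emerald.properties.
props=$(mktemp)
printf 'EXISTING_SETTING=value\n' > "$props"   # hypothetical existing content

set_property() {
  # Append KEY=VALUE unless a line for KEY is already present.
  file=$1; key=$2; value=$3
  if grep -q "^${key}=" "$file"; then
    echo "$key is already set"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

set_property "$props" ALWAYS_ALLOW_REPO_UPDATE true
set_property "$props" ALWAYS_ALLOW_REPO_UPDATE true   # second call changes nothing

matches=$(grep -c '^ALWAYS_ALLOW_REPO_UPDATE=true$' "$props")
echo "Matching lines in file: $matches"
```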
    

    With this behavior enabled, a "snapshot" version of the current branch is created whenever you switch commits with jobs still running. The snapshot is a temporary version, allowing all currently running jobs to continue running to completion. You can verify that this snapshot has been created by looking in the Manage Versions dialog. You can't switch to this snapshot version, and it will be automatically deleted 60 seconds after all jobs running in it have completed.


    Contact support

    If you have any questions about using the Matillion ETL Git feature, read Git Integration: Frequently Asked Questions, or contact support.