airflow_dbt_python.hooks

Core dbt hooks

Supported dbt remote hooks

The DbtRemoteHook interface

The DbtRemoteHook interface includes methods for downloading and uploading files.

Internally, DbtRemoteHooks can use Airflow hooks to execute the actual operations.

Currently, only AWS S3 and the local filesystem are supported as remotes.
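
For example, a minimal sketch of the interface in use (the bucket, prefix, and connection id are hypothetical):

    from airflow_dbt_python.hooks.remote import get_remote

    # Fetch a remote by URL scheme; "s3" resolves to the S3 remote below.
    remote = get_remote("s3", conn_id="aws_default")

    # Pull every file of a dbt project down to a local directory.
    remote.download_dbt_project(
        "s3://my-bucket/dbt/project/",  # hypothetical source prefix
        "/tmp/dbt/project",             # local destination directory
    )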

class airflow_dbt_python.hooks.remote.DbtRemoteHook(context=None)[source]

Represents a remote location storing any dbt files.

A concrete backend class should implement the upload and download methods to move one or more dbt files. Backends can rely on an Airflow connection with a corresponding hook, but this is not enforced.

Delegating the responsibility of dealing with dbt files to backend subclasses allows us to support more backends without changing the DbtHook.

connection_id

An optional Airflow connection. If defined, will be used to instantiate a hook for this backend.
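
As an illustration only, a minimal no-op backend implementing the two abstract methods documented below (the class name and logging setup are hypothetical):

    import logging

    from airflow_dbt_python.hooks.remote import DbtRemoteHook

    logger = logging.getLogger(__name__)

    class DbtNoOpRemoteHook(DbtRemoteHook):
        """A hypothetical remote that logs instead of moving files."""

        def download(self, source, destination, replace=False, delete_before=False):
            logger.info("Would download %s to %s", source, destination)

        def upload(self, source, destination, replace=False, delete_before=False):
            logger.info("Would upload %s to %s", source, destination)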

abstract download(source, destination, replace=False, delete_before=False)[source]

Download a source URL into a local destination URL.

Parameters:
  • source (URL)

  • destination (URL)

  • replace (bool)

  • delete_before (bool)

download_dbt_profiles(source, destination)[source]

Download a dbt profiles.yml file from a given source.

Parameters:
  • source (URL | str | Path) – URLLike pointing to a remote containing a profiles.yml file.

  • destination (URL | str | Path) – URLLike to a directory where the profiles.yml will be stored.

Returns:

The destination Path.

Return type:

Path
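
For example, a sketch with a hypothetical bucket and connection id:

    from airflow_dbt_python.hooks.remote import get_remote

    remote = get_remote("s3", conn_id="aws_default")

    # profiles.yml is fetched from the remote directory into /tmp/dbt and
    # the destination Path is returned.
    profiles_path = remote.download_dbt_profiles("s3://my-bucket/dbt/", "/tmp/dbt")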

download_dbt_project(source, destination)[source]

Download all dbt project files from a given source.

Parameters:
  • source (URL | str | Path) – URLLike to a directory containing a dbt project.

  • destination (URL | str | Path) – URLLike to a directory where the dbt project will be stored.

Returns:

The destination Path.

Return type:

Path

abstract upload(source, destination, replace=False, delete_before=False)[source]

Upload a local source URL into a destination URL.

Parameters:
  • source (URL)

  • destination (URL)

  • replace (bool)

  • delete_before (bool)

upload_dbt_project(source, destination, replace=False, delete_before=False)[source]

Upload all dbt project files from a given source.

Parameters:
  • source (URL | str | Path) – URLLike to a directory containing a dbt project.

  • destination (URL | str | Path) – URLLike to a directory where the dbt project will be stored.

  • replace (bool) – Flag to indicate whether to replace existing files.

  • delete_before (bool) – Flag to indicate whether to clear any existing files before uploading the dbt project.
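
For example, a sketch pushing a local project back to a hypothetical S3 prefix:

    from airflow_dbt_python.hooks.remote import get_remote

    remote = get_remote("s3", conn_id="aws_default")

    # Clear the destination first, then upload, replacing any leftovers.
    remote.upload_dbt_project(
        "/tmp/dbt/project",
        "s3://my-bucket/dbt/project/",
        replace=True,
        delete_before=True,
    )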

airflow_dbt_python.hooks.remote.get_remote(scheme, conn_id=None)[source]

Get a DbtRemoteHook as long as the scheme is supported.

In the future we should make our hooks discoverable and package ourselves as a proper Airflow provider package.

Parameters:
  • scheme (str)

  • conn_id (str | None)

Return type:

DbtRemoteHook
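
For example ("s3" matches the S3 remote documented below; mapping the empty scheme to the local filesystem remote is an assumption made for illustration):

    from airflow_dbt_python.hooks.remote import get_remote

    s3_remote = get_remote("s3", conn_id="aws_default")
    local_remote = get_remote("")  # assumed scheme for the localfs remote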

dbt git remote

dbt localfs remote

A local filesystem remote for dbt.

Intended to be used only when running Airflow with a LocalExecutor.

class airflow_dbt_python.hooks.localfs.DbtLocalFsRemoteHook(fs_conn_id='fs_default')[source]

A concrete dbt remote for a local filesystem.

This remote is intended to be used when running Airflow with a LocalExecutor, and it relies on shutil from the standard library to do all the file manipulation. For these reasons, running multiple concurrent tasks with this remote may lead to race conditions if attempting to push files to the remote.

Parameters:

fs_conn_id (str)
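
For example, a sketch with hypothetical local paths:

    from airflow_dbt_python.hooks.localfs import DbtLocalFsRemoteHook

    # fs_conn_id defaults to "fs_default".
    remote = DbtLocalFsRemoteHook()
    remote.download_dbt_project("/shared/dbt/project", "/tmp/dbt/project")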

copy(source, destination, replace=False, delete_before=False)[source]

Push all dbt files under the source directory to another local path.

Pushing supports zipped projects: the destination's file extension determines whether we are working with a zip file.

Parameters:
  • source (URL) – A local path from which to fetch the files to push.

  • destination (URL) – A local path where the file should be copied.

  • replace (bool) – Whether to replace existing files or not.

  • delete_before (bool) – Whether to delete the contents of destination before pushing.

Return type:

None
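
A sketch of the zip behavior described above (paths are hypothetical, and the URL wrapper is assumed to be importable from airflow_dbt_python.utils.url):

    from airflow_dbt_python.hooks.localfs import DbtLocalFsRemoteHook
    from airflow_dbt_python.utils.url import URL  # assumed import path

    remote = DbtLocalFsRemoteHook()
    # A ".zip" destination makes copy produce a zipped project.
    remote.copy(URL("/shared/dbt/project"), URL("/tmp/dbt/project.zip"), replace=True)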

copy_one(source, destination, replace=False)[source]

Copy one file from a local path.

If the file already exists at the destination, it will be ignored if replace is False (the default).

Parameters:
  • source (URL) – A local path to the file to copy.

  • destination (URL) – A destination path to copy the file to.

  • replace (bool) – A bool flag to indicate whether to replace existing files.

Return type:

None

download(source, destination, replace=False, delete_before=False)[source]

Implement the download method of the dbt remote interface.

For a local filesystem, this copies the source directory or file to destination.

Parameters:
  • source (URL)

  • destination (URL)

  • replace (bool)

  • delete_before (bool)

Return type:

None

get_url(url)[source]

Return a URL relative to this hook's basepath.

If the given URL is absolute, simply return it. If it is None, return a URL made from the basepath.

Parameters:

url (URL | None)

Return type:

URL

upload(source, destination, replace=False, delete_before=False)[source]

Implement the upload method of the dbt remote interface.

For a local filesystem, this copies the source directory or file to destination.

Parameters:
  • source (URL)

  • destination (URL)

  • replace (bool)

  • delete_before (bool)

Return type:

None

airflow_dbt_python.hooks.localfs.py37_copytree(source, destination, replace=True)[source]

A (probably) poor attempt at replicating shutil.copytree for Python 3.7.

shutil.copytree is available in Python 3.7; however, it doesn't have the dirs_exist_ok parameter, and we really need that. If the destination path doesn't exist, we can use shutil.copytree as is; if it does exist, we need to copy files one by one and make any subdirectories ourselves.

Parameters:
  • source (URL)

  • destination (URL)

  • replace (bool)
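
A sketch of that fallback using only the standard library (an illustration of the approach, not the package's exact implementation):

    import shutil
    from pathlib import Path

    def copytree_sketch(source: Path, destination: Path, replace: bool = True) -> None:
        """Copy a tree even when the destination exists (Python 3.7 fallback)."""
        if not destination.exists():
            # No destination yet: plain shutil.copytree works on 3.7.
            shutil.copytree(source, destination)
            return
        # Destination exists: recreate subdirectories and copy files one by
        # one, honoring the replace flag for files already present.
        for path in source.glob("**/*"):
            target = destination / path.relative_to(source)
            if path.is_dir():
                target.mkdir(parents=True, exist_ok=True)
            elif replace or not target.exists():
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(path, target)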

dbt S3 remote

An implementation for an S3 remote for dbt.

class airflow_dbt_python.hooks.s3.DbtS3RemoteHook(*args, **kwargs)[source]

A dbt remote implementation for S3.

This concrete remote class implements the DbtRemote interface using S3 as the storage for uploading and downloading dbt files. The DbtS3RemoteHook subclasses Airflow's S3Hook to interact with S3. A connection id may be passed to set the connection to use with S3.
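
For example (the connection id is hypothetical):

    from airflow_dbt_python.hooks.s3 import DbtS3RemoteHook

    # DbtS3RemoteHook accepts the same arguments as Airflow's S3Hook.
    s3_remote = DbtS3RemoteHook(aws_conn_id="aws_default")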

download(source, destination, replace=False, delete_before=False)[source]

Download one or more files from a source URL in S3.

Lists all S3 keys that have source as a prefix to find what to download.

Parameters:
  • source (URL) – An S3 URL to a key prefix containing objects to download.

  • destination (URL) – A destination URL to download the objects to. The existing sub-directory hierarchy in S3 will be preserved.

  • replace (bool) – Indicates whether to replace existing files when downloading. This flag is kept here to comply with the DbtRemote interface, but it is ignored, as files downloaded from S3 always overwrite local files.

  • delete_before (bool) – Delete destination directory before download.
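
For example (bucket, prefix, and connection id are hypothetical; the URL wrapper is assumed to be importable from airflow_dbt_python.utils.url):

    from airflow_dbt_python.hooks.s3 import DbtS3RemoteHook
    from airflow_dbt_python.utils.url import URL  # assumed import path

    s3_remote = DbtS3RemoteHook(aws_conn_id="aws_default")
    # Every key under the prefix is downloaded, preserving the hierarchy,
    # after first clearing the local destination.
    s3_remote.download(
        URL("s3://my-bucket/dbt/project/"),
        URL("/tmp/dbt/project"),
        delete_before=True,
    )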

download_s3_object(s3_object, destination)[source]

Download an S3 object into a local destination.

Parameters:
  • s3_object – The S3 object to download.

  • destination (URL) – A local destination URL.

Return type:

None

iter_url(source)[source]

Iterate over an S3 key given by a URL.

Parameters:

source (URL)

Return type:

Iterable[URL]

load_file_handle_replace_error(file_url, key, bucket_name=None, replace=False, encrypt=False, gzip=False, acl_policy=None)[source]

Call S3Hook.load_file, handling the ValueError raised when replacing existing keys.

A warning is also logged whenever attempting to replace an existing key with replace = False.

Returns:

True if no ValueError was raised, False otherwise.

Parameters:
  • file_url (URL | str | Path)

  • key (str)

  • bucket_name (str | None)

  • replace (bool)

  • encrypt (bool)

  • gzip (bool)

  • acl_policy (str | None)

Return type:

bool
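
For example (file, key, and bucket are hypothetical):

    from airflow_dbt_python.hooks.s3 import DbtS3RemoteHook

    s3_remote = DbtS3RemoteHook(aws_conn_id="aws_default")
    # With replace=False, an existing key is left untouched: the hook logs
    # a warning and returns False instead of raising ValueError.
    uploaded = s3_remote.load_file_handle_replace_error(
        "/tmp/dbt/profiles.yml",
        key="dbt/profiles.yml",
        bucket_name="my-bucket",
        replace=False,
    )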

upload(source, destination, replace=False, delete_before=False)[source]

Upload one or more files under source URL to S3.

Parameters:
  • source (URL) – A local URL from which to fetch the file or files to upload.

  • destination (URL) – An S3 URL where the file should be uploaded. The bucket name and key prefix will be extracted by calling S3Hook.parse_s3_url.

  • replace (bool) – Whether to replace existing S3 keys or not.

  • delete_before (bool) – Whether to delete the contents of destination before pushing.

Return type:

None
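
For example (paths and connection id are hypothetical; the URL wrapper is assumed to be importable from airflow_dbt_python.utils.url):

    from airflow_dbt_python.hooks.s3 import DbtS3RemoteHook
    from airflow_dbt_python.utils.url import URL  # assumed import path

    s3_remote = DbtS3RemoteHook(aws_conn_id="aws_default")
    # Upload the whole project under a key prefix, replacing existing keys.
    s3_remote.upload(
        URL("/tmp/dbt/project"),
        URL("s3://my-bucket/dbt/project/"),
        replace=True,
    )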