Skip to main content

ClickHouse setup

Some core functionality may be limited. If you're interested in contributing, check out the source code for each repository listed below.

  • Maintained by: Community
  • Authors: Geoff Genz & Bentsi Leviav
  • GitHub repo: ClickHouse/dbt-clickhouse
  • PyPI package: dbt-clickhouse
  • Slack channel: #db-clickhouse
  • Supported dbt Core version: v0.19.0 and newer
  • dbt Cloud support: Not Supported
  • Minimum data platform version: ?

Installing dbt-clickhouse

Use pip to install the adapter. Before 1.8, installing the adapter would automatically install dbt-core and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install dbt-core. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations. Use the following command for installation:

Configuring dbt-clickhouse

For Clickhouse-specific configuration, please refer to Clickhouse configs.

Connecting to ClickHouse with dbt-clickhouse

To connect to ClickHouse from dbt, you'll need to add a profile to your profiles.yml file. A ClickHouse profile conforms to the following syntax:

profiles.yml
<profile-name>:
target: <target-name>
outputs:
<target-name>:
type: clickhouse
schema: [ default ] # ClickHouse database for dbt models

# optional
driver: [ http ] # http or native. If not configured, this will be auto-determined based on the port setting
host: [ localhost ]
port: [ 8123 ] # Defaults to 8123, 8443, 9000, 9440 depending on the secure and driver settings
user: [ default ] # User for all database operations
password: [ <empty string> ] # Password for the user
cluster: [ <empty string> ] # If configured, certain DDL/table operations will be executed with the `ON CLUSTER` clause using this cluster. Distributed materializations require this setting to work. See the following ClickHouse Cluster section for more details.
verify: [ True ] # Validate TLS certificate if using TLS/SSL
secure: [ False ] # Use TLS (native protocol) or HTTPS (http protocol)
retries: [ 1 ] # Number of times to retry a "retriable" database exception (such as a 503 'Service Unavailable' error)
compression: [ <empty string> ] # Use gzip compression if truthy (http), or compression type for a native connection
connect_timeout: [ 10 ] # Timeout in seconds to establish a connection to ClickHouse
send_receive_timeout: [ 300 ] # Timeout in seconds to receive data from the ClickHouse server
cluster_mode: [ False ] # Use specific settings designed to improve operation on Replicated databases (recommended for ClickHouse Cloud)
use_lw_deletes: [ False ] # Use the strategy `delete+insert` as the default incremental strategy.
check_exchange: [ True ] # Validate that clickhouse support the atomic EXCHANGE TABLES command. (Not needed for most ClickHouse versions)
local_suffix: [ _local ] # Table suffix of local tables on shards for distributed materializations.
local_db_prefix: [ <empty string> ] # Database prefix of local tables on shards for distributed materializations. If empty, it uses the same database as the distributed table.
allow_automatic_deduplication: [ False ] # Enable ClickHouse automatic deduplication for Replicated tables
tcp_keepalive: [ False ] # Native client only, specify TCP keepalive configuration. Specify custom keepalive settings as [idle_time_sec, interval_sec, probes].
custom_settings: [ { } ] # A dictionary/mapping of custom ClickHouse settings for the connection - default is empty.

# Native (clickhouse-driver) connection settings
sync_request_timeout: [ 5 ] # Timeout for server ping
compress_block_size: [ 1048576 ] # Compression block size if compression is enabled


Description of ClickHouse Profile Fields

FieldDescription
typeThis must be included either in profiles.yml or in the dbt_project.yml file. Must be set to clickhouse.
schemaRequired. A ClickHouse's database name. The dbt model database.schema.table is not compatible with ClickHouse because ClickHouse does not support a schema. So we use a simple model schema.table, where schema is the ClickHouse's database. We don't recommend using the default database.
driverOptional. The ClickHouse client interface, http or native. Defaults to http unless the port is set to 9440 or 9400, in which case the native driver is assumed.
hostOptional. The host name of the connection. Default is localhost.
portOptional. ClickHouse server port number. Defaults to 8123/8443 (secure) if the driver is http, and to 9000/9440(secure) if the driver is native.
userRequired. A ClickHouse username with adequate permissions to access the specified schema.
passwordRequired. The password associated with the specified user.
clusterOptional. If set, certain DDL/table operations will be executed with the ON CLUSTER clause using this cluster. Distributed materializations require this setting to work. See the following ClickHouse Cluster section for more details.
verifyOptional. For (secure=True) connections, validate the ClickHouse server TLS certificate, including matching hostname, expiration, and signed by a trusted Certificate Authority. Defaults to True.
secureOptional. Whether the connection (either http or native) is secured by TLS. This converts an http driver connection to https, and a native driver connection to the native ClickHouse protocol over TLS. the Defaults to False.
retriesOptional. Number of times to retry the initial connection attempt if the error appears to be recoverable.
compressionOptional. Use compression in the connection. Defaults to False. If set to True for HTTP, this enables gzip compression. If set to True for the native protocol, this enabled lz4 compression. Other valid values are lz4hc and zstd for the native driver only.
connect_timeoutOptional. Connection timeout in seconds. Defaults is 10 seconds.
send_receive_timeoutOptional. Timeout for receiving data from or sending data to ClickHouse. Defaults to 5 minutes (300 seconds)
cluster_modeOptional. Add connection settings to improve compatibility with clusters using the Replicated Database Engine. Default False.
use_lw_deletesOptional. If ClickHouse experimental lightweight deletes are available, use the delete+insert strategy as the default strategy for incremental materializations. Defaults to False (use legacy strategy).
check_exchangeOptional. On connecting to the ClickHouse, if this is parameter is True DBT will validate that the ClickHouse server supports atomic exchange of tables. Using atomic exchange (when available) improves reliability and parallelism. This check is unnecessary for ClickHouse running on recent Linux operating system, and in those circumstances can be disabled by setting check_exchange to False to avoid additional overhead on startup. Defaults to True.
local_suffixOptional. Table suffix of local tables on shards for distributed materializations. Defaults to '_local'.
local_db_prefixOptional. Database prefix of local tables on shards for distributed materializations. If empty, it uses the same database as the distributed table. Defaults to empty string.
allow_automatic_deduplicationOptional. Enable ClickHouse automatic deduplication for Replicated tables. Defaults to False.
tcp_keepaliveOptional. Native client only, specify TCP keepalive configuration. Specify custom keepalive settings as idle_time_sec, interval_sec, probes. Defaults to False.
sync_request_timeoutOptional. Timeout for connection ping request (native connection only). Defaults to 5 seconds.
compress_block_sizeOptional. Compression block size (in bytes) when using compression with the native driver. Defaults to 1MB
database_engineOptional. Database engine to use when creating new ClickHouse schemas (databases). If not set (the default), new databases will use the default ClickHouse database engine (usually Atomic).
custom_settingsOptional. A mapping of ClickHouse specific user settings to use with the connection. See the ClickHouse documentation for supported settings.

Troubleshooting Connections

If you encounter issues connecting to ClickHouse from dbt, make sure the following criteria are met:

  • The engine must be one of the supported engines.
  • You must have adequate permissions to access the database.
  • If you're not using the default table engine for the database, you must specify a table engine in your model configuration.
0