Intro to ClickHouse® Internals

This overview introduces ClickHouse's core internals. Each concept is explored in detail later in the course.

Parts

Data in ClickHouse is stored in parts: immutable chunks written to disk during inserts. Each insert creates at least one part for every partition it touches, so the partition key determines how many parts a single insert produces.

Key points:

Parts are immutable (never modified in place)
More parts mean slower queries (more than about 100 parts per partition degrades performance)
Parts are merged together in the background
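A minimal Python sketch of how inserts turn into parts, assuming a hypothetical table partitioned by month (YYYYMM of an event date). `Part`, `insert`, `monthly_partition`, and `parts_per_partition` are illustrative names, not ClickHouse APIs, and the model ignores the on-disk format entirely. Each insert writes one new frozen (immutable) part per partition it touches, so the part count per partition only grows until a merge combines parts. On a real server, the `system.parts` table reports actual part counts.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Part:
    """Toy immutable part: its rows can be read but never modified in place."""
    partition_id: str
    rows: tuple


def monthly_partition(row):
    """Hypothetical partition key: YYYYMM of the row's event date (row[0])."""
    d = row[0]
    return f"{d.year:04d}{d.month:02d}"


def insert(table, rows, partition_key=monthly_partition):
    """Group one insert's rows by partition and write one new part per partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_key(row)].append(row)
    # Rows inside a part are kept sorted (here by the whole tuple, standing in for a sorting key).
    new_parts = [Part(pid, tuple(sorted(group))) for pid, group in groups.items()]
    table.extend(new_parts)  # existing parts are never touched
    return new_parts


def parts_per_partition(table):
    """Count parts per partition id, the number worth monitoring."""
    counts = defaultdict(int)
    for part in table:
        counts[part.partition_id] += 1
    return dict(counts)


table = []
insert(table, [(date(2024, 1, 5), "a"), (date(2024, 2, 1), "b")])  # touches 2 partitions
insert(table, [(date(2024, 1, 6), "c")])                           # touches 1 partition
print(parts_per_partition(table))  # {'202401': 2, '202402': 1}
```

Many small inserts into the same partition therefore produce many small parts, which is why batching inserts keeps part counts low.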

Partitions

Data is divided into partitions based on a partition key. Partitions enable fast data management (such as dropping an entire partition at once) and partition pruning.

Key points:

Partition by time only (daily or monthly)
Partition pruning skips partitions that cannot match the query's filter
The sorting key has a bigger impact on query performance than the partition key
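Partition pruning can be sketched as a range-overlap check. This toy Python example assumes monthly partitions and a hypothetical `partitions` dict mapping each partition id to the min and max dates it contains; it is loosely modeled on the min/max values ClickHouse keeps for partition key columns, but `prune` is an illustrative helper, not a ClickHouse function.

```python
from datetime import date

# Hypothetical per-partition metadata: partition id -> (min date, max date) of its rows.
partitions = {
    "202401": (date(2024, 1, 1), date(2024, 1, 31)),
    "202402": (date(2024, 2, 1), date(2024, 2, 29)),
    "202403": (date(2024, 3, 1), date(2024, 3, 31)),
}


def prune(partitions, lo, hi):
    """Keep only partitions whose [min, max] range overlaps the filter range [lo, hi]."""
    return [pid for pid, (pmin, pmax) in partitions.items() if pmax >= lo and pmin <= hi]


# A filter such as event_date BETWEEN '2024-02-10' AND '2024-02-20' needs one partition.
print(prune(partitions, date(2024, 2, 10), date(2024, 2, 20)))  # ['202402']
```

Pruning only removes whole partitions; within the surviving partitions, the sorting key decides how much data is actually read.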

Merges

ClickHouse continuously merges smaller parts into larger ones in the background.

Key points:

Merges consume CPU, memory, and disk I/O
Creating parts faster than merges can combine them causes a backlog
Merges compete with queries and ingestion for shared resources
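Two toy Python sketches, under simplifying assumptions. `merge_parts` shows why merging sorted parts is a streaming k-way merge (here via `heapq.merge`) that writes a new part instead of modifying the inputs. The hypothetical `simulate_backlog` shows how the active part count keeps growing when parts are created faster than merges remove them. Real merge scheduling in ClickHouse (merge selection, part sizes, thread pools) is not modeled.

```python
import heapq


def merge_parts(parts):
    """Combine several sorted parts into one larger sorted part (k-way merge).

    The input tuples are left untouched; the result is a brand-new part.
    """
    return tuple(heapq.merge(*parts))


def simulate_backlog(ticks, parts_created_per_tick, merges_per_tick, parts_per_merge=2):
    """Toy model: each merge turns `parts_per_merge` parts into one.

    Returns the number of active parts after each tick.
    """
    active = 0
    history = []
    for _ in range(ticks):
        active += parts_created_per_tick          # new parts from inserts
        for _ in range(merges_per_tick):          # limited merge capacity per tick
            if active >= parts_per_merge:
                active -= parts_per_merge - 1     # N parts in, 1 part out
        history.append(active)
    return history


print(merge_parts([(1, 4, 9), (2, 3, 10), (5,)]))  # (1, 2, 3, 4, 5, 9, 10)
print(simulate_backlog(5, parts_created_per_tick=4, merges_per_tick=2))  # [2, 4, 6, 8, 10]
```

With 4 parts created and only 2 removed per tick, the backlog grows by 2 every tick; with creation matched to merge capacity, the count stays flat.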

Sorting Keys and Sparse Indexes

The sorting key (ORDER BY) determines how data is physically ordered on disk and enables the sparse primary index used to locate data quickly.

Key points:

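The sparse primary index stores one mark per granule: the sorting-key value of the granule's first row (8192 rows per granule by default, set by index_granularity)
Queries can skip entire granules that cannot match the filter
Filtering by columns not in the sorting key cannot use the primary index and typically reads every granule in the partitions that survive pruning
Put frequently filtered columns first in the sorting key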
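A toy Python model of a sparse primary index over one sorted part. It assumes a single-column sorting key and ignores real details such as adaptive granularity, mark files, and compression; `build_sparse_index` and `granules_for_key` are hypothetical helpers. The point is that a lookup binary-searches a handful of marks and reads only the granules that may hold the key.

```python
import bisect

GRANULARITY = 8192  # rows per granule; matches ClickHouse's default index_granularity


def build_sparse_index(sorted_keys, granularity=GRANULARITY):
    """Keep one mark per granule: the sorting-key value of the granule's first row."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granularity)]


def granules_for_key(marks, key):
    """Return the granule indexes that may contain `key`.

    Only the marks are searched (binary search), never the rows themselves.
    """
    if not marks or key < marks[0]:
        return range(0)  # key sorts before the first row: no granule can match
    # The granule before the first mark >= key may still end with `key`.
    first = max(bisect.bisect_left(marks, key) - 1, 0)
    # Granules whose first key is greater than `key` cannot contain it.
    last = bisect.bisect_right(marks, key) - 1
    return range(first, last + 1)


keys = list(range(100_000))  # one part, already sorted by a single-column sorting key
marks = build_sparse_index(keys)
print(len(marks))                             # 13 marks index 100,000 rows
print(list(granules_for_key(marks, 20_000)))  # [2]: only rows 16384..24575 are read
# A filter on a column outside the sorting key has no marks to search,
# so every granule, range(len(marks)), would have to be read.
```

This is also why column order in the sorting key matters: marks are ordered by the first column, so only filters on leading columns narrow the search this way.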

Replication

In replicated setups, replicas keep copies of the data for high availability. Replication uses ZooKeeper (or ClickHouse Keeper) for coordination.

Key points:

Replicas store copies of data on different servers
ZooKeeper coordinates DDL operations and part replication
The replication queue can grow if parts are created faster than they are replicated
Zero-copy replication replicates metadata only (in open-source ClickHouse it is disabled by default and not recommended for production)
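Queue growth is simple arithmetic: whenever parts arrive faster than a replica can fetch them, the queue grows by the difference every second. The Python sketch below is a toy model under that assumption; `replication_queue_lengths` and its parameters are hypothetical, and real fetch throughput varies with part size and network. On a live server, the `system.replication_queue` table shows the actual queue.

```python
def replication_queue_lengths(seconds, created_per_s, fetched_per_s):
    """Toy model of one replica's replication queue.

    Each part written on another replica adds one queue entry; this replica
    fetches at most `fetched_per_s` parts per second.
    """
    queue = 0
    lengths = []
    for _ in range(seconds):
        queue += created_per_s
        queue -= min(queue, fetched_per_s)  # fetch what capacity allows
        lengths.append(queue)
    return lengths


print(replication_queue_lengths(4, created_per_s=10, fetched_per_s=10))  # [0, 0, 0, 0]
print(replication_queue_lengths(4, created_per_s=12, fetched_per_s=10))  # [2, 4, 6, 8]
```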

Key Takeaways

Parts are immutable chunks - Monitor part counts (more than about 100 per partition degrades performance) and understand how merges consolidate them.
Partition by time only - Avoid over-partitioning. Partition pruning helps, but the sorting key has a bigger impact on query performance.
Merges compete for resources - Balance part creation rate with merge capacity to avoid backlog and performance degradation.
Design sorting keys for queries - Put frequently filtered columns first. Sparse indexes enable fast data skipping within partitions.

Learn More

For detailed information, see:

Parts, Partitions, Merges, and Indexes - Deep dive into parts, partitions, merges, and indexes with practical monitoring queries
