
Intro to Ingestion
ClickHouse can query billions of records in milliseconds, but the real challenge is getting those billions of rows in reliably. While benchmarks highlight query performance, production systems more often struggle with ingestion. Understanding the mental model behind it (parts, merges, and the part budget) is essential for successful ingestion.
ClickHouse is designed for bulk inserts, not row-by-row streaming. To understand why ingestion is challenging, you need to understand how ClickHouse handles data internally.
Parts: Physical Chunks Created by Inserts
When you insert data into a MergeTree table, ClickHouse creates parts: immutable chunks of data stored on disk. A single INSERT creates at least one part, and more if the inserted rows span multiple partitions under your partition key.
CREATE TABLE events (
    event_date Date,
    user_id UInt64,
    event_type String
) ENGINE = MergeTree()
ORDER BY (event_date, user_id)  -- MergeTree requires an ORDER BY clause
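As a quick illustration, each separate INSERT against this table produces its own part, and you can observe the parts ClickHouse has created by querying the `system.parts` system table:

```sql
-- Two separate inserts create (at least) two separate parts on disk.
INSERT INTO events VALUES ('2024-01-15', 1, 'click');
INSERT INTO events VALUES ('2024-01-16', 2, 'view');

-- Inspect the active parts for this table (part names vary between runs):
SELECT name, rows
FROM system.parts
WHERE table = 'events' AND active;
```

Batching both rows into one INSERT would instead produce a single part, which is why bulk inserts are the preferred ingestion pattern.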