
Intro to Ingestion

TL;DR

ClickHouse can query billions of rows in milliseconds, but the hard part is getting those billions of rows in reliably. Benchmarks highlight query performance, while production systems tend to struggle with ingestion. Understanding the mental model (parts, merges, and budget) is essential for successful ingestion.

ClickHouse is designed for bulk inserts, not row-by-row streaming. To understand why ingestion is challenging, you need to understand how ClickHouse handles data internally.
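To make the bulk-versus-row-by-row point concrete, here is a minimal client-side batching sketch. The `InsertBatcher` class, the `flush_fn` callback, and the `max_rows` threshold are hypothetical names chosen for illustration, not part of any ClickHouse client library; in a real pipeline `flush_fn` would be whatever call sends one bulk INSERT.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class InsertBatcher:
    """Buffers rows and hands them to `flush_fn` in bulk (hypothetical helper)."""

    def __init__(self, flush_fn: Callable[[List[Row]], None], max_rows: int = 10_000) -> None:
        self.flush_fn = flush_fn      # assumed to perform one bulk insert per call
        self.max_rows = max_rows      # illustrative threshold, tune for your workload
        self.buffer: List[Row] = []

    def add(self, row: Row) -> None:
        # Collect rows in memory and flush once the batch is full.
        self.buffer.append(row)
        if len(self.buffer) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered as a single batch; no-op when empty.
        if not self.buffer:
            return
        batch, self.buffer = self.buffer, []
        self.flush_fn(batch)


def demo() -> List[int]:
    """Feeds 2,500 rows through the batcher and records each batch size."""
    sizes: List[int] = []
    batcher = InsertBatcher(lambda batch: sizes.append(len(batch)), max_rows=1000)
    for i in range(2500):
        batcher.add({"event_date": "2024-01-01", "user_id": i, "event_type": "click"})
    batcher.flush()  # flush the final partial batch
    return sizes
```

In this sketch, 2,500 rows become three insert calls instead of 2,500. A production version would likely also flush on a timer so that a slow trickle of rows does not sit in the buffer indefinitely.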

Parts: Physical Chunks Created by Inserts

When you insert data into a MergeTree table, ClickHouse creates parts: immutable chunks of data stored on disk. Each insert creates at least one part, and an insert whose rows span several partitions creates at least one part for every partition it touches.

CREATE TABLE events
(
    event_date Date,
    user_id UInt64,
    event_type String
)
ENGINE = MergeTree()
-- Assumed sort key; MergeTree requires an ORDER BY clause.
ORDER BY (event_date, user_id)
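The relationship between one insert and the parts it creates can be sketched with a small simulation. This is a simplified model, not ClickHouse code: it assumes the table is partitioned by month (in the style of `PARTITION BY toYYYYMM(event_date)`, which the table definition above does not show), and it ignores block-size limits that can split a large insert into more parts. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# (event_date, user_id, event_type), mirroring the events table columns.
Row = Tuple[date, int, str]


def month_partition(row: Row) -> str:
    """Partition id in the style of toYYYYMM(event_date), e.g. '202401' (assumed key)."""
    return row[0].strftime("%Y%m")


def parts_for_insert(rows: List[Row]) -> Dict[str, int]:
    """Group one insert's rows by partition; each group stands for one new part."""
    parts: Dict[str, int] = defaultdict(int)
    for row in rows:
        parts[month_partition(row)] += 1
    return dict(parts)


# A single insert whose rows straddle a month boundary.
insert_block = [
    (date(2024, 1, 30), 1, "click"),
    (date(2024, 1, 31), 2, "view"),
    (date(2024, 2, 1), 3, "click"),
]
parts = parts_for_insert(insert_block)
print(parts)  # {'202401': 2, '202402': 1}
```

Under these assumptions, a three-row insert already produces two parts because it touches two partitions. The same logic explains why many small inserts spread across many partitions multiply the number of parts on disk.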

Tinybird is not affiliated with, associated with, or sponsored by ClickHouse, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.
