
Types of Ingestion

TL;DR

There are three common types of ingestion in ClickHouse: streaming (small, frequent inserts), batch or pseudo-streaming (files, often large, from cloud storage or HTTP servers), and backfilling (rewriting large amounts of historical data). Each serves a different use case and comes with its own requirements.

Three Types of Ingestion

1. Streaming

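Streaming involves small, frequent inserts from Kafka, HTTP streaming, change data capture (CDC), or webhooks. The main challenge is part explosion and merge pressure: each insert into a MergeTree table typically writes at least one new data part, so many small inserts create parts faster than background merges can combine them. The growing part count slows read queries, and if it keeps growing, ClickHouse first throttles and then rejects inserts with a "Too many parts" error.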
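One common way to reduce part creation, not described in this section itself, is to batch rows on the client before inserting. The sketch below assumes a hypothetical `flush_fn` that sends one INSERT per call; the class name, thresholds, and defaults are illustrative, not recommended values. It flushes when the buffer reaches `max_rows` or when the oldest buffered row has waited at least `max_age_s` seconds.

```python
import time
from typing import Callable, List, Optional


class MicroBatcher:
    """Buffers rows and sends them as one insert when a size or age threshold is hit.

    Hypothetical helper: `flush_fn` stands in for whatever client call
    actually sends a single INSERT to ClickHouse.
    """

    def __init__(self, flush_fn: Callable[[List[dict]], None],
                 max_rows: int = 10_000, max_age_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock  # injectable so tests can control time
        self.buffer: List[dict] = []
        self.first_row_at: Optional[float] = None

    def add(self, row: dict) -> None:
        if not self.buffer:
            self.first_row_at = self.clock()  # start the age timer on the first row
        self.buffer.append(row)
        # Flush on size, or when the oldest buffered row is too old.
        # Note: age is only checked on add(); a real batcher would also flush on a timer.
        if (len(self.buffer) >= self.max_rows
                or self.clock() - self.first_row_at >= self.max_age_s):
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)  # one insert -> roughly one new part
            self.buffer = []
            self.first_row_at = None


if __name__ == "__main__":
    sent = []  # each element stands in for one INSERT sent to ClickHouse
    batcher = MicroBatcher(sent.append, max_rows=1000, max_age_s=60)
    for i in range(2500):
        batcher.add({"id": i})
    batcher.flush()  # drain the remainder, e.g. on shutdown
    print(len(sent))  # 3 inserts instead of 2500
```

The trade-off is latency: rows wait in the buffer until a threshold is reached, so `max_age_s` bounds how stale data can get. ClickHouse's server-side `async_insert` setting, which buffers inserts on the server instead, serves a similar purpose.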

2. Batch / Pseudo Streaming

Batch ingestion loads files, small or large, from cloud storage (S3, GCS), HTTP servers, or data lakes. The challenges include scale (processing many files), observability (tracking progress and failures), format handling (CSV, JSON, Parquet, etc.), coordination between services, atomicity (each file is loaded completely or not at all), and low latency.
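To make the observability challenge concrete, here is a minimal sketch of per-file status tracking, assuming a hypothetical `load_fn` that loads one file into ClickHouse. The tracker records which files succeeded or failed, so a job can report progress and a re-run can skip files already loaded. It keeps state in memory only, and it does not make an individual load atomic: a file that fails mid-insert may still leave partial data behind.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class IngestionTracker:
    """Records per-file status so a batch job can report progress and resume.

    Hypothetical sketch: `load_fn` stands in for whatever actually loads one
    file; statuses live in memory here, not in a durable store.
    """
    status: Dict[str, str] = field(default_factory=dict)  # path -> "done" | "failed"
    errors: Dict[str, str] = field(default_factory=dict)  # path -> last error message

    def run(self, paths: Iterable[str], load_fn: Callable[[str], None]) -> None:
        for path in paths:
            if self.status.get(path) == "done":
                continue  # already loaded: skip so re-runs don't load it twice
            try:
                load_fn(path)
            except Exception as exc:  # record the failure and move on to the next file
                self.status[path] = "failed"
                self.errors[path] = str(exc)
            else:
                self.status[path] = "done"
                self.errors.pop(path, None)  # clear any error from an earlier attempt

    def summary(self) -> Dict[str, int]:
        """Counts files per status, e.g. for a progress report."""
        counts: Dict[str, int] = {}
        for s in self.status.values():
            counts[s] = counts.get(s, 0) + 1
        return counts


if __name__ == "__main__":
    def flaky_load(path: str) -> None:
        # Hypothetical loader that fails on one file.
        if path.endswith("bad.parquet"):
            raise ValueError("schema mismatch")

    tracker = IngestionTracker()
    tracker.run(["a.parquet", "b.parquet", "bad.parquet"], flaky_load)
    print(tracker.summary())  # {'done': 2, 'failed': 1}
```

Re-running `run` with the same paths retries only the failed file, which is why the status is keyed by path rather than stored as a simple counter.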

3. Backfilling

Tinybird is not affiliated with, associated with, or sponsored by ClickHouse, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.

Types of Ingestion | ClickHouse for Developers