
Types of Ingestion
There are three common types of ingestion in ClickHouse: streaming (small and frequent inserts), batch/pseudo streaming (big files from cloud storage or HTTP servers), and backfilling (rewriting large amounts of historical data). Each serves different use cases and has specific requirements.
Three Types of Ingestion
1. Streaming
Streaming involves small, frequent inserts from sources such as Kafka, HTTP streaming, change data capture (CDC), or webhooks. The main challenge is part explosion and merge pressure: each insert typically creates a new data part, so many small parts are created faster than background merges can combine them, and read queries slow down because they have to scan more parts.
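One common way to ease merge pressure is to batch rows on the client side so that fewer, larger inserts reach the table. The following is a minimal sketch of that idea, not a ClickHouse API: `InsertBatcher`, `max_rows`, `max_wait_s`, and the `flush` callback are hypothetical names, and the callback stands in for whatever code actually sends the batch.

```python
import time
from typing import Any, Callable, List, Optional


class InsertBatcher:
    """Buffers rows and hands them to `flush` as one larger batch.

    Hypothetical helper for illustration; `flush` is assumed to perform
    the real insert (for example through a client library).
    """

    def __init__(
        self,
        flush: Callable[[List[Any]], None],
        max_rows: int = 10_000,
        max_wait_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_rows = max_rows
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._rows: List[Any] = []
        self._first_row_at: Optional[float] = None

    def add(self, row: Any) -> None:
        # Remember when the current batch started so it cannot wait forever.
        if not self._rows:
            self._first_row_at = self._clock()
        self._rows.append(row)
        if len(self._rows) >= self._max_rows or self._expired():
            self.flush()

    def _expired(self) -> bool:
        return (
            self._first_row_at is not None
            and self._clock() - self._first_row_at >= self._max_wait_s
        )

    def flush(self) -> None:
        # Send whatever is buffered as a single batch, then reset.
        if not self._rows:
            return
        batch, self._rows = self._rows, []
        self._first_row_at = None
        self._flush(batch)


# Usage: seven single-row events become three inserts instead of seven.
batches: List[List[Any]] = []
batcher = InsertBatcher(batches.append, max_rows=3, max_wait_s=60.0)
for i in range(7):
    batcher.add({"id": i})
batcher.flush()  # drain the remainder on shutdown
```

In this sketch the age limit is only checked when a new row arrives, so a production version would also need a timer to flush an idle buffer, plus retry handling around the insert itself.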
2. Batch / Pseudo Streaming
Batch ingestion involves files from cloud storage (S3, GCS), HTTP servers, or data lakes. Files can be small or large. The challenges include scale (processing many files), observability (tracking progress and failures), format handling (CSV, JSON, Parquet, etc.), coordination between services, ensuring atomicity, and achieving low latency.
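The observability and format-handling challenges above can be illustrated with a small per-file ledger. This is a sketch under stated assumptions rather than a prescribed design: `FileLedger`, `Status`, `detect_format`, and the `load` callback are hypothetical names, the extension mapping is an illustrative choice, and the ledger lives in memory, whereas a real pipeline would persist it.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable

# Assumed mapping from file extension to an input format name.
FORMATS = {".csv": "CSV", ".ndjson": "JSONEachRow", ".parquet": "Parquet"}


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


def detect_format(path: str) -> str:
    """Pick a format from the file extension, or raise if it is unknown."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unsupported file format: {path}")
    return FORMATS[suffix]


@dataclass
class FileLedger:
    """Tracks each file's state so progress and failures stay visible."""

    status: Dict[str, Status] = field(default_factory=dict)
    errors: Dict[str, str] = field(default_factory=dict)

    def discover(self, paths: Iterable[str]) -> None:
        # Already-known files keep their state, so rediscovery is harmless.
        for path in paths:
            self.status.setdefault(path, Status.PENDING)

    def run(self, load: Callable[[str, str], None]) -> None:
        # Process pending files and retry failed ones; skip finished files.
        for path, state in list(self.status.items()):
            if state is Status.DONE:
                continue
            try:
                load(path, detect_format(path))
                self.status[path] = Status.DONE
                self.errors.pop(path, None)
            except Exception as exc:
                self.status[path] = Status.FAILED
                self.errors[path] = str(exc)

    def summary(self) -> Dict[str, int]:
        counts = {s.value: 0 for s in Status}
        for state in self.status.values():
            counts[state.value] += 1
        return counts


# Usage: `load` is a stand-in for the real ingestion call.
ledger = FileLedger()
ledger.discover(["s3://bucket/a.csv", "s3://bucket/b.parquet", "s3://bucket/c.xml"])
ledger.run(lambda path, fmt: None)
print(ledger.summary())  # {'pending': 0, 'done': 2, 'failed': 1}
print(ledger.errors)
```

The ledger only records outcomes. It does not make a file load atomic, so a load that fails halfway could still leave partial data behind unless the `load` step itself handles that.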