
Backfilling
Backfilling is the hardest ingestion operation in ClickHouse. There is no built-in way to rewrite large-scale historical data together with all of its downstream tables. You need compute separation, parallelization, progress tracking, atomicity, backpressure, queue management, and coordination with live ingestion.
Why Simple Approaches Don't Work
A single INSERT INTO new_table SELECT * FROM old_table doesn't work at scale. It can exhaust memory and block live queries by competing for CPU, memory, and disk I/O. It also provides no atomicity guarantee and little visibility into progress, and it risks data loss or duplication.
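To make the contrast concrete, here is a minimal sketch of one common alternative: splitting the copy into month-sized statements so that each one is bounded and can be tracked or retried on its own. It only builds the SQL strings and does not talk to ClickHouse. The column name `event_time` and the helper names `month_ranges` and `chunked_inserts` are hypothetical, chosen for illustration, and the sketch assumes the table has a date or timestamp column to filter on. It addresses only the memory and progress-tracking problems, not atomicity or compute separation.

```python
from datetime import date
from typing import Iterator, List, Tuple


def month_ranges(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield half-open (lo, hi) date ranges, one per calendar month, covering [start, end)."""
    current = date(start.year, start.month, 1)
    while current < end:
        # First day of the following month (rolls the year over after December).
        nxt = date(current.year + (current.month == 12), current.month % 12 + 1, 1)
        # Clamp to the requested window so partial months are not over-copied.
        yield max(current, start), min(nxt, end)
        current = nxt


def chunked_inserts(src: str, dst: str, ts_col: str, start: date, end: date) -> List[str]:
    """Build one INSERT ... SELECT statement per month instead of a single large one."""
    return [
        f"INSERT INTO {dst} SELECT * FROM {src} "
        f"WHERE {ts_col} >= '{lo}' AND {ts_col} < '{hi}'"
        for lo, hi in month_ranges(start, end)
    ]


# Hypothetical table and column names, for illustration only.
statements = chunked_inserts(
    "old_table", "new_table", "event_time", date(2024, 11, 1), date(2025, 2, 1)
)
for stmt in statements:
    print(stmt)
```

Each statement can then be run, recorded as done, and retried independently. Retrying a chunk that partially succeeded can still duplicate rows, so this does not remove the need for the atomicity handling described in this section.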
What You Need for Backfilling
Backfilling requires seven capabilities:
- Compute separation: Separate from live queries
- Parallelization: Split work across workers