Backfilling

TL;DR

Backfilling is the hardest ingestion operation in ClickHouse. There's no built-in way to rewrite large-scale historical data plus all downstream tables. You need compute separation, parallelization, progress tracking, atomicity, backpressure, queue management, and coordination with live ingestion.

Why Simple Approaches Don't Work

A single INSERT INTO new_table SELECT * FROM old_table doesn't work at scale. It can exhaust memory and block live queries by competing for the same resources (CPU, memory, disk I/O). It also offers no atomicity guarantee, gives poor progress tracking, and risks data loss or duplication.
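One common alternative is to break the single statement into bounded, per-range statements. The sketch below is illustrative only: it generates one INSERT per calendar month, reusing the old_table and new_table names from the statement above. The event_date column and the helper names month_ranges and chunked_inserts are assumptions, not part of any ClickHouse API. Chunking by itself only bounds the work per statement and gives a natural unit for tracking progress; it does not provide atomicity or isolate the backfill from live queries.

```python
from datetime import date


def month_ranges(start: date, end: date):
    """Yield half-open (lo, hi) date ranges, one per calendar month, covering [start, end)."""
    cur = date(start.year, start.month, 1)
    while cur < end:
        # First day of the following month (rolls over the year after December).
        nxt = date(cur.year + (cur.month == 12), cur.month % 12 + 1, 1)
        yield max(cur, start), min(nxt, end)
        cur = nxt


def chunked_inserts(start, end, src="old_table", dst="new_table", ts_col="event_date"):
    """Build one bounded INSERT ... SELECT per month instead of a single large statement.

    ts_col is a hypothetical date column used only to bound each chunk.
    """
    return [
        f"INSERT INTO {dst} SELECT * FROM {src} "
        f"WHERE {ts_col} >= '{lo}' AND {ts_col} < '{hi}'"
        for lo, hi in month_ranges(start, end)
    ]


if __name__ == "__main__":
    for stmt in chunked_inserts(date(2024, 1, 15), date(2024, 3, 10)):
        print(stmt)
```

Each generated statement covers a half-open range, so adjacent chunks never overlap and a failed chunk can be retried on its own, provided the target range is cleared first to avoid duplicates.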


What You Need for Backfilling

Backfilling requires seven capabilities:

  • Compute separation: Run the backfill on compute that is separate from live queries
  • Parallelization: Split work across workers
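As a rough illustration of splitting work across workers, the sketch below runs a list of chunks through a bounded thread pool and records which chunks succeeded and which failed. The run_backfill function and the execute callback are hypothetical names introduced here. In practice execute would send one chunk's statement to a ClickHouse client, ideally one pointed at compute that does not serve live queries. The max_workers cap is a crude stand-in for real backpressure.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_backfill(chunks, execute, max_workers=4):
    """Run execute(chunk) for every chunk on a bounded worker pool.

    Returns (done, failed): done is a list of chunks that completed,
    failed is a list of (chunk, exception) pairs that can be retried later.
    """
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its chunk so progress can be reported per chunk.
        futures = {pool.submit(execute, chunk): chunk for chunk in chunks}
        for fut in as_completed(futures):
            chunk = futures[fut]
            try:
                fut.result()
                done.append(chunk)
            except Exception as exc:  # keep going; record the failure for retry
                failed.append((chunk, exc))
            print(f"{len(done) + len(failed)}/{len(chunks)} chunks processed")
    return done, failed


if __name__ == "__main__":
    def fake_execute(chunk):
        # Stand-in for a real database call; fails on one chunk to show tracking.
        if chunk == "2024-02":
            raise RuntimeError("simulated failure")

    ok, bad = run_backfill(["2024-01", "2024-02", "2024-03"], fake_execute, max_workers=2)
    print("done:", sorted(ok), "failed:", [c for c, _ in bad])
```

Recording failures instead of aborting lets the remaining chunks finish and leaves a precise list of what to retry.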

Tinybird is not affiliated with, associated with, or sponsored by ClickHouse, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.

Backfilling | ClickHouse for Developers