

Your First ClickHouse® Table
TL;DR
Every ClickHouse table needs an ENGINE (usually MergeTree) and an ORDER BY clause. Start simple, then optimize based on your query patterns.
Creating Your First Table
Here's a simple events table:
CREATE TABLE events (
    event_time DateTime,
    user_id UInt32,
    event_type String,
    page_url String,
    session_id String
) ENGINE = MergeTree()
ORDER BY (event_time, user_id);
You need three things: columns for your data, ENGINE = MergeTree() for analytics, and ORDER BY for your sorting key. The sorting key determines how rows are laid out on disk, so it is critical for both query performance and compression.
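To see the sorting key at work, you can ask ClickHouse to explain index usage. A minimal sketch (EXPLAIN indexes = 1 is available in recent versions; the exact output varies by release):
-- Verify that a filter on the leading sorting-key column
-- is answered via the primary index (granule pruning)
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE event_time >= '2024-01-01 00:00:00';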
Inserting Data
Bulk inserts are way faster than row-by-row:
-- Single row
INSERT INTO events VALUES ('2024-01-01 10:00:00', 12345, 'click', '/products', 'sess_abc');
-- Bulk insert (preferred)
INSERT INTO events VALUES
    ('2024-01-01 10:00:00', 12345, 'click', '/products', 'sess_abc'),
    ('2024-01-01 10:00:01', 12346, 'view', '/home', 'sess_def'),
    ('2024-01-01 10:00:02', 12345, 'click', '/cart', 'sess_abc');
-- Insert from SELECT (for large datasets)
INSERT INTO events
SELECT
    now() AS event_time,
    number AS user_id,
    'event_' || toString(number % 10) AS event_type,
    '/page' || toString(number) AS page_url,
    'session_' || toString(number) AS session_id
FROM numbers(1_000_000);
Tip
Aim for batch sizes of 100K-1M rows. Row-by-row inserts are slow because every INSERT creates a new data part that must later be merged.
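If your producers can't batch on the client side, recent ClickHouse versions (21.11 and later) can batch for you with asynchronous inserts. A minimal sketch, using made-up sample values:
-- Let the server buffer and batch small inserts together
SET async_insert = 1, wait_for_async_insert = 1;
INSERT INTO events VALUES ('2024-01-01 10:00:03', 12347, 'view', '/about', 'sess_ghi');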
Querying Data
ClickHouse is built for analytical queries:
-- Basic query with filtering
SELECT * FROM events WHERE event_time > '2024-01-01' LIMIT 10;
-- Aggregations (ClickHouse's strength)
SELECT
    event_type,
    count() AS total_events,
    uniq(user_id) AS unique_users
FROM events
WHERE event_time >= today() - INTERVAL 7 DAY
GROUP BY event_type
ORDER BY total_events DESC;
Time-based Queries
SELECT
    toStartOfHour(event_time) AS hour,
    count() AS events_per_hour
FROM events
WHERE event_time >= today()
GROUP BY hour
ORDER BY hour;
ClickHouse excels at analytical queries: aggregations, time-based analysis, and column pruning. Use GROUP BY, time functions like toStartOfHour(), and filter by your sorting-key columns for the best performance.
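As one more sketch of the same pattern, a top-pages query touches only the two columns it needs, which is exactly where columnar storage pays off:
-- Top pages over the last day; only event_time and page_url are read
SELECT
    page_url,
    count() AS views
FROM events
WHERE event_time >= now() - INTERVAL 1 DAY
GROUP BY page_url
ORDER BY views DESC
LIMIT 10;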
Updating and Deleting
Updates and deletes work differently in ClickHouse than in row-oriented databases. They require mutations, which rewrite entire parts:
-- Expensive: rewrites entire parts
ALTER TABLE events UPDATE page_url = '/new-url' WHERE user_id = 12345;
ALTER TABLE events DELETE WHERE event_time < '2024-01-01';
-- UPDATE and DELETE generate table mutations (part rewrites)
SELECT
    database,
    `table`,
    command
FROM system.mutations;
   ┌─database─┬─table──┬─command──────────────────────────────────────────────┐
1. │ default  │ events │ (UPDATE page_url = '/new-url' WHERE user_id = 12345) │
2. │ default  │ events │ (DELETE WHERE event_time < '2024-01-01')             │
   └──────────┴────────┴───────────────────────────────────────────────────────┘
Use TTL for automatic data expiration instead:
CREATE TABLE events (
    event_time DateTime,
    user_id UInt32,
    event_type String
) ENGINE = MergeTree()
ORDER BY (event_time, user_id)
TTL event_time + INTERVAL 90 DAY; -- Auto-delete after 90 days
Warning
Mutations rewrite entire parts and are expensive. Use TTL for automatic data expiration instead of manual deletes.
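If you need retention on an existing table, or the occasional targeted delete, two alternatives are worth knowing. A minimal sketch (lightweight deletes require ClickHouse 22.8 or later):
-- Add or change a TTL on an existing table without recreating it
ALTER TABLE events MODIFY TTL event_time + INTERVAL 90 DAY;
-- Lightweight delete (22.8+): rows are masked immediately and
-- physically removed during later merges; returns quickly
DELETE FROM events WHERE session_id = 'sess_abc';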
Tip
Learn more about how parts, merges, and mutations work in Intro to Ingestion.
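For a quick preview, you can already inspect the parts behind the events table via the system.parts system table; a small sketch:
-- List the active data parts that back the events table
SELECT name, rows, active
FROM system.parts
WHERE database = currentDatabase() AND `table` = 'events';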
Key Takeaways
- Every ClickHouse table needs ENGINE and ORDER BY - Use MergeTree engine for analytics workloads and define ORDER BY based on your query patterns. The sorting key is critical for query performance and compression.
- ClickHouse excels at bulk operations - Insert millions of rows at once (100K-1M batch sizes), use aggregations and time-based queries, and leverage column pruning. Avoid row-by-row operations.
- Use TTL instead of mutations - Automatic data expiration with TTL is much more efficient than manual deletes or updates. Mutations rewrite entire parts and should be avoided for frequent operations.
Learn More
- MergeTree Engine - Official MergeTree engine documentation
- ClickHouse Engines Deep Dive - Learn about MergeTree and other engine families
- HTTP Streaming - Events API and ingestion patterns