Internal Architecture
AstraLog is built to solve the fundamental problem of log ingestion at scale: analytical databases are not designed for thousands of concurrent, single-row inserts.
AstraLog addresses this by decoupling ingestion from analytical storage with an S3-backed “Shock Absorber” pattern.
1. The Shock Absorber Pattern
The core philosophy of AstraLog is to separate concerns. Ingestion must be fast and highly available. Analytics must be structured and bulk-oriented.
- Ingestion API (Go): Receives incoming HTTP requests. Its only job is to validate the payload and push it into memory as fast as possible.
- S3 Staging: The temporary, highly durable storage layer. S3 absorbs the raw write volume of incoming traffic.
- ClickHouse (Analytics): The final destination, optimized for reading massive datasets.
2. Component Workflow
A. The Go Pipeline (Memory Channels)
When the AstraLog server receives a batch of logs, it does not write to the database. Instead, it pushes the payload into a buffered Go channel (capacity: 10,000).
- This lets the HTTP request complete in milliseconds, so client applications (your microservices) never wait on a database write.
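The handler side of this step can be sketched as follows. This is a minimal illustration, not AstraLog's actual code: the LogEntry fields, the /ingest path, the helper names, and the choice to answer 503 when the buffer is full are all assumptions, since the text above does not say how a full channel is handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// LogEntry is a hypothetical record shape; the real schema may differ.
type LogEntry struct {
	ProjectID string `json:"project_id"`
	Service   string `json:"service"`
	Message   string `json:"message"`
}

// queueCapacity mirrors the channel capacity quoted in the text.
const queueCapacity = 10000

// newIngestHandler validates a JSON batch and enqueues it without touching the database.
func newIngestHandler(queue chan<- []LogEntry) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var batch []LogEntry
		if err := json.NewDecoder(r.Body).Decode(&batch); err != nil || len(batch) == 0 {
			http.Error(w, "invalid payload", http.StatusBadRequest)
			return
		}
		select {
		case queue <- batch:
			w.WriteHeader(http.StatusAccepted)
		default:
			// Assumption: when the buffer is full, reject instead of blocking the client.
			http.Error(w, "queue full", http.StatusServiceUnavailable)
		}
	}
}

// postBatch exercises the handler in memory, without opening a network listener.
func postBatch(h http.Handler, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/ingest", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	queue := make(chan []LogEntry, queueCapacity)
	h := newIngestHandler(queue)
	code := postBatch(h, `[{"project_id":"p1","service":"api","message":"hello"}]`)
	fmt.Println(code, len(queue)) // prints: 202 1
}
```

The non-blocking select is what keeps request latency independent of downstream speed: the handler either enqueues immediately or fails fast.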
B. Dynamic Batching & GZIP
A background worker continuously reads from the Go channel. It builds in-memory maps that group logs by their project_id and service.
- Once a batch reaches the configured limit (e.g., 100 logs) or the maximum age (e.g., 5 seconds), the worker serializes the batch to JSON, compresses it using GZIP, and streams it directly to S3.
- Why GZIP? JSON logs are repetitive text and compress well, which reduces storage and network transfer costs by up to 90%.
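A rough sketch of the batching loop and the GZIP step is shown below. The names (runBatcher, encodeBatch, batchKey, LogEntry) are hypothetical, and the sketch simplifies the age limit: a single ticker flushes every pending group, rather than tracking the age of each batch separately. The upload itself is left as a callback.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"time"
)

// LogEntry is a hypothetical record shape; the real schema may differ.
type LogEntry struct {
	ProjectID string `json:"project_id"`
	Service   string `json:"service"`
	Message   string `json:"message"`
}

// batchKey identifies one in-memory group of logs.
type batchKey struct{ ProjectID, Service string }

// encodeBatch serializes a batch to JSON and compresses it with GZIP.
func encodeBatch(batch []LogEntry) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := json.NewEncoder(zw).Encode(batch); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed data and writes the GZIP footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// runBatcher drains in until it is closed. A group is flushed when it reaches
// maxSize; every group is flushed when the ticker fires or the channel closes.
func runBatcher(in <-chan []LogEntry, maxSize int, maxAge time.Duration, flush func(batchKey, []LogEntry)) {
	pending := make(map[batchKey][]LogEntry)
	ticker := time.NewTicker(maxAge)
	defer ticker.Stop()

	flushAll := func() {
		for k, batch := range pending {
			flush(k, batch)
			delete(pending, k)
		}
	}
	for {
		select {
		case payload, ok := <-in:
			if !ok {
				flushAll() // channel closed: flush what is left and stop
				return
			}
			for _, e := range payload {
				k := batchKey{e.ProjectID, e.Service}
				pending[k] = append(pending[k], e)
				if len(pending[k]) >= maxSize {
					flush(k, pending[k])
					delete(pending, k)
				}
			}
		case <-ticker.C:
			flushAll() // age limit reached
		}
	}
}

func main() {
	in := make(chan []LogEntry, 1)
	in <- []LogEntry{
		{ProjectID: "p1", Service: "api", Message: "started"},
		{ProjectID: "p1", Service: "api", Message: "ready"},
	}
	close(in)
	runBatcher(in, 100, 5*time.Second, func(k batchKey, b []LogEntry) {
		data, err := encodeBatch(b)
		if err != nil {
			fmt.Println("encode failed:", err)
			return
		}
		// A real worker would upload data to S3 here.
		fmt.Printf("%s/%s: %d logs, %d compressed bytes\n", k.ProjectID, k.Service, len(b), len(data))
	})
}
```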
C. The Background Syncer
A dedicated synchronization routine (Syncer) wakes up periodically (e.g., every 30 seconds). Its job is to move data from the S3 Staging area to ClickHouse.
- List: Scans S3 for new .json.gz files.
- Process: Downloads and decompresses the files.
- Insert: Prepares a bulk INSERT statement and commits the batch to ClickHouse.
- Cleanup: Once ClickHouse confirms the insert, the Syncer deletes the staged file from S3.
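The four steps can be sketched as a single pass over the staged files. Everything named here is hypothetical: the Syncer struct and its function hooks stand in for real S3 and ClickHouse clients, and newMemSyncer backs them with a map purely so the sketch runs on its own.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

type Row = json.RawMessage // one raw JSON log record

// Syncer holds hypothetical hooks standing in for the S3 and ClickHouse clients.
type Syncer struct {
	List   func() ([]string, error)         // keys currently staged
	Get    func(key string) ([]byte, error) // download one staged file
	Delete func(key string) error           // remove one staged file
	Insert func(rows []Row) error           // one bulk INSERT
}

// SyncOnce runs one List, Process, Insert, Cleanup pass over the staged files.
func (s *Syncer) SyncOnce() error {
	keys, err := s.List()
	if err != nil {
		return err
	}
	for _, key := range keys {
		if !strings.HasSuffix(key, ".json.gz") {
			continue // not a staged batch
		}
		raw, err := s.Get(key)
		if err != nil {
			return err
		}
		rows, err := decodeBatch(raw)
		if err != nil {
			return err
		}
		// If the insert fails, the file stays staged and the next pass retries it.
		if err := s.Insert(rows); err != nil {
			return err
		}
		// Delete only after the insert has been confirmed.
		if err := s.Delete(key); err != nil {
			return err
		}
	}
	return nil
}

// decodeBatch decompresses a staged file and splits its JSON array into rows.
func decodeBatch(raw []byte) ([]Row, error) {
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var rows []Row
	err = json.NewDecoder(zr).Decode(&rows)
	return rows, err
}

// gzipBytes compresses data in memory to build a staged file for the demo.
func gzipBytes(data string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(data))
	zw.Close()
	return buf.Bytes()
}

// newMemSyncer backs the storage hooks with a map; it exists only to exercise the sketch.
func newMemSyncer(files map[string][]byte, insert func([]Row) error) *Syncer {
	return &Syncer{
		List: func() ([]string, error) {
			var keys []string
			for k := range files {
				keys = append(keys, k)
			}
			return keys, nil
		},
		Get:    func(k string) ([]byte, error) { return files[k], nil },
		Delete: func(k string) error { delete(files, k); return nil },
		Insert: insert,
	}
}

func main() {
	files := map[string][]byte{"p1/api/1700000000.json.gz": gzipBytes(`[{"message":"a"}]`)}
	dbErr := errors.New("clickhouse unavailable")
	s := newMemSyncer(files, func([]Row) error { return dbErr })
	fmt.Println(s.SyncOnce(), len(files)) // prints: clickhouse unavailable 1
	dbErr = nil
	fmt.Println(s.SyncOnce(), len(files)) // prints: <nil> 0
}
```

Because deletion happens only after a successful insert, a failed pass leaves the backlog intact. In this sketch, a file whose insert succeeded but whose delete failed would be inserted again on the next pass, so the ordering gives at-least-once delivery rather than exactly-once.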
3. Resilience Mechanisms
AstraLog is designed with failure scenarios in mind:
Graceful Shutdown & Zero Data Loss
If the ingestion server receives a SIGTERM (e.g., during a deployment or server restart), it shuts down in an orderly sequence instead of exiting immediately:
- It stops accepting new HTTP requests.
- It flushes all pending in-memory batches to S3, using Go’s sync.WaitGroup to wait until every in-flight upload has finished.
- The process only exits when S3 confirms all uploads are complete.
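The shutdown sequence above might look roughly like this. The worker count, listen address, 10-second timeout, and the helper names startUploaders and drain are illustrative assumptions; the upload callback stands in for the real S3 upload.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

// startUploaders launches n workers that pass every queued batch to upload.
// upload is a hypothetical stand-in for the S3 upload call.
func startUploaders(n int, queue <-chan []byte, upload func([]byte) error) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range queue {
				if err := upload(batch); err != nil {
					log.Printf("upload failed: %v", err)
				}
			}
		}()
	}
	return wg
}

// drain closes the queue and blocks until the workers have uploaded everything left in it.
func drain(queue chan []byte, wg *sync.WaitGroup) {
	close(queue)
	wg.Wait()
}

func main() {
	queue := make(chan []byte, 10000)
	wg := startUploaders(4, queue, func(b []byte) error {
		log.Printf("uploaded %d bytes", len(b))
		return nil
	})

	mux := http.NewServeMux()
	mux.HandleFunc("/ingest", func(w http.ResponseWriter, r *http.Request) {
		queue <- []byte("payload") // placeholder for the validated request body
		w.WriteHeader(http.StatusAccepted)
	})
	srv := &http.Server{Addr: "127.0.0.1:8080", Handler: mux}
	go func() {
		if err := srv.ListenAndServe(); err != http.ErrServerClosed {
			log.Printf("server error: %v", err)
		}
	}()

	// Block until SIGTERM (or Ctrl+C) arrives.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()
	<-ctx.Done()

	// Step 1: stop accepting requests; Shutdown waits for in-flight handlers.
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		log.Printf("shutdown: %v", err)
	}
	// Steps 2 and 3: flush pending batches, then let the process exit.
	// Caveat: if Shutdown timed out, a handler could still be sending on queue.
	drain(queue, wg)
}
```

The ordering matters: the queue is closed only after the HTTP server has stopped, so no handler is expected to send on a closed channel, and wg.Wait returns only once every worker has drained its share.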
Database Downtime Tolerance
If your ClickHouse cluster goes offline for maintenance or crashes, the ingestion API remains fully operational.
- Incoming logs will safely accumulate in S3.
- When ClickHouse comes back online, the Syncer will automatically process the backlog of staged files. No logs are lost.
4. Multi-Tenancy Architecture
AstraLog is built for SaaS and multi-team environments.
- In S3, files are organized as bucket/project_id/service/timestamp.json.gz.
- In ClickHouse, the table engine (MergeTree) uses project_id as the leading column of its sorting key, so queries filtered by a specific customer or organization can skip other tenants’ data instead of scanning the whole table.
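As an illustration of both layouts, the sketch below builds a staging key and carries a possible table definition. Only the key shape, the MergeTree engine, and project_id leading the sorting key come from the text above; the table name, the other columns, and the use of Unix nanoseconds for the timestamp are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// stagingKey builds the object key project_id/service/timestamp.json.gz inside the bucket.
// Assumption: the timestamp is Unix nanoseconds; the text does not specify its format.
func stagingKey(projectID, service string, unixNanos int64) string {
	return fmt.Sprintf("%s/%s/%d.json.gz", projectID, service, unixNanos)
}

// createLogsTable is a hypothetical schema. Placing project_id first in ORDER BY
// keeps each tenant's rows physically adjacent on disk.
const createLogsTable = `
CREATE TABLE IF NOT EXISTS logs (
    project_id String,
    service    String,
    timestamp  DateTime64(3),
    message    String
) ENGINE = MergeTree
ORDER BY (project_id, service, timestamp)`

func main() {
	fmt.Println(stagingKey("acme", "checkout", time.Now().UnixNano()))
	fmt.Println(createLogsTable)
}
```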