In a real project, a senior Big Data Engineer at Tikal cut BigQuery storage costs by 90% by changing default configuration settings.
The key points include:
- Default Configurations: Tools often come with default settings that may not be optimal for specific workflows.
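- BigQuery Time Travel: This feature lets you query or restore data as it existed at any point within the time travel window (seven days by default). Under physical billing, the retained history adds to storage costs.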
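- Billing Models (chosen per dataset):
  - Physical Bytes: Bills compressed storage, including time travel and fail-safe storage. The per-GiB price is higher, but compressed data is often much smaller than its logical size.
  - Logical Bytes: Bills uncompressed storage. Time travel and fail-safe storage are not billed.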
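A minimal sketch of both ideas, assuming the dataset lives in the US multi-region (hence the `region-us` qualifier) and a hypothetical table `my_landing_zone.events`. The first query reads the table as it existed an hour ago, which only works inside the time travel window. The second lists, per table, the bytes each billing model would count.

```sql
-- Read the (hypothetical) events table as it looked one hour ago.
-- This only succeeds within the dataset's time travel window.
SELECT *
FROM my_landing_zone.events
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

-- Compare logical (uncompressed) and physical (compressed) sizes per table.
-- total_physical_bytes includes time travel bytes; fail-safe bytes are separate.
SELECT
  table_name,
  total_logical_bytes / POW(1024, 3) AS logical_gib,
  total_physical_bytes / POW(1024, 3) AS physical_gib,
  time_travel_physical_bytes / POW(1024, 3) AS time_travel_gib,
  fail_safe_physical_bytes / POW(1024, 3) AS fail_safe_gib
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE table_schema = 'my_landing_zone'
ORDER BY physical_gib DESC;
```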
Configuration Changes:
- Switch the dataset's storage billing model to physical bytes:
ALTER SCHEMA my_landing_zone SET OPTIONS(storage_billing_model = 'PHYSICAL');
- Shrink the time travel window from the default 168 hours (7 days) to the minimum of 48 hours:
ALTER SCHEMA my_landing_zone SET OPTIONS(max_time_travel_hours = 48);
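The two options can also be applied in a single statement and then checked in `INFORMATION_SCHEMA.SCHEMATA_OPTIONS`. A sketch, again assuming the dataset is in the US multi-region (`region-us`):

```sql
-- Apply both dataset options in one statement.
ALTER SCHEMA my_landing_zone
SET OPTIONS (
  storage_billing_model = 'PHYSICAL',
  -- Must be a multiple of 24 between 48 and 168.
  max_time_travel_hours = 48
);

-- Confirm that the options took effect.
SELECT option_name, option_value
FROM `region-us`.INFORMATION_SCHEMA.SCHEMATA_OPTIONS
WHERE schema_name = 'my_landing_zone'
  AND option_name IN ('storage_billing_model', 'max_time_travel_hours');
```

After a dataset's billing model is changed, BigQuery makes you wait 14 days before changing it again, so it pays to forecast before switching.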
Assessment Steps:
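- Cost Savings: Use SQL queries (for example, against INFORMATION_SCHEMA.TABLE_STORAGE) to forecast each dataset's cost under both billing models.
- Usage Needs: Determine whether each dataset actually needs time travel, and for how long.
- Risks: Weigh the ongoing cost of storing time travel history against the cost of re-ingesting data if you ever need to recover it.
- Re-ingest Costs: Use free batch loading (load jobs on the shared slot pool) where possible.
- Re-ingest Complexity: Design pipelines so backfills are idempotent (re-running a load for a given period produces the same result), which makes re-ingestion simple and safe.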
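The cost comparison from the first step can be sketched as a single script. The per-GiB prices below are assumptions (US multi-region list prices at the time of writing) and the `region-us` qualifier assumes US-located datasets. Replace both with the values that apply to your project.

```sql
-- Assumed list prices in USD per GiB per month; check current pricing.
DECLARE active_logical_price FLOAT64 DEFAULT 0.02;
DECLARE long_term_logical_price FLOAT64 DEFAULT 0.01;
DECLARE active_physical_price FLOAT64 DEFAULT 0.04;
DECLARE long_term_physical_price FLOAT64 DEFAULT 0.02;

WITH sizes AS (
  SELECT
    table_schema AS dataset_name,
    -- Deleted tables still in time travel are not billed under logical billing.
    SUM(IF(deleted, 0, active_logical_bytes)) / POW(1024, 3) AS active_logical_gib,
    SUM(IF(deleted, 0, long_term_logical_bytes)) / POW(1024, 3) AS long_term_logical_gib,
    -- active_physical_bytes already includes time travel bytes.
    SUM(active_physical_bytes) / POW(1024, 3) AS active_physical_gib,
    SUM(long_term_physical_bytes) / POW(1024, 3) AS long_term_physical_gib,
    SUM(fail_safe_physical_bytes) / POW(1024, 3) AS fail_safe_gib
  FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
  GROUP BY dataset_name
),
costs AS (
  SELECT
    dataset_name,
    active_logical_gib * active_logical_price
      + long_term_logical_gib * long_term_logical_price AS logical_usd,
    -- Fail-safe bytes are billed at the active physical rate.
    (active_physical_gib + fail_safe_gib) * active_physical_price
      + long_term_physical_gib * long_term_physical_price AS physical_usd
  FROM sizes
)
SELECT
  dataset_name,
  ROUND(logical_usd, 2) AS monthly_logical_usd,
  ROUND(physical_usd, 2) AS monthly_physical_usd,
  ROUND(logical_usd - physical_usd, 2) AS monthly_savings_usd
FROM costs
ORDER BY monthly_savings_usd DESC;
```

A negative savings value means the dataset is cheaper under logical billing. Summing `time_travel_physical_bytes` separately shows how much of the physical cost a shorter time travel window could remove.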
Conclusion:
- Read Documentation: Understand vendor pricing structures.
- Design for Cost Efficiency: Implement changes that balance cost savings with operational needs.
These adjustments delivered the 90% storage cost reduction while maintaining data integrity and operational efficiency.