Google has announced that the BigQuery Data Transfer Service now supports incremental transfers when migrating data from Teradata data warehouses to BigQuery. This feature has reached general availability (GA), offering users a more efficient way to keep their BigQuery datasets updated with changes from their Teradata sources.
Key Features of Incremental Transfers
Initial Transfer
- The first transfer creates a complete table snapshot in BigQuery
Subsequent Transfers
- Follow annotations defined in a custom schema file
- Use timestamps to track and transfer only new or modified data
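To make the role of the annotations concrete, here is a minimal sketch of an annotated custom schema, built as a Python dictionary and printed as JSON. The database, table, and column names are hypothetical. The key names (`databases`, `tables`, `columns`, `usageType`) are an assumption about the file layout, not a confirmed specification, so check them against the official custom schema file documentation before use. Only the `COMMIT_TIMESTAMP` and `PRIMARY_KEY` annotations come from the announcement.

```python
import json

# Hypothetical schema: names and key layout are illustrative assumptions.
custom_schema = {
    "databases": [
        {
            "name": "sales_db",
            "originalName": "sales_db",
            "tables": [
                {
                    "name": "orders",
                    "originalName": "orders",
                    "columns": [
                        {
                            "name": "order_id",
                            "originalName": "order_id",
                            "type": "INT64",
                            # Marks the key used to match modified rows.
                            "usageType": ["PRIMARY_KEY"],
                        },
                        {
                            "name": "updated_at",
                            "originalName": "updated_at",
                            "type": "TIMESTAMP",
                            # Marks the column used to detect changed rows.
                            "usageType": ["COMMIT_TIMESTAMP"],
                        },
                        {
                            "name": "amount",
                            "originalName": "amount",
                            "type": "NUMERIC",
                        },
                    ],
                }
            ],
        }
    ]
}


def annotated_columns(schema, annotation):
    """Return (database, table, column) triples carrying the given annotation."""
    found = []
    for db in schema["databases"]:
        for table in db["tables"]:
            for col in table["columns"]:
                if annotation in col.get("usageType", []):
                    found.append((db["name"], table["name"], col["name"]))
    return found


print(json.dumps(custom_schema, indent=2))
print(annotated_columns(custom_schema, "COMMIT_TIMESTAMP"))
```

The helper `annotated_columns` is only a convenience for inspecting the sketch; it is not part of any Google tool.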
How Incremental Transfers Work
The incremental transfer process operates on a per-table basis, using the following logic:
Timestamp Tracking
- Each transfer run saves a timestamp
- Subsequent runs use the previous run's timestamp (T1) and the current run's start time (T2)
Table-Specific Behavior
- Tables without a COMMIT_TIMESTAMP column are skipped
- Tables with only a COMMIT_TIMESTAMP column:
  - Rows with timestamps between T1 and T2 are extracted and appended to the existing BigQuery table
- Tables with both COMMIT_TIMESTAMP and PRIMARY_KEY columns:
  - Rows with timestamps between T1 and T2 are extracted
  - New rows are appended, and modified rows are updated in the existing BigQuery table
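The per-table rules above can be simulated in a few lines of Python. This is an illustrative in-memory model, not the service's implementation: the function name `apply_incremental_run` and its parameters are hypothetical, rows are plain dictionaries, and treating the window as half-open (T1 inclusive, T2 exclusive) is an assumption, since the announcement only says "between T1 and T2".

```python
from datetime import datetime


def apply_incremental_run(target_rows, source_rows, t1, t2,
                          commit_ts_col=None, primary_key_col=None):
    """Simulate one incremental run and return the resulting target rows.

    commit_ts_col and primary_key_col name the annotated columns, or are
    None when the table has no such annotation.
    """
    if commit_ts_col is None:
        # No COMMIT_TIMESTAMP column: the table is skipped.
        return list(target_rows)

    # Extract rows changed inside the window (assumed to be [T1, T2)).
    changed = [r for r in source_rows if t1 <= r[commit_ts_col] < t2]

    if primary_key_col is None:
        # COMMIT_TIMESTAMP only: changed rows are appended.
        return list(target_rows) + changed

    # COMMIT_TIMESTAMP and PRIMARY_KEY: new rows are appended,
    # modified rows replace the existing row with the same key.
    merged = {r[primary_key_col]: r for r in target_rows}
    for row in changed:
        merged[row[primary_key_col]] = row
    return list(merged.values())


t1, t2 = datetime(2024, 1, 1), datetime(2024, 1, 2)
target = [
    {"id": 1, "v": "a", "ts": datetime(2023, 12, 31)},
    {"id": 2, "v": "b", "ts": datetime(2023, 12, 31)},
]
source = [
    {"id": 1, "v": "a", "ts": datetime(2023, 12, 31)},
    {"id": 2, "v": "b2", "ts": datetime(2024, 1, 1, 12)},  # modified
    {"id": 3, "v": "c", "ts": datetime(2024, 1, 1, 13)},   # new
]
result = apply_incremental_run(target, source, t1, t2,
                               commit_ts_col="ts", primary_key_col="id")
print([(r["id"], r["v"]) for r in result])
```

Note that a row deleted from `source` never appears in `changed`, so it stays in the target. The model therefore shows the same gap for deleted rows that the real feature has. It also shows why append-only tables can accumulate several versions of the same logical row when no primary key is declared.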
Incremental migration from Teradata does not sync deleted rows: a row deleted in Teradata is not removed from the corresponding BigQuery table. Users should account for this limitation when planning their data migration strategy.
Benefits for Users
This update offers several advantages for organizations migrating from Teradata to BigQuery:
- Efficient Data Updates: Only new or modified data is transferred, reducing processing time and resource usage
- Reduced Downtime: Incremental transfers allow for more frequent updates with minimal impact on operations
- Flexibility: The custom schema file allows users to define how different tables should be handled during transfers
Conclusion
The addition of incremental transfers to the BigQuery Data Transfer Service for Teradata migration represents a significant improvement in Google's data migration toolset. This feature allows organizations to more easily keep their BigQuery datasets in sync with their Teradata sources, facilitating smoother transitions to cloud-based data warehousing and analytics.