Working with sharded tables in BigQuery requires careful consideration, especially when using the native BigQuery Data Transfer for Facebook Ads. Sharded tables can be useful when the structure of daily tables changes over time, such as the addition of new columns or changes in data types.
Schema Changes and Their Impact
- Schema Changes: If the schema of the data source changes, the export or data load process continues as usual. A common change is the addition of new columns, like the "collected_traffic_source" array in GA4.
- Data Type Changes: Changes in data types can also occur, as seen with the BigQuery Data Transfer for Facebook on April 22nd, where date columns changed from integers to proper dates.
Problems with Wildcard Queries
- Union Issues: Sharded tables are often read using wildcards (e.g.,
projectname.datasetname.AdInsights_*
). However, if there is a change in data types, BigQuery cannot union the shards before and after the change. - Metadata Usage: BigQuery uses the latest metadata for all shards, which can cause errors if querying old shards with different schemas.
Recommendations
- Documentation Advice: The documentation suggests using partitioned and clustered tables instead of sharded tables to avoid these issues.
Sharded tables offer flexibility but come with challenges, particularly when dealing with schema changes.