Google Cloud has introduced BigQuery metastore, a fully managed metastore service that provides unified metadata management for data analytics products. Currently in preview, this new offering enables users to access and manage metadata from various processing engines, including BigQuery and Apache Spark, while supporting both BigQuery tables and open formats like Apache Iceberg.
Key Benefits
Because BigQuery metastore is serverless, there is no infrastructure to provision or manage, which reduces operational overhead, and the service scales automatically with demand. It also offers engine interoperability: tables registered in the metastore can be accessed directly in BigQuery without additional configuration.
A significant advantage is its unified user experience across BigQuery and BigQuery Studio. Users can create tables in Spark using a BigQuery Studio notebook and immediately query them through the Google Cloud console, streamlining the analytics workflow.
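The workflow above can be sketched as two sets of SQL statements: one issued from Spark in the notebook, one issued from BigQuery. This is a minimal illustration, not Google's documented procedure. The helper names, the catalog, namespace, and table names, and the two-column schema are all hypothetical. The sketch also assumes that a Spark namespace appears in BigQuery as a dataset of the same name.

```python
from typing import List


def spark_statements(catalog: str, namespace: str, table: str) -> List[str]:
    """Build the Spark SQL a notebook might run to create and fill a table.

    Hypothetical helper: the schema (id, name) is only an example.
    """
    target = f"{catalog}.{namespace}.{table}"
    return [
        f"CREATE NAMESPACE IF NOT EXISTS {catalog}.{namespace}",
        f"CREATE TABLE IF NOT EXISTS {target} (id INT, name STRING) USING iceberg",
        f"INSERT INTO {target} VALUES (1, 'first row')",
    ]


def bigquery_query(project: str, namespace: str, table: str) -> str:
    """Build the query one might then run in the Google Cloud console.

    ASSUMPTION: the Spark namespace is visible in BigQuery as a dataset
    with the same name inside the given project.
    """
    return f"SELECT * FROM `{project}.{namespace}.{table}`"


if __name__ == "__main__":
    for statement in spark_statements("demo_catalog", "analytics", "events"):
        print(statement)
    print(bigquery_query("my-project", "analytics", "events"))
```

In a notebook, each Spark statement would be passed to `spark.sql(...)` on a session whose catalog points at BigQuery metastore. The final query needs no Spark session at all, which is the point of the shared metadata.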
Technical Specifications and Integration Support
BigQuery metastore integrates with multiple platforms and versions:
- Supports Apache Iceberg 1.5.2 or later
- Compatible with Dataproc version 2.2 or later
- Works with Spark version 3.3 or later
- Includes BigQuery metastore Iceberg catalog plugin
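As a rough sketch of how these requirements might be applied, the snippet below checks component versions against the minimums listed above and assembles Spark properties for an Iceberg catalog backed by BigQuery metastore. The function names are hypothetical. The plugin class name and the `gcp_project`, `gcp_location`, and `warehouse` property keys are assumptions that should be verified against the current Google Cloud documentation before use.

```python
from typing import Dict, Tuple

# Minimum versions taken from the compatibility list above.
MIN_VERSIONS: Dict[str, Tuple[int, ...]] = {
    "iceberg": (1, 5, 2),
    "dataproc": (2, 2),
    "spark": (3, 3),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.5.1' into (3, 5, 1). Suffixes such as '-rc1' are not handled."""
    return tuple(int(part) for part in text.split("."))


def _pad(version: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    """Right-pad with zeros so that (2, 2) compares equal to (2, 2, 0)."""
    return version + (0,) * (width - len(version))


def meets_minimum(component: str, version: str) -> bool:
    """Return True if `version` is at or above the listed minimum."""
    minimum = MIN_VERSIONS[component]
    parsed = parse_version(version)
    width = max(len(minimum), len(parsed))
    return _pad(parsed, width) >= _pad(minimum, width)


def catalog_conf(catalog: str, project: str, location: str,
                 warehouse: str) -> Dict[str, str]:
    """Assemble Spark properties for an Iceberg catalog named `catalog`."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Iceberg's standard Spark catalog entry point.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # ASSUMPTION: class name of the BigQuery metastore Iceberg plugin.
        f"{prefix}.catalog-impl":
            "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog",
        # ASSUMPTION: property keys for project, location, and warehouse.
        f"{prefix}.gcp_project": project,
        f"{prefix}.gcp_location": location,
        f"{prefix}.warehouse": warehouse,
    }


if __name__ == "__main__":
    print(meets_minimum("spark", "3.5.1"))
    for key, value in catalog_conf(
        "demo_catalog", "my-project", "us-central1", "gs://my-bucket/warehouse"
    ).items():
        print(f"{key}={value}")
```

Each key-value pair could be supplied to Spark through `SparkSession.builder.config(key, value)` or as `--conf` arguments to `spark-submit`. Building the properties as a plain dictionary keeps the sketch runnable without a Spark installation.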
Comparison with BigLake Metastore
As Google Cloud's recommended metastore solution, BigQuery metastore offers distinct advantages over BigLake Metastore. BigLake Metastore operates as a standalone service that supports only Iceberg tables, whereas BigQuery metastore integrates directly with BigQuery's catalog system. This integration provides a single source of truth for metadata: tables can be modified through multiple open-source engines and still be queried directly through BigQuery.
The Spark integration illustrates the benefit. Because Spark works against the same catalog as BigQuery, table metadata does not have to be duplicated in a separate store, which reduces redundancy and simplifies job execution.