Google Launches BigQuery Metastore for Unified Metadata Management

January 23, 2025 at 5:52:19 AM

TL;DR: Google has launched BigQuery metastore, a fully managed service for unified metadata management in data analytics. Currently in preview, it lets users access the same metadata from BigQuery and Apache Spark and supports open formats such as Apache Iceberg. Its serverless architecture reduces operational overhead and scales automatically. Tables created in one engine can be queried from the other.

Google Cloud has introduced BigQuery metastore, a fully managed metastore service that provides unified metadata management for data analytics products. Currently in preview, this new offering enables users to access and manage metadata from various processing engines, including BigQuery and Apache Spark, while supporting both BigQuery tables and open formats like Apache Iceberg.

Key Benefits

Because BigQuery metastore is serverless, there is no infrastructure to provision or manage, which reduces operational overhead, and capacity scales automatically with demand. It also provides engine interoperability: tables registered in the metastore can be accessed directly in BigQuery without additional configuration.

A significant advantage is its unified user experience across BigQuery and BigQuery Studio. Users can create tables in Spark using a BigQuery Studio notebook and immediately query them through the Google Cloud console, streamlining the analytics workflow.
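The workflow described above can be sketched as the SQL a user might run in each engine. This is a minimal illustration, not Google's documented procedure: the catalog, namespace, table, and project names are hypothetical, and it assumes that a Spark namespace in the catalog corresponds to a BigQuery dataset of the same name.

```python
def spark_statements(catalog, namespace, table):
    """Spark SQL to create and fill an Iceberg table (all names are hypothetical)."""
    full_name = f"{catalog}.{namespace}.{table}"
    return [
        f"CREATE NAMESPACE IF NOT EXISTS {catalog}.{namespace}",
        f"CREATE TABLE IF NOT EXISTS {full_name} (id INT, name STRING) USING iceberg",
        f"INSERT INTO {full_name} VALUES (1, 'first row')",
    ]


def bigquery_query(project, namespace, table):
    """BigQuery SQL reading the same table.

    Assumption: the Spark namespace appears in BigQuery as a dataset.
    """
    return f"SELECT * FROM `{project}.{namespace}.{table}`"


if __name__ == "__main__":
    # In a notebook, each Spark statement would be passed to spark.sql(...).
    for statement in spark_statements("demo_catalog", "sales", "orders"):
        print(statement)
    # The query below would be run from the Google Cloud console.
    print(bigquery_query("my-project", "sales", "orders"))
```

The point of the sketch is that no export or sync step sits between the two halves: the table is registered once, in the metastore, and both engines resolve it from there.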

Technical Specifications and Integration Support

BigQuery metastore has the following version requirements and components:

Supports Apache Iceberg 1.5.2 or later
Compatible with Dataproc version 2.2 or later
Works with Spark version 3.3 or later
Includes BigQuery metastore Iceberg catalog plugin
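The requirements above can be turned into a small pre-flight check, together with the Spark settings that would point an Iceberg catalog at the metastore. This is a sketch under assumptions: the minimum versions come from the list above, `org.apache.iceberg.spark.SparkCatalog` is the standard Iceberg Spark catalog class, but the `catalog-impl` class name and the `gcp_project` and `gcp_location` property names are recalled from the preview documentation and should be verified against the current docs. All argument values are hypothetical.

```python
# Minimum versions as listed in the article.
MINIMUMS = {"iceberg": "1.5.2", "dataproc": "2.2", "spark": "3.3"}


def parse_version(text):
    """Turn '3.5.1' into (3, 5, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def unsupported_components(installed):
    """Return the sorted names whose installed version is below the minimum."""
    too_old = []
    for name, minimum in MINIMUMS.items():
        if parse_version(installed[name]) < parse_version(minimum):
            too_old.append(name)
    return sorted(too_old)


def iceberg_catalog_conf(catalog, project, location, warehouse):
    """Build Spark settings for an Iceberg catalog backed by BigQuery metastore.

    Assumption: the catalog-impl class and the gcp_* property names below
    match the preview documentation; check them before relying on this.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": (
            "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalogImpl"
        ),
        f"{prefix}.gcp_project": project,
        f"{prefix}.gcp_location": location,
        f"{prefix}.warehouse": warehouse,
    }


if __name__ == "__main__":
    print(unsupported_components(
        {"iceberg": "1.5.2", "dataproc": "2.1", "spark": "3.5.1"}
    ))
    conf = iceberg_catalog_conf(
        "demo_catalog", "my-project", "us-central1", "gs://my-bucket/warehouse"
    )
    for key, value in conf.items():
        print(f"{key}={value}")
```

Each key and value in the returned dictionary would be passed to `SparkSession.builder.config(key, value)` when the session is created. Versions are compared as integer tuples rather than strings so that, for example, 3.10 sorts above 3.3.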

Comparison with BigLake Metastore

As Google Cloud's recommended metastore solution, BigQuery metastore offers distinct advantages over BigLake Metastore. While BigLake Metastore operates as a standalone service supporting only Iceberg tables, BigQuery metastore integrates directly with BigQuery's catalog system. This integration ensures a single source of truth for metadata, enabling tables to be modified through multiple open source engines while maintaining direct query access through BigQuery.

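The Spark integration shows the practical effect: because Spark jobs read and write table metadata in the same metastore that BigQuery uses, there is no second copy of the metadata to store or keep in sync, which reduces redundancy and simplifies job execution.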