BigQuery Introduces AI-Augmented Data Preparation with Gemini

October 25, 2024 at 5:58:58 AM

TL;DR BigQuery data preparation uses Gemini to suggest ways to clean, transform, and enrich data, reducing manual effort. Dataform orchestrates the preparations and supports CI/CD processes. Users and Dataform service accounts need specific IAM roles. Data preparations are managed in BigQuery Studio, where Gemini provides context-aware suggestions. The editor offers data, graph, and schema views. Write modes: full refresh, append, and incremental. Supported steps: source, transformation, filter, validation, join, destination, and delete columns.

AI-augmented data preparation in BigQuery, powered by Gemini, offers intelligent suggestions for cleaning, transforming, and enriching data, significantly reducing manual effort. Dataform orchestrates these preparations, supporting CI/CD processes for collaboration.

Benefits

  • Time Reduction: Context-aware, Gemini-generated transformation suggestions.
  • Data Quality: Automated schema mapping and data quality cleanup.
  • Collaboration: CI/CD support for code reviews and source control.

Users and Dataform service accounts need specific IAM roles. Data preparations are managed in BigQuery Studio. Opening a table triggers a BigQuery job that samples data for Gemini to generate suggestions.

Views in the Data Preparation Editor

  • Data View: Displays a sample of the table and allows interaction and application of Gemini suggestions.
  • Graph View: Visual overview of the data preparation pipeline.
  • Schema View: Displays and allows operations on the current schema.

Gemini offers context-aware suggestions for transformations, data quality rules, standardization, enrichment, and schema mapping. Each suggestion includes a high-level category, description, and corresponding SQL expression.
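
To make the three parts of a suggestion concrete, here is a minimal Python sketch. The `Suggestion` class, the `group_by_category` helper, and the sample values are all hypothetical, invented for illustration; they are not a BigQuery type or actual Gemini output.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Suggestion:
    """Hypothetical container for one suggestion; not a BigQuery type."""
    category: str        # high-level category
    description: str     # human-readable explanation
    sql_expression: str  # SQL expression that would implement the step


def group_by_category(suggestions: List[Suggestion]) -> Dict[str, List[Suggestion]]:
    # Bucket suggestions under their category, preserving input order.
    grouped: Dict[str, List[Suggestion]] = defaultdict(list)
    for suggestion in suggestions:
        grouped[suggestion.category].append(suggestion)
    return dict(grouped)


# Invented sample values, for illustration only.
examples = [
    Suggestion("Standardization", "Trim and lowercase email", "LOWER(TRIM(email))"),
    Suggestion("Data quality", "Require a non-null email", "email IS NOT NULL"),
    Suggestion("Standardization", "Uppercase country code", "UPPER(country_code)"),
]
grouped = group_by_category(examples)
for category, items in grouped.items():
    print(category, [s.sql_expression for s in items])
```

Grouping by category mirrors how a reviewer might scan suggestions by type before deciding which SQL expressions to apply.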

BigQuery uses data sampling to preview data preparation. Samples are not automatically refreshed. Optimize costs and processing time by changing write mode settings to incrementally process new data. Supported modes include Full refresh, Append, and Incremental.
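
The difference between the three write modes can be sketched in plain Python over in-memory rows. This is an illustration of the general semantics, not BigQuery's implementation: the `write` function, the mode strings, and the `updated_at` watermark column are assumptions, and how BigQuery decides which rows count as new is not described here.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def write(destination: List[Row], prepared: List[Row], mode: str,
          watermark_column: str = "updated_at") -> List[Row]:
    """Return the destination contents after one run in the given mode."""
    if mode == "full_refresh":
        # Replace everything with the newly prepared rows.
        return list(prepared)
    if mode == "append":
        # Keep existing rows and add every prepared row, even repeats.
        return destination + list(prepared)
    if mode == "incremental":
        # Assumption: "new" means a watermark value above the highest one
        # already present in the destination.
        if not destination:
            return list(prepared)
        high_water = max(row[watermark_column] for row in destination)
        new_rows = [r for r in prepared if r[watermark_column] > high_water]
        return destination + new_rows
    raise ValueError(f"unknown write mode: {mode}")


existing = [{"id": 1, "updated_at": 1}, {"id": 2, "updated_at": 2}]
prepared = [{"id": 2, "updated_at": 2}, {"id": 3, "updated_at": 3}]

for mode in ("full_refresh", "append", "incremental"):
    print(mode, len(write(existing, prepared, mode)))
```

With this sample data, append duplicates the row with id 2, while incremental adds only the row with id 3, which is why incremental processing can reduce the amount of data handled on each run.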

Supported Data Preparation Steps

  • Source: Added when you choose a table to read from or add a join step.
  • Transformation: Cleans and transforms data using SQL expressions.
  • Filter: Removes rows using WHERE clause syntax.
  • Validation: Sends rows meeting validation criteria to an error table.
  • Join: Joins values from two sources with various join operations.
  • Destination: Defines where the output of the data preparation is written.
  • Delete Columns: Removes columns from the schema.
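
The way these steps chain together can be sketched in plain Python over in-memory rows. This is an illustration only: BigQuery runs the steps as SQL, and every function name, column name, and sample row below is hypothetical. The join step is left out for brevity, and the validation step follows the wording above, routing rows that match the criteria to the error list.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def transformation(rows: List[Row], column: str, fn: Callable[[Any], Any]) -> List[Row]:
    # Rewrite one column; the input rows are left unmodified.
    return [{**row, column: fn(row[column])} for row in rows]


def filter_step(rows: List[Row], where: Callable[[Row], bool]) -> List[Row]:
    # Keep rows that satisfy the condition, as a WHERE clause would.
    return [row for row in rows if where(row)]


def validation(rows: List[Row], criteria: Callable[[Row], bool]) -> Tuple[List[Row], List[Row]]:
    # Rows matching the criteria go to the error list; the rest continue.
    passed = [row for row in rows if not criteria(row)]
    errors = [row for row in rows if criteria(row)]
    return passed, errors


def delete_columns(rows: List[Row], columns: List[str]) -> List[Row]:
    # Drop the named columns from every row.
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def run_pipeline(source: List[Row]) -> Tuple[List[Row], List[Row]]:
    # Source -> transformation -> filter -> validation -> delete columns -> destination.
    rows = transformation(source, "email", lambda v: v.strip().lower())
    rows = filter_step(rows, where=lambda r: r["email"] != "")
    rows, error_table = validation(rows, criteria=lambda r: r["age"] < 0)
    destination = delete_columns(rows, ["tmp"])
    return destination, error_table


# Invented sample rows standing in for a source table.
source_table = [
    {"id": 1, "email": " A@Example.com ", "age": 34, "tmp": "x"},
    {"id": 2, "email": "b@example.com", "age": -5, "tmp": "y"},
    {"id": 3, "email": "", "age": 20, "tmp": "z"},
]
destination, error_table = run_pipeline(source_table)
print(destination)
print(error_table)
```

Each step returns new rows instead of mutating its input, so the source stays intact and any intermediate result can be inspected, much as the data view shows a sample after each step.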

Schedule one-time or recurring data preparation runs from the data preparation editor or manage them from the BigQuery Orchestration page. BigQuery data preparation does not have its own API. Contact bq-datapreparation-feedback@google.com for more information.

Limitations

  • Source and destination datasets must be in the same location.
  • Data and interactions are processed in a US data center during pipeline editing.
  • No support for natural language SQL query generation or viewing/comparing/restoring data preparation versions.
  • Gemini responses are based on a sample of the dataset.

For more detailed steps and configurations, refer to the BigQuery documentation.
