BigQuery has introduced an update to its ML.GENERATE_EMBEDDING function, offering users more control over embedding generation when working with remote models based on Vertex AI's multimodal embedding models.
Key Feature: output_dimensionality Argument
The new output_dimensionality argument allows users to specify the number of dimensions for generated embeddings. This feature, currently in Preview, provides flexibility in tailoring embedding outputs to specific use cases.
Customization Options
Users can now choose from four different dimension sizes:
- 128
- 256
- 512
- 1408 (default)
For example, specifying 256 AS output_dimensionality will result in the ml_generate_embedding_result output column containing a 256-dimensional embedding (an array of 256 values) for each input value.
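The example above can be sketched as a small Python helper that builds the query text and rejects dimension sizes outside the four listed values. This is a minimal sketch, not the official API usage: it assumes the argument is passed inside the function's STRUCT of options, and the helper name, model name, and table name are all hypothetical.

```python
# Dimension sizes listed in this article for multimodal embedding models.
ALLOWED_DIMENSIONS = (128, 256, 512, 1408)


def build_embedding_query(model: str, table: str, dimensions: int = 1408) -> str:
    """Return ML.GENERATE_EMBEDDING query text with output_dimensionality set.

    Hypothetical helper: it only builds a string and does not call BigQuery.
    """
    if dimensions not in ALLOWED_DIMENSIONS:
        raise ValueError(
            f"output_dimensionality must be one of {ALLOWED_DIMENSIONS}, got {dimensions}"
        )
    # Assumption: the option is supplied as a named field of the STRUCT argument.
    return (
        "SELECT *\n"
        "FROM ML.GENERATE_EMBEDDING(\n"
        f"  MODEL `{model}`,\n"
        f"  TABLE `{table}`,\n"
        f"  STRUCT({dimensions} AS output_dimensionality)\n"
        ")"
    )


# Hypothetical model and table names, for illustration only.
query = build_embedding_query(
    "mydataset.multimodal_model", "mydataset.product_images", 256
)
print(query)
```

Validating the value before the query is sent surfaces an unsupported size such as 300 as a local error rather than a failed BigQuery job.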
Implications for Data Scientists and Analysts
This update offers several benefits:
- Improved control over embedding complexity
- Potential for reduced storage and computation, since lower-dimensional embeddings contain fewer values to store and compare
- Flexibility to match embedding dimensions with specific model requirements
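To make the resource point concrete, the following sketch estimates the raw size of the embedding column at each supported dimension size. It assumes each embedding element is stored as an 8-byte FLOAT64 and ignores compression and other storage overhead, so the numbers are rough upper-level estimates; the row count is hypothetical.

```python
# Assumption: each embedding element is a FLOAT64 value of 8 bytes.
BYTES_PER_FLOAT64 = 8


def embedding_bytes(dimensions: int, rows: int) -> int:
    """Raw size in bytes of `rows` embeddings with `dimensions` values each."""
    return dimensions * BYTES_PER_FLOAT64 * rows


rows = 1_000_000  # hypothetical number of input values

for dims in (128, 256, 512, 1408):
    mib = embedding_bytes(dims, rows) / 2**20
    print(f"{dims:>4} dimensions: {mib:,.0f} MiB")
```

Under these assumptions, the default 1408-dimension output takes 11 times the raw space of the 128-dimension output. Whether the smaller size is acceptable depends on how much embedding quality the use case can trade away, which this estimate does not measure.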
Availability
The output_dimensionality argument is available when using ML.GENERATE_EMBEDDING with remote models based on Vertex AI multimodal embedding models.
This enhancement underscores BigQuery's commitment to providing advanced machine learning capabilities directly within its SQL interface, enabling more sophisticated data analysis and model development workflows.