BigQuery DataFrames

Interfaces for using Gretel with Google BigQuery. This module assumes that the bigframes package is already installed as a transitive dependency.

class gretel_client.bigquery.BigFrames(gretel: Gretel)

This interface enables using Gretel Transforms, Gretel Synthetics, and Gretel Navigator with Google BigFrames.

Parameters:

gretel – An instance of the Gretel interface. The Gretel class should be imported with from gretel_client import Gretel.
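A minimal sketch of constructing the interface. It assumes that the Gretel class accepts api_key and project_name keyword arguments (neither is documented on this page) and that BigQuery credentials are already configured in the environment. The helper name build_bigframes_client is hypothetical.

```python
def build_bigframes_client(project_name=None):
    """Return a BigFrames interface bound to a configured Gretel instance."""
    # Deferred imports: the helper can be defined before the packages are installed.
    from gretel_client import Gretel
    from gretel_client.bigquery import BigFrames

    # Assumption: api_key="prompt" asks for the key interactively, and
    # project_name selects the Gretel project to work in.
    gretel = Gretel(api_key="prompt", project_name=project_name)
    return BigFrames(gretel)
```

The returned object is what the remaining examples on this page refer to as bq.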

display_dataframe_in_notebook(dataframe: DataFrame, settings: dict | None = None) None

Display a BigFrames DataFrame in a Notebook.

Parameters:
  • dataframe – A BigFrames DataFrame

  • settings – Any valid settings that are accepted by the method pandas.DataFrame.style.set_properties
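As an illustration of the settings argument, the sketch below passes CSS property/value pairs, which is the form pandas' Styler.set_properties accepts. The helper name show_left_aligned and the specific properties chosen are assumptions for illustration; bq is an existing BigFrames interface instance.

```python
def show_left_aligned(bq, dataframe):
    """Render a BigFrames DataFrame in a notebook with left-aligned, wrapped cells."""
    # Assumption: settings is a mapping of CSS properties to values, matching
    # the keyword arguments of pandas' Styler.set_properties.
    settings = {"text-align": "left", "white-space": "pre-wrap"}
    bq.display_dataframe_in_notebook(dataframe, settings=settings)
    return settings
```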

fetch_generate_job_results(model_id: str, record_id: str) ModelGenerationResult

Given the Model ID and Job ID (record ID), return a ModelGenerationResult instance which allows for checking the generation job status and retrieving the generated data.
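A sketch of picking up a generation job in a later session, assuming the model ID and record ID were saved when the job was submitted. The helper name resume_generation is hypothetical; refresh() and the synthetic_data attribute are taken from the ModelGenerationResult description on this page.

```python
def resume_generation(bq, model_id, record_id):
    """Look up an earlier generation job and return its data, if available."""
    result = bq.fetch_generate_job_results(model_id, record_id)
    # Pull the latest job state before reading any attributes.
    result.refresh()
    # synthetic_data defaults to None, so callers should handle a missing
    # DataFrame when the job has not produced output yet.
    return result.synthetic_data
```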

fetch_train_job_results(model_id: str) ModelTrainResult

Given a Gretel Model ID, return a ModelTrainResult instance. This allows for checking model training status, retrieving model quality report(s) and retrieving generated data.

fetch_transform_results(model_id: str) BigQueryTransformResults

Given a Transforms model ID, return a BigQueryTransformResults instance in order to retrieve transformed data and check job status.

init_navigator(name: str, **kwargs) None

Create an instance of Gretel’s Navigator API and store it on this instance. Only Navigator’s Tabular mode is supported.

Parameters:

name – The name of the Navigator instance you want to use. When using this Navigator instance, you will reference this name.

The additional **kwargs are identical to what is supported in Gretel.factories.initialize_navigator_api().

navigator_edit(name: str, *args, seed_data: DataFrame | List[dict[str, Any]], **kwargs) DataFrame

Edit a BigQuery Table using Gretel Navigator.

Parameters:
  • name – The name of a registered Navigator instance. This should have been created using the init_navigator() method.

The other *args and **kwargs are what is supported by TabularInferenceAPI.edit(). Streaming responses are not supported at this time.
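A sketch of editing an existing table, assuming a Navigator instance has already been registered under the given name and that TabularInferenceAPI.edit() takes the prompt as its first positional argument. The helper name edit_table is hypothetical.

```python
def edit_table(bq, name, prompt, seed_data):
    """Apply a natural-language edit to seed_data with a registered Navigator instance."""
    # seed_data is keyword-only in navigator_edit(); it may be a BigFrames
    # DataFrame or a list of dicts, per the signature above.
    # Assumption: the prompt is forwarded positionally to TabularInferenceAPI.edit().
    return bq.navigator_edit(name, prompt, seed_data=seed_data)
```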

navigator_generate(name: str, *args, **kwargs) DataFrame

Generate a BigQuery Table using Gretel Navigator.

Parameters:
  • name – The name of a registered Navigator instance. This should have been created using the init_navigator() method.

The other *args and **kwargs are what is supported by TabularInferenceAPI.generate(). Streaming responses are not supported at this time.
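A sketch of the register-then-generate flow. It assumes TabularInferenceAPI.generate() takes a prompt as its first positional argument and a num_records keyword; the helper name generate_table and the instance name "demo-navigator" are hypothetical.

```python
def generate_table(bq, prompt, num_records=10, name="demo-navigator"):
    """Register a Navigator instance, then generate a table from a prompt."""
    # Register the instance under a name; later calls refer to it by that name.
    bq.init_navigator(name)
    # Assumption: prompt and num_records are forwarded unchanged to
    # TabularInferenceAPI.generate().
    return bq.navigator_generate(name, prompt, num_records=num_records)
```

Registering once and reusing the name avoids creating a new Navigator instance for every call; in longer sessions the init_navigator() call would normally live outside the helper.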

submit_generate(model_id: str, *, seed_data: DataFrame | None = None, wait: bool = False, **kwargs) ModelGenerationResult

Given a fine-tuned model ID, request the generation of more data.

If the model supports conditional generation, a partial DataFrame may be provided as input to inference. This method supports the same additional kwargs as Gretel.submit_generate().
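A sketch of requesting additional records from a fine-tuned model. It assumes num_records is among the kwargs accepted by Gretel.submit_generate(); the helper name generate_records is hypothetical.

```python
def generate_records(bq, model_id, num_records=100):
    """Generate num_records rows from a fine-tuned model and return them."""
    # wait=True blocks until the generation job has finished.
    # Assumption: num_records is forwarded to Gretel.submit_generate().
    result = bq.submit_generate(model_id, wait=True, num_records=num_records)
    return result.synthetic_data
```

For a model that supports conditional generation, a partial DataFrame could be passed through the keyword-only seed_data argument instead.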

submit_train(base_config: str | Path | dict, *, dataframe: DataFrame, wait: bool = False, **kwargs) ModelTrainResult

Fine-tune a Gretel model on an existing BigFrames DataFrame.

Parameters:
  • base_config – Base Gretel config name, yaml file path, yaml string, or dict.

  • dataframe – The BigFrames DataFrame to use as the training data.

  • wait – If True, wait for the job to complete before returning.

NOTE: The remaining kwargs are the same ones that are supported by Gretel.submit_train().
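A sketch of fine-tuning on a BigFrames DataFrame and retrieving the data behind the quality report. The base config name "tabular-actgan" is an assumption used for illustration, and the helper name train_and_fetch_report_data is hypothetical.

```python
def train_and_fetch_report_data(bq, dataframe, base_config="tabular-actgan"):
    """Fine-tune a model and return the result plus the report's synthetic data."""
    # dataframe is keyword-only; wait=True blocks until training completes.
    train_result = bq.submit_train(base_config, dataframe=dataframe, wait=True)
    # The synthetic DataFrame created during training, used for the model report.
    report_df = train_result.fetch_report_synthetic_data()
    return train_result, report_df
```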

submit_transform(config: str | Path | dict, dataframe: DataFrame, *, wait: bool = False, **kwargs) BigQueryTransformResults

Run a Gretel Transform job against the provided dataframe. A Transform model will be created and then immediately used to apply row, column, or cell level transforms against a dataframe.
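A sketch of running a transform job and reading back the transformed table. The config is left to the caller because its contents are not described on this page; the helper name transform_dataframe is hypothetical.

```python
def transform_dataframe(bq, config, dataframe):
    """Run a Transform job against dataframe and return the transformed table."""
    # config may be a config name, YAML path, or dict; dataframe is
    # positional here, unlike in submit_train().
    results = bq.submit_transform(config, dataframe, wait=True)
    # Called defensively so the result attributes reflect the finished job.
    results.refresh()
    # transformed_df stays None until the job succeeds.
    return results.transformed_df
```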

class gretel_client.bigquery.BigQueryTransformResults(project: Project, model: Model, transform_logs: List[dict] | None = None, transformed_df: DataFrame | None = None, transformed_data_link: str | None = None, report: GretelReport | None = None)

Should not be used directly.

Stores metadata and a transformed BigFrames DataFrame that was created from a Gretel Transforms job.

refresh() None

Refresh the transform job result attributes.

transformed_df: bpd.DataFrame | None = None

A BigQuery DataFrame of the transformed table. This will not be populated until the transforms job succeeds.

class gretel_client.bigquery.JobLabel(value)

An enumeration.

class gretel_client.bigquery.ModelGenerationResult(project: Project, model: Model, record_handler: RecordHandler, synthetic_data_link: str | None = None, synthetic_data: DataFrame | None = None)

Should not be used directly.

An instance of this class is returned when generating more data from an existing model or retrieving generated data from an existing model.

refresh() None

Refresh the generate job results attributes.

class gretel_client.bigquery.ModelTrainResult(project: Project, model: Model, model_config: dict | None = None, report: GretelReport | None = None, model_logs: List[dict] | None = None)

Should not be used directly.

An instance of this class is returned when creating a new synthetic model or retrieving an existing one.

fetch_report_synthetic_data() DataFrame

Fetch the synthetic BigQuery DataFrame that was created as part of the model training process. This DataFrame is what is used to create the model report.