Safe Synthetics SDK

Gretel Safe Synthetics allows you to create private versions of your sensitive data. You can use Safe Synthetics to redact and replace sensitive Personally Identifiable Information (PII) with Transform, obfuscate quasi-identifiers with Synthetics, and apply differential privacy for mathematical guarantees of privacy protection.

Getting Started

The Safe Synthetics SDK uses a fluent builder pattern for constructing Safe Synthetics data pipelines.

The following code example creates and executes a synthetic data pipeline. The script waits for the job to complete, then retrieves the data generation report and the associated dataset.

from gretel_client.navigator_client import Gretel

gretel = Gretel()

synthetic_dataset = (
    gretel.safe_synthetics
    .from_data_source("https://raw.githubusercontent.com/gretelai/gretel-blueprints/refs/heads/main/sample_data/financial_transactions.csv")
    .transform()
    .synthesize()
    .create()
)

# wait until the synthetic data pipeline completes
synthetic_dataset.wait_until_done()

# view a report of the dataset
synthetic_dataset.report.table

# view the final transformed dataset as a DataFrame
synthetic_dataset.dataset.df

For more information and examples, see our main product documentation.

class gretel_client.safe_synthetics.dataset.SafeSyntheticDataset(builder: WorkflowBuilder, registry: Registry)

A class for configuring and creating synthetic data generation workflows.

transform(config: str | dict = 'transform/default') → Self

Add a data transformation step to the workflow.

Parameters:
  • config – Transform configuration, either as a string path to a blueprint or a dictionary of configuration options. Defaults to "transform/default".
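For example, the default blueprint can be overridden with your own Transform configuration. A minimal sketch, assuming the gretel client from the Getting Started example, a hypothetical local data file transactions.csv, and a hypothetical my_transform.yaml whose contents follow the Transform configuration schema:

import yaml

# Load a Transform configuration defined elsewhere (hypothetical file name)
with open("my_transform.yaml") as f:
    transform_config = yaml.safe_load(f)

pipeline = (
    gretel.safe_synthetics
    .from_data_source("transactions.csv")  # hypothetical local file
    .transform(transform_config)           # dict overrides the "transform/default" blueprint
    .synthesize()
)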

data_source(data_source: str | Path | DataFrame, use_data_source_step: bool = True) → Self

Configure the input data source for the workflow.

Parameters:
  • data_source – Input data as either a file path, Path object, or pandas DataFrame

  • use_data_source_step – Whether to create a dedicated data source step in the workflow. Defaults to True.
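A minimal sketch of passing an in-memory DataFrame rather than a file path, assuming from_data_source on the client accepts the same argument types documented here:

import pandas as pd

# Small illustrative DataFrame
df = pd.DataFrame({"name": ["Alice", "Bob"], "balance": [120.50, 87.00]})

pipeline = (
    gretel.safe_synthetics
    .from_data_source(df)  # DataFrames are accepted as well as file paths
    .transform()
    .synthesize()
)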

holdout(holdout: float | int | str | Path | DataFrame, max_holdout: int | None = None, group_by: str | None = None) → Self

Configure a holdout dataset. The holdout is used during evaluation.

Parameters:
  • holdout – If numeric, the amount of data to hold out, either as a fraction (float) or an absolute number of records (int). Alternatively, a file path to, or a pandas DataFrame of, pre-configured test holdout data.

  • max_holdout – Maximum number of records to include in the holdout set

  • group_by – Column name to use for grouped holdout selection
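A minimal sketch of reserving a fraction of the input for evaluation; the 5% fraction and 1,000-record cap are illustrative values, not recommendations:

pipeline = (
    gretel.safe_synthetics
    .from_data_source("transactions.csv")  # hypothetical local file
    .holdout(0.05, max_holdout=1000)       # hold out 5% of records, capped at 1,000
    .transform()
    .synthesize()
)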

synthesize(model_or_blueprint_or_task: str | BaseModel | None = 'tabular_ft/default', config: dict | str | None = None, num_records: int | None = None)

Configure the synthetic data generation model.

Parameters:
  • model_or_blueprint_or_task – Model specification, either as a string identifier, blueprint path, or BaseModel instance

  • config – Additional configuration options as dict or YAML string

  • num_records – Number of synthetic records to generate

Raises:

TaskConfigError – If the model configuration cannot be determined
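A minimal sketch of selecting a model blueprint and requesting a specific number of output records; the blueprint name matches the documented default, and the record count and file name are illustrative:

pipeline = (
    gretel.safe_synthetics
    .from_data_source("transactions.csv")                # hypothetical local file
    .transform()
    .synthesize("tabular_ft/default", num_records=5000)  # generate 5,000 synthetic records
)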

evaluate(config: dict | str | EvaluateSafeSyntheticsDataset | None = None, disable: bool = False) → Self

Configure the evaluation step for comparing synthetic to original data.

Parameters:
  • config – Evaluation configuration as dict, YAML string, or concrete config instance

  • disable – If True, disable the evaluation step. Defaults to False.
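A minimal sketch of disabling the evaluation step, for example while iterating quickly on pipeline configurations (file name is illustrative):

pipeline = (
    gretel.safe_synthetics
    .from_data_source("transactions.csv")  # hypothetical local file
    .transform()
    .synthesize()
    .evaluate(disable=True)                # skip the synthetic-vs-original comparison
)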

create(new_workflow: bool = False, name: str | None = None, run_name: str | None = None, wait_until_done: bool = False) → WorkflowRun

Create and optionally execute the configured synthetic data generation pipeline.

Parameters:
  • new_workflow – If True, create a new workflow instead of reusing an existing one

  • name – Name for the workflow

  • run_name – Name for this specific workflow run

  • wait_until_done – If True, wait for workflow completion before returning

Returns:

WorkflowRun instance representing the created workflow

Raises:

WorkflowValidationError – If the workflow configuration is invalid
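A minimal sketch of creating the pipeline with an explicit workflow name and blocking until the run finishes; the name is hypothetical, and pipeline refers to a configured SafeSyntheticDataset as in the sketches above:

synthetic_dataset = pipeline.create(
    name="financial-transactions-safe-synthetics",  # hypothetical workflow name
    wait_until_done=True,                           # block until the run completes
)

# inspect the results once the run has finished
synthetic_dataset.report.table
synthetic_dataset.dataset.df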

builder() → WorkflowBuilder

Get the underlying WorkflowBuilder instance.
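A minimal sketch of dropping down to the lower-level WorkflowBuilder when you need to customize the workflow beyond what the fluent interface exposes (file name is illustrative):

workflow_builder = (
    gretel.safe_synthetics
    .from_data_source("transactions.csv")  # hypothetical local file
    .transform()
    .synthesize()
    .builder()
)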