Safe Synthetics SDK
Gretel Safe Synthetics allows you to create private versions of your sensitive data. You can use Safe Synthetics to redact and replace sensitive Personally Identifiable Information (PII) with Transform, obfuscate quasi-identifiers with Synthetics, and apply differential privacy for mathematical guarantees of privacy protection.
Getting Started
The Safe Synthetics SDK adopts a fluent builder pattern for constructing a Safe Synthetic data pipeline.
The following code example creates and executes a synthetic data pipeline. The script waits for the job to complete, then shows how to access the data generation report and the associated dataset.
from gretel_client.navigator_client import Gretel

gretel = Gretel()

# parentheses let the method chain span several lines
synthetic_dataset = (
    gretel.safe_synthetics
    .from_data_source("https://raw.githubusercontent.com/gretelai/gretel-blueprints/refs/heads/main/sample_data/financial_transactions.csv")
    .transform()
    .synthesize()
    .create()
)

# wait until the synthetic data pipeline completes
synthetic_dataset.wait_until_done()

# view a report of the dataset
synthetic_dataset.report.table

# view the final transformed dataset as a dataframe
synthetic_dataset.dataset.df
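To make the fluent builder pattern concrete, here is a minimal sketch of the idea in isolation. `ToyPipelineBuilder` is a hypothetical stand-in written for illustration only; it is not part of the SDK and does not reflect how `SafeSyntheticDataset` is implemented. Each configuration method records a step and returns the builder itself, which is what allows the calls to be chained, and a terminal call ends the chain.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPipelineBuilder:
    """Hypothetical builder used only to illustrate method chaining."""

    source: str
    steps: list = field(default_factory=list)

    def transform(self) -> "ToyPipelineBuilder":
        self.steps.append("transform")
        return self  # returning self is what makes chaining possible

    def synthesize(self) -> "ToyPipelineBuilder":
        self.steps.append("synthesize")
        return self

    def create(self) -> dict:
        # Terminal call: stop building and hand back the finished plan.
        return {"source": self.source, "steps": list(self.steps)}


plan = ToyPipelineBuilder("transactions.csv").transform().synthesize().create()
print(plan["steps"])  # ['transform', 'synthesize']
```

The same shape appears in the example above: `transform()` and `synthesize()` configure steps, and `create()` ends the chain.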
For more information and examples, please see our main product documentation.
- class gretel_client.safe_synthetics.dataset.SafeSyntheticDataset(builder: WorkflowBuilder, registry: Registry)
A class for configuring and creating synthetic data generation workflows.
- transform(config: str | dict = 'transform/default') Self
Add a data transformation step to the workflow.
- Parameters:
config – Transform configuration, either as a string path to a blueprint or a dictionary of configuration options. Defaults to “transform/default”.
- data_source(data_source: str | Path | DataFrame, use_data_source_step: bool = True) Self
Configure the input data source for the workflow.
- Parameters:
data_source – Input data as a file path, Path object, or pandas DataFrame
use_data_source_step – Whether to create a dedicated data source step in the workflow. Defaults to True.
- holdout(holdout: float | int | str | Path | DataFrame, max_holdout: int | None = None, group_by: str | None = None) Self
Configure a holdout dataset. The holdout is used during evaluation.
- Parameters:
holdout – If a numeric value, indicates the amount of data to hold out, either as a fraction (float) or an absolute number of records (int). Alternatively, it can be a file path to, or a pandas DataFrame of, pre-configured test holdout data.
max_holdout – Maximum number of records to include in holdout set
group_by – Column name to use for grouped holdout selection
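As a rough illustration of how the numeric forms of holdout and max_holdout could interact, the sketch below computes a holdout size from a record count. `resolve_holdout_size` is a hypothetical helper, not an SDK function, and the rounding and capping rules are assumptions made for this example rather than documented behavior.

```python
from typing import Optional, Union


def resolve_holdout_size(
    total_records: int,
    holdout: Union[float, int],
    max_holdout: Optional[int] = None,
) -> int:
    """Return how many records a numeric holdout value would set aside.

    Hypothetical helper: rounding down and capping are assumptions,
    not the SDK's documented behavior.
    """
    if isinstance(holdout, bool):
        raise TypeError("holdout must be a float or an int, not a bool")
    if isinstance(holdout, float):
        if not 0.0 < holdout < 1.0:
            raise ValueError("a fractional holdout must be between 0 and 1")
        size = int(total_records * holdout)  # assumed: round down
    else:
        if holdout < 0:
            raise ValueError("an absolute holdout must not be negative")
        size = holdout  # an int is read as an absolute record count
    size = min(size, total_records)  # cannot hold out more than exists
    if max_holdout is not None:
        size = min(size, max_holdout)  # assumed: max_holdout is a hard cap
    return size


print(resolve_holdout_size(1000, 0.25))                     # 250
print(resolve_holdout_size(1000, 200))                      # 200
print(resolve_holdout_size(100_000, 0.25, max_holdout=2000))  # 2000
```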
- synthesize(model_or_blueprint_or_task: str | BaseModel | None = 'tabular_ft/default', config: dict | str | None = None, num_records: int | None = None)
Configure the synthetic data generation model.
- Parameters:
model_or_blueprint_or_task – Model specification, either as a string identifier, blueprint path, or BaseModel instance
config – Additional configuration options as dict or YAML string
num_records – Number of synthetic records to generate
- Raises:
TaskConfigError – If the model configuration cannot be determined
- evaluate(config: dict | str | EvaluateSafeSyntheticsDataset | None = None, disable: bool = False) Self
Configure the evaluation step for comparing synthetic to original data.
- Parameters:
config – Evaluation configuration as dict, YAML string, or concrete config instance
disable – If True, disable the evaluation step. Defaults to False.
- create(new_workflow: bool = False, name: str | None = None, run_name: str | None = None, wait_until_done: bool = False) WorkflowRun
Create and optionally execute the configured synthetic data generation pipeline.
- Parameters:
new_workflow – If True, create a new workflow instead of reusing the existing one
name – Name for the workflow
run_name – Name for this specific workflow run
wait_until_done – If True, wait for workflow completion before returning
- Returns:
WorkflowRun instance representing the created workflow
- Raises:
WorkflowValidationError – If the workflow configuration is invalid
- builder() WorkflowBuilder
Get the underlying WorkflowBuilder instance.
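The methods above can be combined into a small reusable function. This is a sketch that uses only the signatures listed on this page; `run_safe_synthetics` and its parameter names are hypothetical, and the `gretel` argument is assumed to be a `Gretel()` client like the one in Getting Started. Passing `wait_until_done=True` to `create()` makes the call block until the workflow finishes, so no separate wait call is needed.

```python
from typing import Any, Optional


def run_safe_synthetics(
    gretel: Any,
    data_source: str,
    num_records: Optional[int] = None,
    run_name: Optional[str] = None,
) -> Any:
    """Chain the documented steps and start the pipeline.

    Hypothetical wrapper: `gretel` is assumed to be a Gretel() client.
    """
    return (
        gretel.safe_synthetics
        .from_data_source(data_source)
        .transform()                          # default 'transform/default'
        .synthesize(num_records=num_records)  # default 'tabular_ft/default'
        .create(run_name=run_name, wait_until_done=True)
    )
```

With a configured client, a call such as `run_safe_synthetics(Gretel(), "transactions.csv", num_records=1000)` would submit the pipeline and return once the run completes. Taking the client as an argument keeps the function free of credentials and makes it easy to exercise with a stub.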