Interface

class gretel_client.gretel.interface.Gretel(*, project_name: str | None = None, project_display_name: str | None = None, session: ClientConfig | None = None, **session_kwargs)

High-level interface for interacting with Gretel’s APIs.

To bind an instance of this class to a Gretel project, provide a project name at instantiation or use the set_project method. If a job is submitted (via a submit_* method) without a project set, a randomly-named project will be created and set as the current project.

Parameters:

project_name (str) – Name of new or existing project. If a new project name is given, it will be created at instantiation. If no name is given, a new randomly-named project will be created with the first job submission.
project_display_name (str) – Project display name. If None, will use the project name. This argument is only used when creating a new project.
session (ClientConfig) – Client session to use. If set, no session_kwargs may be specified.
**session_kwargs – kwargs for your Gretel session. See options below.

Keyword Arguments:

api_key (str) – Your Gretel API key. If set to “prompt” and no API key is found on the system, you will be prompted for the key.
endpoint (str) – Specifies the Gretel API endpoint. This must be a fully qualified URL. The default is “https://api.gretel.cloud”.
default_runner (str) – Specifies the runner mode. Must be one of “cloud”, “local”, “manual”, or “hybrid”. The default is “cloud”.
artifact_endpoint (str) – Specifies the endpoint for project and model artifacts. Defaults to “cloud” for running in Gretel Cloud. If working in hybrid mode, set to the URL of your artifact storage bucket.
cache (str) – Valid options are “yes” or “no”. If set to “no”, the session configuration will not be written to disk. If set to “yes”, the session configuration will be written to disk only if one doesn’t already exist. The default is “no”.
validate (bool) – If True, will validate the login credentials at instantiation. The default is False.
clear (bool) – If True, existing Gretel credentials will be removed. The default is False.
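The constraints above can be checked before a client is built. The sketch below is a hypothetical helper, validate_session_args, that only restates the documented rules in code: session and session_kwargs are mutually exclusive, default_runner takes one of four values, cache is "yes" or "no", and endpoint must be a fully qualified URL. It is not part of gretel_client, and treating "fully qualified" as "has a scheme and a host" is an assumption.

```python
from urllib.parse import urlparse

# Documented option values from the keyword-argument list above.
VALID_RUNNERS = {"cloud", "local", "manual", "hybrid"}
VALID_CACHE = {"yes", "no"}


def validate_session_args(session=None, **session_kwargs):
    """Hypothetical helper: raise ValueError if the arguments break a documented rule."""
    # A ready-made session and individual session kwargs are mutually exclusive.
    if session is not None and session_kwargs:
        raise ValueError("Pass either `session` or session kwargs, not both.")

    runner = session_kwargs.get("default_runner", "cloud")
    if runner not in VALID_RUNNERS:
        raise ValueError(f"default_runner must be one of {sorted(VALID_RUNNERS)}")

    cache = session_kwargs.get("cache", "no")
    if cache not in VALID_CACHE:
        raise ValueError("cache must be 'yes' or 'no'")

    # Assumption: "fully qualified" means the URL has both a scheme and a host.
    endpoint = urlparse(session_kwargs.get("endpoint", "https://api.gretel.cloud"))
    if not (endpoint.scheme and endpoint.netloc):
        raise ValueError("endpoint must be a fully qualified URL")


# Passes: every value is one of the documented options.
validate_session_args(api_key="prompt", default_runner="hybrid", cache="yes")
```

A bare host such as "api.gretel.cloud" parses with an empty scheme and host, so the sketch rejects it.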

fetch_generate_job_results(model_id: str, record_id: str) → GenerateJobResults

Fetch the results object from a Gretel generate job.

Parameters:

model_id – The Gretel model ID.
record_id – The Gretel record handler ID.

Raises:

GretelProjectNotSetError – If a project has not been set.

Returns:

Job results including the model object, record handler, and synthetic data.

fetch_model(model_id: str) → Model

Fetch a Gretel model using its ID.

You must set a project before calling this method.

Parameters:

model_id – The Gretel model ID.

Raises:

GretelProjectNotSetError – If a project has not been set.

Returns:

The Gretel model object.

fetch_train_job_results(model_id: str) → TrainJobResults

Fetch the results object from a Gretel training job.

You must set a project before calling this method.

Parameters:

model_id – The Gretel model ID.

Raises:

GretelProjectNotSetError – If a project has not been set.

Returns:

Job results including the model object, report, logs, and final config.

get_project(**kwargs) → Project

Returns the current Gretel project.

If a project has not been set, a new one will be created. The optional kwargs are the same as those available for the set_project method.
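The project rules on this page follow a lazy-initialization pattern: get_project and the submit_* methods create a randomly-named project when none is set, while the fetch_* methods raise GretelProjectNotSetError instead. The toy class below is a hypothetical, offline stand-in that mirrors only that control flow. ToyGretel, ProjectNotSetError, and the "proj-<hex>" naming scheme are illustrative assumptions, and no Gretel API is called.

```python
import uuid


class ProjectNotSetError(Exception):
    """Hypothetical stand-in for GretelProjectNotSetError."""


class ToyGretel:
    """Offline toy that mimics only the documented project-handling rules."""

    def __init__(self, project_name=None):
        self._project = None
        if project_name is not None:
            self.set_project(project_name)

    def set_project(self, name=None):
        # Assumption: a local random name stands in for the service-generated one.
        self._project = name or f"proj-{uuid.uuid4().hex[:8]}"

    def get_project(self):
        # Like get_project: create a project on first access if none is set.
        if self._project is None:
            self.set_project()
        return self._project

    def submit_train(self, **config):
        # Like submit_*: submitting a job implicitly creates and sets a project.
        return {"project": self.get_project(), "config": config}

    def fetch_model(self, model_id):
        # Like fetch_*: fetching requires an existing project and never creates one.
        if self._project is None:
            raise ProjectNotSetError("Set a project before fetching models.")
        return {"project": self._project, "model_id": model_id}
```

With this toy, calling fetch_model on a fresh instance raises, while calling submit_train first creates a project that later fetches reuse.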

run_tuner(tuner_config: str | Path | dict, *, data_source: str | Path | _DataFrameT, n_trials: int = 5, n_jobs: int = 1, use_temporary_project: bool = False, verbose_logging: bool = False, **non_default_config_settings)

Run a hyperparameter tuning experiment with Gretel Tuner.

Parameters:

tuner_config – The config as a yaml file path, yaml string, or dict.
data_source – Training data source as a file path or pandas DataFrame.
n_trials – Number of trials to run.
n_jobs – Number of parallel jobs to run locally. Note that each job spins up a Gretel worker.
use_temporary_project – If True, will create a temporary project for the tuning experiment. The project will be deleted when the experiment is complete. If False, will use the current project.
verbose_logging – If True, will print all logs from submitted Gretel jobs.
**non_default_config_settings – Config settings to override in the given tuner config. The kwargs must follow the same nesting format as the yaml config file. See example below.

Raises:

ImportError – If the Gretel Tuner is not installed.

Returns:

Tuner results dataclass with the best config, best model id, study object, and trial data as attributes.

Example:

from gretel_client import Gretel
gretel = Gretel(api_key="prompt")

yaml_config_string = '''
base_config: "tabular-actgan"
metric: synthetic_data_quality_score
params:
    epochs:
        fixed: 50
    batch_size:
        choices: [500, 1000]
privacy_filters:
    similarity:
        choices: ["medium", "high"]
'''

data_source = "https://gretel-public-website.s3-us-west-2.amazonaws.com/datasets/USAdultIncome5k.csv"

results = gretel.run_tuner(
    tuner_config=yaml_config_string,
    data_source=data_source,
    n_trials=2,
    params={
        "batch_size": {"choices": [50, 100]},
        "generator_lr": {"log_range": [0.001, 0.01]}
    },
    privacy_filters={"similarity": {"choices": [None, "medium", "high"]}},
)

print(f"Best config: {results.best_config}")

# generate data with best model
generated = gretel.submit_generate(results.best_model_id, num_records=100)
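Each searchable setting in a tuner config is described by a small spec; the example above uses fixed, choices, and log_range. To make those specs concrete, the sketch below draws one trial's settings by plain random search. It is an illustration under assumptions: Gretel Tuner's actual sampler is not described on this page, only the three spec types from the example are handled, sample_trial and sample_value are hypothetical names, and the search space assumes the keyword overrides update settings within each section rather than replacing whole sections.

```python
import math
import random


def sample_value(spec, rng):
    """Draw one value from a single tuner spec (hypothetical random-search sketch)."""
    if "fixed" in spec:
        return spec["fixed"]
    if "choices" in spec:
        return rng.choice(spec["choices"])
    if "log_range" in spec:
        low, high = spec["log_range"]
        # Sample uniformly in log space so each order of magnitude is equally likely.
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    raise ValueError(f"Unsupported spec: {spec}")


def sample_trial(search_space, seed=None):
    """Return one concrete config, section by section, from a nested search space."""
    rng = random.Random(seed)
    return {
        section: {name: sample_value(spec, rng) for name, spec in settings.items()}
        for section, settings in search_space.items()
    }


# Search space from the example, assuming overrides update settings per section.
search_space = {
    "params": {
        "epochs": {"fixed": 50},
        "batch_size": {"choices": [50, 100]},
        "generator_lr": {"log_range": [0.001, 0.01]},
    },
    "privacy_filters": {"similarity": {"choices": [None, "medium", "high"]}},
}

print(sample_trial(search_space, seed=0))
```

Sampling learning rates in log space is the usual choice for ranges that span an order of magnitude; a uniform draw over [0.001, 0.01] would rarely pick values near the lower end.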

set_project(name: str | None = None, desc: str | None = None, display_name: str | None = None)

Set the current Gretel project.

If a project with the given name does not exist, it will be created. If the name is not unique, the user id will be appended to the name.

Parameters:

name – Name of new or existing project. If None, will create one.
desc – Project description.
display_name – Project display name. If None, will use project name.

Raises:

ApiException – If an error occurs while creating the project.

submit_generate(model_id: str, *, num_records: int | None = None, seed_data: str | Path | _DataFrameT | None = None, wait: bool = True, fetch_data: bool = True, verbose_logging: bool = False, **generate_kwargs) → GenerateJobResults

Submit a Gretel model generate job.

Only one of num_records or seed_data can be provided. The former will generate a complete synthetic dataset, while the latter will conditionally generate synthetic data based on the seed data.
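The either/or rule for num_records and seed_data can be expressed as a small pre-submission check. The function below is a hypothetical sketch, not the client's code: check_generate_args and JobSubmissionError are illustrative names, and it only rejects passing both arguments, because this page does not say whether omitting both is also invalid.

```python
class JobSubmissionError(ValueError):
    """Hypothetical stand-in for GretelJobSubmissionError."""


def check_generate_args(num_records=None, seed_data=None):
    """Return the generation mode implied by the arguments (hypothetical sketch)."""
    # Compare with None explicitly: a pandas DataFrame has no single truth value.
    if num_records is not None and seed_data is not None:
        raise JobSubmissionError("Provide only one of num_records or seed_data.")
    if seed_data is not None:
        return "conditional"  # generate records conditioned on the seed data
    # num_records given (the case where neither is given is not specified here).
    return "unconditional"
```

The explicit `is not None` checks matter because seed_data may be a DataFrame, and `if seed_data:` would raise on one.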

Parameters:

model_id – The Gretel model ID.
num_records – Number of records to generate.
seed_data – Seed data source as a file path or pandas DataFrame.
wait – If True, wait for the job to complete before returning.
fetch_data – If True, fetch the synthetic data as a DataFrame.
verbose_logging – If True, will print all logs from the job.

Raises:

GretelJobSubmissionError – If the combination of arguments is invalid.

Returns:

Job results including the model object, record handler, and synthetic data.

Examples:

# Generate a synthetic dataset with 100 records.
# model_id is the ID of a trained model in "my-project".
from gretel_client import Gretel
gretel = Gretel(project_name="my-project")
generated = gretel.submit_generate(model_id, num_records=100)

# Conditionally generate synthetic examples of a rare class.
import pandas as pd
from gretel_client import Gretel
gretel = Gretel(project_name="my-project")
df_seed = pd.DataFrame(["rare_class"] * 1000, columns=["field_name"])
generated = gretel.submit_generate(model_id, seed_data=df_seed)

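submit_train(base_config, *, data_source, job_label, wait, verbose_logging, **non_default_config_settings) → TrainJobResults

Submit a Gretel model training job.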

Training jobs are configured by updating a base config, which can be given as a dict, yaml file path, yaml string, or as the name of one of the Gretel base config files (without the extension) listed here: https://github.com/gretelai/gretel-blueprints/tree/main/config_templates/gretel/synthetics
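Because base_config accepts four different forms, the client has to decide which one it was given. The sketch below shows one plausible way to route the value. classify_base_config is hypothetical, and its heuristics (multi-line text or a single line with a colon is yaml, an existing file path is a yaml file, anything else is a blueprint name) are assumptions, not the client's documented behavior.

```python
from pathlib import Path


def classify_base_config(base_config):
    """Guess which accepted form a base_config value takes (hypothetical heuristics)."""
    if isinstance(base_config, dict):
        return "dict"
    if isinstance(base_config, Path):
        return "yaml_file"
    text = str(base_config)
    # Assumption: multi-line text is inline yaml rather than a path or a name.
    if "\n" in text:
        return "yaml_string"
    try:
        if Path(text).is_file():
            return "yaml_file"
    except (OSError, ValueError):
        pass  # e.g. a string too long to be a valid path
    # Assumption: single-line inline yaml still contains a "key: value" colon.
    if ":" in text:
        return "yaml_string"
    # Otherwise treat it as a blueprint name such as "tabular-actgan".
    return "template_name"
```

Checking for a newline before touching the filesystem keeps long yaml strings away from Path.is_file, which can raise on names that exceed the OS length limit.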

Parameters:

base_config – Base config name, yaml file path, yaml string, or dict.
data_source – Training data source as a file path or pandas DataFrame.
job_label – Descriptive label to append to the job name.
wait – If True, wait for the job to complete before returning.
verbose_logging – If True, will print all logs from the job.
**non_default_config_settings – Config settings to override in the template. The format is section={"setting": "value"}, where section is the name of a yaml section within the specific model settings, e.g. params or privacy_filters. If the parameter is not nested within a section, pass it directly as a keyword argument.

Returns:

Job results including the model object, report, logs, and final config.

Example:

from gretel_client import Gretel

data_source = "https://gretel-public-website.s3-us-west-2.amazonaws.com/datasets/USAdultIncome5k.csv"

gretel = Gretel(project_name="my-project")
trained = gretel.submit_train(
    base_config="tabular-actgan",
    data_source=data_source,
    params={"epochs": 100, "generator_dim": [128, 128]},
    privacy_filters={"similarity": "high", "outliers": None},
)
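The section={"setting": "value"} convention for **non_default_config_settings can be pictured as updates applied to the model section of a config. The sketch below applies the keyword arguments from the example to a trimmed, hypothetical config dict. apply_overrides is not a client function; the layout (a models list holding one {model_type: settings} entry) is an assumption modeled on Gretel's blueprint files, the placeholder values are not the real blueprint defaults, and updating settings within a section, rather than replacing the whole section, is also an assumption.

```python
import copy


def apply_overrides(config, **non_default_config_settings):
    """Return a copy of config with overrides applied to its model section (sketch)."""
    updated = copy.deepcopy(config)
    # Assumption: one model per config, stored as {model_type: settings}.
    model_settings = next(iter(updated["models"][0].values()))
    for key, value in non_default_config_settings.items():
        if isinstance(value, dict):
            # Nested section such as params or privacy_filters: update its settings.
            model_settings.setdefault(key, {}).update(value)
        else:
            # Setting not nested in a section: set it directly.
            model_settings[key] = value
    return updated


# Trimmed, hypothetical stand-in for a "tabular-actgan" config (placeholder values).
base = {
    "models": [
        {
            "actgan": {
                "params": {
                    "epochs": "auto",
                    "generator_dim": [1024, 1024],
                    "discriminator_dim": [1024, 1024],
                },
                "privacy_filters": {"outliers": "medium", "similarity": "medium"},
            }
        }
    ]
}

trained_config = apply_overrides(
    base,
    params={"epochs": 100, "generator_dim": [128, 128]},
    privacy_filters={"similarity": "high", "outliers": None},
)
print(trained_config["models"][0]["actgan"])
```

Settings that are not overridden, such as discriminator_dim here, keep their base values, and the deep copy leaves the original base config unchanged.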