Management

class gretel_client.workflows.manager.WorkflowManager(api_factory: GretelApiProviderProtocol, resource_provider: GretelResourceProviderProtocol)

Provides a low-level interface for interacting with Gretel Workflows.

Note: This class should never be directly instantiated. Instead, access it through the Gretel client session.

For example, to fetch an existing Workflow Run:

from gretel_client.navigator_client import Gretel

gretel = Gretel()
workflow_run = gretel.workflows.get_workflow_run("wr_run_id_here")

builder(globals: Globals | None = None) WorkflowBuilder

Creates a new workflow builder instance. This can be used to construct Workflows using a fluent builder pattern.

Parameters:

globals – Configure global variables for the Workflow.

Returns:

A fluent builder to construct Workflows.

Return type:

WorkflowBuilder
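
For example, a minimal sketch of building a one-step workflow (assuming an authenticated Gretel session; IdGenerator is one of the tasks from the registry documented below):

from gretel_client.navigator_client import Gretel
from gretel_client.workflows.configs.registry import Registry

gretel = Gretel()

# Construct a workflow with the fluent builder.
builder = gretel.workflows.builder()
builder.set_name("demo-workflow")
builder.add_step(Registry.IdGenerator(num_records=10))
workflow = builder.to_workflow()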

create(tasks: list[pydantic.main.BaseModel], wait_until_done: bool = False) WorkflowRun

Creates and executes a workflow from a list of task configurations.

Parameters:
  • tasks – List of task configurations to include in the workflow.

  • wait_until_done – Block until the workflow has completed running. If set to False (the default), the method returns immediately after the workflow run is created.

Returns:

The created workflow run instance.

Return type:

WorkflowRun
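
A minimal sketch, reusing the session from above (the task config is illustrative):

from gretel_client.workflows.configs.registry import Registry

# Create a workflow from a list of task configs and block until it finishes.
run = gretel.workflows.create(
    [Registry.IdGenerator(num_records=10)],
    wait_until_done=True,
)
print(run.console_url)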

get_workflow_run(workflow_run_id: str) WorkflowRun

Retrieves a specific workflow run by ID.

Parameters:

workflow_run_id – The ID of the workflow run to retrieve.

Returns:

The workflow run instance.

Return type:

WorkflowRun

registry() dict[str, Any]

Retrieves the workflow registry.

Returns:

The workflow registry.

Return type:

dict[str, Any]

class gretel_client.workflows.workflow.WorkflowRun(workflow: WorkflowRun, api_provider: GretelApiProviderProtocol, resource_provider: GretelResourceProviderProtocol)

The WorkflowRun class represents a concrete execution of a Workflow, providing methods to monitor execution, retrieve outputs, and access logs. Each workflow execution is composed of steps that form a directed acyclic graph (DAG).

You should never instantiate a WorkflowRun directly; instead, use the workflow methods on the main Gretel class.
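
A typical monitoring flow might look like this (a sketch, assuming an authenticated session and an existing run ID):

run = gretel.workflows.get_workflow_run("wr_run_id_here")
run.wait_until_done()

# Inspect the results once the run completes.
print(run.fetch_status())
df = run.dataset.df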

property config: dict

Return the Workflow config as a dictionary

property config_yaml: str

Return the Workflow config as yaml

property console_url: str

Get the URL for viewing this Workflow Run in the Gretel Console.

property dataset: Dataset

Get the final output Dataset of the Workflow if one exists

fetch_status() Status

Fetch the latest status of the Workflow

get_step_output(step_name: str, format: str | None = None) PydanticModel | Dataset | Report | IO

Retrieve the output from a specific workflow step.

Parameters:
  • step_name – Name of the workflow step

  • format – Optional output format specification

Returns:

The step output in the appropriate format (PydanticModel, Dataset, Report, or IO)

Raises:

Exception – If the step cannot be found or output type cannot be determined
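
A sketch of fetching a single step's output (the step name here is hypothetical):

run = gretel.workflows.get_workflow_run("wr_run_id_here")
output = run.get_step_output("generate-dataset")  # hypothetical step name
print(type(output))  # e.g. Dataset for tabular outputs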

property id: str

Get the ID of the Workflow Run

property name: str

Get the name of the Workflow

property report: Report

Return the report for the Workflow if one exists

property steps: list[Step]

Return a list of steps in the Workflow

wait_until_done(wait: int = -1, verbose: bool = True, log_printer: LogPrinter | None = None)

Wait for the workflow run to complete, with optional logging.

Parameters:
  • wait – Maximum time to wait in seconds. -1 means wait indefinitely

  • verbose – Whether to print detailed logs during execution

  • log_printer – Custom log printer implementation. If None, uses LoggingPrinter

property workflow: Workflow

Get the Workflow configuration

property workflow_id: str

Get the ID of the parent Workflow

Outputs

class gretel_client.workflows.io.Dataset(df: DataFrame)

Represents tabular data generated by a Workflow.

The Dataset class provides a wrapper around a pandas DataFrame to represent output from a Workflow.

This class should never be directly instantiated, but should instead be accessed from the parent Workflow Run, e.g.:

gretel.workflows.get_workflow_run("workflow run id").dataset

property df: DataFrame

Get the Dataset as a pandas DataFrame

download(file: str | Path | IO, format: Literal['csv', 'parquet'] = 'parquet') None

Save the dataset to a file in either CSV or parquet format.

Parameters:
  • file – The target file path or file-like object where the data will be saved.

  • format – The output format, either “csv” or “parquet”. Defaults to “parquet”.

Note

If a string or Path is provided, any necessary parent directories will be created automatically.
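
For example, saving the final output in both formats (a sketch, reusing a completed run):

ds = run.dataset
ds.download("outputs/data.parquet")            # parquet is the default format
ds.download("outputs/data.csv", format="csv")  # parent directories are created as needed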

class gretel_client.workflows.io.PydanticModel(model_dict: dict)

Some Workflow steps produce structured data as pydantic objects. This class is a wrapper around those objects providing methods to interact with the underlying data structure.

property dict: dict

Return the dictionary representation of the output

class gretel_client.workflows.io.Report(report_dict: dict, report_downloader: Callable[[Literal['json', 'html'] | None], IO])

Represents an evaluation report for synthetic data generated by workflows.

The Report class provides functionality to display and save an evaluation report comparing the output data with the reference dataset.

This class should never be directly instantiated, but should instead be accessed from the parent Workflow Run, e.g.:

gretel.workflows.get_workflow_run("workflow run id").report

property dict: dict

Get the report as a dictionary

display_in_browser()

Display the HTML report in a browser.

display_in_notebook()

Display the HTML report in a notebook.

download(file: str | Path | IO, format: Literal['json', 'html'] = 'html')

Save the report to a file in either JSON or HTML format.

Parameters:
  • file – The target file path or file-like object where the report will be saved.

  • format – The output format, either “json” or “html”. Defaults to “html”.

Note

If a string or Path is provided, any necessary parent directories will be created automatically.
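
A quick usage sketch:

from rich.console import Console

report = run.report
Console().print(report.table)   # render the rich Table summary
report.download("report.html")  # HTML is the default format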

property table: Table

Get a formatted rich Table representation of the report.

Returns:

A rich Table instance containing the report data formatted for display.

Return type:

Table

Builder

class gretel_client.workflows.builder.FieldViolation(*, field: str, error_message: str)

Represents a field that has failed schema validation.

class gretel_client.workflows.builder.LogMessage(level: 'str', msg: 'str')
class gretel_client.workflows.builder.Message(step: 'str', stream: 'str', payload: 'dict', type: 'str', ts: 'datetime')
payload: dict

The actual value of the output

raise_for_error() None

Check for fatal errors and raise an exception if found.

step: str

The name of the step

stream: str

The stream the message should be associated with.

We use multiple streams so that we can differentiate between different types of outputs.

ts: datetime

The date and time the message was created

type: str

The type of message

class gretel_client.workflows.builder.WorkflowBuilder(project_id: str, globals: Globals, api_provider: GretelApiProviderProtocol, resource_provider: GretelResourceProviderProtocol, workflow_session_manager: WorkflowSessionManager | None = None)

A builder class for creating Gretel workflows.

This class provides a fluent interface for constructing Workflow objects by chaining method calls. It allows setting the workflow name and adding steps sequentially.
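
For example, a full fluent chain might look like this (a sketch; the DataFrame and task are illustrative):

import pandas as pd
from gretel_client.workflows.configs.registry import Registry

df = pd.DataFrame({"seed": [1, 2, 3]})

workflow_run = (
    gretel.workflows.builder()
    .set_name("demo-workflow")
    .with_data_source(df)
    .add_step(Registry.SampleFromDataset(num_samples=2))
    .run(wait_until_done=True)
)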

add_step(step: BaseModel | Step, step_inputs: list[pydantic.main.BaseModel | Step | str] | None = None, validate: bool = True, step_name: str | None = None) Self

Add a single step to the workflow.

Parameters:
  • step – The workflow step to add.

  • step_inputs – Configure inputs for the step. Each input may be a task config, a Step, or the name of an existing step.

  • validate – Whether to validate the step. Defaults to True.

  • step_name – The name of the step. If not provided, the name will be generated based on the name of the task.

Returns:

The builder instance for method chaining.

Return type:

Self
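
A sketch of wiring one step's output into another via step_inputs:

from gretel_client.workflows.configs.registry import Registry

builder = gretel.workflows.builder()
builder.add_step(Registry.IdGenerator(num_records=5), step_name="ids")
builder.add_step(
    Registry.SampleFromDataset(num_samples=2),
    step_inputs=["ids"],  # reference the earlier step by name
)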

add_steps(steps: list[pydantic.main.BaseModel | Step], validate: bool = True) Self

Add multiple steps to the workflow.

Parameters:
  • steps – A list of workflow steps to add.

  • validate – Whether to validate the steps. Defaults to True.

Returns:

The builder instance for method chaining.

Return type:

Self

property data_source: str | None

Return the current input data source for the builder.

for_workflow(workflow_id: str | None = None) Self

Configure this builder to use an existing workflow.

When a workflow ID is specified, the run() method will execute a new run within the context of the existing workflow instead of creating a new workflow. This allows multiple runs to share the same workflow.

Parameters:

workflow_id – The ID of an existing workflow to use. If set to None, a new workflow will be created for the subsequent run.

Returns:

The builder instance for method chaining

Return type:

Self
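
A sketch of attaching a new run to an existing workflow (the workflow ID is a placeholder):

from gretel_client.workflows.configs.registry import Registry

builder = gretel.workflows.builder()
builder.for_workflow("w_workflow_id_here")
builder.add_step(Registry.IdGenerator(num_records=5))
run = builder.run(run_name="second-run")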

get_steps() list[Step]

Return the list of steps in the workflow.

iter_preview() Iterator[Message | WorkflowInterruption]

Stream workflow execution messages for preview purposes.

This method executes the workflow in streaming preview mode, returning an iterator that yields messages as they are received from the workflow execution. This allows for real-time monitoring of workflow execution before you submit your workflow for batch execution.

Returns:

An iterator that yields:
  • Message objects containing logs, outputs, and state changes from the workflow

  • WorkflowInterruption if the stream is unexpectedly disconnected

Return type:

Iterator[Union[Message, WorkflowInterruption]]
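
A minimal consumption sketch, reusing a builder with steps already added:

from gretel_client.workflows.builder import WorkflowInterruption

for msg in builder.iter_preview():
    if isinstance(msg, WorkflowInterruption):
        print("preview stream interrupted")
        break
    print(msg.step, msg.type)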

prepare_data_source(data_source: str | Path | DataFrame | File, purpose: str = 'dataset') str

Uploads the data source to the Files API if it is not already a File and returns the file ID.

preview(log_printer: Callable[[Message | WorkflowInterruption], None] = _default_preview_printer)

Preview the workflow in realtime.

Parameters:

log_printer – A callable that processes each message or interruption. Defaults to _default_preview_printer which logs messages to the console in a human-readable format. You can provide your own function to customize how messages are processed.

run(name: str | None = None, run_name: str | None = None, wait_until_done: bool = False) WorkflowRun

Execute the workflow as a batch job.

This method creates a persistent workflow and runs it as a batch job on the Gretel platform. Unlike preview, this creates a permanent record of the workflow execution that can be referenced later.

Parameters:
  • name – Optional name to assign to the workflow. If provided, this will override any name previously set with the name() method.

  • run_name – Optional name to assign to this specific run of the workflow.

  • wait_until_done – Block until the workflow has completed running. If set to False, the method will immediately return the WorkflowRun object.

Returns:

A WorkflowRun object representing the running workflow. This can be used to track the status of the workflow and retrieve results when it completes.

Return type:

WorkflowRun

set_name(name: str) Self

Set the name of the workflow.

Parameters:

name – The name to assign to the workflow.

Returns:

The builder instance for method chaining.

Return type:

Self

property step_names: list[str]

Return a list of step names for the current builder

to_dict() dict

Convert the workflow to a dictionary representation.

Returns:

A dictionary representation of the workflow.

Return type:

dict

to_workflow() Workflow

Convert the builder to a Workflow object.

Returns:

A new Workflow instance with the configured name and steps.

Return type:

Workflow

to_yaml() str

Convert the workflow to a YAML string representation.

Returns:

A YAML string representation of the workflow.

Return type:

str

validate_step(step: Step) str

Validate a workflow step using the Gretel API.

This method makes an API call to validate the configuration of a workflow step before adding it to the workflow. It ensures the task type and configuration are valid according to the Gretel platform’s requirements.

Parameters:

step – The workflow step to validate, containing task name and configuration.

Returns:

Validation message if successful. Empty string if no message was returned.

Return type:

str

Raises:
  • WorkflowValidationError – If the step fails validation. The exception includes field-specific violations that can be accessed via the field_violations property.

  • ApiException – If there is an issue with the API call not related to validation.
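
A sketch of handling a validation failure (the task name and config are illustrative):

from gretel_client.workflows.builder import WorkflowValidationError
from gretel_client.workflows.configs.workflows import Step

builder = gretel.workflows.builder()
try:
    builder.validate_step(Step(name="ids", task="id_generator", config={"num_records": -1}))
except WorkflowValidationError as e:
    for violation in e.field_violations:
        print(violation.field, violation.error_message)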

with_data_source(data_source: str | Path | DataFrame | File, purpose: str = 'dataset', use_data_source_step: bool = False) Self

Add a data source to the workflow.

This method allows you to specify the primary data input for your workflow. The data source will be connected to the first step in the workflow chain.

Parameters:
  • data_source – The data to use as input. Can be one of:
    - A File object from the Gretel SDK
    - A file ID string (starting with “file_”)
    - A path to a local file (string or Path)
    - A pandas DataFrame

  • purpose – The purpose tag for the uploaded data. Defaults to “dataset”.

  • use_data_source_step – Instead of passing a file_id as an input, use the DataSource task. Generally you shouldn’t need to set this.

Returns:

The builder instance for method chaining.

Return type:

Self

Examples

# Using a pandas DataFrame
builder.with_data_source(df)

# Using a file path
builder.with_data_source("path/to/data.csv")

# Using an existing Gretel File object
builder.with_data_source(file_obj)

# Using a file ID
builder.with_data_source("file_abc123")

class gretel_client.workflows.builder.WorkflowInterruption(message: str)

Provides a user-friendly error message when a workflow is unexpectedly interrupted.

exception gretel_client.workflows.builder.WorkflowTaskError

Represents an error returned by the Task. This error is most likely related to an issue with the Task itself. If you see this error, check your Task config first. If the issue persists, the error might be a bug in the remote Task implementation.

exception gretel_client.workflows.builder.WorkflowValidationError(msg: str, *, task_name: str | None = None, step_name: str | None = None, field_violations: list[FieldViolation] | None = None)

Raised when workflow schema validation fails.

Use field_violations to access validation errors by field name.

Configs

class gretel_client.workflows.configs.workflows.DistributionType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
class gretel_client.workflows.configs.workflows.GenerationParameters(*, temperature: float | UniformDistribution | ManualDistribution | None = None, top_p: float | UniformDistribution | ManualDistribution | None = None, **extra_data: Any)
class gretel_client.workflows.configs.workflows.Globals(*, num_records: int | None = None, model_suite: str | None = None, model_configs: List[ModelConfig] | None = None, error_rate: float | None = 0.2, **extra_data: Any)
class gretel_client.workflows.configs.workflows.ManualDistribution(*, distribution_type: DistributionType | None = 'manual', params: ManualDistributionParams, **extra_data: Any)
class gretel_client.workflows.configs.workflows.ManualDistributionParams(*, values: List[float], weights: List[float] | None = None, **extra_data: Any)
class gretel_client.workflows.configs.workflows.ModelConfig(*, alias: str, model_name: str, generation_parameters: GenerationParameters, **extra_data: Any)
class gretel_client.workflows.configs.workflows.Step(*, name: str, task: str, inputs: List[str] | None = None, config: Dict[str, Any], **extra_data: Any)
class gretel_client.workflows.configs.workflows.UniformDistribution(*, distribution_type: DistributionType | None = 'uniform', params: UniformDistributionParams, **extra_data: Any)
class gretel_client.workflows.configs.workflows.UniformDistributionParams(*, low: float, high: float, **extra_data: Any)
class gretel_client.workflows.configs.workflows.Workflow(*, name: str, version: str | None = '2', inputs: Dict[str, Any] | None = None, globals: Globals | None = None, steps: List[Step] | None = None, **extra_data: Any)
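
These config classes can be composed directly into a Workflow definition; a minimal sketch (the task name is illustrative):

from gretel_client.workflows.configs.workflows import Globals, Step, Workflow

workflow = Workflow(
    name="my-workflow",
    globals=Globals(num_records=100),
    steps=[Step(name="ids", task="id_generator", config={"num_records": 100})],
)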

Tasks

class gretel_client.workflows.configs.registry.Registry
class ConcatDatasets(**extra_data: Any)
class ExtractDataSeedsFromSampleRecords(*, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, max_num_seeds: int | None = 5, num_assistants: int | None = 5, dataset_context: str | None = '', system_prompt_type: SystemPromptType | None = 'cognition', num_samples: int | None = 25, **extra_data: Any)
model_suite: Annotated[Optional[str], Field(title='Model Suite')]
error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
max_num_seeds: Annotated[Optional[int], Field(ge=1, le=10, title='Max Num Seeds')]
num_assistants: Annotated[Optional[int], Field(ge=1, le=8, title='Num Assistants')]
dataset_context: Annotated[Optional[str], Field(title='Dataset Context')]
system_prompt_type: SystemPromptType | None
num_samples: Annotated[Optional[int], Field(title='Num Samples')]
class IdGenerator(*, num_records: int | None = 100, **extra_data: Any)
num_records: Annotated[Optional[int], Field(title='Num Records')]
class LoadDataSeeds(*, seed_categories: List[SeedCategory], dataset_schema_map: Dict[str, Any] | None = None, **extra_data: Any)
seed_categories: Annotated[List[SeedCategory], Field(title='Seed Categories')]
dataset_schema_map: Annotated[Optional[Dict[str, Any]], Field(title='Dataset Schema Map')]
class GenerateColumnFromTemplateV2(*, num_records: int | None = 100, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, model_configs: List[ModelConfig] | None = None, model_alias: str | ModelAlias | None = 'text', prompt: str, name: str | None = 'response', system_prompt: str | None = None, output_type: OutputType | None = 'text', output_format: str | Dict[str, Any] | None = None, description: str | None = '', **extra_data: Any)
num_records: Annotated[Optional[int], Field(title='Num Records')]
model_suite: Annotated[Optional[str], Field(title='Model Suite')]
error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
model_configs: Annotated[Optional[List[ModelConfig]], Field(title='Model Configs')]
model_alias: Annotated[Optional[Union[str, ModelAlias]], Field(title='Model Alias')]
prompt: Annotated[str, Field(title='Prompt')]
name: Annotated[Optional[str], Field(title='Name')]
system_prompt: Annotated[Optional[str], Field(title='System Prompt')]
output_type: OutputType | None
output_format: Annotated[Optional[Union[str, Dict[str, Any]]], Field(title='Output Format')]
description: Annotated[Optional[str], Field(title='Description')]
class DropColumns(*, columns: List[str], **extra_data: Any)
columns: Annotated[List[str], Field(title='Columns')]
class NameGenerator(*, num_records: int | None = 100, column_name: str | None = 'name', seed: int | None = None, should_fail: bool | None = False, **extra_data: Any)
num_records: Annotated[Optional[int], Field(title='Num Records')]
column_name: Annotated[Optional[str], Field(title='Column Name')]
seed: Annotated[Optional[int], Field(title='Seed')]
should_fail: Annotated[Optional[bool], Field(title='Should Fail')]
class GenerateDatasetFromSampleRecords(*, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, target_num_records: int | None = 500, system_prompt_type: SystemPromptType | None = 'cognition', num_records_per_seed: int | None = 5, append_seeds_to_dataset: bool | None = True, num_examples_per_prompt: int | None = 5, dataset_context: str | None = '', **extra_data: Any)
model_suite: Annotated[Optional[str], Field(title='Model Suite')]
error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
target_num_records: Annotated[Optional[int], Field(ge=50, le=10000, title='Target Num Records')]
system_prompt_type: SystemPromptType | None
num_records_per_seed: Annotated[Optional[int], Field(ge=1, le=10, title='Num Records Per Seed')]
append_seeds_to_dataset: Annotated[Optional[bool], Field(title='Append Seeds To Dataset')]
num_examples_per_prompt: Annotated[Optional[int], Field(ge=1, le=50, title='Num Examples Per Prompt')]
dataset_context: Annotated[Optional[str], Field(title='Dataset Context')]
class SampleDataSeeds(*, num_records: int | None = 100, **extra_data: Any)
num_records: Annotated[Optional[int], Field(title='Num Records')]
class RunSampleToDataset(*, target_num_records: int | None = 500, system_prompt_type: SystemPromptType | None = 'cognition', num_records_per_seed: int | None = 5, num_examples_per_prompt: int | None = 5, max_num_seeds: int | None = 5, num_assistants: int | None = 5, append_seeds_to_dataset: bool | None = True, num_samples: int | None = 25, dataset_context: str | None = '', **extra_data: Any)
target_num_records: Annotated[Optional[int], Field(ge=50, le=10000, title='Target Num Records')]
system_prompt_type: SystemPromptType | None
num_records_per_seed: Annotated[Optional[int], Field(ge=1, le=10, title='Num Records Per Seed')]
num_examples_per_prompt: Annotated[Optional[int], Field(ge=1, le=50, title='Num Examples Per Prompt')]
max_num_seeds: Annotated[Optional[int], Field(ge=1, le=10, title='Max Num Seeds')]
num_assistants: Annotated[Optional[int], Field(ge=1, le=8, title='Num Assistants')]
append_seeds_to_dataset: Annotated[Optional[bool], Field(title='Append Seeds To Dataset')]
num_samples: Annotated[Optional[int], Field(title='Num Samples')]
dataset_context: Annotated[Optional[str], Field(title='Dataset Context')]
class GetGretelDataset(*, name: str, **extra_data: Any)
name: Annotated[str, Field(title='Name')]
class GenerateColumnFromExpression(*, num_records: int | None = 100, name: str, expr: str, dtype: Dtype | None = 'str', **extra_data: Any)
num_records: Annotated[Optional[int], Field(title='Num Records')]
name: Annotated[str, Field(title='Name')]
expr: Annotated[str, Field(title='Expr')]
dtype: Annotated[Optional[Dtype], Field(title='Dtype')]
class Combiner(*, num_records: int | None = 100, **extra_data: Any)
num_records: Annotated[Optional[int], Field(title='Num Records')]
class DummyTaskWithInputs(*, num_records: int | None = 100, **extra_data: Any)
num_records: Annotated[Optional[int], Field(title='Num Records')]
class DummyTaskWithListOfInputs(*, num_records: int | None = 100, **extra_data: Any)
num_records: Annotated[Optional[int], Field(title='Num Records')]
class TestFailingTask(*, num_records: int | None = 100, **extra_data: Any)
num_records: Annotated[Optional[int], Field(title='Num Records')]
class TestOptionalArgTask(**extra_data: Any)
class TestRequiredAndOptionalArgsTask(**extra_data: Any)
class TestTaskCallingTask(**extra_data: Any)
class TestUnhandledErrorTask(*, foo: str, **extra_data: Any)
foo: Annotated[str, Field(title='Foo')]
class GenerateSamplingColumnConfigFromInstruction(*, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, model_configs: List[ModelConfig] | None = None, model_alias: str | ModelAlias | None = 'code', name: str, instruction: str, edit_task: SerializableConditionalDataColumn | None = None, existing_samplers: List[SerializableConditionalDataColumn] | None = None, **extra_data: Any)
model_suite: Annotated[Optional[str], Field(title='Model Suite')]
error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
model_configs: Annotated[Optional[List[ModelConfig]], Field(title='Model Configs')]
model_alias: Annotated[Optional[Union[str, ModelAlias]], Field(title='Model Alias')]
name: Annotated[str, Field(title='Name')]
instruction: Annotated[str, Field(title='Instruction')]
edit_task: SerializableConditionalDataColumn | None
existing_samplers: Annotated[Optional[List[SerializableConditionalDataColumn]], Field(title='Existing Samplers')]
class EvaluateDataDesignerDataset(*, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, model_configs: List[ModelConfig] | None = None, model_alias: str | ModelAlias | None = 'text', llm_judge_columns: List[str] | None = None, columns_to_ignore: List[str] | None = None, validation_columns: List[str] | None = None, defined_categorical_columns: List[str] | None = None, **extra_data: Any)
model_suite: Annotated[Optional[str], Field(title='Model Suite')]
error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
model_configs: Annotated[Optional[List[ModelConfig]], Field(title='Model Configs')]
model_alias: Annotated[Optional[Union[str, ModelAlias]], Field(title='Model Alias')]
llm_judge_columns: Annotated[Optional[List[str]], Field(title='Llm Judge Columns')]
columns_to_ignore: Annotated[Optional[List[str]], Field(title='Columns To Ignore')]
validation_columns: Annotated[Optional[List[str]], Field(title='Validation Columns')]
defined_categorical_columns: Annotated[Optional[List[str]], Field(title='Defined Categorical Columns')]
class Holdout(*, holdout: float | int | None = None, max_holdout: int | None = None, group_by: str | None = None, **extra_data: Any)
holdout: Annotated[Optional[Union[float, int]], Field(title='Holdout')]
max_holdout: Annotated[Optional[int], Field(title='Max Holdout')]
group_by: Annotated[Optional[str], Field(title='Group By')]
class GenerateColumnConfigFromInstruction(*, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, model_configs: List[ModelConfig] | None = None, model_alias: str | ModelAlias | None = 'code', name: str, instruction: str, edit_task: GenerateColumnFromTemplateV2Config | None = None, existing_columns: ExistingColumns | None = {'columns': []}, use_reasoning: bool | None = True, must_depend_on: List[str] | None = None, **extra_data: Any)
model_suite: Annotated[Optional[str], Field(title='Model Suite')]
error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
model_configs: Annotated[Optional[List[ModelConfig]], Field(title='Model Configs')]
model_alias: Annotated[Optional[Union[str, ModelAlias]], Field(title='Model Alias')]
name: Annotated[str, Field(title='Name')]
instruction: Annotated[str, Field(title='Instruction')]
edit_task: GenerateColumnFromTemplateV2Config | None
existing_columns: Annotated[Optional[ExistingColumns], Field()]
use_reasoning: Annotated[Optional[bool], Field(title='Use Reasoning')]
must_depend_on: Annotated[Optional[List[str]], Field(title='Must Depend On')]
class SampleFromDataset(*, num_samples: int | None = None, strategy: SamplingStrategy | None = 'ordered', with_replacement: bool | None = False, **extra_data: Any)
num_samples: Annotated[Optional[int], Field(title='Num Samples')]
strategy: SamplingStrategy | None
with_replacement: Annotated[Optional[bool], Field(title='With Replacement')]
class JudgeWithLlm(*, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, model_configs: List[ModelConfig] | None = None, model_alias: str | ModelAlias | None = 'judge', prompt: str, num_samples_to_judge: int | None = 100, rubrics: List[Rubric], result_column: str | None = 'llm_judge_results', judge_random_seed: int | None = 2025, **extra_data: Any)
model_suite: Annotated[Optional[str], Field(title='Model Suite')]
error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
model_configs: Annotated[Optional[List[ModelConfig]], Field(title='Model Configs')]
model_alias: Annotated[Optional[Union[str, ModelAlias]], Field(title='Model Alias')]
prompt: Annotated[str, Field(description='Template for generating prompts. Use Jinja2 templates to reference dataset columns.', title='Prompt')]
num_samples_to_judge: Annotated[Optional[int], Field(description='Number of samples to judge. Default is 100.', title='Num Samples To Judge')]
rubrics: Annotated[List[Rubric], Field(description='List of rubric configurations to use for evaluation. At least one must be provided.', min_length=1, title='Rubrics')]
result_column: Annotated[Optional[str], Field(description='Column name to store judge results.', title='Result Column')]
judge_random_seed: Annotated[Optional[int], Field(description='Random seed to use for selecting samples to judge. Same seed ensures same samples are selected each time.', title='Judge Random Seed')]
class GenerateColumnsUsingSamplers(*, num_records: int | None = 100, data_schema: DataSchema, max_rejections_factor: int | None = 5, **extra_data: Any)
num_records: Annotated[Optional[int], Field(title='Num Records')]
data_schema: DataSchema
max_rejections_factor: Annotated[Optional[int], Field(title='Max Rejections Factor')]
class EvaluateSafeSyntheticsDataset(*, skip_attribute_inference_protection: bool | None = False, attribute_inference_protection_quasi_identifier_count: int | None = 3, skip_membership_inference_protection: bool | None = False, membership_inference_protection_column_name: str | None = None, skip_pii_replay: bool | None = False, pii_replay_entities: List[str] | None = None, pii_replay_columns: List[str] | None = None, **extra_data: Any)
skip_attribute_inference_protection: Annotated[Optional[bool], Field(title='Skip Attribute Inference Protection')]
attribute_inference_protection_quasi_identifier_count: Annotated[Optional[int], Field(gt=0, title='Attribute Inference Protection Quasi Identifier Count')]
skip_membership_inference_protection: Annotated[Optional[bool], Field(title='Skip Membership Inference Protection')]
membership_inference_protection_column_name: Annotated[Optional[str], Field(title='Membership Inference Protection Column Name')]
skip_pii_replay: Annotated[Optional[bool], Field(title='Skip Pii Replay')]
pii_replay_entities: Annotated[Optional[List[str]], Field(title='Pii Replay Entities')]
pii_replay_columns: Annotated[Optional[List[str]], Field(title='Pii Replay Columns')]
class ValidateCode(*, code_lang: CodeLang, target_columns: List[str], result_columns: List[str], **extra_data: Any)
code_lang: CodeLang
target_columns: Annotated[List[str], Field(title='Target Columns')]
result_columns: Annotated[List[str], Field(title='Result Columns')]
class SeedFromRecords(*, records: List[Dict[str, Any]], **extra_data: Any)
records: Annotated[List[Dict[str, Any]], Field(title='Records')]
class EvaluateDataset(*, seed_columns: List[str], ordered_list_like_columns: List[str] | None = None, other_list_like_columns: List[str] | None = None, llm_judge_column: str | None = '', columns_to_ignore: List[str] | None = None, **extra_data: Any)
seed_columns: Annotated[List[str], Field(title='Seed Columns')]
ordered_list_like_columns: Annotated[Optional[List[str]], Field(title='Ordered List Like Columns')]
other_list_like_columns: Annotated[Optional[List[str]], Field(title='Other List Like Columns')]
llm_judge_column: Annotated[Optional[str], Field(title='Llm Judge Column')]
columns_to_ignore: Annotated[Optional[List[str]], Field(title='Columns To Ignore')]
class TabularGan(*, train: TrainTabularGANConfig | None = None, generate: GenerateFromTabularGANConfig | None = None, **extra_data: Any)
train: TrainTabularGANConfig | None
generate: GenerateFromTabularGANConfig | None
class TabularFt(*, train: TrainTabularFTConfig | None = None, generate: GenerateFromTabularFTConfig | None = None, **extra_data: Any)
train: TrainTabularFTConfig | None
generate: GenerateFromTabularFTConfig | None
class Transform(*, globals: Globals | None = {'classify': {'enable': None, 'entities': None, 'num_samples': 3}, 'locales': None, 'lock_columns': None, 'ner': {'enable_gliner': True, 'enable_regexps': False, 'entities': None, 'gliner_batch_mode': {'batch_size': 8, 'chunk_length': 512, 'enable': True}, 'ner_optimized': True, 'ner_threshold': 0.7}, 'seed': None}, steps: List[StepDefinition], model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, **extra_data: Any)
globals: Annotated[Optional[Globals], Field(description='Global config options.', title='Globals')]
steps: Annotated[List[StepDefinition], Field(description='list of transform steps to perform on input.', max_length=10, min_length=1, title='Steps')]
model_suite: Annotated[Optional[str], Field(title='Model Suite')]
error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
class TextFt(*, train: TrainTextFTConfig | None = None, generate: GenerateFromTextFTConfig | None = None, **extra_data: Any)
train: TrainTextFTConfig | None
generate: GenerateFromTextFTConfig | None
class PromptPretrainedModel(*, pretrained_model: str | None = 'meta-llama/Llama-3.1-8B-Instruct', prompt_template: str | None = None, generate: GenerateParams | None = None, **extra_data: Any)
pretrained_model: Annotated[Optional[str], Field(description='Select the text generation model to fine-tune from HuggingFace. Defaults to `meta-llama/Llama-3.1-8B-Instruct`.', title='Pretrained Model')]
prompt_template: Annotated[Optional[str], Field(description="All prompt inputs are formatted according to this template. The template must either start with '@' and reference the name of a pre-defined template, or contain a single '%s' formatting verb.", title='Prompt Template')]
generate: GenerateParams | None
class DataSource(*, data_source: str, **extra_data: Any)
data_source: Annotated[str, Field(title='Data Source')]
class AzureDestination(*, connection_id: str, path: str, container: str, **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
path: Annotated[str, Field(title='Path')]
container: Annotated[str, Field(title='Container')]
class AzureSource(*, connection_id: str, path: str, container: str, **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
path: Annotated[str, Field(title='Path')]
container: Annotated[str, Field(title='Container')]
class MssqlDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
database: Annotated[Optional[str], Field(title='Database')]
table: Annotated[str, Field(title='Table')]
sync: DestinationSyncConfig | None
class MssqlSource(*, connection_id: str, queries: List[Query], **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]
class GcsDestination(*, connection_id: str, path: str, bucket: str, **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
path: Annotated[str, Field(title='Path')]
bucket: Annotated[str, Field(title='Bucket')]
class GcsSource(*, connection_id: str, path: str, bucket: str, **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
path: Annotated[str, Field(title='Path')]
bucket: Annotated[str, Field(title='Bucket')]
class BigqueryDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, bq_dataset: str | None = None, **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
database: Annotated[Optional[str], Field(title='Database')]
table: Annotated[str, Field(title='Table')]
sync: DestinationSyncConfig | None
bq_dataset: Annotated[Optional[str], Field(title='Bq Dataset')]
class BigquerySource(*, connection_id: str, queries: List[Query], **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]
class SnowflakeDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
database: Annotated[Optional[str], Field(title='Database')]
table: Annotated[str, Field(title='Table')]
sync: DestinationSyncConfig | None
class SnowflakeSource(*, connection_id: str, queries: List[Query], **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]
class PostgresDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
database: Annotated[Optional[str], Field(title='Database')]
table: Annotated[str, Field(title='Table')]
sync: DestinationSyncConfig | None
class PostgresSource(*, connection_id: str, queries: List[Query], **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]
class DatabricksDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, volume: str | None = 'gretel_databricks_connector', **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
database: Annotated[Optional[str], Field(title='Database')]
table: Annotated[str, Field(title='Table')]
sync: DestinationSyncConfig | None
volume: Annotated[Optional[str], Field(title='Volume')]
class DatabricksSource(*, connection_id: str, queries: List[Query], **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]
class OracleDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
database: Annotated[Optional[str], Field(title='Database')]
table: Annotated[str, Field(title='Table')]
sync: DestinationSyncConfig | None
class OracleSource(*, connection_id: str, queries: List[Query], **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]
class S3Destination(*, connection_id: str, path: str, bucket: str, **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
path: Annotated[str, Field(title='Path')]
bucket: Annotated[str, Field(title='Bucket')]
class S3Source(*, connection_id: str, path: str, bucket: str, **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
path: Annotated[str, Field(title='Path')]
bucket: Annotated[str, Field(title='Bucket')]
class MysqlDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
database: Annotated[Optional[str], Field(title='Database')]
table: Annotated[str, Field(title='Table')]
sync: DestinationSyncConfig | None
class MysqlSource(*, connection_id: str, queries: List[Query], **extra_data: Any)
connection_id: Annotated[str, Field(title='Connection Id')]
queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]