Management
- class gretel_client.workflows.manager.WorkflowManager(api_factory: GretelApiProviderProtocol, resource_provider: GretelResourceProviderProtocol)
Provides a low-level interface for interacting with Gretel Workflows.
Note: This class should never be directly instantiated. Instead, interact with it via the Gretel client session.
For example, to fetch an existing Workflow Run:

    from gretel.navigator_client import Gretel

    gretel = Gretel()
    workflow = gretel.workflows.get_workflow_run("wr_run_id_here")
- builder(globals: Globals | None = None) WorkflowBuilder
Creates a new workflow builder instance. This can be used to construct Workflows using a fluent builder pattern.
- Parameters:
globals – Configure global variables for the Workflow.
- Returns:
A fluent builder to construct Workflows.
- Return type:
WorkflowBuilder
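A minimal sketch of the fluent pattern, assuming the NameGenerator task from the registry documented under Tasks below (the workflow name is arbitrary):

    from gretel.navigator_client import Gretel
    from gretel_client.workflows.configs.registry import Registry

    gretel = Gretel()

    # Name the workflow, add one task, and run it as a batch job
    builder = gretel.workflows.builder()
    builder.set_name("name-generator-example").add_step(
        Registry.NameGenerator(num_records=10)
    )
    workflow_run = builder.run(wait_until_done=True)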
- create(tasks: list[pydantic.main.BaseModel], wait_until_done: bool = False) WorkflowRun
Creates and executes a workflow from a list of task configurations.
- Parameters:
tasks – List of task configurations to include in the workflow.
wait_until_done – Block until the workflow has completed running. Defaults to False.
- Returns:
The executed workflow run instance.
- Return type:
WorkflowRun
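A hedged sketch of create(), reusing the IdGenerator task config from the registry documented below and the gretel session from the example above:

    from gretel_client.workflows.configs.registry import Registry

    # Create and execute a workflow from a flat list of task configs,
    # blocking until the run finishes
    run = gretel.workflows.create(
        [Registry.IdGenerator(num_records=100)],
        wait_until_done=True,
    )
    print(run.console_url)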
- get_workflow_run(workflow_run_id: str) WorkflowRun
Retrieves a specific workflow run by ID.
- Parameters:
workflow_run_id – The ID of the workflow run to retrieve.
- Returns:
The workflow run instance.
- Return type:
WorkflowRun
- registry() dict[str, Any]
Retrieves the workflow registry.
- Returns:
The workflow registry.
- Return type:
dict[str, Any]
- class gretel_client.workflows.workflow.WorkflowRun(workflow: WorkflowRun, api_provider: GretelApiProviderProtocol, resource_provider: GretelResourceProviderProtocol)
The WorkflowRun class represents a concrete execution of a Workflow, providing methods to monitor execution, retrieve outputs, and access logs. Each workflow execution is composed of steps that form a directed acyclic graph (DAG).
You should never directly instantiate a WorkflowRun; instead, use Workflow methods from the main Gretel class.
- property config: dict
Return the Workflow config as a dictionary.
- property config_yaml: str
Return the Workflow config as YAML.
- property console_url: str
Get the URL for viewing this Workflow Run in the Gretel Console.
- fetch_status() Status
Fetch the latest status of the Workflow
- get_step_output(step_name: str, format: str | None = None) PydanticModel | Dataset | Report | IO
Retrieve the output from a specific workflow step.
- Parameters:
step_name – Name of the workflow step
format – Optional output format specification
- Returns:
The step output in the appropriate format (PydanticModel, Dataset, Report, or IO)
- Raises:
Exception – If the step cannot be found or output type cannot be determined
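A sketch of fetching a step's output; the step name used here is hypothetical and must match a step in the workflow's DAG:

    run = gretel.workflows.get_workflow_run("wr_run_id_here")
    # "name-generator" is a hypothetical step name
    output = run.get_step_output("name-generator")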
- property id: str
Get the ID of the Workflow Run
- property name: str
Get the name of the Workflow
- wait_until_done(wait: int = -1, verbose: bool = True, log_printer: LogPrinter | None = None)
Wait for the workflow run to complete, with optional logging.
- Parameters:
wait – Maximum time to wait in seconds. -1 means wait indefinitely
verbose – Whether to print detailed logs during execution
log_printer – Custom log printer implementation. If None, uses LoggingPrinter
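For instance, a sketch of polling a run with a bounded wait (the timeout value is arbitrary):

    run = gretel.workflows.get_workflow_run("wr_run_id_here")
    run.wait_until_done(wait=600, verbose=True)  # give up after ten minutes
    print(run.fetch_status())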
- property workflow_id: str
Get the ID of the parent Workflow
Outputs
- class gretel_client.workflows.io.Dataset(df: DataFrame)
Represents tabular data generated by a Workflow.
The Dataset class provides a wrapper around a pandas DataFrame to represent output from a Workflow.
This class should never be directly instantiated; instead, access it from the parent Workflow Run, e.g.:

    gretel.workflows.get_workflow_run("workflow run id").dataset
- property df: DataFrame
Get the Dataset as a pandas DataFrame
- download(file: str | Path | IO, format: Literal['csv', 'parquet'] = 'parquet') None
Save the dataset to a file in either CSV or Parquet format.
- Parameters:
file – The target file path or file-like object where the data will be saved.
format – The output format, either “csv” or “parquet”. Defaults to “parquet”.
Note
If a string or Path is provided, any necessary parent directories will be created automatically.
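A short sketch, assuming a completed run with tabular output:

    dataset = gretel.workflows.get_workflow_run("wr_run_id_here").dataset

    # Work with the data in-memory, then persist it to disk
    print(dataset.df.head())
    dataset.download("output/data.csv", format="csv")  # parent dirs are created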
- class gretel_client.workflows.io.PydanticModel(model_dict: dict)
Some Workflow steps produce structured data as pydantic objects. This class is a wrapper around those objects providing methods to interact with the underlying data structure.
- property dict: dict
Return the dictionary representation of the output
- class gretel_client.workflows.io.Report(report_dict: dict, report_downloader: Callable[[Literal['json', 'html'] | None], IO])
Represents an evaluation report for synthetic data generated by workflows.
The Report class provides functionality to display and save an evaluation report comparing output data with the reference dataset.
This class should never be directly instantiated; instead, access it from the parent Workflow Run, e.g.:

    gretel.workflows.get_workflow_run("workflow run id").report
- property dict: dict
Get the report as a dictionary
- display_in_browser()
Display the HTML report in a browser.
- display_in_notebook()
Display the HTML report in a notebook.
- download(file: str | Path | IO, format: Literal['json', 'html'] = 'html')
Save the report to a file in either JSON or HTML format.
- Parameters:
file – The target file path or file-like object where the report will be saved.
format – The output format, either “json” or “html”. Defaults to “html”.
Note
If a string or Path is provided, any necessary parent directories will be created automatically.
- property table: Table
Get a formatted rich Table representation of the report.
- Returns:
A rich Table instance containing the report data formatted for display.
- Return type:
Table
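A short sketch, assuming a completed run that produced an evaluation report:

    report = gretel.workflows.get_workflow_run("wr_run_id_here").report

    print(report.dict)  # raw report data
    report.download("output/report.html", format="html")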
Builder
- class gretel_client.workflows.builder.FieldViolation(*, field: str, error_message: str)
Represents a field that has failed schema validation.
- class gretel_client.workflows.builder.LogMessage(level: 'str', msg: 'str')
- class gretel_client.workflows.builder.Message(step: 'str', stream: 'str', payload: 'dict', type: 'str', ts: 'datetime')
- payload: dict
The actual value of the output
- raise_for_error() None
Check for fatal errors and raise an exception if found.
- step: str
The name of the step
- stream: str
The stream the message should be associated with.
We use multiple streams so that we can differentiate between different types of outputs.
- ts: datetime
The date and time the message was created
- type: str
The type of message
- class gretel_client.workflows.builder.WorkflowBuilder(project_id: str, globals: Globals, api_provider: GretelApiProviderProtocol, resource_provider: GretelResourceProviderProtocol, workflow_session_manager: WorkflowSessionManager | None = None)
A builder class for creating Gretel workflows.
This class provides a fluent interface for constructing Workflow objects by chaining method calls. It allows setting the workflow name and adding steps sequentially.
- add_step(step: BaseModel | Step, step_inputs: list[pydantic.main.BaseModel | Step | str] | None = None, validate: bool = True, step_name: str | None = None) Self
Add a single step to the workflow.
- Parameters:
step – The workflow step to add.
step_inputs – Configure the inputs for the step or task.
validate – Whether to validate the step. Defaults to True.
step_name – The name of the step. If not provided, the name will be generated based on the name of the task.
- Returns:
The builder instance for method chaining.
- Return type:
Self
- add_steps(steps: list[pydantic.main.BaseModel | Step], validate: bool = True) Self
Add multiple steps to the workflow.
- Parameters:
steps – A list of workflow steps to add.
validate – Whether to validate the steps. Defaults to True.
- Returns:
The builder instance for method chaining.
- Return type:
Self
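A sketch of wiring steps explicitly; whether SampleFromDataset accepts the generator's output this way is an assumption for illustration:

    from gretel_client.workflows.configs.registry import Registry

    builder = gretel.workflows.builder()

    # Steps added without explicit inputs are chained sequentially
    id_task = Registry.IdGenerator(num_records=100)
    builder.add_step(id_task)

    # step_inputs wires this step's input to the earlier task explicitly
    builder.add_step(
        Registry.SampleFromDataset(num_samples=10),
        step_inputs=[id_task],
    )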
- property data_source: str | None
Return the current input data source for the builder.
- for_workflow(workflow_id: str | None = None) Self
Configure this builder to use an existing workflow.
When a workflow ID is specified, the run() method will execute a new run within the context of the existing workflow instead of creating a new workflow. This allows multiple runs to share the same workflow.
- Parameters:
workflow_id – The ID of an existing workflow to use. If set to None, a new workflow will be created for the subsequent run.
- Returns:
The builder instance for method chaining
- Return type:
Self
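A sketch of reusing an existing workflow for a second run (the workflow ID is a placeholder):

    # Subsequent run() calls execute inside the existing workflow
    builder.for_workflow("w_workflow_id_here")
    second_run = builder.run(run_name="second-run")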
- iter_preview() Iterator[Message | WorkflowInterruption]
Stream workflow execution messages for preview purposes.
This method executes the workflow in streaming preview mode, returning an iterator that yields messages as they are received from the workflow execution. This allows for real-time monitoring of workflow execution before you submit your workflow for batch execution.
- Returns:
An iterator that yields Message objects containing logs, outputs, and state changes from the workflow, or a WorkflowInterruption if the stream is unexpectedly disconnected.
- Return type:
Iterator[Union[Message, WorkflowInterruption]]
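A sketch of consuming the preview stream:

    from gretel_client.workflows.builder import Message, WorkflowInterruption

    for event in builder.iter_preview():
        if isinstance(event, WorkflowInterruption):
            # The stream disconnected unexpectedly; stop consuming
            break
        # event is a Message: inspect its step, type, and payload
        print(event.step, event.type, event.payload)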
- prepare_data_source(data_source: str | Path | DataFrame | File, purpose: str = 'dataset') str
Uploads the data source to the Files API if it is not already a File and returns the file ID.
- preview(log_printer: Callable[[Message | WorkflowInterruption], None] = <function _default_preview_printer>)
Preview the workflow in realtime.
- Parameters:
log_printer – A callable that processes each message or interruption. Defaults to _default_preview_printer which logs messages to the console in a human-readable format. You can provide your own function to customize how messages are processed.
- run(name: str | None = None, run_name: str | None = None, wait_until_done: bool = False) WorkflowRun
Execute the workflow as a batch job.
This method creates a persistent workflow and runs it as a batch job on the Gretel platform. Unlike preview, this creates a permanent record of the workflow execution that can be referenced later.
- Parameters:
name – Optional name to assign to the workflow. If provided, this will override any name previously set with the name() method.
run_name – Optional name to assign to this specific run of the workflow.
wait_until_done – Block until the workflow has completed running. If set to False, the method returns the WorkflowRun object immediately.
- Returns:
A WorkflowRun object representing the running workflow. This can be used to track the status of the workflow and retrieve results when it completes.
- Return type:
WorkflowRun
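A sketch of a non-blocking run followed by an explicit wait:

    run = builder.run(name="my-workflow", wait_until_done=False)
    print(run.console_url)  # view progress in the Gretel Console
    run.wait_until_done()   # block until the batch job completes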
- set_name(name: str) Self
Set the name of the workflow.
- Parameters:
name – The name to assign to the workflow.
- Returns:
The builder instance for method chaining.
- Return type:
Self
- property step_names: list[str]
Return a list of step names for the current builder
- to_dict() dict
Convert the workflow to a dictionary representation.
- Returns:
A dictionary representation of the workflow.
- Return type:
dict
- to_workflow() Workflow
Convert the builder to a Workflow object.
- Returns:
A new Workflow instance with the configured name and steps.
- Return type:
Workflow
- to_yaml() str
Convert the workflow to a YAML string representation.
- Returns:
A YAML string representation of the workflow.
- Return type:
str
- validate_step(step: Step) str
Validate a workflow step using the Gretel API.
This method makes an API call to validate the configuration of a workflow step before adding it to the workflow. It ensures the task type and configuration are valid according to the Gretel platform’s requirements.
- Parameters:
step – The workflow step to validate, containing task name and configuration.
- Returns:
Validation message if successful. Empty string if no message was returned.
- Return type:
str
- Raises:
WorkflowValidationError – If the step fails validation. The exception includes field-specific violations that can be accessed via the field_violations property.
ApiException – If there is an issue with the API call not related to validation.
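A sketch of handling a validation failure, assuming step is a Step constructed elsewhere:

    from gretel_client.workflows.builder import WorkflowValidationError

    try:
        message = builder.validate_step(step)
    except WorkflowValidationError as err:
        # Inspect the per-field schema violations
        for violation in err.field_violations:
            print(violation.field, violation.error_message)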
- with_data_source(data_source: str | Path | DataFrame | File, purpose: str = 'dataset', use_data_source_step: bool = False) Self
Add a data source to the workflow.
This method allows you to specify the primary data input for your workflow. The data source will be connected to the first step in the workflow chain.
- Parameters:
data_source – The data to use as input. Can be one of:
- A File object from the Gretel SDK
- A file ID string (starting with “file_”)
- A path to a local file (string or Path)
- A pandas DataFrame
purpose – The purpose tag for the uploaded data. Defaults to “dataset”.
use_data_source_step – Instead of passing a file_id as an input, use the DataSource task. Generally you shouldn’t need to set this.
- Returns:
The builder instance for method chaining.
- Return type:
Self
Examples

    # Using a pandas DataFrame
    builder.with_data_source(df)

    # Using a file path
    builder.with_data_source("path/to/data.csv")

    # Using an existing Gretel File object
    builder.with_data_source(file_obj)

    # Using a file ID
    builder.with_data_source("file_abc123")
- class gretel_client.workflows.builder.WorkflowInterruption(message: str)
Provides a user-friendly error message when a workflow is unexpectedly interrupted.
- exception gretel_client.workflows.builder.WorkflowTaskError
Represents an error returned by the Task. This error is most likely related to an issue with the Task itself. If you see this error, check your Task config first. If the issue persists, the error might be a bug in the remote Task implementation.
- exception gretel_client.workflows.builder.WorkflowValidationError(msg: str, *, task_name: str | None = None, step_name: str | None = None, field_violations: list[FieldViolation] | None = None)
Raised when workflow schema validation fails.
Use field_violations to access validation errors by field name.
Configs
- class gretel_client.workflows.configs.workflows.DistributionType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
- class gretel_client.workflows.configs.workflows.GenerationParameters(*, temperature: float | UniformDistribution | ManualDistribution | None = None, top_p: float | UniformDistribution | ManualDistribution | None = None, **extra_data: Any)
- class gretel_client.workflows.configs.workflows.Globals(*, num_records: int | None = None, model_suite: str | None = None, model_configs: List[ModelConfig] | None = None, error_rate: float | None = 0.2, **extra_data: Any)
- class gretel_client.workflows.configs.workflows.ManualDistribution(*, distribution_type: DistributionType | None = 'manual', params: ManualDistributionParams, **extra_data: Any)
- class gretel_client.workflows.configs.workflows.ManualDistributionParams(*, values: List[float], weights: List[float] | None = None, **extra_data: Any)
- class gretel_client.workflows.configs.workflows.ModelConfig(*, alias: str, model_name: str, generation_parameters: GenerationParameters, **extra_data: Any)
- class gretel_client.workflows.configs.workflows.Step(*, name: str, task: str, inputs: List[str] | None = None, config: Dict[str, Any], **extra_data: Any)
- class gretel_client.workflows.configs.workflows.UniformDistribution(*, distribution_type: DistributionType | None = 'uniform', params: UniformDistributionParams, **extra_data: Any)
- class gretel_client.workflows.configs.workflows.UniformDistributionParams(*, low: float, high: float, **extra_data: Any)
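A hedged sketch composing these config classes; the alias and model name values are hypothetical:

    from gretel_client.workflows.configs.workflows import (
        GenerationParameters,
        Globals,
        ModelConfig,
        UniformDistribution,
        UniformDistributionParams,
    )

    # Sample temperature uniformly from [0.5, 0.9] at generation time
    params = GenerationParameters(
        temperature=UniformDistribution(
            params=UniformDistributionParams(low=0.5, high=0.9)
        ),
        top_p=0.95,
    )
    model_config = ModelConfig(
        alias="text",           # hypothetical alias
        model_name="my-model",  # hypothetical model name
        generation_parameters=params,
    )
    workflow_globals = Globals(num_records=100, model_configs=[model_config])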
Tasks
- class gretel_client.workflows.configs.registry.Registry
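The registry exposes each workflow task as a nested pydantic config class. A hedged sketch pairing a registry task with the WorkflowBuilder above (the prompt template and column name are hypothetical):

    from gretel.navigator_client import Gretel
    from gretel_client.workflows.configs.registry import Registry

    gretel = Gretel()
    builder = gretel.workflows.builder()
    builder.with_data_source("path/to/data.csv")
    builder.add_step(
        Registry.GenerateColumnFromTemplateV2(
            prompt="Summarize the following text: {{ text }}",  # hypothetical template
            name="summary",
        )
    )
    run = builder.run(wait_until_done=True)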
- class ConcatDatasets(**extra_data: Any)
- class ExtractDataSeedsFromSampleRecords(*, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, max_num_seeds: int | None = 5, num_assistants: int | None = 5, dataset_context: str | None = '', system_prompt_type: SystemPromptType | None = 'cognition', num_samples: int | None = 25, **extra_data: Any)
- model_suite: Annotated[Optional[str], Field(title='Model Suite')]
- error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
- max_num_seeds: Annotated[Optional[int], Field(ge=1, le=10, title='Max Num Seeds')]
- num_assistants: Annotated[Optional[int], Field(ge=1, le=8, title='Num Assistants')]
- dataset_context: Annotated[Optional[str], Field(title='Dataset Context')]
- system_prompt_type: SystemPromptType | None
- num_samples: Annotated[Optional[int], Field(title='Num Samples')]
- class IdGenerator(*, num_records: int | None = 100, **extra_data: Any)
- num_records: Annotated[Optional[int], Field(title='Num Records')]
- class LoadDataSeeds(*, seed_categories: List[SeedCategory], dataset_schema_map: Dict[str, Any] | None = None, **extra_data: Any)
- seed_categories: Annotated[List[SeedCategory], Field(title='Seed Categories')]
- dataset_schema_map: Annotated[Optional[Dict[str, Any]], Field(title='Dataset Schema Map')]
- class GenerateColumnFromTemplateV2(*, num_records: int | None = 100, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, model_configs: List[ModelConfig] | None = None, model_alias: str | ModelAlias | None = 'text', prompt: str, name: str | None = 'response', system_prompt: str | None = None, output_type: OutputType | None = 'text', output_format: str | Dict[str, Any] | None = None, description: str | None = '', **extra_data: Any)
- num_records: Annotated[Optional[int], Field(title='Num Records')]
- model_suite: Annotated[Optional[str], Field(title='Model Suite')]
- error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
- model_configs: Annotated[Optional[List[ModelConfig]], Field(title='Model Configs')]
- model_alias: Annotated[Optional[Union[str, ModelAlias]], Field(title='Model Alias')]
- prompt: Annotated[str, Field(title='Prompt')]
- name: Annotated[Optional[str], Field(title='Name')]
- system_prompt: Annotated[Optional[str], Field(title='System Prompt')]
- output_type: OutputType | None
- output_format: Annotated[Optional[Union[str, Dict[str, Any]]], Field(title='Output Format')]
- description: Annotated[Optional[str], Field(title='Description')]
- class DropColumns(*, columns: List[str], **extra_data: Any)
- columns: Annotated[List[str], Field(title='Columns')]
- class NameGenerator(*, num_records: int | None = 100, column_name: str | None = 'name', seed: int | None = None, should_fail: bool | None = False, **extra_data: Any)
- num_records: Annotated[Optional[int], Field(title='Num Records')]
- column_name: Annotated[Optional[str], Field(title='Column Name')]
- seed: Annotated[Optional[int], Field(title='Seed')]
- should_fail: Annotated[Optional[bool], Field(title='Should Fail')]
- class GenerateDatasetFromSampleRecords(*, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, target_num_records: int | None = 500, system_prompt_type: SystemPromptType | None = 'cognition', num_records_per_seed: int | None = 5, append_seeds_to_dataset: bool | None = True, num_examples_per_prompt: int | None = 5, dataset_context: str | None = '', **extra_data: Any)
- model_suite: Annotated[Optional[str], Field(title='Model Suite')]
- error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
- target_num_records: Annotated[Optional[int], Field(ge=50, le=10000, title='Target Num Records')]
- system_prompt_type: SystemPromptType | None
- num_records_per_seed: Annotated[Optional[int], Field(ge=1, le=10, title='Num Records Per Seed')]
- append_seeds_to_dataset: Annotated[Optional[bool], Field(title='Append Seeds To Dataset')]
- num_examples_per_prompt: Annotated[Optional[int], Field(ge=1, le=50, title='Num Examples Per Prompt')]
- dataset_context: Annotated[Optional[str], Field(title='Dataset Context')]
- class SampleDataSeeds(*, num_records: int | None = 100, **extra_data: Any)
- num_records: Annotated[Optional[int], Field(title='Num Records')]
- class RunSampleToDataset(*, target_num_records: int | None = 500, system_prompt_type: SystemPromptType | None = 'cognition', num_records_per_seed: int | None = 5, num_examples_per_prompt: int | None = 5, max_num_seeds: int | None = 5, num_assistants: int | None = 5, append_seeds_to_dataset: bool | None = True, num_samples: int | None = 25, dataset_context: str | None = '', **extra_data: Any)
- target_num_records: Annotated[Optional[int], Field(ge=50, le=10000, title='Target Num Records')]
- system_prompt_type: SystemPromptType | None
- num_records_per_seed: Annotated[Optional[int], Field(ge=1, le=10, title='Num Records Per Seed')]
- num_examples_per_prompt: Annotated[Optional[int], Field(ge=1, le=50, title='Num Examples Per Prompt')]
- max_num_seeds: Annotated[Optional[int], Field(ge=1, le=10, title='Max Num Seeds')]
- num_assistants: Annotated[Optional[int], Field(ge=1, le=8, title='Num Assistants')]
- append_seeds_to_dataset: Annotated[Optional[bool], Field(title='Append Seeds To Dataset')]
- num_samples: Annotated[Optional[int], Field(title='Num Samples')]
- dataset_context: Annotated[Optional[str], Field(title='Dataset Context')]
- class GenerateColumnFromExpression(*, num_records: int | None = 100, name: str, expr: str, dtype: Dtype | None = 'str', **extra_data: Any)
- num_records: Annotated[Optional[int], Field(title='Num Records')]
- name: Annotated[str, Field(title='Name')]
- expr: Annotated[str, Field(title='Expr')]
- dtype: Annotated[Optional[Dtype], Field(title='Dtype')]
- class Combiner(*, num_records: int | None = 100, **extra_data: Any)
- num_records: Annotated[Optional[int], Field(title='Num Records')]
- class DummyTaskWithInputs(*, num_records: int | None = 100, **extra_data: Any)
- num_records: Annotated[Optional[int], Field(title='Num Records')]
- class DummyTaskWithListOfInputs(*, num_records: int | None = 100, **extra_data: Any)
- num_records: Annotated[Optional[int], Field(title='Num Records')]
- class TestFailingTask(*, num_records: int | None = 100, **extra_data: Any)
- num_records: Annotated[Optional[int], Field(title='Num Records')]
- class TestOptionalArgTask(**extra_data: Any)
- class TestRequiredAndOptionalArgsTask(**extra_data: Any)
- class TestTaskCallingTask(**extra_data: Any)
- class TestUnhandledErrorTask(*, foo: str, **extra_data: Any)
- foo: Annotated[str, Field(title='Foo')]
- class GenerateSamplingColumnConfigFromInstruction(*, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, model_configs: List[ModelConfig] | None = None, model_alias: str | ModelAlias | None = 'code', name: str, instruction: str, edit_task: SerializableConditionalDataColumn | None = None, existing_samplers: List[SerializableConditionalDataColumn] | None = None, **extra_data: Any)
- model_suite: Annotated[Optional[str], Field(title='Model Suite')]
- error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
- model_configs: Annotated[Optional[List[ModelConfig]], Field(title='Model Configs')]
- model_alias: Annotated[Optional[Union[str, ModelAlias]], Field(title='Model Alias')]
- name: Annotated[str, Field(title='Name')]
- instruction: Annotated[str, Field(title='Instruction')]
- edit_task: SerializableConditionalDataColumn | None
- existing_samplers: Annotated[Optional[List[SerializableConditionalDataColumn]], Field(title='Existing Samplers')]
- class EvaluateDataDesignerDataset(*, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, model_configs: List[ModelConfig] | None = None, model_alias: str | ModelAlias | None = 'text', llm_judge_columns: List[str] | None = None, columns_to_ignore: List[str] | None = None, validation_columns: List[str] | None = None, defined_categorical_columns: List[str] | None = None, **extra_data: Any)
- model_suite: Annotated[Optional[str], Field(title='Model Suite')]
- error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
- model_configs: Annotated[Optional[List[ModelConfig]], Field(title='Model Configs')]
- model_alias: Annotated[Optional[Union[str, ModelAlias]], Field(title='Model Alias')]
- llm_judge_columns: Annotated[Optional[List[str]], Field(title='Llm Judge Columns')]
- columns_to_ignore: Annotated[Optional[List[str]], Field(title='Columns To Ignore')]
- validation_columns: Annotated[Optional[List[str]], Field(title='Validation Columns')]
- defined_categorical_columns: Annotated[Optional[List[str]], Field(title='Defined Categorical Columns')]
- class Holdout(*, holdout: float | int | None = None, max_holdout: int | None = None, group_by: str | None = None, **extra_data: Any)
- holdout: Annotated[Optional[Union[float, int]], Field(title='Holdout')]
- max_holdout: Annotated[Optional[int], Field(title='Max Holdout')]
- group_by: Annotated[Optional[str], Field(title='Group By')]
- class GenerateColumnConfigFromInstruction(*, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, model_configs: List[ModelConfig] | None = None, model_alias: str | ModelAlias | None = 'code', name: str, instruction: str, edit_task: GenerateColumnFromTemplateV2Config | None = None, existing_columns: ExistingColumns | None = {'columns': []}, use_reasoning: bool | None = True, must_depend_on: List[str] | None = None, **extra_data: Any)
- model_suite: Annotated[Optional[str], Field(title='Model Suite')]
- error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
- model_configs: Annotated[Optional[List[ModelConfig]], Field(title='Model Configs')]
- model_alias: Annotated[Optional[Union[str, ModelAlias]], Field(title='Model Alias')]
- name: Annotated[str, Field(title='Name')]
- instruction: Annotated[str, Field(title='Instruction')]
- edit_task: GenerateColumnFromTemplateV2Config | None
- existing_columns: Annotated[Optional[ExistingColumns], Field()]
- use_reasoning: Annotated[Optional[bool], Field(title='Use Reasoning')]
- must_depend_on: Annotated[Optional[List[str]], Field(title='Must Depend On')]
- class SampleFromDataset(*, num_samples: int | None = None, strategy: SamplingStrategy | None = 'ordered', with_replacement: bool | None = False, **extra_data: Any)
- num_samples: Annotated[Optional[int], Field(title='Num Samples')]
- strategy: SamplingStrategy | None
- with_replacement: Annotated[Optional[bool], Field(title='With Replacement')]
- class JudgeWithLlm(*, model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, model_configs: List[ModelConfig] | None = None, model_alias: str | ModelAlias | None = 'judge', prompt: str, num_samples_to_judge: int | None = 100, rubrics: List[Rubric], result_column: str | None = 'llm_judge_results', judge_random_seed: int | None = 2025, **extra_data: Any)
- model_suite: Annotated[Optional[str], Field(title='Model Suite')]
- error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
- model_configs: Annotated[Optional[List[ModelConfig]], Field(title='Model Configs')]
- model_alias: Annotated[Optional[Union[str, ModelAlias]], Field(title='Model Alias')]
- prompt: Annotated[str, Field(description='Template for generating prompts. Use Jinja2 templates to reference dataset columns.', title='Prompt')]
- num_samples_to_judge: Annotated[Optional[int], Field(description='Number of samples to judge. Default is 100.', title='Num Samples To Judge')]
- rubrics: Annotated[List[Rubric], Field(description='List of rubric configurations to use for evaluation. At least one must be provided.', min_length=1, title='Rubrics')]
- result_column: Annotated[Optional[str], Field(description='Column name to store judge results.', title='Result Column')]
- judge_random_seed: Annotated[Optional[int], Field(description='Random seed to use for selecting samples to judge. Same seed ensures same samples are selected each time.', title='Judge Random Seed')]
- class GenerateColumnsUsingSamplers(*, num_records: int | None = 100, data_schema: DataSchema, max_rejections_factor: int | None = 5, **extra_data: Any)
- num_records: Annotated[Optional[int], Field(title='Num Records')]
- data_schema: DataSchema
- max_rejections_factor: Annotated[Optional[int], Field(title='Max Rejections Factor')]
- class EvaluateSafeSyntheticsDataset(*, skip_attribute_inference_protection: bool | None = False, attribute_inference_protection_quasi_identifier_count: int | None = 3, skip_membership_inference_protection: bool | None = False, membership_inference_protection_column_name: str | None = None, skip_pii_replay: bool | None = False, pii_replay_entities: List[str] | None = None, pii_replay_columns: List[str] | None = None, **extra_data: Any)
- skip_attribute_inference_protection: Annotated[Optional[bool], Field(title='Skip Attribute Inference Protection')]
- attribute_inference_protection_quasi_identifier_count: Annotated[Optional[int], Field(gt=0, title='Attribute Inference Protection Quasi Identifier Count')]
- skip_membership_inference_protection: Annotated[Optional[bool], Field(title='Skip Membership Inference Protection')]
- membership_inference_protection_column_name: Annotated[Optional[str], Field(title='Membership Inference Protection Column Name')]
- skip_pii_replay: Annotated[Optional[bool], Field(title='Skip Pii Replay')]
- pii_replay_entities: Annotated[Optional[List[str]], Field(title='Pii Replay Entities')]
- pii_replay_columns: Annotated[Optional[List[str]], Field(title='Pii Replay Columns')]
- class ValidateCode(*, code_lang: CodeLang, target_columns: List[str], result_columns: List[str], **extra_data: Any)
- code_lang: CodeLang
- target_columns: Annotated[List[str], Field(title='Target Columns')]
- result_columns: Annotated[List[str], Field(title='Result Columns')]
- class SeedFromRecords(*, records: List[Dict[str, Any]], **extra_data: Any)
- records: Annotated[List[Dict[str, Any]], Field(title='Records')]
- class EvaluateDataset(*, seed_columns: List[str], ordered_list_like_columns: List[str] | None = None, other_list_like_columns: List[str] | None = None, llm_judge_column: str | None = '', columns_to_ignore: List[str] | None = None, **extra_data: Any)
- seed_columns: Annotated[List[str], Field(title='Seed Columns')]
- ordered_list_like_columns: Annotated[Optional[List[str]], Field(title='Ordered List Like Columns')]
- other_list_like_columns: Annotated[Optional[List[str]], Field(title='Other List Like Columns')]
- llm_judge_column: Annotated[Optional[str], Field(title='Llm Judge Column')]
- columns_to_ignore: Annotated[Optional[List[str]], Field(title='Columns To Ignore')]
- class TabularGan(*, train: TrainTabularGANConfig | None = None, generate: GenerateFromTabularGANConfig | None = None, **extra_data: Any)
- train: TrainTabularGANConfig | None
- generate: GenerateFromTabularGANConfig | None
- class TabularFt(*, train: TrainTabularFTConfig | None = None, generate: GenerateFromTabularFTConfig | None = None, **extra_data: Any)
- train: TrainTabularFTConfig | None
- generate: GenerateFromTabularFTConfig | None
- class Transform(*, globals: Globals | None = {'classify': {'enable': None, 'entities': None, 'num_samples': 3}, 'locales': None, 'lock_columns': None, 'ner': {'enable_gliner': True, 'enable_regexps': False, 'entities': None, 'gliner_batch_mode': {'batch_size': 8, 'chunk_length': 512, 'enable': True}, 'ner_optimized': True, 'ner_threshold': 0.7}, 'seed': None}, steps: List[StepDefinition], model_suite: str | None = 'apache-2.0', error_rate: float | None = 0.2, **extra_data: Any)
- globals: Annotated[Optional[Globals], Field(description='Global config options.', title='Globals')]
- steps: Annotated[List[StepDefinition], Field(description='list of transform steps to perform on input.', max_length=10, min_length=1, title='Steps')]
- model_suite: Annotated[Optional[str], Field(title='Model Suite')]
- error_rate: Annotated[Optional[float], Field(ge=0.0, le=1.0, title='Error Rate')]
- class TextFt(*, train: TrainTextFTConfig | None = None, generate: GenerateFromTextFTConfig | None = None, **extra_data: Any)
- train: TrainTextFTConfig | None
- generate: GenerateFromTextFTConfig | None
- class PromptPretrainedModel(*, pretrained_model: str | None = 'meta-llama/Llama-3.1-8B-Instruct', prompt_template: str | None = None, generate: GenerateParams | None = None, **extra_data: Any)
- pretrained_model: Annotated[Optional[str], Field(description='Select the text generation model to fine-tune from HuggingFace. Defaults to `meta-llama/Llama-3.1-8B-Instruct`.', title='Pretrained Model')]
- prompt_template: Annotated[Optional[str], Field(description="All prompt inputs are formatted according to this template. The template must either start with '@' and reference the name of a pre-defined template, or contain a single '%s' formatting verb.", title='Prompt Template')]
- generate: GenerateParams | None
- class DataSource(*, data_source: str, **extra_data: Any)
- data_source: Annotated[str, Field(title='Data Source')]
- class AzureDestination(*, connection_id: str, path: str, container: str, **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- path: Annotated[str, Field(title='Path')]
- container: Annotated[str, Field(title='Container')]
- class AzureSource(*, connection_id: str, path: str, container: str, **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- path: Annotated[str, Field(title='Path')]
- container: Annotated[str, Field(title='Container')]
- class MssqlDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- database: Annotated[Optional[str], Field(title='Database')]
- table: Annotated[str, Field(title='Table')]
- sync: DestinationSyncConfig | None
- class MssqlSource(*, connection_id: str, queries: List[Query], **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]
- class GcsDestination(*, connection_id: str, path: str, bucket: str, **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- path: Annotated[str, Field(title='Path')]
- bucket: Annotated[str, Field(title='Bucket')]
- class GcsSource(*, connection_id: str, path: str, bucket: str, **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- path: Annotated[str, Field(title='Path')]
- bucket: Annotated[str, Field(title='Bucket')]
- class BigqueryDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, bq_dataset: str | None = None, **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- database: Annotated[Optional[str], Field(title='Database')]
- table: Annotated[str, Field(title='Table')]
- sync: DestinationSyncConfig | None
- bq_dataset: Annotated[Optional[str], Field(title='Bq Dataset')]
- class BigquerySource(*, connection_id: str, queries: List[Query], **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]
- class SnowflakeDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- database: Annotated[Optional[str], Field(title='Database')]
- table: Annotated[str, Field(title='Table')]
- sync: DestinationSyncConfig | None
- class SnowflakeSource(*, connection_id: str, queries: List[Query], **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]
- class PostgresDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- database: Annotated[Optional[str], Field(title='Database')]
- table: Annotated[str, Field(title='Table')]
- sync: DestinationSyncConfig | None
- class PostgresSource(*, connection_id: str, queries: List[Query], **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]
- class DatabricksDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, volume: str | None = 'gretel_databricks_connector', **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- database: Annotated[Optional[str], Field(title='Database')]
- table: Annotated[str, Field(title='Table')]
- sync: DestinationSyncConfig | None
- volume: Annotated[Optional[str], Field(title='Volume')]
- class DatabricksSource(*, connection_id: str, queries: List[Query], **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]
- class OracleDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- database: Annotated[Optional[str], Field(title='Database')]
- table: Annotated[str, Field(title='Table')]
- sync: DestinationSyncConfig | None
- class OracleSource(*, connection_id: str, queries: List[Query], **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- queries: Annotated[List[Query], Field(max_length=1, min_length=1, title='Queries')]
- class S3Destination(*, connection_id: str, path: str, bucket: str, **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- path: Annotated[str, Field(title='Path')]
- bucket: Annotated[str, Field(title='Bucket')]
- class S3Source(*, connection_id: str, path: str, bucket: str, **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- path: Annotated[str, Field(title='Path')]
- bucket: Annotated[str, Field(title='Bucket')]
- class MysqlDestination(*, connection_id: str, database: str | None = None, table: str, sync: DestinationSyncConfig | None = None, **extra_data: Any)
- connection_id: Annotated[str, Field(title='Connection Id')]
- database: Annotated[Optional[str], Field(title='Database')]
- table: Annotated[str, Field(title='Table')]
- sync: DestinationSyncConfig | None