Getting Started

Looking to get started creating synthetic data from an existing dataset? Use the Safe Synthetics SDK.

Looking to create data from scratch? Use the Data Designer SDK.

For more examples and notebooks demonstrating Gretel’s product, please see our blueprint repository at github.com/gretelai/gretel-blueprints.

You can find our main product documentation at docs.gretel.ai.

Configuring the Client

To get started with Gretel, you first construct a Gretel client for your session. That client provides access to all of Gretel’s APIs via a python SDK interface.

To instantiate the client

from gretel_client.navigator_client import Gretel

gretel = Gretel(api_key="prompt")

The example above will instantiate a client and prompt for an API key if one isn’t already configured on the system.

An API key can alternatively be provided directly to the constructor via the api_key param, or configured on the GRETEL_API_KEY environment variable.

Projects

Projects are an organizational construct to help manage resources, control access permissions, and enable collaboration through sharing features.

Each client session is configured with a default project configured on client instantiation

gretel = Gretel(default_project_id="my-project")

If the project exists that project will get loaded. If the project does not exist, the project will be created automatically and reused for the session.

If no project is specified a default project for the SDK will automatically get created.

You can also create a temporary project using gretel.temp_project(). This method is implemented as a context manager. Once you leave the scope of the block, the project is deleted automatically

with gretel.tmp_project() as tmp_gretel_client:
    ...

For more product details related to Projects, please see our docs here.

Workflows

Gretel Workflows provide an easy to use, config driven API for building synthetic data pipelines.

The Safe Synthetics SDK and Data Designer SDK construct Workflows automatically for you using declarative APIs, but may also construct your own Workflow.

If you’re just getting start with Gretel, we recommend you start with those high-level APIs before attempting to construct your own Workflows.

There are two options for constructing a Workflow from scratch. Using a fluent builder interface or passing a list of tasks to a workflows.create(...) method

Using the fluent builder

my_workflow = gretel.workflows.builder() \
    .add_step(gretel.tasks.DataSource(data_source="...")) \
    .add_step(gretel.tasks.Transform()) \
    .run()

Or constructing a list of tasks

my_workflow = gretel.workflows.create([
    gretel.tasks.DataSource(data_source="...")
    gretel.tasks.Transform()
])

You can find an exhaustive list of tasks under gretel_client.workflows.configs.registry.Registry.

Once the workflow has been created, a gretel_client.workflows.workflow.WorkflowRun object is returned. This class represents a concrete Workflow Run.

To block the current thread and stream logs for the current run you can call

my_workflow.wait_until_done()

Workflow Runs can be viewed in the Gretel console by calling

my_workflow.console_url()

Once a Workflow completes, you can access datasets, reports, and individual step outputs.

# Access the final Dataset produce by the Workflow as a Dataframe
my_workflow.dataset.df

# View the report for a Workflow
my_workflow.report.table

# Access individual step outputs
my_workflow.get_step_out("transform")

You can load existing WorkflowRun runs with

my_other_workflow = gretel.workflows.get_workflow_run("workflow run id here")

Module Reference

Files

The Files API provides a mechanism to upload data to Gretel and use it as inputs to a Workflow. You can upload files as remote URLs, local file paths, or in memory Dataframes.

from gretel_client.navigator_client import Gretel

gretel = Gretel()

file = gretel.files.upload("your_file.csv")

my_workflow = gretel.workflows.create([
    gretel.tasks.DataSource(data_source=file.id)
    gretel.tasks.Transform()
])

Module Reference