Tabular
- class gretel_client.inference_api.tabular.TabularInferenceAPI(backend_model: str | None = None, *, verify_ssl: bool = True, session: ClientConfig | None = None, skip_configure_session: bool | None = False, **session_kwargs)
Inference API for real-time data generation with Gretel Navigator.
- Parameters:
backend_model (str, optional) – The model used under the hood. If None, the latest default model is used. See the backend_model_list property for a list of available models.
**session_kwargs – kwargs for your Gretel session.
- Raises:
GretelInferenceAPIError – If the specified backend model is not valid.
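For example, a specific model can be pinned at construction time; a minimal sketch, assuming a session configured via api_key="prompt" as in the examples below (the model ID shown is illustrative only, check backend_model_list for what is actually available to your account):

from gretel_client.inference_api.tabular import TabularInferenceAPI

# Configure a session; "prompt" interactively asks for an API key.
tabular = TabularInferenceAPI(api_key="prompt")

# List the models available to your account before pinning one.
print(tabular.backend_model_list)

# Pin a specific backend model. An invalid model name raises
# GretelInferenceAPIError. The ID below is hypothetical.
tabular = TabularInferenceAPI(
    backend_model="gretelai/example-model",  # hypothetical model ID
    api_key="prompt",
)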
- display_dataframe_in_notebook(dataframe: DataFrame, settings: dict | None = None) → None
Display a pandas DataFrame in a notebook with settings tuned for better readability.
This function is intended to be used in a Jupyter notebook.
- Parameters:
dataframe – The pandas DataFrame to display.
settings – Optional properties to set on the DataFrame’s style. If None, default settings with text wrapping are used.
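A minimal usage sketch, assuming a Jupyter notebook and a DataFrame produced by one of the methods below; the settings keys shown are an assumption based on pandas Styler CSS properties, not a documented settings schema:

# `tabular` is a TabularInferenceAPI instance; `df` is a generated DataFrame.
tabular.display_dataframe_in_notebook(df)

# Optionally override the default style. These CSS-style keys are assumed
# to be pandas Styler properties (an assumption, not documented behavior).
tabular.display_dataframe_in_notebook(df, settings={"text-align": "left"})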
- edit(prompt: str, *, seed_data: DataFrame | List[dict[str, Any]], chunk_size: int = 25, temperature: float = 0.7, top_k: int = 40, top_p: float = 0.95, stream: bool = False, as_dataframe: bool = True, disable_progress_bar: bool = False) → Iterator[dict[str, Any]] | DataFrame | List[dict[str, Any]] | dict[str, Any]
Edit the seed data according to the given prompt.
- Parameters:
prompt – A prompt specifying how to edit the seed data.
seed_data – The seed data to edit. Must be a pandas DataFrame or a list of dicts (see format in the example below).
chunk_size – The seed data will be divided into chunks of this size. Each chunk is edited in its own upstream request.
temperature – Sampling temperature. Higher values make output more random.
top_k – Number of highest probability tokens to keep for top-k filtering.
top_p – The cumulative probability cutoff for sampling tokens.
stream – If True, stream the generated data.
as_dataframe – If True, return the data as a pandas DataFrame. This parameter is ignored if stream is True.
disable_progress_bar – If True, disable progress bar. Ignored if stream is True.
- Raises:
GretelInferenceAPIError – If the seed data is an invalid type.
- Returns:
The stream iterator or the generated data records.
Example:
from gretel_client.inference_api.tabular import TabularInferenceAPI

# Example seed data if using a list of dicts.
# You can also use a pandas DataFrame.
seed_data = [
    {
        "first_name": "Homer",
        "last_name": "Simpson",
        "favorite_band": "The Rolling Stones",
        "favorite_tv_show": "Breaking Bad",
    },
    {
        "first_name": "Marge",
        "last_name": "Simpson",
        "favorite_band": "The Beatles",
        "favorite_tv_show": "Friends",
    },
]

prompt = "Please add a column with the character's favorite food."

tabular = TabularInferenceAPI(api_key="prompt")

df = tabular.edit(prompt=prompt, seed_data=seed_data)
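With stream=True, the same call returns an iterator instead of a DataFrame; a short sketch based on the documented return type, where each yielded item is a single record dict:

# Consume edited records as they arrive rather than waiting for all chunks.
for record in tabular.edit(prompt=prompt, seed_data=seed_data, stream=True):
    print(record)  # dict[str, Any], one edited record at a time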
- generate(prompt: str, *, num_records: int, temperature: float = 0.7, top_k: int = 40, top_p: float = 0.95, sample_data: DataFrame | List[dict[str, Any]] | None = None, stream: bool = False, as_dataframe: bool = True, disable_progress_bar: bool = False, sample_buffer_size: int = 0) → Iterator[dict[str, Any]] | DataFrame | List[dict[str, Any]] | dict[str, Any]
Generate synthetic data.
Each request to Gretel will generate at most 50 records at a time. This method makes multiple requests, as needed, to fulfill the desired num_records. If a request does not produce records within the self.request_timeout_sec limit, that request is dropped and a new request is made automatically.
When multiple requests are made to fulfill num_records, you may optionally pass the last N generated records to each subsequent request by setting sample_buffer_size to a value such as 5. When sample records are passed along, the LLM uses them as context for record generation. This is useful for keeping continuity in fields between requests (such as monotonically increasing values).
- Parameters:
prompt – The prompt for generating synthetic tabular data.
num_records – The number of records to generate.
temperature – Sampling temperature. Higher values make output more random.
top_k – Number of highest probability tokens to keep for top-k filtering.
top_p – The cumulative probability cutoff for sampling tokens.
sample_data –
The sample data to guide the initial generation process. Things to keep in mind when using this parameter:
- The generated data will use the exact column names from sample_data.
- The prompt and sample_data should match; for example, they should refer to the same columns.
- Use sample data to provide examples of the data you want generated, e.g. if you need specific data formats.
stream – If True, stream the generated data.
as_dataframe – If True, return the data as a pandas DataFrame. This parameter is ignored if stream is True.
disable_progress_bar – If True, disable progress bar. Ignored if stream is True.
sample_buffer_size – How many of the most recently generated records to provide as sample data to subsequent generation requests.
- Returns:
The stream iterator or the generated data records.
Example:
from gretel_client.inference_api.tabular import TabularInferenceAPI

prompt = (
    "Generate positive and negative reviews for the following products: "
    "red travel mug, leather mary jane heels, unicorn LED night light. "
    "Include columns for the product name, number of stars (1-5), review, and customer id."
)

tabular = TabularInferenceAPI(api_key="prompt")

df = tabular.generate(prompt=prompt, num_records=10)

# Another example with sample data
sample_data = [
    {
        "review_date": "2021-01-01",
        "product_name": "red travel mug",
        "stars": 5,
        "review": "I love this mug!",
        "customer_id": 123,
    }
]
df = tabular.generate(prompt=prompt, num_records=10, sample_data=sample_data)
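To carry context across the multiple upstream requests described above, or to consume records as they are produced, sample_buffer_size and stream can be combined with the same call; a sketch reusing the objects from the example above:

# Pass the last 5 generated records to each follow-up request to help keep
# continuity (e.g. monotonic fields) across the batched 50-record requests.
df = tabular.generate(prompt=prompt, num_records=200, sample_buffer_size=5)

# Streaming variant: records are yielded as dicts while generation runs.
for record in tabular.generate(prompt=prompt, num_records=10, stream=True):
    print(record)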
- max_retry_count: int = 3
How many errors or timeouts should be tolerated before the generation process raises a user-facing error.
- property name: str
Returns the display name of this inference API.
- request_timeout_sec: int = 60
When generating data, if a request does not return data records within this many seconds, a new request will automatically be made to continue record generation.
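Both are plain instance attributes, so one way to tune retry behavior is to assign new values after construction; a sketch, assuming the defaults of 3 retries and 60 seconds documented above:

tabular = TabularInferenceAPI(api_key="prompt")

# Tolerate more transient errors or timeouts before raising a user-facing error.
tabular.max_retry_count = 5

# Give each generation request more time before it is dropped and retried.
tabular.request_timeout_sec = 120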