
Output

Abstract Class
application_sdk.outputs

Output classes provide a unified interface for writing data to various destinations in the Application SDK. All output classes inherit from the base `Output` class, which provides consistent behavior across output types: writing data from pandas or daft DataFrames with automatic chunking, buffer management, statistics tracking, and object store uploads.
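The shape of this design can be sketched, independently of the SDK, as an abstract base that owns the chunk bookkeeping while subclasses implement the actual write. All names below (`BaseOutput`, `write_records`, `max_records_per_chunk`) are illustrative stand-ins, not the SDK's real signatures:

```python
from abc import ABC, abstractmethod


class BaseOutput(ABC):
    """Illustrative stand-in for the SDK's Output base class."""

    def __init__(self, output_path: str, max_records_per_chunk: int = 2):
        self.output_path = output_path
        self.max_records_per_chunk = max_records_per_chunk  # hypothetical knob
        self.total_record_count = 0
        self.chunk_count = 0

    def write_records(self, records: list) -> None:
        # Split records into fixed-size chunks and delegate each to the subclass,
        # updating the shared statistics along the way.
        for i in range(0, len(records), self.max_records_per_chunk):
            chunk = records[i:i + self.max_records_per_chunk]
            self.chunk_count += 1
            self.total_record_count += len(chunk)
            self._write_chunk(chunk)

    @abstractmethod
    def _write_chunk(self, chunk: list) -> None:
        """Each concrete output decides how a chunk reaches its destination."""


class ListOutput(BaseOutput):
    """Toy implementation that collects chunks in memory instead of writing files."""

    def __init__(self, output_path: str, **kwargs):
        super().__init__(output_path, **kwargs)
        self.chunks = []

    def _write_chunk(self, chunk: list) -> None:
        self.chunks.append(chunk)
```

The split of responsibilities mirrors the section above: the base class handles chunking and statistics once, and each destination only implements the per-chunk write.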

Properties

output_path: str
Base path where output files are written
output_prefix: str
Prefix for files when uploading to object store
total_record_count: int
Total number of records processed
chunk_count: int
Number of chunks the output was split into
chunk_part: int
Current part number within a chunk
buffer_size: int
Size of the write buffer
max_file_size_bytes: int
Maximum file size before splitting
partitions: List[int]
List of partition counts per chunk

Methods

write_daft_dataframe

async write_daft_dataframe(self, dataframe: daft.DataFrame) -> None
Writes a daft DataFrame to the output destination. Must be implemented by all output classes.
Parameters
dataframe: daft.DataFrame (required)
The daft DataFrame to write

write_dataframe

async write_dataframe(self, dataframe: pd.DataFrame) -> None
Writes a pandas DataFrame to the output, automatically handling chunking and file size management. Estimates file size based on DataFrame sample, splits large DataFrames into chunks based on buffer_size and max_file_size_bytes, automatically uploads files to object store when size limits are reached, and records metrics for successful writes.
Parameters
dataframe: pd.DataFrame (required)
The pandas DataFrame to write
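The size-based chunking described above can be illustrated with plain arithmetic: sample a per-row size, estimate the total, and take a ceiling division against the file-size cap. This is a sketch of the idea only; the SDK's actual sampling and splitting logic may differ:

```python
def estimate_chunks(num_rows: int, sample_row_bytes: int,
                    max_file_size_bytes: int) -> int:
    """Estimate how many chunks keep each output file under the size cap.

    Hypothetical helper mirroring the documented behavior: estimate file
    size from a sample, then split when the limit would be exceeded.
    """
    if num_rows == 0:
        return 0
    estimated_total = num_rows * sample_row_bytes
    # Ceiling division: always at least one chunk, more if over the cap.
    return max(1, -(-estimated_total // max_file_size_bytes))
```

For example, 1,000 rows at roughly 100 bytes each against a 50 KB cap yields two chunks, while a small DataFrame stays in a single file.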

write_batched_dataframe

async write_batched_dataframe(self, batch_df: pd.DataFrame) -> None
Writes batched pandas DataFrames from async or sync generators. Handles both AsyncGenerator and Generator types, skips empty DataFrames automatically, and processes each batch through write_dataframe().
Parameters
batch_df: pd.DataFrame (required)
Batched pandas DataFrame to write
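The dual handling of async and sync generators can be sketched with `inspect.isasyncgen`. The function names here are illustrative, the emptiness check stands in for `DataFrame.empty`, and the real SDK dispatch may differ:

```python
import asyncio
import inspect


async def write_batched(batches, write_one) -> int:
    """Drain either an async or a sync generator of batches.

    Skips empty batches and awaits write_one for each non-empty one,
    returning how many batches were written. Illustrative sketch only.
    """
    written = 0
    if inspect.isasyncgen(batches):
        async for batch in batches:
            if batch:  # stand-in for `not df.empty`
                await write_one(batch)
                written += 1
    else:
        for batch in batches:
            if batch:
                await write_one(batch)
                written += 1
    return written
```

A sync generator yielding `[1]`, `[]`, `[2]` writes two batches; an async generator is drained the same way through the `async for` branch.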

write_batched_daft_dataframe

async write_batched_daft_dataframe(self, batch_daft_df: daft.DataFrame) -> None
Writes batched daft DataFrames from async or sync generators. Handles both AsyncGenerator and Generator types, skips empty DataFrames automatically, and processes each batch through write_daft_dataframe().
Parameters
batch_daft_df: daft.DataFrame (required)
Batched daft DataFrame to write

get_statistics

async get_statistics(self, typename: Optional[str] = None) -> ActivityStatistics
Returns statistics about the output operation including total record count and chunk information.
Parameters
typename: Optional[str] (optional)
Type name of the entity (e.g., 'database', 'schema', 'table')
Returns
ActivityStatistics - Object containing output statistics
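The kind of accumulator behind these statistics can be sketched as follows. The field names mirror the properties documented above, but the shape of the real `ActivityStatistics` object is an assumption here:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatsTracker:
    """Illustrative accumulator; not the SDK's actual ActivityStatistics."""
    total_record_count: int = 0
    chunk_count: int = 0

    def record_chunk(self, num_records: int) -> None:
        # Called once per chunk written, as in write_dataframe above.
        self.chunk_count += 1
        self.total_record_count += num_records

    def snapshot(self, typename: Optional[str] = None) -> dict:
        # Analogous to get_statistics: report totals, optionally tagged
        # with the entity type being processed.
        return {
            "total_record_count": self.total_record_count,
            "chunk_count": self.chunk_count,
            "typename": typename,
        }
```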

path_gen

path_gen(self, chunk_count: Optional[int] = None, chunk_part: int = 0, start_marker: Optional[str] = None, end_marker: Optional[str] = None) -> str
Generates file paths for output chunks with support for query extraction markers.
Parameters
chunk_count: Optional[int] (optional)
Total number of chunks
chunk_part: int (optional, default: 0)
Part number within chunk
start_marker: Optional[str] (optional)
Start marker for query extraction
end_marker: Optional[str] (optional)
End marker for query extraction
Returns
str - Generated file path
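A path builder of this kind might look like the sketch below. The file-name patterns are invented for illustration; the SDK's real naming scheme is not documented here:

```python
from typing import Optional


def path_gen_sketch(base: str,
                    chunk_count: Optional[int] = None,
                    chunk_part: int = 0,
                    start_marker: Optional[str] = None,
                    end_marker: Optional[str] = None) -> str:
    """Hypothetical chunk-path builder; patterns are assumptions."""
    if start_marker and end_marker:
        # Query-extraction mode: encode the marker range in the file name.
        return f"{base}/{start_marker}_{end_marker}.json"
    if chunk_count is None:
        return f"{base}/part-{chunk_part}.json"
    return f"{base}/chunk-{chunk_count}-part-{chunk_part}.json"
```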

process_null_fields

process_null_fields(self, obj: Any, preserve_fields: Optional[List[str]] = None, null_to_empty_dict_fields: Optional[List[str]] = None) -> Any
Recursively removes null values from dictionaries and lists, with options to preserve specific fields or convert nulls to empty dictionaries.
Parameters
obj: Any (required)
The object to clean (dict, list, or other value)
preserve_fields: Optional[List[str]] (optional)
Field names to preserve even if null
null_to_empty_dict_fields: Optional[List[str]] (optional)
Field names to convert from null to empty dict
Returns
Any - Cleaned object with null values removed
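The documented behavior can be reproduced with a short recursive function. This is a sketch of the described semantics, not the SDK's actual implementation:

```python
from typing import Any, List, Optional


def clean_nulls(obj: Any,
                preserve_fields: Optional[List[str]] = None,
                null_to_empty_dict_fields: Optional[List[str]] = None) -> Any:
    """Recursively drop None values from dicts and lists.

    Keys in preserve_fields keep their None value; keys in
    null_to_empty_dict_fields become {} instead of being dropped.
    """
    preserve = set(preserve_fields or [])
    to_empty = set(null_to_empty_dict_fields or [])
    if isinstance(obj, dict):
        cleaned = {}
        for key, value in obj.items():
            if value is None:
                if key in to_empty:
                    cleaned[key] = {}
                elif key in preserve:
                    cleaned[key] = None
                # otherwise the null key is dropped entirely
            else:
                cleaned[key] = clean_nulls(
                    value, preserve_fields, null_to_empty_dict_fields)
        return cleaned
    if isinstance(obj, list):
        # Drop None elements and clean the rest recursively.
        return [clean_nulls(v, preserve_fields, null_to_empty_dict_fields)
                for v in obj if v is not None]
    return obj
```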

WriteMode enum

The WriteMode enum defines the available write modes for output operations:

from enum import Enum

class WriteMode(Enum):
    APPEND = "append"  # Append data to existing files
    OVERWRITE = "overwrite"  # Overwrite existing files
    OVERWRITE_PARTITIONS = "overwrite-partitions"  # Overwrite specific partitions
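Because each member carries a string value, a mode can be resolved by value when it arrives from configuration. The enum is redeclared below so the snippet is self-contained:

```python
from enum import Enum


class WriteMode(Enum):
    APPEND = "append"
    OVERWRITE = "overwrite"
    OVERWRITE_PARTITIONS = "overwrite-partitions"


def resolve_mode(value: str) -> WriteMode:
    # Enum lookup by value, e.g. WriteMode("overwrite") -> WriteMode.OVERWRITE.
    # Raises ValueError for unknown mode strings.
    return WriteMode(value)
```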

Output implementations

The Application SDK provides three concrete implementations of the base Output class, each optimized for different data formats and storage requirements. All implementations inherit the common functionality from the base class, including automatic chunking, buffer management, statistics tracking, and object store uploads.

See also

  • Inputs: Read data from various sources including SQL queries, Parquet files, JSON files, and Iceberg tables
  • Application SDK README: Overview of the Application SDK and its components
  • App structure: Standardized folder structure for Atlan applications
  • StateStore: Persistent state management for workflows and credentials