
Output

Abstract Class
application_sdk.outputs

Output classes provide a unified interface for writing data to various destinations in the Application SDK. All output classes inherit from the base `Output` class, which provides consistent behavior across output types: writing data from pandas or daft DataFrames with automatic chunking, buffer management, statistics tracking, and object store uploads.
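The shape of this design can be sketched, independently of the SDK, as an abstract base that owns the chunk bookkeeping while subclasses implement the actual write. All names below (`BaseOutput`, `write_records`, `max_records_per_chunk`) are illustrative stand-ins, not the SDK's real signatures:

```python
from abc import ABC, abstractmethod


class BaseOutput(ABC):
    """Illustrative stand-in for the SDK's Output base class."""

    def __init__(self, output_path: str, max_records_per_chunk: int = 2):
        self.output_path = output_path
        self.max_records_per_chunk = max_records_per_chunk  # hypothetical knob
        self.total_record_count = 0
        self.chunk_count = 0

    def write_records(self, records: list) -> None:
        # Split records into fixed-size chunks and delegate each to the subclass,
        # updating the shared statistics along the way.
        for i in range(0, len(records), self.max_records_per_chunk):
            chunk = records[i:i + self.max_records_per_chunk]
            self.chunk_count += 1
            self.total_record_count += len(chunk)
            self._write_chunk(chunk)

    @abstractmethod
    def _write_chunk(self, chunk: list) -> None:
        """Each concrete output decides how a chunk reaches its destination."""


class ListOutput(BaseOutput):
    """Toy implementation that collects chunks in memory instead of writing files."""

    def __init__(self, output_path: str, **kwargs):
        super().__init__(output_path, **kwargs)
        self.chunks = []

    def _write_chunk(self, chunk: list) -> None:
        self.chunks.append(chunk)
```

The split of responsibilities mirrors the section above: the base class handles chunking and statistics once, and each destination only implements the per-chunk write.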

Properties

output_path: str
Base path where output files are written
output_prefix: str
Prefix for files when uploading to object store
total_record_count: int
Total number of records processed
chunk_count: int
Number of chunks the output was split into
chunk_part: int
Current part number within a chunk
buffer_size: int
Size of the write buffer
max_file_size_bytes: int
Maximum file size before splitting
partitions: List[int]
List of partition counts per chunk

Methods

write_daft_dataframe

async write_daft_dataframe(self, dataframe: daft.DataFrame) -> None
Writes a daft DataFrame to the output destination. Must be implemented by all output classes.
Parameters
dataframe: daft.DataFrame (required)
The daft DataFrame to write

write_dataframe

async write_dataframe(self, dataframe: pd.DataFrame) -> None
Writes a pandas DataFrame to the output, automatically handling chunking and file size management. Estimates file size based on DataFrame sample, splits large DataFrames into chunks based on buffer_size and max_file_size_bytes, automatically uploads files to object store when size limits are reached, and records metrics for successful writes.
Parameters
dataframe: pd.DataFrame (required)
The pandas DataFrame to write
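The size-based chunking described above can be illustrated with plain arithmetic: sample a per-row size, estimate the total, and take a ceiling division against the file-size cap. This is a sketch of the idea only; the SDK's actual sampling and splitting logic may differ:

```python
def estimate_chunks(num_rows: int, sample_row_bytes: int,
                    max_file_size_bytes: int) -> int:
    """Estimate how many chunks keep each output file under the size cap.

    Hypothetical helper mirroring the documented behavior: estimate file
    size from a sample, then split when the limit would be exceeded.
    """
    if num_rows == 0:
        return 0
    estimated_total = num_rows * sample_row_bytes
    # Ceiling division: always at least one chunk, more if over the cap.
    return max(1, -(-estimated_total // max_file_size_bytes))
```

For example, 1,000 rows at roughly 100 bytes each against a 50 KB cap yields two chunks, while a small DataFrame stays in a single file.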

write_batched_dataframe

async write_batched_dataframe(self, batch_df: pd.DataFrame) -> None
Writes batched pandas DataFrames from async or sync generators. Handles both AsyncGenerator and Generator types, skips empty DataFrames automatically, and processes each batch through write_dataframe().
Parameters
batch_df: pd.DataFrame (required)
Batched pandas DataFrame to write
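The dual handling of async and sync generators can be sketched with `inspect.isasyncgen`. The function names here are illustrative, the emptiness check stands in for `DataFrame.empty`, and the real SDK dispatch may differ:

```python
import asyncio
import inspect


async def write_batched(batches, write_one) -> int:
    """Drain either an async or a sync generator of batches.

    Skips empty batches and awaits write_one for each non-empty one,
    returning how many batches were written. Illustrative sketch only.
    """
    written = 0
    if inspect.isasyncgen(batches):
        async for batch in batches:
            if batch:  # stand-in for `not df.empty`
                await write_one(batch)
                written += 1
    else:
        for batch in batches:
            if batch:
                await write_one(batch)
                written += 1
    return written
```

A sync generator yielding `[1]`, `[]`, `[2]` writes two batches; an async generator is drained the same way through the `async for` branch.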

write_batched_daft_dataframe

async write_batched_daft_dataframe(self, batch_daft_df: daft.DataFrame) -> None
Writes batched daft DataFrames from async or sync generators. Handles both AsyncGenerator and Generator types, skips empty DataFrames automatically, and processes each batch through write_daft_dataframe().
Parameters
batch_daft_df: daft.DataFrame (required)
Batched daft DataFrame to write

get_statistics

async get_statistics(self, typename: Optional[str] = None) -> ActivityStatistics
Returns statistics about the output operation including total record count and chunk information.
Parameters
typename: Optional[str] (optional)
Type name of the entity (e.g., 'database', 'schema', 'table')
Returns
ActivityStatistics - Object containing output statistics
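The kind of accumulator behind these statistics can be sketched as follows. The field names mirror the properties documented above, but the shape of the real `ActivityStatistics` object is an assumption here:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatsTracker:
    """Illustrative accumulator; not the SDK's actual ActivityStatistics."""
    total_record_count: int = 0
    chunk_count: int = 0

    def record_chunk(self, num_records: int) -> None:
        # Called once per chunk written, as in write_dataframe above.
        self.chunk_count += 1
        self.total_record_count += num_records

    def snapshot(self, typename: Optional[str] = None) -> dict:
        # Analogous to get_statistics: report totals, optionally tagged
        # with the entity type being processed.
        return {
            "total_record_count": self.total_record_count,
            "chunk_count": self.chunk_count,
            "typename": typename,
        }
```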

path_gen

path_gen(self, chunk_count: Optional[int] = None, chunk_part: int = 0, start_marker: Optional[str] = None, end_marker: Optional[str] = None) -> str
Generates file paths for output chunks with support for query extraction markers.
Parameters
chunk_count: Optional[int] (optional)
Total number of chunks
chunk_part: int (optional, default: 0)
Part number within chunk
start_marker: Optional[str] (optional)
Start marker for query extraction
end_marker: Optional[str] (optional)
End marker for query extraction
Returns
str - Generated file path
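A path builder of this kind might look like the sketch below. The file-name patterns are invented for illustration; the SDK's real naming scheme is not documented here:

```python
from typing import Optional


def path_gen_sketch(base: str,
                    chunk_count: Optional[int] = None,
                    chunk_part: int = 0,
                    start_marker: Optional[str] = None,
                    end_marker: Optional[str] = None) -> str:
    """Hypothetical chunk-path builder; patterns are assumptions."""
    if start_marker and end_marker:
        # Query-extraction mode: encode the marker range in the file name.
        return f"{base}/{start_marker}_{end_marker}.json"
    if chunk_count is None:
        return f"{base}/part-{chunk_part}.json"
    return f"{base}/chunk-{chunk_count}-part-{chunk_part}.json"
```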

process_null_fields

process_null_fields(self, obj: Any, preserve_fields: Optional[List[str]] = None, null_to_empty_dict_fields: Optional[List[str]] = None) -> Any
Recursively removes null values from dictionaries and lists, with options to preserve specific fields or convert nulls to empty dictionaries.
Parameters
obj: Any (required)
The object to clean (dict, list, or other value)
preserve_fields: Optional[List[str]] (optional)
Field names to preserve even if null
null_to_empty_dict_fields: Optional[List[str]] (optional)
Field names to convert from null to empty dict
Returns
Any - Cleaned object with null values removed
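The documented behavior can be reproduced with a short recursive function. This is a sketch of the described semantics, not the SDK's actual implementation:

```python
from typing import Any, List, Optional


def clean_nulls(obj: Any,
                preserve_fields: Optional[List[str]] = None,
                null_to_empty_dict_fields: Optional[List[str]] = None) -> Any:
    """Recursively drop None values from dicts and lists.

    Keys in preserve_fields keep their None value; keys in
    null_to_empty_dict_fields become {} instead of being dropped.
    """
    preserve = set(preserve_fields or [])
    to_empty = set(null_to_empty_dict_fields or [])
    if isinstance(obj, dict):
        cleaned = {}
        for key, value in obj.items():
            if value is None:
                if key in to_empty:
                    cleaned[key] = {}
                elif key in preserve:
                    cleaned[key] = None
                # otherwise the null key is dropped entirely
            else:
                cleaned[key] = clean_nulls(
                    value, preserve_fields, null_to_empty_dict_fields)
        return cleaned
    if isinstance(obj, list):
        # Drop None elements and clean the rest recursively.
        return [clean_nulls(v, preserve_fields, null_to_empty_dict_fields)
                for v in obj if v is not None]
    return obj
```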

WriteMode enum

The WriteMode enum defines the available write modes for output operations:

from enum import Enum

class WriteMode(Enum):
    APPEND = "append"  # Append data to existing files
    OVERWRITE = "overwrite"  # Overwrite existing files
    OVERWRITE_PARTITIONS = "overwrite-partitions"  # Overwrite specific partitions
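Because each member carries a string value, a mode can be resolved by value when it arrives from configuration. The enum is redeclared below so the snippet is self-contained:

```python
from enum import Enum


class WriteMode(Enum):
    APPEND = "append"
    OVERWRITE = "overwrite"
    OVERWRITE_PARTITIONS = "overwrite-partitions"


def resolve_mode(value: str) -> WriteMode:
    # Enum lookup by value, e.g. WriteMode("overwrite") -> WriteMode.OVERWRITE.
    # Raises ValueError for unknown mode strings.
    return WriteMode(value)
```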

Output implementations

The Application SDK provides three concrete implementations of the base Output class, each optimized for different data formats and storage requirements. All implementations inherit the common functionality from the base class, including automatic chunking, buffer management, statistics tracking, and object store uploads.

See also

  • Inputs: Read data from various sources including SQL queries, Parquet files, JSON files, and Iceberg tables
  • Application SDK README: Overview of the Application SDK and its components
  • App structure: Standardized folder structure for Atlan applications
  • StateStore: Persistent state management for workflows and credentials