JsonOutput

Class in module `application_sdk.outputs.json`
Inheritance chain: Output → JsonOutput

JsonOutput writes data to JSON files in JSONL format (one JSON object per line) with support for chunking, buffering, null field processing, and automatic object store uploads. It inherits from the base Output class and provides specialized functionality for JSON data serialization.

Methods (5)
__init__

    __init__(self, output_suffix: str, output_path: Optional[str] = None, typename: Optional[str] = None, chunk_start: Optional[int] = None, buffer_size: int = 5000, chunk_size: Optional[int] = 50000, start_marker: Optional[str] = None, end_marker: Optional[str] = None, retain_local_copy: bool = False)

Initialize JsonOutput with an output suffix and configuration options.

Parameters
- output_suffix (str): Suffix for output files
- output_path (Optional[str]): Base path where JSON files are written
- typename (Optional[str]): Type name of the entity
- chunk_start (Optional[int]): Starting index for chunk numbering
- buffer_size (int): Number of records per buffer (default: 5000)
- chunk_size (Optional[int]): Maximum records per chunk (default: 50000)
- start_marker (Optional[str]): Start marker for query extraction
- end_marker (Optional[str]): End marker for query extraction
- retain_local_copy (bool): Whether to retain the local copy after upload (default: False)
write_dataframe (async)

    async write_dataframe(self, dataframe: pd.DataFrame) -> None

Writes a pandas DataFrame to JSON files in JSONL format (orient='records', lines=True; one JSON object per line). Automatically chunks based on buffer_size and max_file_size_bytes, appends to existing files if they exist, and uploads to the object store when size limits are reached.

Parameters
- dataframe (pd.DataFrame): The pandas DataFrame to write
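The JSONL layout is easy to picture: each record becomes one JSON object on its own line. A minimal stdlib sketch (the real method delegates to pandas' `to_json(orient='records', lines=True)`; `to_jsonl` here is a hypothetical helper, not part of the SDK):

```python
import json

def to_jsonl(records):
    """Serialize a list of dicts as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records) + "\n"

rows = [{"name": "orders", "type": "table"}, {"name": "users", "type": "table"}]
print(to_jsonl(rows))
```

Because each line is independently parseable, a reader can stream the file line by line without loading it whole, which is what makes chunked appends and uploads cheap.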
write_batched_dataframe (async)

    async write_batched_dataframe(self, batch_df: pd.DataFrame) -> None

Writes batched pandas DataFrames to JSON files. Processes each batch through write_dataframe() and automatically skips empty DataFrames.

Parameters
- batch_df (pd.DataFrame): Batched pandas DataFrame to write
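The batch handling described above amounts to a loop that skips empty frames and serializes the rest. A minimal synchronous sketch (the real method is async and writes through write_dataframe(); `write_batches` and `sink` are hypothetical stand-ins):

```python
import pandas as pd

def write_batches(batches, sink):
    """Serialize each non-empty batch as JSONL; skip empty DataFrames."""
    written = 0
    for df in batches:
        if df.empty:
            continue  # empty batches are skipped, mirroring write_batched_dataframe
        sink.append(df.to_json(orient="records", lines=True))
        written += 1
    return written
```

The empty-frame check matters in practice: upstream extractors often emit trailing empty batches, and skipping them avoids creating zero-row files.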
write_daft_dataframe (async)

    async write_daft_dataframe(self, dataframe: daft.DataFrame, preserve_fields: Optional[List[str]] = None, null_to_empty_dict_fields: Optional[List[str]] = None) -> None

Writes a daft DataFrame to JSON files with null field processing and datetime conversion. This method:
- converts datetime objects to epoch timestamps (milliseconds)
- recursively removes null fields, except for preserved fields
- buffers rows and flushes when buffer_size is reached
- automatically chunks files when the max_file_size_bytes or chunk_size limit is reached
- uses orjson for efficient JSON serialization

Parameters
- dataframe (daft.DataFrame): The DataFrame to write
- preserve_fields (Optional[List[str]]): Fields to preserve even if null (default includes 'identity_cycle', 'number_columns_in_part_key', etc.)
- null_to_empty_dict_fields (Optional[List[str]]): Fields to convert from null to an empty dict (default: ['attributes', 'customAttributes'])
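The null field processing and datetime conversion can be sketched as a per-row transform. This is an illustrative stand-in using the stdlib, not the SDK's implementation (the real defaults and orjson serialization live inside write_daft_dataframe; `process_row` and the constant names are hypothetical):

```python
from datetime import datetime, timezone

# Hypothetical stand-ins for the defaults mentioned above; the real lists differ.
PRESERVE_FIELDS = {"identity_cycle", "number_columns_in_part_key"}
NULL_TO_EMPTY_DICT_FIELDS = {"attributes", "customAttributes"}

def process_row(row):
    """Recursively drop null fields, except preserved ones and fields
    that are mapped from null to an empty dict; datetimes -> epoch ms."""
    out = {}
    for key, value in row.items():
        if isinstance(value, dict):
            out[key] = process_row(value)
        elif isinstance(value, datetime):
            out[key] = int(value.timestamp() * 1000)  # epoch milliseconds
        elif value is None:
            if key in NULL_TO_EMPTY_DICT_FIELDS:
                out[key] = {}
            elif key in PRESERVE_FIELDS:
                out[key] = None
            # any other null field is dropped
        else:
            out[key] = value
    return out
```

Dropping nulls keeps the JSONL output compact, while the preserve/empty-dict escape hatches keep fields that downstream consumers expect to always be present.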
flush_daft_buffer (async)

    async flush_daft_buffer(self, buffer: List[str], chunk_part: int = 0) -> None

Flushes the current buffer to a JSON file. Called automatically when the buffer fills, but may also be called manually.

Parameters
- buffer (List[str]): List of serialized JSON row strings
- chunk_part (int): Part number for file naming (default: 0)
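Conceptually, a flush appends the buffered JSONL strings to a part-numbered file and empties the buffer. A synchronous sketch under assumed naming (the real file-naming scheme and upload step belong to JsonOutput; `flush_buffer` and the `chunk-{n}.json` pattern are hypothetical):

```python
import os

def flush_buffer(buffer, output_path, chunk_part=0):
    """Append buffered JSONL row strings to a part-numbered file,
    then clear the buffer for reuse."""
    if not buffer:
        return None
    os.makedirs(output_path, exist_ok=True)
    path = os.path.join(output_path, f"chunk-{chunk_part}.json")
    with open(path, "a") as f:
        f.write("\n".join(buffer) + "\n")
    buffer.clear()  # the same list is refilled by subsequent rows
    return path
```

Appending (rather than rewriting) is what lets repeated flushes of the same chunk_part accumulate into one file until a size limit triggers a new part.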
Usage Examples

Basic initialization

Initialize JsonOutput with basic configuration:

```python
from application_sdk.outputs import JsonOutput

json_output = JsonOutput(
    output_path="/tmp/output",
    output_suffix="data",
    typename="table",
    chunk_size=50000,
    buffer_size=5000,
)

await json_output.write_dataframe(df)
```
With custom field preservation

Write with custom null field processing:

```python
await json_output.write_daft_dataframe(
    daft_df,
    preserve_fields=["required_field", "identity_cycle"],
    null_to_empty_dict_fields=["attributes", "customAttributes"],
)
```
Usage patterns
For detailed usage patterns including JSONL format, null field processing, datetime conversion, and other JsonOutput-specific features, see Output usage patterns and select the JsonOutput tab.
See also
- Outputs: Base Output class and common usage patterns for all output types
- ParquetOutput: Write data to Parquet files with chunking and Hive partitioning
- IcebergOutput: Write data to Apache Iceberg tables with automatic table creation