JsonOutput writes data to JSON files in JSONL format (one JSON object per line) with support for chunking, buffering, null field processing, and automatic object store uploads. It inherits from the base Output class and provides specialized functionality for JSON data serialization.

JsonOutput

Class
Module: application_sdk.outputs.json
Inheritance chain: Output

Writes data to JSON files (JSONL format) with support for chunking, buffering, null field processing, and automatic object store uploads.

Methods (5)

__init__

__init__(self, output_suffix: str, output_path: Optional[str] = None, typename: Optional[str] = None, chunk_start: Optional[int] = None, buffer_size: int = 5000, chunk_size: Optional[int] = 50000, start_marker: Optional[str] = None, end_marker: Optional[str] = None, retain_local_copy: bool = False)
Initialize JsonOutput with output suffix and configuration options.
Parameters
output_suffix (str, required)
    Suffix for output files.
output_path (Optional[str], optional)
    Base path where JSON files are written.
typename (Optional[str], optional)
    Type name of the entity.
chunk_start (Optional[int], optional)
    Starting index for chunk numbering.
buffer_size (int, optional)
    Number of records per buffer (default: 5000).
chunk_size (Optional[int], optional)
    Maximum records per chunk (default: 50000).
start_marker (Optional[str], optional)
    Start marker for query extraction.
end_marker (Optional[str], optional)
    End marker for query extraction.
retain_local_copy (bool, optional)
    Whether to retain the local copy after upload (default: False).

write_dataframe

async
async write_dataframe(self, dataframe: pd.DataFrame) -> None
Writes a pandas DataFrame to JSON files in JSONL format (orient='records', lines=True; one JSON object per line). Automatically chunks output based on buffer_size and max_file_size_bytes, appends to existing files where present, and uploads to the object store when size limits are reached.
Parameters
dataframe (pd.DataFrame, required)
    The pandas DataFrame to write.
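To illustrate the JSONL serialization described above, here is a minimal sketch using plain pandas (the DataFrame contents are illustrative, not produced by the SDK): with orient="records" and lines=True, each row becomes one independent JSON object on its own line.

```python
import json

import pandas as pd

# Illustrative data, not from the SDK.
df = pd.DataFrame(
    [
        {"table_name": "orders", "row_count": 10},
        {"table_name": "users", "row_count": 5},
    ]
)

# JSONL: one JSON object per line, as write_dataframe() produces.
jsonl = df.to_json(orient="records", lines=True)

# Every non-empty line parses as an independent JSON object.
records = [json.loads(line) for line in jsonl.splitlines() if line]
```

Because each line is self-contained, JSONL files can be appended to and split into chunks without re-parsing earlier content, which is what makes the chunking and append behavior above cheap.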

write_batched_dataframe

async
async write_batched_dataframe(self, batch_df: pd.DataFrame) -> None
Writes batched pandas DataFrames to JSON files. Processes each batch through write_dataframe() and automatically skips empty DataFrames.
Parameters
batch_df (pd.DataFrame, required)
    Batched pandas DataFrame to write.
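A minimal sketch of the batching behavior described above (the loop and data are illustrative, not the SDK's implementation): empty DataFrames are skipped, and each non-empty batch is serialized to JSONL.

```python
import pandas as pd

# Illustrative batches; in the SDK each would be passed to
# write_batched_dataframe() in turn.
batches = [
    pd.DataFrame([{"id": 1}, {"id": 2}]),
    pd.DataFrame(),  # empty batch: skipped
    pd.DataFrame([{"id": 3}]),
]

written_lines = []
for batch_df in batches:
    if batch_df.empty:
        continue  # mirrors the "skips empty DataFrames" behavior
    written_lines.extend(
        batch_df.to_json(orient="records", lines=True).splitlines()
    )
```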

write_daft_dataframe

async
async write_daft_dataframe(self, dataframe: daft.DataFrame, preserve_fields: Optional[List[str]] = None, null_to_empty_dict_fields: Optional[List[str]] = None) -> None
Writes a daft DataFrame to JSON files with null field processing and datetime conversion. Converts datetime objects to epoch timestamps in milliseconds, recursively removes null fields (except for preserved fields), buffers rows and flushes when buffer_size is reached, chunks files when the max_file_size_bytes or chunk_size limit is reached, and uses orjson for efficient JSON serialization.
Parameters
dataframe (daft.DataFrame, required)
    The DataFrame to write.
preserve_fields (Optional[List[str]], optional)
    Fields to preserve even if null (default: includes 'identity_cycle', 'number_columns_in_part_key', etc.).
null_to_empty_dict_fields (Optional[List[str]], optional)
    Fields to convert from null to empty dict (default: ['attributes', 'customAttributes']).
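The null field processing and datetime conversion described above can be sketched in plain Python. This is an illustrative approximation, not the SDK's actual implementation; the sample row and the process_row helper are hypothetical, while the preserved and null-to-empty-dict field names follow the documented defaults.

```python
from datetime import datetime, timezone

# Documented defaults (subset): fields kept even when null, and fields
# converted from null to an empty dict.
PRESERVE_FIELDS = {"identity_cycle"}
NULL_TO_EMPTY_DICT_FIELDS = {"attributes", "customAttributes"}

def process_row(row: dict) -> dict:
    """Hypothetical sketch of per-row processing before serialization."""
    out = {}
    for key, value in row.items():
        if isinstance(value, dict):
            value = process_row(value)  # recurse into nested objects
        elif isinstance(value, datetime):
            # Datetimes become epoch timestamps in milliseconds.
            value = int(value.timestamp() * 1000)
        if value is None:
            if key in NULL_TO_EMPTY_DICT_FIELDS:
                out[key] = {}        # null -> empty dict
            elif key in PRESERVE_FIELDS:
                out[key] = None      # preserved even when null
            # any other null field is dropped
        else:
            out[key] = value
    return out

row = {
    "name": "orders",
    "identity_cycle": None,     # preserved
    "attributes": None,         # becomes {}
    "comment": None,            # dropped
    "created": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
processed = process_row(row)
```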

flush_daft_buffer

async
async flush_daft_buffer(self, buffer: List[str], chunk_part: int = 0) -> None
Flushes the current buffer to a JSON file. Called automatically but may be called manually.
Parameters
buffer (List[str], required)
    List of serialized JSON row strings.
chunk_part (int, optional)
    Part number for file naming (default: 0).
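A minimal sketch of what a buffer flush amounts to, assuming (as the signature suggests) that the buffer holds pre-serialized JSON row strings and that the part number feeds into the file name. The chunk file naming scheme here is illustrative only, not the SDK's actual convention.

```python
import os
import tempfile

# Buffer of pre-serialized JSON rows, as flush_daft_buffer() receives.
buffer = ['{"id": 1}', '{"id": 2}']
chunk_part = 0

out_dir = tempfile.mkdtemp()
# Hypothetical naming; the SDK's real chunk file names may differ.
path = os.path.join(out_dir, f"chunk-{chunk_part}.json")

# Append mode, so repeated flushes extend the same chunk file.
with open(path, "a", encoding="utf-8") as f:
    f.write("\n".join(buffer) + "\n")

with open(path, encoding="utf-8") as f:
    flushed = f.read().splitlines()
```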

Usage Examples

Basic initialization

Initialize JsonOutput with basic configuration

from application_sdk.outputs import JsonOutput

json_output = JsonOutput(
    output_path="/tmp/output",
    output_suffix="data",
    typename="table",
    chunk_size=50000,
    buffer_size=5000,
)

await json_output.write_dataframe(df)

With custom field preservation

Write with custom null field processing

await json_output.write_daft_dataframe(
    daft_df,
    preserve_fields=["required_field", "identity_cycle"],
    null_to_empty_dict_fields=["attributes", "customAttributes"],
)

Usage patterns

For detailed usage patterns including JSONL format, null field processing, datetime conversion, and other JsonOutput-specific features, see Output usage patterns and select the JsonOutput tab.

See also

  • Outputs: Base Output class and common usage patterns for all output types
  • ParquetOutput: Write data to Parquet files with chunking and Hive partitioning
  • IcebergOutput: Write data to Apache Iceberg tables with automatic table creation