JsonOutput writes data to JSON files in JSONL format (one JSON object per line) with support for chunking, buffering, null field processing, and automatic object store uploads. It inherits from the base Output class and provides specialized functionality for JSON data serialization.

JsonOutput

Class
Module: application_sdk.outputs.json
Inheritance chain: Output

Writes data to JSON files (JSONL format) with support for chunking, buffering, null field processing, and automatic object store uploads.

Methods (5)

__init__

__init__(self, output_suffix: str, output_path: Optional[str] = None, typename: Optional[str] = None, chunk_start: Optional[int] = None, buffer_size: int = 5000, chunk_size: Optional[int] = 50000, start_marker: Optional[str] = None, end_marker: Optional[str] = None, retain_local_copy: bool = False)
Initialize JsonOutput with output suffix and configuration options.
Parameters
output_suffix (str, required)
    Suffix for output files.
output_path (Optional[str], optional)
    Base path where JSON files are written.
typename (Optional[str], optional)
    Type name of the entity.
chunk_start (Optional[int], optional)
    Starting index for chunk numbering.
buffer_size (int, optional)
    Number of records per buffer (default: 5000).
chunk_size (Optional[int], optional)
    Maximum records per chunk (default: 50000).
start_marker (Optional[str], optional)
    Start marker for query extraction.
end_marker (Optional[str], optional)
    End marker for query extraction.
retain_local_copy (bool, optional)
    Whether to retain the local copy after upload (default: False).

write_dataframe

async
async write_dataframe(self, dataframe: pd.DataFrame) -> None
Writes a pandas DataFrame to JSON files in JSONL format (orient='records', lines=True; one JSON object per line). Automatically chunks output based on buffer_size and max_file_size_bytes, appends to existing files where present, and uploads to the object store when size limits are reached.
Parameters
dataframe (pd.DataFrame, required)
    The pandas DataFrame to write.
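To illustrate the JSONL serialization described above, here is a minimal sketch using plain pandas (the DataFrame contents are illustrative, not produced by the SDK): with orient="records" and lines=True, each row becomes one independent JSON object on its own line.

```python
import json

import pandas as pd

# Illustrative data, not from the SDK.
df = pd.DataFrame(
    [
        {"table_name": "orders", "row_count": 10},
        {"table_name": "users", "row_count": 5},
    ]
)

# JSONL: one JSON object per line, as write_dataframe() produces.
jsonl = df.to_json(orient="records", lines=True)

# Every non-empty line parses as an independent JSON object.
records = [json.loads(line) for line in jsonl.splitlines() if line]
```

Because each line is self-contained, JSONL files can be appended to and split into chunks without re-parsing earlier content, which is what makes the chunking and append behavior above cheap.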

write_batched_dataframe

async
async write_batched_dataframe(self, batch_df: pd.DataFrame) -> None
Writes batched pandas DataFrames to JSON files. Processes each batch through write_dataframe() and automatically skips empty DataFrames.
Parameters
batch_df (pd.DataFrame, required)
    Batched pandas DataFrame to write.
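A minimal sketch of the batching behavior described above (the loop and data are illustrative, not the SDK's implementation): empty DataFrames are skipped, and each non-empty batch is serialized to JSONL.

```python
import pandas as pd

# Illustrative batches; in the SDK each would be passed to
# write_batched_dataframe() in turn.
batches = [
    pd.DataFrame([{"id": 1}, {"id": 2}]),
    pd.DataFrame(),  # empty batch: skipped
    pd.DataFrame([{"id": 3}]),
]

written_lines = []
for batch_df in batches:
    if batch_df.empty:
        continue  # mirrors the "skips empty DataFrames" behavior
    written_lines.extend(
        batch_df.to_json(orient="records", lines=True).splitlines()
    )
```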

write_daft_dataframe

async
async write_daft_dataframe(self, dataframe: daft.DataFrame, preserve_fields: Optional[List[str]] = None, null_to_empty_dict_fields: Optional[List[str]] = None) -> None
Writes a daft DataFrame to JSON files with null field processing and datetime conversion. Converts datetime objects to epoch timestamps in milliseconds, recursively removes null fields (except for preserved fields), buffers rows and flushes when buffer_size is reached, chunks files when the max_file_size_bytes or chunk_size limit is reached, and uses orjson for efficient JSON serialization.
Parameters
dataframe (daft.DataFrame, required)
    The DataFrame to write.
preserve_fields (Optional[List[str]], optional)
    Fields to preserve even if null (default: includes 'identity_cycle', 'number_columns_in_part_key', etc.).
null_to_empty_dict_fields (Optional[List[str]], optional)
    Fields to convert from null to empty dict (default: ['attributes', 'customAttributes']).
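The null field processing and datetime conversion described above can be sketched in plain Python. This is an illustrative approximation, not the SDK's actual implementation; the sample row and the process_row helper are hypothetical, while the preserved and null-to-empty-dict field names follow the documented defaults.

```python
from datetime import datetime, timezone

# Documented defaults (subset): fields kept even when null, and fields
# converted from null to an empty dict.
PRESERVE_FIELDS = {"identity_cycle"}
NULL_TO_EMPTY_DICT_FIELDS = {"attributes", "customAttributes"}

def process_row(row: dict) -> dict:
    """Hypothetical sketch of per-row processing before serialization."""
    out = {}
    for key, value in row.items():
        if isinstance(value, dict):
            value = process_row(value)  # recurse into nested objects
        elif isinstance(value, datetime):
            # Datetimes become epoch timestamps in milliseconds.
            value = int(value.timestamp() * 1000)
        if value is None:
            if key in NULL_TO_EMPTY_DICT_FIELDS:
                out[key] = {}        # null -> empty dict
            elif key in PRESERVE_FIELDS:
                out[key] = None      # preserved even when null
            # any other null field is dropped
        else:
            out[key] = value
    return out

row = {
    "name": "orders",
    "identity_cycle": None,     # preserved
    "attributes": None,         # becomes {}
    "comment": None,            # dropped
    "created": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
processed = process_row(row)
```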

flush_daft_buffer

async
async flush_daft_buffer(self, buffer: List[str], chunk_part: int = 0) -> None
Flushes the current buffer to a JSON file. Called automatically but may be called manually.
Parameters
buffer (List[str], required)
    List of serialized JSON row strings.
chunk_part (int, optional)
    Part number for file naming (default: 0).
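A minimal sketch of what a buffer flush amounts to, assuming (as the signature suggests) that the buffer holds pre-serialized JSON row strings and that the part number feeds into the file name. The chunk file naming scheme here is illustrative only, not the SDK's actual convention.

```python
import os
import tempfile

# Buffer of pre-serialized JSON rows, as flush_daft_buffer() receives.
buffer = ['{"id": 1}', '{"id": 2}']
chunk_part = 0

out_dir = tempfile.mkdtemp()
# Hypothetical naming; the SDK's real chunk file names may differ.
path = os.path.join(out_dir, f"chunk-{chunk_part}.json")

# Append mode, so repeated flushes extend the same chunk file.
with open(path, "a", encoding="utf-8") as f:
    f.write("\n".join(buffer) + "\n")

with open(path, encoding="utf-8") as f:
    flushed = f.read().splitlines()
```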

Usage Examples

Basic initialization

Initialize JsonOutput with basic configuration

from application_sdk.outputs import JsonOutput

json_output = JsonOutput(
    output_path="/tmp/output",
    output_suffix="data",
    typename="table",
    chunk_size=50000,
    buffer_size=5000,
)

await json_output.write_dataframe(df)

With custom field preservation

Write with custom null field processing

await json_output.write_daft_dataframe(
    daft_df,
    preserve_fields=["required_field", "identity_cycle"],
    null_to_empty_dict_fields=["attributes", "customAttributes"],
)

Usage patterns

For detailed usage patterns including JSONL format, null field processing, datetime conversion, and other JsonOutput-specific features, see Output usage patterns and select the JsonOutput tab.

See also

  • Outputs: Base Output class and common usage patterns for all output types
  • ParquetOutput: Write data to Parquet files with chunking and Hive partitioning
  • IcebergOutput: Write data to Apache Iceberg tables with automatic table creation