JsonInput reads data from JSON files, supporting both single files and directories containing multiple JSON files. Files are parsed as JSONL (JSON Lines), where each line is a separate JSON object. A single file path (ending with .json) and the file_names parameter are mutually exclusive; specify one or the other.

JsonInput inherits from the base Input class and provides specialized functionality for reading JSON- and JSONL-formatted data.

JsonInput

Class
Module: application_sdk.inputs.json
Inheritance chain:
Input

Reads data from JSON files: a single file or a directory containing multiple JSON files. Files are parsed as JSONL (JSON Lines), one JSON object per line. A single file path (ending with .json) and the file_names parameter are mutually exclusive.

Methods (5)

__init__

__init__(self, path: str, file_names: Optional[List[str]] = None, chunk_size: int = 100000)
Initialize JsonInput with path to JSON file or directory. Supports local paths and object store paths.
Parameters
path: str (required)
Path to a JSON file or directory. Supports local paths and object store paths.
file_names: Optional[List[str]] (optional)
List of specific file names to read from the directory.
chunk_size: int (optional, default: 100000)
Number of rows per batch.
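
A minimal construction sketch; the directory path and file names below are hypothetical, chosen only to illustrate the parameters:

from application_sdk.inputs import JsonInput

# file_names restricts reading to the listed files inside the directory;
# it cannot be combined with a single-file path ending in .json.
json_input = JsonInput(
    path="data/events/",
    file_names=["2024-01.json", "2024-02.json"],
    chunk_size=50000,  # rows per batch for the batched readers
)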

get_dataframe

async
async get_dataframe(self) -> pd.DataFrame
Reads all specified JSON files and returns a single combined pandas DataFrame. Files are parsed as JSONL (lines=True), so each line in a file becomes a row in the DataFrame; the per-file frames are combined with pd.concat() using ignore_index=True.
Returns
pd.DataFrame - Combined DataFrame from all specified JSON files
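
Because files are parsed as JSONL, each line must be a standalone JSON object. A short sketch, assuming a hypothetical users.json written in that layout:

# users.json contains one JSON object per line:
#   {"id": 1, "name": "Ada"}
#   {"id": 2, "name": "Grace"}

from application_sdk.inputs import JsonInput

json_input = JsonInput(path="users.json")
df = await json_input.get_dataframe()
# df has one row per line: here, two rows with columns "id" and "name".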

get_batched_dataframe

async
async get_batched_dataframe(self) -> AsyncIterator[pd.DataFrame]
Reads JSON files and yields batches as pandas DataFrames. Each file is read with pd.read_json() using the chunksize parameter; files are processed sequentially, and each chunk is yielded as a separate DataFrame.
Returns
AsyncIterator[pd.DataFrame] - Iterator yielding batches of rows
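
A sketch of consuming the batches, assuming a hypothetical directory of JSONL files; each yielded chunk holds at most chunk_size rows:

from application_sdk.inputs import JsonInput

json_input = JsonInput(path="data/events/", chunk_size=10000)

total_rows = 0
async for batch_df in json_input.get_batched_dataframe():
    # Each batch is an independent pandas DataFrame of up to 10000 rows.
    total_rows += len(batch_df)
print(total_rows)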

get_daft_dataframe

async
async get_daft_dataframe(self) -> daft.DataFrame
Reads all specified JSON files and returns a single combined daft DataFrame.
Returns
daft.DataFrame - Combined daft DataFrame from all specified files
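
A minimal sketch, assuming the same hypothetical directory; the result is one daft DataFrame covering every discovered file:

from application_sdk.inputs import JsonInput

json_input = JsonInput(path="data/events/")
daft_df = await json_input.get_daft_dataframe()

# Standard daft operations apply to the combined frame.
daft_df.show()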

get_batched_daft_dataframe

async
async get_batched_daft_dataframe(self) -> AsyncIterator[daft.DataFrame]
Reads JSON files and yields each discovered file as a separate daft DataFrame batch. Files are processed individually, using the _chunk_size parameter for internal chunking.
Returns
AsyncIterator[daft.DataFrame] - Iterator yielding batches as daft DataFrames
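
A sketch of per-file processing, assuming hypothetical files; each iteration yields one file's contents as its own daft DataFrame:

from application_sdk.inputs import JsonInput

json_input = JsonInput(path="data/events/")

async for file_df in json_input.get_batched_daft_dataframe():
    # One daft DataFrame per discovered file; count_rows() materializes the count.
    print(file_df.count_rows())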

Usage Examples

Single file

Read a single JSON file

from application_sdk.inputs import JsonInput

json_input = JsonInput(path="data/users.json")
df = await json_input.get_dataframe()

Directory with all files

Read all JSON files from a directory

json_input = JsonInput(
    path="s3://bucket/data/",
    chunk_size=100000
)

async for batch_df in json_input.get_batched_dataframe():
    # Process each batch
    pass

See also

  • Inputs: Overview of all input classes and common usage patterns
  • SQLQueryInput: Read data from SQL databases by executing SQL queries
  • ParquetInput: Read data from Parquet files, supporting single files and directories
  • Application SDK README: Overview of the Application SDK and its components