JsonInput reads data from JSON files, supporting both single files and directories containing multiple JSON files. It supports the JSONL (JSON Lines) format, where each line is a separate JSON object. A single file path (ending with .json) and the file_names parameter cannot both be specified.
JsonInput inherits from the base Input class and provides specialized functionality for reading JSON and JSONL formatted data.
JsonInput
Class in application_sdk.inputs.json
Inheritance chain: Input
Methods (5)
__init__
__init__(self, path: str, file_names: Optional[List[str]] = None, chunk_size: int = 100000)
Initialize JsonInput with path to JSON file or directory. Supports local paths and object store paths.
Parameters
path (str): Path to JSON file or directory. Supports local paths and object store paths.
file_names (Optional[List[str]]): List of specific file names to read from the directory.
chunk_size (int): Number of rows per batch (default: 100000).
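As a minimal sketch of the two valid ways to construct a JsonInput (the paths and file names below are hypothetical):
from application_sdk.inputs import JsonInput

# Single .json file: do not pass file_names alongside a file path
single_input = JsonInput(path="data/users.json")

# Directory plus an explicit subset of its files
subset_input = JsonInput(
    path="data/extracts/",
    file_names=["tables.json", "columns.json"],
)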
get_dataframe
async get_dataframe(self) -> pd.DataFrame
Reads all specified JSON files and returns a single combined pandas DataFrame. Files are read as JSONL (lines=True), so each line in a file becomes a row; the per-file frames are combined with pd.concat() using ignore_index=True.
Returns
pd.DataFrame - Combined DataFrame from all specified JSON files
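For illustration, a JSONL file with the hypothetical contents below would load as a two-row DataFrame:
# data/users.json (JSONL - one JSON object per line):
#   {"id": 1, "name": "Ada"}
#   {"id": 2, "name": "Grace"}
json_input = JsonInput(path="data/users.json")
df = await json_input.get_dataframe()
# df has columns "id" and "name", one row per JSON line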
get_batched_dataframe
async get_batched_dataframe(self) -> AsyncIterator[pd.DataFrame]
Reads JSON files and yields them in batches as pandas DataFrames. Each file is read with pd.read_json() using the chunksize parameter; files are processed sequentially, and each chunk is yielded as a separate DataFrame.
Returns
AsyncIterator[pd.DataFrame] - Iterator yielding batches of rows
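A sketch of consuming the batches (the path and chunk size are hypothetical); each yielded chunk holds at most chunk_size rows:
json_input = JsonInput(path="data/users.json", chunk_size=50000)
total_rows = 0
async for chunk in json_input.get_batched_dataframe():
    total_rows += len(chunk)  # each chunk has up to 50000 rows
print(total_rows)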
get_daft_dataframe
async get_daft_dataframe(self) -> daft.DataFrame
Reads all specified JSON files and returns a single combined daft DataFrame.
Returns
daft.DataFrame - Combined daft DataFrame from all specified files
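A minimal sketch (the directory path is hypothetical). daft DataFrames are lazy, so call an action such as collect() to materialize the result:
json_input = JsonInput(path="data/events/")
daft_df = await json_input.get_daft_dataframe()
daft_df.collect()  # materialize the lazy daft DataFrame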
get_batched_daft_dataframe
async get_batched_daft_dataframe(self) -> AsyncIterator[daft.DataFrame]
Reads JSON files and yields each discovered file as a separate daft DataFrame batch. Files are processed individually, with the internal _chunk_size parameter used for chunking.
Returns
AsyncIterator[daft.DataFrame] - Iterator yielding batches as daft DataFrames
Usage Examples
Single file
Read a single JSON file
from application_sdk.inputs import JsonInput
json_input = JsonInput(path="data/users.json")
df = await json_input.get_dataframe()
Directory with all files
Read all JSON files from a directory
json_input = JsonInput(
    path="s3://bucket/data/",
    chunk_size=100000
)
async for batch_df in json_input.get_batched_dataframe():
    # Process each batch
    pass
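Batched daft DataFrames
Read a directory file by file as daft DataFrames (a sketch; the path is hypothetical)
json_input = JsonInput(path="data/extracts/")
async for daft_batch in json_input.get_batched_daft_dataframe():
    # Each discovered file arrives as its own daft DataFrame
    daft_batch.collect()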
See also
- Inputs: Overview of all input classes and common usage patterns
- SQLQueryInput: Read data from SQL databases by executing SQL queries
- ParquetInput: Read data from Parquet files, supporting single files and directories
- Application SDK README: Overview of the Application SDK and its components