JSONReader parses JSON documents and offers options that help extract relationships between nodes.
These options control how many levels of the JSON tree are kept as context, collapse short JSON fragments onto a single line, and strip lines that contain only JSON structure.
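To make the clean-up option more concrete, here is a minimal standard-library sketch of what "cleaning" a JSON structure can look like. It is only an illustration of the idea, not the package's own code; the helper name `clean_json_lines` and the sample data are made up for this example.

```python
import json
import re

# Lines made up solely of braces, brackets, and commas carry no data.
STRUCTURE_ONLY = re.compile(r"^[{}\[\],]*$")


def clean_json_lines(data):
    """Hypothetical helper: dump `data` line by line and drop structure-only lines."""
    # indent=0 puts every key/value pair and every bracket on its own line
    # without adding leading whitespace.
    lines = json.dumps(data, indent=0).split("\n")
    return [line for line in lines if not STRUCTURE_ONLY.match(line)]


sample = {"name": "Ada", "tags": ["math", "code"], "address": {"city": "London"}}
cleaned = clean_json_lines(sample)
print("\n".join(cleaned))
```

Lines such as `{` or `],` disappear, while lines that still carry a key (for example `"tags": [`) are kept.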
First things first, let's install the package via pip:
pip install llama-index-readers-json
Below is a code snippet that demonstrates how to use it:
from llama_index.readers.json import JSONReader

# Initialize JSONReader (the values shown are the defaults)
reader = JSONReader(
    # Number of levels to go back in the JSON tree (int). Set to 0 to keep all levels.
    # With None, the JSON is simply formatted line by line. Default is None.
    levels_back=None,
    # Maximum length, in characters, of a JSON fragment that is collapsed onto one line (int).
    # Only used when levels_back is set. Default is None.
    collapse_length=None,
    # If True, ensures that the output is ASCII-encoded. Default is False.
    ensure_ascii=False,
    # If True, indicates that the file is in JSONL (JSON Lines) format. Default is False.
    is_jsonl=False,
    # If True, removes lines containing only JSON structure (braces, brackets, commas).
    # Ignored when levels_back is set. Default is True.
    clean_json=True,
)

# Load data from a JSON file; replace "<Input File>" with the path to your file.
# extra_info is optional metadata attached to the returned documents.
documents = reader.load_data(input_file="<Input File>", extra_info={})
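The `levels_back` and `collapse_length` options are easier to follow with a small worked example. The sketch below uses only the standard library and is an assumption about the general technique (a depth-first walk that prefixes each leaf with its ancestor keys), not the reader's actual implementation; `flatten` and the sample data are hypothetical names for illustration.

```python
import json


def flatten(node, levels_back, collapse_length=None, path=()):
    """Yield one text line per leaf, prefixed by up to `levels_back` ancestor keys."""
    # levels_back == 0 keeps the whole path, because path[-0:] is the full tuple.
    if isinstance(node, (dict, list)):
        as_text = json.dumps(node)
        if collapse_length is not None and len(as_text) <= collapse_length:
            # Fragments that are short enough are emitted whole on a single line.
            yield " ".join([*path[-levels_back:], as_text])
        elif isinstance(node, dict):
            for key, value in node.items():
                yield from flatten(value, levels_back, collapse_length, path + (key,))
        else:
            # List items inherit the path of the list itself.
            for value in node:
                yield from flatten(value, levels_back, collapse_length, path)
    else:
        yield " ".join([*path[-levels_back:], str(node)])


data = {"user": {"name": "Ada", "langs": ["py", "ml"]}, "active": True}

all_levels = list(flatten(data, levels_back=0))
one_level = list(flatten(data, levels_back=1))
collapsed = list(flatten(data, levels_back=0, collapse_length=20))

print(all_levels)  # ['user name Ada', 'user langs py', 'user langs ml', 'active True']
print(one_level)   # ['name Ada', 'langs py', 'langs ml', 'active True']
print(collapsed)   # ['user name Ada', 'user langs ["py", "ml"]', 'active True']
```

In this sketch a larger `levels_back` keeps more ancestor keys as context for each value, and `collapse_length` keeps small lists or objects together instead of splitting them into one line per leaf.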
I'm currently working on an option that will allow extracting text data from JSON, while treating the rest as metadata. Once it's complete, I'll publish it on GitHub.
References: