Data Loading and Preprocessing
This page is currently in draft
A generic approach is to use an online Python interpreter such as Google Colab.
A code example may be found at this location -
Here's what this code does:
We use the built-in json module to load the JSON data directly from the file, instead of using the JSONLoader.
We iterate over the root-level array data and, for each item (a workgroup meeting object), use the CharacterTextSplitter to split the string representation of the item into chunks, extending the chunks list with the results.
We convert each chunk in the chunks list into a Document object.
Finally, we iterate over the docs list and print the content of each document. A sketch of this approach is shown below.
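The following is a minimal sketch of that approach, not a definitive implementation. It assumes the JSON file is a root-level array of workgroup meeting objects; the file name meetings.json and the chunking parameters are placeholders, and the import paths may vary between LangChain versions.

```python
import json

# Import paths may differ between LangChain versions; older releases expose
# these classes under langchain.text_splitter and langchain.schema.
from langchain_text_splitters import CharacterTextSplitter
from langchain_core.documents import Document

# Hypothetical file name; the JSON is assumed to be a root-level array
# of workgroup meeting objects.
with open("meetings.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Placeholder chunking parameters; tune these to your data.
splitter = CharacterTextSplitter(separator=" ", chunk_size=1000, chunk_overlap=100)

# Split the string representation of each meeting object into chunks.
chunks = []
for item in data:
    chunks.extend(splitter.split_text(json.dumps(item)))

# Convert each chunk into a Document object.
docs = [Document(page_content=chunk) for chunk in chunks]

# Print the content of each document.
for doc in docs:
    print(doc.page_content)
```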
This solution assumes that your JSON data is a root-level array of workgroup meeting objects. If your JSON data has a different structure, you may need to modify the code accordingly.
Please note that this solution doesn't handle nested JSON structures or complex data types within the JSON objects.
If you need more advanced JSON parsing capabilities, you may want to consider using a dedicated JSON processing library like jsonpath-ng or jq directly.
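As an illustration only, a sketch of the jsonpath-ng route could look like this; the file name and the summary field are hypothetical and stand in for whatever your meeting objects actually contain.

```python
import json
from jsonpath_ng import parse  # pip install jsonpath-ng

with open("meetings.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Hypothetical field: pull the "summary" value out of every meeting
# object in the root-level array.
expression = parse("$[*].summary")
summaries = [match.value for match in expression.find(data)]
print(summaries)
```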
Typically a structured text source such as JSON will be used.
An example of how to load JSON files into LangChain may be found at this location -
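For reference, a minimal sketch using LangChain's JSONLoader (which relies on the jq Python package) might look like the following; the file name and the jq_schema are assumptions about the data layout, and the import path applies to recent langchain_community releases.

```python
from langchain_community.document_loaders import JSONLoader  # requires the jq package

# Hypothetical file; the jq_schema ".[]" turns each element of a
# root-level array into its own Document.
loader = JSONLoader(
    file_path="meetings.json",
    jq_schema=".[]",
    text_content=False,  # items are JSON objects, not plain strings
)
docs = loader.load()
print(docs[0].page_content)
```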