File processing
Joule provides utility classes to load large files efficiently
Under the hood, Joule uses to read files, thereby enabling efficient handling of large files and out-of-the-box (OOTB) support for standard file formats. The classes that perform this work are surfaced to developers in the form of a Callable task.
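Because the parsing classes are exposed as Callable tasks, they should fit the standard java.util.concurrent executor model. The following is a minimal sketch under that assumption only: FileTaskSketch and CountRowsTask are hypothetical names, not Joule SDK classes, and the task simply counts the data rows of a file to show how a Callable file job is submitted and its result collected.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

/**
 * Hypothetical sketch (not the Joule SDK): a file-reading job packaged as a
 * java.util.concurrent.Callable and run on an ExecutorService.
 */
public class FileTaskSketch {

    /** Hypothetical task that counts the data rows (lines after the header) of a file. */
    static final class CountRowsTask implements Callable<Long> {
        private final Path path;

        CountRowsTask(Path path) {
            this.path = path;
        }

        @Override
        public Long call() throws IOException {
            // Files.lines streams the file lazily instead of loading it all at once.
            try (Stream<String> lines = Files.lines(path)) {
                return Math.max(0L, lines.count() - 1);
            }
        }
    }

    /** Submits the task and blocks until its result is available. */
    public static long runTask(Path path) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Long> result = executor.submit(new CountRowsTask(path));
            return result.get();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("cells", ".csv");
        try {
            Files.write(file, List.of("id,name", "1,alpha", "2,beta"));
            System.out.println(runTask(file)); // prints 2
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Packaging the work as a Callable keeps the reading logic independent of how it is scheduled: the same task could be submitted to a shared thread pool alongside other jobs.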
Two key classes are provided:
The following standard file formats are supported:
PARQUET
ORC
CSV
JSON
ARROW_IPC
The provided classes can be found under the SDK package
This processing task class reads a file's contents and automatically converts each logical row of the file into a StreamEvent object. This is performed using micro-batch processing, which reduces memory and processing overhead while increasing stream processing throughput.
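To illustrate the general micro-batch technique described above, here is a minimal sketch that converts each CSV row into an event and hands events downstream in fixed-size batches. It is hypothetical throughout: MicroBatchSketch, Event (a stand-in for StreamEvent), parse and batchSize are illustrative names, not the Joule SDK API, and the comma split is deliberately naive (no quoted fields).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not the Joule SDK): convert each logical CSV row into an
 * event and hand events downstream in fixed-size micro-batches.
 */
public class MicroBatchSketch {

    /** Hypothetical stand-in for a StreamEvent: column name to raw string value. */
    public record Event(Map<String, String> fields) {}

    /**
     * Reads rows one at a time; the parser itself buffers at most batchSize events
     * before passing them to the sink. Returns the number of data rows read.
     */
    public static int parse(BufferedReader reader, int batchSize, Consumer<List<Event>> sink)
            throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        String header = reader.readLine();
        if (header == null) {
            return 0; // empty input
        }
        String[] columns = header.split(",");
        List<Event> batch = new ArrayList<>(batchSize);
        int total = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            String[] values = line.split(",", -1); // naive split: no quoted fields
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < columns.length && i < values.length; i++) {
                fields.put(columns[i], values[i]);
            }
            batch.add(new Event(fields));
            total++;
            if (batch.size() == batchSize) {
                sink.accept(batch);                 // emit a full micro-batch
                batch = new ArrayList<>(batchSize); // start a fresh one
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final partial batch
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String csv = "cell,lat,lon\nA,51.5,-0.1\nB,48.8,2.3\nC,40.7,-74.0\n";
        List<Integer> batchSizes = new ArrayList<>();
        int total = parse(new BufferedReader(new StringReader(csv)), 2,
                batch -> batchSizes.add(batch.size()));
        System.out.println(total + " rows in batches " + batchSizes); // prints "3 rows in batches [2, 1]"
    }
}
```

The batch size is the tuning knob in this sketch: larger batches amortize per-call overhead downstream, while smaller batches cap how many parsed events sit in memory at once.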
The example below loads
ReferenceData objects are stored within an in-memory data store to reduce retrieval latency and I/O overhead.
This processing task class reads a reference data file's contents and automatically converts each logical row of the file into a reference data object. This is performed using micro-batch processing, which reduces memory footprint and processing overhead and therefore makes it possible to read large files into memory.
This example can be found in the CellTowerCSVParserTest class within the project tests.
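Separately from the project's own test, the following hypothetical sketch shows the general pattern of loading reference data in micro-batches into an in-memory map so that later lookups need no file I/O. ReferenceDataSketch, CellTower, InMemoryStore and load are illustrative names only, not Joule SDK classes, and the input is assumed to be a simple "id,lat,lon" CSV without quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch (not the Joule SDK): load reference data in micro-batches
 * into an in-memory map so later lookups need no file I/O.
 */
public class ReferenceDataSketch {

    /** Hypothetical reference data row, e.g. a cell tower location. */
    public record CellTower(String id, double lat, double lon) {}

    /** Hypothetical in-memory store keyed by id; safe for concurrent readers. */
    public static final class InMemoryStore {
        private final Map<String, CellTower> byId = new ConcurrentHashMap<>();

        public void putAll(List<CellTower> rows) {
            for (CellTower row : rows) {
                byId.put(row.id(), row);
            }
        }

        public Optional<CellTower> get(String id) {
            return Optional.ofNullable(byId.get(id));
        }

        public int size() {
            return byId.size();
        }
    }

    /** Parses "id,lat,lon" rows and flushes them to the store every batchSize rows. */
    public static void load(BufferedReader reader, int batchSize, InMemoryStore store)
            throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        reader.readLine(); // skip the header row
        List<CellTower> batch = new ArrayList<>(batchSize);
        String line;
        while ((line = reader.readLine()) != null) {
            String[] v = line.split(","); // naive split: no quoted fields
            batch.add(new CellTower(v[0], Double.parseDouble(v[1]), Double.parseDouble(v[2])));
            if (batch.size() == batchSize) {
                store.putAll(batch); // copy the batch into the store
                batch.clear();       // reuse the buffer for the next batch
            }
        }
        store.putAll(batch); // flush the final partial batch (may be empty)
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,lat,lon\nA,51.5,-0.1\nB,48.8,2.3\nC,40.7,-74.0\n";
        InMemoryStore store = new InMemoryStore();
        load(new BufferedReader(new StringReader(csv)), 2, store);
        System.out.println(store.size() + " towers; B = " + store.get("B").orElseThrow());
    }
}
```

A ConcurrentHashMap is used here on the assumption that lookups may come from several processing threads while loading completes; a plain HashMap would suffice for single-threaded use.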