Hub Export Format: stream your dataset to your ML pipeline
Nicolas Draber
Hub is an open-source dataset storage format that makes it easier to train an ML model. As stated on its GitHub page:
> Hub is a dataset format with a simple API for creating, storing, and collaborating on AI datasets of any size. The hub data layout enables rapid transformations and streaming of data while training models at scale.
In short, you do not need to download a large dataset in full before you start training a model on it: you can stream its data to your ML pipeline as training proceeds.
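To make the idea of streaming concrete, here is a minimal plain-Python sketch. It does not use Hub's API: the in-memory `REMOTE_CHUNKS` store and the `fetch_chunk`, `stream_samples`, and `first_batches` helpers are all hypothetical names made up for illustration. The point it tries to show is that a consumer reading from a chunked dataset only triggers a fetch for the chunks it actually reaches.

```python
from typing import Dict, Iterator, List

# Hypothetical stand-in for remote storage: chunk id -> list of samples.
# A real dataset would hold images or tensors; integers keep the sketch small.
REMOTE_CHUNKS: Dict[int, List[int]] = {
    i: list(range(i * 4, (i + 1) * 4)) for i in range(5)
}

# Records which chunks were actually "downloaded".
fetched_chunks: List[int] = []


def fetch_chunk(chunk_id: int) -> List[int]:
    """Simulate one network request for a single chunk."""
    fetched_chunks.append(chunk_id)
    return REMOTE_CHUNKS[chunk_id]


def stream_samples() -> Iterator[int]:
    """Yield samples one by one, fetching each chunk only when it is reached."""
    for chunk_id in sorted(REMOTE_CHUNKS):
        for sample in fetch_chunk(chunk_id):
            yield sample


def first_batches(batch_size: int, n_batches: int) -> List[List[int]]:
    """Consume only the first n_batches batches from the stream."""
    stream = stream_samples()
    batches = []
    for _ in range(n_batches):
        batches.append([next(stream) for _ in range(batch_size)])
    return batches


if __name__ == "__main__":
    batches = first_batches(batch_size=4, n_batches=2)
    print("batches:", batches)
    print("chunks fetched:", fetched_chunks, "of", len(REMOTE_CHUNKS))
```

In this toy setup, training could start as soon as the first chunk arrives, and the chunks that are never reached are never fetched. A real streaming format adds caching, prefetching, and compression on top of this basic pattern.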
For more details: https://www.activeloop.ai/