pytorch-datastream
Simple dataset to dataloader library for pytorch.
Quick Example
from datastream import Dataset, Datastream
dataset = (
Dataset.from_subscriptable([1, 2, 3])
.map(lambda number: number + 1)
)
assert dataset[-1] == 4
data_loader = (
Datastream(dataset)
.data_loader(batch_size=16, n_batches_per_epoch=100)
)
assert len(next(iter(data_loader))) == 16
Features
- Simple, readable dataset pipeline creation
- Built-in support for:
- Imbalanced datasets
- Oversampling / stratification
- Weighted sampling
- Easy conversion to PyTorch DataLoader
- Testable examples in documentation
- Type hints and Pydantic validation
- Clean, maintainable codebase
Installation
Install with poetry:
Or with pip:
Next Steps
- Check out the Getting Started guide
- See the Dataset and Datastream API references