Getting Started
Installation
Usage
Dataset
A Dataset[T] is a mapping that lets you pipeline functions in a readable syntax. Indexing it returns an example of type T.
from datastream import Dataset

fruits_and_cost = (
    ('apple', 5),
    ('pear', 7),
    ('banana', 14),
    ('kiwi', 100),
)

# Each example is unpacked into (fruit, cost) before the function is applied.
dataset = (
    Dataset.from_subscriptable(fruits_and_cost)
    .starmap(lambda fruit, cost: (
        fruit,
        cost * 2,
    ))
)

assert dataset[2] == ('banana', 28)
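To make the idea of pipelining over a mapping concrete, here is a minimal pure-Python sketch. `LazyDataset` is a hypothetical class written only for this illustration and is not the library's implementation; it assumes that each mapped function is composed lazily and only runs when an index is accessed.

```python
from typing import Callable, Generic, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class LazyDataset(Generic[T]):
    """Hypothetical stand-in for illustration; not the datastream API."""

    def __init__(self, length: int, get_item: Callable[[int], T]):
        self._length = length
        self._get_item = get_item

    @classmethod
    def from_subscriptable(cls, data: Sequence[T]) -> "LazyDataset[T]":
        return cls(len(data), lambda index: data[index])

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> T:
        return self._get_item(index)

    def map(self, fn: Callable[[T], R]) -> "LazyDataset[R]":
        # Compose fn on top of the existing lookup; nothing is computed yet.
        return LazyDataset(self._length, lambda index: fn(self._get_item(index)))

    def starmap(self, fn: Callable[..., R]) -> "LazyDataset[R]":
        # Like map, but unpacks each example into positional arguments.
        return self.map(lambda example: fn(*example))


fruits_and_cost = (("apple", 5), ("pear", 7), ("banana", 14), ("kiwi", 100))
doubled = LazyDataset.from_subscriptable(fruits_and_cost).starmap(
    lambda fruit, cost: (fruit, cost * 2)
)
assert doubled[2] == ("banana", 28)
```

Because each step returns a new object wrapping the previous lookup, the source data is never copied or modified, which is what keeps chained calls cheap in this sketch.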
Datastream
A Datastream[T] is an iterable that yields batches of type T from one or more datasets.
from datastream import Dataset, Datastream

dataset = Dataset.from_subscriptable([1, 2, 3, 4])
datastream = Datastream(dataset)

# Every batch drawn from the data loader holds batch_size examples.
for batch in datastream.data_loader(batch_size=2):
    assert len(batch) == 2
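As a rough picture of what "yielding batches from a dataset" means, the following pure-Python sketch shuffles indices and groups them into fixed-size batches. `iter_batches` is a hypothetical helper introduced for this example; the library's own sampling and collation may well differ.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(data: Sequence[T], batch_size: int, seed: int = 0) -> Iterator[List[T]]:
    """Hypothetical helper: yield shuffled, full-size batches for one pass."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    # Drop a trailing partial batch so every batch has exactly batch_size items.
    for start in range(0, len(indices) - batch_size + 1, batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]


batches = list(iter_batches([1, 2, 3, 4], batch_size=2))
assert all(len(batch) == 2 for batch in batches)
assert sorted(x for batch in batches for x in batch) == [1, 2, 3, 4]
```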
Merge
Merge multiple datasets into a single datastream by wrapping each one in a Datastream and passing a list of (datastream, proportion) tuples to Datastream.merge. The proportion controls the share of samples each dataset contributes to a batch.
from datastream import Dataset, Datastream

dataset1 = Dataset.from_subscriptable([1, 2, 3, 4])
dataset2 = Dataset.from_subscriptable([5, 6, 7, 8])

# Equal proportions: both datasets contribute equally to each batch.
datastream = Datastream.merge([
    (Datastream(dataset1), 1),
    (Datastream(dataset2), 1),
])

for batch in datastream.data_loader(batch_size=2):
    assert len(batch) == 2
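One way to picture how proportions could shape a merged stream is round-robin interleaving, sketched below in pure Python. `merge_proportional` is a hypothetical helper, and drawing `proportion` consecutive samples from each source in turn is an assumption made for illustration; the library may realize proportions differently.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def merge_proportional(sources: Sequence[Tuple[Iterable[T], int]]) -> Iterator[T]:
    """Hypothetical helper: endlessly draw `proportion` samples from each source in turn."""
    # cycle() restarts a source when it runs out, so the merged stream never ends.
    iterators = [(cycle(source), proportion) for source, proportion in sources]
    while True:
        for iterator, proportion in iterators:
            yield from islice(iterator, proportion)


merged = merge_proportional([([1, 2, 3, 4], 1), ([5, 6, 7, 8], 3)])
batch = list(islice(merged, 4))
assert batch == [1, 5, 6, 7]
```

With proportions 1 and 3 and a batch size of 4, each batch in this sketch holds one sample from the first source and three from the second.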