openfl.utilities.data_splitters
openfl.utilities.data package.
- class openfl.utilities.data_splitters.DataSplitter
Base class for data splitting.
This class should be subclassed when creating specific data splitter classes.
- abstract split(data: Iterable[T], num_collaborators: int) → List[Iterable[T]]
Split the data into a specified number of parts.
- Parameters:
data (Iterable[T]) – The data to be split.
num_collaborators (int) – The number of parts to split the data into.
- Returns:
List[Iterable[T]] – The split data.
- Raises:
NotImplementedError – This is an abstract method and must be overridden in a subclass.
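The subclassing contract described above can be sketched without OpenFL installed. Everything in this block is hypothetical: `SplitterSketch` only mimics the documented abstract interface, and `RoundRobinSplitter` is an illustrative subclass, not a class shipped with the library.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, TypeVar

T = TypeVar("T")


class SplitterSketch(ABC):
    """Hypothetical stand-in for the abstract base class documented above."""

    @abstractmethod
    def split(self, data: Iterable[T], num_collaborators: int) -> List[Iterable[T]]:
        """Split the data into num_collaborators parts."""
        raise NotImplementedError


class RoundRobinSplitter(SplitterSketch):
    """Hypothetical subclass that deals items to collaborators in turn."""

    def split(self, data: Iterable[T], num_collaborators: int) -> List[List[T]]:
        parts: List[List[T]] = [[] for _ in range(num_collaborators)]
        for i, item in enumerate(data):
            parts[i % num_collaborators].append(item)
        return parts


parts = RoundRobinSplitter().split(range(7), 3)
```

Because `split` is declared abstract, instantiating `SplitterSketch` directly raises `TypeError`; only subclasses that override `split` can be created.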
- class openfl.utilities.data_splitters.DirichletNumPyDataSplitter(alpha=0.5, min_samples_per_col=10, seed=0)
Class for splitting numpy arrays of data according to a Dirichlet distribution.
Repeatedly draws per-collaborator proportions from a Dirichlet distribution until the smallest subset reaches the specified minimum size. This is a parametrized version of the non-i.i.d. split used in the FedMA algorithm. Original source: https://github.com/IBM/FedMA/blob/master/utils.py#L96
- Parameters:
alpha (float, optional) – Dirichlet distribution parameter. Defaults to 0.5.
min_samples_per_col (int, optional) – Minimum number of samples per collaborator. Defaults to 10.
seed (int, optional) – Random number generator seed. Defaults to 0.
- split(data, num_collaborators)
Split the data.
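The resampling loop can be illustrated with a standalone NumPy sketch modeled on the FedMA reference linked above. It is not OpenFL's implementation: the function name `dirichlet_split_sketch` is hypothetical, `data` is assumed to be a one-dimensional label array, the result is assumed to be one list of sample indices per collaborator, and the loop is assumed to stop once every collaborator holds at least `min_samples_per_col` items.

```python
import numpy as np


def dirichlet_split_sketch(labels, num_collaborators, alpha=0.5,
                           min_samples_per_col=10, seed=0):
    """Hypothetical sketch of a Dirichlet-based non-i.i.d. index split."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    fair_share = len(labels) / num_collaborators
    min_size = 0
    # Redraw the whole split until the smallest part is large enough.
    while min_size < min_samples_per_col:
        parts = [[] for _ in range(num_collaborators)]
        for c in classes:
            idx_c = rng.permutation(np.flatnonzero(labels == c))
            props = rng.dirichlet(np.full(num_collaborators, alpha))
            # Stop feeding collaborators that already hold their fair share.
            props = props * np.array([len(p) < fair_share for p in parts])
            props = props / props.sum()
            # Turn cumulative proportions into cut points within this class.
            cuts = (np.cumsum(props) * len(idx_c)).astype(int)[:-1]
            for part, chunk in zip(parts, np.split(idx_c, cuts)):
                part.extend(chunk.tolist())
        min_size = min(len(p) for p in parts)
    return parts


labels = np.repeat(np.arange(4), 50)  # 4 classes, 50 samples each
parts = dirichlet_split_sketch(labels, num_collaborators=4)
```

Smaller `alpha` values concentrate each class on fewer collaborators, which makes the split more skewed and may require more redraws before the minimum size is met.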
- class openfl.utilities.data_splitters.EqualNumPyDataSplitter(shuffle=True, seed=0)
Class for splitting numpy arrays of data evenly.
- Parameters:
shuffle (bool, optional) – Flag determining whether to shuffle the dataset before splitting. Defaults to True.
seed (int, optional) – Random number generator seed. Defaults to 0.
- split(data, num_collaborators)
Split the data.
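An even split can be sketched with `numpy.array_split`, which tolerates lengths that do not divide evenly. This is an illustrative approximation rather than the library's code: `equal_split_sketch` is a hypothetical name, and returning index lists is an assumption.

```python
import numpy as np


def equal_split_sketch(data, num_collaborators, shuffle=True, seed=0):
    """Hypothetical sketch: split sample indices into near-equal parts."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(data))
    if shuffle:
        idx = rng.permutation(idx)
    # array_split gives the first (len % n) parts one extra element.
    return [chunk.tolist() for chunk in np.array_split(idx, num_collaborators)]


parts = equal_split_sketch(np.zeros(10), num_collaborators=3)
sizes = [len(p) for p in parts]  # part sizes differ by at most one
```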
- class openfl.utilities.data_splitters.LogNormalNumPyDataSplitter(mu, sigma, num_classes, classes_per_col, min_samples_per_class, seed=0)
Class for splitting numpy arrays of data according to a LogNormal distribution.
Unbalanced (LogNormal) dataset split. This split assumes that only a few classes are assigned to each collaborator. First, it assigns classes_per_col * min_samples_per_class dataset items to each collaborator, so that every collaborator has some data after the split. Then it draws positive integers from a log-normal distribution. Each number is the count of additional dataset items picked from the dataset and assigned to a collaborator. The draw is repeated for each class assigned to a collaborator. This is a parametrized version of the non-i.i.d. data split used in the FedProx algorithm. Original source: https://github.com/litian96/FedProx/blob/master/data/mnist/generate_niid.py#L30
- Parameters:
mu (float) – Distribution hyperparameter.
sigma (float) – Distribution hyperparameter.
num_classes (int) – Number of classes.
classes_per_col (int) – Number of classes assigned to each collaborator.
min_samples_per_class (int) – Minimum number of collaborator samples of each class.
seed (int, optional) – Random number generator seed. Defaults to 0.
Note
This split always discards part of the dataset. Only a randomly selected subset of each class's items is assigned to collaborators.
- split(data, num_collaborators)
Split the data.
- Parameters:
data (np.ndarray) – NumPy-like array of labels.
num_collaborators (int) – Number of collaborators to split the data across. Should be divisible by the number of classes in data.
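The two phases described for this splitter (a guaranteed minimum per class, then log-normally sized extras) can be sketched in simplified form. This is not the library's algorithm: `lognormal_split_sketch` is a hypothetical name, the rule `(col + k) % num_classes` for choosing a collaborator's classes is an assumption, and the extra counts are used unscaled.

```python
import numpy as np


def lognormal_split_sketch(labels, num_collaborators, mu, sigma, num_classes,
                           classes_per_col, min_samples_per_class, seed=0):
    """Hypothetical, simplified sketch of an unbalanced log-normal split."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    # One shuffled pool of still-unassigned indices per class.
    pools = {c: rng.permutation(np.flatnonzero(labels == c)).tolist()
             for c in range(num_classes)}
    parts = [[] for _ in range(num_collaborators)]

    def take(cls, count):
        # Remove up to `count` indices from the pool of class `cls`.
        picked, pools[cls] = pools[cls][:count], pools[cls][count:]
        return picked

    # Phase 1: guarantee min_samples_per_class items of each assigned class.
    for col in range(num_collaborators):
        for k in range(classes_per_col):
            picked = take((col + k) % num_classes, min_samples_per_class)
            if len(picked) < min_samples_per_class:
                raise ValueError("not enough samples for the guaranteed minimum")
            parts[col].extend(picked)

    # Phase 2: add a log-normally distributed number of extra items per class.
    for col in range(num_collaborators):
        for k in range(classes_per_col):
            extra = int(rng.lognormal(mu, sigma))
            parts[col].extend(take((col + k) % num_classes, extra))

    # Whatever remains in the pools is never assigned to any collaborator.
    return parts


labels = np.repeat(np.arange(4), 100)  # 4 classes, 100 samples each
parts = lognormal_split_sketch(labels, num_collaborators=4, mu=1.0, sigma=0.5,
                               num_classes=4, classes_per_col=2,
                               min_samples_per_class=5)
```

Indices left in the pools after both phases are dropped, which mirrors the note above that part of the dataset is always discarded.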
- class openfl.utilities.data_splitters.NumPyDataSplitter
Base class for splitting numpy arrays of data.
This class should be subclassed when creating specific data splitter classes.
- abstract split(data: ndarray, num_collaborators: int) → List[List[int]]
Split the data.
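The `List[List[int]]` return type suggests one list of integers per collaborator. Assuming those integers are row indices into the dataset (an assumption, not stated above), they can be applied to feature and label arrays with NumPy fancy indexing. The arrays and index lists below are made up for illustration.

```python
import numpy as np

features = np.arange(12).reshape(6, 2)  # six samples, two features each
labels = np.array([0, 1, 0, 1, 0, 1])

# Hypothetical splitter output for two collaborators.
index_lists = [[0, 2, 4], [1, 3, 5]]

# One (features, labels) shard per collaborator.
shards = [(features[idx], labels[idx]) for idx in index_lists]
```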
- class openfl.utilities.data_splitters.RandomNumPyDataSplitter(shuffle=True, seed=0)
Class for splitting numpy arrays of data randomly.
- Parameters:
shuffle (bool, optional) – Flag determining whether to shuffle the dataset before splitting. Defaults to True.
seed (int, optional) – Random number generator seed. Defaults to 0.
- split(data, num_collaborators)
Split the data.
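One way to obtain randomly sized parts is to cut the (optionally shuffled) index sequence at random positions. The sketch below assumes that approach. `random_split_sketch` is a hypothetical name, and the library's actual cut-point logic may differ.

```python
import numpy as np


def random_split_sketch(data, num_collaborators, shuffle=True, seed=0):
    """Hypothetical sketch: cut sample indices at random positions."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(data))
    if shuffle:
        idx = rng.permutation(idx)
    # Distinct, sorted cut points; a cut at position 0 yields an empty first part.
    cuts = np.sort(rng.choice(len(data), num_collaborators - 1, replace=False))
    return [chunk.tolist() for chunk in np.split(idx, cuts)]


parts = random_split_sketch(np.zeros(20), num_collaborators=4)
```

Unlike the even split, part sizes here vary from run to run with the seed, and no minimum size is enforced.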
openfl.utilities.data_splitters.data_splitter module.
UnbalancedFederatedDataset module.