Class - LogNormalNumPyDataSplitter
- class openfl.utilities.data_splitters.numpy.LogNormalNumPyDataSplitter(mu, sigma, num_classes, classes_per_col, min_samples_per_class, seed=0)
Bases: NumPyDataSplitter
Class for splitting NumPy arrays of data according to a log-normal distribution.
Unbalanced (log-normal) dataset split. This split assumes that only a few classes are assigned to each collaborator. First, it assigns classes_per_col * min_samples_per_class items of the dataset to each collaborator, so that every collaborator has some data after the split. Then it generates positive integers from a log-normal distribution. Each number is the count of dataset items picked from the dataset and assigned to a collaborator in one step. Generation is repeated for each class assigned to a collaborator. This is a parametrized version of the non-i.i.d. data split used in the FedProx algorithm. Original source: litian96/FedProx
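The two phases described above can be sketched with plain NumPy. This is an illustrative approximation under stated assumptions, not OpenFL's implementation: the function name lognormal_split_sketch, the round-robin assignment of classes to collaborators, and the way each draw is capped by the remaining pool are all hypothetical choices made for the example.

```python
import numpy as np


def lognormal_split_sketch(labels, num_collaborators, mu, sigma,
                           num_classes, classes_per_col,
                           min_samples_per_class, seed=0):
    """Illustrative two-phase split; not OpenFL's actual code."""
    rng = np.random.default_rng(seed)
    # Shuffled pool of still-unassigned sample indices for every class.
    pools = {c: list(rng.permutation(np.flatnonzero(labels == c)))
             for c in range(num_classes)}
    shards = [[] for _ in range(num_collaborators)]

    # Assumption: collaborator `col` gets classes col, col + 1, ... (mod num_classes).
    col_classes = [[(col + k) % num_classes for k in range(classes_per_col)]
                   for col in range(num_collaborators)]

    # Phase 1: guarantee min_samples_per_class items of each assigned class.
    for col, classes in enumerate(col_classes):
        for c in classes:
            if len(pools[c]) < min_samples_per_class:
                raise ValueError(f"not enough samples of class {c}")
            shards[col] += pools[c][:min_samples_per_class]
            del pools[c][:min_samples_per_class]

    # Phase 2: one log-normal draw per (collaborator, class) sets how many
    # extra items are taken; the draw is capped by what is left in the pool.
    for col, classes in enumerate(col_classes):
        for c in classes:
            extra = min(int(rng.lognormal(mu, sigma)) + 1, len(pools[c]))
            shards[col] += pools[c][:extra]
            del pools[c][:extra]

    return [np.array(s, dtype=int) for s in shards]


labels = np.repeat(np.arange(4), 200)  # 4 classes, 200 samples each
shards = lognormal_split_sketch(labels, num_collaborators=4, mu=2.0, sigma=1.0,
                                num_classes=4, classes_per_col=2,
                                min_samples_per_class=5)
print([len(s) for s in shards])             # per-collaborator shard sizes
print(len(labels) - sum(map(len, shards)))  # items left unassigned
```

Every collaborator ends up with at least classes_per_col * min_samples_per_class items, and whatever remains in the pools after the second phase is simply not assigned to anyone.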
- Parameters:
mu (float) – Distribution hyperparameter.
sigma (float) – Distribution hyperparameter.
num_classes (int) – Number of classes.
classes_per_col (int) – Number of classes assigned to each collaborator.
min_samples_per_class (int) – Minimum number of collaborator samples of each class.
seed (int, optional) – Random number generator seed. Defaults to 0.
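The effect of mu and sigma is easiest to see numerically. The sketch below assumes the two hyperparameters are the mean and standard deviation of the underlying normal distribution, as in NumPy's log-normal sampler; whether the splitter passes them through unchanged is an assumption here. The stats dictionary is a name introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
stats = {}
for mu, sigma in [(1.0, 0.5), (1.0, 2.0), (3.0, 0.5)]:
    draws = rng.lognormal(mean=mu, sigma=sigma, size=100_000)
    stats[(mu, sigma)] = (float(np.median(draws)), float(draws.mean()))
    # The median of a log-normal distribution is exp(mu); sigma stretches the tail.
    print(f"mu={mu}, sigma={sigma}: "
          f"median={stats[(mu, sigma)][0]:.2f} (exp(mu)={np.exp(mu):.2f}), "
          f"mean={stats[(mu, sigma)][1]:.2f}")
```

Under that assumption, raising mu shifts the typical draw size upward, while raising sigma leaves the median in place but makes occasional very large draws more likely, so the split becomes more unbalanced.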
Note
This split always drops part of the dataset! The random selection picks only a subset of the items of each class.
- __init__(mu, sigma, num_classes, classes_per_col, min_samples_per_class, seed=0)
Initialize the generator.
- Parameters:
mu (float) – Distribution hyperparameter.
sigma (float) – Distribution hyperparameter.
num_classes (int) – Number of classes.
classes_per_col (int) – Number of classes assigned to each collaborator.
min_samples_per_class (int) – Minimum number of collaborator samples of each class.
seed (int) – Random number generator seed. Defaults to 0. For different splits on envoys, try setting different values for this parameter on each shard descriptor.
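The seed advice can be illustrated without OpenFL. Assuming the splitter seeds a NumPy random generator with this value before drawing the log-normal sizes, two envoys that share a seed draw identical sizes, while distinct seeds give distinct draws. The helper draw_sizes is hypothetical and exists only for this sketch.

```python
import numpy as np


def draw_sizes(seed, mu=2.0, sigma=1.0, n=8):
    """Hypothetical helper: n positive integer sizes from a seeded log-normal."""
    rng = np.random.default_rng(seed)
    return (rng.lognormal(mu, sigma, size=n).astype(int) + 1).tolist()


same_a = draw_sizes(seed=0)
same_b = draw_sizes(seed=0)
other = draw_sizes(seed=1)

print(same_a == same_b)  # identical seeds reproduce the same sizes
print(same_a)
print(other)             # a different seed yields a different sequence
```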
Methods