openfl.utilities.data_splitters.numpy.DirichletNumPyDataSplitter

class openfl.utilities.data_splitters.numpy.DirichletNumPyDataSplitter(alpha=0.5, min_samples_per_col=10, seed=0)

Bases: NumPyDataSplitter

Class for splitting NumPy arrays of data according to a Dirichlet distribution.

Repeatedly draws random proportions from a Dirichlet distribution and converts them into integer subset sizes until the smallest subset meets the specified threshold. This behavior is a parametrized version of the non-i.i.d. split in the FedMA algorithm. Original source: https://github.com/IBM/FedMA/blob/master/utils.py#L96
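The resample-until-threshold idea described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the OpenFL or FedMA implementation: the function name dirichlet_split_sketch, its max_tries guard, and the assumption that the input is a 1-D array of integer class labels are all hypothetical, and any extra balancing the original code may perform is left out.

```python
import numpy as np


def dirichlet_split_sketch(labels, num_collaborators, alpha=0.5,
                           min_samples=10, seed=0, max_tries=1000):
    """Illustrative sketch only; not the OpenFL implementation."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    for _ in range(max_tries):
        shards = [[] for _ in range(num_collaborators)]
        for cls in np.unique(labels):
            idx = np.flatnonzero(labels == cls)
            rng.shuffle(idx)
            # One Dirichlet draw gives each collaborator's share of this class.
            shares = rng.dirichlet(np.full(num_collaborators, alpha))
            # Turn the shares into integer cut points along the shuffled indices.
            cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
            for shard, part in zip(shards, np.split(idx, cuts)):
                shard.extend(part.tolist())
        # Accept the draw only if every collaborator has enough samples.
        if min(len(shard) for shard in shards) >= min_samples:
            return shards
    raise RuntimeError("no split satisfied min_samples within max_tries")
```

Each returned list holds the sample indices assigned to one collaborator. The max_tries guard is an addition for the sketch, so that an unreachable threshold raises an error instead of looping forever.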

Parameters:
  • alpha (float, optional) – Concentration parameter of the Dirichlet distribution. Defaults to 0.5.

  • min_samples_per_col (int, optional) – Minimum number of samples per collaborator. Defaults to 10.

  • seed (int, optional) – Random number generator seed. Defaults to 0.
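To get a feel for alpha, the following sketch compares Dirichlet draws for a small and a large value. It uses only NumPy's own sampler; the helper name mean_largest_share and the chosen values are hypothetical and are not part of OpenFL.

```python
import numpy as np


def mean_largest_share(alpha, num_collaborators=5, draws=1000, seed=0):
    """Average size of the largest share across many Dirichlet draws."""
    rng = np.random.default_rng(seed)
    # Each row is one draw: non-negative shares that sum to 1.
    shares = rng.dirichlet(np.full(num_collaborators, alpha), size=draws)
    return float(shares.max(axis=1).mean())


skewed = mean_largest_share(alpha=0.1)      # one collaborator tends to dominate
balanced = mean_largest_share(alpha=100.0)  # shares stay close to equal
```

Smaller alpha values concentrate most of the mass on a few collaborators, which yields a more strongly non-i.i.d. split. Larger values push the shares toward an even split.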

Methods

split

Split the data.

split(data, num_collaborators)

Split the data.
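As a hedged illustration of how a split result might be consumed: the sketch below assumes that the splitter yields one list of sample indices per collaborator. That return format is not stated on this page. The index_lists value is hand-written for the example rather than produced by OpenFL, and the array names are hypothetical.

```python
import numpy as np

features = np.arange(12).reshape(6, 2)   # six samples, two features each
labels = np.array([0, 0, 1, 1, 2, 2])

# Hypothetical splitter output: one list of sample indices per collaborator.
index_lists = [[0, 2, 5], [1, 3, 4]]

# Index arrays select the rows that belong to each collaborator.
collaborator_data = [(features[idx], labels[idx]) for idx in index_lists]
```

Indexing with the same list on both arrays keeps each feature row aligned with its label.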