Class - DirichletNumPyDataSplitter

class openfl.utilities.data_splitters.numpy.DirichletNumPyDataSplitter(alpha=0.5, min_samples_per_col=10, seed=0)

Bases: NumPyDataSplitter

Class for splitting numpy arrays of data according to a Dirichlet distribution.

Repeatedly draws a random partition of sample indices from a Dirichlet distribution until the smallest subset reaches the specified threshold. This behavior is a parametrized version of the non-i.i.d. split used in the FedMA algorithm. Original source: IBM/FedMA

Parameters:
  • alpha (float, optional) – Concentration parameter of the Dirichlet distribution; smaller values give more skewed splits. Defaults to 0.5.

  • min_samples_per_col (int, optional) – Minimum number of samples per collaborator. Defaults to 10.

  • seed (int, optional) – Random number generator seed. Defaults to 0.
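
The resampling loop described above can be sketched in plain NumPy. This is a minimal illustration of a FedMA-style Dirichlet split, not OpenFL's own code: the function name `dirichlet_split_sketch` is hypothetical, and the sketch assumes the input is a 1-D array of integer class labels and that the result is one list of sample indices per collaborator.

```python
import numpy as np


def dirichlet_split_sketch(labels, num_collaborators, alpha=0.5,
                           min_samples_per_col=10, seed=0):
    """Hypothetical helper: return one list of sample indices per collaborator."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n = len(labels)
    min_size = 0
    # Redraw the whole partition until every collaborator has enough samples.
    while min_size < min_samples_per_col:
        idx_batch = [[] for _ in range(num_collaborators)]
        for k in np.unique(labels):
            idx_k = np.flatnonzero(labels == k)
            rng.shuffle(idx_k)
            # Share of class k assigned to each collaborator.
            proportions = rng.dirichlet(np.repeat(alpha, num_collaborators))
            # Give nothing more to collaborators that already hold an even share.
            has_room = np.array(
                [len(idx_j) < n / num_collaborators for idx_j in idx_batch]
            )
            proportions = proportions * has_room
            proportions = proportions / proportions.sum()
            # Turn cumulative shares into cut points inside idx_k.
            cut_points = (np.cumsum(proportions) * len(idx_k)).astype(int)[:-1]
            for idx_j, part in zip(idx_batch, np.split(idx_k, cut_points)):
                idx_j.extend(part.tolist())
        min_size = min(len(idx_j) for idx_j in idx_batch)
    return idx_batch


# Toy data (an assumption for illustration): 1000 samples, 10 classes.
labels = np.random.default_rng(42).integers(0, 10, size=1000)
parts = dirichlet_split_sketch(labels, num_collaborators=4)
```

Each sample index ends up in exactly one subset. Note that the loop only terminates in reasonable time when `min_samples_per_col * num_collaborators` is comfortably below the dataset size.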

__init__(alpha=0.5, min_samples_per_col=10, seed=0)

Initialize.

Parameters:
  • alpha (float) – Concentration parameter of the Dirichlet distribution; smaller values give more skewed splits. Defaults to 0.5.

  • min_samples_per_col (int) – Minimum number of samples per collaborator. Defaults to 10.

  • seed (int) – Random number generator seed. Defaults to 0. To obtain different splits on different envoys, set a different value for this parameter in each shard descriptor.
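
The effect of the seed can be illustrated with a small NumPy sketch: the same seed reproduces the same Dirichlet proportions, while a different seed yields a different draw. The helper `draw_proportions` is hypothetical, and the use of `np.random.default_rng` is an assumption made for this illustration rather than a statement about how the class seeds its generator.

```python
import numpy as np


def draw_proportions(seed, alpha=0.5, num_collaborators=4):
    """Hypothetical helper: one Dirichlet draw of per-collaborator shares."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.repeat(alpha, num_collaborators))


same_a = draw_proportions(seed=0)
same_b = draw_proportions(seed=0)  # identical to same_a
other = draw_proportions(seed=1)   # a different draw
```

This is why reusing the default seed everywhere would give every envoy the same partition pattern.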

Methods

__init__([alpha, min_samples_per_col, seed])

Initialize.

split(data, num_collaborators)

Split the data.

split(data, num_collaborators)

Split the data.