openfl.federated.data.federated_data.FederatedDataSet

class openfl.federated.data.federated_data.FederatedDataSet(X_train, y_train, X_valid, y_valid, batch_size=1, num_classes=None, train_splitter=None, valid_splitter=None)

Bases: PyTorchDataLoader

A data loader class that represents a federated dataset built from in-memory NumPy data.

Attributes:
train_splitter

An object that splits the training data.

Type:

NumPyDataSplitter

valid_splitter

An object that splits the validation data.

Type:

NumPyDataSplitter

__init__(X_train, y_train, X_valid, y_valid, batch_size=1, num_classes=None, train_splitter=None, valid_splitter=None)

Initializes the FederatedDataSet object.

Parameters:
  • X_train (np.array) – The training features.

  • y_train (np.array) – The training labels.

  • X_valid (np.array) – The validation features.

  • y_valid (np.array) – The validation labels.

  • batch_size (int, optional) – The batch size for the data loader. Defaults to 1.

  • num_classes (int, optional) – The number of classes the model will be trained on. Defaults to None.

  • train_splitter (NumPyDataSplitter, optional) – The object that splits the training data. Defaults to None.

  • valid_splitter (NumPyDataSplitter, optional) – The object that splits the validation data. Defaults to None.
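As a rough illustration of the inputs listed above, the sketch below builds small in-memory NumPy arrays whose feature and label arrays agree on the number of samples, and derives a class count from the labels. The array shapes, the integer label encoding, and the `infer_num_classes` helper are assumptions made for this example; how the class itself handles `num_classes=None` is not stated in this reference.

```python
import numpy as np


def infer_num_classes(y):
    """Hypothetical helper: count the distinct label values in y."""
    return int(np.unique(y).shape[0])


rng = np.random.default_rng(0)

# Toy data: 8 training and 4 validation samples, 3 features each (assumed shapes).
X_train = rng.normal(size=(8, 3)).astype(np.float32)
y_train = np.array([0, 1, 2, 0, 1, 2, 0, 1])
X_valid = rng.normal(size=(4, 3)).astype(np.float32)
y_valid = np.array([0, 1, 2, 0])

# Features and labels must describe the same number of samples.
assert X_train.shape[0] == y_train.shape[0]
assert X_valid.shape[0] == y_valid.shape[0]

num_classes = infer_num_classes(y_train)
print(num_classes)  # 3
```

Arrays prepared this way match the positional arguments `X_train`, `y_train`, `X_valid`, `y_valid` in the signature above.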

Methods

__init__(X_train, y_train, X_valid, y_valid[, ...])

Initializes the FederatedDataSet object.

get_feature_shape()

Returns the shape of an example feature array.

get_infer_loader()

Returns the data loader for inference data.

get_train_data_size()

Returns the total number of training samples.

get_train_loader([batch_size, num_batches])

Returns the data loader for the training data.

get_valid_data_size()

Returns the total number of validation samples.

get_valid_loader([batch_size])

Returns the data loader for the validation data.

split(num_collaborators)

Splits the dataset into equal parts for each collaborator and returns a list of FederatedDataSet objects.

Attributes

get_feature_shape()

Returns the shape of an example feature array.

Returns:

The shape of an example feature array.

Return type:

tuple
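To make "the shape of an example feature array" concrete, here is a minimal NumPy sketch. It assumes the samples are stacked along the first axis, so the per-example shape is the array shape without that leading axis; `feature_shape` is a hypothetical helper, not the method's actual implementation.

```python
import numpy as np


def feature_shape(X):
    """Hypothetical helper: shape of one example, i.e. X without its sample axis."""
    return X[0].shape


X_train = np.zeros((100, 28, 28, 1), dtype=np.float32)  # 100 toy images

print(feature_shape(X_train))  # (28, 28, 1)
print(X_train.shape[0])        # 100, the number of samples
```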

get_infer_loader()

Returns the data loader for inference data.

Raises:

NotImplementedError – This method must be implemented by a child class.
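The override pattern this implies can be sketched in plain Python. `BaseLoader` and `InferenceLoader` are stand-in classes invented for this example, and the batching logic is an assumption about what a child class might supply; neither is part of OpenFL.

```python
class BaseLoader:
    """Stand-in for a loader whose inference hook is left unimplemented."""

    def get_infer_loader(self):
        raise NotImplementedError


class InferenceLoader(BaseLoader):
    """Hypothetical child class that supplies inference batches."""

    def __init__(self, X_infer, batch_size=2):
        self.X_infer = X_infer
        self.batch_size = batch_size

    def get_infer_loader(self):
        # Yield consecutive slices of the inference data.
        for start in range(0, len(self.X_infer), self.batch_size):
            yield self.X_infer[start:start + self.batch_size]


batches = list(InferenceLoader([1, 2, 3, 4, 5]).get_infer_loader())
print(batches)  # [[1, 2], [3, 4], [5]]
```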

get_train_data_size()

Returns the total number of training samples.

Returns:

The total number of training samples.

Return type:

int

get_train_loader(batch_size=None, num_batches=None)

Returns the data loader for the training data.

Parameters:
  • batch_size (int, optional) – The batch size for the data loader. Defaults to None.

  • num_batches (int, optional) – The number of batches for the data loader. Defaults to None.

Returns:

The DataLoader object for the training data.

Return type:

DataLoader
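How `batch_size` and `num_batches` might interact can be sketched with a small NumPy generator. This is an illustration under assumptions, not the loader OpenFL returns: it assumes batches are drawn from a shuffled index order and that `num_batches` caps how many batches are produced. `iterate_batches` and its `seed` parameter are hypothetical.

```python
import numpy as np


def iterate_batches(X, y, batch_size, num_batches=None, seed=0):
    """Hypothetical generator: shuffled mini-batches, optionally capped at num_batches."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[0])
    for i, start in enumerate(range(0, X.shape[0], batch_size)):
        if num_batches is not None and i >= num_batches:
            break
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]


X = np.arange(20).reshape(10, 2)
y = np.arange(10)

all_batches = list(iterate_batches(X, y, batch_size=4))
capped = list(iterate_batches(X, y, batch_size=4, num_batches=2))
print(len(all_batches), len(capped))  # 3 2
```

With 10 samples and a batch size of 4, the last uncapped batch holds the 2 remaining samples.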

get_valid_data_size()

Returns the total number of validation samples.

Returns:

The total number of validation samples.

Return type:

int

get_valid_loader(batch_size=None)

Returns the data loader for the validation data.

Parameters:

batch_size (int, optional) – The batch size for the data loader. Defaults to None.

Returns:

The DataLoader object for the validation data.

Return type:

DataLoader

split(num_collaborators)

Splits the dataset into equal parts for each collaborator and returns a list of FederatedDataSet objects.

Parameters:

num_collaborators (int) – The number of collaborators to split the dataset between.

Returns:

A list of FederatedDataSet objects, each representing a slice of the dataset for a collaborator.

Return type:

list of FederatedDataSet
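The idea of an equal split can be sketched with NumPy alone. The example assumes that "equal parts" means shuffling the sample indices and cutting them into near-equal shards, one per collaborator; `equal_split_indices` is a hypothetical helper, and the splitter objects used by the class may behave differently.

```python
import numpy as np


def equal_split_indices(num_samples, num_collaborators, seed=0):
    """Hypothetical helper: shuffle sample indices and cut them into near-equal shards."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_samples)
    return np.array_split(order, num_collaborators)


X_train = np.arange(20).reshape(10, 2)
y_train = np.arange(10)

# One (features, labels) pair per collaborator.
shards = [
    (X_train[idx], y_train[idx])
    for idx in equal_split_indices(len(y_train), num_collaborators=3)
]
print([len(y) for _, y in shards])  # [4, 3, 3]
```

When the sample count is not divisible by the number of collaborators, `np.array_split` gives the earlier shards one extra sample each, so shard sizes differ by at most one.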