openfl.federated.data.federated_data.FederatedDataSet

class openfl.federated.data.federated_data.FederatedDataSet(X_train, y_train, X_valid, y_valid, batch_size=1, num_classes=None, train_splitter=None, valid_splitter=None)

Bases: PyTorchDataLoader

A data loader class that represents a federated dataset backed by in-memory NumPy data.

Class Attributes:

train_splitter (NumPyDataSplitter) – An object that splits the training data.
valid_splitter (NumPyDataSplitter) – An object that splits the validation data.

Methods

`get_feature_shape`	Returns the shape of an example feature array.
`get_infer_loader`	Returns the data loader for inference data.
`get_train_data_size`	Returns the total number of training samples.
`get_train_loader`	Returns the data loader for the training data.
`get_valid_data_size`	Returns the total number of validation samples.
`get_valid_loader`	Returns the data loader for the validation data.
`split`	Splits the dataset into equal parts for each collaborator and returns a list of FederatedDataSet objects.

Attributes

`train_splitter`
`valid_splitter`

get_feature_shape()

Returns the shape of an example feature array.

Returns: tuple – The shape of an example feature array.

get_infer_loader()

Returns the data loader for inference data.

Raises: NotImplementedError – This method must be implemented by a child class.

get_train_data_size()

Returns the total number of training samples.

Returns: int – The total number of training samples.

get_train_loader(batch_size=None, num_batches=None)

Returns the data loader for the training data.

Parameters:

batch_size (int, optional) – The batch size for the data loader (default is None).
num_batches (int, optional) – The number of batches for the data loader (default is None).

Returns:

DataLoader – The DataLoader object for the training data.
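To illustrate how `batch_size` and `num_batches` can interact in a loader of this kind, here is a minimal, library-independent sketch. The helper `iterate_batches` and its `seed` argument are hypothetical and are not part of OpenFL; the sketch assumes that `num_batches=None` means "cover every sample once" and that the final batch may be shorter than `batch_size`. The actual behavior of `get_train_loader` may differ.

```python
import math
import random


def iterate_batches(features, labels, batch_size, num_batches=None, seed=0):
    """Yield (feature_batch, label_batch) pairs from two equal-length lists.

    Hypothetical helper for illustration only; not an OpenFL function.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # Shuffle sample indices so batches are not drawn in storage order.
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)

    # Without an explicit cap, cover every sample (the last batch may be short).
    total = math.ceil(len(indices) / batch_size)
    if num_batches is not None:
        total = min(total, num_batches)

    for b in range(total):
        chunk = indices[b * batch_size:(b + 1) * batch_size]
        yield [features[i] for i in chunk], [labels[i] for i in chunk]


# Ten samples in batches of four give three batches; the last holds two samples.
all_batches = list(iterate_batches(list(range(10)), list(range(10)), batch_size=4))
capped = list(iterate_batches(list(range(10)), list(range(10)), batch_size=4, num_batches=2))
```

In this sketch, passing `num_batches` only truncates the pass over the data; it never repeats samples to reach the requested count.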

get_valid_data_size()

Returns the total number of validation samples.

Returns: int – The total number of validation samples.

get_valid_loader(batch_size=None)

Returns the data loader for the validation data.

Parameters: batch_size (int, optional) – The batch size for the data loader (default is None).
Returns: DataLoader – The DataLoader object for the validation data.

split(num_collaborators)

Splits the dataset into equal parts for each collaborator and returns a list of FederatedDataSet objects.

Parameters: num_collaborators (int) – The number of collaborators to split the dataset between.
Returns: FederatedDataSets (list) – A list of FederatedDataSet objects, each representing a slice of the dataset for a collaborator.
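As a rough illustration of what "equal parts" can mean when the sample count is not divisible by `num_collaborators`, the sketch below partitions sample indices so that shard sizes differ by at most one. The function `equal_split_indices` is hypothetical and is not an OpenFL API. In the class itself the partition is determined by `train_splitter` and `valid_splitter`, which may, for example, shuffle the indices first; this sketch keeps them in order for clarity.

```python
def equal_split_indices(num_samples, num_collaborators):
    """Partition range(num_samples) into num_collaborators contiguous shards.

    Hypothetical helper for illustration only; not an OpenFL function.
    """
    if num_collaborators < 1:
        raise ValueError("num_collaborators must be at least 1")

    # The first `extra` shards each take one additional sample.
    base, extra = divmod(num_samples, num_collaborators)
    shards = []
    start = 0
    for rank in range(num_collaborators):
        size = base + (1 if rank < extra else 0)
        shards.append(list(range(start, start + size)))
        start += size
    return shards


shards = equal_split_indices(10, 3)
# shards == [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Each shard of indices would then select the rows of the feature and label arrays that belong to one collaborator.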