openfl.federated.data.federated_data.FederatedDataSet
- class openfl.federated.data.federated_data.FederatedDataSet(X_train, y_train, X_valid, y_valid, batch_size=1, num_classes=None, train_splitter=None, valid_splitter=None)
Bases:
PyTorchDataLoader
A data loader class that represents a federated dataset for in-memory NumPy data.
- Parameters:
train_splitter (NumPyDataSplitter)
valid_splitter (NumPyDataSplitter)
- train_splitter
An object that splits the training data.
- Type:
NumPyDataSplitter
- valid_splitter
An object that splits the validation data.
- Type:
NumPyDataSplitter
- __init__(X_train, y_train, X_valid, y_valid, batch_size=1, num_classes=None, train_splitter=None, valid_splitter=None)
Initializes the FederatedDataSet object.
- Parameters:
X_train (np.array) – The training features.
y_train (np.array) – The training labels.
X_valid (np.array) – The validation features.
y_valid (np.array) – The validation labels.
batch_size (int, optional) – The batch size for the data loader. Defaults to 1.
num_classes (int, optional) – The number of classes the model will be trained on. Defaults to None.
train_splitter (NumPyDataSplitter, optional) – The object that splits the training data. Defaults to None.
valid_splitter (NumPyDataSplitter, optional) – The object that splits the validation data. Defaults to None.
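As a rough illustration of the constructor arguments listed above, the following sketch builds four in-memory NumPy arrays and passes them to the class. The toy data, the 80/20 split, and the values chosen for batch_size and num_classes are assumptions made for this example only. The import is guarded so the snippet still runs where OpenFL is not installed.

```python
import numpy as np

# Toy data (an assumption for this sketch): 100 samples, 8 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8)).astype(np.float32)
y = rng.integers(0, 3, size=100)

# Simple 80/20 split into training and validation arrays.
X_train, y_train = X[:80], y[:80]
X_valid, y_valid = X[80:], y[80:]

try:
    # Import path taken from the module name in the page title.
    from openfl.federated.data.federated_data import FederatedDataSet

    # The splitters are left at their default of None.
    fed_data = FederatedDataSet(
        X_train, y_train, X_valid, y_valid, batch_size=16, num_classes=3
    )
except ImportError:
    # OpenFL (or one of its dependencies) is not available here.
    fed_data = None
```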
Methods
__init__(X_train, y_train, X_valid, y_valid) – Initializes the FederatedDataSet object.
get_feature_shape() – Returns the shape of an example feature array.
get_infer_loader() – Returns the data loader for inference data.
get_train_data_size() – Returns the total number of training samples.
get_train_loader([batch_size, num_batches]) – Returns the data loader for the training data.
get_valid_data_size() – Returns the total number of validation samples.
get_valid_loader([batch_size]) – Returns the data loader for the validation data.
split(num_collaborators) – Splits the dataset into equal parts for each collaborator and returns a list of FederatedDataSet objects.
- get_feature_shape()
Returns the shape of an example feature array.
- Returns:
The shape of an example feature array.
- Return type:
tuple
- get_infer_loader()
Returns the data loader for inference data.
- Raises:
NotImplementedError – This method must be implemented by a child class.
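The pattern described here, where the method raises until a child class supplies an implementation, can be sketched with stand-in classes. BaseLoader and InferLoader below are hypothetical names that do not come from OpenFL, and the batching logic is only one plausible way for a child class to fill in the method.

```python
import numpy as np


class BaseLoader:
    """Hypothetical stand-in for a loader with no inference path."""

    def get_infer_loader(self):
        raise NotImplementedError


class InferLoader(BaseLoader):
    """Hypothetical child class that supplies the inference loader."""

    def __init__(self, X_infer, batch_size=2):
        self.X_infer = X_infer
        self.batch_size = batch_size

    def get_infer_loader(self):
        # Yield consecutive, unshuffled slices of the inference features.
        for start in range(0, len(self.X_infer), self.batch_size):
            yield self.X_infer[start:start + self.batch_size]


X_infer = np.arange(10).reshape(5, 2)
batches = list(InferLoader(X_infer).get_infer_loader())

# The base class still raises, as the documentation states.
try:
    BaseLoader().get_infer_loader()
    base_raised = False
except NotImplementedError:
    base_raised = True
```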
- get_train_data_size()
Returns the total number of training samples.
- Returns:
The total number of training samples.
- Return type:
int
- get_train_loader(batch_size=None, num_batches=None)
Returns the data loader for the training data.
- Parameters:
batch_size (int, optional) – The batch size for the data loader (default is None).
num_batches (int, optional) – The number of batches for the data loader (default is None).
- Returns:
The DataLoader object for the training data.
- Return type:
DataLoader
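To make the roles of batch_size and num_batches concrete, here is a minimal NumPy sketch of a shuffled batch generator. The helper batch_generator and its seed parameter are hypothetical and are not part of the OpenFL API. The sketch assumes that leaving num_batches as None means one full pass over the data, which may differ from what the class actually does.

```python
import math

import numpy as np


def batch_generator(X, y, batch_size, num_batches=None, seed=None):
    """Yield shuffled (X_batch, y_batch) pairs (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idxs = rng.permutation(X.shape[0])
    if num_batches is None:
        # Assumption: default to one full pass over the data.
        num_batches = math.ceil(X.shape[0] / batch_size)
    for i in range(num_batches):
        batch_idxs = idxs[i * batch_size:(i + 1) * batch_size]
        yield X[batch_idxs], y[batch_idxs]


X_train = np.arange(20, dtype=np.float32).reshape(10, 2)
y_train = np.arange(10)

# One full pass: 10 samples in batches of 4 gives sizes 4, 4, 2.
full_pass = list(batch_generator(X_train, y_train, batch_size=4, seed=0))

# Capping the pass at two batches stops before the data is exhausted.
capped = list(batch_generator(X_train, y_train, batch_size=4, num_batches=2, seed=0))
```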
- get_valid_data_size()
Returns the total number of validation samples.
- Returns:
The total number of validation samples.
- Return type:
int
- get_valid_loader(batch_size=None)
Returns the data loader for the validation data.
- Parameters:
batch_size (int, optional) – The batch size for the data loader (default is None).
- Returns:
The DataLoader object for the validation data.
- Return type:
DataLoader
- split(num_collaborators)
Splits the dataset into equal parts for each collaborator and returns a list of FederatedDataSet objects.
- Parameters:
num_collaborators (int) – The number of collaborators to split the dataset between.
- Returns:
A list of FederatedDataSet objects, each representing a slice of the dataset for a collaborator.
- Return type:
list of FederatedDataSet
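The idea of an equal split can be sketched in plain NumPy. The helper equal_split below is hypothetical and returns raw array pairs rather than FederatedDataSet objects. It also assumes contiguous, unshuffled shards, whereas the splitter objects configured on the class may order the samples differently.

```python
import numpy as np


def equal_split(X, y, num_collaborators):
    """Return a list of (X_shard, y_shard) pairs of near-equal size."""
    # np.array_split tolerates lengths that do not divide evenly:
    # the first shards receive one extra sample each.
    idx_shards = np.array_split(np.arange(len(y)), num_collaborators)
    return [(X[idx], y[idx]) for idx in idx_shards]


X = np.arange(20).reshape(10, 2)
y = np.arange(10)

shards = equal_split(X, y, num_collaborators=3)
sizes = [len(y_shard) for _, y_shard in shards]
```

With 10 samples and 3 collaborators the shard sizes are 4, 3, and 3, so every sample is assigned to exactly one collaborator.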