Core Components

Open Federated Learning (OpenFL) has the following components:

Short-Lived Components

These components are terminated when the experiment is finished.

  • The Aggregator, which receives model updates from Collaborators and combines them to form the global model.

  • The Collaborator, which uses its local dataset to train the global model.

The Aggregator is framework-agnostic because it operates on tensors in OpenFL's internal representation, while the Collaborator can use a deep learning framework, such as TensorFlow* or PyTorch*, as its computational backend.

Aggregator

The Aggregator is a short-lived entity, which means that its lifespan is limited by the experiment execution time. It orchestrates Collaborators according to the FL plan, performs model aggregation at the end of each round, and acts as a parameter server for Collaborators.

Model weight aggregation logic may be customized via a plugin mechanism.
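
The aggregation step at the end of a round can be illustrated with a minimal sketch. It assumes a weighted average of Collaborator updates, which is a common choice in federated learning; the function name, the TensorDict alias, and the flat-list tensor format are hypothetical and are not OpenFL's plugin API.

```python
from typing import Dict, List

# Hypothetical stand-in for OpenFL's internal tensor representation:
# a mapping from tensor name to a flat list of floats.
TensorDict = Dict[str, List[float]]


def weighted_average(updates: List[TensorDict], weights: List[float]) -> TensorDict:
    """Combine collaborator updates into one global tensor dict."""
    if not updates or len(updates) != len(weights):
        raise ValueError("need one weight per update")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    aggregated: TensorDict = {}
    for name in updates[0]:
        length = len(updates[0][name])
        # Element-wise weighted mean across all collaborators.
        aggregated[name] = [
            sum(w * u[name][i] for u, w in zip(updates, weights)) / total
            for i in range(length)
        ]
    return aggregated


# Two collaborators; the second is weighted three times as heavily,
# for example because it holds three times as much data (an assumption).
updates = [{"dense/kernel": [1.0, 2.0]}, {"dense/kernel": [3.0, 6.0]}]
global_model = weighted_average(updates, weights=[1.0, 3.0])
```

A custom aggregation plugin would replace the body of such a function with different logic, for instance a median or a clipped mean, while keeping the same inputs and output.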

The Aggregator is spawned by the Director when a new experiment is submitted.

Collaborator

The Collaborator is a short-lived entity that manages training the model on local data, which includes

  • executing assigned tasks,

  • converting deep learning framework-specific tensor objects to OpenFL's internal representation, and

  • exchanging model parameters with the Aggregator.

The Collaborator is created by the Envoy when a new experiment is submitted in the Director-based workflow. In the Aggregator-based workflow, the user starts the Collaborator from the CLI.

Every Collaborator is a unique service. The data loader is loaded with a local shard descriptor to perform the tasks included in an FL experiment. At the end of the training task, weight tensors are extracted and sent to the central node, where they are aggregated.

Converting tensor objects is handled by framework adapter plugins. OpenFL includes framework adapters for PyTorch and TensorFlow 2.x. The list of framework adapters is extensible: users can contribute new adapters for deep learning frameworks they would like to see supported in OpenFL.
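
The role of a framework adapter can be sketched as a pair of conversions between a framework-specific model and a framework-neutral tensor dictionary. Everything in this sketch is hypothetical: ToyModel stands in for a real framework model, and the class and method names are chosen for illustration only. Consult the OpenFL adapter interface for the actual signatures.

```python
from typing import Dict, List


class ToyModel:
    """Hypothetical stand-in for a framework-specific model object."""

    def __init__(self) -> None:
        # Weights stored in a "framework-specific" form (tuples here).
        self.layers = {"fc1": (0.5, -0.25), "fc2": (1.0,)}


class ToyFrameworkAdapter:
    """Hypothetical adapter: framework objects <-> plain tensor dict."""

    @staticmethod
    def get_tensor_dict(model: ToyModel) -> Dict[str, List[float]]:
        # Copy weights out of the model into a framework-neutral form.
        return {name: list(values) for name, values in model.layers.items()}

    @staticmethod
    def set_tensor_dict(model: ToyModel, tensor_dict: Dict[str, List[float]]) -> None:
        # Write (for example, aggregated) weights back into the model.
        for name, values in tensor_dict.items():
            model.layers[name] = tuple(values)


model = ToyModel()
exported = ToyFrameworkAdapter.get_tensor_dict(model)
exported["fc2"] = [2.0]  # pretend this value came back from the Aggregator
ToyFrameworkAdapter.set_tensor_dict(model, exported)
```

Keeping the conversion in a separate adapter means the rest of the Collaborator logic never needs to import a specific deep learning framework.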

Long-Lived Components

These components were introduced to support the Director-based workflow.

  • The Director is the central node of the federation. This component starts an Aggregator for each experiment, broadcasts the experiment archive to connected collaborator nodes, and provides updates on the experiment status.

  • The Envoy runs on collaborator nodes and is always connected to the Director. When the Director starts an experiment, the Envoy starts the Collaborator to train the global model.

These components stay available to distribute several experiments in the federation.

Director

The Director is a long-lived entity and is the central node of the federation. It accepts connections from:

  • Frontend clients (data scientists using Interactive Python API (Beta))

  • Envoys, if their Shard Descriptors are compliant with the same data interface

The Director supports concurrent frontend connections. While the Director may take in several experiments, the experiments are executed in series.

When an experiment is reported, the Director starts an Aggregator and sends the experiment data to the involved Envoys. While an experiment is running, the Director oversees the Aggregator and delivers updates on the status of the experiment, including trained model snapshots and metrics, on request.
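
The behavior described above, where several experiments are accepted but executed in series, can be modeled with a simple queue. This is a toy sketch only: the ToyDirector class, its methods, and the first-in-first-out ordering are assumptions made for illustration and do not reflect OpenFL's implementation.

```python
from collections import deque
from typing import Deque, List


class ToyDirector:
    """Hypothetical model of a Director that runs experiments one at a time."""

    def __init__(self, envoys: List[str]) -> None:
        self.envoys = envoys
        self.pending: Deque[str] = deque()
        self.log: List[str] = []

    def submit(self, experiment: str) -> None:
        # Several frontends may submit; experiments wait in FIFO order
        # (the ordering policy is an assumption of this sketch).
        self.pending.append(experiment)

    def run_all(self) -> None:
        # Only one experiment is active at any time.
        while self.pending:
            experiment = self.pending.popleft()
            self.log.append(f"start aggregator for {experiment}")
            for envoy in self.envoys:
                self.log.append(f"send {experiment} to {envoy}")
            self.log.append(f"finish {experiment}")


director = ToyDirector(envoys=["envoy_a", "envoy_b"])
director.submit("exp_1")
director.submit("exp_2")
director.run_all()
```

In the resulting log, every entry for exp_1 precedes the first entry for exp_2, which is what executing experiments in series means here.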

Envoy

The Envoy is a long-lived entity that runs on collaborator nodes connected to the Director.

Every Envoy is matched to one shard descriptor in order to run. When the Director starts an experiment, the Envoy accepts the experiment workspace, prepares the environment, and starts a Collaborator.

The Envoy is also responsible for sending heartbeat messages to the Director. These messages may also include information about resource utilization on the collaborator machine. Refer to the device monitor plugin for details.
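
A heartbeat message of this kind can be pictured as a small structured payload. The sketch below is illustrative only: the Heartbeat class, its field names, and the JSON encoding are assumptions, since the text above does not specify the message schema or the wire format.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class Heartbeat:
    """Hypothetical heartbeat payload; field names are illustrative."""

    envoy_name: str
    is_experiment_running: bool
    timestamp: float = field(default_factory=time.time)
    # Optional resource utilization, e.g. gathered by a device monitor.
    device_utilization: Dict[str, float] = field(default_factory=dict)


def encode_heartbeat(beat: Heartbeat) -> str:
    # JSON is used here only to keep the sketch self-contained.
    return json.dumps(asdict(beat), sort_keys=True)


beat = Heartbeat(
    "envoy_a",
    False,
    timestamp=0.0,
    device_utilization={"gpu0": 0.42},
)
message = encode_heartbeat(beat)
```

A Director receiving such messages could mark an Envoy as unavailable when no heartbeat arrives within some timeout; that policy is likewise an assumption of this sketch.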

Static Diagram

[Figure: director_workflow.svg, static diagram of the Director-based workflow]