InfoSec Overview


Purpose

This document provides the information needed to evaluate OpenFL for real-world deployment in highly sensitive environments. The target audience is InfoSec reviewers who need detailed information about code contents, communication traffic, and potential exploit vectors.

Network Connectivity Overview

OpenFL federations use a hub-and-spoke topology: collaborator clients generate model parameter updates from their local data, and the aggregator server combines those training updates into new models [ref]. Key details about this functionality are:

Connections are made using request/response gRPC connections [ref].
The aggregator listens for connections on a single port, usually chosen by the experiment admin and explicitly defined in the FL plan (e.g., 50051), so all collaborators must be able to send outgoing traffic to this port.
All connections are initiated by the collaborator, i.e., a pull architecture [ref].
The collaborator does not open any listening sockets.
Connections are secured using mutually-authenticated TLS [ref].
Each request/response pair is made over a new TLS connection.
The PKI for a federation can be created using the OpenFL CLI, which internally relies on Python’s cryptography package. The organization hosting the aggregator usually acts as the Certificate Authority (CA) and verifies each identity before signing its certificate.
Currently, the collaborator polls the aggregator at a fixed interval. We have received a request to make this interval configurable on the client side and hope to support that feature soon.
Connection timeouts are set to gRPC defaults.
If the aggregator is not available, the collaborator retries the connection indefinitely. This allows the aggregator to be taken down for bug fixes without the collaborator processes exiting.
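The connection behavior described above (collaborator-initiated requests, a fixed polling interval, indefinite retry while the aggregator is down, and no listening socket on the collaborator) can be sketched as a simple pull loop. This is a minimal illustration under stated assumptions, not OpenFL's collaborator code: `get_tasks` is a hypothetical stand-in for the GetTasks RPC, assumed to raise `ConnectionError` when the aggregator is unreachable and to return an empty list when no work is ready, and `max_attempts` exists only so the sketch can be exercised in a test.

```python
import time
from typing import Callable, List, Optional


def poll_for_tasks(
    get_tasks: Callable[[], List[str]],
    poll_interval_s: float = 10.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Hypothetical pull-style polling loop for a collaborator.

    Every request is initiated here (outbound only); nothing listens locally.
    Assumption: get_tasks raises ConnectionError when the aggregator is down
    and returns [] when no tasks are ready yet.
    max_attempts=None mirrors "retry indefinitely"; it is a test hook only.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            tasks = get_tasks()  # outbound request to the aggregator
        except ConnectionError:
            sleep(poll_interval_s)  # aggregator unavailable: wait, then retry
            continue
        if tasks:
            return tasks
        sleep(poll_interval_s)  # no work yet: poll again after the fixed interval
    raise TimeoutError(f"no tasks received after {attempt} attempts")
```

Passing `sleep=sleeps.append` in place of `time.sleep` lets the loop be driven against a fake aggregator without real waiting. Because every call originates on the collaborator side, this pattern only requires outbound access to the aggregator's port.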

Overview of Contents of Network Messages

Network messages are well-defined protobufs, which can be found in the following files:

Key points about the network messages/protocol:

No executable code is ever sent to the collaborator. All code to be executed is contained within the OpenFL package and the custom FL workspace. The code, along with the FL plan file that specifies the classes and initial parameters to be used, is available for review before the FL plan is executed. This ensures that all potential operations are understood before they take place.
The collaborator typically requests the FL tasks to execute from the aggregator via the GetTasksRequest message [ref].
The aggregator reads the FL plan and returns a GetTasksResponse [ref], which includes metadata (Tasks) [ref] about the Python functions to be invoked by the collaborator. The code for these functions is already installed locally as part of a pre-distributed workspace bundle.
The collaborator then uses its TaskRunner framework to execute the FL tasks on the locally available data, producing output tensors such as model weights or metrics.
During task execution, the collaborator may additionally request tensors from the aggregator via the GetAggregatedTensor RPC method [ref].
Upon task completion, the collaborator transmits the results by emitting a SendLocalTaskResults call [ref] which contains NamedTensor [ref] objects that encode model weight updates or ML metrics such as loss or accuracy (among others).
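The property that no executable code crosses the network can be illustrated with a toy dispatcher: the aggregator's response carries only task metadata (names), and the collaborator resolves those names against functions already installed from the workspace. The sketch below is hypothetical and greatly simplified. `Task` and `NamedTensor` here are plain dataclasses, not the actual protobuf classes, their field names are assumptions, and `LOCAL_TASK_REGISTRY`, `train`, and `validate` are illustrative placeholders rather than OpenFL's TaskRunner API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """Hypothetical, simplified mirror of task metadata: names only, no code."""
    name: str
    function_name: str


@dataclass
class NamedTensor:
    """Hypothetical, simplified mirror of a result tensor: a name plus values."""
    name: str
    values: List[float]


# Illustrative functions standing in for code installed with the workspace.
def train(data: List[float]) -> List[NamedTensor]:
    # Toy "training": the update is just the mean of the local data.
    return [NamedTensor("model_update", [sum(data) / len(data)])]


def validate(data: List[float]) -> List[NamedTensor]:
    # Toy "metric": the number of local samples evaluated.
    return [NamedTensor("num_samples", [float(len(data))])]


# Hypothetical registry: the only functions a task name can ever resolve to.
LOCAL_TASK_REGISTRY: Dict[str, Callable[[List[float]], List[NamedTensor]]] = {
    "train": train,
    "validate": validate,
}


def run_tasks(tasks: List[Task], local_data: List[float]) -> List[NamedTensor]:
    """Resolve each task name locally and collect the resulting tensors."""
    results: List[NamedTensor] = []
    for task in tasks:
        fn = LOCAL_TASK_REGISTRY.get(task.function_name)
        if fn is None:
            # Unknown names are rejected, never executed: the aggregator can
            # only select among code that is already present locally.
            raise KeyError(
                f"task {task.name!r} refers to unknown function {task.function_name!r}"
            )
        results.extend(fn(local_data))
    return results
```

A name the collaborator does not recognize, such as `Task("evil", "os.system")`, raises `KeyError` instead of being executed, which is the behavior a reviewer would expect from a metadata-only protocol.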

Testing a Collaborator

There is a “no-op” workspace template in OpenFL (available in version 1.9 and later) that can be used to test the network connection between the aggregator and each collaborator without performing any computational task. More details can be found here.
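Before running the no-op workspace, a reviewer may also want a lower-level check, from each collaborator host, that outbound TCP traffic to the aggregator's port is permitted. The helper below is a hypothetical sketch using only Python's standard `socket` module; it opens and immediately closes a plain TCP connection, so it confirms firewall reachability only and does not exercise TLS, certificates, or gRPC.

```python
import socket


def can_reach_aggregator(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Hypothetical helper: True if an outbound TCP connection to host:port succeeds.

    Only checks that a network path exists; no TLS handshake is performed
    and no application data is sent.
    """
    try:
        # create_connection resolves the host and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
```

For example, `can_reach_aggregator("aggregator.example.org", 50051)`, where the host name is a placeholder and 50051 is the example port from the FL plan discussion. A `True` result does not guarantee that the mutually authenticated TLS session will succeed; the no-op workspace remains the end-to-end test.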