Metric Logging Callback
By default, both the director-based flow and the TaskRunner API support TensorBoard for logging metrics.
Once the experiment is over, the logs can be viewed from the workspace with tensorboard --logdir logs. The metrics that are logged by default are:
- Aggregated model validation accuracy (Aggregator/aggregated_model_validate/acc, validate_agg/aggregated_model_validate/acc)
- Locally tuned model validation accuracy (Aggregator/locally_tuned_model_validate/acc, validate_local/locally_tuned_model_validate/acc)
- Train loss (Aggregator/train/train_loss, trained/train/train_loss)
You can also use a custom metric logging function for each task via the Python API or the command line interface. This function is called on the aggregator node.
Python API
For logging metrics through TensorBoard, call fl_experiment.stream_metrics() from the frontend API; it saves the logs in TensorBoard format. After the experiment has finished, the logs can be viewed from the workspace with tensorboard --logdir logs.
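For instance, streaming the metrics might look like the following. This is a minimal sketch: it assumes an FLExperiment object named fl_experiment that has already been created and started through the director-based API, and that stream_metrics accepts a tensorboard_logs flag (an assumption; check your OpenFL version).

# A minimal sketch, assuming fl_experiment is an FLExperiment that has
# already been created and started through the director-based API.
# The tensorboard_logs flag is an assumption and may differ across versions.
fl_experiment.stream_metrics(tensorboard_logs=True)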
You can also add your own metric logging function by defining it with the following signature:
def callback_name(node_name, task_name, metric_name, metric, round_number):
    """Write metric callback.

    Args:
        node_name (str): Name of the node that generated the metric
        task_name (str): Name of the task
        metric_name (str): Name of the metric
        metric (np.ndarray): Metric value
        round_number (int): Round number
    """
    # your code here
Example of MLflow's Metric Callback
This example shows how to use the MLflow logger to log metrics:
import mlflow

def callback_name(node_name, task_name, metric_name, metric, round_number):
    """Write metric callback that logs to MLflow.

    Args:
        node_name (str): Name of the node that generated the metric
        task_name (str): Name of the task
        metric_name (str): Name of the metric
        metric (np.ndarray): Metric value
        round_number (int): Round number
    """
    mlflow.log_metrics({
        f'{node_name}/{task_name}/{metric_name}': float(metric),
        'round_number': round_number,
    })
You can view the logged results either interactively through the UI by typing mlflow ui, or programmatically through MlflowClient. By default, only the last logged value of a metric is returned. If you want to retrieve all the values of a given metric, use the MlflowClient.get_metric_history method.
import mlflow

client = mlflow.tracking.MlflowClient()
# Replace <RUN ID> with the ID of the MLflow run to inspect.
print(client.get_metric_history("<RUN ID>", "validate_local/locally_tuned_model_validation/accuracy"))
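get_metric_history returns a list of mlflow.entities.Metric objects rather than bare numbers. If you only need the raw values, a minimal sketch (reusing the placeholder run ID above) is:

import mlflow

client = mlflow.tracking.MlflowClient()
history = client.get_metric_history(
    "<RUN ID>", "validate_local/locally_tuned_model_validation/accuracy")

# Each entry carries the metric value plus the step and timestamp it was logged at.
for entry in history:
    print(entry.step, entry.value)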
Command Line Interface
For logging through TensorBoard, enable the parameter write_logs: true in the aggregator's plan settings:
aggregator:
  template: openfl.component.Aggregator
  settings:
    write_logs: true
Follow the steps below to write your custom callback function instead. As an example, a full implementation can be found in Federated_Pytorch_MNIST_Tutorial.ipynb and in the torch_cnn_mnist workspace.
1. Define the callback function, with the same signature as in the Python API section above, in the src directory of your workspace.
2. Point to your function with the log_metric_callback key in the aggregator section of the plan.yaml file in your workspace:
aggregator:
  defaults: plan/defaults/aggregator.yaml
  template: openfl.component.Aggregator
  settings:
    init_state_path: save/torch_cnn_mnist_init.pbuf
    best_state_path: save/torch_cnn_mnist_best.pbuf
    last_state_path: save/torch_cnn_mnist_last.pbuf
    rounds_to_train: 10
    write_logs: true
    log_metric_callback:
      template: src.mnist_utils.callback_name
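The template value is a dotted import path: src.mnist_utils.callback_name points to a function named callback_name defined in src/mnist_utils.py inside the workspace, which the aggregator imports at runtime.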
Example of TensorBoard's Metric Callback
The following is an example of a log metric callback that writes metric values to TensorBoard.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('./logs/cnn_mnist', flush_secs=5)

def write_metric(node_name, task_name, metric_name, metric, round_number):
    writer.add_scalar("{}/{}/{}".format(node_name, task_name, metric_name),
                      metric, round_number)