Metric Logging
The TaskRunner API supports a built-in callback that logs metrics to a plain text file or in a TensorBoard-compatible format.
To enable metric logging, set the write_logs parameter to true in the plan settings of the aggregator component. An example of the plan settings is shown below:
aggregator :
  template : openfl.component.Aggregator
  settings :
    write_logs : true
Metrics are captured at the end of each round and written to a plain text file. Each metric is identified as <node_name>/<task_name>/<metric_name>. The file can be used for further analysis or visualization. These logs are written under logs/.
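To illustrate the <node_name>/<task_name>/<metric_name> naming scheme, here is a hypothetical plain-text writer. It uses the same callback signature as the MLflow example below. It is a sketch, not OpenFL's built-in implementation. The function name text_metric_callback, the file name logs/metrics.txt, and the tab-separated line layout are all assumptions chosen for the example.

```python
from pathlib import Path

# Hypothetical default location; OpenFL's built-in writer may use another file name.
LOG_FILE = Path("logs") / "metrics.txt"


def text_metric_callback(node_name, task_name, metric_name, metric, round_number,
                         log_file=LOG_FILE):
    """Append one metric to a plain text file and return its tag (illustrative sketch).

    Each line is: <round_number> TAB <node_name>/<task_name>/<metric_name> TAB <value>
    This layout is an assumption, not OpenFL's documented log format.
    """
    tag = f"{node_name}/{task_name}/{metric_name}"
    log_file = Path(log_file)
    # Create the logs/ directory on first use.
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("a", encoding="utf-8") as f:
        # float() accepts Python numbers and 0-d NumPy arrays alike.
        f.write(f"{round_number}\t{tag}\t{float(metric)}\n")
    return tag
```

For example, text_metric_callback("aggregator", "aggregated_model_validation", "accuracy", 0.91, 1) appends a single tab-separated line containing 1, aggregator/aggregated_model_validation/accuracy and 0.91. Appending one line per call keeps each write cheap and leaves earlier rounds intact.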
Example contents of the log file:
To log metrics for visualization in TensorBoard, set the environment variable TENSORBOARD=1 before starting the aggregator and collaborators. This still requires write_logs to be set to true in the plan settings, as shown above.
Summaries are written under logs/tensorboard/. To visualize the logs, run the following command in a separate shell:
tensorboard --logdir logs/tensorboard/
Open the URL printed by this command in a web browser to view the TensorBoard dashboard.
Example of an MLflow Metric Callback
This example shows how to log metrics to MLflow from a custom callback:
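import mlflow


def callback_name(node_name, task_name, metric_name, metric, round_number):
    """
    Write metric callback

    Args:
        node_name (str): Name of the node that generated the metric
        task_name (str): Name of the task
        metric_name (str): Name of the metric
        metric (np.ndarray): Metric value
        round_number (int): Round number
    """
    # Key each metric as <node_name>/<task_name>/<metric_name>; MLflow allows "/" in metric keys.
    # Passing step=round_number records each value at its round instead of at step 0.
    mlflow.log_metrics(
        {f'{node_name}/{task_name}/{metric_name}': float(metric), 'round_number': round_number},
        step=round_number,
    )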
You can view the logged results interactively in the MLflow UI by running mlflow ui, or programmatically with MlflowClient. By default, only the last logged value of each metric is returned.
To retrieve all values of a given metric, use the MlflowClient.get_metric_history method:
import mlflow

client = mlflow.tracking.MlflowClient()
# Replace <RUN ID> with the ID of the MLflow run that recorded the metrics.
print(client.get_metric_history("<RUN ID>", "validate_local/locally_tuned_model_validation/accuracy"))
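MlflowClient.get_metric_history returns a list of mlflow.entities.Metric objects. Each one carries key, value, timestamp and step attributes. The helper below is a hypothetical sketch, not part of MLflow or OpenFL. It turns that list into an ordered per-round series. If the callback passes the round number as MLflow's step argument, steps correspond to rounds. Otherwise MLflow records every value at step 0, and the helper falls back to timestamp order.

```python
def metric_series(history):
    """Return [(step, value), ...] ordered by step, then by logging time.

    `history` is any iterable of objects exposing `step`, `timestamp` and `value`
    attributes, such as the list returned by
    MlflowClient().get_metric_history(run_id, key).
    """
    # Sort by step first; the timestamp breaks ties (e.g. when every value sits at step 0).
    ordered = sorted(history, key=lambda m: (m.step, m.timestamp))
    return [(m.step, m.value) for m in ordered]
```

For example, metric_series(client.get_metric_history("<RUN ID>", "validate_local/locally_tuned_model_validation/accuracy")) yields the metric's values in round order, ready for plotting or further analysis.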
Known issues
Writing metrics to TensorBoard is not supported inside enclaves, because Gramine does not fully support Python multiprocessing.
When metric logging is enabled, metrics are by default written synchronously to a text file only. Outside enclave environments, you can enable TensorBoard logging with the TENSORBOARD=1 environment variable. We are assessing ways to write TensorBoard-compatible proto files synchronously. If you are interested in this feature or would like to contribute, please open an issue or a pull request.