Overall Design¶

1. Logical Architecture¶

DSL defined jobs
Top-down vertical subtask flow scheduling, multi-participant joint subtask coordination
Independent isolated task execution work processes
Support for multiple types and versions of components
Computational abstraction API
Storage abstraction API
Cross-party transfer abstraction API

Stripping state (resources, jobs) and managers (schedulers, resource managers)
Resource state and job state are persisted in MySQL and shared globally to provide reliable transactional operations
Improve the high availability and scalability of managed services
Jobs can be intervened to support restart, rerun, parallel control, resource isolation, etc.

The total resource size of each engine is configured through the configuration file, and the system is subsequently interfaced
The cores_per_node in the total resource size indicates the number of cpu cores per compute node, and nodes indicates the number of compute nodes.
FATEFlow server reads the resource size configuration from the configuration file when it starts and registers the update to the database
The resources are requested in Job dimension, and take effect when Job Conf is submitted, formula: task_parallelism*task_cores
See separate section of the documentation for details

Definition
metric type: metric type, such as auc, loss, ks, etc.
metric namespace: custom metric namespace, e.g. train, predict
metric name: custom metric name, e.g. auc0, hetero_lr_auc0
metric data: metric data in key-value form
metric meta: metric meta information in key-value form, support flexible drawing
API
log_metric_data(metric_namespace, metric_name, metrics)
set_metric_meta(metric_namespace, metric_name, metric_meta)
get_metric_data(metric_namespace, metric_name)
get_metric_meta(metric_namespace, metric_name)

Using Google Protocol Buffer as the model storage protocol, using cross-language sharing, each algorithmic model consists of two parts: ModelParam & ModelMeta
A Pipeline generates a series of algorithmic models
The model named Pipeline stores Pipeline modeling DSL and online inference DSL
Under federal learning, model consistency needs to be guaranteed for all participants, i.e., model binding
model_key is the model identifier defined by the user when submitting the task
The model IDs of the federated parties are the party identification information role, party_id, plus model_key
The model version of the federated parties must be unique and consistent, and FATE-Flow directly sets it to job_id

Upload.
External storage is imported directly to FATE Storage, creating a new DTable
When the job runs, Reader reads directly from Storage
Table Bind.
Key the external storage address to a new DTable in FATE
When the job is running, Reader reads data from external storage via Meta and transfers it to FATE Storage
Connecting to the Big Data ecosystem: HDFS, Hive/MySQL