Data Access
1. Description
- The storage tables of fate are identified by table name and namespace.
- fate provides an upload component for users to upload data to a storage system supported by the fate compute engine.
- If the user's data already exists in a storage system supported by fate, the storage information can be mapped to a fate storage table with table bind.
- If the storage type of a bound table is not consistent with the current default engine, the reader component automatically converts the storage type.
2. Data upload
Uploads the input data for a modeling task to a storage system supported by fate.
flow data upload -c ${conf_path}
Note: conf_path is the path to the configuration file, which accepts the following parameters.
Options
parameter name | required | type | description |
---|---|---|---|
file | yes | string | data storage path |
id_delimiter | yes | string | Data separator, e.g. "," |
head | no | int | Whether the data has a table header |
partition | yes | int | Number of data partitions |
storage_engine | no | string | storage engine type, default "EGGROLL", also support "HDFS", "LOCALFS", "HIVE", etc. |
namespace | yes | string | table namespace |
table_name | yes | string | table name |
storage_address | no | object | Storage address of the corresponding storage engine |
use_local_data | no | int | The default is 1, which uses the data on the client's machine; 0 uses the data on the fate flow service's machine. |
drop | no | int | Whether to overwrite previously uploaded data |
extend_sid | no | bool | Whether to add a new column for uuid id, default False |
auto_increasing_sid | no | bool | Whether the new id column is self-increasing (will only work if extend_sid is True), default False |
meta information
parameter name | required | type | description |
---|---|---|---|
input_format | no | string | The format of the data (dense, svmlight, tag:value), used to determine |
delimiter | no | string | The data separator, default "," |
tag_with_value | no | bool | Valid for tag data format, whether to carry value |
tag_value_delimiter | no | string | tag:value data separator, default ":" |
with_match_id | no | bool | Whether or not to carry match id |
with_match_id | no | object | The name of the id column, effective when extend_sid is enabled, e.g., ["email", "phone"] |
id_range | no | object | For tag/svmlight format data, which columns are ids |
exclusive_data_type | no | string | The format of the special type data columns |
data_type | no | string | Column data type, default "float64" |
with_label | no | bool | Whether to have a label, default False |
label_name | no | string | The name of the label, default "y" |
label_type | no | string | Label type, default "int" |
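The defaults listed in the table above can be summarized as a small lookup. The following is a minimal sketch that only illustrates which value applies when a field is omitted; the helper `resolve_meta` is hypothetical and does not show how fate applies defaults internally.

```python
# Defaults copied from the meta information table above.
META_DEFAULTS = {
    "delimiter": ",",
    "tag_value_delimiter": ":",
    "data_type": "float64",
    "with_label": False,
    "label_name": "y",
    "label_type": "int",
}


def resolve_meta(user_meta):
    """Overlay user-supplied meta values on the documented defaults.

    Hypothetical helper for illustration only.
    """
    return {**META_DEFAULTS, **user_meta}


meta = resolve_meta({"with_label": True, "input_format": "dense"})
print(meta["label_name"], meta["data_type"])
```

Fields without a documented default, such as input_format, appear in the result only when the caller supplies them.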
In version 1.9.0 and later, passing in the meta parameter will generate anonymous information about the feature.
Example
- eggroll
{
"file": "examples/data/breast_hetero_guest.csv",
"id_delimiter": ",",
"head": 1,
"partition": 10,
"namespace": "experiment",
"table_name": "breast_hetero_guest",
"storage_engine": "EGGROLL"
}
- hdfs
{
"file": "examples/data/breast_hetero_guest.csv",
"id_delimiter": ",",
"head": 1,
"partition": 10,
"namespace": "experiment",
"table_name": "breast_hetero_guest",
"storage_engine": "HDFS"
}
- localfs
{
"file": "examples/data/breast_hetero_guest.csv",
"id_delimiter": ",",
"head": 1,
"partition": 4,
"namespace": "experiment",
"table_name": "breast_hetero_guest",
"storage_engine": "LOCALFS"
}
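The configurations above can also be generated from a script. The following is a minimal sketch, assuming Python is available on the client; the helpers `build_upload_conf` and `write_conf` and the temporary output directory are illustrative and not part of fate. The required keys follow the options table above.

```python
import json
import tempfile
from pathlib import Path

# Keys marked "required: yes" in the options table above.
REQUIRED_KEYS = ("file", "id_delimiter", "partition", "namespace", "table_name")


def build_upload_conf(**params):
    """Return an upload configuration dict after checking required keys.

    Hypothetical helper: fate only needs the resulting JSON file.
    """
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise ValueError(f"missing required upload parameters: {missing}")
    return dict(params)


def write_conf(conf, directory):
    """Write the configuration as JSON and return the file path."""
    path = Path(directory) / "upload_conf.json"
    path.write_text(json.dumps(conf, indent=2))
    return path


conf = build_upload_conf(
    file="examples/data/breast_hetero_guest.csv",
    id_delimiter=",",
    head=1,
    partition=10,
    namespace="experiment",
    table_name="breast_hetero_guest",
    storage_engine="EGGROLL",
)
conf_path = write_conf(conf, tempfile.mkdtemp())
# The generated file can then be passed to the CLI shown earlier.
print(f"flow data upload -c {conf_path}")
```

Checking the required keys before writing the file surfaces a missing parameter on the client instead of after the job has been submitted.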
return parameters
parameter name | type | description |
---|---|---|
jobId | string | job id |
retcode | int | return code |
retmsg | string | return message |
data | object | return data |
Example
{
"data": {
"board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202111081218319075660&role=local&party_id=0",
"code": 0,
"dsl_path": "/data/projects/fate/jobs/202111081218319075660/job_dsl.json",
"job_id": "202111081218319075660",
"logs_directory": "/data/projects/fate/logs/202111081218319075660",
"message": "success",
"model_info": {
"model_id": "local-0#model",
"model_version": "202111081218319075660"
},
"namespace": "experiment",
"pipeline_dsl_path": "/data/projects/fate/jobs/202111081218319075660/pipeline_dsl.json",
"runtime_conf_on_party_path": "/data/projects/fate/jobs/202111081218319075660/local/0/job_runtime_on_party_conf.json",
"runtime_conf_path":"/data/projects/fate/jobs/202111081218319075660/job_runtime_conf.json",
"table_name": "breast_hetero_host",
"train_runtime_conf_path":"/data/projects/fate/jobs/202111081218319075660/train_runtime_conf.json"
},
"jobId": "202111081218319075660",
"retcode": 0,
"retmsg": "success"
}
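A caller typically needs the job id and the target table from this response. The following is a minimal sketch, assuming that a retcode of 0 indicates success as in the example above; the helper `check_response` is hypothetical, and the sample string is a shortened copy of the response shown.

```python
import json


def check_response(raw):
    """Parse a response string and return (job_id, data) on success.

    Assumes a non-zero retcode signals failure.
    """
    response = json.loads(raw)
    if response.get("retcode") != 0:
        raise RuntimeError(f"request failed: {response.get('retmsg')}")
    return response.get("jobId"), response.get("data", {})


# Shortened copy of the example response above.
sample = """
{
  "data": {"namespace": "experiment", "table_name": "breast_hetero_host"},
  "jobId": "202111081218319075660",
  "retcode": 0,
  "retmsg": "success"
}
"""
job_id, data = check_response(sample)
print(job_id, data["namespace"], data["table_name"])
```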
3. Table binding
Real storage addresses can be mapped to fate storage tables with table bind.
flow table bind [options]
options
parameters | short format | long format | required | type | description |
---|---|---|---|---|---|
conf_path | -c | --conf-path | yes | string | configuration file path |
Note: conf_path is the path to the configuration file, which accepts the following parameters.
parameter_name | required | type | description |
---|---|---|---|
name | yes | string | fate table name |
namespace | yes | string | fate table namespace |
engine | yes | string | storage engine, supports "HDFS", "MYSQL", "PATH" |
address | yes | object | real storage address |
drop | no | int | Overwrite previous information |
head | no | int | Whether there is a data table header |
id_delimiter | no | string | Data separator |
id_column | no | string | id field |
feature_column | no | array | feature fields |
meta information
parameter name | required | type | description |
---|---|---|---|
input_format | no | string | The format of the data (dense, svmlight, tag:value), used to determine |
delimiter | no | string | The data separator, default "," |
tag_with_value | no | bool | Valid for tag data format, whether to carry value |
tag_value_delimiter | no | string | tag:value data separator, default ":" |
with_match_id | no | bool | Whether or not to carry match id |
with_match_id | no | object | The name of the id column, effective when extend_sid is enabled, e.g., ["email", "phone"] |
id_range | no | object | For tag/svmlight format data, which columns are ids |
exclusive_data_type | no | string | The format of the special type data columns |
data_type | no | string | Column data type, default "float64" |
with_label | no | bool | Whether to have a label, default False |
label_name | no | string | The name of the label, default "y" |
label_type | no | string | Label type, default "int" |
In version 1.9.0 and later, passing in the meta parameter during table bind does not immediately generate anonymous information about the features. The feature anonymization information of the original data is updated after the data has passed through the reader component once.
Sample
- hdfs
{
"namespace": "experiment",
"name": "breast_hetero_guest",
"engine": "HDFS",
"address": {
"name_node": "hdfs://fate-cluster",
"path": "/data/breast_hetero_guest.csv"
},
"id_delimiter": ",",
"head": 1,
"partitions": 10
}
- mysql
{
"engine": "MYSQL",
"address": {
"user": "fate",
"passwd": "fate",
"host": "127.0.0.1",
"port": 3306,
"db": "experiment",
"name": "breast_hetero_guest"
},
"namespace": "experiment",
"name": "breast_hetero_guest",
"head": 1,
"id_delimiter": ",",
"partitions": 10,
"id_column": "id",
"feature_column": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9"
}
- PATH
{
"namespace": "xxx",
"name": "xxx",
"engine": "PATH",
"address": {
"path": "xxx"
}
}
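The samples above use a different set of address keys for each engine. The following is a minimal sketch of a pre-submission check built only from those samples; the helper `check_bind_conf` is hypothetical, and the key sets are what the samples show rather than an exhaustive schema.

```python
# Address keys observed in the samples above; not an exhaustive schema.
ADDRESS_KEYS = {
    "HDFS": {"name_node", "path"},
    "MYSQL": {"user", "passwd", "host", "port", "db", "name"},
    "PATH": {"path"},
}


def check_bind_conf(conf):
    """Return a list of problems found in a table bind configuration."""
    problems = []
    for key in ("name", "namespace", "engine", "address"):
        if key not in conf:
            problems.append(f"missing required parameter: {key}")
    engine = conf.get("engine")
    if engine in ADDRESS_KEYS:
        absent = ADDRESS_KEYS[engine] - set(conf.get("address", {}))
        if absent:
            problems.append(f"address is missing keys: {sorted(absent)}")
    elif engine is not None:
        problems.append(f"engine not covered by this sketch: {engine}")
    return problems


path_conf = {
    "namespace": "xxx",
    "name": "xxx",
    "engine": "PATH",
    "address": {"path": "xxx"},
}
print(check_bind_conf(path_conf))
```

An empty list means the configuration has every key that the corresponding sample has; it says nothing about whether the address is reachable.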
returns
parameter name | type | description |
---|---|---|
retcode | int | return code |
retmsg | string | return information |
data | object | return data |
Sample
{
"data": {
"namespace": "xxx",
"table_name": "xxx"
},
"retcode": 0,
"retmsg": "success"
}
4. Table information query
Query information about a fate table (real storage address, record count, schema, etc.)
flow table info [options]
options
parameters | short-format | long-format | required | type | description |
---|---|---|---|---|---|
table_name | -t | --table-name | yes | string | fate table name |
namespace | -n | --namespace | yes | string | fate table namespace |
returns
parameter name | type | description |
---|---|---|
retcode | int | return code |
retmsg | string | return information |
data | object | return data |
Sample
{
"data": {
"address": {
"home": null,
"name": "breast_hetero_guest",
"namespace": "experiment"
},
"count": 569,
"exists": 1,
"namespace": "experiment",
"partition": 4,
"schema": {
"header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9",
"sid": "id"
},
"table_name": "breast_hetero_guest"
},
"retcode": 0,
"retmsg": "success"
}
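The schema in this response stores the column names as one comma-separated string. The following is a minimal sketch of turning the response into a compact summary; the helper `summarize_table_info` is hypothetical, and the sample is a shortened copy of the response above.

```python
def summarize_table_info(response):
    """Extract the most commonly needed fields from a table info response."""
    data = response["data"]
    schema = data.get("schema", {})
    header = schema.get("header", "")
    # The header is a single comma-separated string of column names.
    columns = header.split(",") if header else []
    return {
        "table": f'{data["namespace"]}.{data["table_name"]}',
        "exists": bool(data.get("exists")),
        "count": data.get("count"),
        "id_column": schema.get("sid"),
        "columns": columns,
    }


# Shortened copy of the sample response above.
sample = {
    "data": {
        "count": 569,
        "exists": 1,
        "namespace": "experiment",
        "partition": 4,
        "schema": {"header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9", "sid": "id"},
        "table_name": "breast_hetero_guest",
    },
    "retcode": 0,
    "retmsg": "success",
}
summary = summarize_table_info(sample)
print(summary["table"], summary["count"], len(summary["columns"]))
```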
5. Delete table data
You can delete table data with table delete.
flow table delete [options]
Options
parameters | short-format | long-format | required | type | description |
---|---|---|---|---|---|
table_name | -t | --table-name | yes | string | fate table name |
namespace | -n | --namespace | yes | string | fate table namespace |
returns
parameter name | type | description |
---|---|---|
retcode | int | return code |
retmsg | string | return message |
data | object | return data |
Sample
{
"data": {
"namespace": "xxx",
"table_name": "xxx"
},
"retcode": 0,
"retmsg": "success"
}
6. Download data
Brief description:
Used to download data from the fate storage engine to a file.
flow data download -c ${conf_path}
Note: conf_path is the path to the configuration file, which accepts the following parameters.
Options
parameter name | required | type | description |
---|---|---|---|
output_path | yes | string | download path |
table_name | yes | string | fate table name |
namespace | yes | string | fate table namespace |
Example:
{
"output_path": "/data/projects/fate/breast_hetero_guest.csv",
"namespace": "experiment",
"table_name": "breast_hetero_guest"
}
return parameters
parameter name | type | description |
---|---|---|
jobId | string | job id |
retcode | int | return code |
retmsg | string | return message |
data | object | return data |
Example
{
"data": {
"board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202111081457135282090&role=local&party_id=0",
"code": 0,
"dsl_path": "/data/projects/fate/jobs/202111081457135282090/job_dsl.json",
"job_id": "202111081457135282090",
"logs_directory": "/data/projects/fate/logs/202111081457135282090",
"message": "success",
"model_info": {
"model_id": "local-0#model",
"model_version": "202111081457135282090"
},
"pipeline_dsl_path": "/data/projects/fate/jobs/202111081457135282090/pipeline_dsl.json",
"runtime_conf_on_party_path": "/data/projects/fate/jobs/202111081457135282090/local/0/job_runtime_on_party_conf.json",
"runtime_conf_path": "/data/projects/fate/jobs/202111081457135282090/job_runtime_conf.json",
"train_runtime_conf_path": "/data/projects/fate/jobs/202111081457135282090/train_runtime_conf.json"
},
"jobId": "202111081457135282090",
"retcode": 0,
"retmsg": "success"
}
7. Disable data
Tables can be made unavailable with table disable.
flow table disable [options]
Options
parameters | short-format | long-format | required | type | description |
---|---|---|---|---|---|
table_name | -t | --table-name | yes | string | fate table name |
namespace | -n | --namespace | yes | string | fate table namespace |
returns
parameter name | type | description |
---|---|---|
retcode | int | return code |
retmsg | string | return information |
data | object | return data |
Sample
{
"data": {
"namespace": "xxx",
"table_name": "xxx"
},
"retcode": 0,
"retmsg": "success"
}
8. Enable data
Tables can be made available with table enable.
flow table enable [options]
Options
parameters | short-format | long-format | required | type | description |
---|---|---|---|---|---|
table_name | -t | --table-name | yes | string | fate table name |
namespace | -n | --namespace | yes | string | fate table namespace |
returns
parameter name | type | description |
---|---|---|
retcode | int | return code |
retmsg | string | return information |
data | object | return data |
Sample
{
"data": [{
"namespace": "xxx",
"table_name": "xxx"
}],
"retcode": 0,
"retmsg": "success"
}
9. Delete disabled data
Tables that are currently unavailable can be deleted with disable-delete.
flow table disable-delete
return
parameter name | type | description |
---|---|---|
retcode | int | return code |
retmsg | string | return information |
data | object | return data |
Sample
{
"data": [
{
"namespace": "xxx",
"table_name": "xxx"
},
{
"namespace": "xxx",
"table_name": "xxx"
}
],
"retcode": 0,
"retmsg": "success"
}
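In the samples above, the data field is a single object for table disable and a list for table enable and disable-delete. The following is a minimal sketch that handles both shapes uniformly; the helper `affected_tables` is hypothetical.

```python
def affected_tables(response):
    """Return (namespace, table_name) pairs from a table command response.

    Accepts both shapes seen in the samples: one object or a list of objects.
    """
    data = response.get("data") or []
    if isinstance(data, dict):
        data = [data]
    return [(item["namespace"], item["table_name"]) for item in data]


single = {"data": {"namespace": "xxx", "table_name": "xxx"}, "retcode": 0}
several = {
    "data": [
        {"namespace": "ns1", "table_name": "t1"},
        {"namespace": "ns2", "table_name": "t2"},
    ],
    "retcode": 0,
}
print(affected_tables(single))
print(affected_tables(several))
```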
10. Writer
Brief description:
Used to write data from the fate storage engine to an external engine, or to save the data as a new table.
flow data writer -c ${conf_path}
Note: conf_path is the path to the configuration file, which accepts the following parameters.
Options
parameter name | required | type | description |
---|---|---|---|
table_name | yes | string | fate table name |
namespace | yes | string | fate table namespace |
storage_engine | no | string | Storage type, e.g., MYSQL |
address | no | object | storage_address |
output_namespace | no | string | Save as a table namespace for fate |
output_name | no | string | Save as fate's table name |
**Note: storage_engine and address are combined parameters and are supplied together; they write the data to the specified engine.
output_namespace and output_name are also combined parameters; they save the data as a new table in the same engine.**
Example:
{
"table_name": "name1",
"namespace": "namespace1",
"output_name": "name2",
"output_namespace": "namespace2"
}
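Because the writer parameters come in pairs, a configuration that supplies only one half of a pair is probably a mistake. The following is a minimal sketch of such a check; the helper `writer_mode` and the labels "export" and "save_as" are made up for this illustration, and whether fate accepts both pairs in one configuration is not stated here.

```python
def writer_mode(conf):
    """Classify a writer configuration by the parameter pairs it contains.

    Raises ValueError when only one half of a combined pair is present.
    """
    has_engine = "storage_engine" in conf
    has_address = "address" in conf
    has_out_namespace = "output_namespace" in conf
    has_out_name = "output_name" in conf
    if has_engine != has_address:
        raise ValueError("storage_engine and address must be given together")
    if has_out_namespace != has_out_name:
        raise ValueError("output_namespace and output_name must be given together")
    modes = []
    if has_engine:
        modes.append("export")  # write to the specified engine
    if has_out_namespace:
        modes.append("save_as")  # save as a new table of the same engine
    return modes


example = {
    "table_name": "name1",
    "namespace": "namespace1",
    "output_name": "name2",
    "output_namespace": "namespace2",
}
print(writer_mode(example))
```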
return
parameter name | type | description |
---|---|---|
jobId | string | job id |
retcode | int | return code |
retmsg | string | return information |
data | object | return data |
Example
{
"data": {
"board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202201121235115028490&role=local&party_id=0",
"code": 0,
"dsl_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/job_dsl.json",
"job_id": "202201121235115028490",
"logs_directory": "/data/projects/fate/fateflow/logs/202201121235115028490",
"message": "success",
"model_info": {
"model_id": "local-0#model",
"model_version": "202201121235115028490"
},
"pipeline_dsl_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/pipeline_dsl.json",
"runtime_conf_on_party_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/local/0/job_runtime_on_party_conf.json",
"runtime_conf_path":"/data/projects/fate/fateflow/jobs/202201121235115028490/job_runtime_conf.json",
"train_runtime_conf_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/train_runtime_conf.json"
},
"jobId": "202201121235115028490",
"retcode": 0,
"retmsg": "success"
}
11. Reader component
Brief description:
- The reader component is a data input component of fate.
- The reader component converts input data into data of the specified storage type.
Parameter configuration:
The input table of the reader is configured in the conf when submitting the job:
{
  "role": {
    "guest": {
      "0": {
        "reader_0": {
          "table": {
            "name": "breast_hetero_guest",
            "namespace": "experiment"
          }
        }
      }
    }
  }
}
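The nesting in this fragment is role, then party index, then component name, then the table reference. The following is a minimal sketch that builds the same fragment from its parts; the helper `reader_role_conf` is hypothetical and only mirrors the structure shown above.

```python
import json


def reader_role_conf(role, party_index, component, name, namespace):
    """Build the reader input fragment: role -> index -> component -> table."""
    table = {"name": name, "namespace": namespace}
    # Party indexes are JSON object keys, so they are written as strings.
    return {"role": {role: {str(party_index): {component: {"table": table}}}}}


fragment = reader_role_conf(
    "guest", 0, "reader_0", "breast_hetero_guest", "experiment"
)
print(json.dumps(fragment, indent=2))
```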
Component Output
The storage engine of the component's output data is determined by the configuration file conf/service_conf.yaml, with the following configuration item:
default_engines:
  storage: eggroll
- The computing engine and the storage engine depend on each other. The supported combinations are as follows:
computing_engine | storage_engine |
---|---|
standalone | standalone |
eggroll | eggroll |
spark | hdfs (distributed), localfs (standalone) |
- The reader component's input data storage type supports eggroll, hdfs, localfs, mysql, path, etc.
- The reader component's output data storage type is determined by the default_engines.storage configuration (except for path).
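The dependency table above can be expressed as a lookup when checking a deployment's default engines. The following is a minimal sketch; the mapping is copied from the table, and the helper `is_supported` is hypothetical.

```python
# Computing engine -> storage engines, copied from the table above.
SUPPORTED_STORAGE = {
    "standalone": {"standalone"},
    "eggroll": {"eggroll"},
    "spark": {"hdfs", "localfs"},
}


def is_supported(computing_engine, storage_engine):
    """Return True if the table lists this storage engine for the computing engine."""
    allowed = SUPPORTED_STORAGE.get(computing_engine.lower(), set())
    return storage_engine.lower() in allowed


print(is_supported("spark", "hdfs"))
print(is_supported("eggroll", "hdfs"))
```

This covers the default output storage only; as noted above, the reader accepts additional storage types such as mysql and path as input.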
12. api-reader
Brief description:
- The data input of the api-reader component is ids, and the data output is features.
- Request parameters can be user-defined, e.g. version number, back month, etc.
- The component requests third-party services. These services need to implement the upload, query, and download interfaces and register with fate flow; see the api-reader service registration documentation for details.
Parameter configuration:
Configure the api-reader parameters in the conf when submitting the job:
{
  "role": {
    "guest": {
      "0": {
        "api_reader_0": {
          "server_name": "xxx",
          "parameters": {"version": "xxx"},
          "id_delimiter": ",",
          "head": true
        }
      }
    }
  }
}