# Data Access
## 1. Description
- The storage tables of fate are identified by table name and namespace.
- fate provides an upload component for users to upload data to a storage system supported by the fate compute engine.
- If the user's data already exists in a storage system supported by fate, the storage information can be mapped to a fate storage table by table bind.
- If the storage type of a bound table is not consistent with the current default engine, the reader component will automatically convert the storage type.
## 2. Data upload
Used to upload the input data of a modeling task to a storage system supported by fate.

```bash
flow data upload -c ${conf_path}
```

Note: conf_path is the path to the configuration file; the specific parameters are as follows.
Options

| parameter name | required | type | description |
| --- | --- | --- | --- |
| file | yes | string | data storage path |
| id_delimiter | yes | string | data delimiter, e.g. "," |
| head | no | int | whether the data has a header row |
| partition | yes | int | number of data partitions |
| storage_engine | no | string | storage engine type, default "EGGROLL"; also supports "HDFS", "LOCALFS", "HIVE", etc. |
| namespace | yes | string | table namespace |
| table_name | yes | string | table name |
| storage_address | no | object | storage address required by the corresponding storage engine |
| use_local_data | no | int | default 1, use the data from the client's machine; 0 means use the data from the fate flow service's machine |
| drop | no | int | whether to overwrite an existing upload |
| extend_sid | no | bool | whether to add a new uuid id column, default False |
| auto_increasing_sid | no | bool | whether the new id column is auto-incrementing (only works if extend_sid is True), default False |
Example

- eggroll

```json
{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 10,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "EGGROLL"
}
```

- hdfs

```json
{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 10,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "HDFS"
}
```

- localfs

```json
{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 4,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "LOCALFS"
}
```
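
The upload can be driven entirely from the shell. The sketch below assumes the eggroll configuration above is saved as `upload_conf.json` (an arbitrary file name) and that the FATE Flow client has already been initialized with `flow init`.

```bash
# Save the upload configuration (the eggroll example above) to a file,
# then submit the upload job through the FATE Flow client.
cat > upload_conf.json << 'EOF'
{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 10,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "EGGROLL"
}
EOF

flow data upload -c upload_conf.json
```

The jobId in the response identifies the upload job, which runs like any other fate job and can be tracked on fateboard or through the job query commands.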

return parameters

| parameter name | type | description |
| --- | --- | --- |
| jobId | string | job id |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |
Example

```json
{
    "data": {
        "board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202111081218319075660&role=local&party_id=0",
        "code": 0,
        "dsl_path": "/data/projects/fate/jobs/202111081218319075660/job_dsl.json",
        "job_id": "202111081218319075660",
        "logs_directory": "/data/projects/fate/logs/202111081218319075660",
        "message": "success",
        "model_info": {
            "model_id": "local-0#model",
            "model_version": "202111081218319075660"
        },
        "namespace": "experiment",
        "pipeline_dsl_path": "/data/projects/fate/jobs/202111081218319075660/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/jobs/202111081218319075660/local/0/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/jobs/202111081218319075660/job_runtime_conf.json",
        "table_name": "breast_hetero_host",
        "train_runtime_conf_path": "/data/projects/fate/jobs/202111081218319075660/train_runtime_conf.json"
    },
    "jobId": "202111081218319075660",
    "retcode": 0,
    "retmsg": "success"
}
```
## 3. Table binding
Real storage addresses can be mapped to fate storage tables via table bind.

```bash
flow table bind [options]
```

Note: the parameters are provided through a configuration file (conf_path); the specific parameters are as follows.
Options

| parameter name | required | type | description |
| --- | --- | --- | --- |
| name | yes | string | fate table name |
| namespace | yes | string | fate table namespace |
| engine | yes | string | storage engine, supports "HDFS", "MYSQL", "PATH" |
| address | yes | object | real storage address |
| drop | no | int | whether to overwrite the previous binding information |
| head | no | int | whether the data has a header row |
| id_delimiter | no | string | data delimiter |
| id_column | no | string | id field |
| feature_column | no | array | feature fields |
Example

- hdfs

```json
{
    "namespace": "experiment",
    "name": "breast_hetero_guest",
    "engine": "HDFS",
    "address": {
        "name_node": "hdfs://fate-cluster",
        "path": "/data/breast_hetero_guest.csv"
    },
    "id_delimiter": ",",
    "head": 1,
    "partitions": 10
}
```

- mysql

```json
{
    "engine": "MYSQL",
    "address": {
        "user": "fate",
        "passwd": "fate",
        "host": "127.0.0.1",
        "port": 3306,
        "db": "experiment",
        "name": "breast_hetero_guest"
    },
    "namespace": "experiment",
    "name": "breast_hetero_guest",
    "head": 1,
    "id_delimiter": ",",
    "partitions": 10,
    "id_column": "id",
    "feature_column": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9"
}
```

- PATH

```json
{
    "namespace": "xxx",
    "name": "xxx",
    "engine": "PATH",
    "address": {
        "path": "xxx"
    }
}
```
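
As a rough sketch, the HDFS binding above can be registered from the shell. It assumes the configuration is saved as `bind_conf.json` (an arbitrary file name) and that the client accepts the parameters as a configuration file via `-c`, as the note above indicates.

```bash
# Write the HDFS bind configuration to a file and register it as a fate table.
cat > bind_conf.json << 'EOF'
{
    "namespace": "experiment",
    "name": "breast_hetero_guest",
    "engine": "HDFS",
    "address": {
        "name_node": "hdfs://fate-cluster",
        "path": "/data/breast_hetero_guest.csv"
    },
    "id_delimiter": ",",
    "head": 1,
    "partitions": 10
}
EOF

flow table bind -c bind_conf.json
```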

return parameters

| parameter name | type | description |
| --- | --- | --- |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |
Sample

```json
{
    "data": {
        "namespace": "xxx",
        "table_name": "xxx"
    },
    "retcode": 0,
    "retmsg": "success"
}
```
## 4. Table information query

Query information about a fate table (real storage address, record count, schema, etc.).

```bash
flow table info [options]
```
Options

| parameter name | required | type | description |
| --- | --- | --- | --- |
| name | yes | string | fate table name |
| namespace | yes | string | fate table namespace |
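
For a quick check from the shell, the table can be queried directly; this assumes the client exposes the `-t`/`-n` shorthand for table name and namespace, as in the other table subcommands.

```bash
# Query metadata (storage address, record count, schema, ...) of a fate table.
flow table info -t breast_hetero_guest -n experiment
```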

return parameters

| parameter name | type | description |
| --- | --- | --- |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |
Sample

```json
{
    "data": {
        "address": {
            "home": null,
            "name": "breast_hetero_guest",
            "namespace": "experiment"
        },
        "count": 569,
        "exists": 1,
        "namespace": "experiment",
        "partition": 4,
        "schema": {
            "header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9",
            "sid": "id"
        },
        "table_name": "breast_hetero_guest"
    },
    "retcode": 0,
    "retmsg": "success"
}
```
## 5. Delete table data

You can delete table data with table delete.

```bash
flow table delete [options]
```
Options

| parameter name | required | type | description |
| --- | --- | --- | --- |
| name | yes | string | fate table name |
| namespace | yes | string | fate table namespace |
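
For example, the table uploaded earlier can be removed from the shell; as above, this assumes the `-t`/`-n` options for table name and namespace.

```bash
# Delete the data of a fate table by name and namespace.
flow table delete -t breast_hetero_guest -n experiment
```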

return parameters

| parameter name | type | description |
| --- | --- | --- |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |
Sample

```json
{
    "data": {
        "namespace": "xxx",
        "table_name": "xxx"
    },
    "retcode": 0,
    "retmsg": "success"
}
```
## 6. Download data

Brief description:

Used to download data from the fate storage engine to a file.

```bash
flow data download -c ${conf_path}
```

Note: conf_path is the path to the configuration file; the specific parameters are as follows.
Options

| parameter name | required | type | description |
| --- | --- | --- | --- |
| output_path | yes | string | download path |
| table_name | yes | string | fate table name |
| namespace | yes | string | fate table namespace |
Example:

```json
{
    "output_path": "/data/projects/fate/breast_hetero_guest.csv",
    "namespace": "experiment",
    "table_name": "breast_hetero_guest"
}
```
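
A minimal sketch of the download flow, assuming the configuration above is saved as `download_conf.json` (an arbitrary file name):

```bash
# Export the fate table back to a local CSV file.
cat > download_conf.json << 'EOF'
{
    "output_path": "/data/projects/fate/breast_hetero_guest.csv",
    "namespace": "experiment",
    "table_name": "breast_hetero_guest"
}
EOF

flow data download -c download_conf.json
```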

return parameters

| parameter name | type | description |
| --- | --- | --- |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |
Example

```json
{
    "data": {
        "board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202111081457135282090&role=local&party_id=0",
        "code": 0,
        "dsl_path": "/data/projects/fate/jobs/202111081457135282090/job_dsl.json",
        "job_id": "202111081457135282090",
        "logs_directory": "/data/projects/fate/logs/202111081457135282090",
        "message": "success",
        "model_info": {
            "model_id": "local-0#model",
            "model_version": "202111081457135282090"
        },
        "pipeline_dsl_path": "/data/projects/fate/jobs/202111081457135282090/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/jobs/202111081457135282090/local/0/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/jobs/202111081457135282090/job_runtime_conf.json",
        "train_runtime_conf_path": "/data/projects/fate/jobs/202111081457135282090/train_runtime_conf.json"
    },
    "jobId": "202111081457135282090",
    "retcode": 0,
    "retmsg": "success"
}
```
## 7. Reader component

Brief description:

- The reader component is the data input component of fate;
- The reader component converts the input data into data of the specified storage type.

Parameter configuration:

The input table of the reader is configured in the conf when submitting a job:
```json
{
    "role": {
        "guest": {
            "0": {
                "reader_0": {
                    "table": {
                        "name": "breast_hetero_guest",
                        "namespace": "experiment"
                    }
                }
            }
        }
    }
}
```
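
The snippet above is only the fragment of the job conf that the reader consumes; it takes effect when the full conf and DSL are submitted, for example (the file names here are placeholders):

```bash
# Submit a job whose conf contains the reader_0 table configuration above.
flow job submit -c job_conf.json -d job_dsl.json
```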

Component Output

The output data storage engine of the component is determined by the configuration file conf/service_conf.yaml, with the following configuration items:

```yaml
default_engines:
  storage: eggroll
```
- The computing engine and the storage engine have certain support dependencies on each other; the list of dependencies is as follows.

| computing_engine | storage_engine |
| --- | --- |
| standalone | standalone |
| eggroll | eggroll |
| spark | hdfs (distributed), localfs (standalone) |
- The reader component's input data storage type supports: eggroll, hdfs, localfs, mysql, path, etc.
- The reader component's output data storage type is determined by the default_engines.storage configuration (except for PATH).