
Data Access

1. Description

  • A fate storage table is identified by its table name and namespace.

  • fate provides an upload component that lets users upload data to a storage system supported by the fate compute engine.

  • If the data already exists in a storage system supported by fate, its storage information can be mapped to a fate storage table with table bind.

  • If the storage type of a bound table differs from the current default engine, the reader component converts the storage type automatically.

2. data upload

Uploads the input data for a modeling task to a storage system supported by fate.

flow data upload -c ${conf_path}

Note: conf_path is the path to the parameter file; the parameters are as follows.

Options

| parameter name | required | type | description |
| :--- | :--- | :--- | :--- |
| file | yes | string | data storage path |
| id_delimiter | yes | string | data separator, e.g. "," |
| head | no | int | whether the data has a table header |
| partition | yes | int | number of data partitions |
| storage_engine | no | string | storage engine type, default "EGGROLL"; also supports "HDFS", "LOCALFS", "HIVE", etc. |
| namespace | yes | string | table namespace |
| table_name | yes | string | table name |
| storage_address | no | object | storage address for the selected storage engine |
| use_local_data | no | int | default 1, which uses the data on the client's machine; 0 uses the data on the fate flow service's machine |
| drop | no | int | whether to overwrite a previous upload |
| extend_sid | no | bool | whether to add a new column holding a uuid id, default False |
| auto_increasing_sid | no | bool | whether the new id column is auto-incrementing (only takes effect if extend_sid is True), default False |
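
Before calling the upload command it can help to check that a configuration contains every parameter marked as required above. The following is a minimal Python sketch of such a check; `REQUIRED_UPLOAD_KEYS` and `check_upload_conf` are hypothetical helper names, not part of fate, and the key names and types are copied from the options table.

```python
# Required upload parameters and their expected types, as listed in the options table.
# These helper names are illustrative and are not part of fate itself.
REQUIRED_UPLOAD_KEYS = {
    "file": str,
    "id_delimiter": str,
    "partition": int,
    "namespace": str,
    "table_name": str,
}


def check_upload_conf(conf: dict) -> list:
    """Return a list of problems; an empty list means all required keys are present."""
    problems = []
    for key, expected_type in REQUIRED_UPLOAD_KEYS.items():
        if key not in conf:
            problems.append(f"missing required key: {key}")
        elif not isinstance(conf[key], expected_type):
            problems.append(f"{key} should be of type {expected_type.__name__}")
    return problems
```

The check only covers presence and basic types; whether a value is accepted is still decided by the fate flow service.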

meta information

| parameter name | required | type | description |
| :--- | :--- | :--- | :--- |
| input_format | no | string | The format of the data (dense, svmlight, tag:value), used to determine |
| delimiter | no | string | The data separator, default "," |
| tag_with_value | no | bool | Valid for tag data format, whether to carry value |
| tag_value_delimiter | no | string | tag:value data separator, default ":" |
| with_match_id | no | bool | Whether or not to carry match id |
| with_match_id | no | object | The name of the id column, effective when extend_sid is enabled, e.g., ["email", "phone"] |
| id_range | no | object | For tag/svmlight format data, which columns are ids |
| exclusive_data_type | no | string | The format of the special type data columns |
| data_type | no | string | Column data type, default "float64" |
| with_label | no | bool | Whether to have a label, default False |
| label_name | no | string | The name of the label, default "y" |
| label_type | no | string | Label type, default "int" |

In version 1.9.0 and later, passing in the meta parameters generates anonymous information about the features.

Example

  • eggroll
{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 10,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "EGGROLL"
}
  • hdfs
{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 10,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "HDFS"
}
  • localfs
{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 4,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "LOCALFS"
}
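
The configurations above are plain JSON files that are passed to the CLI through `-c`. As a minimal sketch, the Python snippet below writes such a file and builds the matching command line. Only `flow data upload -c` comes from this document; `write_upload_conf` and `upload_command` are hypothetical helper names, and the snippet does not invoke the CLI itself.

```python
import json
import tempfile
from pathlib import Path


def write_upload_conf(conf: dict, directory: str) -> Path:
    """Serialize the upload configuration to <directory>/upload_conf.json."""
    path = Path(directory) / "upload_conf.json"
    path.write_text(json.dumps(conf, indent=4))
    return path


def upload_command(conf_path: Path) -> list:
    """Build the argument list for `flow data upload -c ${conf_path}`."""
    return ["flow", "data", "upload", "-c", str(conf_path)]


# Same values as the eggroll example above.
conf = {
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 10,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "EGGROLL",
}

with tempfile.TemporaryDirectory() as tmp:
    conf_path = write_upload_conf(conf, tmp)
    command = upload_command(conf_path)
    loaded = json.loads(conf_path.read_text())  # read back to confirm the file is valid JSON
```

On a machine where the flow client is installed, the argument list could be handed to `subprocess.run(command, check=True)`.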

return parameters

| parameter name | type | description |
| :--- | :--- | :--- |
| jobId | string | job id |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |

Example

{
    "data": {
        "board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202111081218319075660&role=local&party_id=0",
        "code": 0,
        "dsl_path": "/data/projects/fate/jobs/202111081218319075660/job_dsl.json",
        "job_id": "202111081218319075660",
        "logs_directory": "/data/projects/fate/logs/202111081218319075660",
        "message": "success",
        "model_info": {
            "model_id": "local-0#model",
            "model_version": "202111081218319075660"
        },
        "namespace": "experiment",
        "pipeline_dsl_path": "/data/projects/fate/jobs/202111081218319075660/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/jobs/202111081218319075660/local/0/job_runtime_on_party_conf.json",
        "runtime_conf_path":"/data/projects/fate/jobs/202111081218319075660/job_runtime_conf.json",
        "table_name": "breast_hetero_host",
        "train_runtime_conf_path":"/data/projects/fate/jobs/202111081218319075660/train_runtime_conf.json"
    },
    "jobId": "202111081218319075660",
    "retcode": 0,
    "retmsg": "success"
}
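
A caller will usually want to confirm that the upload was accepted and keep the job id for later queries. The sketch below assumes the response has been captured as a JSON string and that a retcode of 0 means success, as in the example above; `job_id_from_response` is a hypothetical helper name.

```python
import json


def job_id_from_response(raw: str) -> str:
    """Return the job id from an upload response, or raise if retcode is not 0."""
    response = json.loads(raw)
    if response.get("retcode") != 0:
        raise RuntimeError(f"upload failed: {response.get('retmsg')}")
    return response["jobId"]


# Trimmed version of the example response above.
sample = """
{
    "data": {"namespace": "experiment", "table_name": "breast_hetero_host"},
    "jobId": "202111081218319075660",
    "retcode": 0,
    "retmsg": "success"
}
"""
job_id = job_id_from_response(sample)
```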

3. table binding

Real storage addresses can be mapped to fate storage tables via table bind

flow table bind [options]

options

| parameters | short format | long format | required | type | description |
| :--- | :--- | :--- | :--- | :--- | :--- |
| conf_path | -c | --conf-path | yes | string | configuration path |

Note: conf_path is the path to the parameter file; the parameters are as follows.

| parameter name | required | type | description |
| :--- | :--- | :--- | :--- |
| name | yes | string | fate table name |
| namespace | yes | string | fate table namespace |
| engine | yes | string | storage engine, supports "HDFS", "MYSQL", "PATH" |
| address | yes | object | real storage address |
| drop | no | int | whether to overwrite previous information |
| head | no | int | whether the data has a table header |
| id_delimiter | no | string | data separator |
| id_column | no | string | id field |
| feature_column | no | array | feature fields |

meta information

| parameter name | required | type | description |
| :--- | :--- | :--- | :--- |
| input_format | no | string | The format of the data (dense, svmlight, tag:value), used to determine |
| delimiter | no | string | The data separator, default "," |
| tag_with_value | no | bool | Valid for tag data format, whether to carry value |
| tag_value_delimiter | no | string | tag:value data separator, default ":" |
| with_match_id | no | bool | Whether or not to carry match id |
| with_match_id | no | object | The name of the id column, effective when extend_sid is enabled, e.g., ["email", "phone"] |
| id_range | no | object | For tag/svmlight format data, which columns are ids |
| exclusive_data_type | no | string | The format of the special type data columns |
| data_type | no | string | Column data type, default "float64" |
| with_label | no | bool | Whether to have a label, default False |
| label_name | no | string | The name of the label, default "y" |
| label_type | no | string | Label type, default "int" |

In version 1.9.0 and later, passing in the meta parameters during table bind does not generate feature anonymization information directly. The feature anonymization information of the original data is updated after the data has passed through the reader component once.

Sample

  • hdfs
{
    "namespace": "experiment",
    "name": "breast_hetero_guest",
    "engine": "HDFS",
    "address": {
        "name_node": "hdfs://fate-cluster",
        "path": "/data/breast_hetero_guest.csv"
    },
    "id_delimiter": ",",
    "head": 1,
    "partitions": 10
}
  • mysql
{
  "engine": "MYSQL",
  "address": {
    "user": "fate",
    "passwd": "fate",
    "host": "127.0.0.1",
    "port": 3306,
    "db": "experiment",
    "name": "breast_hetero_guest"
  },
  "namespace": "experiment",
  "name": "breast_hetero_guest",
  "head": 1,
  "id_delimiter": ",",
  "partitions": 10,
  "id_column": "id",
  "feature_column": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9"
}
  • PATH

{
    "namespace": "xxx",
    "name": "xxx",
    "engine": "PATH",
    "address": {
        "path": "xxx"
    }
}
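
A bind configuration can also be assembled in code. The following minimal Python sketch rejects engines other than the three listed in the parameter table and requires an address; `SUPPORTED_BIND_ENGINES` and `make_bind_conf` are hypothetical helper names, and the values reproduce the hdfs sample above.

```python
# Engines listed for table bind in the parameter table; helper names are illustrative.
SUPPORTED_BIND_ENGINES = {"HDFS", "MYSQL", "PATH"}


def make_bind_conf(namespace: str, name: str, engine: str, address: dict, **optional) -> dict:
    """Build a table bind configuration, checking the engine and the address first."""
    if engine not in SUPPORTED_BIND_ENGINES:
        raise ValueError(f"unsupported engine for table bind: {engine}")
    if not address:
        raise ValueError("address is required for table bind")
    conf = {"namespace": namespace, "name": name, "engine": engine, "address": address}
    conf.update(optional)  # e.g. id_delimiter, head, partitions
    return conf


hdfs_conf = make_bind_conf(
    "experiment",
    "breast_hetero_guest",
    "HDFS",
    {"name_node": "hdfs://fate-cluster", "path": "/data/breast_hetero_guest.csv"},
    id_delimiter=",",
    head=1,
    partitions=10,
)
```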
return

| parameter name | type | description |
| :--- | :--- | :--- |
| retcode | int | return code |
| retmsg | string | return information |
| data | object | return data |

Sample

{
    "data": {
        "namespace": "xxx",
        "table_name": "xxx"
    },
    "retcode": 0,
    "retmsg": "success"
}

4. table information query

Query information about the fate table (real storage address, number, schema, etc.)

flow table info [options]

options

| parameters | short format | long format | required | type | description |
| :--- | :--- | :--- | :--- | :--- | :--- |
| table_name | -t | --table-name | yes | string | fate table name |
| namespace | -n | --namespace | yes | string | fate table namespace |

returns

| parameter name | type | description |
| :--- | :--- | :--- |
| retcode | int | return code |
| retmsg | string | return information |
| data | object | return data |

Sample

{
    "data": {
        "address": {
            "home": null,
            "name": "breast_hetero_guest",
            "namespace": "experiment"
        },
        "count": 569,
        "exists": 1,
        "namespace": "experiment",
        "partition": 4,
        "schema": {
            "header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9",
            "sid": "id"
        },
        "table_name": "breast_hetero_guest"
    },
    "retcode": 0,
    "retmsg": "success"
}
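
The `schema.header` field in this response is a single comma-separated string, so it usually has to be split before the column names can be used. A minimal Python sketch, assuming the response has already been parsed into a dict shaped like the sample above (`summarize_table_info` is a hypothetical helper name):

```python
def summarize_table_info(response: dict) -> dict:
    """Extract the row count, id column and column names from a table info response."""
    data = response["data"]
    schema = data.get("schema", {})
    header = schema.get("header", "")
    return {
        "table": f'{data["namespace"]}.{data["table_name"]}',
        "rows": data["count"],
        "id_column": schema.get("sid"),
        "columns": header.split(",") if header else [],
    }


# Parsed form of the sample response above.
sample = {
    "data": {
        "address": {"home": None, "name": "breast_hetero_guest", "namespace": "experiment"},
        "count": 569,
        "exists": 1,
        "namespace": "experiment",
        "partition": 4,
        "schema": {"header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9", "sid": "id"},
        "table_name": "breast_hetero_guest",
    },
    "retcode": 0,
    "retmsg": "success",
}
summary = summarize_table_info(sample)
```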

5. Delete table data

You can delete table data with table delete

flow table delete [options]

Options

| parameters | short format | long format | required | type | description |
| :--- | :--- | :--- | :--- | :--- | :--- |
| table_name | -t | --table-name | yes | string | fate table name |
| namespace | -n | --namespace | yes | string | fate table namespace |

returns

| parameter name | type | description |
| :--- | :--- | :--- |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |

Sample

{
    "data": {
        "namespace": "xxx",
        "table_name": "xxx"
    },
    "retcode": 0,
    "retmsg": "success"
}

6. Download data

Brief description:

Used to download data from the fate storage engine and save it as a file

flow data download -c ${conf_path}

Note: conf_path is the path to the parameter file; the parameters are as follows.

Options

| parameter name | required | type | description |
| :--- | :--- | :--- | :--- |
| output_path | yes | string | download path |
| table_name | yes | string | fate table name |
| namespace | yes | string | fate table namespace |

Example:

{
  "output_path": "/data/projects/fate/breast_hetero_guest.csv",
  "namespace": "experiment",
  "table_name": "breast_hetero_guest"
}

return parameters

| parameter name | type | description |
| :--- | :--- | :--- |
| jobId | string | job id |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |

Example

{
    "data": {
        "board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202111081457135282090&role=local&party_id=0",
        "code": 0,
        "dsl_path": "/data/projects/fate/jobs/202111081457135282090/job_dsl.json",
        "job_id": "202111081457135282090",
        "logs_directory": "/data/projects/fate/logs/202111081457135282090",
        "message": "success",
        "model_info": {
            "model_id": "local-0#model",
            "model_version": "202111081457135282090"
        },
        "pipeline_dsl_path": "/data/projects/fate/jobs/202111081457135282090/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/jobs/202111081457135282090/local/0/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/jobs/202111081457135282090/job_runtime_conf.json",
        "train_runtime_conf_path": "/data/projects/fate/jobs/202111081457135282090/train_runtime_conf.json"
    },
    "jobId": "202111081457135282090",
    "retcode": 0,
    "retmsg": "success"
}

7. disable data

Tables can be made unavailable by table disable

flow table disable [options]

Options

| parameters | short format | long format | required | type | description |
| :--- | :--- | :--- | :--- | :--- | :--- |
| table_name | -t | --table-name | yes | string | fate table name |
| namespace | -n | --namespace | yes | string | fate table namespace |

returns

| parameter name | type | description |
| :--- | :--- | :--- |
| retcode | int | return code |
| retmsg | string | return information |
| data | object | return data |

Sample

{
    "data": {
        "namespace": "xxx",
        "table_name": "xxx"
    },
    "retcode": 0,
    "retmsg": "success"
}

8. enable data

Tables can be made available with table enable

flow table enable [options]

Options

| parameters | short format | long format | required | type | description |
| :--- | :--- | :--- | :--- | :--- | :--- |
| table_name | -t | --table-name | yes | string | fate table name |
| namespace | -n | --namespace | yes | string | fate table namespace |

returns

| parameter name | type | description |
| :--- | :--- | :--- |
| retcode | int | return code |
| retmsg | string | return information |
| data | object | return data |

Sample

{
    "data": [{
        "namespace": "xxx",
        "table_name": "xxx"
    }],
    "retcode": 0,
    "retmsg": "success"
}

9. delete disable data

Tables that are currently unavailable can be deleted with disable-delete

flow table disable-delete 

return

| parameter name | type | description |
| :--- | :--- | :--- |
| retcode | int | return code |
| retmsg | string | return information |
| data | object | return data |

Sample

{
  "data": [
    {
      "namespace": "xxx",
      "table_name": "xxx"
    },
    {
      "namespace": "xxx",
      "table_name": "xxx"
    }
  ],
  "retcode": 0,
  "retmsg": "success"
}
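
In the samples in this document, the `data` field of the disable response is a single object, while the enable and disable-delete responses carry a list of objects. A caller that handles all three may want to normalize the two shapes. The following Python sketch does so under that assumption; `tables_in_response` is a hypothetical helper name.

```python
def tables_in_response(response: dict) -> list:
    """Return (namespace, table_name) pairs from a disable, enable or disable-delete response."""
    if response.get("retcode") != 0:
        raise RuntimeError(f"request failed: {response.get('retmsg')}")
    data = response.get("data") or []
    if isinstance(data, dict):
        # Single-object form, as in the disable sample.
        data = [data]
    return [(item["namespace"], item["table_name"]) for item in data]


single = {"data": {"namespace": "xxx", "table_name": "xxx"}, "retcode": 0, "retmsg": "success"}
many = {
    "data": [
        {"namespace": "xxx", "table_name": "xxx"},
        {"namespace": "xxx", "table_name": "xxx"},
    ],
    "retcode": 0,
    "retmsg": "success",
}
```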

10. Writer

Brief description:

Used to write data from the fate storage engine to an external engine, or to save the data as a new table

flow data writer -c ${conf_path}

Note: conf_path is the path to the parameter file; the parameters are as follows.

Options

| parameter name | required | type | description |
| :--- | :--- | :--- | :--- |
| table_name | yes | string | fate table name |
| namespace | yes | string | fate table namespace |
| storage_engine | no | string | storage type, e.g. MYSQL |
| address | no | object | storage address |
| output_namespace | no | string | namespace of the fate table to save as |
| output_name | no | string | name of the fate table to save as |

**Note: storage_engine and address are used together to write the data to the specified engine. output_namespace and output_name are also used together to save the data as a new table on the same engine.**

Example:

{
  "table_name": "name1",
  "namespace": "namespace1",
  "output_name": "name2",
  "output_namespace": "namespace2"
}
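
Because the optional writer parameters only make sense in pairs, a configuration that supplies one half of a pair is probably a mistake. The Python sketch below flags that case; `check_writer_conf` is a hypothetical helper name, and the pairing rule is the one described in the note in this section.

```python
# Parameters that must be supplied together, per the note in this section.
PAIRED_WRITER_KEYS = [
    ("storage_engine", "address"),
    ("output_namespace", "output_name"),
]


def check_writer_conf(conf: dict) -> list:
    """Return a list of problems found in a writer configuration."""
    problems = []
    for key in ("table_name", "namespace"):
        if key not in conf:
            problems.append(f"missing required key: {key}")
    for first, second in PAIRED_WRITER_KEYS:
        if (first in conf) != (second in conf):
            present, absent = (first, second) if first in conf else (second, first)
            problems.append(f"{present} given without {absent}")
    return problems


# The example configuration above passes the check.
example = {
    "table_name": "name1",
    "namespace": "namespace1",
    "output_name": "name2",
    "output_namespace": "namespace2",
}
```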

return

| parameter name | type | description |
| :--- | :--- | :--- |
| jobId | string | job id |
| retcode | int | return code |
| retmsg | string | return information |
| data | object | return data |

Example

{
    "data": {
        "board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202201121235115028490&role=local&party_id=0",
        "code": 0,
        "dsl_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/job_dsl.json",
        "job_id": "202201121235115028490",
        "logs_directory": "/data/projects/fate/fateflow/logs/202201121235115028490",
        "message": "success",
        "model_info": {
            "model_id": "local-0#model",
            "model_version": "202201121235115028490"
        },
        "pipeline_dsl_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/local/0/job_runtime_on_party_conf.json",
        "runtime_conf_path":"/data/projects/fate/fateflow/jobs/202201121235115028490/job_runtime_conf.json",
        "train_runtime_conf_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/train_runtime_conf.json"
    },
    "jobId": "202201121235115028490",
    "retcode": 0,
    "retmsg": "success"
}

11. reader component

Brief description:

  • The reader component is a data input component of fate.
  • The reader component converts input data into data of the specified storage type.

Parameter configuration:

The input table of the reader is configured in the conf when submitting the job:

{
  "role": {
    "guest": {
      "0": {
        "reader_0": {
          "table": {
            "name": "breast_hetero_guest",
            "namespace": "experiment"
          }
        }
      }
    }
  }
}
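
The nesting in this fragment is role, then party index, then component name, then the table reference. A minimal Python sketch that builds the same structure for an arbitrary party follows; `reader_conf` is a hypothetical helper name and the default component name `reader_0` is taken from the example above.

```python
def reader_conf(role: str, party_index: int, name: str, namespace: str,
                component: str = "reader_0") -> dict:
    """Build the role section that points a reader component at a fate table."""
    return {
        "role": {
            role: {
                str(party_index): {  # party indexes are string keys in the job conf
                    component: {
                        "table": {"name": name, "namespace": namespace}
                    }
                }
            }
        }
    }


guest_conf = reader_conf("guest", 0, "breast_hetero_guest", "experiment")
```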

Component Output

The output data storage engine of the component is determined by the configuration file conf/service_conf.yaml, with the following configuration items:

default_engines:
  storage: eggroll

  • The computing engine and the storage engine depend on each other; the supported combinations are as follows.

| computing_engine | storage_engine |
| :--- | :--- |
| standalone | standalone |
| eggroll | eggroll |
| spark | hdfs (distributed), localfs (standalone) |

  • The reader component supports the following input storage types: eggroll, hdfs, localfs, mysql, path, etc.
  • The output storage type of the reader component is determined by the default_engines.storage configuration (except for path).
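
The dependency list in this section can be expressed as a small lookup, which is handy for checking a planned configuration before it is deployed. This is a minimal Python sketch; `SUPPORTED_STORAGE` and `is_supported` are hypothetical names, and the mapping only reflects the combinations listed in this section.

```python
# Computing engine -> storage engines it can be paired with, per the list in this section.
SUPPORTED_STORAGE = {
    "standalone": {"standalone"},
    "eggroll": {"eggroll"},
    "spark": {"hdfs", "localfs"},
}


def is_supported(computing_engine: str, storage_engine: str) -> bool:
    """Return True if the storage engine is listed for the given computing engine."""
    allowed = SUPPORTED_STORAGE.get(computing_engine.lower(), set())
    return storage_engine.lower() in allowed
```

For example, `is_supported("spark", "hdfs")` is true, while `is_supported("eggroll", "hdfs")` is false.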

12. api-reader

Brief description:

  • The input of the api-reader component is ids, and the output is features.
  • Request parameters can be user-defined, e.g. version number, back month, etc.
  • The component sends requests to a third-party service. The third-party service needs to implement the upload, query and download interfaces and register with fate flow; see the api-reader related service registration for details.

Parameter configuration:

Configure the api-reader parameter in the conf when submitting the job:

{
  "role": {
    "guest": {
      "0": { "api_reader_0": {
        "server_name": "xxx",
        "parameters": { "version": "xxx"},
        "id_delimiter": ",",
        "head": true
        }
      }
    }
  }
}

Parameter meaning:

  • server_name: the name of the service to be requested
  • parameters: the parameters of the requested feature
  • id_delimiter: the separator of the returned data
  • head: whether the returned data contains a header