Data Access

1. Description

fate storage tables are identified by table name and namespace.
fate provides an upload component that lets users upload data to a storage system supported by the fate compute engine.
If the user's data already exists in a storage system supported by fate, the storage information can be mapped to a fate storage table with table bind.
If the storage type of a bound table differs from the current default storage engine, the reader component automatically converts the storage type.

2. Data upload

Uploads the input data for a modeling task to a storage system supported by fate.

flow data upload -c ${conf_path}

Note: conf_path is the path to the configuration file. Its parameters are as follows:

Options

parameter name	required	type	description
file	yes	string	data storage path
id_delimiter	yes	string	Data separator, e.g. ","
head	no	int	Whether the data has a table header
partition	yes	int	Number of data partitions
storage_engine	no	string	storage engine type, default "EGGROLL", also support "HDFS", "LOCALFS", "HIVE", etc.
namespace	yes	string	table namespace
table_name	yes	string	table name
storage_address	no	object	The storage address of the corresponding storage engine
use_local_data	no	int	The default is 1, which means use the data from the client's machine; 0 means use the data from the fate flow service's machine.
drop	no	int	Whether to overwrite a previous upload
extend_sid	no	bool	Whether to add a new column for uuid id, default False
auto_increasing_sid	no	bool	Whether the new id column is self-increasing (will only work if extend_sid is True), default False

meta information

parameter name	required	type	description
input_format	no	string	The format of the data (dense, svmlight, tag:value)
delimiter	no	string	The data separator, default ","
tag_with_value	no	bool	Valid for tag data format, whether to carry value
tag_value_delimiter	no	string	tag:value data separator, default ":"
with_match_id	no	bool	Whether or not to carry match id
with_match_id	no	object	The name of the id column, effective when extend_sid is enabled, e.g., ["email", "phone"]
id_range	no	object	For tag/svmlight format data, which columns are ids
exclusive_data_type	no	string	The format of the special type data columns
data_type	no	string	Column data type, default "float64"
with_label	no	bool	Whether to have a label, default False
label_name	no	string	The name of the label, default "y"
label_type	no	string	Label type, default "int"

In version 1.9.0 and later, passing in the meta parameter generates anonymized information about the features.

Example

eggroll

{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 10,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "EGGROLL"
}

hdfs

{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 10,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "HDFS"
}

localfs

{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 4,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "LOCALFS"
}
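The upload configurations above can also be generated programmatically. The following is a minimal sketch, not part of fate itself: `build_upload_conf` and `write_conf` are hypothetical helpers, the required keys are taken from the options table, and the command is only printed, not executed.

```python
import json
import tempfile
from pathlib import Path

# Parameters marked "yes" in the upload options table.
REQUIRED_UPLOAD_KEYS = ("file", "id_delimiter", "partition", "namespace", "table_name")


def build_upload_conf(file, namespace, table_name, partition,
                      id_delimiter=",", head=1, storage_engine="EGGROLL"):
    """Return an upload configuration dict using the documented parameter names."""
    conf = {
        "file": file,
        "id_delimiter": id_delimiter,
        "head": head,
        "partition": partition,
        "namespace": namespace,
        "table_name": table_name,
        "storage_engine": storage_engine,
    }
    # Reject empty values for the required parameters before writing anything.
    missing = [key for key in REQUIRED_UPLOAD_KEYS if conf[key] in (None, "")]
    if missing:
        raise ValueError(f"missing required upload parameters: {missing}")
    return conf


def write_conf(conf, directory):
    """Write the configuration as JSON and return the path to pass to -c."""
    path = Path(directory) / "upload_conf.json"
    path.write_text(json.dumps(conf, indent=4))
    return path


conf = build_upload_conf(
    file="examples/data/breast_hetero_guest.csv",
    namespace="experiment",
    table_name="breast_hetero_guest",
    partition=10,
)
with tempfile.TemporaryDirectory() as tmp:
    conf_path = write_conf(conf, tmp)
    # The documented command; printed here rather than executed.
    print(f"flow data upload -c {conf_path}")
```

Checking required parameters before writing the file catches an empty table name or namespace locally instead of at job submission time.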

return parameters

parameter name	type	description
jobId	string	job id
retcode	int	return code
retmsg	string	return message
data	object	return data

Example

{
    "data": {
        "board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202111081218319075660&role=local&party_id=0",
        "code": 0,
        "dsl_path": "/data/projects/fate/jobs/202111081218319075660/job_dsl.json",
        "job_id": "202111081218319075660",
        "logs_directory": "/data/projects/fate/logs/202111081218319075660",
        "message": "success",
        "model_info": {
            "model_id": "local-0#model",
            "model_version": "202111081218319075660"
        },
        "namespace": "experiment",
        "pipeline_dsl_path": "/data/projects/fate/jobs/202111081218319075660/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/jobs/202111081218319075660/local/0/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/jobs/202111081218319075660/job_runtime_conf.json",
        "table_name": "breast_hetero_host",
        "train_runtime_conf_path": "/data/projects/fate/jobs/202111081218319075660/train_runtime_conf.json"
    },
    "jobId": "202111081218319075660",
    "retcode": 0,
    "retmsg": "success"
}
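A script that submits uploads usually needs the job id from this response. The sketch below assumes that a `retcode` of 0 indicates success, as in the example, and that any other value is a failure; `job_id_from_response` is a hypothetical helper, and the sample is an abbreviated copy of the response above.

```python
import json


def job_id_from_response(response_text):
    """Return the job id from a JSON response, raising if retcode is non-zero."""
    response = json.loads(response_text)
    # Assumption: retcode 0 means success, as in the documented example.
    if response.get("retcode") != 0:
        raise RuntimeError(f"request failed: {response.get('retmsg')}")
    return response["jobId"]


# Abbreviated copy of the example response.
sample = """
{
    "data": {
        "job_id": "202111081218319075660",
        "namespace": "experiment",
        "message": "success"
    },
    "jobId": "202111081218319075660",
    "retcode": 0,
    "retmsg": "success"
}
"""

print(job_id_from_response(sample))
```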

3. Table binding

Real storage addresses can be mapped to fate storage tables with table bind.

flow table bind [options]

options

parameters	short format	long format	required	type	description
conf_path	`-c`	`--conf-path`	yes	string	path to the configuration file

Note: conf_path is the path to the configuration file. Its parameters are as follows:

parameter_name	required	type	description
name	yes	string	fate table name
namespace	yes	string	fate table namespace
engine	yes	string	storage engine, supports "HDFS", "MYSQL", "PATH"
address	yes	object	real storage address
drop	no	int	Overwrite previous information
head	no	int	Whether there is a data table header
id_delimiter	no	string	Data separator
id_column	no	string	id field
feature_column	no	array	feature fields

meta information

parameter name	required	type	description
input_format	no	string	The format of the data (dense, svmlight, tag:value)
delimiter	no	string	The data separator, default ","
tag_with_value	no	bool	Valid for tag data format, whether to carry value
tag_value_delimiter	no	string	tag:value data separator, default ":"
with_match_id	no	bool	Whether or not to carry match id
with_match_id	no	object	The name of the id column, effective when extend_sid is enabled, e.g., ["email", "phone"]
id_range	no	object	For tag/svmlight format data, which columns are ids
exclusive_data_type	no	string	The format of the special type data columns
data_type	no	string	Column data type, default "float64"
with_label	no	bool	Whether to have a label, default False
label_name	no	string	The name of the label, default "y"
label_type	no	string	Label type, default "int"

In version 1.9.0 and later, passing in the meta parameter during table bind does not directly generate anonymized information about the features. The feature anonymization information of the original data is updated after the data has passed through the reader component once.

Sample

hdfs

{
    "namespace": "experiment",
    "name": "breast_hetero_guest",
    "engine": "HDFS",
    "address": {
        "name_node": "hdfs://fate-cluster",
        "path": "/data/breast_hetero_guest.csv"
    },
    "id_delimiter": ",",
    "head": 1,
    "partitions": 10
}

mysql

{
  "engine": "MYSQL",
  "address": {
    "user": "fate",
    "passwd": "fate",
    "host": "127.0.0.1",
    "port": 3306,
    "db": "experiment",
    "name": "breast_hetero_guest"
  },
  "namespace": "experiment",
  "name": "breast_hetero_guest",
  "head": 1,
  "id_delimiter": ",",
  "partitions": 10,
  "id_column": "id",
  "feature_column": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9"
}

PATH

{
    "namespace": "xxx",
    "name": "xxx",
    "engine": "PATH",
    "address": {
        "path": "xxx"
    }
}
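A bind configuration can be checked before it is passed to `flow table bind`. The sketch below is illustrative only: `check_bind_conf` is a hypothetical helper, and the per-engine address keys are copied from the samples above, so the set that fate actually accepts may differ.

```python
# Address keys as they appear in the samples; this table is an assumption
# and may not list every key that fate accepts.
ADDRESS_KEYS = {
    "HDFS": {"name_node", "path"},
    "MYSQL": {"user", "passwd", "host", "port", "db", "name"},
    "PATH": {"path"},
}

# Parameters marked "yes" in the table bind parameter table.
REQUIRED_BIND_KEYS = ("name", "namespace", "engine", "address")


def check_bind_conf(conf):
    """Return a list of problems found in a table bind configuration."""
    problems = [f"missing required parameter: {key}"
                for key in REQUIRED_BIND_KEYS if key not in conf]
    engine = conf.get("engine")
    address = conf.get("address")
    if engine is not None and engine not in ADDRESS_KEYS:
        problems.append(f"unsupported engine: {engine}")
    elif engine in ADDRESS_KEYS and isinstance(address, dict):
        absent = ADDRESS_KEYS[engine] - set(address)
        if absent:
            problems.append(f"address is missing keys: {sorted(absent)}")
    return problems


hdfs_conf = {
    "namespace": "experiment",
    "name": "breast_hetero_guest",
    "engine": "HDFS",
    "address": {"name_node": "hdfs://fate-cluster", "path": "/data/breast_hetero_guest.csv"},
    "id_delimiter": ",",
    "head": 1,
}
print(check_bind_conf(hdfs_conf))
print(check_bind_conf({"name": "t", "engine": "MYSQL", "address": {"host": "127.0.0.1"}}))
```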

return

parameter name	type	description
retcode	int	return code
retmsg	string	return information
data	object	return data

Sample

{
    "data": {
        "namespace": "xxx",
        "table_name": "xxx"
    },
    "retcode": 0,
    "retmsg": "success"
}

4. Table information query

Query information about a fate table (real storage address, record count, schema, etc.).

flow table info [options]

options

parameters	short-format	long-format	required	type	description
table_name	`-t`	`--table-name`	yes	string	fate table name
namespace	`-n`	`--namespace`	yes	string	fate table namespace

returns

parameter name	type	description
retcode	int	return code
retmsg	string	return information
data	object	return data

Sample

{
    "data": {
        "address": {
            "home": null,
            "name": "breast_hetero_guest",
            "namespace": "experiment"
        },
        "count": 569,
        "exists": 1,
        "namespace": "experiment",
        "partition": 4,
        "schema": {
            "header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9",
            "sid": "id"
        },
        "table_name": "breast_hetero_guest"
    },
    "retcode": 0,
    "retmsg": "success"
}
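The `data` object of this response can be reduced to the fields most scripts need. The sketch below assumes, based only on the sample, that `schema.header` is a comma-separated string and that `exists` is 1 for an existing table; `describe_table` is a hypothetical helper.

```python
def describe_table(response):
    """Summarize a table info response, or return None if the table does not exist."""
    if response.get("retcode") != 0:
        raise RuntimeError(f"request failed: {response.get('retmsg')}")
    data = response["data"]
    # Assumption: "exists" is 1 for an existing table, as in the sample.
    if not data.get("exists"):
        return None
    schema = data["schema"]
    return {
        "table": f"{data['namespace']}.{data['table_name']}",
        "id_column": schema["sid"],
        # Assumption: the header is comma-separated, as in the sample.
        "columns": schema["header"].split(","),
        "rows": data["count"],
        "partitions": data["partition"],
    }


sample = {
    "data": {
        "address": {"home": None, "name": "breast_hetero_guest", "namespace": "experiment"},
        "count": 569,
        "exists": 1,
        "namespace": "experiment",
        "partition": 4,
        "schema": {"header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9", "sid": "id"},
        "table_name": "breast_hetero_guest",
    },
    "retcode": 0,
    "retmsg": "success",
}

summary = describe_table(sample)
print(summary["table"], summary["rows"], len(summary["columns"]))
```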

5. Delete table data

You can delete table data with table delete

flow table delete [options]

Options

parameters	short-format	long-format	required	type	description
table_name	`-t`	`--table-name`	yes	string	fate table name
namespace	`-n`	`--namespace`	yes	string	fate table namespace

returns

parameter name	type	description
retcode	int	return code
retmsg	string	return message
data	object	return data

Sample

{
    "data": {
        "namespace": "xxx",
        "table_name": "xxx"
    },
    "retcode": 0,
    "retmsg": "success"
}

6. Download data

Brief description:

Downloads data from a fate storage table to a file.

flow data download -c ${conf_path}

Note: conf_path is the path to the configuration file. Its parameters are as follows:

Options

parameter name	required	type	description
output_path	yes	string	download path
table_name	yes	string	fate table name
namespace	yes	string	fate table namespace

Example:

{
  "output_path": "/data/projects/fate/breast_hetero_guest.csv",
  "namespace": "experiment",
  "table_name": "breast_hetero_guest"
}

return parameters

parameter name	type	description
jobId	string	job id
retcode	int	return code
retmsg	string	return message
data	object	return data

Example

{
    "data": {
        "board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202111081457135282090&role=local&party_id=0",
        "code": 0,
        "dsl_path": "/data/projects/fate/jobs/202111081457135282090/job_dsl.json",
        "job_id": "202111081457135282090",
        "logs_directory": "/data/projects/fate/logs/202111081457135282090",
        "message": "success",
        "model_info": {
            "model_id": "local-0#model",
            "model_version": "202111081457135282090"
        },
        "pipeline_dsl_path": "/data/projects/fate/jobs/202111081457135282090/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/jobs/202111081457135282090/local/0/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/jobs/202111081457135282090/job_runtime_conf.json",
        "train_runtime_conf_path": "/data/projects/fate/jobs/202111081457135282090/train_runtime_conf.json"
    },
    "jobId": "202111081457135282090",
    "retcode": 0,
    "retmsg": "success"
}

7. Disable data

Tables can be made unavailable with table disable.

flow table disable [options]

Options

parameters	short-format	long-format	required	type	description
table_name	`-t`	`--table-name`	yes	string	fate table name
namespace	`-n`	`--namespace`	yes	string	fate table namespace

returns

parameter name	type	description
retcode	int	return code
retmsg	string	return information
data	object	return data

Sample

{
    "data": {
        "namespace": "xxx",
        "table_name": "xxx"
    },
    "retcode": 0,
    "retmsg": "success"
}

8. Enable data

Tables can be made available with table enable

flow table enable [options]

Options

parameters	short-format	long-format	required	type	description
table_name	`-t`	`--table-name`	yes	string	fate table name
namespace	`-n`	`--namespace`	yes	string	fate table namespace

returns

parameter name	type	description
retcode	int	return code
retmsg	string	return information
data	object	return data

Sample

{
    "data": [{
        "namespace": "xxx",
        "table_name": "xxx"
    }],
    "retcode": 0,
    "retmsg": "success"
}

9. Delete disabled data

Tables that are currently unavailable can be deleted with disable-delete.

flow table disable-delete

return

parameter name	type	description
retcode	int	return code
retmsg	string	return information
data	object	return data

Sample

{
  "data": [
    {
      "namespace": "xxx",
      "table_name": "xxx"
    },
    {
      "namespace": "xxx",
      "table_name": "xxx"
    }
  ],
  "retcode": 0,
  "retmsg": "success"
}
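In the samples above, `data` is a single object for table delete and table disable but a list for table enable and disable-delete. A caller that handles all four commands can normalize both shapes. The sketch below is based only on those samples; `affected_tables` is a hypothetical helper.

```python
def affected_tables(response):
    """Return (namespace, table_name) pairs from a table command response."""
    if response.get("retcode") != 0:
        raise RuntimeError(f"request failed: {response.get('retmsg')}")
    data = response.get("data") or []
    # Accept both shapes seen in the samples: one object or a list of objects.
    entries = [data] if isinstance(data, dict) else data
    return [(entry["namespace"], entry["table_name"]) for entry in entries]


single = {"data": {"namespace": "xxx", "table_name": "xxx"},
          "retcode": 0, "retmsg": "success"}
several = {"data": [{"namespace": "ns1", "table_name": "t1"},
                    {"namespace": "ns2", "table_name": "t2"}],
           "retcode": 0, "retmsg": "success"}

print(affected_tables(single))
print(affected_tables(several))
```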

10. Writer

Brief description:

Exports data from a fate storage table to an external storage engine, or saves it as a new table.

flow data writer -c ${conf_path}

Note: conf_path is the path to the configuration file. Its parameters are as follows:

Options

parameter name	required	type	description
table_name	yes	string	fate table name
namespace	yes	string	fate table namespace
storage_engine	no	string	Storage type, e.g., MYSQL
address	no	object	storage address
output_namespace	no	string	Namespace of the fate table to save as
output_name	no	string	Name of the fate table to save as

**Note: storage_engine and address are used together to write the data to the specified engine.
output_namespace and output_name are also used together to save the data as a new table in the same engine.**

Example:

{
  "table_name": "name1",
  "namespace": "namespace1",
  "output_name": "name2",
  "output_namespace": "namespace2"
}
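Because the optional writer parameters come in pairs, a configuration that sets only one half of a pair is probably a mistake. The sketch below checks for that; `check_writer_conf` is a hypothetical helper and the rule it applies is inferred from the note above, not from the fate implementation.

```python
# Optional parameters that the note above says are used together.
PAIRED_KEYS = (
    ("storage_engine", "address"),
    ("output_namespace", "output_name"),
)


def check_writer_conf(conf):
    """Return a list of problems found in a writer configuration."""
    problems = [f"missing required parameter: {key}"
                for key in ("table_name", "namespace") if key not in conf]
    for first, second in PAIRED_KEYS:
        # Flag a pair where exactly one of the two keys is present.
        if (first in conf) != (second in conf):
            problems.append(f"{first} and {second} must be given together")
    return problems


save_as_conf = {
    "table_name": "name1",
    "namespace": "namespace1",
    "output_name": "name2",
    "output_namespace": "namespace2",
}
print(check_writer_conf(save_as_conf))
print(check_writer_conf({"table_name": "name1", "namespace": "namespace1",
                         "storage_engine": "MYSQL"}))
```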

return

parameter name	type	description
jobId	string	job id
retcode	int	return code
retmsg	string	return information
data	object	return data

Example

{
    "data": {
        "board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202201121235115028490&role=local&party_id=0",
        "code": 0,
        "dsl_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/job_dsl.json",
        "job_id": "202201121235115028490",
        "logs_directory": "/data/projects/fate/fateflow/logs/202201121235115028490",
        "message": "success",
        "model_info": {
            "model_id": "local-0#model",
            "model_version": "202201121235115028490"
        },
        "pipeline_dsl_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/local/0/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/job_runtime_conf.json",
        "train_runtime_conf_path": "/data/projects/fate/fateflow/jobs/202201121235115028490/train_runtime_conf.json"
    },
    "jobId": "202201121235115028490",
    "retcode": 0,
    "retmsg": "success"
}

11. Reader component

Brief description:

The reader component is a data input component of fate.
It converts input data into data of the specified storage type.

Parameter configuration:

The input table of the reader is configured in the conf when submitting the job:

{
  "role": {
    "guest": {
      "0": {
        "reader_0": {
          "table": {
            "name": "breast_hetero_guest",
            "namespace": "experiment"
          }
        }
      }
    }
  }
}
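The same fragment can be generated for several parties. The sketch below mirrors the nesting of the example; it assumes that the `"0"` key is the party's position within its role, which this section does not state, and `reader_conf` is a hypothetical helper.

```python
import json


def reader_conf(tables_by_role, component="reader_0"):
    """Build the role section of a job conf for a reader component.

    tables_by_role maps a role name to a list of (name, namespace) pairs.
    Assumption: each party is keyed by its position in that list.
    """
    role = {}
    for role_name, tables in tables_by_role.items():
        role[role_name] = {
            str(index): {component: {"table": {"name": name, "namespace": namespace}}}
            for index, (name, namespace) in enumerate(tables)
        }
    return {"role": role}


conf = reader_conf({"guest": [("breast_hetero_guest", "experiment")]})
print(json.dumps(conf, indent=2))
```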

Component Output

The output data storage engine of the component is determined by the configuration file conf/service_conf.yaml, with the following configuration items:

default_engines:
  storage: eggroll

The computing engine and the storage engine depend on each other. The supported combinations are as follows.

computing_engine	storage_engine
standalone	standalone
eggroll	eggroll
spark	hdfs(distributed), localfs(standalone)

The reader component's input data storage type supports eggroll, hdfs, localfs, mysql, path, etc.
The reader component's output data type is determined by the default_engines.storage configuration (except for path).
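The dependency table and the output rule can be expressed as a small lookup. This is a sketch under stated assumptions, not fate's implementation: the function names are hypothetical, and "except for path" is modelled here as path input passing through unconverted.

```python
# Storage engines supported by each computing engine, from the table above.
SUPPORTED_STORAGE = {
    "standalone": {"standalone"},
    "eggroll": {"eggroll"},
    "spark": {"hdfs", "localfs"},
}


def is_supported(computing_engine, storage_engine):
    """Return True if the storage engine is listed for the computing engine."""
    supported = SUPPORTED_STORAGE.get(computing_engine.lower(), set())
    return storage_engine.lower() in supported


def reader_output_engine(input_engine, default_storage):
    """Return the storage type of the reader's output.

    Assumption: tables of type path keep their storage type; every other
    input ends up in default_engines.storage.
    """
    input_engine = input_engine.lower()
    if input_engine == "path":
        return input_engine
    return default_storage.lower()


def needs_conversion(input_engine, default_storage):
    """Return True if the reader has to convert the input's storage type."""
    return reader_output_engine(input_engine, default_storage) != input_engine.lower()


print(is_supported("spark", "hdfs"))
print(needs_conversion("mysql", "eggroll"))
print(needs_conversion("path", "eggroll"))
```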

12. api-reader

Brief description:

The input of the api-reader component is ids, and its output is features.
Request parameters can be user-defined, e.g. version number, back month, etc.
The component sends requests to a third-party service, which must implement upload, query, and download interfaces and register with fate flow; see the api-reader service registration documentation.

Parameter configuration:

Configure the api-reader parameter in the conf when submitting the job:

{
  "role": {
    "guest": {
      "0": {
        "api_reader_0": {
          "server_name": "xxx",
          "parameters": {"version": "xxx"},
          "id_delimiter": ",",
          "head": true
        }
      }
    }
  }
}

Parameter meaning:

- server_name: the name of the service to request
- parameters: the parameters of the requested features
- id_delimiter: the separator of the returned data
- head: whether the returned data contains a header