# Data Access
## 1. Description
- The storage tables of fate are identified by table name and namespace.
- fate provides an upload component for users to upload data to a storage system supported by the fate compute engine.
- If the user's data already exists in a storage system supported by fate, the storage information can be mapped to a fate storage table by table bind.
- If the storage type of a bound table is not consistent with the current default engine, the reader component will automatically convert the storage type.
## 2. Data upload
Used to upload the input data of a modeling task to a storage system supported by fate.

```bash
flow data upload -c ${conf_path}
```

Note: conf_path is the path to the configuration file; the specific parameters are as follows.
Options

| parameter name | required | type | description |
| --- | --- | --- | --- |
| file | yes | string | data storage path |
| id_delimiter | yes | string | data delimiter, e.g. "," |
| head | no | int | whether the data has a header row |
| partition | yes | int | number of data partitions |
| storage_engine | no | string | storage engine type, default "EGGROLL"; also supports "HDFS", "LOCALFS", "HIVE", etc. |
| namespace | yes | string | table namespace |
| table_name | yes | string | table name |
| storage_address | no | object | storage address required by the corresponding storage engine |
| use_local_data | no | int | default 1, use the data from the client's machine; 0 means use the data from the fate flow service's machine |
| drop | no | int | whether to overwrite an existing upload |
| extend_sid | no | bool | whether to add a new uuid id column, default False |
| auto_increasing_sid | no | bool | whether the new id column is auto-incrementing (only works if extend_sid is True), default False |
Example

- eggroll

```json
{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 10,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "EGGROLL"
}
```

- hdfs

```json
{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 10,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "HDFS"
}
```

- localfs

```json
{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 4,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "LOCALFS"
}
```
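
The upload can be driven entirely from the shell. The sketch below assumes the eggroll configuration above is saved as `upload_conf.json` (an arbitrary file name) and that the FATE Flow client has already been initialized with `flow init`.

```bash
# Save the upload configuration (the eggroll example above) to a file,
# then submit the upload job through the FATE Flow client.
cat > upload_conf.json << 'EOF'
{
    "file": "examples/data/breast_hetero_guest.csv",
    "id_delimiter": ",",
    "head": 1,
    "partition": 10,
    "namespace": "experiment",
    "table_name": "breast_hetero_guest",
    "storage_engine": "EGGROLL"
}
EOF

flow data upload -c upload_conf.json
```

The jobId in the response identifies the upload job, which runs like any other fate job and can be tracked on fateboard or through the job query commands.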

return parameters

| parameter name | type | description |
| --- | --- | --- |
| jobId | string | job id |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |
Example

```json
{
    "data": {
        "board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202111081218319075660&role=local&party_id=0",
        "code": 0,
        "dsl_path": "/data/projects/fate/jobs/202111081218319075660/job_dsl.json",
        "job_id": "202111081218319075660",
        "logs_directory": "/data/projects/fate/logs/202111081218319075660",
        "message": "success",
        "model_info": {
            "model_id": "local-0#model",
            "model_version": "202111081218319075660"
        },
        "namespace": "experiment",
        "pipeline_dsl_path": "/data/projects/fate/jobs/202111081218319075660/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/jobs/202111081218319075660/local/0/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/jobs/202111081218319075660/job_runtime_conf.json",
        "table_name": "breast_hetero_host",
        "train_runtime_conf_path": "/data/projects/fate/jobs/202111081218319075660/train_runtime_conf.json"
    },
    "jobId": "202111081218319075660",
    "retcode": 0,
    "retmsg": "success"
}
```
## 3. Table binding
Real storage addresses can be mapped to fate storage tables via table bind.

```bash
flow table bind [options]
```

Note: the parameters are provided through a configuration file (conf_path); the specific parameters are as follows.
Options

| parameter name | required | type | description |
| --- | --- | --- | --- |
| name | yes | string | fate table name |
| namespace | yes | string | fate table namespace |
| engine | yes | string | storage engine, supports "HDFS", "MYSQL", "PATH" |
| address | yes | object | real storage address |
| drop | no | int | whether to overwrite the previous binding information |
| head | no | int | whether the data has a header row |
| id_delimiter | no | string | data delimiter |
| id_column | no | string | id field |
| feature_column | no | array | feature fields |
Example

- hdfs

```json
{
    "namespace": "experiment",
    "name": "breast_hetero_guest",
    "engine": "HDFS",
    "address": {
        "name_node": "hdfs://fate-cluster",
        "path": "/data/breast_hetero_guest.csv"
    },
    "id_delimiter": ",",
    "head": 1,
    "partitions": 10
}
```

- mysql

```json
{
    "engine": "MYSQL",
    "address": {
        "user": "fate",
        "passwd": "fate",
        "host": "127.0.0.1",
        "port": 3306,
        "db": "experiment",
        "name": "breast_hetero_guest"
    },
    "namespace": "experiment",
    "name": "breast_hetero_guest",
    "head": 1,
    "id_delimiter": ",",
    "partitions": 10,
    "id_column": "id",
    "feature_column": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9"
}
```

- PATH

```json
{
    "namespace": "xxx",
    "name": "xxx",
    "engine": "PATH",
    "address": {
        "path": "xxx"
    }
}
```
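
As a rough sketch, the HDFS binding above can be registered from the shell. It assumes the configuration is saved as `bind_conf.json` (an arbitrary file name) and that the client accepts the parameters as a configuration file via `-c`, as the note above indicates.

```bash
# Write the HDFS bind configuration to a file and register it as a fate table.
cat > bind_conf.json << 'EOF'
{
    "namespace": "experiment",
    "name": "breast_hetero_guest",
    "engine": "HDFS",
    "address": {
        "name_node": "hdfs://fate-cluster",
        "path": "/data/breast_hetero_guest.csv"
    },
    "id_delimiter": ",",
    "head": 1,
    "partitions": 10
}
EOF

flow table bind -c bind_conf.json
```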

return parameters

| parameter name | type | description |
| --- | --- | --- |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |
Sample

```json
{
    "data": {
        "namespace": "xxx",
        "table_name": "xxx"
    },
    "retcode": 0,
    "retmsg": "success"
}
```
## 4. Table information query

Query information about a fate table (real storage address, record count, schema, etc.).

```bash
flow table info [options]
```
Options

| parameter name | required | type | description |
| --- | --- | --- | --- |
| name | yes | string | fate table name |
| namespace | yes | string | fate table namespace |
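
For a quick check from the shell, the table can be queried directly; this assumes the client exposes the `-t`/`-n` shorthand for table name and namespace, as in the other table subcommands.

```bash
# Query metadata (storage address, record count, schema, ...) of a fate table.
flow table info -t breast_hetero_guest -n experiment
```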

return parameters

| parameter name | type | description |
| --- | --- | --- |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |
Sample

```json
{
    "data": {
        "address": {
            "home": null,
            "name": "breast_hetero_guest",
            "namespace": "experiment"
        },
        "count": 569,
        "exists": 1,
        "namespace": "experiment",
        "partition": 4,
        "schema": {
            "header": "y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9",
            "sid": "id"
        },
        "table_name": "breast_hetero_guest"
    },
    "retcode": 0,
    "retmsg": "success"
}
```
## 5. Delete table data

You can delete table data with table delete.

```bash
flow table delete [options]
```
Options

| parameter name | required | type | description |
| --- | --- | --- | --- |
| name | yes | string | fate table name |
| namespace | yes | string | fate table namespace |
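
For example, the table uploaded earlier can be removed from the shell; as above, this assumes the `-t`/`-n` options for table name and namespace.

```bash
# Delete the data of a fate table by name and namespace.
flow table delete -t breast_hetero_guest -n experiment
```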

return parameters

| parameter name | type | description |
| --- | --- | --- |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |
Sample

```json
{
    "data": {
        "namespace": "xxx",
        "table_name": "xxx"
    },
    "retcode": 0,
    "retmsg": "success"
}
```
## 6. Download data

Brief description:

Used to download data from the fate storage engine to a file.

```bash
flow data download -c ${conf_path}
```

Note: conf_path is the path to the configuration file; the specific parameters are as follows.
Options

| parameter name | required | type | description |
| --- | --- | --- | --- |
| output_path | yes | string | download path |
| table_name | yes | string | fate table name |
| namespace | yes | string | fate table namespace |
Example:

```json
{
    "output_path": "/data/projects/fate/breast_hetero_guest.csv",
    "namespace": "experiment",
    "table_name": "breast_hetero_guest"
}
```
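
A minimal sketch of the download flow, assuming the configuration above is saved as `download_conf.json` (an arbitrary file name):

```bash
# Export the fate table back to a local CSV file.
cat > download_conf.json << 'EOF'
{
    "output_path": "/data/projects/fate/breast_hetero_guest.csv",
    "namespace": "experiment",
    "table_name": "breast_hetero_guest"
}
EOF

flow data download -c download_conf.json
```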

return parameters

| parameter name | type | description |
| --- | --- | --- |
| retcode | int | return code |
| retmsg | string | return message |
| data | object | return data |
Example

```json
{
    "data": {
        "board_url": "http://xxx.xxx.xxx.xxx:8080/index.html#/dashboard?job_id=202111081457135282090&role=local&party_id=0",
        "code": 0,
        "dsl_path": "/data/projects/fate/jobs/202111081457135282090/job_dsl.json",
        "job_id": "202111081457135282090",
        "logs_directory": "/data/projects/fate/logs/202111081457135282090",
        "message": "success",
        "model_info": {
            "model_id": "local-0#model",
            "model_version": "202111081457135282090"
        },
        "pipeline_dsl_path": "/data/projects/fate/jobs/202111081457135282090/pipeline_dsl.json",
        "runtime_conf_on_party_path": "/data/projects/fate/jobs/202111081457135282090/local/0/job_runtime_on_party_conf.json",
        "runtime_conf_path": "/data/projects/fate/jobs/202111081457135282090/job_runtime_conf.json",
        "train_runtime_conf_path": "/data/projects/fate/jobs/202111081457135282090/train_runtime_conf.json"
    },
    "jobId": "202111081457135282090",
    "retcode": 0,
    "retmsg": "success"
}
```
## 7. Reader component

Brief description:

- The reader component is the data input component of fate;
- The reader component converts the input data into data of the specified storage type.

Parameter configuration:

The input table of the reader is configured in the conf when submitting a job:
```json
{
    "role": {
        "guest": {
            "0": {
                "reader_0": {
                    "table": {
                        "name": "breast_hetero_guest",
                        "namespace": "experiment"
                    }
                }
            }
        }
    }
}
```
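
The snippet above is only the fragment of the job conf that the reader consumes; it takes effect when the full conf and DSL are submitted, for example (the file names here are placeholders):

```bash
# Submit a job whose conf contains the reader_0 table configuration above.
flow job submit -c job_conf.json -d job_dsl.json
```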

Component Output

The output data storage engine of the component is determined by the configuration file conf/service_conf.yaml, with the following configuration items:

```yaml
default_engines:
  storage: eggroll
```
- The computing engine and the storage engine have certain support dependencies on each other; the list of dependencies is as follows.

| computing_engine | storage_engine |
| --- | --- |
| standalone | standalone |
| eggroll | eggroll |
| spark | hdfs (distributed), localfs (standalone) |
- The reader component's input data storage type supports: eggroll, hdfs, localfs, mysql, path, etc.
- The reader component's output data storage type is determined by the default_engines.storage configuration (except for PATH).