S3
- class verta.dataset.S3(paths, enable_mdb_versioning=False)
Captures metadata about S3 objects.
If your S3 object requires additional information to identify it, such as its version ID, you can use S3.location().
- Parameters
paths (list) – List of S3 URLs of the form "s3://<bucket-name>" or "s3://<bucket-name>/<key>", or objects returned by S3.location().
enable_mdb_versioning (bool, default False) – Whether to upload the data itself to ModelDB to enable managed data versioning.
Examples
    from verta.dataset import S3

    dataset1 = S3([
        "s3://verta-starter/census-train.csv",
        "s3://verta-starter/census-test.csv",
    ])
    dataset2 = S3([
        "s3://verta-starter",
    ])
    dataset3 = S3([
        S3.location("s3://verta-starter/census-train.csv", version_id="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"),
    ])
- dataset += other
Updates the dataset, adding paths from other.
- dataset + other + ...
Returns a new dataset with paths from the dataset and all others.
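As a rough illustration of how the two operators differ, the sketch below folds several datasets together with `+`, which returns a new dataset, rather than `+=`, which updates the left-hand one. `combine` is a hypothetical helper, not part of verta, and it assumes the datasets have already been constructed.

```python
from functools import reduce
import operator


def combine(datasets):
    """Return one dataset holding the paths of every dataset in `datasets`.

    Hypothetical helper: it relies only on the documented `+` operator.
    When more than one dataset is given, the result is a new object and
    none of the inputs is updated (unlike `+=`). A single dataset is
    returned unchanged.
    """
    datasets = list(datasets)
    if not datasets:
        raise ValueError("at least one dataset is required")
    return reduce(operator.add, datasets)
```

For two datasets named, say, `train` and `test`, `combine([train, test])` is intended to give the same result as `train + test`.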
- add(paths)
Adds paths to this dataset.
- Parameters
paths (list) – List of S3 URLs of the form "s3://<bucket-name>" or "s3://<bucket-name>/<key>", or objects returned by S3.location().
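The list passed to add() can be assembled from a bucket name and object keys. The sketch below does this with plain string formatting. `s3_urls` is a hypothetical helper, not part of verta; the bucket and key names are borrowed from the class examples.

```python
def s3_urls(bucket, keys=()):
    """Build S3 URLs in the two accepted forms.

    Hypothetical helper, not part of verta. With no keys it returns the
    bucket-level URL; otherwise it returns one object-level URL per key.
    """
    if not keys:
        return ["s3://{}".format(bucket)]
    return ["s3://{}/{}".format(bucket, key.lstrip("/")) for key in keys]


paths = s3_urls("verta-starter", ["census-train.csv", "census-test.csv"])
```

The resulting list could then be passed as `dataset.add(paths)`.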
- static blob_msg_to_object(blob_msg)
Deserializes a blob protobuf message into an instance.
- Parameters
blob_msg (VersioningService_pb2.Blob)
- Returns
Instance of a subclass of Blob.
- download(component_path=None, download_to_path=None)
Downloads component_path from this dataset if ModelDB-managed versioning was enabled.
- Parameters
component_path (str, optional) – Original path of the file or directory in this dataset to download. If not provided, all files will be downloaded.
download_to_path (str, optional) – Path to download to. If not provided, the file(s) will be downloaded into a new path in the current directory. If provided and the path already exists, it will be overwritten.
- Returns
downloaded_to_path (str) – Absolute path where file(s) were downloaded to. Matches download_to_path if it was provided as an argument.
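Because an existing download_to_path is overwritten, a caller may want a guard in front of download(). The following is a minimal sketch of one. `download_without_overwrite` is a hypothetical wrapper, not part of verta, and it assumes `dataset` is an S3 dataset for which ModelDB-managed versioning was enabled.

```python
import os


def download_without_overwrite(dataset, download_to_path, component_path=None):
    """Call `dataset.download()` only if `download_to_path` is unused.

    Hypothetical wrapper around the documented `download()` method.
    Raises FileExistsError instead of letting an existing path be
    overwritten. Returns whatever `download()` returns, which is
    documented as the absolute path the file(s) were downloaded to.
    """
    if os.path.exists(download_to_path):
        raise FileExistsError("{} already exists".format(download_to_path))
    return dataset.download(
        component_path=component_path,
        download_to_path=download_to_path,
    )
```

Leaving `component_path` as None requests every file in the dataset.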
- list_components()
Returns the components in this dataset.
- Returns
components (list of Component) – Components.
- list_paths()
Returns the paths of all components in this dataset.
- Returns
component_paths (list of str) – Paths of all components.
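Since list_paths() returns plain strings, ordinary string handling is enough to filter them. A small sketch, in which `paths_with_suffix` is a hypothetical helper and not part of verta:

```python
def paths_with_suffix(dataset, suffix):
    """Return the dataset's component paths ending in `suffix`, sorted.

    Hypothetical helper built only on the documented `list_paths()`.
    """
    return sorted(p for p in dataset.list_paths() if p.endswith(suffix))
```

For example, `paths_with_suffix(dataset, ".csv")` would select only the CSV components.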
- static location(path, version_id=None)
Returns an object describing an S3 location that can be passed into a new S3.
- Parameters
- Returns
S3Location – A location in S3.
- Raises
ValueError – If version_id is provided but path represents a bucket rather than a single object.
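The ValueError condition can be pictured with a standalone sketch that splits an S3 URL into bucket and key and rejects a version ID on a bucket-level path. Both functions are hypothetical illustrations, not verta's implementation, and the sketch makes the simplifying assumption that any non-empty key names a single object.

```python
def split_s3_url(url):
    """Split "s3://<bucket-name>[/<key>]" into (bucket, key or None)."""
    prefix = "s3://"
    if not url.startswith(prefix):
        raise ValueError("not an S3 URL: {!r}".format(url))
    bucket, _, key = url[len(prefix):].partition("/")
    if not bucket:
        raise ValueError("missing bucket name: {!r}".format(url))
    # An empty key means the URL refers to the bucket itself.
    return bucket, key or None


def check_location(path, version_id=None):
    """Mimic the documented rule: a version ID needs a single object.

    Illustration only; the library's actual validation may differ.
    """
    bucket, key = split_s3_url(path)
    if version_id is not None and key is None:
        raise ValueError("version_id requires an object key, got a bucket")
    return bucket, key, version_id
```

Under these assumptions, a bucket-only path such as "s3://verta-starter" is accepted without a version ID but rejected once one is supplied.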
- classmethod with_spark(sc, paths)
Creates a dataset blob with a SparkContext instance.