ExperimentRun

class verta.tracking.entities.ExperimentRun(conn, conf, msg)

Object representing a machine learning Experiment Run.

This class provides read/write functionality for Experiment Run metadata.

There should not be a need to instantiate this class directly; please use Client.set_experiment_run().

Variables:
  • id (str) – ID of this Experiment Run.

  • name (str) – Name of this Experiment Run.

  • has_environment (bool) – Whether there is an environment associated with this Experiment Run.

  • url (str) – Verta web app URL.

clone(experiment_id=None)

Returns a newly-created copy of this experiment run.

Parameters:

experiment_id (str, optional) – ID of experiment to clone this run into. If not provided, the new run will be cloned into this run’s experiment.

Returns:

ExperimentRun
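
A minimal usage sketch, assuming run is an existing ExperimentRun and other_expt is an Experiment obtained elsewhere (e.g. via Client.set_experiment()):

cloned_run = run.clone()
cloned_run_elsewhere = run.clone(experiment_id=other_expt.id)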

delete()

Deletes this experiment run.

download_artifact(key, download_to_path)

Downloads the artifact with name key to path download_to_path.

Parameters:
  • key (str) – Name of the artifact.

  • download_to_path (str) – Path to download to.

Returns:

downloaded_to_path (str) – Absolute path where artifact was downloaded to. Matches download_to_path.
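
For example, a brief sketch (the "weights" key assumes an artifact previously logged with log_artifact()):

path = run.download_artifact("weights", download_to_path="weights.npz")
# path is the absolute path of the downloaded "weights.npz"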

download_docker_context(download_to_path, self_contained=False)

Downloads this Experiment Run’s Docker context tgz.

Parameters:
  • download_to_path (str) – Path to download Docker context to.

  • self_contained (bool, default False) – Whether the downloaded Docker context should be self-contained.

Returns:

downloaded_to_path (str) – Absolute path where Docker context was downloaded to. Matches download_to_path.

download_model(download_to_path)

Downloads the model logged with log_model() to path download_to_path.

Parameters:

download_to_path (str) – Path to download to.

Returns:

downloaded_to_path (str) – Absolute path where artifact was downloaded to. Matches download_to_path.

fetch_artifacts(keys)

Downloads artifacts that are associated with a Standard Verta Model.

Parameters:

keys (list of str) – Keys of artifacts to download.

Returns:

dict of str to str – Map of artifacts’ keys to their cache filepaths—for use as the artifacts parameter to a Standard Verta Model.

Examples

run.log_artifact("weights", open("weights.npz", 'rb'))
# upload complete (weights)
run.log_artifact("text_embeddings", open("embedding.csv", 'rb'))
# upload complete (text_embeddings)
artifact_keys = ["weights", "text_embeddings"]
artifacts = run.fetch_artifacts(artifact_keys)
artifacts
# {'weights': '/Users/convoliution/.verta/cache/artifacts/50a9726b3666d99aea8af006cf224a7637d0c0b5febb3b0051192ce1e8615f47/weights.npz',
#  'text_embeddings': '/Users/convoliution/.verta/cache/artifacts/2d2d1d809e9bce229f0a766126ae75df14cadd1e8f182561ceae5ad5457a3c38/embedding.csv'}
ModelClass(artifacts=artifacts).predict(["Good book.", "Bad book!"])
# [0.955998517288053, 0.09809996313422353]
run.log_model(ModelClass, artifacts=artifact_keys)
# upload complete (custom_modules.zip)
# upload complete (model.pkl)
# upload complete (model_api.json)

get_artifact(key)

Gets the artifact with name key from this Experiment Run.

If the artifact was originally logged as just a filesystem path, that path will be returned. Otherwise, the artifact object will be returned. If the object is unable to be deserialized, the raw bytes are returned instead.

Parameters:

key (str) – Name of the artifact.

Returns:

str or bytes – Filesystem path of the artifact, the artifact object, or a bytestream representing the artifact.

get_artifact_keys()

Gets the artifact keys of this Experiment Run.

Returns:

list of str – List of artifact keys of this Experiment Run.

get_attribute(key)

Gets the attribute with name key from this Experiment Run.

Parameters:

key (str) – Name of the attribute.

Returns:

one of {None, bool, float, int, str} – Value of the attribute.

get_attributes()

Gets all attributes from this Experiment Run.

Returns:

dict of str to {None, bool, float, int, str} – Names and values of all attributes.
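
A short sketch pairing log_attribute() with these getters (attribute names and values are illustrative):

run.log_attribute("architecture", "resnet50")
run.get_attribute("architecture")
# 'resnet50'
run.get_attributes()
# {'architecture': 'resnet50'}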

get_code()

Gets the code version.

Returns:

dict or zipfile.ZipFile

Either:
  • a dictionary containing Git snapshot information with at most the following items:
    • filepaths (list of str)

    • repo_url (str) – Remote repository URL

    • commit_hash (str) – Commit hash

    • is_dirty (bool)

  • a ZipFile containing Python source code files

get_dataset(key)

Gets the dataset artifact with name key from this Experiment Run.

If the dataset was originally logged as just a filesystem path, that path will be returned. Otherwise, the dataset object itself will be returned. If the object is unable to be deserialized, the raw bytes are returned instead.

Parameters:

key (str) – Name of the dataset.

Returns:

str or object or file-like – DatasetVersion if logged using log_dataset_version(); otherwise, the filesystem path of the dataset, the dataset object, or a bytestream representing the dataset.

get_dataset_version(key)

Gets the DatasetVersion with name key from this Experiment Run.

Parameters:

key (str) – Name of the dataset version.

Returns:

DatasetVersion – DatasetVersion associated with the given key.

get_date_created()

Gets a timestamp representing the time (in UTC) this Experiment Run was created.

Returns:

timestamp (int) – Unix timestamp in milliseconds.

get_date_updated()

Gets a timestamp representing the time (in UTC) this Experiment Run was updated.

Returns:

timestamp (int) – Unix timestamp in milliseconds.

get_environment()

Get the logged environment.

Returns:

environment – Logged environment.

get_hyperparameter(key)

Gets the hyperparameter with name key from this Experiment Run.

Parameters:

key (str) – Name of the hyperparameter.

Returns:

one of {None, bool, float, int, str} – Value of the hyperparameter.

get_hyperparameters()

Gets all hyperparameters from this Experiment Run.

Returns:

dict of str to {None, bool, float, int, str} – Names and values of all hyperparameters.

get_image(key)

Gets the image artifact with name key from this Experiment Run.

If the image was originally logged as just a filesystem path, that path will be returned. Otherwise, the image object will be returned. If the object is unable to be deserialized, the raw bytes are returned instead.

Parameters:

key (str) – Name of the image.

Returns:

str or PIL Image or file-like – Filesystem path of the image, the image object, or a bytestream representing the image.

get_metric(key)

Gets the metric with name key from this Experiment Run.

Parameters:

key (str) – Name of the metric.

Returns:

one of {None, bool, float, int, str} – Value of the metric.

get_metrics()

Gets all metrics from this Experiment Run.

Returns:

dict of str to {None, bool, float, int, str} – Names and values of all metrics.

get_model()

Gets the model artifact for Verta model deployment from this Experiment Run.

Returns:

object – Model for deployment.

get_observation(key)

Gets the observation series with name key from this Experiment Run.

Parameters:

key (str) – Name of observation series.

Returns:

list of {None, bool, float, int, str} – Values of observation series.

get_observations()

Gets all observations from this Experiment Run.

Returns:

dict of str to list of {None, bool, float, int, str} – Names and values of all observation series.
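
For example, if a per-epoch validation loss was recorded with log_observation() (values are illustrative):

run.get_observation("val_loss")
# [0.52, 0.41, 0.38]
run.get_observations()
# {'val_loss': [0.52, 0.41, 0.38]}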

get_schema() → Dict[str, dict]

Gets the input and output JSON schemas, in the format:

{
    "input": <input schema>,
    "output": <output schema>
}

New in version 0.24.0.

If no output schema was provided, output will not be included in the returned dict.

Returns:

dict of str to dict – Input and output JSON schemas.
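
A sketch of retrieving schemas previously stored with log_schema():

schemas = run.get_schema()
input_schema = schemas["input"]
output_schema = schemas.get("output")  # None if no output schema was logged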

get_tags()

Gets all tags from this Experiment Run.

Returns:

list of str – All tags.

log_artifact(key, artifact, overwrite=False)

Logs an artifact to this Experiment Run.

Note

The following artifact keys are reserved for internal use within the Verta system:

  • "custom_modules"

  • "model"

  • "model.pkl"

  • "model_api.json"

  • "requirements.txt"

  • "train_data"

  • "tf_saved_model"

  • "setup_script"

Parameters:
  • key (str) – Name of the artifact.

  • artifact (str or file-like or object) –

    Artifact or some representation thereof.
    • If str, then it will be interpreted as a filesystem path, its contents read as bytes, and uploaded as an artifact. If it is a directory path, its contents will be zipped.

    • If file-like, then the contents will be read as bytes and uploaded as an artifact.

    • Otherwise, the object will be serialized and uploaded as an artifact.

  • overwrite (bool, default False) – Whether to allow overwriting an existing artifact with key key.
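
A minimal sketch of the accepted artifact forms (keys and paths are illustrative):

run.log_artifact("weights", "weights.npz")                               # filesystem path; contents read as bytes
run.log_artifact("weights", open("weights.npz", 'rb'), overwrite=True)   # file-like object
run.log_artifact("thresholds", {"spam": 0.8, "ham": 0.2})                # arbitrary object, serialized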

log_attribute(key, value, overwrite=False)

Logs an attribute to this Experiment Run.

Parameters:
  • key (str) – Name of the attribute.

  • value (one of {None, bool, float, int, str, list, dict}) – Value of the attribute.

  • overwrite (bool, default False) – Whether to allow overwriting an existing attribute with key key.

log_attributes(attributes, overwrite=False)

Logs potentially multiple attributes to this Experiment Run.

Parameters:
  • attributes (dict of str to {None, bool, float, int, str, list, dict}) – Attributes.

  • overwrite (bool, default False) – Whether to allow overwriting existing attributes.
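
For example (attribute names and values are illustrative):

run.log_attribute("git_branch", "main")
run.log_attributes({"framework": "pytorch", "cuda": True})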

log_code(exec_path=None, repo_url=None, commit_hash=None, overwrite=False, is_dirty=None, autocapture=True)

Logs the code version.

A code version is either information about a Git snapshot or a bundle of Python source code files.

repo_url and commit_hash can only be set if use_git was set to True in the Client.

Parameters:
  • exec_path (str, optional) – Filepath to the executable Python script or Jupyter notebook. If no filepath is provided, the Client will make its best effort to find the currently running script/notebook file.

  • repo_url (str, optional) – URL for a remote Git repository containing commit_hash. If no URL is provided, the Client will make its best effort to find it.

  • commit_hash (str, optional) – Git commit hash associated with this code version. If no hash is provided, the Client will make its best effort to find it.

  • overwrite (bool, default False) – Whether to allow overwriting a code version.

  • is_dirty (bool, optional) – Whether git status is dirty relative to commit_hash. If not provided, the Client will make its best effort to find it.

  • autocapture (bool, default True) – Whether to enable the automatic capturing behavior of parameters above in git mode.

Examples

With Client(use_git=True) (default):

Log Git snapshot information, plus the location of the currently executing notebook/script relative to the repository root:

run.log_code()
run.get_code()
# {'exec_path': 'comparison/outcomes/classification.ipynb',
#  'repo_url': 'git@github.com:VertaAI/experiments.git',
#  'commit_hash': 'f99abcfae6c3ce6d22597f95ad6ef260d31527a6',
#  'is_dirty': False}

Log Git snapshot information, plus the location of a specific source code file relative to the repository root:

run.log_code("../trainer/training_pipeline.py")
run.get_code()
# {'exec_path': 'comparison/trainer/training_pipeline.py',
#  'repo_url': 'git@github.com:VertaAI/experiments.git',
#  'commit_hash': 'f99abcfae6c3ce6d22597f95ad6ef260d31527a6',
#  'is_dirty': False}

With Client(use_git=False):

Find and upload the currently executing notebook/script:

run.log_code()
zip_file = run.get_code()
zip_file.printdir()
# File Name                          Modified             Size
# classification.ipynb        2019-07-10 17:18:24        10287

Upload a specific source code file:

run.log_code("../trainer/training_pipeline.py")
zip_file = run.get_code()
zip_file.printdir()
# File Name                          Modified             Size
# training_pipeline.py        2019-05-31 10:34:44          964

log_dataset_version(key, dataset_version, overwrite=False)

Logs a Verta DatasetVersion to this ExperimentRun with the given key.

Parameters:
  • key (str) – Name of the dataset version.

  • dataset_version (DatasetVersion) – Dataset version.

  • overwrite (bool, default False) – Whether to allow overwriting a dataset version.

log_environment(env, overwrite=False)

Log an environment.

Parameters:
  • env (environment) – Environment to log.

  • overwrite (bool, default False) – Whether to allow overwriting an existing environment.
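
A minimal sketch, assuming the environment is captured with verta.environment.Python (the pinned requirement is illustrative):

from verta.environment import Python

run.log_environment(Python(requirements=["scikit-learn==1.0.2"]))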

log_hyperparameter(key, value, overwrite=False)

Logs a hyperparameter to this Experiment Run.

Parameters:
  • key (str) – Name of the hyperparameter.

  • value (one of {None, bool, float, int, str}) – Value of the hyperparameter.

  • overwrite (bool, default False) – Whether to allow overwriting an existing hyperparameter with key key.

log_hyperparameters(hyperparams, overwrite=False)

Logs potentially multiple hyperparameters to this Experiment Run.

Parameters:
  • hyperparams (dict of str to {None, bool, float, int, str}) – Hyperparameters.

  • overwrite (bool, default False) – Whether to allow overwriting an existing hyperparameter with key key.
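
For example (hyperparameter names and values are illustrative):

run.log_hyperparameter("learning_rate", 0.01)
run.log_hyperparameters({"batch_size": 64, "optimizer": "adam"})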

log_image(key, image, overwrite=False)

Logs an image artifact to this Experiment Run.

Parameters:
  • key (str) – Name of the image.

  • image (one of {str, file-like, pyplot, matplotlib Figure, PIL Image, object}) –

    Image or some representation thereof.
    • If str, then it will be interpreted as a filesystem path, its contents read as bytes, and uploaded as an artifact.

    • If file-like, then the contents will be read as bytes and uploaded as an artifact.

    • If matplotlib pyplot, then the image will be serialized and uploaded as an artifact.

    • If matplotlib Figure, then the image will be serialized and uploaded as an artifact.

    • If PIL Image, then the image will be serialized and uploaded as an artifact.

    • Otherwise, the object will be serialized and uploaded as an artifact.

  • overwrite (bool, default False) – Whether to allow overwriting an existing image with key key.
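
A short sketch using a matplotlib Figure (assumes matplotlib is installed; the plotted data is illustrative):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0.9, 0.6, 0.4])
ax.set_title("training loss")
run.log_image("loss_curve", fig)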

log_metric(key, value, overwrite=False)

Logs a metric to this Experiment Run.

If the metadatum of interest might recur, log_observation() should be used instead.

Parameters:
  • key (str) – Name of the metric.

  • value (one of {None, bool, float, int, str}) – Value of the metric.

  • overwrite (bool, default False) – Whether to allow overwriting an existing metric with key key.

log_metrics(metrics, overwrite=False)

Logs potentially multiple metrics to this Experiment Run.

Parameters:
  • metrics (dict of str to {None, bool, float, int, str}) – Metrics.

  • overwrite (bool, default False) – Whether to allow overwriting an existing metric with key key.
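
For example (metric names and values are illustrative):

run.log_metric("val_accuracy", 0.92)
run.log_metrics({"precision": 0.88, "recall": 0.91})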

log_model(model, custom_modules=None, model_api=None, artifacts=None, overwrite=False)

Logs a model and associated code dependencies.

Note

If using an XGBoost model from their scikit-learn API, "scikit-learn" must also be specified in log_environment() (in addition to "xgboost").

Parameters:
  • model (str or object) –

    Model or some representation thereof. For deployment, this parameter should be a model object of a type supported by Verta (for example, a Standard Verta Model). For more general model logging, the following types are also supported:
    • str path to a file or directory

    • arbitrary pickleable object

  • custom_modules (list of str, optional) –

    Paths to local Python modules and other files that the deployed model depends on. Modules from the standard library should not be included here.
    • If directories are provided, all files within—excluding virtual environments—will be included.

    • If module names are provided, all files within the corresponding module inside a folder in sys.path will be included.

    • If not provided, all Python files located within sys.path—excluding virtual environments—will be included.

    • If an empty list is provided, no local files will be included at all. This can be useful for decreasing upload times or resolving certain types of package conflicts when a model has no local dependencies.

  • model_api (ModelAPI, optional) – Model API specifying details about the model and its deployment.

  • artifacts (list of str, optional) – Keys of logged artifacts to be used by a Standard Verta Model.

  • overwrite (bool, default False) – Whether to allow overwriting existing model artifacts.
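
A minimal sketch logging a trained scikit-learn classifier (toy data for illustration; the empty custom_modules list assumes the model has no local file dependencies):

from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])  # toy training data
run.log_model(model, custom_modules=[])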

log_modules(paths, search_path=None)

Logs local files that are dependencies for a deployed model to this Experiment Run.

Deprecated since version 0.13.13: The behavior of this function has been merged into log_model() as its custom_modules parameter; consider using that instead.

Deprecated since version 0.12.4: The search_path parameter is no longer necessary and will be removed in an upcoming version; consider removing it from the function call.

Parameters:

paths (str or list of str) – Paths to local Python modules and other files that the deployed model depends on. If a directory is provided, all files within will be included.

log_observation(key, value, timestamp=None, epoch_num=None, overwrite=False)

Logs an observation to this Experiment Run.

Parameters:
  • key (str) – Name of the observation.

  • value (one of {None, bool, float, int, str}) – Value of the observation.

  • timestamp (str or float or int, optional) – String representation of a datetime or numerical Unix timestamp. If not provided, the current time will be used.

  • epoch_num (non-negative int, optional) – Epoch number associated with this observation. If not provided, it will automatically be incremented from prior observations for the same key.

  • overwrite (bool, default False) – Whether to allow overwriting an existing observation with key key.

Warning

If timestamp is provided, it should include timezone information; otherwise, it will be interpreted as UTC.
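
For example, recording a per-epoch series (values are illustrative):

for epoch, loss in enumerate([0.52, 0.41, 0.38]):
    run.log_observation("val_loss", loss, epoch_num=epoch)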

log_schema(input: dict, output: Optional[dict] = None) → None

Sets the input and output schemas, which are stored as model artifacts.

New in version 0.24.0.

To propagate this change to any live endpoints, you must redeploy the model by calling update().

The output schema is optional.

To validate a prediction’s input and output against these schemas, use the validate_schema() decorator on your model’s predict() function.

Parameters:
  • input (dict) – Input schema as an OpenAPI-compatible JSON dict. Easiest to create using pydantic.BaseModel.schema().

  • output (dict, optional) – Output schema as an OpenAPI-compatible JSON dict. Easiest to create using pydantic.BaseModel.schema().
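
A sketch using pydantic to build the schemas (the field names are illustrative):

from pydantic import BaseModel

class Input(BaseModel):
    text: str

class Output(BaseModel):
    score: float

run.log_schema(input=Input.schema(), output=Output.schema())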

log_setup_script(script, overwrite=False)

Associate a model deployment setup script with this Experiment Run.

New in version 0.13.8.

Parameters:
  • script (str) – String composed of valid Python code for executing setup steps at the beginning of model deployment. An on-disk file can be passed in using open("path/to/file.py", 'r').read().

  • overwrite (bool, default False) – Whether to allow overwriting an existing setup script.

Raises:

SyntaxError – If script contains invalid Python.
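
For example, a sketch of a setup script that downloads an NLTK corpus at deployment time (the package and corpus are illustrative):

setup = """
import nltk
nltk.download('punkt')
"""
run.log_setup_script(setup)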

log_tag(tag)

Logs a tag to this Experiment Run.

Parameters:

tag (str) – Tag.

log_tags(tags)

Logs multiple tags to this Experiment Run.

Parameters:

tags (list of str) – Tags.
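
For example (tag values are illustrative):

run.log_tag("baseline")
run.log_tags(["nlp", "production-candidate"])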

log_training_data(train_features, train_targets, overwrite=False)

Associate training data with this model reference.

Changed in version 0.14.4: Instead of uploading the data itself as a CSV artifact 'train_data', this method now generates a histogram for internal use by our deployment data monitoring system.

Deprecated since version 0.18.0: This method is no longer supported. Please see our documentation for information about our platform’s data monitoring features.

Parameters:
  • train_features (pd.DataFrame) – pandas DataFrame representing features of the training data.

  • train_targets (pd.DataFrame or pd.Series) – pandas DataFrame representing targets of the training data.

  • overwrite (bool, default False) – Whether to allow overwriting existing training data.

rename(name: str) → None

Renames this experiment run.

Parameters:

name (str) – New name for this experiment run.