Introduction

This page explains how the data management features work, focusing on three objects: your Data Lake, Datasets, and annotations.

Data lake

Every organization has a Data Lake, which stores all the assets you use on the Picsellia platform.

The Data Lake page lets you interact with:

  • Pictures

  • Datasets

Pictures

A Picture object holds the location of an asset, its file name, and a set of tags.

What's a tag, and why use it?

When working with large amounts of data, being able to search it quickly is key to efficiency. With tags, you can create batches, upload assets by context, and more.

You can add as many tags as you want to each Picture, so feel free to use plenty of them.
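To make tag-based search concrete, here is a minimal Python sketch. The `Picture` class and the `find_by_tags` helper are hypothetical, written for illustration only; they are not part of the Picsellia SDK. The sketch assumes a picture's tags can be modeled as a set of strings, so selecting a batch amounts to a subset test.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical in-memory model of a Data Lake picture (not the Picsellia SDK).
@dataclass
class Picture:
    location: str  # where the asset is stored, e.g. an object-storage key
    filename: str
    tags: set[str] = field(default_factory=set)


def find_by_tags(pictures: list[Picture], *required: str) -> list[Picture]:
    """Return the pictures that carry every one of the required tags."""
    wanted = set(required)
    return [p for p in pictures if wanted <= p.tags]


lake = [
    Picture("bucket/2023/img_001.jpg", "img_001.jpg", {"night", "camera-1"}),
    Picture("bucket/2023/img_002.jpg", "img_002.jpg", {"day", "camera-1"}),
    Picture("bucket/2023/img_003.jpg", "img_003.jpg", {"night", "camera-2"}),
]

# Build a batch of all night-time pictures.
night_batch = find_by_tags(lake, "night")
print([p.filename for p in night_batch])  # ['img_001.jpg', 'img_003.jpg']
```

Because every requested tag must be present, adding more tags to a query narrows the batch, which is why tagging generously pays off.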

Dataset

A Dataset is the first step toward training data. It contains labels (the classes to be annotated), the annotations themselves, and metadata such as questions and answers, context, and so on.

A Dataset has the following fields:

  • Name: up to 60 characters

  • Description: up to 300 characters

  • Version: up to 100 characters

  • Origin: a reference to the dataset this one originates from

  • Pictures: the list of all the pictures you have selected for the dataset
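The field list can be read as a small data model. The sketch below is a hypothetical Python `Dataset` class (not the Picsellia SDK) that enforces the maximum lengths listed above. Representing pictures as a list of identifier strings and the origin as an optional string are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Maximum lengths from the Dataset field list.
MAX_NAME = 60
MAX_DESCRIPTION = 300
MAX_VERSION = 100


@dataclass
class Dataset:
    # Hypothetical model for illustration only (not the Picsellia SDK).
    name: str
    version: str
    description: str = ""
    origin: Optional[str] = None  # reference to the origin dataset, if any
    pictures: list[str] = field(default_factory=list)  # selected picture IDs

    def __post_init__(self) -> None:
        # Reject any value longer than its documented maximum length.
        for label, value, limit in (
            ("name", self.name, MAX_NAME),
            ("description", self.description, MAX_DESCRIPTION),
            ("version", self.version, MAX_VERSION),
        ):
            if len(value) > limit:
                raise ValueError(f"{label} exceeds {limit} characters ({len(value)})")


ds = Dataset(name="traffic-signs", version="v1", pictures=["img_001.jpg"])

try:
    Dataset(name="x" * 61, version="v1")
except ValueError as err:
    print(err)  # name exceeds 60 characters (61)
```

Validating at construction time means an over-long name, description, or version is caught before the object is used anywhere else.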

Dataset Version

Every Dataset can be versioned: you can create a new version of a dataset from either the Data Lake page or the Dataset page.

Each version of a Dataset can have its own labels and annotations, and you can also import labels and annotations from an older version into a specific version.
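A minimal sketch of this versioning behavior, assuming each version owns its own labels and annotations and that an import merges them from an older version into the target. `DatasetVersion`, `new_version`, and `import_from` are hypothetical names chosen for illustration, not Picsellia API calls, and annotations are simplified to `(label, bounding box)` tuples.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetVersion:
    # Hypothetical model for illustration only (not the Picsellia SDK).
    dataset_name: str
    version: str
    labels: set[str] = field(default_factory=set)
    # Maps a picture ID to its annotations, each a (label, (x, y, w, h)) tuple.
    annotations: dict[str, list[tuple[str, tuple[int, int, int, int]]]] = field(
        default_factory=dict
    )


def new_version(base: DatasetVersion, version: str) -> DatasetVersion:
    """Create a new, empty version of the same dataset."""
    return DatasetVersion(base.dataset_name, version)


def import_from(source: DatasetVersion, target: DatasetVersion) -> None:
    """Merge labels and annotations from an older version into the target."""
    target.labels |= source.labels
    for picture_id, anns in source.annotations.items():
        # setdefault gives the target its own list; the tuples are immutable,
        # so sharing them between versions is safe.
        target.annotations.setdefault(picture_id, []).extend(anns)


v1 = DatasetVersion(
    "traffic-signs", "v1", {"stop", "yield"},
    {"img_001.jpg": [("stop", (10, 20, 50, 60))]},
)
v2 = new_version(v1, "v2")
print(v2.labels)  # set()
import_from(v1, v2)
print(sorted(v2.labels))  # ['stop', 'yield']
```

Keeping versions independent by default and making the import an explicit step means editing one version never silently changes another.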