Train a custom Classification model

In this tutorial, we will see how to train and iterate on an image classification model, from image annotation to training and testing!

TL;DR

If you already have a clean dataset and just want to start training your model, we have prepared a notebook that you can open in Google Colab: https://colab.research.google.com/github/PicselliaTeam/picsellia-training-engine/blob/master/PicselliaClassification.ipynb

For this tutorial, we will create an image classifier trained on radiographs and use it to classify a few diseases.

The Dataset

Upload the images

If you know Picsellia a little, you know that all your assets are stored in our Datalake. If you need a refresher on the Datalake, Datasets and annotations, you can check this tutorial.

Reach the Datalake from the navbar on the left and click on the "Upload data" button at the top right of the screen. You should see this popup appear:

We advise you to add some tags to your images before uploading so they will be easier to find in your Datalake later. When you have chosen the images you want and added your tags, you can click on 'Upload'!

Create the Dataset

Now, you can refresh your Datalake and search for your assets using the search bar at the top and our Data Query Language (I uploaded my images with the tag 'chest', so I'm searching for this tag):

I have all the assets I want, so now I'm going to click on 'Select all' and then create a new dataset:

Click on 'Create Dataset' and a popup will appear so you can enter the details:

Once you're happy with the name and description, you can click on "Create". It can take a few minutes depending on the size of the dataset.

Now that we have a dataset created, let's check on it!

We can see that our dataset has been created successfully. Now we can go to the 'Settings' tab to create the labels we want for this project:

As I have 3 classes in my case, I created the 3 labels, choosing 'Classification' when initializing the labels.

Now that this is set up, we can move on to the annotation step.

Annotate the data

Classification problems are a bit simpler to address than object detection or segmentation because the annotations are much lighter. Indeed, you just have to set one label per image, and sometimes the label is already contained in the filename of the picture!

We will see two methods to quickly annotate your data when you are dealing with a classification problem:

  • With the annotation interface

  • Convert the filenames into annotations

With the annotation interface

If you hover over an image in your dataset, you will see a green 'Annotate' button appear. Click on it to access the annotation interface.

This looks like our classic UI, with a few changes since here we want to classify images.

On the right, you will see a blue rectangle labeled 'Classification'. It's there to remind you that you are annotating for a classification problem.

The red rectangle saying 'No Class' tells you that the current image has no label yet. To classify the image, just click on the corresponding label (here: covid, pneumonia or normal) and click on the green 'Save' button when you are done. This will automatically take you to the next image.

If you want to increase your annotation speed, you can use our keyboard shortcuts!

Press a number key from 0 to 9 to select a label, then press Enter to save.

Now that you know how to annotate your images with the UI, let's see how we could have done it automatically using our images' filenames!

Convert filenames to annotations

When you store images in anticipation of a classification project, it often happens that you prefix each file with the class it belongs to. For example, I have three classes in my dataset (covid, pneumonia and normal) and my filenames look like this:

  • covid_img_003.jpg

  • pneumonia_img_188.jpg

  • normal_img_101.jpg

If your filenames look like this (and only if the class name is the first word of the filename, followed by an underscore (_)), then I will show you how to automatically create the annotations in Picsellia using the UI.
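To make the naming rule concrete, here is a minimal Python sketch of how a class could be read from a filename. It only illustrates the convention described above and is not Picsellia's implementation. The function name `label_from_filename` is hypothetical, and the exact, case-sensitive match against the class list is an assumption.

```python
from pathlib import Path
from typing import Optional, Sequence


def label_from_filename(filename: str, classes: Sequence[str]) -> Optional[str]:
    """Return the class prefix of `filename`, or None if it does not follow the rule.

    Rule illustrated: the class name is the first word of the filename
    and is followed by an underscore.
    """
    name = Path(filename).name                 # drop any directory part
    prefix, separator, _rest = name.partition("_")
    if not separator:                          # no underscore at all
        return None
    # Assumption: the prefix must match a known class exactly (case-sensitive).
    return prefix if prefix in classes else None


classes = ["covid", "pneumonia", "normal"]
for name in ["covid_img_003.jpg", "pneumonia_img_188.jpg",
             "normal_img_101.jpg", "img_042.jpg"]:
    print(name, "->", label_from_filename(name, classes))
```

A file such as `img_042.jpg` yields no label here because its first word is not one of the classes, which is why the prefix rule matters.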

Let's get back to our Dataset.

On the top right of the screen, you can click on the violet button, which will show you some actions you can perform on your dataset.

As I know that my filenames respect the rule I just described, I can now click on 'Transform filenames in annotations' and the magic will happen...

Here it is! We can see at the bottom left of each image that a label has been created, which means that our annotations are now fully functional and our dataset is ready for training!

Let's get to the second phase, training a classification model 🚀

The Project

In Picsellia, if you want to run experiments and train models, you must create a project that will act as a placeholder for everything. Check this page of the documentation to know more.

Here is my project dashboard. As you can see, it's quite empty for now!

Attach the dataset

To begin with, let's attach the dataset we just created. Click on 'Open Datalake' and select the dataset you need.

Create an experiment

Now that we have some data, let's create our first experiment. Click on the 'With UI' button so we can create our experiment interactively.

What we want to do is use a pre-trained architecture suited for classification in order to obtain a model trained on our own data. To do this, click on the 'Public HUB' button to open a popup where you will be able to select a model.

Just scroll down the list until you find the model you want (sorry, we don't have a search bar for this). For classification we only have one base architecture for now, and it's called mobilenet_v2_classif_160.

Click on 'Select Model' and you will now be able to see and change the parameters for the experiment.

We can see that our architecture has been selected correctly. We also have some pre-filled hyperparameters that can be used for training. Let's see what they mean:

  • fine_tune, whether you start from a model that has already been fine-tuned. Leave the pre-filled value when training the base architecture for the first time, and set it to true if you are using a base model that you already fine-tuned

  • batch_size, the number of images passed forward at each training step (depends on your GPU memory)

  • image_size, the resolution to which images are rescaled before training

  • initial_epochs, the number of epochs used to train the head of the classifier

  • annotation_type, the type of dataset used, you can leave it as it is

  • fine_tune_epochs, the number of epochs used to fine-tune the whole architecture on your data

You have to keep all these parameters if you want to use our packaged notebooks or remote training engines, as we will see right after.
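As a rough illustration of how these parameters fit together, the sketch below stores them in a plain Python dictionary, checks that none of the six keys listed above is missing, and derives two quantities from them. All values are made-up examples, not Picsellia defaults. The helper names are hypothetical, and adding the two epoch counts assumes that both training phases run one after the other.

```python
import math

# The six parameter names listed above.
REQUIRED_KEYS = {
    "fine_tune", "batch_size", "image_size",
    "initial_epochs", "annotation_type", "fine_tune_epochs",
}

# Illustrative values only (assumptions, not the pre-filled defaults).
parameters = {
    "fine_tune": False,
    "batch_size": 32,
    "image_size": 160,
    "initial_epochs": 10,
    "annotation_type": "classification",
    "fine_tune_epochs": 10,
}


def missing_keys(params: dict) -> list:
    """Return the required parameter names that are absent from `params`."""
    return sorted(REQUIRED_KEYS - set(params))


def steps_per_epoch(n_images: int, batch_size: int) -> int:
    """Number of batches needed to see every image once."""
    return math.ceil(n_images / batch_size)


def total_epochs(params: dict) -> int:
    """Head-training epochs plus whole-architecture fine-tuning epochs."""
    return params["initial_epochs"] + params["fine_tune_epochs"]


print("missing:", missing_keys(parameters))
print("steps per epoch for 1000 images:",
      steps_per_epoch(1000, parameters["batch_size"]))
print("total epochs:", total_epochs(parameters))
```

A larger batch_size means fewer steps per epoch but more GPU memory per step, which is why that value depends on your hardware.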

If you are happy with the experiment configuration, you can click on 'Create Experiment' and we will land on our experiment dashboard.

Train the model

This is the experiment dashboard. Here we can see that our dataset is correctly attached and that we are using mobilenet_v2_classif_160 as the base model. We can also check our parameters.

Let's look at our options for training: click on the 'Launch' tab.

As we have already packaged a lot of things in Picsellia, you have two options to easily train a model for this experiment:

  • Use a Jupyter Notebook (on Google Colab for example)

  • Launch the training on Picsellia Servers (Team accounts only)

Using Notebooks

If you don't want to pay for the compute, click on the 'colab' button and you will land on a notebook ready for training!

Just fill in the second cell with your account information (API token, project token and experiment name), then follow the instructions in the notebook to perform the training and upload your model to Picsellia 😉
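As a rough sketch of what filling in that cell involves, the snippet below gathers the three values and refuses to continue if one of them is empty. The variable names and the environment-variable names are assumptions made for this illustration. The notebook's own cell may use different names, so keep whatever names it already defines.

```python
import os
from typing import Mapping


def load_credentials(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the three values the notebook asks for and reject empty ones.

    The environment-variable names below are hypothetical.
    """
    credentials = {
        "api_token": env.get("PICSELLIA_API_TOKEN", ""),
        "project_token": env.get("PICSELLIA_PROJECT_TOKEN", ""),
        "experiment_name": env.get("PICSELLIA_EXPERIMENT_NAME", ""),
    }
    missing = [name for name, value in credentials.items() if not value.strip()]
    if missing:
        raise ValueError("Please fill in: " + ", ".join(missing))
    return credentials


# Example with placeholder values (replace them with your own).
example = load_credentials({
    "PICSELLIA_API_TOKEN": "my-api-token",
    "PICSELLIA_PROJECT_TOKEN": "my-project-token",
    "PICSELLIA_EXPERIMENT_NAME": "my-first-experiment",
})
print(sorted(example))
```

Reading the tokens from environment variables rather than typing them into the cell keeps them out of a notebook you might share later.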

Using Picsellia Servers

If you want the training to run automatically on servers equipped with the latest and fastest GPUs on the market, you can click on the green button on the 'Launch' tab.

This will launch everything, and you will be able to see the logs of the run in the 'Telemetry' tab during the training. You will also see the logs and metrics in real time in the 'Logs' tab.

And that's it! You have now trained a classification model on your own data 😎

Test the model

When you upload trained weights at the end of an experiment, you can then try the model directly in your experiment dashboard by going to the 'Test' tab.

There you can run your different trained models against each other and see the predictions in the UI.

That's the end of this tutorial. I hope that you are now comfortable with training models for classification tasks. If you want to go further, I suggest you check out our Object Detection tutorials 👇

Train a custom Object Detection model
