# Semantic segmentation from a robot camera


## Structure of this repository

### code

All important code is in the `src` directory:
 - `main.py` -> the main entry point
 - `utils.py` -> most of the code; basically all of the functions live here


### weights

contains the weights of the model

### our_dataset

contains images from the robot camera and a CSV file with the image paths

### requirements.txt

contains the list of all the required packages to run the code


## Description

This project uses DeepLabv3 with a ResNet-50 backbone to segment images from a robot camera into multiple classes.
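The exact model construction lives in `src/utils.py` and may differ; as a rough sketch, instantiating the equivalent stock torchvision model (with 12 output classes, matching the mapping below) would look like this:

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# DeepLabv3 with a ResNet-50 backbone and one output channel per class
model = deeplabv3_resnet50(num_classes=12)
model.eval()

# torchvision's segmentation models return a dict; "out" holds the logits
with torch.no_grad():
    x = torch.randn(1, 3, 375, 621)        # one half of a KITTI frame (see "Image size")
    logits = model(x)["out"]               # shape: (1, 12, 375, 621)
    prediction = logits.argmax(dim=1)      # per-pixel class IDs
```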

Training and validation run on the KITTI-STEP dataset [1].


The following figure shows the model tested on an image from the robot camera.
![](readme_photos/Figure_1.png) 


### KITTI-STEP dataset

The KITTI-STEP dataset labels urban scenes with 19 semantic classes (IDs 0-18) plus a void label.
However, in our use case we do not need all of them, so we have grouped them into 12 classes. The mapping is shown below:

| ID | Category                           | KITTI class IDs |
|----|------------------------------------|-----------------|
| 1  | Road, Sidewalk                     | 0, 1            |
| 2  | Building                           | 2               |
| 3  | Wall                               | 3               |
| 4  | Fence                              | 4               |
| 5  | Pole, Traffic Light, Traffic Sign  | 5, 6, 7         |
| 6  | Vegetation                         | 8               |
| 7  | Terrain                            | 9               |
| 8  | Sky                                | 10              |
| 9  | Person                             | 11              |
| 10 | Rider, Car, Truck, Bus, Train      | 12, 13, 14, 15, 16 |
| 11 | Motorcycle, Bicycle                | 17, 18          |
| 0  | Void                               | 255             |

ID is the class ID in our model, Category is the class name, and the last column lists the original class IDs from the KITTI dataset.
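The remapping itself happens in `src/utils.py`; a minimal sketch of how such a lookup can be applied to a label mask (the dictionary simply mirrors the table above) is:

```python
import numpy as np

# KITTI class ID -> our class ID, mirroring the table above
KITTI_TO_OURS = {
    0: 1, 1: 1,                              # road, sidewalk
    2: 2, 3: 3, 4: 4,                        # building, wall, fence
    5: 5, 6: 5, 7: 5,                        # pole, traffic light, traffic sign
    8: 6, 9: 7, 10: 8, 11: 9,                # vegetation, terrain, sky, person
    12: 10, 13: 10, 14: 10, 15: 10, 16: 10,  # rider, car, truck, bus, train
    17: 11, 18: 11,                          # motorcycle, bicycle
    255: 0,                                  # void
}

def remap_mask(mask: np.ndarray) -> np.ndarray:
    """Map a KITTI label mask (uint8) to the 12-class scheme via a lookup table."""
    lut = np.zeros(256, dtype=np.uint8)
    for kitti_id, our_id in KITTI_TO_OURS.items():
        lut[kitti_id] = our_id
    return lut[mask]
```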

To train and validate, the user is also expected to provide the image and mask paths in CSV files. An example of such a CSV file is shown below:

| image_path                                              | mask_path                                                |
|---------------------------------------------------------|----------------------------------------------------------|
| 00000/pylon_camera_node/frame001707-1581624823_450.jpg  | 00000/pylon_camera_node_label_id/frame001707-1581624823_450.png |
| 00000/pylon_camera_node/frame001709-1581624823_650.jpg  | 00000/pylon_camera_node_label_id/frame001709-1581624823_650.png |
| 00000/pylon_camera_node/frame001711-1581624823_849.jpg  | 00000/pylon_camera_node_label_id/frame001711-1581624823_849.png |
| 00000/pylon_camera_node/frame001713-1581624824_050.jpg  | 00000/pylon_camera_node_label_id/frame001713-1581624824_050.png |
| 00000/pylon_camera_node/frame001715-1581624824_250.jpg  | 00000/pylon_camera_node_label_id/frame001715-1581624824_250.png |
| 00000/pylon_camera_node/frame001717-1581624824_449.jpg  | 00000/pylon_camera_node_label_id/frame001717-1581624824_449.png |
| 00000/pylon_camera_node/frame001719-1581624824_650.jpg  | 00000/pylon_camera_node_label_id/frame001719-1581624824_650.png |
| 00000/pylon_camera_node/frame001721-1581624824_849.jpg  | 00000/pylon_camera_node_label_id/frame001721-1581624824_849.png |
| 00000/pylon_camera_node/frame001723-1581624825_049.jpg  | 00000/pylon_camera_node_label_id/frame001723-1581624825_049.png |
| 00000/pylon_camera_node/frame001725-1581624825_249.jpg  | 00000/pylon_camera_node_label_id/frame001725-1581624825_249.png |


The directories where the dataset and the CSV files are stored can be adjusted in the `__init__` method of the `KittiDataset` class in `src/utils.py`, as sketched below.
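As an illustration (the actual signature in `src/utils.py` may differ), the adjustable paths might look like this:

```python
import pandas as pd
from torch.utils.data import Dataset

class KittiDataset(Dataset):
    # illustrative sketch -- argument names and defaults here are hypothetical
    def __init__(self, csv_file="train.csv", root_dir="path/to/kitti", transform=None):
        self.frames = pd.read_csv(csv_file)   # image_path / mask_path columns
        self.root_dir = root_dir              # adjust to where the dataset is stored
        self.transform = transform

    def __len__(self):
        return len(self.frames)
```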


## Installation

Clone this repository and install the required packages using the following command:

```bash
pip install -r requirements.txt
```
The dataset can be downloaded from [1]; you will need to create a free account first. If you want to resume the training, we provide `train.csv` and `val.csv` in the project's root directory with the paths to the images and masks used.

Training is recommended on a GPU; inference can run on a CPU.
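For reference, the usual PyTorch device selection looks like this (a minimal sketch):

```python
import torch

# use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```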

## Usage

This section explains how to use the project.
If you are unsure, please run the following command to get help:

```bash
python3 main.py --help
```


### Train


To train the model, run the following command:
```bash
python3 main.py --train --dataset <dataset> -b <batch_size> -e <epochs> 
```
Example:
```bash
python3 main.py --train --dataset kitti -b 4 -e 10
```
This will run the training for 10 epochs with a batch size of 4 using the KITTI dataset.
Please note that currently only the KITTI dataset is supported. The `--dataset` flag
was introduced to allow testing different datasets and can be extended later.


Whenever a new best validation loss is reached at the end of an epoch, the model is saved in the `src` directory as `best_model.pth`, along with `my_metrics.csv`.
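This follows the standard checkpoint-on-best pattern; a minimal sketch (the helper names are hypothetical, the actual loop lives in `src/utils.py`):

```python
import torch

best_val_loss = float("inf")
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)     # hypothetical helpers
    val_loss = validate(model, val_loader)
    if val_loss < best_val_loss:             # new best -> overwrite the checkpoint
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_model.pth")
```

An example of the metrics file is shown below: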

| epoch | train_loss    | train_acc    | train_mIoU   | valid_loss    | valid_acc    | valid_mIoU   | lr    |
|-------|---------------|--------------|--------------|---------------|--------------|--------------|-------|
| 0     | 0.76410479525 | 0.7632103416 | 0.6213652193 | 0.47230016700 | 0.8474194517 | 0.6025163291 | 0.001 |
| 1     | 0.50933702405 | 0.8376588118 | 0.5848964697 | 0.34550594359 | 0.8886793129 | 0.6004003462 | 0.001 |
| 2     | 0.41976822275 | 0.8653014826 | 0.5963675622 | 0.34501011908 | 0.8874446186 | 0.6118302799 | 0.001 |
| 3     | 0.36635888226 | 0.8819761690 | 0.5990552651 | 0.24491700816 | 0.9190653181 | 0.6061746711 | 0.001 |
| 4     | 0.33089562705 | 0.8930528389 | 0.6012453075 | 0.25520572519 | 0.9160627570 | 0.6105592662 | 0.001 |
| 5     | 0.31855319423 | 0.8959040747 | 0.6007909913 | 0.23256588253 | 0.9220116575 | 0.6189839657 | 0.001 |
| 6     | 0.28384636132 | 0.9063298203 | 0.6027299528 | 0.20399692611 | 0.9307124982 | 0.6161537994 | 0.001 |


The table lists, per epoch, the loss, accuracy, and mean IoU for both training and validation, with the learning rate in the last column. We kept the learning rate fixed at 0.001 for the whole training process; it could easily be varied via a learning-rate scheduler (which might be implemented in the future).
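If a scheduler is added later, the usual PyTorch pattern would look like this (a sketch; the optimizer choice and decay values are illustrative, not taken from the code):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# halve the learning rate every 10 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(num_epochs):
    ...  # train and validate as usual
    scheduler.step()
```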





### Inference

To use the model on `our_dataset`, run the following command:
```bash
python3 main.py --test_our_kitti
```
This will run the model on the images in the `our_dataset` directory. Each image is
displayed, and the user can hover over it to inspect the segmented classes.
This can be adjusted to run without the GUI, but it requires a small code change;
one possible variant is sketched below.
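A non-GUI variant could, for example, save the predicted mask to disk instead of displaying it (a sketch, assuming a loaded `model` and a preprocessed `image` tensor):

```python
import torch
from PIL import Image

# `image` is a preprocessed (3, H, W) float tensor
with torch.no_grad():
    logits = model(image.unsqueeze(0))["out"]
    mask = logits.argmax(dim=1).squeeze(0)     # (H, W) per-pixel class IDs

# save the raw class-ID mask; a color palette could be applied instead
Image.fromarray(mask.byte().cpu().numpy()).save("prediction.png")
```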


The paths to the images in the `our_dataset` directory also need to be listed in the
CSV file `our_dataset.csv`, located in the same directory. An example of the CSV file is shown below:

| Image Path    |
|---------------|
| group1_3.jpg  |
| group2_1.jpg  |
| group4_3.jpg  |
| group6_2.jpg  |
| group5_1.jpg  |
| group7_1.jpg  |
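Such a CSV can be generated in a few lines of Python (a minimal sketch):

```python
import csv
from pathlib import Path

# list every .jpg in our_dataset and write our_dataset.csv next to the images
image_dir = Path("our_dataset")
with open(image_dir / "our_dataset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Image Path"])
    for image in sorted(image_dir.glob("*.jpg")):
        writer.writerow([image.name])
```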


## Image size

The KITTI images are 1242x375 pixels. During training we use cropped versions of the
photos. In addition, each image is cut into two halves before being sent to the model, as sketched below.
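A minimal sketch of the halving step (the actual crop logic lives in `src/utils.py` and may differ):

```python
import numpy as np

def split_into_halves(image: np.ndarray):
    """Cut a KITTI frame (e.g. 375 x 1242 x 3) into left and right halves."""
    half = image.shape[1] // 2
    return image[:, :half], image[:, half:]   # two 375 x 621 crops
```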

To run the model on our dataset, we use the following augmentations:
```python
import albumentations as A

def get_our_dataset_augmentations():
    # our images are 720x1280, so trim the sides without distorting them
    return A.Compose([
        # 720x1280 -> 600x1067 keeps the 16:9 aspect ratio
        A.Resize(600, 1067),
        # then cut 53 pixels from the left and 53 pixels from the right
        A.Crop(x_min=53, x_max=1067-53, y_min=0, y_max=600),
    ])
```

We introduced these transforms to tailor the camera images to our use case.
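Applied to one of the images listed above, the transform behaves like this (a minimal usage sketch):

```python
import cv2

aug = get_our_dataset_augmentations()
image = cv2.cvtColor(cv2.imread("our_dataset/group1_3.jpg"), cv2.COLOR_BGR2RGB)
augmented = aug(image=image)["image"]
print(augmented.shape)   # (600, 961, 3): resized, then 53 px trimmed from each side
```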


## Results

We provide the best-performing model in the `weights/2024_05_10` directory. The model was trained for 46 epochs and achieved an accuracy of 96% with an mIoU of 62%. The performance graphs are shown below:

![](readme_photos/train_vs_val_loss.png)
![](readme_photos/train_vs_val_acc.png)
![](readme_photos/train_vs_val_miou.png)

We can see that, apart from an anomaly at epoch 18, training was stable. Moreover, the model is not overfitting, so the user can resume training with more epochs to achieve better results.



## What did not work

First, we tried running our training pipeline on the Rellis-3D dataset [2]. The problem with this dataset is that it does not
contain enough images from urban areas: our robot detected terrain and trees very well,
but it could not detect cars, people, roads, or buildings. That is why we switched to the KITTI dataset.




## Bibliography

1. [KITTI-STEP](https://www.cvlibs.net/datasets/kitti/eval_step.php)
2. [Rellis-3D](https://github.com/unmannedlab/RELLIS-3D)