Remo PyTorch Image Classification Dataset


Any software that aims to do something in a smarter and more efficient way must integrate seamlessly with existing workflows.

Remo provides a smarter way to organize, annotate and visualize data, and we are constantly trying to improve how this works downstream with the various deep learning frameworks used by the community.

As part of our documentation, we are iterating on an efficient custom Dataset object that allows users to go straight from Remo to training a model in PyTorch, which can be found here.

The aim of this post is to describe our thought process while building this, and get feedback from the community on how this could be further improved.

import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset


class FlowerDataset(Dataset):
    """Custom PyTorch Dataset class to facilitate loading data for the image classification task."""

    def __init__(self, annotations, train_test_valid_split, mapping=None, mode='train', transform=None):
        """
        Args:
            annotations: The path to the annotations CSV file. Format: file_name, class_name
            train_test_valid_split: The path to the tags CSV file for the train/test/valid split. Format: file_name, tag
            mapping: A dictionary mapping class names to class indices. Format: {'class_name': class_index}. Default: None
            mode: Mode in which to instantiate the class. Default: 'train'
            transform: The transforms to be applied to the image data

        __getitem__ returns:
            image: torch.Tensor, label_tensor: torch.Tensor, file_name: str
        """
        my_data = pd.read_csv(annotations, index_col='file_name')
        my_data['tag'] = pd.read_csv(train_test_valid_split, index_col='file_name')['tag']
        my_data = my_data.reset_index()

        self.mapping = mapping
        self.transform = transform
        self.mode = mode

        # Keep only the rows whose tag matches the requested mode
        self.data = my_data.loc[my_data['tag'] == mode].reset_index(drop=True)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Map class names to integer labels if a mapping was provided
        if self.mapping is not None:
            labels = int(self.mapping[self.data.loc[idx, 'class_name'].lower()])
        else:
            labels = int(self.data.loc[idx, 'class_name'])

        im_path = self.data.loc[idx, 'file_name']

        label_tensor = torch.as_tensor(labels, dtype=torch.long)
        im = Image.open(im_path)

        if self.transform:
            im = self.transform(im)

        if self.mode == 'test':
            # For saving the predictions, the file name is required
            return {'im': im, 'labels': label_tensor, 'im_name': self.data.loc[idx, 'file_name']}
        else:
            return {'im': im, 'labels': label_tensor}

The ideal Dataset object is easily modifiable and extendable to any dataset used for the task of image classification, and, depending on the mode, it returns the information needed for training, validation or testing. This formed the motivation for our design.

The parameters and inputs to the object are described as follows:

This Dataset object accepts the annotations in the form of a CSV file which contains the location of the image and its label.

CSV Columns: ‘file_name’, ‘class_name’

This CSV file can be generated with Remo by providing the path to a folder of folders to remo.generate_annotations_from_folders(), or it can be generated by the user; the path to this file is provided to the Dataset object.
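As a sketch (the file names below are made up for illustration), the annotations CSV can also be written by hand with pandas, and read back exactly the way the Dataset does:

    import pandas as pd

    # Hypothetical annotations in the format the Dataset expects:
    # one row per image, columns 'file_name' and 'class_name'.
    annotations = pd.DataFrame({
        'file_name': ['images/0001.jpg', 'images/0002.jpg', 'images/0003.jpg'],
        'class_name': ['daisy', 'rose', 'daisy'],
    })
    annotations.to_csv('annotations.csv', index=False)

    # Read it back the way FlowerDataset does, indexed by file name.
    my_data = pd.read_csv('annotations.csv', index_col='file_name')
    print(my_data.loc['images/0002.jpg', 'class_name'])  # rose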


A stratified train/test/validation split is generated in the demo using scikit-learn's model_selection.train_test_split method, and tags are assigned based on this. The Dataset class handles splitting the dataset accordingly, and the path to the train_test_valid_split CSV is provided as an input.

CSV Columns: ‘file_name’, ‘tag’
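A sketch of how such tags might be generated (the file names and split proportions are our own example), calling train_test_split twice with the stratify argument so each split keeps the class balance:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Hypothetical annotations: 100 images, two balanced classes.
    annotations = pd.DataFrame({
        'file_name': ['images/{:04d}.jpg'.format(i) for i in range(100)],
        'class_name': ['daisy', 'rose'] * 50,
    })

    # Split off the test set first, then split the remainder into train/valid.
    train_valid, test = train_test_split(
        annotations, test_size=0.2, stratify=annotations['class_name'], random_state=42)
    train, valid = train_test_split(
        train_valid, test_size=0.25, stratify=train_valid['class_name'], random_state=42)

    # Assemble the file_name -> tag CSV consumed by the Dataset.
    tags = pd.concat([
        split.assign(tag=name)[['file_name', 'tag']]
        for split, name in ((train, 'train'), (valid, 'valid'), (test, 'test'))
    ])
    tags.to_csv('train_test_valid_split.csv', index=False)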


The classes provided might be a string representing the name of the class, or an integer class label. If the annotations contain class names, a dictionary can be provided to automatically convert them to the integer format required by PyTorch.

Example: {“car” : 0, “dog” : 1}
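Applied inside __getitem__, the lookup simply converts the class string to its integer label (using the example mapping above):

    mapping = {'car': 0, 'dog': 1}

    # The Dataset lower-cases the class name before the lookup,
    # so 'Dog' and 'dog' resolve to the same index.
    label = int(mapping['Dog'.lower()])
    print(label)  # 1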

The data split and the nature of the returned item are selected based on a string which describes the mode.

Transforms include data processing steps such as data augmentation, resizing, conversion to tensors and normalization of the images before training the model. These differ based on the mode.
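As a sketch (the exact augmentations and the use of torchvision are our assumptions; the normalization statistics below are the common ImageNet defaults), the train transform typically adds augmentation while the validation/test transform only resizes and normalizes:

    from torchvision import transforms

    # Augment only in training; validation/test just resize and normalize.
    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    valid_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])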

Following this, you can instantiate the Dataset with these parameters and wrap it in a DataLoader for the training process.
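Usage then follows the standard PyTorch pattern. The toy Dataset below is a stand-in for FlowerDataset (it returns dicts of the same shape, without needing image files on disk) so the DataLoader mechanics can be shown self-contained:

    import torch
    from torch.utils.data import DataLoader, Dataset

    class ToyDataset(Dataset):
        """Stand-in for FlowerDataset: yields the same kind of per-item dict."""
        def __len__(self):
            return 8

        def __getitem__(self, idx):
            return {'im': torch.zeros(3, 224, 224),
                    'labels': torch.tensor(idx % 2, dtype=torch.long)}

    # Wrap the dataset in a DataLoader for batched, shuffled training.
    loader = DataLoader(ToyDataset(), batch_size=4, shuffle=True)
    batch = next(iter(loader))
    print(batch['im'].shape)  # torch.Size([4, 3, 224, 224])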

We appreciate feedback and any suggestions from the community to improve this.