
preprocessor

OneHotEncoder

A class that encodes classification labels. It transforms each label into a one-hot vector and maps vectors back to labels.

Attributes
  • data: The input data.

  • n_classes: The number of classes in the classification problem.

  • labels: The distinct labels found in the data.

  • label_to_vec: Mapping from label to one-hot vector.

  • int_to_label: Mapping from class index to label.

__init__

Initialize a OneHotEncoder.

fit

Create the mappings from label to one-hot vector and from class index to label.

transform

Return the one-hot vector for every element in the data array.

inverse_transform

Return the label for every vector in data.
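The fit / transform / inverse_transform cycle can be illustrated with a minimal NumPy sketch. It is not the library's own code: sorting the labels (so that class indices are deterministic) and decoding with argmax (so that probability vectors also decode) are assumptions made for the example.

```python
import numpy as np


class OneHotEncoder:
    """Minimal sketch of a label <-> one-hot vector encoder."""

    def __init__(self):
        self.data = None
        self.n_classes = 0
        self.labels = None
        self.label_to_vec = {}
        self.int_to_label = {}

    def fit(self, data):
        # Flatten so that a column of labels is accepted as well as a flat list.
        self.data = np.asarray(data).flatten()
        # Assumption: sort the distinct labels so index assignment is deterministic.
        self.labels = sorted(set(self.data.tolist()))
        self.n_classes = len(self.labels)
        for index, label in enumerate(self.labels):
            vec = np.zeros(self.n_classes, dtype=int)
            vec[index] = 1
            self.label_to_vec[label] = vec
            self.int_to_label[index] = label
        return self

    def transform(self, data):
        # One row per element: result has shape (n_samples, n_classes).
        flat = np.asarray(data).flatten()
        return np.array([self.label_to_vec[x] for x in flat.tolist()])

    def inverse_transform(self, data):
        # argmax picks the hot position; it also works for probability rows.
        indices = np.argmax(np.asarray(data), axis=1)
        return np.array([self.int_to_label[int(i)] for i in indices])
```

With labels `["cat", "dog", "cat", "bird"]` the sorted classes are bird, cat, dog, so `transform(["dog", "bird"])` yields the rows `[0, 0, 1]` and `[1, 0, 0]`.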

Cutout

Randomly mask out one or more patches from an image.

Args
  • n_holes (int): Number of patches to cut out of each image.

  • length (int): The length (in pixels) of each square patch.

__call__

Perform the actual transformation.

Args
  • img (Tensor): Tensor image of size (C, H, W).
Returns
  • Tensor: The image with n_holes patches of size length x length cut out of it.
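The masking step can be sketched as follows. The documented class operates on a torch Tensor of size (C, H, W); this sketch uses a NumPy array with the same layout so it runs without torch. Choosing each patch centre uniformly over the image and clipping the square at the borders is an assumption, as is the hypothetical `rng` argument added for reproducibility.

```python
import numpy as np


class Cutout:
    """Sketch of Cutout for a NumPy array of shape (C, H, W)."""

    def __init__(self, n_holes, length, rng=None):
        self.n_holes = n_holes
        self.length = length
        # Hypothetical extra argument: a NumPy Generator, so results can be reproduced.
        self.rng = rng if rng is not None else np.random.default_rng()

    def __call__(self, img):
        _, h, w = img.shape
        mask = np.ones((h, w), dtype=img.dtype)
        for _ in range(self.n_holes):
            # Choose a patch centre uniformly over the image.
            y = int(self.rng.integers(h))
            x = int(self.rng.integers(w))
            # Clip the length x length square to the image borders.
            y1 = int(np.clip(y - self.length // 2, 0, h))
            y2 = int(np.clip(y - self.length // 2 + self.length, 0, h))
            x1 = int(np.clip(x - self.length // 2, 0, w))
            x2 = int(np.clip(x - self.length // 2 + self.length, 0, w))
            mask[y1:y2, x1:x2] = 0
        # Broadcast the (H, W) mask over every channel; the input is left unchanged.
        return img * mask[np.newaxis, :, :]
```

Because one spatial mask is shared by all channels, the same pixels are zeroed in every channel, and patches near an edge are smaller than length x length.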

DataTransformer

A superclass for all data transformers.

transform_train

Transform the training data and return a DataLoader over it.

Args
  • data: The input data (x).

  • targets: The targets (y).

  • batch_size: The batch size.

Returns
  • dataloader: A torch.utils.data.DataLoader over the transformed data.

transform_test

Transform the test data and return a DataLoader over it.

Args
  • data: The input data (x).

  • targets: The targets (y).

  • batch_size: The batch size.

Returns
  • dataloader: A torch.utils.data.DataLoader over the transformed data.
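The superclass contract can be sketched as an abstract base class with the two methods above. To stay free of a torch dependency, the hypothetical `ListBatchTransformer` subclass returns a plain list of `(x, y)` batches where the documented classes return a torch DataLoader; only the method names and the `data`, `targets`, `batch_size` arguments come from this page.

```python
from abc import ABC, abstractmethod


class DataTransformer(ABC):
    """Sketch of the abstract interface shared by all data transformers."""

    @abstractmethod
    def transform_train(self, data, targets=None, batch_size=None):
        """Return an iterable of batches over the training data."""

    @abstractmethod
    def transform_test(self, data, targets=None, batch_size=None):
        """Return an iterable of batches over the test data."""


class ListBatchTransformer(DataTransformer):
    """Hypothetical subclass: yields (x, y) batches as lists instead of a DataLoader."""

    def _batches(self, data, targets, batch_size):
        # Default to a single batch; max(..., 1) avoids a zero step on empty input.
        batch_size = batch_size or max(len(data), 1)
        return [
            (data[i:i + batch_size],
             None if targets is None else targets[i:i + batch_size])
            for i in range(0, len(data), batch_size)
        ]

    def transform_train(self, data, targets=None, batch_size=None):
        return self._batches(data, targets, batch_size)

    def transform_test(self, data, targets=None, batch_size=None):
        return self._batches(data, targets, batch_size)
```

Keeping both methods abstract means a subclass cannot be instantiated until it defines how training and test data are prepared, which is where subclasses such as ImageDataTransformer differ (for example, augmentation only on the training side).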

TextDataTransformer

A DataTransformer class for the text data.

transform_train

Transform the training dataset.

transform_test

Transform the testing dataset.

ImageDataTransformer

Perform basic image transformation and augmentation.

Attributes
  • max_val: The maximum value of all data.

  • mean: The mean value.

  • std: The standard deviation.

  • augment: Whether to perform augmentation on the data.
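One way the max_val, mean and std attributes could be used together is sketched below: scale by max_val into [0, 1], then standardize each channel. The channels-last layout (N, H, W, C), the per-channel statistics, and the helper names `fit_normalization` and `normalize` are assumptions for illustration, not the class's confirmed behavior.

```python
import numpy as np


def fit_normalization(data):
    """Hypothetical helper: compute max_val and per-channel mean/std for (N, H, W, C) data."""
    max_val = float(data.max())
    scaled = data / max_val
    # Reduce over samples, height and width, keeping one value per channel.
    mean = scaled.mean(axis=(0, 1, 2))
    std = scaled.std(axis=(0, 1, 2))
    return max_val, mean, std


def normalize(data, max_val, mean, std):
    # Scale into [0, 1] with max_val, then standardize each channel.
    return (data / max_val - mean) / std
```

Computing the statistics once on the training set and reusing them on the test set keeps both splits on the same scale.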

transform_train

Transform the training data, applying random-cropping and random-flipping augmentation.

Args
  • data: NumPy array. The data to be transformed.

  • batch_size: int. The batch size.

  • targets: The targets of the training set.

Returns
  • dataloader: A torch.utils.data.DataLoader over the transformed data.
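The two augmentations named for transform_train can be sketched on a single (C, H, W) NumPy image. Zero padding before the crop, the horizontal flip direction, the flip probability of 0.5 and the helper names are assumptions; the page only states that random cropping and random flipping are applied.

```python
import numpy as np


def random_crop(img, padding, rng):
    """Zero-pad a (C, H, W) image by `padding` pixels, then crop a random H x W window."""
    _, h, w = img.shape
    padded = np.pad(img, ((0, 0), (padding, padding), (padding, padding)))
    top = int(rng.integers(0, 2 * padding + 1))
    left = int(rng.integers(0, 2 * padding + 1))
    return padded[:, top:top + h, left:left + w]


def random_horizontal_flip(img, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    if rng.random() < p:
        return img[:, :, ::-1]
    return img


def augment(img, padding=4, rng=None):
    # Hypothetical pipeline: crop first, then flip; the output keeps the input shape.
    rng = rng if rng is not None else np.random.default_rng()
    return random_horizontal_flip(random_crop(img, padding, rng), rng)
```

Both operations preserve the image size, so augmented and unaugmented samples can be batched together.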

transform_test

Transform the test data, performing normalization.

Args
  • data: NumPy array. The data to be transformed.

  • batch_size: int. The batch size.

  • targets: The targets of the test set.

Returns
  • dataloader: A torch.utils.data.DataLoader over the transformed data.

_transform

Perform the actual transformation.

Args
  • compose_list: A list of transform operations to compose.

  • data: The input data (x).

  • targets: The targets (y).

Returns

MultiTransformDataset

A torch.Dataset class that applies all of the given transform methods to its data.
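A dataset that applies a composed list of transforms to each sample can be sketched with the map-style protocol that torch.utils.data.Dataset expects (`__len__` and `__getitem__`), written here without importing torch. The constructor arguments and the `compose` helper, which chains callables in order in the spirit of torchvision.transforms.Compose, are hypothetical.

```python
class MultiTransformDataset:
    """Sketch: map-style dataset that transforms each sample on access."""

    def __init__(self, dataset, targets, compose):
        self.dataset = dataset
        self.targets = targets
        self.compose = compose

    def __getitem__(self, index):
        # Transforms run lazily, so random augmentations differ on every access.
        feature = self.compose(self.dataset[index])
        target = self.targets[index] if self.targets is not None else None
        return feature, target

    def __len__(self):
        return len(self.dataset)


def compose(transforms):
    """Hypothetical helper: chain callables left to right."""
    def apply(x):
        for t in transforms:
            x = t(x)
        return x
    return apply
```

Applying the transforms inside `__getitem__` rather than up front is what lets a DataLoader see a fresh random crop or flip each epoch.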

BatchDataset

A torch.Dataset class that can read data batch by batch.
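One possible reading of "batch by batch" is a dataset whose index addresses a whole batch instead of a single sample. The sketch below follows that reading; the class body and its `data` and `batch_size` arguments are hypothetical and do not describe how the documented class loads its data.

```python
import math


class BatchDataset:
    """Hypothetical sketch: each index returns one batch of samples."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return math.ceil(len(self.data) / self.batch_size)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.batch_size
        return self.data[start:start + self.batch_size]
```

Raising IndexError past the last batch also lets plain Python iteration (`list(dataset)`) stop cleanly.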