TextClassifier

[source]

TextClassifier class

autokeras.TextClassifier(
    num_classes: Optional[int] = None,
    multi_label: bool = False,
    loss: Union[str, Callable, None] = None,
    metrics: Union[
        List[Union[str, Callable, None]],
        List[List[Union[str, Callable, None]]],
        Dict[str, Union[str, Callable, None]],
        None,
    ] = None,
    name: str = "text_classifier",
    max_trials: int = 100,
    directory: Union[str, pathlib.Path, None] = None,
    objective: str = "val_loss",
    overwrite: bool = True,
    seed: Optional[int] = None,
)

AutoKeras text classification class.

Arguments

  • num_classes: Int. Defaults to None. If None, it will be inferred from the data.
  • multi_label: Boolean. Whether each sample can belong to more than one class. Defaults to False.
  • loss: A Keras loss function. Defaults to 'binary_crossentropy' or 'categorical_crossentropy' based on the number of classes.
  • metrics: A list of Keras metrics. Defaults to 'accuracy'.
  • name: String. The name of the AutoModel. Defaults to 'text_classifier'.
  • max_trials: Int. The maximum number of different Keras Models to try. The search may finish before reaching the max_trials. Defaults to 100.
  • directory: String. The path to a directory for storing the search outputs. Defaults to None, which creates a folder named after the AutoModel in the current directory.
  • objective: String. Name of model metric to minimize or maximize, e.g. 'val_accuracy'. Defaults to 'val_loss'.
  • overwrite: Boolean. Defaults to True. If False, reloads an existing project of the same name if one is found. Otherwise, overwrites the project.
  • seed: Int. Random seed.
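
Example

A minimal sketch of constructing a classifier with a reduced search budget. The argument values below (max_trials=3, seed=42) are illustrative choices, not defaults or recommendations.

import autokeras as ak

# Build a text classifier that tries at most 3 candidate models.
# num_classes is left as None so it is inferred from the training labels.
clf = ak.TextClassifier(
    max_trials=3,    # illustrative small budget; the default is 100
    overwrite=True,  # start a fresh search rather than reloading a prior project
    seed=42,         # illustrative seed for reproducibility
)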

[source]

fit method

TextClassifier.fit(
    x=None, y=None, epochs=None, callbacks=None, validation_split=0.2, validation_data=None, **kwargs
)

Search for the best model and hyperparameters for the AutoModel.

It will search for the best model based on the performances on validation data.

Arguments

  • x: numpy.ndarray or tf.data.Dataset. Training data x. The data should be one-dimensional, and each element should be a string containing a full sentence.
  • y: numpy.ndarray or tf.data.Dataset. Training data y. It can be raw labels, one-hot encoded if there are more than two classes, or binary encoded for binary classification.
  • epochs: Int. The number of epochs to train each model during the search. If unspecified, each model trains for a maximum of 1000 epochs, and training stops early if the validation loss does not improve for 10 epochs (unless an EarlyStopping callback is included in the callbacks argument, in which case that callback determines when training stops).
  • callbacks: List of Keras callbacks to apply during training and validation.
  • validation_split: Float between 0 and 1. Defaults to 0.2. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling. This argument is not supported when x is a tf.data.Dataset. The best model found will be fit on the entire dataset, including the validation data.
  • validation_data: Data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. validation_data will override validation_split. The type of the validation data should be the same as the training data. The best model found will be fit on the training data only, without the validation data.
  • **kwargs: Any arguments supported by keras.Model.fit.
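
Example

A minimal sketch of running the search on a toy dataset of raw sentences and binary labels. The sentences and labels are made up for illustration; a real search needs substantially more data and a larger epochs/max_trials budget.

import numpy as np
import autokeras as ak

# Toy data: each x element is a full sentence, each y element a raw label.
x_train = np.array([
    "The movie was fantastic and the cast was brilliant.",
    "A dull plot and wooden acting throughout.",
    "An absolute delight from start to finish.",
    "I want those two hours of my life back.",
    "Beautiful cinematography and a moving score.",
    "Predictable, slow, and far too long.",
])
y_train = np.array([1, 0, 1, 0, 1, 0])

clf = ak.TextClassifier(max_trials=1, overwrite=True)

# epochs=2 keeps the example quick; by default each model trains for up to
# 1000 epochs with early stopping on validation loss. Half of the toy data
# is held out for validation so the tuner has something to score.
clf.fit(x_train, y_train, epochs=2, validation_split=0.5)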

[source]

predict method

TextClassifier.predict(x, batch_size=32, **kwargs)

Predict the output for the given testing data.

Arguments

  • x: Any allowed types according to the input node. Testing data.
  • batch_size: Int. Number of samples per batch. Defaults to 32.
  • **kwargs: Any arguments supported by keras.Model.predict.

Returns

A list of numpy.ndarray objects or a single numpy.ndarray. The predicted results.
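
Example

A short continuation of the fit sketch above; clf is a TextClassifier that has already been fit, and the test sentences are made up for illustration.

import numpy as np

x_test = np.array([
    "One of the best films I have seen this year.",
    "Poorly paced and entirely forgettable.",
])

# Returns a numpy array with one predicted label per input sentence.
predicted = clf.predict(x_test)
print(predicted)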


[source]

evaluate method

TextClassifier.evaluate(x, y=None, batch_size=32, **kwargs)

Evaluate the best model for the given data.

Arguments

  • x: Any allowed types according to the input node. Testing data.
  • y: Any allowed types according to the head. Testing targets. Defaults to None.
  • batch_size: Int. Number of samples per batch. Defaults to 32.
  • **kwargs: Any arguments supported by keras.Model.evaluate.

Returns

Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has multiple outputs and/or metrics). The attribute model.metrics_names will give you the display labels for the scalar outputs.
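
Example

Continuing the same sketch: evaluating the best model on held-out sentences and labels, again made up for illustration.

import numpy as np

x_test = np.array([
    "One of the best films I have seen this year.",
    "Poorly paced and entirely forgettable.",
])
y_test = np.array([1, 0])

# Returns the test loss followed by each configured metric
# (accuracy by default) for the best model found during the search.
results = clf.evaluate(x_test, y_test)
print(results)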


[source]

export_model method

TextClassifier.export_model()

Export the best Keras Model.

Returns

tf.keras.Model instance. The best model found during the search, loaded with trained weights.
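
Example

A sketch of exporting the best model after the search and saving it for later use. The file name is arbitrary, and the save format you choose may depend on your TensorFlow/Keras version.

# clf is a TextClassifier that has already been fit.
model = clf.export_model()
model.summary()

# The exported object is a regular Keras model, so it can be saved and
# reloaded like any other Keras model.
model.save("best_text_classifier.keras")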