Skip to content

Text Regression

View in Colab     GitHub source

!export KERAS_BACKEND="torch"
!pip install autokeras
import os

import keras
import numpy as np
from sklearn.datasets import load_files

import autokeras as ak

To make this tutorial easy to follow, we just treat IMDB dataset as a regression dataset. It means we will treat prediction targets of IMDB dataset, which are 0s and 1s as numerical values, so that they can be directly used as the regression targets.

A Simple Example

The first step is to prepare your data. Here we use the IMDB dataset as an example.

dataset = keras.utils.get_file(
    fname="aclImdb.tar.gz",
    origin="http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz",
    extract=True,
)

# set path to dataset
IMDB_DATADIR = os.path.join(
    os.path.dirname(dataset), "aclImdb_extracted", "aclImdb"
)

classes = ["pos", "neg"]
train_data = load_files(
    os.path.join(IMDB_DATADIR, "train"), shuffle=True, categories=classes
)
test_data = load_files(
    os.path.join(IMDB_DATADIR, "test"), shuffle=False, categories=classes
)

x_train = np.array(train_data.data)[:100]
y_train = np.array(train_data.target)[:100]
x_test = np.array(test_data.data)[:100]
y_test = np.array(test_data.target)[:100]

print(x_train.shape)  # (25000,)
print(y_train.shape)  # (25000, 1)
print(x_train[0][:50])  # <START> this film was just brilliant casting <UNK>

The second step is to run the TextRegressor. As a quick demo, we set epochs to 2. You can also leave the epochs unspecified for an adaptive number of epochs.

# Initialize the text regressor.
reg = ak.TextRegressor(
    overwrite=True, max_trials=1  # It tries 10 different models.
)
# Feed the text regressor with training data.
reg.fit(x_train, y_train, epochs=1, batch_size=2)
# Predict with the best model.
predicted_y = reg.predict(x_test)
# Evaluate the best model with testing data.
print(reg.evaluate(x_test, y_test))

Validation Data

By default, AutoKeras use the last 20% of training data as validation data. As shown in the example below, you can use validation_split to specify the percentage.

reg.fit(
    x_train,
    y_train,
    # Split the training data and use the last 15% as validation data.
    validation_split=0.15,
)

You can also use your own validation set instead of splitting it from the training data with validation_data.

split = 5
x_val = x_train[split:]
y_val = y_train[split:]
x_train = x_train[:split]
y_train = y_train[:split]
reg.fit(
    x_train,
    y_train,
    epochs=1,
    # Use your own validation set.
    validation_data=(x_val, y_val),
    batch_size=2,
)

Customized Search Space

For advanced users, you may customize your search space by using AutoModel instead of TextRegressor. You can configure the TextBlock for some high-level configurations. You can also do not specify these arguments, which would leave the different choices to be tuned automatically. See the following example for detail.

input_node = ak.TextInput()
output_node = ak.TextBlock()(input_node)
output_node = ak.RegressionHead()(output_node)
reg = ak.AutoModel(
    inputs=input_node, outputs=output_node, overwrite=True, max_trials=1
)
reg.fit(x_train, y_train, epochs=1, batch_size=2)

Reference

TextRegressor, AutoModel, TextBlock, ConvBlock, TextInput, RegressionHead.