StructuredDataClassifier
autokeras.StructuredDataClassifier(
column_names=None,
column_types=None,
num_classes=None,
multi_label=False,
loss=None,
metrics=None,
project_name="structured_data_classifier",
max_trials=100,
directory=None,
objective="val_accuracy",
tuner=None,
overwrite=False,
seed=None,
max_model_size=None,
**kwargs
)
AutoKeras structured data classification class.
Arguments
- column_names (Optional[List[str]]): A list of strings specifying the names of the columns. The length of the list should equal the number of columns of the data, excluding the target column. Defaults to None. If None, it will be obtained from the header of the csv file or the pandas.DataFrame.
- column_types (Optional[Dict]): Dict. The keys are the column names. The values should be either 'numerical' or 'categorical', indicating the type of that column. Defaults to None. If not None, column_names must also be specified. If None, the types will be inferred from the data.
- num_classes (Optional[int]): Int. Defaults to None. If None, it will be inferred from the data.
- multi_label (bool): Boolean. Defaults to False.
- loss (Optional[Union[str, Callable, tensorflow.keras.losses.Loss]]): A Keras loss function. Defaults to 'binary_crossentropy' or 'categorical_crossentropy', depending on the number of classes.
- metrics (Optional[Union[List[Union[str, Callable, tensorflow.keras.metrics.Metric]], List[List[Union[str, Callable, tensorflow.keras.metrics.Metric]]], Dict[str, Union[str, Callable, tensorflow.keras.metrics.Metric]]]]): A list of Keras metrics. Defaults to 'accuracy'.
- project_name (str): String. The name of the AutoModel. Defaults to 'structured_data_classifier'.
- max_trials (int): Int. The maximum number of different Keras Models to try. The search may finish before reaching max_trials. Defaults to 100.
- directory (Optional[Union[str, pathlib.Path]]): String. The path to a directory for storing the search outputs. Defaults to None, which creates a folder with the name of the AutoModel in the current directory.
- objective (str): String. Name of the model metric to minimize or maximize. Defaults to 'val_accuracy'.
- tuner (Optional[Union[str, Type[autokeras.engine.tuner.AutoTuner]]]): String or subclass of AutoTuner. If a string, it should be one of 'greedy', 'bayesian', 'hyperband' or 'random'. If left unspecified, a task-specific tuner is used, which first evaluates the most commonly used models for the task before exploring other models.
- overwrite (bool): Boolean. Defaults to False. If False, reloads an existing project of the same name if one is found. Otherwise, overwrites the project.
- seed (Optional[int]): Int. Random seed.
- max_model_size (Optional[int]): Int. Maximum number of scalars in the parameters of a model. Models larger than this are rejected.
- **kwargs: Any arguments supported by AutoModel.
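The constraints on column_names and column_types described above can be checked before starting a search. The following is a minimal sketch, not part of AutoKeras: check_columns and build_classifier are hypothetical helpers, the column names are made up for illustration, and the sketch assumes an installed AutoKeras release that provides StructuredDataClassifier. The import is deferred so the validation helper can be used on its own.

```python
from typing import Dict, List

# The two column types named in the argument description above.
ALLOWED_TYPES = {"numerical", "categorical"}


def check_columns(column_names: List[str], column_types: Dict[str, str]) -> None:
    """Hypothetical helper: sanity-check the arguments before a search."""
    unknown = set(column_types) - set(column_names)
    if unknown:
        raise ValueError(f"column_types has keys not in column_names: {sorted(unknown)}")
    bad = {name: t for name, t in column_types.items() if t not in ALLOWED_TYPES}
    if bad:
        raise ValueError(f"unsupported column types: {bad}")


def build_classifier(column_names: List[str], column_types: Dict[str, str], max_trials: int = 3):
    """Hypothetical wrapper around the documented constructor."""
    import autokeras as ak  # deferred import; assumes AutoKeras is installed

    check_columns(column_names, column_types)
    return ak.StructuredDataClassifier(
        column_names=column_names,
        column_types=column_types,
        max_trials=max_trials,  # small value to keep the search short
        overwrite=True,         # do not reload an earlier project of the same name
        seed=42,
    )


# Illustrative column layout (made up); the target column is excluded.
column_names = ["age", "fare", "sex"]
column_types = {"age": "numerical", "fare": "numerical", "sex": "categorical"}
check_columns(column_names, column_types)
```

Calling build_classifier(column_names, column_types) would then return a classifier ready for fit.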
fit
StructuredDataClassifier.fit(
x=None, y=None, epochs=None, callbacks=None, validation_split=0.2, validation_data=None, **kwargs
)
Search for the best model and hyperparameters for the AutoModel.
Arguments
- x: String, numpy.ndarray, pandas.DataFrame or tensorflow.Dataset. Training data x. If the data is from a csv file, it should be a string specifying the path of the csv file of the training data.
- y: String, numpy.ndarray, or tensorflow.Dataset. Training data y. If the data is from a csv file, it should be a string, which is the name of the target column. Otherwise, it can be raw labels, one-hot encoded if there are more than two classes, or binary encoded for binary classification.
- epochs: Int. The number of epochs to train each model during the search. If unspecified, each model is trained for up to 1000 epochs, with early stopping using a patience of 30.
- callbacks: List of Keras callbacks to apply during training and validation.
- validation_split: Float between 0 and 1. Defaults to 0.2. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling. This argument is not supported when x is a dataset.
- validation_data: Data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. validation_data will override validation_split. The type of the validation data should be the same as the training data.
- **kwargs: Any arguments supported by keras.Model.fit.
Returns
history: A Keras History object corresponding to the best model. Its History.history attribute is a record of training loss values and metrics values at successive epochs, as well as validation loss values and validation metrics values (if applicable).
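The validation_split behavior described above (the last samples are held out, before any shuffling) can be illustrated in plain Python. This is only a sketch of the idea, not the library's implementation: tail_split is a hypothetical helper, the rounding rule is an assumption, and the toy data is made up.

```python
from typing import Sequence, Tuple


def tail_split(
    x: Sequence, y: Sequence, validation_split: float = 0.2
) -> Tuple[Sequence, Sequence, Sequence, Sequence]:
    """Illustrative only: hold out the last fraction of (x, y) for validation."""
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be between 0 and 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Assumed rounding: truncate, but keep at least one validation sample.
    n_val = max(1, int(len(x) * validation_split))
    n_train = len(x) - n_val
    return x[:n_train], y[:n_train], x[n_train:], y[n_train:]


# Toy data: 10 rows with two features each and alternating binary labels.
x = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]

x_train, y_train, x_val, y_val = tail_split(x, y, validation_split=0.2)
print(len(x_train), len(x_val))  # 8 2
print(x_val)                     # [[8, 16], [9, 18]]
```

Because the held-out rows come from the end of the data, input that is sorted (for example by class or by date) may yield an unrepresentative validation set; shuffling beforehand or passing validation_data explicitly avoids this.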
predict
StructuredDataClassifier.predict(x, **kwargs)
Predict the output for the given testing data.
Arguments
- x: String, numpy.ndarray, pandas.DataFrame or tensorflow.Dataset. Testing data x. If the data is from a csv file, it should be a string specifying the path of the csv file of the testing data.
- **kwargs: Any arguments supported by keras.Model.predict.
Returns
A list of numpy.ndarray objects or a single numpy.ndarray. The predicted results.
evaluate
StructuredDataClassifier.evaluate(x, y=None, **kwargs)
Evaluate the best model for the given data.
Arguments
- x: String, numpy.ndarray, pandas.DataFrame or tensorflow.Dataset. Testing data x. If the data is from a csv file, it should be a string specifying the path of the csv file of the testing data.
- y: String, numpy.ndarray, or tensorflow.Dataset. Testing data y. If the data is from a csv file, it should be a string corresponding to the label column.
- **kwargs: Any arguments supported by keras.Model.evaluate.
Returns
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has multiple outputs and/or metrics). The attribute model.metrics_names will give you the display labels for the scalar outputs.
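Since the return value is either a single scalar or a list of scalars, it can help to pair it with the display labels. A minimal sketch follows; label_results is a hypothetical helper, and the metric names and numbers are placeholders rather than real results.

```python
from typing import Dict, List, Sequence, Union


def label_results(
    metric_names: List[str], results: Union[float, Sequence[float]]
) -> Dict[str, float]:
    """Hypothetical helper: map display labels to the values from evaluate."""
    # A model with a single output and no metrics yields a bare scalar.
    if not isinstance(results, (list, tuple)):
        results = [results]
    if len(metric_names) != len(results):
        raise ValueError("metric_names and results differ in length")
    return {name: float(value) for name, value in zip(metric_names, results)}


# Placeholder values standing in for model.metrics_names and evaluate output.
print(label_results(["loss", "accuracy"], [0.5, 0.75]))
# {'loss': 0.5, 'accuracy': 0.75}
print(label_results(["loss"], 0.5))
# {'loss': 0.5}
```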
export_model
StructuredDataClassifier.export_model()
Export the best Keras Model.
Returns
keras.Model instance. The best model found during the search, loaded with trained weights.
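The methods documented on this page are typically chained in the order fit, evaluate, predict, export_model. The sketch below shows one way to do that, assuming an installed AutoKeras release that provides StructuredDataClassifier. The function search_and_export is hypothetical, and the small max_trials and epochs defaults are chosen only to keep a trial run short.

```python
def search_and_export(x_train, y_train, x_test, y_test, max_trials=3, epochs=10):
    """Hypothetical end-to-end sketch: search, score, predict, export."""
    import autokeras as ak  # deferred import; assumes AutoKeras is installed

    clf = ak.StructuredDataClassifier(max_trials=max_trials, overwrite=True)

    # Search for the best model; 20% of the training data is held out
    # for validation by default (validation_split=0.2).
    clf.fit(x_train, y_train, epochs=epochs)

    # Scalar loss, or a list of scalars when metrics are present.
    results = clf.evaluate(x_test, y_test)

    # A numpy.ndarray (or list of arrays) with the predicted results.
    predictions = clf.predict(x_test)

    # The best Keras model found, with trained weights loaded.
    model = clf.export_model()
    return model, results, predictions
```

The returned model is a regular Keras model, so standard calls such as model.summary() apply to it.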