Utils
image_dataset_from_directory
autokeras.image_dataset_from_directory(
directory,
batch_size=32,
color_mode="rgb",
image_size=(256, 256),
interpolation="bilinear",
shuffle=True,
seed=None,
validation_split=None,
subset=None,
)
Generates a tf.data.Dataset from image files in a directory.
If your directory structure is:
main_directory/
...class_a/
......a_image_1.jpg
......a_image_2.jpg
...class_b/
......b_image_1.jpg
......b_image_2.jpg
Then calling image_dataset_from_directory(main_directory) will return a tf.data.Dataset that yields batches of images from the subdirectories class_a and class_b, together with labels 'class_a' and 'class_b'.
Supported image formats: jpeg, png, bmp, gif. Animated gifs are truncated to the first frame.
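For example, a minimal call that also reserves part of the data for validation might look like the sketch below; the directory name, seed, and split values are illustrative, not API defaults.

```python
import autokeras as ak

# Illustrative layout: "main_directory/" with one subdirectory per class,
# as shown above. Pass the same seed to both calls so the split is consistent.
train_data = ak.image_dataset_from_directory(
    "main_directory/",
    batch_size=32,
    image_size=(256, 256),
    shuffle=True,
    seed=123,
    validation_split=0.2,
    subset="training",
)
val_data = ak.image_dataset_from_directory(
    "main_directory/",
    batch_size=32,
    image_size=(256, 256),
    shuffle=True,
    seed=123,
    validation_split=0.2,
    subset="validation",
)
```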
Arguments
- directory str: Directory where the data is located. If labels is "inferred", it should contain subdirectories, each containing images for a class. Otherwise, the directory structure is ignored.
- batch_size int: Size of the batches of data. Default: 32.
- color_mode str: One of "grayscale", "rgb", "rgba". Default: "rgb". Whether the images will be converted to have 1, 3, or 4 channels.
- image_size Tuple[int, int]: Size to resize images to after they are read from disk. Defaults to (256, 256). Since the pipeline processes batches of images that must all have the same size, this must be provided.
- interpolation str: The interpolation method used when resizing images. Defaults to bilinear. Supports bilinear, nearest, bicubic, area, lanczos3, lanczos5, gaussian, mitchellcubic.
- shuffle bool: Whether to shuffle the data. Default: True. If set to False, sorts the data in alphanumeric order.
- seed int | None: Optional random seed for shuffling and transformations.
- validation_split float | None: Optional float between 0 and 1, fraction of data to reserve for validation.
- subset str | None: One of "training" or "validation". Only used if validation_split is set.
Returns
A tf.data.Dataset object, which yields a tuple (images, labels), where images has shape (batch_size, image_size[0], image_size[1], num_channels) and labels has shape (batch_size,) and type tf.string.
- if color_mode is grayscale, there is 1 channel in the image tensors.
- if color_mode is rgb, there are 3 channels in the image tensors.
- if color_mode is rgba, there are 4 channels in the image tensors.
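A quick way to confirm these shapes is to pull a single batch from the dataset; the snippet below assumes the train_data dataset built in the earlier sketch.

```python
for images, labels in train_data.take(1):
    print(images.shape)  # (32, 256, 256, 3) with the defaults and color_mode="rgb"
    print(labels.shape)  # (32,)
    print(labels.dtype)  # tf.string, e.g. b'class_a'
```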
text_dataset_from_directory
autokeras.text_dataset_from_directory(
directory, batch_size=32, max_length=None, shuffle=True, seed=None, validation_split=None, subset=None
)
Generates a tf.data.Dataset from text files in a directory.
If your directory structure is:
main_directory/
...class_a/
......a_text_1.txt
......a_text_2.txt
...class_b/
......b_text_1.txt
......b_text_2.txt
Then calling text_dataset_from_directory(main_directory) will return a tf.data.Dataset that yields batches of texts from the subdirectories class_a and class_b, together with labels 'class_a' and 'class_b'.
Only .txt files are supported at this time.
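A minimal call mirrors the image variant; the sketch below assumes a local "main_directory/" of .txt files laid out as above, and the max_length and split values are illustrative.

```python
import autokeras as ak

# Illustrative: truncate long texts and keep 20% of the files for validation.
train_data = ak.text_dataset_from_directory(
    "main_directory/",
    batch_size=32,
    max_length=1000,
    shuffle=True,
    seed=123,
    validation_split=0.2,
    subset="training",
)
```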
Arguments
- directory str: Directory where the data is located. If labels is "inferred", it should contain subdirectories, each containing text files for a class. Otherwise, the directory structure is ignored.
- batch_size int: Size of the batches of data. Defaults to 32.
- max_length int | None: Maximum size of a text string. Texts longer than this will be truncated to max_length.
- shuffle bool: Whether to shuffle the data. Default: True. If set to False, sorts the data in alphanumeric order.
- seed int | None: Optional random seed for shuffling and transformations.
- validation_split float | None: Optional float between 0 and 1, fraction of data to reserve for validation.
- subset str | None: One of "training" or "validation". Only used if validation_split is set.
Returns
A tf.data.Dataset object, which yields a tuple (texts, labels), where both have shape (batch_size,) and type tf.string.
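Because the labels are plain strings, such a dataset can be fed straight to an AutoKeras task API; the lines below are a rough sketch, with the trial and epoch counts chosen only for illustration.

```python
import autokeras as ak

# Rough sketch: train a text classifier on the dataset built above.
clf = ak.TextClassifier(max_trials=1)  # illustrative search budget
clf.fit(train_data, epochs=2)          # illustrative epoch count
```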