Introduction to Keras
Justin Zhang
December 2017
Introduction
Keras is a high-level Python machine learning API that makes it easy to build and train neural networks. Keras is simply a specification: it provides a set of methods that you can use, and it relies on a backend (TensorFlow, Theano, or CNTK, as chosen by the user) to actually run your code. Like many machine learning frameworks, Keras is a so-called define-and-run framework. This means that your neural network is defined and optimized in a compilation step before training starts.
First Steps
First, install Keras: pip install keras
We’ll go over a fully-connected neural network that classifies handwritten digits from the MNIST dataset.
# Can be found at https://github.com/fchollet/keras
# Released under the MIT License
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
First, we handle the imports. keras.datasets includes many popular machine learning datasets, including CIFAR and IMDB. keras.layers includes a variety of neural network layer types, including Dense, Conv2D, and LSTM. Dense simply refers to a fully-connected layer, and Dropout is a layer commonly used to address the problem of overfitting. keras.optimizers includes the most widely used optimizers. Here, we opt to use RMSprop, an alternative to the classic stochastic gradient descent (SGD) algorithm. Regarding keras.models, Sequential is most widely used for simple networks; there is also a functional Model class you can use for more complex networks.
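To give a feel for how RMSprop differs from plain SGD, here is a minimal sketch in plain Python. It uses the common textbook form of the RMSprop update on a toy objective \(f(w) = w^2\); the function names, the toy objective, and the hyperparameter values are assumptions made for illustration, and the Keras implementation may differ in its details.

```python
import math

def grad(w):
    """Gradient of the toy objective f(w) = w**2."""
    return 2.0 * w

def sgd_step(w, lr=0.01):
    """Plain gradient descent: step against the gradient."""
    return w - lr * grad(w)

def rmsprop_step(w, avg_sq, lr=0.01, rho=0.9, eps=1e-7):
    """RMSprop: scale the step by a running RMS of recent gradients."""
    g = grad(w)
    avg_sq = rho * avg_sq + (1.0 - rho) * g * g
    w = w - lr * g / (math.sqrt(avg_sq) + eps)
    return w, avg_sq

# Start both optimizers at w = 1.0 and take 100 steps each.
w_sgd, w_rms, avg_sq = 1.0, 1.0, 0.0
for _ in range(100):
    w_sgd = sgd_step(w_sgd)
    w_rms, avg_sq = rmsprop_step(w_rms, avg_sq)

print(w_sgd, w_rms)  # both have moved from 1.0 toward the minimum at 0
```

The point of the sketch is that SGD's step is proportional to the raw gradient, while RMSprop normalizes it by the recent gradient magnitude.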
batch_size = 128
num_classes = 10
epochs = 20
Here, we define our constants...
# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
...and the dataset.
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
The training set contains \(60000\) images of \(28\) by \(28\) pixels; we reshape it to \(60000\) by \(784\), since our MLP takes as input a vector of length \(784\). Similarly, the test set, which consists of \(10000\) images, is reshaped to \(10000\) by \(784\). We next convert these tensors to floats and normalize the pixel values from \([0,255]\) to \([0,1]\).
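The reshape-and-normalize step can be pictured on a single image. The following is a minimal sketch in plain Python, not the NumPy code used above; the synthetic image and the helper name flatten_and_scale are assumptions made for illustration.

```python
# A stand-in for one MNIST image: 28 rows of 28 integer pixels in 0..255.
# (Synthetic values for illustration; real images come from mnist.load_data().)
ROWS, COLS = 28, 28
image = [[(r * COLS + c) % 256 for c in range(COLS)] for r in range(ROWS)]

def flatten_and_scale(img, max_value=255):
    """Flatten a 2-D image row by row and scale each pixel into [0, 1]."""
    return [pixel / max_value for row in img for pixel in row]

vector = flatten_and_scale(image)
print(len(vector))               # 784
print(min(vector), max(vector))  # 0.0 1.0
```

Flattening row by row matches the default ordering of a NumPy reshape, so pixel (row 1, column 1) ends up at index 29 of the vector.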
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
This code converts the ground-truth labels from a class vector (each digit is labeled with an integer \(0\) through \(9\)) to a one-hot encoding (each digit is labeled with a length \(10\) vector which consists of zeros except for the index represented by the digit, which equals one). This is so that the ground-truth labels will be compatible with our categorical cross-entropy loss function.
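As an illustration of what a one-hot encoding looks like for a single label, here is a minimal sketch in plain Python. The helper one_hot is hypothetical and only mimics the idea; it is not how keras.utils.to_categorical is implemented.

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1.0 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

encoded = one_hot(3)
print(encoded)  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```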
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
Here, the neural network is defined. The layer sizes are \(784 \rightarrow 512 \rightarrow 512 \rightarrow 10\), and the two hidden layers use the ReLU activation function (an alternative to the classical sigmoid). A softmax is applied at the end to convert our ten outputs to probabilities, each output denoting a digit. Dropout is applied between layers to reduce overfitting.
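To get a sense of the size of this network, we can count its trainable parameters by hand. The sketch below assumes each Dense layer has one weight per input-output pair plus one bias per output unit, and that Dropout layers add no parameters; the helper name dense_params is made up for illustration.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

layer_sizes = [784, 512, 512, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [401920, 262656, 5130]
print(total)      # 669706
```

Most of the parameters sit in the first two layers, which is typical for fully-connected networks on image inputs.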
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
Here, the model is compiled. Our loss function is categorical cross-entropy, which is used for categorical data with more than two classes. The metrics keyword argument is a list of metrics we want to keep track of throughout training, validation, and testing. We can provide our own functions, or provide strings denoting built-in metrics (e.g. accuracy).
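To make the loss concrete, here is a minimal sketch in plain Python of a softmax followed by categorical cross-entropy for a single example. The function names, the made-up scores, and the clipping constant eps are assumptions for illustration; Keras computes the same quantity on whole batches inside the backend.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probs, eps=1e-7):
    """Cross-entropy for one example: -sum(t_i * log(p_i))."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_target, probs))

# Made-up raw scores for ten classes, favoring class 3.
logits = [0.0] * 10
logits[3] = 2.0
probs = softmax(logits)

# One-hot target for the true class 3.
target = [0.0] * 10
target[3] = 1.0

loss = categorical_crossentropy(target, probs)
print(probs[3], loss)
```

Because the target is one-hot, only the predicted probability of the true class contributes to the loss: the loss shrinks toward zero as that probability approaches one.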
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
Here, the model is trained. The verbose keyword argument gives us a progress bar as training proceeds, and validation_data is evaluated at the end of every epoch. The call returns a history of the loss and metrics throughout training.
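The returned object exposes its records as history.history, a dictionary mapping each tracked quantity (such as loss and val_loss) to a list with one value per epoch. The sketch below shows one way to read such a dictionary; the helper best_epoch is hypothetical and the numbers are made up for illustration, not results from the model above.

```python
def best_epoch(history_dict, key="val_loss"):
    """Return (1-based epoch, value) at which `key` is smallest."""
    values = history_dict[key]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]

# Made-up numbers standing in for history.history after a short run.
example_history = {
    "loss": [0.25, 0.10, 0.07, 0.06],
    "val_loss": [0.12, 0.09, 0.08, 0.10],
}

epoch, value = best_epoch(example_history)
print(epoch, value)  # 3 0.08
```

A validation loss that starts rising while the training loss keeps falling, as in these made-up numbers, is a common sign of overfitting.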
score = model.evaluate(x_test, y_test, verbose=0)
Finally, the model is evaluated on the test set. The returned list holds the loss followed by the metrics defined in the compile call, so here score[0] is the test loss and score[1] is the test accuracy.
More Stuff
Though this was a fairly simple example, it covers most of what you will use in Keras. More advanced layers, optimizers, and losses can be found on Keras’s homepage; Keras has great documentation! Things that may interest you are Keras’s functional API, which allows more flexible networks with multiple inputs and outputs, and the backend API, which allows for defining custom layers. Note that Keras and TensorFlow are interoperable: provided you use the TensorFlow backend for Keras, you can write TensorFlow code that operates on Keras tensors, which lets you define operations at an even lower level than the backend API.
Conclusion
Keras is a great framework for rapid prototyping; it provides a high-level API with the potential for lower-level operations when necessary. Note, however, that when working with large or customized models over long-term projects, you may want to look into other frameworks such as PyTorch, which doesn’t have a compile step and is much more low-level.
Practice
Do Kaggle competitions!