First Convolutional Neural Network [Binary Bird Classification]

September 13, 2024


Birds are awesome, extremely diverse creatures found in every corner of the earth. Each species is unique, yet all of them share a universal charm. Perhaps it was no surprise, then, that for one of my first Machine Learning projects I decided to work on something related to birds: Bird Classification. I found this topic extremely interesting, and convenient too, since Bird Classification datasets are easy to obtain online. This was my second-ever Machine Learning project, preceded only by a small model for MNIST.
The first thing I did was go onto Kaggle to find a suitable dataset. Although there were numerous great datasets available, most seemed too large and complex for a beginner. After looking at a few, I finally landed on The Tiny Bird Binary Classification Dataset. With only two classes and far fewer images, it was perfect, since a smaller dataset also meant much less computation. This was quite important because I was using Google Colab at the time, where even the smallest models still took several hours to train.
This was also my first time using a custom dataset instead of a ready-made TensorFlow dataset, and with it came the challenge of learning new syntax and how to work with the TensorFlow libraries. I had a lot to learn, but one of the most helpful resources I found was the TensorFlow tutorial series by Aladdin Persson on YouTube. His 20-video series covered a wide range of useful topics, including everything I needed, from loading datasets to coding models and performing data augmentation. As someone who had learned almost exclusively theory up to this point, I found this resource very helpful, and I highly recommend it to anyone in a similar situation.
Moving on, I loaded the dataset with the function tf.keras.preprocessing.image_dataset_from_directory(). Given the path to a folder that contains one subfolder per class, it collects all of the images into a TensorFlow Dataset and, with labels='inferred', labels each image according to the subfolder it sits in.

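Code for loading the training set:
import tensorflow as tf

IMG_HEIGHT, IMG_WIDTH = 256, 256  # matches the model's 256x256x3 input

ds_train = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,            # path to the training folder (one subfolder per class)
    labels="inferred",    # labels come from the subfolder names
    label_mode="int",     # integer labels 0/1 for binary cross-entropy
    color_mode="rgb",
    batch_size=None,      # yields single images; batch the dataset before model.fit
    image_size=(IMG_HEIGHT, IMG_WIDTH),
)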
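The labels='inferred' rule used by the loader above can be mimicked in a few lines of plain Python, which makes it clear where the 0/1 labels come from: each subfolder is one class, and class indices follow the alphanumerical order of the subfolder names. This is a simplified sketch, not the Keras implementation: it skips the image-extension filtering and shuffling the real function performs, and the folder names class_a/class_b and the helper infer_labels are hypothetical placeholders, not the dataset's real class names.

```python
import os
import tempfile


def infer_labels(root):
    """Sketch of labels="inferred": one subfolder per class, with class
    indices assigned in alphanumerical order of the subfolder names."""
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    samples = []
    for index, name in enumerate(class_names):
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            samples.append((os.path.join(name, fname), index))
    return class_names, samples


# Hypothetical layout with placeholder class names; class_b is created first
# to show that creation order does not matter, only the sorted names do.
with tempfile.TemporaryDirectory() as train_dir:
    for cls, files in {"class_b": ["b1.jpg"], "class_a": ["a1.jpg", "a2.jpg"]}.items():
        os.makedirs(os.path.join(train_dir, cls))
        for f in files:
            open(os.path.join(train_dir, cls, f), "w").close()  # empty stand-in file
    class_names, samples = infer_labels(train_dir)

print(class_names)  # ['class_a', 'class_b']
print(samples)
```

With the sigmoid output used later, the model's prediction is then the probability of whichever class sorts second.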

For my actual model, I decided to base it on the classic LeNet-5 CNN architecture, using 5×5 convolutions followed by pooling layers (I used MaxPool, whereas the original LeNet-5 used a subsampling step closer to average pooling). I thought this was a good idea because alternating convolutional and pooling layers throughout a model is a well-established pattern, and 5×5 kernels carry more learnable parameters per layer than smaller ones, so I could get more parameters with the same number of layers. I also favored LeNet-5 for its relatively small size and how well it performed on the MNIST dataset.
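To make the parameter argument concrete, here is a small helper (a hypothetical name, not a Keras function) that counts the weights of a Conv2D layer with a bias term: kernel height × kernel width × input channels × filters, plus one bias per filter. For the second 16-filter layer of the model, the 5×5 count matches the conv2d_1 row of the model summary later in this post, while a 3×3 kernel would have fewer than half as many weights.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Trainable parameters of a square-kernel Conv2D layer with bias:
    one kernel x kernel x in_channels filter per output channel, plus one bias each."""
    return kernel * kernel * in_channels * out_channels + out_channels


# First layer: RGB input (3 channels) -> 16 filters
print(conv2d_params(5, 3, 16))   # 1216
# Second layer: 16 channels in -> 16 filters, 5x5 versus 3x3 kernels
print(conv2d_params(5, 16, 16))  # 6416
print(conv2d_params(3, 16, 16))  # 2320
```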
For the loss function I used Binary Cross-Entropy, since this is a binary classification problem. I also used the Adam optimizer, starting with a learning rate of 0.01.
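Binary cross-entropy for a label y in {0, 1} and a predicted probability p is -[y*log(p) + (1 - y)*log(1 - p)], averaged over the samples. The plain-Python sketch below spells out that formula; it is not Keras's implementation, and the clipping constant eps is an assumption added so that log() never sees exactly 0.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels (0/1) and probabilities.
    Probabilities are clipped to [eps, 1 - eps] so the logarithms stay finite."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # about 0.105: confident and correct
print(binary_cross_entropy([1, 0], [0.5, 0.5]))  # about 0.693 (ln 2): a coin flip
```

A model that always outputs 0.5 scores ln 2 ≈ 0.693, so a first-epoch training loss near that value, like the 0.6879 in the training log below, suggests a model that has only just started to separate the classes.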
I couldn't follow LeNet-5 exactly, as my input images were 256×256×3, many times larger than LeNet-5's 32×32 inputs, so I added more layers while keeping the rest of the architecture's structure.
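The extra layers matter because of how quickly pooling shrinks the image. In the final model at the end of this post, every Conv2D uses padding="same" with stride 1, so only the three 2×2 max-pooling layers change the spatial size. A short sketch (pooled_size is a hypothetical helper) traces this: three pools take a 256×256 input down to 32×32, which is only where LeNet-5 starts, so the flattened vector fed to the first Dense layer is still 32 × 32 × 64 = 65,536 values long.

```python
def pooled_size(size, num_pools):
    """Spatial side length after num_pools MaxPool2D(2, 2) layers.
    With the default "valid" padding each pool halves the size, flooring odd sizes."""
    for _ in range(num_pools):
        size //= 2
    return size


side = pooled_size(256, 3)  # 256 -> 128 -> 64 -> 32
flat = side * side * 64     # the last convolutions have 64 filters
print(side, flat)           # 32 65536
print(pooled_size(32, 3))   # 4: the same three pools applied to a 32x32 input
```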
However, after training, the results were really bad: the training accuracy was stuck at around 50-60%, and the validation accuracy jumped around rapidly and inconsistently. For a binary classification problem this is very poor, since random guessing alone would score about 50%; it essentially means the model was not learning at all.
This behavior, I would later realize, is a classic sign of a learning rate that is too large. This was the first time I had tried such a complex problem (the previous one being MNIST), so I set the learning rate far too high. With steps that large, the model keeps overshooting and cannot fine-tune itself to more delicate details. In later iterations, as I decreased the learning rate from 0.01 to 0.001 and finally to 0.0001, the accuracy of the model rose steadily, as it was able to train to a much finer accuracy than before.
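The effect of the learning rate shows up even on a toy problem. The sketch below is a 1-D example, not the CNN itself: it runs plain gradient descent on f(x) = x², whose gradient is 2x, so every step multiplies x by (1 - 2·lr). When that factor's magnitude exceeds 1, the value bounces back and forth with growing size; when lr is tiny, progress is slow. Adam rescales its steps, so these exact thresholds do not carry over, but the intuition that overly large steps overshoot does.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimise f(x) = x**2 (gradient 2x) with plain gradient descent.
    Each update is x <- x - lr * 2x, i.e. x is multiplied by (1 - 2*lr)."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x


for lr in (1.1, 0.9, 0.1, 0.01):
    print(lr, gradient_descent(lr))
```

After 20 steps, lr = 1.1 has blown up to roughly 38, lr = 0.9 has flipped sign every step but still shrunk to about 0.01, lr = 0.1 has converged smoothly to the same size, and lr = 0.01 has only reached about 0.67.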
At first, unfortunately, I did not realize that this was the problem. Instead, I thought the model was too simple, so I kept adding more and more parameters, to no avail. Looking back, this is a really easy trap to fall into, and one I will be more conscious of in the future. Although increasing the size of a model often does increase its accuracy, it isn't the only factor to consider, not to mention the danger of overfitting.
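In the end, my model performed much better than before, achieving a training set accuracy of 98.6% and a test and validation set accuracy of 100%. I was surprised at how well the model generalized, considering that I didn't use any regularizers. I initially put this down to the model's small size, but the summary below shows about 34 million parameters, almost all of them in the first Dense layer, so the relatively easy two-class task and the small validation and test sets (the validation accuracy only ever moves in steps of about 3.3%) probably explain more of it. Overall, I am extremely satisfied with this project, which did far better than I had hoped for my very first CNN.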
Final code:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential(
    [
        layers.Input((256, 256, 3)),
        # Stage 1: two 5x5 convolutions, then a 2x2 max pool (256 -> 128)
        layers.Conv2D(16, (5, 5), padding="same", activation="relu"),
        layers.Conv2D(16, (5, 5), padding="same", activation="relu"),
        layers.MaxPool2D(2, 2),
        # Stage 2: 128 -> 64
        layers.Conv2D(32, (5, 5), padding="same", activation="relu"),
        layers.Conv2D(32, (5, 5), padding="same", activation="relu"),
        layers.MaxPool2D(2, 2),
        # Stage 3: 64 -> 32
        layers.Conv2D(64, (5, 5), padding="same", activation="relu"),
        layers.Conv2D(64, (5, 5), padding="same", activation="relu"),
        layers.MaxPool2D(2, 2),
        # Stage 4: two more convolutions at 32x32, no pooling
        layers.Conv2D(64, (5, 5), padding="same", activation="relu"),
        layers.Conv2D(64, (5, 5), padding="same", activation="relu"),
        # Classifier head: a single sigmoid unit outputs the probability of class 1
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(512, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ]
)
model.summary()

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.0001),
    loss=keras.losses.BinaryCrossentropy(),
    metrics=["accuracy"],
)
# ds_train, ds_val and ds_test must already be batched tf.data Datasets;
# the Keras docs say not to pass batch_size to fit() when the input is a dataset.
history = model.fit(ds_train, validation_data=ds_val, epochs=15)
model.evaluate(ds_test)

Output:
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D) │ (None, 256, 256, 16) │ 1,216 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_1 (Conv2D) │ (None, 256, 256, 16) │ 6,416 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d (MaxPooling2D) │ (None, 128, 128, 16) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_2 (Conv2D) │ (None, 128, 128, 32) │ 12,832 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_3 (Conv2D) │ (None, 128, 128, 32) │ 25,632 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_1 (MaxPooling2D) │ (None, 64, 64, 32) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_4 (Conv2D) │ (None, 64, 64, 64) │ 51,264 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_5 (Conv2D) │ (None, 64, 64, 64) │ 102,464 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_2 (MaxPooling2D) │ (None, 32, 32, 64) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_6 (Conv2D) │ (None, 32, 32, 64) │ 102,464 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_7 (Conv2D) │ (None, 32, 32, 64) │ 102,464 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ flatten (Flatten) │ (None, 65536) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense (Dense) │ (None, 512) │ 33,554,944 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense) │ (None, 512) │ 262,656 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_2 (Dense) │ (None, 1) │ 513 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 34,222,865 (130.55 MB)
Trainable params: 34,222,865 (130.55 MB)
Non-trainable params: 0 (0.00 B)
Epoch 1/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 275s 4s/step - accuracy: 0.6030 - loss: 0.6879 - val_accuracy: 0.7000 - val_loss: 0.6603
Epoch 2/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 192s 4s/step - accuracy: 0.7323 - loss: 0.6062 - val_accuracy: 0.7667 - val_loss: 0.4010
Epoch 3/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 192s 4s/step - accuracy: 0.8731 - loss: 0.2603 - val_accuracy: 0.9333 - val_loss: 0.2056
Epoch 4/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 190s 4s/step - accuracy: 0.9199 - loss: 0.2624 - val_accuracy: 1.0000 - val_loss: 0.0732
Epoch 5/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 199s 4s/step - accuracy: 0.9147 - loss: 0.2143 - val_accuracy: 0.9000 - val_loss: 0.4666
Epoch 6/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 195s 4s/step - accuracy: 0.9179 - loss: 0.2517 - val_accuracy: 0.8667 - val_loss: 0.2422
Epoch 7/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 201s 4s/step - accuracy: 0.9419 - loss: 0.1712 - val_accuracy: 0.9667 - val_loss: 0.1649
Epoch 8/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 194s 4s/step - accuracy: 0.9305 - loss: 0.1764 - val_accuracy: 0.9667 - val_loss: 0.0867
Epoch 9/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 190s 4s/step - accuracy: 0.9899 - loss: 0.0607 - val_accuracy: 1.0000 - val_loss: 0.0209
Epoch 10/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 187s 4s/step - accuracy: 0.9721 - loss: 0.0648 - val_accuracy: 1.0000 - val_loss: 0.0088
Epoch 11/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 192s 4s/step - accuracy: 0.9996 - loss: 0.0030 - val_accuracy: 1.0000 - val_loss: 7.7336e-04
Epoch 12/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 186s 4s/step - accuracy: 0.9917 - loss: 0.0254 - val_accuracy: 1.0000 - val_loss: 0.0344
Epoch 13/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 188s 4s/step - accuracy: 0.9782 - loss: 0.0778 - val_accuracy: 1.0000 - val_loss: 0.0014
Epoch 14/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 194s 4s/step - accuracy: 0.9822 - loss: 0.0478 - val_accuracy: 0.9667 - val_loss: 0.0563
Epoch 15/15
47/47 ━━━━━━━━━━━━━━━━━━━━ 190s 4s/step - accuracy: 0.9772 - loss: 0.0637 - val_accuracy: 1.0000 - val_loss: 0.0084
5/5 ━━━━━━━━━━━━━━━━━━━━ 6s 961ms/step - accuracy: 1.0000 - loss: 0.0115
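The parameter counts in the summary can be checked by hand. The sketch below copies the per-layer numbers from the summary table, sums them, recomputes the first Dense layer as 65,536 flattened inputs × 512 units + 512 biases, and converts the total to megabytes assuming 4-byte float32 weights and dividing by 1024², which reproduces the 130.55 MB figure.

```python
# Per-layer parameter counts copied from the model summary above.
layer_params = {
    "conv2d": 1_216, "conv2d_1": 6_416, "conv2d_2": 12_832, "conv2d_3": 25_632,
    "conv2d_4": 51_264, "conv2d_5": 102_464, "conv2d_6": 102_464, "conv2d_7": 102_464,
    "dense": 33_554_944, "dense_1": 262_656, "dense_2": 513,
}

total = sum(layer_params.values())
dense_check = 65_536 * 512 + 512  # flattened 32x32x64 inputs -> 512 units, plus biases
size_mb = total * 4 / 1024**2     # float32 weights take 4 bytes each

print(total)                                    # 34222865
print(dense_check == layer_params["dense"])     # True
print(round(size_mb, 2))                        # 130.55
print(round(layer_params["dense"] / total, 3))  # 0.98
```

Almost 98% of the weights sit in that single Dense layer, because it connects every value of the flattened 32×32×64 feature map to each of its 512 units.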

Lingyuan's blog

First Convolutional Neural Network [Binary Bird Classification]

Jazz Ensemble Elementary School Tour!

How information changes our world.

Incredible Winter Concert!

Little Women Theater Production at my School!

Kaleb Joseph and Mental Health

Thoughts and personal experiences with Bilingualism and Mandarin proverbs.
