Using Pre-Trained Models for Image Classification

Learn how to classify images with pre-trained models in OpenCV, one of the most popular computer vision libraries. Follow our step-by-step guide with code examples to understand the theory behind pre-trained models, how to load them, and how to use them to classify images.

Updated March 20, 2023



Welcome to this tutorial on using pre-trained models for image classification with OpenCV. Image classification is a fundamental task in computer vision, with a wide range of applications such as object recognition, facial recognition, and scene understanding.

We will cover the theory behind pre-trained models, walk through code examples that load a model and classify an image, and explain when and why pre-trained models are worth using.

Theory

Pre-trained models are deep neural networks trained on large datasets such as ImageNet. These models can be used for a variety of computer vision tasks such as image classification, object detection, and semantic segmentation.

Pre-trained models have several advantages over training a new model from scratch. They save time and resources because you do not need to collect and annotate a large dataset, and they often reach higher accuracy than a model trained from scratch on a small dataset, because they have already learned general visual features from a very large one.

OpenCV's dnn module can load pre-trained image classification networks such as VGG16, VGG19, and ResNet. The model files themselves are not bundled with OpenCV and have to be downloaded separately.

Code Examples

We will use Python for our examples, but the concept applies to other programming languages supported by OpenCV.

First, let’s start by importing the necessary libraries:

import cv2
import numpy as np

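Next, let’s load a sample image and the pre-trained model:

img = cv2.imread('sample_image.jpg')  # returns None if the file cannot be read
assert img is not None, 'could not read sample_image.jpg'
# Network definition (.prototxt) and trained weights (.caffemodel); both file names are placeholders
model = cv2.dnn.readNetFromCaffe('deploy.prototxt', 'model.caffemodel')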

VGG16 Model

To classify an image using the VGG16 model, we can use the following code:

# Load the class labels, one per line, in the same order as the model's outputs.
# 'labels.txt' is a placeholder name. An ImageNet-trained VGG16 has 1000 output
# classes, so the file must contain 1000 labels.
with open('labels.txt') as f:
    classes = [line.strip() for line in f]

# Create a blob from the image: resize to 224x224 and subtract the
# per-channel mean (B, G, R order), giving a 1x3x224x224 array
blob = cv2.dnn.blobFromImage(img, 1, (224, 224), (104, 117, 123))

# Set the input to the model
model.setInput(blob)

# Forward pass through the model; flatten the result to a 1-D vector of class scores
output = model.forward().flatten()

# Get the indices of the top 5 predictions, highest score first
predictions = np.argsort(output)[::-1][:5]

# Display the predictions (the percentages assume the network ends in a softmax layer)
for rank, idx in enumerate(predictions, start=1):
    print(f"{rank}. {classes[idx]}: {output[idx] * 100:.2f}%")

In the above code, we first set up the class labels for the VGG16 model. The list must follow the order of the model's output classes and have one entry per output.

Next, we create a blob from the image with cv2.dnn.blobFromImage(), pass it to the model with setInput(), and run a forward pass with forward().
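To make the blob step less opaque, here is a minimal NumPy sketch of what that call computes when the image is already 224x224, so no resizing is needed. The helper name `manual_blob` is made up for illustration and is not part of OpenCV; it assumes the default settings used above (no channel swap, no crop).

```python
import numpy as np

def manual_blob(img_bgr, scale=1.0, mean=(104, 117, 123)):
    """Hypothetical helper: mimic blobFromImage for an image already at the target size."""
    x = img_bgr.astype(np.float32)
    x = x - np.array(mean, dtype=np.float32)  # subtract the per-channel (B, G, R) mean
    x = x * scale                             # then apply the scale factor
    x = x.transpose(2, 0, 1)                  # height, width, channels -> channels, height, width
    return x[np.newaxis, ...]                 # add the batch dimension (NCHW layout)

# Synthetic 224x224 image in which every pixel equals the mean colour
img = np.full((224, 224, 3), (104, 117, 123), dtype=np.uint8)
blob = manual_blob(img)
print(blob.shape)  # (1, 3, 224, 224)
print(blob.max())  # 0.0
```

The network therefore never sees the raw image, only a mean-centred float array with the channel axis moved to the front.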

Finally, we get the indices of the top 5 predictions with np.argsort() and display each label along with its confidence score.
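The ranking step can be wrapped in a small helper. The sketch below is plain NumPy; `softmax` and `top_k` are illustrative names, not OpenCV functions. It assumes the network output may be raw scores rather than probabilities, which depends on whether the model definition ends in a softmax layer, and it runs on made-up scores so no model file is required.

```python
import numpy as np

def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def top_k(output, labels, k=5, apply_softmax=False):
    """Return the k best (label, score) pairs, best first."""
    # flatten() copes with outputs shaped (1, N) as well as (1, N, 1, 1)
    scores = np.asarray(output, dtype=np.float64).flatten()
    if apply_softmax:
        scores = softmax(scores)
    k = min(k, scores.size)
    order = np.argsort(scores)[::-1][:k]
    return [(labels[i], float(scores[i])) for i in order]

# Made-up output for a 4-class model
labels = ['cat', 'dog', 'bird', 'fish']
fake_output = np.array([[2.0, 0.5, 3.0, -1.0]])
results = top_k(fake_output, labels, k=2, apply_softmax=True)
for rank, (label, prob) in enumerate(results, start=1):
    print(f"{rank}. {label}: {prob * 100:.2f}%")
```

If the model already outputs probabilities, leave `apply_softmax` at its default so the scores are not normalised a second time.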

ResNet Model

To classify an image using the ResNet model, we can use the following code:

# This block assumes `model` was loaded from a ResNet prototxt/caffemodel pair.
# Load that model's class labels, one per line, in output order
# ('labels.txt' is a placeholder name; an ImageNet-trained ResNet has 1000 classes).
with open('labels.txt') as f:
    classes = [line.strip() for line in f]

# Create a blob from the image
blob = cv2.dnn.blobFromImage(img, 1, (224, 224), (104, 117, 123))

# Set the input to the model
model.setInput(blob)

# Forward pass through the model; flatten the result to a 1-D vector of class scores
output = model.forward().flatten()

# Get the indices of the top 5 predictions, highest score first
predictions = np.argsort(output)[::-1][:5]

# Display the predictions (the percentages assume the network ends in a softmax layer)
for rank, idx in enumerate(predictions, start=1):
    print(f"{rank}. {classes[idx]}: {output[idx] * 100:.2f}%")

The steps are the same as in the VGG16 example: we set up the class labels for the ResNet model, create a blob with cv2.dnn.blobFromImage(), pass it to the model with setInput(), run forward(), and sort the scores with np.argsort() to get the top 5 predictions and their confidence scores. Only the model files and the label list differ.
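Both examples index a label list with positions taken from the model output, so the list must be exactly as long as the output vector. A short check such as the following, a sketch with the made-up name `check_labels`, turns a wrong label or an IndexError into a clear message. The 1000-element array only stands in for the output of an ImageNet-trained network.

```python
import numpy as np

def check_labels(labels, output):
    """Raise ValueError if the label count differs from the number of output scores."""
    n_scores = np.asarray(output).size
    if len(labels) != n_scores:
        raise ValueError(
            f"model produced {n_scores} scores but {len(labels)} labels were given"
        )

fake_output = np.zeros((1, 1000), dtype=np.float32)    # stand-in for model.forward()
short_list = ['background', 'airplane', 'automobile']  # too short on purpose

try:
    check_labels(short_list, fake_output)
except ValueError as err:
    print(err)  # model produced 1000 scores but 3 labels were given
```

Calling the check once, right after the first forward pass, is enough to catch a label file that belongs to a different model.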

Why Use Pre-Trained Models?

Pre-trained models are widely used in computer vision because they reach high accuracy with little effort. They are especially useful when data or compute is limited, since you do not have to collect and annotate a large dataset or train a network from scratch.

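In addition, pre-trained models can be fine-tuned for specific tasks, allowing for further improvement in accuracy. This is achieved by removing the final layers of the pre-trained model, replacing them with new layers tailored to the specific task, and training those layers on the new data. OpenCV's dnn module only runs inference, so the fine-tuning itself is done in a training framework, and the resulting model can then be loaded into OpenCV if it is exported in a supported format.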
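As a rough illustration of the idea, the sketch below trains a new softmax layer on top of fixed feature vectors, which is the simplest variant of replacing the final layer. Everything in it is an assumption made for the example: the features are random synthetic data standing in for the output of a pre-trained network's second-to-last layer, and the training loop is plain NumPy gradient descent rather than any particular framework's API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "features": 200 samples, 16 dimensions, 3 new classes.
# In a real setting these would come from the frozen pre-trained layers.
n_samples, n_features, n_classes = 200, 16, 3
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
y = rng.integers(0, n_classes, size=n_samples)
X = centers[y] + rng.normal(size=(n_samples, n_features))

# The new final layer: weights and bias, initialised to zero
W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)

def predict_proba(X, W, b):
    """Softmax probabilities of the new layer for each row of X."""
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

# Train only the new layer with gradient descent on the cross-entropy loss
one_hot = np.eye(n_classes)[y]
learning_rate = 0.01
for _ in range(500):
    probs = predict_proba(X, W, b)
    grad_logits = (probs - one_hot) / n_samples
    W -= learning_rate * (X.T @ grad_logits)
    b -= learning_rate * grad_logits.sum(axis=0)

accuracy = (predict_proba(X, W, b).argmax(axis=1) == y).mean()
print(f"training accuracy of the new layer: {accuracy:.2f}")
```

Because only `W` and `b` are updated, the pre-trained part of the network stays unchanged, which is why this approach works with small datasets and modest hardware.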

Conclusion

In this tutorial, we’ve explored how to use pre-trained models for image classification with OpenCV. We discussed the theory behind pre-trained models, provided multiple code examples to illustrate the concept, and explained how and why someone would use pre-trained models.

Pre-trained models are an essential tool in computer vision. By using them, you can save time and resources while still achieving high accuracy in your image classification applications.

We hope that this tutorial has been helpful and informative for beginners and those looking to explore the world of computer vision and image classification. For further information, please refer to the OpenCV documentation and explore the different image processing techniques and their applications.