Deep learning and computer vision frequently make headlines in both technology publications and mainstream news websites. Neither is a particularly new idea, but technological, algorithmic, and statistical innovations have rapidly expanded what is possible and brought both into the spotlight.

In this article, you’ll learn the basic definitions of deep learning and computer vision and some applications of deep learning for computer vision. You’ll also get some tips and best practices if you are interested in experimenting in this area of artificial intelligence. 

What is Deep Learning?

Deep learning is a machine learning method that uses special algorithms and architectures that together enable computer systems to learn how to perform tasks without explicit instructions or coding. Dedicated deep learning platforms are now available for organizations that want to experiment with and use these methods.

Machine learning methods all rely on using algorithms to build a mathematical model. As you feed the model large volumes of labeled, structured training data, it gradually learns to make more accurate decisions and predictions about that data.

Machine learning models can function using decision trees and other systems but the most recognized type of system is the artificial neural network (ANN). An ANN is a connected group of nodes (neurons) with a design inspired by biological nervous systems. The network has an input layer, a hidden layer, and an output layer.
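To make the layer structure concrete, here is a minimal sketch of a forward pass through a network with one input layer, one hidden layer, and one output layer. It uses only the Python standard library. The layer sizes, the sigmoid activation, the random starting weights, and the helper names (`dense`, `random_layer`) are all assumptions chosen for illustration, and the network is untrained, so its outputs carry no meaning yet.

```python
import math
import random


def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def random_layer(n_inputs, n_neurons, rng):
    """Hypothetical helper: random starting weights and zero biases."""
    weights = [
        [rng.uniform(-1, 1) for _ in range(n_inputs)]
        for _ in range(n_neurons)
    ]
    biases = [0.0] * n_neurons
    return weights, biases


rng = random.Random(0)
hidden_w, hidden_b = random_layer(3, 4, rng)  # 3 input values -> 4 hidden neurons
output_w, output_b = random_layer(4, 2, rng)  # 4 hidden neurons -> 2 output neurons

inputs = [0.5, -0.2, 0.1]                     # values arriving at the input layer
hidden = dense(inputs, hidden_w, hidden_b)    # hidden layer activations
outputs = dense(hidden, output_w, output_b)   # output layer activations
print(outputs)
```

Training would adjust the weights and biases so that the outputs match the labels in the training data. That step is left out here to keep the sketch short.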

The architecture of a deep learning network is more complex than that of a standard machine learning neural network because it has multiple hidden layers. Deep learning therefore needs more computational power than shallower machine learning models. This extra complexity means that deep learning models can learn from unlabelled and unstructured data.

What is Computer Vision?

Computer vision is an interdisciplinary field that aims to get computers to “see” images and videos, identify the objects in them, and understand their context as humans can. Deep learning is one of many methods that help achieve some of the desired functions of computer vision.

Some applications of deep learning in computer vision are:

  • Improving the safety of self-driving cars by identifying pedestrians and other vehicles in real-time video footage
  • Advanced facial recognition systems that can unlock phones and doors to authorized users and automatically tag photos on social media
  • Drones that can use pattern recognition to identify threatened crops
  • Improved medical diagnostics by analyzing medical images

Tips and Best Practices

If you want to learn more and delve into deep learning for computer vision, here are some useful tips and best practices. 

  1. Choose Beginner-friendly Frameworks and Libraries

Few people have the time or resources to build and implement a neural network model from scratch. Thankfully, there are now many deep learning frameworks available that dramatically simplify these complex models and reduce the time needed to implement them. Libraries simplify experimentation even further by abstracting away more of the underlying detail.

A framework is an interface/tool that provides a concise way to define and build deep learning models without needing to get into the algorithmic details. There are several open source deep learning frameworks and libraries that are ideal for beginners, including:

  • TensorFlow
  • Keras
  • MXNet

TensorFlow is one of the most widely used frameworks, and it has a huge community of open source supporters. Keras is actually a library that runs on top of a framework, and its modular design enables fast experimentation with deep learning.

  2. Start with the Classic MNIST Dataset

The classic problem that every programmer solves at the beginning of their learning is creating a simple application that displays the message, “Hello World!”. The equivalent, albeit much more demanding, task in deep learning for computer vision is to train a model to classify the digits 0 to 9 using the MNIST dataset of handwritten digits.

Training a model to accurately classify the images in this dataset into ten classes (one for each digit) provides the foundation for solving the types of image classification problems used widely in deep learning for computer vision. There are some helpful tutorials online to guide you through the MNIST classification problem, and with enough practice, you’ll get there in a few weeks.
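As a rough illustration of what "ten classes" means in practice, the sketch below turns ten raw scores (one per digit) into probabilities with a softmax function and picks the most likely digit. The scores are made up for this example, and the function names are assumptions. In a real model, the scores would come from the network's output layer for a given image.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_digit(scores):
    """Return the index (0-9) of the class with the highest probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


# Made-up scores for illustration only; index i is the score for digit i.
example_scores = [0.1, 0.3, 2.5, 0.0, -1.2, 0.4, 0.2, 4.1, 0.9, -0.5]

probabilities = softmax(example_scores)
predicted = predict_digit(example_scores)
print(predicted)  # prints 7, the index of the largest score
```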

  3. Implement Your Models in Three Phases

Whether you opt for a framework or take the time to build a model from scratch, there are three essential steps to get the best results.

  • Training: in this step, you feed training data into the network so that it can correlate certain data characteristics with classes. The training data needs to go through pre-processing for the model to make accurate decisions and predictions.
  • Testing: in the testing phase, you begin to understand how well your model is working. You get metrics on accuracy, precision, and recall to estimate how your model performs. There is often a validation stage just before the testing stage in which you compare different models and select the best performing one.
  • Inference: now, you apply your model to new data to try to solve real-world problems. If it performs well enough, you might be able to build a computer vision application on top of it.
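
The phases above depend on keeping separate data for training, validation, and testing, and on computing metrics from the test predictions. Here is a minimal sketch of both ideas in plain Python. The 70/15/15 split, the function names, and the example labels are assumptions for illustration, and the metrics are shown for a two-class problem where label 1 is the positive class.

```python
import random


def split_dataset(samples, train_fraction=0.7, validation_fraction=0.15, seed=0):
    """Shuffle the samples and cut them into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    testing = shuffled[n_train + n_validation:]  # whatever remains
    return training, validation, testing


def evaluate(true_labels, predicted_labels, positive=1):
    """Compute accuracy, precision, and recall for one positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when a class is never predicted or present.
        "precision": true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0,
        "recall": true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0,
    }


# Hypothetical sample IDs standing in for real images.
training, validation, testing = split_dataset(list(range(100)))

# Made-up labels and predictions for eight test images.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)  # accuracy, precision, and recall are each 0.75 here
```

In a real project, you would use the validation set to compare candidate models and only measure the chosen model once on the test set, so that the reported metrics are not flattered by model selection.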

  4. Use the Internet for More Data

As you advance in deep learning for computer vision beyond the MNIST problem, you will want to train models to classify other objects. One obstacle you might encounter is not having enough training data to reach an acceptable classification accuracy.

The Internet is a useful resource for increasing datasets because you can use Bing / Google APIs to programmatically pull images. Bing’s image search API and Google’s Custom Search API both provide access to millions of images. 

  5. Don’t Forget the Black Box Problem

The black box problem of neural networks is that you cannot easily inspect a trained model to get insights about its behavior and decisions. While some papers have tried to dispel the idea that neural networks are still black boxes, the understanding you can get from studying today’s complex networks remains very limited.

Closing Thoughts

Deep learning for computer vision is at the forefront of modern developments in A.I. These tips and best practices can serve as good starting points for people who want to dive into an exciting and complex area where many disciplines meet. 
