It is the science of making computers understand what is happening within an image, for example which objects it contains. In the case of driverless cars, this means detecting pedestrians, lanes, traffic signs and so on.
It is a science that allows computers to understand the images and videos and determine what the computer sees or recognizes.
It is divided into 3 basic categories:
It is applicable everywhere such as –
Computer vision allows the computers to emulate human vision which relates to image understanding. Example – object recognition, defect detection or automatic driving.
Image processing itself is a part of computer vision. It deals with enhancing the image and manipulating features such as colors, for example by smoothing, sharpening, contrast adjustment and stretching.
Paul Viola & Michael Jones
Note: Binary value of 11111111 is equal to the decimal value of 255.
#OpenCV stores color in the BGR format.
Image features are important areas of an image that are unique to that specific image. More specifically, a feature is a distinctive piece of information in an image, such as an edge or an object.
They are important because they form a critical part of how machine learning analyzes, describes and matches images. They are used to train classifiers that detect objects, such as pedestrians and cars in the case of autonomous vehicles.
Our input image has a lot of extra information that is not required when performing image classification. We therefore extract the important information from the image and leave out the rest. For example, running an edge detector on an image simplifies it, retaining the essential information and throwing away the non-essential information. This step is called feature extraction.
It converts an image of a fixed size to a feature vector of fixed size.
In the HOG (histogram of oriented gradients) feature descriptor, the distribution (histogram) of gradient directions (oriented gradients) is used as the feature. It is a hand-designed feature that debuted in 2005; it converts a pixel-based representation into a gradient-based one and is often used with linear classification techniques. It is based on the idea that local object appearance can be effectively described by the distribution of edge directions.
ROI stands for region of interest: the portion of the image that you want to filter or operate on, which improves both accuracy and performance. For example, in eye detection, instead of searching the whole image, we first obtain the face region and search for eyes only within it.
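As a minimal sketch of how an ROI is typically cropped, the snippet below slices a NumPy array, which is how OpenCV represents images in Python. The synthetic image and the bounding box values are hypothetical stand-ins for a real photo and a real face detector result.

```python
import numpy as np

# Synthetic 100x200 grayscale image standing in for a real photo.
image = np.zeros((100, 200), dtype=np.uint8)

# Hypothetical face bounding box (x, y, width, height), e.g. from a face detector.
x, y, w, h = 50, 20, 80, 60

# NumPy indexes rows (y) first, then columns (x).
face_roi = image[y:y + h, x:x + w]

# The slice is a view, so writing to it also modifies the original image.
face_roi[:] = 255

print(face_roi.shape)  # (60, 80)
```

Any later search, such as eye detection, would then run on `face_roi` only, which is smaller than the full image.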
The purpose of the image subtraction is to find absolute changes between 2 different images.
Gradients are 2D derivatives that indicate the change in intensity values across the image. Edges, on the other hand, can be considered a binary indicator of whether an edge is present at a pixel, marking where that change is high.
Denoising means removing the noise explicitly. Image denoising can be achieved by applying Gaussian filtering or wavelet thresholding.
Image filtering, on the other hand, is used for image enhancement, edge detection etc.
Usually, there are 3 steps in the edge detection process:
Suppress as much noise as possible without removing the edges
Highlight edges and weaken elsewhere.
Look at the maxima of the output and eliminate the spurious edges.
The Sobel operator, sometimes called the Sobel-Feldman operator, is used within edge detection algorithms to create an image emphasizing edges. It works by calculating the gradient of the image intensity at each pixel, which gives the direction of the largest change in intensity and the rate of change in that direction.
The edge detection method can be grouped into 2 categories:
It detects the edges by looking at the minima and maxima in the first derivative of the image.
This method searches for the zero crossings in the second derivative of the image.
Suppose a wine shop purchases wine from dealers and resells it later. Some dealers, however, sell fake wine, so the shop owner must be able to distinguish between fake and authentic wine. The forger tries different techniques to sell fake wine and keeps the ones that get past the shop owner's check. The shop owner, in turn, receives feedback from wine experts that some of the dealer's wine is not original, and has to improve how he determines whether a wine is fake or authentic.
In a similar manner, there are 2 components of GAN:
A generator is a convolutional neural net that keeps producing the images that are closer in appearance to the real images while the discriminator tries to determine the difference between the real and fake images.
It is a popular edge detection algorithm developed by John F. Canny. It includes 4 steps:
Below is an implementation of the Canny algorithm for edge detection using OpenCV:
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Read the image as grayscale (flag 0)
img = cv2.imread('abc.jpg', 0)
# 100 and 200 are the lower and upper hysteresis thresholds
edges = cv2.Canny(img, 100, 200)

# Show the original and the edge image side by side, with axis ticks hidden
plt.subplot(121), plt.imshow(img, cmap='gray')
plt.title('Original Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122), plt.imshow(edges, cmap='gray')
plt.title('Edge Image'), plt.xticks([]), plt.yticks([])
plt.show()
The Fourier transform (FT) decomposes an image into its sine and cosine components. It is used extensively in image processing and computer vision. For example, convolution, a fundamental image processing operation, can be done much faster by using the fast Fourier transform (FFT). Applying the FT transforms an image from its spatial domain into the "frequency domain", which in essence represents the image in terms of how quickly its color and brightness vary across space.
In simple words, it tells you what is happening in the image in terms of the frequencies of sine and cosine components. Therefore, the output of the transformation represents the image in the frequency(Fourier) domain.
Numpy has an FFT package, providing us the frequency transform:
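A minimal sketch using `np.fft.fft2` and `np.fft.fftshift` on a synthetic image; the 64x64 sinusoid is an assumption chosen so that the location of the frequency peak is easy to predict.

```python
import numpy as np

# Synthetic 64x64 image: a sinusoid with 4 cycles across the width,
# constant along the vertical direction.
size = 64
x = np.arange(size)
row = np.sin(2 * np.pi * 4 * x / size)
img = np.tile(row, (size, 1))

# 2D FFT, then shift the zero-frequency component to the center.
f = np.fft.fft2(img)
fshift = np.fft.fftshift(f)

# Log scaling makes the spectrum easier to display.
magnitude_spectrum = np.log1p(np.abs(fshift))

# The strongest components sit 4 bins left and right of the center column.
peak_row, peak_col = np.unravel_index(
    np.argmax(magnitude_spectrum), magnitude_spectrum.shape
)

# The inverse transform recovers the original image.
restored = np.fft.ifft2(np.fft.ifftshift(fshift)).real
```

For a real photograph, `magnitude_spectrum` would usually be displayed with `plt.imshow(magnitude_spectrum, cmap='gray')`.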
A kernel (also called a convolution matrix or mask) is a small matrix used for blurring, sharpening, embossing, edge detection and more. These operations are usually accomplished by a convolution (the integral, or for discrete images the sum, of the product of 2 functions) between a kernel and an image.
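To make the operation concrete, here is a small NumPy-only sketch of a 2D convolution without padding. The function name `convolve2d` and the test image are illustrative assumptions; in practice a library routine such as OpenCV's `cv2.filter2D` would normally be used instead.

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' 2D convolution: flip the kernel, then slide it over the image."""
    kh, kw = kernel.shape
    # Convolution flips the kernel in both directions; correlation does not.
    flipped = kernel[::-1, ::-1]
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    # Accumulate one shifted, weighted copy of the image per kernel entry.
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * image[i:i + out_h, j:j + out_w]
    return out

# A single bright pixel in the middle of a dark 5x5 image.
image = np.zeros((5, 5))
image[2, 2] = 9.0

# 3x3 box blur kernel: every output pixel is the mean of its neighborhood.
box_blur = np.full((3, 3), 1.0 / 9.0)
blurred = convolve2d(image, box_blur)
print(blurred)
```

The bright pixel is spread evenly over every 3x3 neighborhood that contains it, which is the blurring effect described above. Swapping in a different kernel gives sharpening or edge detection with the same function.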
Template matching is a basic technique for object detection in which you find the small parts of an image that match a template image. Let's say you have a football and you create a template of it. Now perform a pixel-by-pixel match of the template against the image to be scanned, placing the template at every possible pixel. Using a similarity metric, find the position giving the maximum match, which gives you the pattern most similar to your object.
OpenCV comes with the function cv2.matchTemplate( ) for this purpose.
The Hough transform is an efficient method in which spatially extended patterns are transformed into compact features in a parameter space. In image processing it is used to detect straight lines in binary images (OpenCV provides functions for this), where a line plotted in x and y is modeled as y = mx + b or, in polar form, rho = x cos(theta) + y sin(theta).
And each of the lines is represented as a single point with (m,b) coordinates or (rho, theta) parameters.
In short, this theory converts the detection problem in the image space into an easier local peak detection problem in the parameter space.
# To apply the transform, first run Canny edge detection as a pre-processing step
cv2.HoughLines()  # to detect straight lines
The idea of mathematical morphology is to fix up the picture by working with the shape, size or structure of the objects in it. Here, we use the concept of a structuring element.
Now, the structuring element is the mask or the window that we place on the original image to find the desired output. There are 2 main characteristics of the structuring elements:
Shape: Circular, square, rectangle, triangle
Size: varies from 3x3 to 21x21
Fundamentally, there are two basic operations: dilation and erosion.
Dilation adds (expands) pixels at the boundaries of the objects in an image, using vector addition or subtraction. It can be used for:
Erosion is the complete opposite of dilation. It shrinks (removes) pixels on the object boundaries and decreases the brightness. It is used for:
Some other operations that are performed:
Opening: an operation that involves erosion followed by dilation.
Closing: an operation that involves dilation followed by erosion.
Intuitively, a watershed is an area of high ground from which water flows down to the river. In image processing, it is a technique used to segment images, typically when 2 regions of interest (ROI) are close to each other, i.e. their edges touch. It can also be thought of as a pre-processing step that improves the results of a later algorithm.
These basically start to give us features, which can be used to match points between two images.
SIFT stands for scale-invariant feature transform, a feature detector published by Lowe in 2004 that handles image rotation, affine transformations, and intensity and viewpoint changes when matching features.
It has 4 basic steps:
As the name suggests, SURF (speeded-up robust features) is an algorithm that is a speeded-up version of SIFT.
Where SIFT approximates the Laplacian of Gaussian with a Difference of Gaussians, SURF approximates it with box filters. Instead of Gaussian averaging the image, squares are used for the approximation, since convolution with a square is much faster when the integral image is used. SURF relies on the determinant of the Hessian matrix for both scale and location. For orientation assignment, it uses wavelet responses in the horizontal and vertical directions, with adequate Gaussian weights applied. SURF also uses the wavelet responses for feature description.
ORB stands for Oriented FAST and Rotated BRIEF, where BRIEF stands for Binary Robust Independent Elementary Features. ORB was presented as an alternative to SIFT that is less complex while giving almost similar matching performance.
The feature point detector has 2 parts: FAST & BRIEF
FAST: It finds the x, y coordinates of points that are stable under transformations such as translation and increases or decreases in size.
BRIEF: It works as a descriptor that encodes the appearance of a point so that we can tell one feature point from another.
It is a transformation (a 3x3 matrix) that maps the points in one image to the corresponding points in another image (warping one image onto the other). In short, it relates two images taken from the same camera center, for example when creating panoramas.
Let's say we have a 3x3 matrix.
And let x1, y1 be the coordinates in the first image and x2, y2 be the coordinates in the second. Then, the homography relates them in the following way:
# calculate homography
H, status = cv2.findHomography(points1, points2)
Here points1 and points2 are arrays of corresponding points, and H is the homography matrix.
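As a sketch of what the matrix does once it is known, the snippet below applies a 3x3 homography to points using homogeneous coordinates. The helper `apply_homography` and the example matrix are hypothetical; the direction of the mapping (points1 to points2) is assumed to follow the argument order of `cv2.findHomography(points1, points2)`.

```python
import numpy as np

def apply_homography(H, points):
    """Map an (N, 2) array of (x, y) points through a 3x3 homography H."""
    points = np.asarray(points, dtype=np.float64)
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones])   # (x, y) -> (x, y, 1)
    mapped = homogeneous @ H.T                # each row becomes H @ [x, y, 1]
    return mapped[:, :2] / mapped[:, 2:3]     # divide by w to get back to 2D

# Hypothetical homography: scale by 2, then shift by (10, 5).
H = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, 5.0],
              [0.0, 0.0, 1.0]])

points1 = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]])
points2 = apply_homography(H, points1)
print(points2)
```

Because of the division by the third coordinate, multiplying H by any non-zero constant gives the same mapping, which is why a homography has 8 degrees of freedom even though the matrix has 9 entries.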
The Viola-Jones detector is a strong binary classifier built from several weak detectors, where each weak detector is an extremely simple binary classifier.
Three major contributions/phases of the algorithm are:
It is a technique widely used in tracking to estimate where you are in the world and where other things are. The objective of the Kalman filter is to minimize the mean squared error between the actual and the estimated data. It is sometimes described as a recursive least squares filter that acts as a maximum likelihood estimator, fitting a set of model parameters to the data.
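A minimal sketch of the predict and update cycle for the simplest possible case: a scalar value that is assumed to stay nearly constant and is observed through noisy measurements. The function name, the noise variances and the simulated data are all illustrative assumptions; real trackers use a vector state with position and velocity.

```python
import random

def kalman_1d(measurements, process_var=1e-4, measurement_var=0.25,
              initial_estimate=0.0, initial_error=1.0):
    """Scalar Kalman filter for a value assumed to stay (nearly) constant."""
    estimate, error = initial_estimate, initial_error
    estimates = []
    for z in measurements:
        # Predict: the model says the value stays the same, so only the
        # uncertainty grows, by the process noise variance.
        error += process_var
        # Update: the Kalman gain weighs the prediction against the measurement.
        gain = error / (error + measurement_var)
        estimate += gain * (z - estimate)
        error *= (1.0 - gain)
        estimates.append(estimate)
    return estimates

# Simulated noisy readings of a true value of 5.0 (standard deviation 0.5).
random.seed(0)
true_value = 5.0
measurements = [true_value + random.gauss(0.0, 0.5) for _ in range(200)]
estimates = kalman_1d(measurements)
print(round(estimates[-1], 2))
```

The gain starts high, so early measurements move the estimate quickly, and shrinks as the error variance falls, so later noisy readings are trusted less.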
Here, each pixel coordinate (x, y) of the image contains 3 values, each an intensity in the range 0-255 (8-bit). The image is split into 3 matrices corresponding to red, green and blue (RGB). Any other color can be created by mixing RGB intensities, for example:
Yellow - (255, 255, 0)
Orange – (255,128,0)
Pink – (255, 153,255)
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

# Read the image into a NumPy array (RGB channel order)
image = mpimg.imread("image.jpg")
plt.imshow(image)
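import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import cv2  # OpenCV library

# mpimg.imread returns RGB data, so convert with COLOR_RGB2GRAY
# (COLOR_BGR2GRAY is for images read with cv2.imread)
image = mpimg.imread("image.jpg")
gray_image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
plt.imshow(gray_image, cmap='gray')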
Image search engines that quantify the content of an image are called CBIR systems (Content-based image retrieval systems ). It is where the image is analyzed, quantified, and stored so that similar images are returned by the system during a search. (does the search by example for you)
Let’s say we have a 64x128 image,
import numpy as np
import cv2

# Read the image and scale pixel values to [0, 1]
im = cv2.imread('abc.jpg')
im = np.float32(im) / 255.0

# Calculate the gradients using the Sobel operator with kernel size 1
gx = cv2.Sobel(im, cv2.CV_32F, 1, 0, ksize=1)  # horizontal gradient
gy = cv2.Sobel(im, cv2.CV_32F, 0, 1, ksize=1)  # vertical gradient

# Convert to gradient magnitude and direction (in degrees)
mag, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)
It simply means converting an image into a binary format. Thresholding trims pixel values so that the darker and the lighter regions can be separated. The trimmed values contribute less to the overall picture, so the result retains the essential information that is required.
Three broad types are:
Simple thresholding: the threshold value is provided as an input constant and applied to all pixels of the image.
Adaptive thresholding: the threshold is not a constant scalar but is computed over a small window of pixels, so it varies across the image.
Otsu's thresholding: the threshold value is calculated automatically from the image histogram of a bimodal image.
Images are not smooth because adjacent pixels differ. Smoothing makes adjacent pixels look more similar by replacing each pixel with an average of its neighbors. Also known as blurring, it is an image processing operation that removes high-frequency content from an image by convolving it with a low-pass filter.
Different techniques are:
Resizing the image
Shifting of the image
A transformational operation that converts one coordinate space onto another.
Is done to correct the geometric distortions/deformations that occur.
Conversion of 3D image into 2D image.
Below is an implementation of image translation for a shift of (220, 50), as described in the OpenCV documentation:
import numpy as np
import cv2 as cv

# Read the image as grayscale (flag 0)
img = cv.imread('messi5.jpg', 0)
rows, cols = img.shape

# 2x3 translation matrix: shift 220 px in x and 50 px in y
M = np.float32([[1, 0, 220], [0, 1, 50]])
dst = cv.warpAffine(img, M, (cols, rows))

cv.imshow('img', dst)
cv.waitKey(0)
cv.destroyAllWindows()
Background subtraction is an important step in video analysis where you separate out the foreground objects from the background in a sequence of video frames.
Frame differencing is the simplest form of background subtraction where the current frame is simply subtracted from the previous frame, and if the difference in the pixel values for a given pixel is greater than the threshold Th, then that pixel is considered the part of the foreground.
Users can choose the threshold manually or use an automatic thresholding technique.
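Frame differencing can be sketched in a few lines of NumPy. The function name, the tiny synthetic frames and the threshold of 25 are illustrative assumptions.

```python
import numpy as np

def frame_difference_mask(previous_frame, current_frame, threshold):
    """Return a mask that is 255 where the frames differ by more than threshold."""
    # Cast to a signed type first: subtracting uint8 arrays directly wraps around.
    diff = np.abs(current_frame.astype(np.int16) - previous_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8) * 255

# Two synthetic 6x6 grayscale frames with a uniform background of 50.
previous_frame = np.full((6, 6), 50, dtype=np.uint8)
current_frame = previous_frame.copy()
current_frame[2:4, 2:4] = 200   # a bright object appears
current_frame[0, 0] = 55        # small sensor noise, below the threshold

mask = frame_difference_mask(previous_frame, current_frame, threshold=25)
print(mask)
```

Only the 2x2 object region is marked as foreground; the small noise change stays in the background because it does not exceed the threshold Th.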
CNNs are among the most powerful algorithms for image classification and analysis. They process visual information in a feed-forward manner, passing an image through filters that extract certain features from the input image. These feature-level representations are useful for image construction as well, and form the basis for style transfer, which composes images based on CNN layer activations and extracted features.
When a CNN is trained to classify images, the convolutional layers learn to extract more and more complex features from a given image, while the max-pooling layers in between discard detailed spatial information (information that is irrelevant to the classification task). The effect is that the input image is transformed into feature maps that increasingly care about the content of the image rather than any detail about the texture or color of pixels. The later layers are sometimes called the content representation of an image.
Style can be termed as something that can be found in the brush strokes of the painting, its textures, colors, curvature and so on. To perform style transfer, we need to combine the content of one image with the style of another.
To represent the style of an image, a feature space designed to capture texture and color information is used. This space essentially looks at spatial correlations within the layers of a network. For example, is a certain color detected in one feature map similar to a color in another map, and how do detected edges and corners relate? The similarities and differences between the features in a layer give us information about the texture and color in the image, while leaving out information about the actual arrangement and identity of the objects in that image.
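One common way to compute such correlations, and an assumption here since the text does not name it, is the Gram matrix of a layer's feature maps. The sketch below uses random numbers in place of real CNN activations.

```python
import numpy as np

def gram_matrix(feature_maps):
    """Correlations between feature maps: (channels, H, W) -> (channels, channels)."""
    channels, height, width = feature_maps.shape
    # Flatten each feature map into one row vector.
    flat = feature_maps.reshape(channels, height * width)
    # Entry (i, j) is the dot product of feature map i with feature map j.
    return flat @ flat.T

# Random stand-in for the activations of one layer: 3 feature maps of 4x4.
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4, 4))
gram = gram_matrix(features)

# Shuffling the spatial positions (the same way in every map) changes the
# arrangement of the image but leaves the Gram matrix unchanged.
perm = rng.permutation(16)
shuffled = features.reshape(3, 16)[:, perm].reshape(3, 4, 4)
print(np.allclose(gram_matrix(shuffled), gram))
```

The shuffle check illustrates the point made above: this representation keeps which features occur together but discards where they occur.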
This is how to separate the style and the content. Now, let's see how style transfer works:
Style transfer looks at 2 different images, which we call the content image and the style image. Using a trained CNN, it finds the style of one image and the content of the other, and finally tries to merge the two to create a new, third image. In this newly created image, the objects and their arrangement are taken from the content image, and the color and texture from the style image.
Example of style transfer:
The Features from Accelerated Segment Test (FAST) algorithm was proposed by Edward Rosten and Tom Drummond in their 2006 paper 'Machine learning for high-speed corner detection'. The algorithm is used to extract feature points, which are later used to track and map objects in computer vision tasks.