
Visual Search: The Future of ECommerce


A whopping 56% of enterprises are using computer vision today, and 95% are using or planning to use it within the next year. This data point tells us how important computer vision is for a thriving enterprise. But what is visual search, and how is this technology embedded in a compelling value proposition for customers? Below, we look at this ever-growing area of AI.

How Did We Get Here?

Computer vision, and its subset visual search, has been around as a research topic for over a generation, but it was not until the 1990s that genuinely usable models and applications began to appear. Growth since then has been driven by three significant advances.

1. The raw computing power of machines, now deployable on demand through scalable cloud services.

2. NVIDIA’s parallelizable Graphics Processing Units (GPUs), well suited to deep learning.

3. The creation of large, labelled, high-dimensional visual data sets such as ImageNet and PASCAL VOC.

These advances have prevented many instances of overfitting (fitting your conclusions to the data set and declaring victory): ImageNet alone contains 14 million hand-labelled images!

What Happened Next?

With hardware mitigating the computational expense, Convolutional Neural Networks (CNNs) achieved their breakthrough results in and around 2012-2015. This was the advance needed to train a computer to recognize an object. A CNN consists of a few kinds of layers:

1. A convolutional layer, which can be thought of as a way to scan blocks of pixels and create a map of the features of an image.

2. A max-pooling layer, which records the relative position of a feature and discards less important information to make the model faster.

3. Fully connected layers, which assign various weights and probabilities to the features.

In the end, the network gives you the probability that your selected image matches another image. If the probability is high enough, the answer is affirmative. This exciting ability opened up a range of new possibilities.
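The three layer types above can be sketched in a few lines of NumPy. This is a toy forward pass only, with random weights and a hypothetical edge-like filter — a minimal illustration of the mechanics, not a trained or production network:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over the image, producing a feature map (valid padding)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Keep the strongest response in each block, shrinking the map."""
    out = np.zeros((fmap.shape[0] // size, fmap.shape[1] // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fmap[i * size:(i + 1) * size,
                             j * size:(j + 1) * size].max()
    return out

def softmax(x):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy forward pass: 6x6 "image" -> conv -> pool -> fully connected -> probabilities
rng = np.random.default_rng(0)
image = rng.random((6, 6))
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # a tiny edge-like filter
features = max_pool(conv2d(image, kernel))     # 5x5 feature map pooled to 2x2
weights = rng.random((3, features.size))       # fully connected layer, 3 classes
probs = softmax(weights @ features.flatten())
print(probs)
```

The final `probs` vector is where "if the probability is high enough, the answer is affirmative" comes in: you compare its largest entry against a confidence threshold.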

CNN Architecture: Types of Layers

How is this used in eCommerce?

Visual Search can encompass a range of activities, but we can summarize the major efforts into three basic categories:

Image Processing – enhancing an image for viewing or modifying it for further use. This is not, strictly speaking, visual search.

Narrative Description – the cutting edge of computer vision and still in the beginning stages of development. This may involve a conceptual description of a scene or description of an activity or the intention or behavior of an object. This is where Natural Language Processing joins computer vision.

Object Detection – extracting one or more objects from an image and labeling them; this is the most popular use of Visual Search. It may include 3D scene interpretation.

Object Detection: the most popular tool for eCommerce

The research into Visual Search has spawned several variations of architecture designed to search for an object. Some are more accurate than others, and some are faster. The trade-off between speed and accuracy is an important consideration for any deployment.

If you are an insurance company trying to determine whether the damage to a car is significant, or trying to determine the probability of fraud for a previously submitted claim, you will likely use a Fast (or Faster) R-CNN architecture. If you are concerned with matching something roughly, such as a sofa or chair, you might have a customer draw a box around the object (a bounding box) and use a YOLO or SSD architecture. This is less accurate and requires your user to actively participate by drawing the box, but it is significantly faster.
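Whichever detector family you choose, its predicted bounding boxes are conventionally scored against reference boxes with intersection-over-union (IoU), the standard overlap metric used to judge detection accuracy. A minimal sketch, with boxes given as hypothetical `(x1, y1, x2, y2)` corner tuples:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # corners of the overlapping rectangle, if any
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```

A prediction is typically counted as correct when its IoU with the target exceeds a threshold such as 0.5, so a faster but sloppier architecture simply clears that bar less often.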

What if you have more than one product in an image? Image segmentation has shown great promise here: it is the process of partitioning an image into meaningful or perceptually similar regions, and it is currently in use for both images and videos. The method, called Mask R-CNN, is a modification of the Faster R-CNN architecture that adds a separate mask “head” to the network. The mask “head” is an additional CNN (scanning for features such as edges) that outputs an m x m mask (a small grid of cells tightly covering an object) for each region.
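The idea of pasting a small m x m mask back onto the image can be sketched as follows. Note this is a simplified illustration using nearest-neighbor resizing and an invented `paste_mask` helper; Mask R-CNN itself uses bilinear interpolation (RoIAlign) for this step:

```python
import numpy as np

def paste_mask(mask, box, image_shape, threshold=0.5):
    """Resize an m x m soft mask to its bounding box (nearest-neighbor)
    and paste it into a full-size binary image."""
    x1, y1, x2, y2 = box
    bh, bw = y2 - y1, x2 - x1
    m = mask.shape[0]
    # map each pixel of the box back to a cell of the m x m grid
    rows = np.arange(bh) * m // bh
    cols = np.arange(bw) * m // bw
    resized = mask[np.ix_(rows, cols)] > threshold
    full = np.zeros(image_shape, dtype=bool)
    full[y1:y2, x1:x2] = resized
    return full

# A 2x2 soft mask pasted into a 4x4 box inside an 8x8 image
soft = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
result = paste_mask(soft, (2, 2, 6, 6), (8, 8))
print(result.sum())  # 8 pixels on: each confident cell covers a 2x2 block
```

The thresholding step is what turns the network's soft per-cell confidences into the crisp object silhouette you see in segmentation output.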

Advances have come so fast that this is now available for use in video as well. Although this is immediately recognizable as a technology for autonomous vehicles, it can also be used by stores to identify items on shelves, or by warehouses to analyze and optimize utilized space. An example that we have created is shown below:

What the Future Holds: Endless Variations

There are many applications of Visual Search yet to be explored. While only a few years ago people thought self-driving cars were impossible, we are closer than ever to this reality. The ability to identify objects in real time is advanced enough that we can begin to imagine this technology guiding public buses along fixed routes at pre-determined safe speeds. With this, insurance companies will be called upon to assess the safety of these new algorithms, opening up jobs where none existed before.

In medicine, much of the hype and attention has centered on detecting anomalies in tissue. Visual Search will also allow doctors to find new patterns in the electrical activity of brain neurons, and to examine minor variations in the gait and movement of athletes to inform optimal performance. Data-collecting products like Sony’s Hawk-Eye will begin to be harnessed by more complex machine learning analytics.

More prosaically, Visual Search has been harnessed to deliver more accurate discovery of objects in stores. The variety and variation of products in many online shops is staggering: Etsy alone is said to have over 60 million objects for sale. Finding what you want on this type of platform can be difficult, as users may not communicate well in English, rendering keyword search useless.

Descriptions play a crucial role in the sale of a product (which we will discuss in a future blog), but the journey to the point where you can describe your product is primarily visual, and improving that journey helps customer engagement. As customers and buyers depend heavily on their phones and spend more of their time scrolling through pictures, the need to connect your product to what a customer is viewing is critical.

Showing similar but not identical products is one way advertising will improve, as algorithms learn to match what is best in your catalog with what customers are interested in.

Get in touch with our team today to understand how you can utilize Visual Search within your business. 
