
How do you interact with the world? If you were a bird, a natural tetrachromat, you could see in four colors: UV, blue, green and red. If you were a snake, you could see heat. As enviable as that might sound, don’t count humans out. Our eyes have an estimated resolution of 576 megapixels, far beyond most animals and much better than your new 64-megapixel camera. This powerful sense drives us to visually scan our environment almost continually. Visual Search is what we do every day, so why do we rely mainly on image search in our online experiences? What is image search, and how does it differ from Visual Search?
Image search is primarily a text- or metadata-based search for an image. If you ask someone to take a yellow coffee mug off the shelf, that counts as image search, because you are using words to direct the search. One problem with this type of search is that linking similar objects can be a laborious and mostly manual process. Tagging an image, by adding descriptions such as “yellow” and “coffee mug” and associating them with it, can consume enormous amounts of time. If you run a store with thousands of images and you want to show all coffee mugs that are yellow, your team will have to tag each image with the required words. If you want coffee filters to show up as well, that brings more tagging and more complication.
From this, it is evident that we are transforming a visual journey into a textual one, and information is lost along the way.
When you enter a search term such as “yellow mug”, your search engine looks at the text surrounding each image and matches it against your query. If the query matches, the corresponding linked image is retrieved. This is what most search engines use. Of course, some search engines have moved to more sophisticated, relationship-driven ranking that also weighs features representing an image, the PageRank of the pages containing it, relevance, the size of the search engine’s indexed database and other factors.
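The keyword-matching step above can be sketched as an inverted index over image tags. This is a minimal, illustrative sketch, not any particular search engine's implementation; the catalog, tags and function names are all made up for the example.

```python
# Keyword-based image search: images are found via the text attached to them
# (tags, captions, surrounding text). The pixels themselves are never examined.

def build_index(catalog):
    """Map each tag word to the set of image ids carrying it."""
    index = {}
    for image_id, tags in catalog.items():
        for word in tags:
            index.setdefault(word.lower(), set()).add(image_id)
    return index

def search(index, query):
    """Return ids of images whose tags contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical store catalog: every tag had to be entered by hand.
catalog = {
    "img_001": ["yellow", "coffee", "mug"],
    "img_002": ["blue", "coffee", "mug"],
    "img_003": ["yellow", "teapot"],
}
index = build_index(catalog)
print(search(index, "yellow mug"))  # {'img_001'}
```

Note that an untagged yellow mug is simply invisible to this search, which is exactly the manual-labeling burden described above.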
Some companies wanted to do better. If you instead held up a yellow card of the same shade of yellow, that is also a type of image search. This is a rough analogy to what the innovative company TinEye does. They reduce an image to a few features and look for those same features elsewhere. This is useful if you want to find the same (not merely similar) image, but it does not let you search and navigate in a fully visual manner; it is an attempt to move beyond text only.
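One classic way to "reduce an image to a few features" is a perceptual hash. TinEye's actual system is proprietary, so the sketch below shows the simplest textbook variant (an average hash) on tiny hand-made pixel grids: each image becomes a short bit string, and near-identical images produce near-identical strings.

```python
# Average hash: 1 where a pixel is brighter than the image's mean, else 0.
# Near-duplicate images (e.g. recompressed copies) hash to nearby bit strings.

def average_hash(pixels):
    """Reduce a grayscale pixel grid (list of rows) to a bit string."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

def hamming_distance(h1, h2):
    """Number of differing bits; small distance suggests the same image."""
    return sum(a != b for a, b in zip(h1, h2))

original = [[200, 200, 10], [200, 10, 10], [10, 10, 10]]
recompressed = [[198, 201, 12], [199, 9, 11], [11, 8, 10]]  # same scene, noisy
different = [[10, 200, 10], [200, 200, 200], [10, 200, 10]]

h_orig = average_hash(original)
print(hamming_distance(h_orig, average_hash(recompressed)))  # 0: a duplicate
print(hamming_distance(h_orig, average_hash(different)))     # larger: not the same image
```

This finds the *same* image under small distortions, but a tall yellow cup and a small yellow cup would hash far apart, which is why this approach alone cannot power true Visual Search.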
Eventually the research took us in a new direction – Visual Search.
Visual Search is different and richer with possibility. Here we use an image to search for another image. If you held up a yellow coffee mug from across the room and motioned for me to bring you the same one, that would be a Visual Search experience. If you then held up a tall yellow cup and a small yellow cup, I could point to my preference. This is a natural and simple way to communicate what you want. You can start with one image and narrow down to the one you wish to see without using text or language.
Image search of this kind uses a type of perceptual hash, but the real breakthrough was made by scientists who wanted to mimic what an eye does and the networks the human brain actually uses. They tried to identify objects within images by finding edges, labeling lines and calculating probabilities using some advanced geometric modelling. It is more complicated in practice, but in summary one can think of it as a pixel-by-pixel study of an image to identify discontinuities: places where the brightness or value of the pixels changes. By finding a group of pixels that belong together, you can draw a box around them. This is called a bounding box, and it is how many models locate objects.
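The pixel-by-pixel idea can be illustrated with a toy example: scan a grayscale grid for sharp brightness jumps (discontinuities), then draw the smallest box enclosing them. Real detectors use learned features rather than a fixed threshold; this sketch only demonstrates the "find discontinuities, group them, box them" intuition, with an invented threshold value.

```python
# Toy edge detection and bounding box on a grayscale pixel grid.

def find_edge_pixels(pixels, threshold=50):
    """Return (row, col) positions where brightness jumps sharply
    relative to the right or lower neighbour."""
    edges = []
    rows, cols = len(pixels), len(pixels[0])
    for r in range(rows):
        for c in range(cols):
            right = abs(pixels[r][c] - pixels[r][c + 1]) if c + 1 < cols else 0
            down = abs(pixels[r][c] - pixels[r + 1][c]) if r + 1 < rows else 0
            if max(right, down) > threshold:
                edges.append((r, c))
    return edges

def bounding_box(points):
    """Smallest box (top, left, bottom, right) enclosing the given pixels."""
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    return (min(rows), min(cols), max(rows), max(cols))

# A dark 5x5 background with a bright 2x2 "object" in the middle.
image = [[10] * 5 for _ in range(5)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 200

box = bounding_box(find_edge_pixels(image))
print(box)  # (1, 1, 3, 3): the box hugs the bright patch
```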
The object in the box then needs to be named; this is where the probabilities come into play. The image in the box is compared against many other images, and if it matches one, it takes the name of the match! There is more to be said, including ditching the boxes and finding the outlines of objects (segmentation), but that can be left for another post.
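The "compare and name" step can be sketched as nearest-neighbour matching: the boxed object is reduced to a feature vector, compared against a gallery of labelled vectors, and inherits the label of its closest match. Real systems use learned embeddings from neural networks; plain Euclidean distance over hand-made features stands in for them here, and all the labels and numbers are invented for illustration.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def name_object(features, gallery):
    """Return the label of the closest gallery entry."""
    return min(gallery, key=lambda label: distance(features, gallery[label]))

# Hypothetical hand-made features: (yellowness, roundness, has_handle).
gallery = {
    "yellow mug": (0.9, 0.7, 1.0),
    "coffee filter": (0.2, 0.9, 0.0),
    "blue teapot": (0.1, 0.6, 1.0),
}
detected = (0.85, 0.65, 1.0)  # features of the object inside the box
print(name_object(detected, gallery))  # "yellow mug"
```

Turning raw distances into match probabilities (for example with a softmax over negative distances) is what lets a model report how confident it is in the name it assigns.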
Pinterest and GrokStyle (acquired by IKEA) are examples of companies that have gone far in turning Visual Search into a new type of experience. By integrating Visual Search into their offerings, they let users experience their apps in a more natural and intuitive manner. A user can shoot a picture with a camera, or choose an image, and shop the visually similar items that come back, thanks to some fantastic technology.
Point, shoot and shop. A simple but intuitive way to navigate your world! This method of engagement is powerful because it draws your customer into the virtual store in a way only possible with this new technology. Your customer may know what they want, but because of the limitations of language they may not be able to find their desired product. By showing visually similar items, the shopper can naturally and iteratively wander and narrow their search to the perfect match. That is what we all want from our online experiences: a frictionless, natural and enjoyable search.
Get in touch with our team today to understand how you can utilize Visual Search within your business.
Want to keep updated on the latest happenings in AI and Programmatic? Subscribe to our updates and stay in the loop.