The art, science & commerce of image search

(Image credit: Flickr / Hugo Chinchilla)

One of the biggest challenges I’ve faced since starting Shutterstock has been building a robust search mechanism that can scale: a system that allows a customer to search through millions of images quickly and find exactly what they are looking for.

Shutterstock’s image library of over 225 million images is now catalogued in a way that ensures every image a contributor uploads is given a title and tagged with words that describe the image. However, language is imperfect when describing images.

Even 15 years ago, I knew that the most accurate way to discover images would be if a computer could ‘see’. If it could learn what a mountain or a cat is, then it could search at light-speed through our collection to find the perfect image to meet the customer’s requirements.

Image search tools need to tackle this exact problem. This piece will outline key points on deep learning, the problem with language and how we taught computers to see – something I recently spoke about at the AI Summit in New York.

What is deep learning?

As a subset of AI, deep learning enables computers to learn without being explicitly programmed. The purpose of deep learning is to create a function that is designed and trained to predict something, and that helps minimise cost and maximise value. By giving a machine millions of examples of questions and answers, it can produce a model that enables it to answer a question on its own, even if it has never seen that question before. It learns to answer questions that would be nearly impossible for a human to answer.
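
To make this concrete, here is a toy sketch – not Shutterstock’s actual models – of a tiny neural network trained on synthetic question-and-answer pairs (points in the plane and their labels), where the ‘cost’ being minimised is simply the prediction error. The data, network size and learning rate are all illustrative assumptions.

    # Toy sketch only: train a tiny network on synthetic "question -> answer"
    # pairs. Training repeatedly nudges the weights to reduce a cost (the error
    # between the network's answers and the true answers).
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic examples: points above the line y = x are labelled 1, below 0.
    X = rng.uniform(-1, 1, size=(1000, 2))
    y = (X[:, 1] > X[:, 0]).astype(float).reshape(-1, 1)

    # One hidden layer of 8 units, sigmoid output.
    W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros((1, 8))
    W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros((1, 1))

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 0.5
    for step in range(2000):
        # Forward pass: the network's current answer for every example.
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)

        # Backward pass: gradients of the mean squared error cost.
        dp = 2 * (p - y) / len(X)
        dz2 = dp * p * (1 - p)
        dW2, db2 = h.T @ dz2, dz2.sum(axis=0, keepdims=True)
        dz1 = (dz2 @ W2.T) * (1 - h ** 2)
        dW1, db1 = X.T @ dz1, dz1.sum(axis=0, keepdims=True)

        # Gradient descent: adjust every weight to lower the cost.
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    # Ask a question the network has never seen; the answer should be close to 1.
    print(sigmoid(np.tanh(np.array([[0.2, 0.9]]) @ W1 + b1) @ W2 + b2))

The same idea scales up: more examples, more layers and richer inputs (pixels instead of two numbers), but the loop of predict, measure the cost, adjust stays the same.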

Deep learning has taken the industry by storm, changing many aspects of our lives and dominating the tech landscape.

In a Wall Street Journal article, Eric Bellman argues that ‘the next billion’ internet users, rather than typing searches and emails, will largely avoid text, using voice activation and communicating with images instead. Deep learning is at the forefront of the technologies that are helping these communications become reality.

At Shutterstock, our deep learning and computer vision models have allowed us to solve some of these traditional search problems.

The problem with language

At one level, the subjective interpretation of an image is a challenge. How one person describes a football will differ from how the person sitting next to them describes it. More keywords do not necessarily improve accuracy. In fact, about 65 per cent of searches on Shutterstock use just one or two keywords. This does not give us a lot to work from when finding the perfect image the customer is looking for. Images may come up that are relevant to the keywords but don’t actually look like the image the customer wants.

Keywording is also not as accurate as we would like. Sometimes it comes down to pure translation – we have contributors from all over the world who need to keyword their images in English, which is often their first barrier. Alternatively, a contributor may tag their images with words they know will help the image sell, rather than words that are relevant to the image.

Words can also have multiple meanings. If you search for a ‘jaguar’, you will receive images of both cats and cars. This is a difficult problem that requires a deep understanding of what the customer is actually looking for.

When a computer can ‘see’

At Shutterstock, we have developed tools using computer vision and deep learning to make the process of searching for an image as easy and accurate as possible. Some of our tools include:

  • Reverse Image Search was built using computer vision that analyses pixels to identify image content and find similar images. When a customer uploads an image that fits the criteria listed, the system returns a list of visually similar images. This technology is not based on keywords or metatags; it “sees” the image and dynamically searches for others that are similar (a simplified sketch of this kind of visual similarity search appears after this list).
  • Reveal is a Google Chrome extension that allows users to select any image online and find a similar photo, vector or illustration within our collection of more than 225 million licensable, ready-to-use, high-quality images.
  • Refine is a tool that allows users to select, from the first page of search results, the images most similar to what they are looking for; our technology then surfaces other images that share a similar style and other commonalities with the selected images.
  • If you are a marketeer, you will often be looking for images that provide copy space. However, finding the perfect image can be a time-consuming task. Our tool, Copy Space, allows customers to search specifically for images with copy space wherever they may need it. The tool does this by measuring the amount of activity in each square of the image and only returning results whose chosen squares fall below a certain level of activity (see the copy-space sketch after this list).
  • Composition Aware Search allows customers to break down what they are looking for into a set of anchors. Each anchor specifies a query as well as a position. For example, if you were to search for an image of a lamp and a chair, Composition Aware Search lets you further define the layout – whether you would like the chair on the left or the right, or the lamp above the chair. Each time you change the positioning of the objects, the set of images updates (an illustrative matching sketch follows the list).
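
How a pixel-based similarity search can work, in a much-simplified sketch (illustrative only, not Shutterstock’s production system): each image is reduced to a numeric ‘fingerprint’ – here a crude colour histogram standing in for a learned embedding – and similar images are those whose fingerprints are closest.

    # Illustrative only: pixel-based similarity using colour histograms as a
    # stand-in for learned image embeddings.
    import numpy as np

    def embed(image):
        """Turn an RGB image (H x W x 3, values 0-255) into a unit feature vector."""
        hist = [np.histogram(image[..., c], bins=8, range=(0, 256))[0] for c in range(3)]
        v = np.concatenate(hist).astype(float)
        return v / np.linalg.norm(v)

    def most_similar(query, library, top_k=3):
        """Indices of the library images most visually similar to the query."""
        q = embed(query)
        scores = [q @ embed(img) for img in library]   # cosine similarity
        return np.argsort(scores)[::-1][:top_k]

    # Synthetic stand-ins for real photos: each has its own dominant colour,
    # and the query is a lightly perturbed copy of library image 42.
    rng = np.random.default_rng(1)
    def fake_photo():
        base = rng.integers(0, 256, size=3)
        return np.clip(base + rng.integers(-30, 31, size=(64, 64, 3)), 0, 255)

    library = [fake_photo() for _ in range(100)]
    query = np.clip(library[42] + rng.integers(-5, 6, size=(64, 64, 3)), 0, 255)
    print(most_similar(query, library))   # image 42 should come back first

In a production system of this kind, the histogram would be replaced by an embedding produced by a deep network, and the comparison would run against an index built over the whole collection rather than a Python loop.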
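
The copy-space idea can be sketched roughly as follows. The grid size, the choice of gradient magnitude as the ‘activity’ measure and the threshold are assumptions for illustration, not Shutterstock’s actual parameters.

    # Illustrative sketch of copy-space detection: score how "busy" each grid
    # cell is, and accept an image only if the requested cells are calm.
    import numpy as np

    def activity_grid(gray, rows=3, cols=3):
        """Mean gradient magnitude ('activity') for each cell of a rows x cols grid."""
        gy, gx = np.gradient(gray.astype(float))
        energy = np.hypot(gx, gy)
        h, w = gray.shape
        grid = np.zeros((rows, cols))
        for r in range(rows):
            for c in range(cols):
                cell = energy[r * h // rows:(r + 1) * h // rows,
                              c * w // cols:(c + 1) * w // cols]
                grid[r, c] = cell.mean()
        return grid

    def has_copy_space(gray, cells, threshold=5.0):
        """True if every requested (row, col) cell is below the activity threshold."""
        grid = activity_grid(gray)
        return all(grid[r, c] < threshold for r, c in cells)

    # Synthetic example: a busy image whose right-hand third is left blank.
    rng = np.random.default_rng(2)
    img = rng.integers(0, 256, size=(300, 300)).astype(float)
    img[:, 200:] = 230.0                    # flat, quiet region on the right

    print(has_copy_space(img, cells=[(0, 2), (1, 2), (2, 2)]))   # True
    print(has_copy_space(img, cells=[(0, 0)]))                   # False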
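
Composition Aware Search can be thought of as matching anchors – each a query term plus a position – against the objects found in an image. The sketch below is an illustrative scoring scheme that assumes object labels and bounding-box centres are already available; the file names and detections are hypothetical.

    # Illustrative sketch: rank images by how well their detected objects match
    # the customer's anchors (query term + requested position, both in [0, 1]).
    from dataclasses import dataclass
    from math import hypot

    @dataclass
    class Anchor:
        query: str   # e.g. "chair"
        x: float     # requested horizontal position, 0 = left, 1 = right
        y: float     # requested vertical position, 0 = top, 1 = bottom

    # Hypothetical pre-computed detections: label -> centre of its bounding box.
    images = {
        "photo_a.jpg": {"chair": (0.2, 0.7), "lamp": (0.8, 0.3)},
        "photo_b.jpg": {"chair": (0.8, 0.6), "lamp": (0.2, 0.2)},
        "photo_c.jpg": {"sofa": (0.5, 0.6)},
    }

    def score(detections, anchors):
        """Higher is better: reward each anchor found near its requested spot."""
        total = 0.0
        for a in anchors:
            if a.query not in detections:
                continue                  # the requested object is missing
            ox, oy = detections[a.query]
            total += 1.0 - min(1.0, hypot(ox - a.x, oy - a.y))
        return total

    # "Chair on the left, lamp on the right": photo_a should rank first.
    anchors = [Anchor("chair", 0.2, 0.6), Anchor("lamp", 0.8, 0.4)]
    ranked = sorted(images, key=lambda n: score(images[n], anchors), reverse=True)
    print(ranked)

Moving an anchor changes the distances, which changes the scores and therefore the ordering – mirroring how the result set updates each time the customer repositions an object.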

At Shutterstock, we have been dedicated to providing our customers with smart, easy-to-use tools and technology that are seamlessly integrated into their daily workflow. We continue to invest in building an innovative platform for our users around the world, and in computer vision research and machine learning to improve the customer experience. It's exciting to be able to let our customers immerse themselves in the AI technology we have been working on. I look forward to enhancing our AI capabilities and innovating further within the tech sector.

Jon Oringer, founder of Shutterstock