
Unlocking the potential of smart cameras with deep learning


An object in motion looks fundamentally different from an object at rest — especially to a computer. To get a better idea of this concept, let’s imagine a film strip of a sprinter running: The person and pose in one frame look drastically different from the next frame, right?

Making sense of dynamic objects is taking on new importance as cities begin incorporating IoT devices like smart cameras to streamline municipal life. The town of Yuma, Arizona, is a great example of this. The city recently installed cameras on streetlights that can detect when cars, bicycles, and pedestrians travel through intersections, and it uses that data to optimise signal switching.

Athena Security is pioneering another interesting application of moving-video analysis: The company sells software that uses artificial intelligence to detect when people are fighting, fleeing, or lurking to determine whether crimes are being committed (or are imminent). Unsurprisingly, everyone from municipal police departments to Fortune 500 companies is interested in this AI application.

The applications are endless for IoT devices like smart cameras that analyse moving video. Fortunately, this technology has now reached a point where almost anything is possible.

Solving mysteries in moving video

Using computers to analyse video isn’t exactly a new concept. However, there’s one problem hampering the development of video analysis: Moving video is full of dynamic variables that can confuse even the smartest computers.

Objects look completely different in low light compared to bright light, for instance, which can lead to false analyses. Perspective offers up another challenge: Think about how different a car looks when it’s travelling parallel and then perpendicular to a relative point.

Other issues that might be confusing for a machine’s analysis of video include moving shadows, complex backgrounds, obscured objects, unexpected movements, and a camera’s technical limitations. For all these reasons, moving-video analysis has historically had a lot of potential — but not too many practical applications.

That’s all changing with advances in deep learning systems, which are built on layered neural networks. Today, computing has advanced to a point where systems can learn from past data to get better at understanding future data.

The ability to learn and adapt is crucial for computers that need to make sense of the ever-changing data coming from moving video — and different combinations of neural networks could provide a solution. With convolutional neural networks, for example, computers model space in three dimensions to better predict the trajectory of objects within that space.
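
The operation at the heart of a convolutional network can be sketched in a few lines of plain Python: slide a small filter (kernel) across an image and record how strongly each patch responds. In a real CNN the kernel values are learned from data; the hand-picked vertical-edge detector and toy frame below are purely illustrative.

```python
# Minimal sketch of a 2D convolution, the core building block of a CNN.
# The kernel here is a hand-picked vertical-edge detector, not a learned one.

def conv2d(image, kernel):
    """Valid 2D convolution (no padding, stride 1) on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map

# A 4x4 toy frame with a bright vertical edge down the middle.
frame = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Sobel-style kernel that responds strongly to vertical edges.
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

print(conv2d(frame, edge_kernel))  # every output cell straddles the edge
```

Stacking many such filters, and letting training choose their values, is what lets a CNN build up the spatial features needed to track an object through a scene.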

Deep neural networks can help cancel out background images so cameras can focus explicitly on moving objects. There are also recurrent neural networks that excel at pattern recognition. Each of these networks has strengths and weaknesses, but using them in the right combination makes moving-video analysis highly accurate in almost any setting.
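
The classic, pre-deep-learning version of background cancellation is frame differencing: compare each frame against a background model and keep only the pixels that changed. Neural networks do this far more robustly, but the principle is the same. The pixel values below are invented for illustration.

```python
# Simple background subtraction by frame differencing: mark every pixel
# that differs from the background model by more than a threshold.

def moving_mask(background, frame, threshold=30):
    """Return a binary mask marking pixels that changed significantly."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]

background = [
    [10, 10, 10],
    [10, 10, 10],
    [10, 10, 10],
]

# Same scene, but a bright object has entered the top-left corner.
frame = [
    [200, 200, 10],
    [200, 200, 10],
    [10,  10,  10],
]

print(moving_mask(background, frame))  # 1s mark the moving object
```

Everything static cancels to zero, so downstream analysis only has to reason about the regions that are actually moving.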

Connectivity and the future of smart cameras

My company recently worked on a project that demonstrates how far connected devices like smart cameras have come, as well as the challenges they still face. For this particular project, a client in Israel asked us to develop a program to detect kicking motions in live soccer matches televised at 20 frames per second.

From the start, the project faced two obstacles: First, our team had to distinguish between an actual “kick” and a swinging leg motion that looked quite similar. Second, we needed to do that at 20 frames per second. That’s a higher frame rate than most surveillance footage, and it packs in much more data to analyse.

Initially, we tried developing an algorithm that would create two “bounding boxes” around both a player’s foot and the soccer ball, and then register when those two boxes met. In practice, however, detecting kicks became extremely inaccurate when players were clumped together (and we know that happens a lot in soccer).
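
The bounding-box idea reduces to a simple geometric test: represent the foot and the ball as axis-aligned boxes and register a candidate kick whenever the boxes intersect. The `(x1, y1, x2, y2)` format and the toy coordinates below are assumptions for illustration, not the project's actual code.

```python
# Candidate-kick test from the bounding-box approach: flag any frame
# where the foot box and the ball box overlap.

def boxes_overlap(a, b):
    """True if two axis-aligned (x1, y1, x2, y2) boxes intersect."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

foot = (100, 200, 130, 240)
ball_far = (300, 210, 330, 240)   # ball well away from the foot
ball_near = (120, 215, 150, 245)  # ball touching the foot's box

print(boxes_overlap(foot, ball_far))   # no contact
print(boxes_overlap(foot, ball_near))  # candidate kick
```

The weakness is easy to see: when several players clump together, many foot boxes overlap the ball box at once, so overlap alone produces the false positives described above.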

The solution? Tweak the deep learning element. We adjusted how the underlying neural networks were configured so we could accelerate object detection. Then, we created a data set using 500 frames taken from 20 seconds of a soccer match. Our team manually annotated this data to identify kicks and “non-kicks,” and we used it to “teach” our algorithms to make that distinction.
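
The article doesn't describe the model itself, so the following is only a toy illustration of the workflow: hand-label frames as kick or non-kick, reduce each frame to a feature, and fit a decision rule on the labels. Here each frame is collapsed to a single made-up feature (say, foot-to-ball distance in pixels) and the "model" just learns a threshold, a stand-in for the real neural network.

```python
# Toy supervised workflow: annotated frames in, decision rule out.
# The single feature and its values are invented for illustration.

def fit_threshold(samples):
    """Midpoint between the mean kick and mean non-kick feature values."""
    kicks = [x for x, label in samples if label == "kick"]
    others = [x for x, label in samples if label == "non-kick"]
    return (sum(kicks) / len(kicks) + sum(others) / len(others)) / 2

def predict(threshold, x):
    """Classify a frame by its (hypothetical) foot-to-ball distance."""
    return "kick" if x < threshold else "non-kick"

# Manually annotated training data: (foot-to-ball distance, label).
training = [
    (4, "kick"), (6, "kick"), (8, "kick"),
    (40, "non-kick"), (55, "non-kick"), (70, "non-kick"),
]

threshold = fit_threshold(training)
print(predict(threshold, 5))
print(predict(threshold, 60))
```

Swap the one-number feature for learned CNN features and the threshold for a trained network, and this is the same teach-by-annotation loop the text describes.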

Our program eventually identified 58 per cent of real kicks, and we could improve that figure by feeding the program data from more matches and different sets of players.
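
"Identified 58 per cent of real kicks" is what evaluation metrics call recall. A quick sketch of how such a figure is computed against hand-labelled ground truth; the counts below are invented, not the project's actual numbers.

```python
# Recall: the fraction of real events the detector actually caught.

def recall(true_positives, false_negatives):
    """true_positives: real kicks detected; false_negatives: real kicks missed."""
    return true_positives / (true_positives + false_negatives)

# e.g. 29 kicks detected out of 50 real kicks in the test footage
print(round(recall(29, 21), 2))  # 0.58
```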

Even so, the project counted as a success, because it proved that, with the right configuration, deep learning can make sense of all the complexities within moving video on a connected device. While achieving these ends might take a ton of reference data, the technology has proved its usefulness and has finally made moving-video analysis a reality.

This kind of technology can be applied in many areas, from IoT surveillance systems to self-driving cars. And one thing is certain: it won’t be long before moving-video analysis begins transforming our lives.

Dennis Turpitka, founder and CEO, Apriorit

Dennis Turpitka is founder and CEO of Apriorit, a software development company that provides engineering services globally to tech companies, including Fortune 500 tech giants.