
Understanding and mitigating AI and ML risks in common household IoT devices

(Image credit: Shutterstock / everything possible)

The irony is not lost on anyone that smart home security cameras and video doorbells can be easily hacked. Few who watched the footage of an invisible intruder telling a child, alone in her room, that he was Santa Claus will forget the strange conversation, the music he played, or his urging her to mess up her room and break the TV. That is just one of many terrifying scenarios that have played out in the years since Internet of Things (IoT) home devices and appliances were introduced to consumers. As new technologies emerge, safety often takes a back seat to speed-to-market timelines and budgets that rarely prioritize hardening security during the development cycle. But weak security carries a much bigger price tag, and it is usually the customer who pays it before the enterprise feels the impact.

Recent research examined a smart doorbell with a camera that is controlled through a companion app. As soon as someone rings the doorbell, the app sends a notification along with a photo or, in some cases, a video. The user can talk to the visitor in real time or via a pre-recorded message, and the doorbell can also connect to common home-control hubs. Security issues surfaced from the very beginning. At first connection, the device checks for firmware updates; despite the use of SSL, it was easy to mount a man-in-the-middle attack. That attack gave direct access to the firmware inside the device, which carried no digital signatures or encryption, making it easy to install modifications that granted full access to the device within a few days.
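The missing control here is firmware signature verification. As a hedged sketch (the vendor's update mechanism is not reproduced; the file names and the use of an RSA/SHA-256 detached signature are assumptions), a device could refuse to install any image that does not verify against a public key baked into the bootloader:

```python
# Minimal sketch of firmware signature verification, assuming the vendor
# ships a detached RSA/SHA-256 signature alongside each firmware image.
# File names and key provenance are illustrative.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

with open("vendor_pubkey.pem", "rb") as f:          # hypothetical key baked into the bootloader
    vendor_key = serialization.load_pem_public_key(f.read())

def verify_firmware(image: bytes, signature: bytes) -> bool:
    """Return True only if the image was signed by the vendor's private key."""
    try:
        vendor_key.verify(signature, image, padding.PKCS1v15(), hashes.SHA256())
        return True
    except InvalidSignature:
        return False

# Only flash images that pass verification; anything modified in transit
# (for example by a man-in-the-middle) fails the check.
```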

Even though the device architecture is cloud-based, the AI features, including video, are stored and applied on the device, which has 4GB of storage. Recordings are encrypted with a 16-byte (128-bit) AES key generated during initialization, and that AES key is in turn encrypted with a 1024-bit RSA public key. Neither of the keys is stored on the device. However, when we reversed the algorithm used during AES key initialization, we found a vulnerability: srand()/rand() implements a deterministic pseudo-random number generator (PRNG), which returns identical sequences of values when initialized with the same seed. The seed, time(0), is easily guessable because it returns the current time, which is almost the same time that appears in file names and frame timestamps. All that was needed was to brute-force candidate recording times, initialize the PRNG with each value, generate an AES key, and try to decrypt the data to see whether something meaningful appeared.
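For illustration, here is a minimal Python sketch of that brute-force loop. The device's actual rand() implementation, ciphertext layout, and file format are not reproduced; a simple LCG-style generator, a prepended IV with AES-CBC, and an MP4 "ftyp" check stand in for them as assumptions.

```python
# Sketch only: brute-forcing a time-seeded PRNG-derived AES key.
# The LCG below stands in for the device's rand(); the AES-CBC layout and
# the MP4 plausibility check are assumptions for the example.
from Crypto.Cipher import AES  # pycryptodome

def lcg_bytes(seed: int, n: int) -> bytes:
    """Deterministic PRNG: the same seed always yields the same byte sequence."""
    state = seed
    out = bytearray()
    while len(out) < n:
        state = (state * 1103515245 + 12345) & 0x7FFFFFFF
        out.append(state & 0xFF)
    return bytes(out)

def looks_like_video(plaintext: bytes) -> bool:
    # Assumption: recordings are MP4 fragments, so look for the 'ftyp' box.
    return b"ftyp" in plaintext[:64]

def brute_force_key(ciphertext: bytes, guessed_time: int, window: int = 3600):
    """Try every seed within +/- window seconds of the guessed recording time."""
    iv, body = ciphertext[:16], ciphertext[16:]   # assumed layout: IV prepended,
                                                  # body a multiple of 16 bytes
    for seed in range(guessed_time - window, guessed_time + window):
        key = lcg_bytes(seed, 16)                 # candidate 16-byte AES-128 key
        candidate = AES.new(key, AES.MODE_CBC, iv).decrypt(body)
        if looks_like_video(candidate):
            return seed, key                      # seed recovered, data decrypted
    return None
```

Because the seed space is only a few thousand values around the timestamp already visible in the file name, the search completes almost instantly.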

Artificial intelligence and machine learning use cases increase along with risk

Despite these concerns, artificial intelligence (AI) and machine learning (ML) technologies are being applied to a wide range of business problems, from managing and automating IT infrastructure to gathering new customer information, identifying and responding to cyber threats, supporting medical decisions and improving hiring processes. Increasingly integral to the business processes around us, AI-based systems usually combine open-source libraries (scikit-learn, TensorFlow, PyTorch) with proprietary code written by developers who lack security engineering expertise. In addition, there are no industry-standard best practices for writing secure AI algorithms. Security experts and data science experts are both in short supply, and someone who is expert in both is exceedingly rare. The result is that security is rarely planned into a new product's design, which leads to extensive costs when it has to be retrofitted after release. And even when the software architecture is sound, algorithms trained on data are never 100 percent accurate.

What does an error mean for a machine learning algorithm? It is not the typical software engineering problem. If an application crashes, it exits, and it cannot keep performing other tasks. An AI system, on the other hand, can make an incorrect decision in the course of its operation and then simply continue running, which makes the error difficult to catch immediately. For example, a credit-scoring model might approve a loan for a client who is not creditworthy, and the mistake is often not discovered until long after it was made.
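The contrast is easy to show in miniature. In the sketch below, the names are hypothetical and score_model stands for any scikit-learn-style classifier; the point is only that a traditional error interrupts execution while a misclassification does not.

```python
# Illustration only: a crash announces itself, a misclassification does not.
def withdraw(balance: float, amount: float) -> float:
    if amount > balance:
        raise ValueError("insufficient funds")   # traditional error: loud, stops the flow
    return balance - amount

def approve_loan(score_model, applicant_features) -> bool:
    # A mis-trained model simply returns the wrong answer; execution continues
    # and nothing in the control flow signals that anything went wrong.
    return bool(score_model.predict([applicant_features])[0])
```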

False positives and negatives

False positives and false negatives are considered "misclassification" errors within an AI algorithm. There are many real-world examples of the impact of misclassification, particularly with facial recognition systems, which in one case matched a perpetrator with an innocent person. Testing of machine learning models usually occurs within a static environment, with accuracy dependent on the amount of data provided when the model was initially trained. There is an expectation that these models cover all of the real-world situations they will encounter. But do they really?
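As a small illustration of the terminology, scikit-learn's confusion matrix makes the two error types explicit for a binary face-match classifier. The labels and numbers below are invented for the example.

```python
# Counting false positives and false negatives for a binary "match / no match"
# classifier (1 = match, 0 = no match); data is illustrative.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 0, 1, 0, 0]   # ground truth
y_pred = [0, 1, 1, 0, 0, 1, 0, 1]   # model output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"false positives: {fp}  (an innocent person flagged as a match)")
print(f"false negatives: {fn}  (a real match missed)")
```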

Attackers have an interest in causing an AI model to make incorrect decisions, so their goal is to find as many vectors as possible that produce wrong results. Ideally, quality assurance handles, or at least minimizes, AI miscalculations, but AI models are exceedingly hard to test. The data distribution often changes over time, forcing machine learning developers to build a continuous updating process, and under those conditions it is even easier to make mistakes.
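One rough way to notice that live data has drifted away from the training distribution is a per-feature two-sample statistical test against a held-out slice of the training set. The sketch below uses a Kolmogorov-Smirnov test; the significance threshold and feature handling are illustrative assumptions, not a complete drift-monitoring scheme.

```python
# Sketch: flag features whose live distribution no longer matches training data.
import numpy as np
from scipy.stats import ks_2samp

def drifted_features(train: np.ndarray, live: np.ndarray, alpha: float = 0.01):
    """Return indices of columns whose live distribution differs from training."""
    flagged = []
    for col in range(train.shape[1]):
        result = ks_2samp(train[:, col], live[:, col])
        if result.pvalue < alpha:        # small p-value: distributions likely differ
            flagged.append(col)
    return flagged

# Columns returned here are candidates for retraining or closer review.
```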

For example, it is possible to fool biometric systems. A user's biometric parameters should be updated gradually to reflect minor changes in appearance, such as aging; this is a natural process that a well-designed biometric system must take into account. However, that same feedback loop benefits an attacker who can feed arbitrary data back into the system: by subtly influencing the learning process, the attacker can eventually teach the model to accept his or her own appearance.
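The toy sketch below shows the mechanism in miniature: a template that is updated from every accepted sample (here a simple moving-average rule with invented thresholds and random embeddings) can be walked, submission by submission, toward an attacker's embedding. It is a conceptual model, not any particular vendor's matcher.

```python
# Toy model of biometric template drift under attacker-controlled feedback.
import numpy as np

THRESHOLD = 0.8        # cosine similarity required for acceptance (illustrative)
UPDATE_RATE = 0.1      # how strongly each accepted sample shifts the template

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def try_authenticate(template, sample):
    """Accepts the sample and, crucially, updates the template from it."""
    if cosine(template, sample) < THRESHOLD:
        return template, False
    template = (1 - UPDATE_RATE) * template + UPDATE_RATE * sample
    return template / np.linalg.norm(template), True

rng = np.random.default_rng(0)
owner = rng.normal(size=128);    owner /= np.linalg.norm(owner)
attacker = rng.normal(size=128); attacker /= np.linalg.norm(attacker)

template = owner.copy()          # enrolled template starts as the owner's face
for _ in range(500):
    # Attacker presents a blend just similar enough to be accepted,
    # nudged a little further toward their own embedding each time.
    probe = template + 0.3 * (attacker - template)
    probe /= np.linalg.norm(probe)
    template, _accepted = try_authenticate(template, probe)

print("attacker now accepted:", cosine(template, attacker) >= THRESHOLD)
```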

Using cryptography correctly is key for security

Smart doorbells now offer features like facial recognition. To support this, developers had to extend the doorbell hardware to handle much more resource-intensive software. These doorbells now ship with an SD card and sometimes a dedicated smart hub that connects to the owner's internal Wi-Fi so that the AI features can be processed on premises rather than in the cloud. Even when the doorbell's ML/AI features are not in use, it saves all of the user's videos and snapshots to its SD card and uploads them to the cloud on a regular basis, where the snapshots are accessible to the smartphone application. Every 15 minutes, the doorbell connects to the API server, asks for an upload key and URL, and pushes a fresh camera snapshot to the returned URL.
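In outline, that periodic upload loop looks something like the sketch below. The endpoint paths, field names, and authentication scheme are assumptions made for illustration; the vendor's actual API is not reproduced here.

```python
# Sketch of the periodic snapshot upload flow; all names and endpoints are hypothetical.
import time
import requests

API_SERVER = "https://api.example-doorbell.com"   # hypothetical vendor API

def capture_snapshot() -> bytes:
    """Stand-in for the camera capture; returns JPEG bytes on a real device."""
    return b"\xff\xd8\xff\xe0" + b"\x00" * 64

def upload_snapshot(device_id: str, snapshot: bytes) -> None:
    # 1. Ask the API server for a one-time upload key and target URL.
    resp = requests.post(f"{API_SERVER}/v1/upload-token", json={"device": device_id})
    resp.raise_for_status()
    token = resp.json()            # assumed shape: {"key": ..., "url": ...}

    # 2. Push the fresh camera snapshot to the returned URL.
    requests.put(
        token["url"],
        data=snapshot,
        headers={"Authorization": f"Bearer {token['key']}",
                 "Content-Type": "image/jpeg"},
    ).raise_for_status()

while True:                        # on the device, this loop runs every 15 minutes
    upload_snapshot("doorbell-01", capture_snapshot())
    time.sleep(15 * 60)
```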

While the vendor in this case thought about privacy and security, choosing modern cryptography with AES-128 and other military-grade standards, there is more to security than picking algorithms. It is essential to use cryptography in connected home devices, but it is vitally important to use it correctly. A few archetypal examples of getting it wrong were documented eight years ago and are still relevant today. When cryptography is implemented incorrectly, hackers can decrypt data in a matter of seconds and pry into the private life of the device's owner. When it is done right, it is the best defense against hackers, and the best way to keep homeowners safe inside.
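To close the loop on the earlier recording-encryption flaw, here is a hedged sketch of the same job done conservatively: the key comes from the operating system's CSPRNG rather than a time-seeded rand(), and an authenticated mode (AES-GCM) protects both confidentiality and integrity. Key wrapping and storage are out of scope for this example.

```python
# Sketch: encrypting device recordings with a properly generated key and AES-GCM.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)   # CSPRNG-backed; not guessable from a timestamp
aesgcm = AESGCM(key)

def encrypt_recording(plaintext: bytes, recording_id: bytes) -> bytes:
    nonce = os.urandom(12)                  # unique nonce per recording
    return nonce + aesgcm.encrypt(nonce, plaintext, recording_id)

def decrypt_recording(blob: bytes, recording_id: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return aesgcm.decrypt(nonce, ciphertext, recording_id)  # raises if data was tampered with
```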

Timur Yunusov, Head of Offensive Security Research, Cyber R&D Lab