It’s a given that internet companies gather titbits of our private lives in exchange for free services, but how much do we really know about what happens to our personal data?
Researchers at Columbia University have warned it is a mistake to gloss over the details we reveal online and describe the web as an “opaque black box” leveraging our personal info without our knowledge or control.
Sensitive information, including locations, search histories, emails, posts and photos, is constantly being collected, analysed and used by web services including Google, Amazon and Facebook to target us with adverts, prices and products.
XRay - described as the "first fine-grained, robust, and scalable personal data tracking system for the web" - can reveal which data in a web account, such as emails, searches, or viewed products, is being used to target which outputs, such as ads, recommended products, or prices.
Its developers say the prototype system is a reverse-engineering machine that uses black-box correlation of data inputs and outputs to detect data use. It models the correlations made by web services and, they say, will increase awareness of how your data is being used, as well as give researchers and investigators tools to keep that use under scrutiny.
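To give a feel for what black-box correlation means in practice, here is a minimal Python sketch. It is not XRay's actual algorithm, only a toy illustration under simple assumptions: a handful of test accounts are each filled with a different mix of inputs (email topics, say), the outputs (adverts) shown to each account are recorded, and an advert is attributed to the input whose presence best matches where the advert appears. The function name `attribute_output`, the `min_score` threshold and the sample accounts are all hypothetical.

```python
from typing import FrozenSet, List, Optional, Tuple

# One observation per hypothetical test account:
# (inputs placed in the account, outputs the account was shown).
Observation = Tuple[FrozenSet[str], FrozenSet[str]]


def attribute_output(observations: List[Observation], output: str,
                     min_score: float = 0.8) -> Optional[str]:
    """Return the input whose presence best matches where `output` appears.

    The score for an input is the share of accounts containing it that saw
    the output, minus the share of accounts lacking it that saw the output.
    Returns None when no input reaches `min_score` (an assumed threshold).
    """
    inputs = set().union(*(ins for ins, _ in observations))
    best_input, best_score = None, 0.0
    for candidate in sorted(inputs):
        with_input = [outs for ins, outs in observations if candidate in ins]
        without_input = [outs for ins, outs in observations if candidate not in ins]
        if not with_input or not without_input:
            continue  # nothing to compare against for this input
        seen_with = sum(output in outs for outs in with_input) / len(with_input)
        seen_without = sum(output in outs for outs in without_input) / len(without_input)
        score = seen_with - seen_without
        if score > best_score:
            best_input, best_score = candidate, score
    return best_input if best_score >= min_score else None


# Made-up data for illustration only.
accounts: List[Observation] = [
    (frozenset({"pregnancy", "travel"}),
     frozenset({"baby shower invitations", "cheap flights"})),
    (frozenset({"pregnancy", "cooking"}), frozenset({"baby shower invitations"})),
    (frozenset({"travel", "cooking"}), frozenset({"cheap flights"})),
    (frozenset({"cooking"}), frozenset()),
]

print(attribute_output(accounts, "baby shower invitations"))  # pregnancy
print(attribute_output(accounts, "cheap flights"))            # travel
print(attribute_output(accounts, "car insurance"))            # None
```

The point of the sketch is that nothing inside the web service needs to be visible: the attribution comes purely from comparing what went into each account with what came out. A real system would have to cope with noisy, changing adverts and far more inputs than this toy version handles.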
XRay has been made available under an open-source licence and works with Gmail, Amazon and YouTube. In testing, XRay was used to look at the kinds of adverts shown to Gmail users based on the content of their email messages; the product recommendations Amazon shows customers based on their wish lists; and the video suggestions made by YouTube based on the videos people have viewed previously.
Roxana Geambasu, assistant professor of computer science at Columbia Engineering and co-developer of XRay, said: "If we leave it unchecked, big data’s exciting potential could become a breeding ground for data abuses, privacy vulnerabilities, and unfair or deceptive business practices. We see XRay as an important first step in exposing how websites are using your personal data."
The team describe how it is possible to target sensitive topics in an internet user's inbox with tailored adverts - citing an example of how someone who sends a pregnancy-related email is then strongly targeted by adverts related to baby shower invitations, maternity products and general-purpose clothing.
Personal data being used to tailor adverts is nothing new and perhaps you don't mind web companies knowing all about your private life. But for web users who care about which piece of their data has been targeted and when, XRay looks like a promising tool to watch.