Just three songs from a playlist can be used to identify its curator from a lineup, a new study claims.
Researchers in Israel ran experiments to test whether a selection of songs could be linked to the person who chose it, even by people with no prior knowledge of that person’s musical tastes.
The researchers found undergraduate students were able to identify others from three pieces of music – but the experts are not sure how they did it.
The findings are of concern because streaming giants could potentially identify anonymised users by their listening habits, which ‘constitutes a significant threat to privacy disclosure’, the researchers say.
According to the researchers, three songs from a playlist are enough to identify the person who chose the songs. Hence, companies like YouTube and Spotify (pictured) can accumulate a great deal of information about their users based only on their music choices
SPOTIFY USERS FURIOUS AFTER UK PRICE HIKE
Spotify will increase the prices of its premium subscriptions in the UK from April 30.
Several frustrated users took to Twitter to discuss the price increase this afternoon, with one claiming that Spotify is getting ‘too big for its boots.’
A Spotify spokesperson said the service has 70 million tracks and 2.2 million podcasts, providing listeners ‘greater value than ever before.’
Spotify price hikes:
– Premium Student – was £4.99/month, increasing to £5.99/month
– Premium Duo – was £12.99/month, increasing to £13.99/month
– Premium Family – was £14.99/month, increasing to £16.99/month
The new study, published in Telematics and Informatics, was conducted by Dr Ori Leshman at Tel Aviv University and Dr Ron Hirschprung at Ariel University.
‘Music can become a form of characterisation, and even an identifier,’ the authors say.
‘It provides commercial companies like Google and Spotify with additional and more in-depth information about us as users of these platforms.
‘In the digital world we live in today, these findings have far-reaching implications on privacy violations, especially since information about people can be inferred from a completely unexpected source, which is therefore lacking in protection against such violations.’
The team’s study included about 150 young people, all of them undergraduate students, divided into four groups of about 35 people each.
The group members did not know each other well – they had the ‘slightest acquaintance’ – and had no prior knowledge of each other’s musical tastes.
In each group, five participants were asked to anonymously select three songs or musical pieces they liked or ‘were touched by’ from their favourite playlist.
The rest of the participants in each group had to identify these five people based on only these three songs.
‘They could see each other, but the subjects identified were separated from the group, and could not provide facial clues for example,’ Dr Hirschprung told MailOnline.
Graphical abstract from the researchers’ paper. The study shows users can be re-identified by these records even if anonymised
So participants knew what each other looked like, but they weren’t allowed to talk to each other beforehand.
The music chosen was diverse, spanning classic rock and pop, including the Beatles, Pink Floyd, Beyonce and Ariana Grande, old and new Israeli music, and international hip hop from artists such as Kendrick Lamar and Eminem.
Results showed participants were able to identify each other from their musical taste with a very high success rate of between 80 and 100 per cent.
Researchers are unsure, however, how they did this – although MailOnline suggested the participants’ appearance might have been a strong factor.
Music is now consumed mainly on demand through streaming services such as Spotify, Apple Music and YouTube (pictured), which is owned by Google
‘Defining the factors used in the identification process is a very good question which we have no answer for it at the moment,’ Dr Hirschprung said.
‘It was our hypothesis that musical selections may lead to identification, and our research proofed it.
‘In further research which we are conducting right now, we try to make more progress and address this issue.’
The researchers accounted for the probability of a participant guessing correctly purely by chance.
This, they say, ‘provides conclusive evidence that re-identification based on music selection is feasible, thus the threat is real’.
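The article does not say how the researchers calculated that chance probability. As a rough illustration only, the sketch below assumes each participant picks one of the five curators uniformly at random, so a single lucky guess has a one-in-five chance, and then asks how likely it is that most of a group would be right by luck alone. The group size of 30 guessers and the figure of 24 correct answers are assumptions chosen to mirror ‘about 35’ per group and the 80 per cent lower bound; they are not data from the study.

```python
from math import comb


def chance_of_at_least(k, n, p):
    """Probability of k or more successes in n independent guesses,
    each succeeding with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


candidates = 5          # curators per group, as reported in the article
guessers = 30           # ASSUMPTION: about 35 per group minus the 5 curators
observed_correct = 24   # ASSUMPTION: 80 per cent of 30, the reported lower bound

p_random = 1 / candidates  # chance of one correct guess under uniform guessing
p_by_luck = chance_of_at_least(observed_correct, guessers, p_random)

print(f"Chance of a single lucky guess: {p_random:.2f}")
print(f"Chance of {observed_correct}+ lucky guesses out of {guessers}: {p_by_luck:.3g}")
```

Under these assumptions, luck alone would almost never produce a success rate of 80 per cent, which is the kind of comparison the researchers’ statement implies.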
A common way to protect our identity when we use online services is to remove certain identifiers, such as name or address, from the records – something called ‘anonymisation’.
However, this approach is ‘naive’, the study authors say, as in many cases re-identification is enabled based on what they call ‘quasi-identifiers’.
In a medical record, for example, even if the dataset is anonymised, there’s a risk of re-identification of the individual based on fields like age and blood pressure.
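The idea of a quasi-identifier can be illustrated with a toy example. The sketch below uses an invented, already ‘anonymised’ set of medical records (no names or addresses) and counts how many rows are still unique on age and blood pressure alone. The records, field names and the `unique_fraction` helper are all hypothetical and are not taken from the study.

```python
from collections import Counter

# HYPOTHETICAL anonymised records: direct identifiers already removed.
records = [
    {"age": 34, "blood_pressure": "120/80", "diagnosis": "A"},
    {"age": 34, "blood_pressure": "120/80", "diagnosis": "B"},
    {"age": 51, "blood_pressure": "140/90", "diagnosis": "C"},
    {"age": 27, "blood_pressure": "110/70", "diagnosis": "A"},
    {"age": 63, "blood_pressure": "150/95", "diagnosis": "D"},
]

# Fields assumed to act as quasi-identifiers in this toy example.
QUASI_IDENTIFIERS = ("age", "blood_pressure")


def unique_fraction(rows, quasi):
    """Fraction of rows whose combination of quasi-identifier values
    appears exactly once, and so could single out one individual."""
    keys = [tuple(row[field] for field in quasi) for row in rows]
    counts = Counter(keys)
    unique_rows = sum(1 for key in keys if counts[key] == 1)
    return unique_rows / len(rows)


share = unique_fraction(records, QUASI_IDENTIFIERS)
print(f"Rows unique on {QUASI_IDENTIFIERS}: {share:.0%}")
```

Anyone who already knows a person’s age and blood pressure from another source could match them to a unique row and read off the remaining fields, which is the re-identification risk the authors describe.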
‘In this research we examine an interesting and unexpected new quasi-identifier – music selections of an individual which represents their musical preferences,’ the authors say.
Spotify, one of the streaming giants mentioned in the study, anonymises its users’ personal data if they request it.
‘Streaming agents can cross data (with Google, Facebook) and this way achieve de-anonymisation,’ said Dr Hirschprung.
‘This technique has been widely demonstrated and used in other domains; thus, our claim that music selections are quasi identifiers is revolutionary.’
MailOnline has contacted Apple, Spotify and Google, which owns YouTube, for comment regarding the study.
ANONYMISING DATA ‘NOT ENOUGH TO PROTECT PRIVACY’
The UK General Data Protection Regulation (GDPR) does not apply to personal data that has been anonymised.
Under GDPR rules, organisations can only sell personal data by ‘anonymising’ it.
This means the principles of data protection ‘should therefore not apply to anonymous information’, according to recital 26 of the regulation.
So, making data anonymous is a valuable tool that allows data to be shared.
Data privacy laws requiring the anonymisation of a person’s data are failing to stop people being identified, a 2019 study found.
Companies now often sell anonymised data to third parties for a variety of uses, including for analytics and reviewing audience participation.
That is done by stripping the data of identifying characteristics like names and email addresses, so that individuals cannot, in theory, be identified.
After this process, the data is no longer subject to data protection regulations, so it can be freely used and sold.
But researchers from Imperial College London and the University of Louvain in Belgium showed that machine learning could be used to reverse this process.