Public notice for Twitter and Reddit web scraping

Interdisciplinary Research Group in Socio-technical Cybersecurity


As part of the research project Deceptive patterns online (Decepticon), we, researchers at the Interdisciplinary Centre for Security, Reliability and Trust (SnT) of the University of Luxembourg, are collecting social media posts that refer to dark patterns.

Dark patterns are “design choices that benefit an online service by coercing, steering or deceiving users into making decisions that, if fully informed and capable of selecting alternatives, they would not make”. For example, interface design elements can nudge users to accept privacy-invasive practices, like consent to extensive online tracking for advertising purposes.

The University of Luxembourg is the controller, within the meaning of the General Data Protection Regulation (GDPR), of the personal data that we gather and process to conduct this research project. The University of Luxembourg has its official address at:
2, avenue de l’Université
L-4365 Esch-sur-Alzette
Luxembourg.

We collect social media posts via web scraping: we use tools that automatically extract information from publicly available data on social media. From Twitter, we download the tweets containing the hashtag #darkpattern or #darkpatterns. From Reddit, we download the posts published on the subreddits https://www.reddit.com/r/darkpatterns/ and https://www.reddit.com/r/assholedesign/. We use the information shared by social media users to build a database of examples of dark patterns and opinions about them. We use this database to analyze dark patterns and create solutions, such as applications that can automatically detect dark patterns.
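To illustrate the hashtag criterion described above, here is a minimal sketch in Python of how a post's text could be checked for the two hashtags. It is not the project's actual tooling: the function name `mentions_dark_patterns` is hypothetical, and case-insensitive matching on whole hashtags is an assumption.

```python
import re

# The two hashtags named in this notice. Case-insensitive, whole-hashtag
# matching is an assumption made for this illustration.
HASHTAG_PATTERN = re.compile(r"(?<!\w)#darkpatterns?(?!\w)", re.IGNORECASE)


def mentions_dark_patterns(text: str) -> bool:
    """Return True if the text contains #darkpattern or #darkpatterns."""
    return HASHTAG_PATTERN.search(text) is not None
```

A post whose hashtag merely starts with the same letters (for example #darkpatternsarebad) is not matched, so only posts carrying one of the two exact hashtags would be kept.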

Specifically, we collect:

  • Social media posts (like tweets and Reddit posts) about dark patterns. The posts can contain personal opinions or references to consumer habits, hobbies and interests;
  • Images and screenshots of examples of dark patterns that are associated with the posts. We do not collect profile pictures;
  • The name and location of the users who share the posts and images;
  • The date of publication of the posts.

If sensitive data (e.g. religious beliefs, political or philosophical views) is collected unintentionally, it will be erased without undue delay, and no later than 72 hours, and will not be used in any way for the project.

We download personal data only for purposes of scientific research. We do not sell, commercialize or monetize the data that we gather. Our data processing is necessary for the performance of a task carried out in the public interest (this is our legal basis). The mission of the University is research and education, as described in its law.

Our data management plan follows best practices for scientific research and personal data management.

We minimize the collection of information: we only gather the minimal amount of information we really need.

We pseudonymize the information that we collect: we separate the identity of the users (like usernames) from the rest of the data (posts, images, etc.) and we replace real identities with fake names (also called pseudonyms). In this way, only we can re-identify the users to whom the data refers. If we release the dataset, we will anonymize the data to protect the identity of the users who shared it.
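The pseudonymization step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not our production code: the function name `pseudonymize`, the `username` field and the `user_` prefix are hypothetical, and random pseudonyms are just one possible choice.

```python
import secrets


def pseudonymize(posts):
    """Split posts into pseudonymized records and a separate identity table.

    `posts` is a list of dicts with a "username" key (hypothetical layout).
    """
    identity_table = {}     # pseudonym -> real username, to be stored separately
    pseudonym_by_user = {}  # real username -> pseudonym, keeps the mapping stable
    records = []
    for post in posts:
        user = post["username"]
        if user not in pseudonym_by_user:
            # Random pseudonym that cannot be derived from the username.
            pseudonym = "user_" + secrets.token_hex(8)
            pseudonym_by_user[user] = pseudonym
            identity_table[pseudonym] = user
        # Copy every field except the real identity, then attach the pseudonym.
        record = {key: value for key, value in post.items() if key != "username"}
        record["pseudonym"] = pseudonym_by_user[user]
        records.append(record)
    return records, identity_table
```

The records can then be analyzed without exposing usernames, while the identity table is kept apart under stricter access control. Anonymizing the data for release would additionally require discarding that table and checking the remaining content for identifying details.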

The data we download is stored securely on the servers and computers of the University of Luxembourg. All of our equipment is protected with passwords and only appointed researchers can access the research data. Moreover, both the data and the communications (e.g., between computers and the university’s servers) are encrypted. This increases their security because only appointed researchers with the secret code (the decryption key) can read the data.

Following the research ethics guidelines of the University of Luxembourg, we keep the data for 10 years after the project funding has come to an end or after the last publication. We must keep the data for reasons of scientific integrity: if other researchers wish to reproduce the analyses that we carry out, they must be able to use the same dataset.

We intend to share some relevant data with selected researchers, consumer organizations and supervisory authorities that investigate dark patterns, since dark patterns are often illegal in the European Union. We will share some of the data we collect with a selected community on the platform MISP. We will anonymize the data that we share, so that the users to whom the data belong can no longer be identified. We do not transfer the data to a third country or international organisation.

Under the GDPR, you have the following rights regarding your personal data:

  • You have the right to know which information we hold about you (right of access, Art. 15 GDPR).
  • You have the right to ask us to rectify the data we hold about you, if incomplete or inaccurate (right to rectification, Art. 16 GDPR).
  • You can object to the processing (right to object to processing, Art. 21 GDPR), which would stop or prevent us from using your data.
  • You have the right to ask us to delete your data (right to erasure, Art. 17 GDPR).
  • Finally, you can ask us to limit how we use your data (right to restrict processing, Art. 18 GDPR).

If you have questions, doubts or complaints about the way we process your personal data, please contact the Data Protection Officer (dpo@uni.lu). She will engage with us to answer your questions. We will do our best to answer quickly and address your concerns.
If you consider that our processing of your personal data infringes the GDPR, you have the right to lodge a complaint with the relevant supervisory authority in Luxembourg, the Commission Nationale pour la Protection des Données.

We analyze the collected data in four ways:

  1. Manual analysis carried out by the members of our research group: we intend to identify interesting use cases among the dark pattern examples that we collect. The selection happens through a manual inspection of individual tweets stored in our database.
  2. Manual analysis done by the members of our MISP instance: we intend to invite a selected group of experts (academic researchers, members of consumer protection organizations and supervisory authorities) that investigate dark patterns. In 2022, we will invite them to a specific MISP instance. Members of our research group will share selected anonymized social media posts and images.
  3. Automated analysis on images of dark patterns, with the goal of training algorithms. We intend to develop machine learning techniques to enable automated recognition and classification of new instances of dark patterns found on other websites or in other databases.
  4. Automated textual analysis of social media posts to determine the feelings (negative, positive, neutral) shared by users about dark patterns.

The researchers responsible for this research activity are:

Principal investigator:
Professor Gabriele Lenzini
e-mail address: gabriele.lenzini@uni.lu
Tel. no.: (+352) 46 66 44 5778

Research associate:
Arianna Rossi
e-mail address: arianna.rossi@uni.lu
Tel. no.: (+352) 46 66 44 5791

You can also address your questions related to personal data processing to the Data Protection Officer, who is the person appointed by the University of Luxembourg to manage data protection matters:
Sandrine Munoz
e-mail address: dpo@uni.lu
Tel. no.: (+352) 46 66 44 9813

Get in touch with us

SnT – Interdisciplinary Centre for Security, Reliability and Trust
Maison du Nombre, 6, avenue de la Fonte L-4364 Esch-sur-Alzette
info-irisc-lab@uni.lu