As human lives become more digitised than ever, new forms of data, especially social media data and streaming data, are increasingly incorporated into research. While there are standard practices for handling traditional research data, no such standards exist for these human-related dynamic digital data. This calls for a new data governance framework to guide researchers and anyone else working with digital data.
DAMA International’s Body of Knowledge defines data governance as "the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets". Its purpose is to ensure that data is managed properly and in line with policies and best practices.
Because of their human-related nature, using dynamic digital data for research purposes inevitably entails many technical, ethical, and legal challenges. Therefore, this data governance framework aims to provide researchers with some guiding principles to navigate these challenges when working with dynamic digital data.
The framework outlines the general data considerations relevant to each stage of the research data lifecycle (i.e., collection, storage, pre-processing, analysis, sharing, publishing, and archiving). In doing so, we attempt to describe the current best practices, underpinned by the FAIR and CARE principles, that are aligned with the research community’s goals and emerging directions.
The FAIR principles state that data resources, tools, vocabularies, and infrastructures should be Findable, Accessible, Interoperable, and Reusable. They may be adopted in any combination and incrementally; FAIR is not an 'all or nothing' framework.
The CARE principles address how research data should be used to foster the wellbeing of people. They concern the purpose of data use and comprise Collective benefit, Authority to control, Responsibility, and Ethics.
A challenge for researchers working with human-related dynamic digital data is navigating the tensions inherent in this type of data:
Respect for data subjects’ privacy rights versus Compliance with platforms’ terms of service.
Methodological transparency for research excellence versus Black box algorithms run by technology platforms.
Preservation of ephemeral data (e.g., digital trace data) that reflect cultural and social legacy versus Users’ privacy rights and the right to be forgotten (this tension relates to the merits of collecting as much of these transient data as possible, because they may hold answers to research questions that are yet to be determined).
Indiscriminate collection of data (i.e., collecting all we can just because we can) versus Targeted collection of data to answer specific research questions.
Harmonisation of the demands and expectations of multiple platforms’ terms of service.
In addition to data governance considerations, the use of human data in Australia has legal and ethical implications. For example, mishandling personal data might breach the Privacy Act 1988. Depending on where the subjects of your research are located, you might also have to comply with data regulations in other regions, such as the European Union’s General Data Protection Regulation (GDPR). As a researcher, you should be aware of this ethical and legal landscape. Read more about the different legislation and ethical guidelines that might be relevant to Australian researchers:
Good data governance requires considering data implications throughout the entire research data lifecycle: collecting, storing, analysing, sharing, and archiving data. Below are some concerns which researchers should take into account at each phase of the data lifecycle.
The research study design and planning should specify the types of analysis to be performed on the data. These analyses should consider the following:
Sharing research data is in line with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles and is often required by research funding agencies and publishers. However, sharing dynamic digital data such as social media data is fraught with challenges. Sharing data in ways that are legal and ethical is accepted best practice. Some of the key legal and ethical challenges are:
Other considerations are:
Archiving of research data for future use and preservation is the domain of data archives. However, for the duration of the project, the following questions should be considered:
These fact sheets contain high-level information about a number of platforms, including their APIs*, usage policies and terms of service, as well as some useful open-source tools that can be used with them. The fact sheets aim to be a starting point for researchers interested in working with these platforms.
*API stands for Application Programming Interface. You can learn more about APIs by watching this video or reading this article.