ETH Library 17:15 Kolloquium

Plan for tomorrow today: a model for data stewardship

November 21, 2024

Meet a data steward

I have:

  • 10+ years work experience (5 in research)
  • empathy, compassion, patience, persistence
  • an affinity for IT
  • teaching experience
  • learned how people learn

I don’t have:

  • a doctoral degree
  • a qualification in computer science
  • a qualification in statistics
  • a lot of time

10 learnings from 3 years

#1 Technology is not on our side

Meet a professor

The Modern Academic’s Challenges

  • Overflowing email inboxes
  • Browsers with hundreds of tabs
  • Files stored on desktops
  • MS Teams, Slack, Element, NAS, Google Drive, …
  • Credentials, Passwords, OTPs, 2FAs, PATs, …

#2 ETH wants reproducibility

ETH RDM Guidelines

FAIR data sharing principles

  • Technical in nature
  • Require data management strategy to establish workflows
  • Not a checkbox, but a process

Findable
Accessible
Interoperable
Reusable

#3 Data management is project management

GHE Student Wiki (public)

  • Grading criteria
  • Communication expectations
  • Data storage and data management guidelines
  • Presentation standards
  • Proposal and thesis writing requirements

Grading rubric & data publication

Four areas of evaluation with 31 sub-areas

  • 40/100: Research competence
  • 40/100: Thesis report
  • 10/100: Colloquium
  • 10/100: Examination

‘Data Management’ under ‘Research Competence’

6: Data is fully documented, organized, easy to reproduce, and publication ready. Everything is stored on Google Drive.

But: data publication requirement

Obtaining a 6 from all sub-areas but not publishing the data in the form of a repository will result in a maximum allowed grade of 5.75.
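
The weighting and the publication cap can be illustrated with a short calculation. This is only a sketch: it assumes the final grade is the weighted mean of the four area grades and that the cap is applied to that mean. The function name and the aggregation are hypothetical and not taken from the rubric, which scores 31 sub-areas.

```python
# Weights of the four evaluation areas, as points out of 100 (from the rubric).
WEIGHTS = {
    "research_competence": 40,
    "thesis_report": 40,
    "colloquium": 10,
    "examination": 10,
}

# Maximum grade when the data is not published as a repository (from the rubric).
UNPUBLISHED_CAP = 5.75


def final_grade(area_grades, data_published):
    """Return the weighted mean of the area grades, capped if data is unpublished.

    Assumption: areas are combined as a weighted mean; the rubric's actual
    aggregation of its sub-areas may differ.
    """
    total = sum(WEIGHTS[area] * grade for area, grade in area_grades.items())
    grade = total / sum(WEIGHTS.values())
    if not data_published:
        grade = min(grade, UNPUBLISHED_CAP)
    return grade


all_sixes = {area: 6.0 for area in WEIGHTS}
print(final_grade(all_sixes, data_published=True))   # 6.0
print(final_grade(all_sixes, data_published=False))  # 5.75
```

Under these assumptions the cap only affects students who would otherwise score above 5.75; lower grades pass through unchanged.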

ETH Board Open Research Data position

#4 Predictability wins

Structure & naming conventions

GHE Google Shared Drive

  • ghe-supervision
    • archive
    • bachelors
    • masters
      • msc-sem-proj
      • msc-thesis
        • 2024-msc-thesis-lschoebitz
    • phds

Convention

  • YYYY-degree-type-ethzid

A unique identifier for each student (and staff member) that is used in several places.
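
A convention like this is easiest to keep when it can be checked automatically. The sketch below builds and parses folder names of the form YYYY-degree-type-ethzid. The regular expression and both function names are hypothetical; it assumes lowercase names and that the type part may itself contain hyphens (as in "sem-proj").

```python
import re

# Pattern for YYYY-degree-type-ethzid, e.g. 2024-msc-thesis-lschoebitz.
# Assumption: degree and type use lowercase letters only, and the type
# may contain hyphens (as in "sem-proj").
FOLDER_PATTERN = re.compile(
    r"^(?P<year>\d{4})"
    r"-(?P<degree>[a-z]+)"
    r"-(?P<type>[a-z]+(?:-[a-z]+)*)"
    r"-(?P<ethzid>[a-z0-9]+)$"
)


def folder_name(year, degree, project_type, ethzid):
    """Build a folder name following the YYYY-degree-type-ethzid convention."""
    name = f"{year}-{degree}-{project_type}-{ethzid}".lower()
    if not FOLDER_PATTERN.match(name):
        raise ValueError(f"not a valid folder name: {name!r}")
    return name


def parse_folder_name(name):
    """Split a folder name into its parts, or return None if it does not match."""
    match = FOLDER_PATTERN.match(name)
    return match.groupdict() if match else None


print(folder_name(2024, "msc", "thesis", "lschoebitz"))
print(parse_folder_name("2024-msc-thesis-lschoebitz"))
print(parse_folder_name("thesis_final_v2"))
```

The last call returns None, which is how a folder that ignores the convention would be flagged.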

#5 Low IT affinity is not a lack of aptitude

Safe learning environments

Growth mindset for better learning outcomes

  • Fixed mindset: ‘I’m not good’
  • Growth mindset: ‘I can learn’

Create safe learner environments

  • Regular 1:1 research data management meetings
  • Bi-monthly half-day team events
  • Yearly retreat

#6 Data != Data

Disclaimer: Data at GHE

  • small (few MBs)
  • tabular
  • non-sensitive
  • topics
    • waste management
    • sanitation
    • air quality
    • etc.

Three terms for three stages

term | explanation | file format
unprocessed raw data | data that is not processed and remains in its original form and file type | often XLSX, also CSV and others
processed analysis-ready data | data that is processed to prepare for an analysis and is exported in its new form as a new file | CSV, R data package
final data underlying a publication | data that is the result of an analysis (e.g. descriptive statistics or data visualization) and shown in a publication, but then also exported in its new form as a new file | CSV
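
The three stages can be sketched as three separate files, each written once and never overwritten by the next step. The example below is a minimal illustration using only the Python standard library. The file names, column names, and measurement values are made up, and the raw file is a CSV rather than XLSX to keep the sketch self-contained.

```python
import csv
import statistics
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
raw_path = workdir / "raw.csv"
processed_path = workdir / "processed.csv"
final_path = workdir / "final.csv"

# Stage 1: unprocessed raw data, kept in its original form (made-up values).
raw_path.write_text(
    "Site,PM2.5 (ug/m3)\n"
    "A, 12.5\n"
    "A,n/a\n"
    "B,30.1 \n"
    "B,28.9\n",
    encoding="utf-8",
)

# Stage 2: processed analysis-ready data, exported as a new file.
with raw_path.open(newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

clean = []
for row in rows:
    value = row["PM2.5 (ug/m3)"].strip()
    try:
        clean.append({"site": row["Site"], "pm25_ug_m3": float(value)})
    except ValueError:
        continue  # drop rows without a numeric measurement

with processed_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["site", "pm25_ug_m3"])
    writer.writeheader()
    writer.writerows(clean)

# Stage 3: final data underlying a publication (here: mean per site).
by_site = {}
for row in clean:
    by_site.setdefault(row["site"], []).append(row["pm25_ug_m3"])

with final_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["site", "mean_pm25_ug_m3"])
    for site, values in sorted(by_site.items()):
        writer.writerow([site, round(statistics.mean(values), 2)])

print(final_path.read_text(encoding="utf-8"))
```

Because each stage writes a new file, the raw file still contains the unparsed "n/a" row after the run, so every later result can be traced back to it.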

#7 Data management is a process, not a checkbox

#8 Findable: Publish for humans and computers

Automation from ETH Research Collection

Automation from Zenodo

Automation from GitHub

Open Source

made for collaboration

Automation from GitHub

made for humans
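
One way to publish for computers as well as humans is to ship a machine-readable metadata record next to the human-readable description. The sketch below writes such a record with the schema.org Dataset vocabulary as JSON-LD. It is an illustration only: every value is a hypothetical placeholder, and it is not a description of how the ETH Research Collection, Zenodo, or GitHub generate their metadata.

```python
import json

# All values below are hypothetical placeholders, not a real dataset.
metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example air quality measurements",
    "description": "Placeholder description of a small tabular dataset.",
    "keywords": ["air quality", "example"],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": [{"@type": "Person", "name": "Jane Doe"}],
}

# Serialise the record; indentation keeps it readable for humans too.
json_ld = json.dumps(metadata, indent=2)

# Embedded in a web page, the same record can be read by search engines.
html_snippet = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(html_snippet)
```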

#9 9 to 5 is possible

Meet a professor

  • Plans for ‘tomorrow’
  • Is ready for increasing requirements
  • Dedicates financial resources to data stewardship

#10 Funding for Open Research Data exists

Funding schemes

swissuniversities

  • 2021 - 2024: swissuniversities - Open Science I
  • 2025 - 2028: swissuniversities - Open Science II (~ CHF 10 to 30 million)
  • Watch: Action Line B5.2 - Professionalisation of ORD specialists and related services
  • Newsletter sign-up: https://sympa.ethz.ch/sympa/subscribe/isci

Funding schemes

ETH Board

  • Global Health Engineering was awarded 2 Contribute and 3 Explore projects worth CHF 500’000
  • 2021 - 2024: ~ 100 projects funded (~ CHF 10 million in total)
  • Newsletter sign-up: https://open-research-data-portal.ch/ (bottom of page)

10 take-aways from 30 minutes

  • #1 Technology is not on our side
  • #2 ETH wants reproducibility
  • #3 Data management is project management
  • #4 Predictability wins
  • #5 Low IT affinity is not a lack of aptitude
  • #6 Data != Data
  • #7 Data management is a process, not a checkbox
  • #8 Findable: Publish for humans and computers
  • #9 9 to 5 is possible
  • #10 Funding for Open Research Data exists

Thanks! 🌻

Slides created via revealjs and Quarto: https://quarto.org/docs/presentations/revealjs/

Slide background image taken from Danielle Navarro

Access slides as PDF on GitHub

All material is licensed under Creative Commons Attribution Share Alike 4.0 International.