Hello, I am a Data Steward

ReproducibiliTea Journal Club, University of Zurich

May 28, 2026

Meet a data steward

I have:

  • 10+ years of work experience (5 in research, at Eawag)
  • empathy, compassion, patience, persistence
  • an affinity for IT
  • teaching experience
  • learned how people learn

I don’t have:

  • a doctoral degree
  • a qualification in computer science
  • a qualification in statistics
  • a lot of time

8 learnings from 4 years

#1 Technology is not on our side

The Modern Academic’s Challenges

  • Overflowing email inboxes
  • Browsers with hundreds of tabs
  • Files stored on Desktops
  • MS Teams, Slack, Element, NAS, Google Drive, …
  • Credentials, Passwords, OTPs, 2FAs, PATs, …

#2 ETH wants reproducibility

ETH RDM Guidelines (2022)

FAIR data sharing principles (2016)

Me (2022)

FAIR data sharing principles

  • Technical in nature
  • Require data management strategy to establish workflows
  • Not a checkbox, but a process

Findable, Accessible, Interoperable, Reusable

ETH FAIR Coalition (2025)

Newly established institution-wide initiative

  • Launched by the VPs for Research and Infrastructure
  • Coalition Charter open for researchers and staff to sign
  • Research units invited to formally join in spring 2026
  • FAIR summit and three further initiatives launching summer 2026

#3 Data management is project management

GHE Student Wiki (public)

  • Grading criteria
  • Communication expectations
  • Data storage and data management guidelines
  • Presentation standards
  • Proposal and thesis writing requirements

Grading rubric & data publication

Four areas of evaluation with 31 sub-areas

  • 40/100: Research competence
  • 40/100: Thesis report
  • 10/100: Colloquium
  • 10/100: Examination

‘Data Management’ under ‘Research Competence’

6: Data is fully documented, organized, easy to reproduce, and publication ready. Everything is stored on Google Drive.

But: a data publication requirement

Obtaining a 6 from all sub-areas but not publishing the data in the form of a repository will result in a maximum allowed grade of 5.75.
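The arithmetic behind the rubric can be sketched in a few lines of Python. This is an illustration only: the area weights and the 5.75 cap come from the slides, but the function name `final_grade`, the dictionary keys, and the assumption that each area already has a single grade on the 1-6 scale are hypothetical. How the 31 sub-areas roll up into an area grade is not stated here.

```python
# Area weights from the rubric (points out of 100).
WEIGHTS = {
    "research_competence": 40,
    "thesis_report": 40,
    "colloquium": 10,
    "examination": 10,
}

# Maximum grade when the data is not published as a repository.
UNPUBLISHED_CAP = 5.75


def final_grade(area_grades: dict[str, float], data_published: bool) -> float:
    """Weighted mean of the four area grades, capped if data is unpublished.

    Assumption: one grade per area on the 1-6 scale; the roll-up of
    sub-areas into an area grade is outside this sketch.
    """
    if set(area_grades) != set(WEIGHTS):
        raise ValueError("expected exactly one grade per rubric area")
    total = sum(WEIGHTS[area] * grade for area, grade in area_grades.items())
    grade = total / sum(WEIGHTS.values())
    if not data_published:
        # The cap applies regardless of how good the other grades are.
        grade = min(grade, UNPUBLISHED_CAP)
    return grade


if __name__ == "__main__":
    perfect = {area: 6.0 for area in WEIGHTS}
    print(final_grade(perfect, data_published=True))
    print(final_grade(perfect, data_published=False))
```

With a 6 in every area, the sketch returns 6.0 when the data is published and 5.75 when it is not, which mirrors the rule quoted above.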

#4 Low IT affinity is not a lack of aptitude

Safe learning environments

Growth mindset for better learning outcomes

  • Fixed mindset: ‘I’m not good’
  • Growth mindset: ‘I can learn’

Create safe learner environments

  • Regular 1:1 research data management meetings
  • Bi-monthly half-day team events
  • Yearly retreat

#5 Data != Data

Disclaimer: Data at GHE

  • small (few MBs)
  • tabular
  • non-sensitive
  • topics
    • waste management
    • sanitation
    • air quality
    • etc.

Three terms for three stages

  • unprocessed raw data
    • explanation: data that is not processed and remains in its original form and file type
    • file format: often XLSX, also CSV and others
  • processed analysis-ready data
    • explanation: data that is processed to prepare for an analysis and is exported in its new form as a new file
    • file format: CSV, R data package
  • final data underlying a publication
    • explanation: data that is the result of an analysis (e.g. descriptive statistics or data visualization) and shown in a publication, but then also exported in its new form as a new file
    • file format: CSV
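The three stages can be sketched as a small pipeline in which each stage reads one file and writes a new one, so the raw file is never modified. This is a minimal sketch using only the Python standard library. The column names (`Site`, `PM2.5`), the function names, and the cleaning rules are hypothetical and only loosely inspired by the air quality topic mentioned earlier. The raw file is a CSV here because reading XLSX would need a third-party package.

```python
import csv
from pathlib import Path
from statistics import mean


def process(raw_path: Path, processed_path: Path) -> None:
    """Stage 2: read the raw file, clean it, and export a new file."""
    with raw_path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    cleaned = [
        {"site": r["Site"].strip().lower(), "pm25": float(r["PM2.5"])}
        for r in rows
        if r["PM2.5"].strip()  # drop rows with a missing measurement
    ]
    with processed_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["site", "pm25"])
        writer.writeheader()
        writer.writerows(cleaned)


def summarise(processed_path: Path, final_path: Path) -> None:
    """Stage 3: compute the numbers shown in a publication and export them."""
    with processed_path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    sites = sorted({r["site"] for r in rows})
    with final_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["site", "mean_pm25"])
        for site in sites:
            values = [float(r["pm25"]) for r in rows if r["site"] == site]
            writer.writerow([site, round(mean(values), 2)])


if __name__ == "__main__":
    # Stage 1 is the raw file itself; it is only ever read, never rewritten.
    raw = Path("raw.csv")
    if raw.exists():
        process(raw, Path("processed.csv"))
        summarise(Path("processed.csv"), Path("final.csv"))
```

Keeping one function per stage makes it easy to say which file belongs to which term, and to publish the processed and final files alongside the untouched raw data.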

#6 Data management is a process, not a checkbox

#7 Findable: Publish for humans and computers

Automation from ETH Research Collection

Automation from Zenodo

Automation from GitHub

Open Source

made for collaboration

made for humans
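One way to publish for computers as well as humans is to keep a machine-readable metadata file next to the human-readable README. The sketch below builds such a record as JSON. It assumes, rather than documents, that a `.zenodo.json` file in the repository root is picked up when Zenodo archives a GitHub release, and that the field names shown (`title`, `upload_type`, `creators`, `keywords`, `license`) match what Zenodo expects; check the Zenodo documentation before relying on them. The function name and all example values are hypothetical.

```python
import json


def zenodo_metadata(title: str, description: str, creators: list[str],
                    keywords: list[str], license_id: str) -> dict:
    """Build a metadata record for a dataset (field names are an assumption)."""
    if not title or not creators:
        raise ValueError("title and at least one creator are required")
    return {
        "title": title,
        "description": description,
        "upload_type": "dataset",
        "creators": [{"name": name} for name in creators],
        "keywords": keywords,
        "license": license_id,
    }


if __name__ == "__main__":
    record = zenodo_metadata(
        title="Example dataset",            # placeholder values only
        description="Placeholder description.",
        creators=["Doe, Jane"],
        keywords=["air quality"],
        license_id="cc-by-sa-4.0",           # assumed license identifier
    )
    # Writing this to a file named .zenodo.json is left to the caller.
    print(json.dumps(record, indent=2))
```

The same information that helps a person decide whether to reuse a dataset (title, authors, license) is then available to indexing services without manual data entry.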

#8 Research intelligence with ghedata

Making our own work visible

ghedata: an R data package

  • Openly shared collection of GHE’s operational data
  • Supervision table, LinkedIn statistics, Zenodo metadata, GitHub usage
  • Treated together, these give us “research intelligence”, actionable insights for supervision, publishing, and Open Science investments
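What "treated together" could mean in practice is sketched below: two of the tables are linked on a shared key to answer one question, namely what share of supervised projects per year ended with a published dataset. This is not the ghedata package, which is written in R. The key `project_id`, the column `year`, the function name, and the toy records are all made up for illustration.

```python
from collections import Counter


def published_share(supervision: list[dict], zenodo: list[dict]) -> dict[int, float]:
    """Share of supervised projects per year that have a Zenodo record."""
    published_ids = {rec["project_id"] for rec in zenodo}
    totals: Counter = Counter()
    published: Counter = Counter()
    for row in supervision:
        totals[row["year"]] += 1
        if row["project_id"] in published_ids:
            published[row["year"]] += 1
    return {year: published[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    # Toy records, not real GHE data.
    supervision = [
        {"project_id": "p1", "year": 2023},
        {"project_id": "p2", "year": 2023},
        {"project_id": "p3", "year": 2024},
    ]
    zenodo = [{"project_id": "p1"}, {"project_id": "p3"}]
    print(published_share(supervision, zenodo))
```

A metric like this only becomes possible once both tables are openly available in one place and share an identifier, which is the motivation for bundling them in a single data package.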

Why?

  • Measure the impact of embedded data stewardship
  • Inform strategic decisions
  • Practice what we preach: open by default

What we are building toward

  • Data stewards belong inside research groups, not only in central services
  • Cultural shift toward research that is open, transparent, and reproducible
  • Designing metrics for the value of data stewardship services

8 take-aways

  • #1 Technology is not on our side
  • #2 ETH wants reproducibility
  • #3 Data management is project management
  • #4 Low IT affinity is not a lack of aptitude
  • #5 Data != Data
  • #6 Data management is a process, not a checkbox
  • #7 Findable: Publish for humans and computers
  • #8 Research intelligence with ghedata

Thanks!

Slides created via revealjs and Quarto: https://quarto.org/docs/presentations/revealjs/

Slide background image taken from Danielle Navarro

Access slides as PDF on GitHub or on https://ghe-open.ch/slides/

All material is licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).