Hello, I am a Data Steward

Research Software Engineering and Data Stewardship Career Talk

June 17, 2025

Meet a data steward

I have:

  • 10+ years work experience (5 in research, at Eawag)
  • empathy, compassion, patience, persistence
  • an affinity for IT
  • teaching experience
  • learned how people learn

I don’t have:

  • a doctoral degree
  • a qualification in computer science
  • a qualification in statistics
  • a lot of time

8 learnings from 4 years

#1 Technology is not on our side

The Modern Academic’s Challenges

  • Overflowing email inboxes
  • Browsers with hundreds of tabs
  • Files stored on Desktops
  • MS Teams, Slack, Element, NAS, Google Drive, …
  • Credentials, Passwords, OTPs, 2FAs, PATs, …

#2 ETH wants reproducibility

ETH RDM Guidelines

FAIR data sharing principles

  • Technical in nature
  • Require data management strategy to establish workflows
  • Not a checkbox, but a process

Findable
Accessible
Interoperable
Reusable

#3 Data management is project management

GHE Student Wiki (public)

  • Grading criteria
  • Communication expectations
  • Data storage and data management guidelines
  • Presentation standards
  • Proposal and thesis writing requirements

Grading rubric & data publication

Four areas of evaluation with 31 sub-areas

  • 40/100: Research competence
  • 40/100: Thesis report
  • 10/100: Colloquium
  • 10/100: Examination

‘Data Management’ under ‘Research Competence’

6: Data is fully documented, organized, easy to reproduce, and publication ready. Everything is stored on Google Drive.

But, data publication requirement

Obtaining a 6 in all sub-areas but not publishing the data in the form of a repository will result in a maximum allowed grade of 5.75.
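
The weighting and the publication cap can be worked through in a short sketch. It assumes that each of the four areas has already been reduced to a single grade on the 1 to 6 scale (how the 31 sub-areas are aggregated is not covered here) and that the cap is applied to the weighted mean. The names `WEIGHTS`, `PUBLICATION_CAP`, and `final_grade` are hypothetical.

```python
# Area weights from the grading rubric (points out of 100).
WEIGHTS = {
    "research_competence": 40,
    "thesis_report": 40,
    "colloquium": 10,
    "examination": 10,
}

# Maximum allowed grade when the data is not published as a repository.
PUBLICATION_CAP = 5.75


def final_grade(area_grades, data_published):
    """Weighted mean of the area grades, capped if the data is unpublished.

    Assumption: each area grade is a single number on the 1 to 6 scale.
    """
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[area] * area_grades[area] for area in WEIGHTS) / total_weight
    if not data_published:
        weighted = min(weighted, PUBLICATION_CAP)
    return weighted


all_sixes = {area: 6.0 for area in WEIGHTS}
print(final_grade(all_sixes, data_published=True))   # 6.0
print(final_grade(all_sixes, data_published=False))  # 5.75
```

A thesis graded below 5.75 is unaffected by the cap in this sketch; only the top of the scale depends on publishing the data.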

#4 Low IT affinity is not a lack of aptitude

Safe learning environments

Growth mindset for better learning outcomes

  • Fixed mindset: ‘I’m not good’
  • Growth mindset: ‘I can learn’

Create safe learner environments

  • Regular 1:1 research data management meetings
  • Bi-monthly half-day team events
  • Yearly retreat

#5 Data != Data

Disclaimer: Data at GHE

  • small (few MBs)
  • tabular
  • non-sensitive
  • topics
    • waste management
    • sanitation
    • air quality
    • etc.

Three terms for three stages

term | explanation | file format
unprocessed raw data | data that is not processed and remains in its original form and file type | often XLSX, also CSV and others
processed analysis-ready data | data that is processed to prepare for an analysis and is exported in its new form as a new file | CSV, R data package
final data underlying a publication | data that is the result of an analysis (e.g. descriptive statistics or data visualization) and shown in a publication, but then also exported in its new form as a new file | CSV
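
The three stages can be sketched as a small pipeline in which every stage writes a new file and the raw file is never modified. This is a minimal illustration, not the GHE workflow: the folder names (`raw`, `processed`, `final`), the column names, and the measurement values are all invented, and the raw file is a CSV so that the example needs only the Python standard library.

```python
import csv
import statistics
import tempfile
from pathlib import Path

# Invented example values, for illustration only (not GHE data).
RAW_ROWS = [
    {"Location": " Site A ", "PM2.5 (ug/m3)": "12.5"},
    {"Location": "Site B", "PM2.5 (ug/m3)": ""},  # missing measurement
    {"Location": "Site A", "PM2.5 (ug/m3)": "15.5"},
]


def write_csv(path, rows, fieldnames):
    """Write a list of dicts to a new CSV file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path):
    """Read a CSV file into a list of dicts (all values are strings)."""
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def process(raw_path, processed_path):
    """Stage 2: clean the raw data and export it as a new file."""
    tidy = []
    for row in read_csv(raw_path):
        value = row["PM2.5 (ug/m3)"].strip()
        if value:  # skip rows without a measurement
            tidy.append({"location": row["Location"].strip(),
                         "pm25_ug_m3": float(value)})
    write_csv(processed_path, tidy, ["location", "pm25_ug_m3"])


def summarise(processed_path, final_path):
    """Stage 3: descriptive statistics, exported as a new file."""
    groups = {}
    for row in read_csv(processed_path):
        groups.setdefault(row["location"], []).append(float(row["pm25_ug_m3"]))
    summary = [{"location": loc, "n": len(vals),
                "mean_pm25_ug_m3": statistics.mean(vals)}
               for loc, vals in sorted(groups.items())]
    write_csv(final_path, summary, ["location", "n", "mean_pm25_ug_m3"])
    return summary


def run_pipeline(folder):
    """Run all three stages inside `folder` and return the final summary."""
    raw = folder / "raw" / "measurements.csv"
    processed = folder / "processed" / "measurements.csv"
    final = folder / "final" / "summary.csv"
    # Stage 1: this write stands in for the original, untouched raw file.
    write_csv(raw, RAW_ROWS, ["Location", "PM2.5 (ug/m3)"])
    process(raw, processed)
    return summarise(processed, final)


with tempfile.TemporaryDirectory() as tmp:
    print(run_pipeline(Path(tmp)))
```

Keeping one file per stage means the analysis-ready data and the data underlying a figure or table can each be published on their own, while the raw file stays available for anyone who wants to redo the processing.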

#6 Data management is a process, not a checkbox

#7 Findable: Publish for humans and computers

Automation from ETH Research Collection

Automation from Zenodo

Automation from GitHub

Open Source

made for collaboration

Automation from GitHub

made for humans
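
One way to read "for humans and computers" is that the same citation details are published twice: once as readable text and once as machine-readable metadata. The sketch below does this for the Zenodo dataset by Massari, Schöbitz, and Tilley (2025), using the schema.org `Dataset` vocabulary as JSON-LD. The choice of fields and the function names `for_humans` and `for_computers` are assumptions for illustration; this is not the metadata that ETH Research Collection, Zenodo, or GitHub actually generate.

```python
import json

# Citation details of the dataset cited in this talk.
CITATION = {
    "authors": ["Nicolo Massari", "Lars Schöbitz", "Elizabeth Tilley"],
    "year": "2025",
    "title": ("Ethord: ETH Board Open Research Data (ORD) Program "
              "Project Metadata and Report Data"),
    "doi": "10.5281/zenodo.15554776",
}


def for_humans(c):
    """Format the citation as a readable reference string."""
    authors = ", ".join(c["authors"][:-1]) + ", and " + c["authors"][-1]
    return f'{authors}. {c["year"]}. "{c["title"]}." https://doi.org/{c["doi"]}'


def for_computers(c):
    """Format the same details as schema.org Dataset JSON-LD."""
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": c["title"],
        "creator": [{"@type": "Person", "name": name} for name in c["authors"]],
        "datePublished": c["year"],
        "identifier": f'https://doi.org/{c["doi"]}',
    }
    return json.dumps(record, indent=2, ensure_ascii=False)


print(for_humans(CITATION))
print(for_computers(CITATION))
```

Both outputs come from one source of truth, so a change to the title or author list only has to be made once.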

#8 Funding for Open Research Data existed

Funding schemes

swissuniversities

Open Research Data Program of the ETH Board

  • 2021-2024: ~96 projects funded (~CHF 15 million budget in total)
  • Global Health Engineering was awarded 2 Contribute and 3 Explore projects worth CHF 500’000
  • All 96 projects and newsletter sign-up: https://open-research-data-portal.ch/ (bottom of page)

8 take-aways from 30 minutes

  • #1 Technology is not on our side
  • #2 ETH wants reproducibility
  • #3 Data management is project management
  • #4 Low IT affinity is not a lack of aptitude
  • #5 Data != Data
  • #6 Data management is a process, not a checkbox
  • #7 Findable: Publish for humans and computers
  • #8 Funding for Open Research Data existed

Thanks! 🌻

Slides created via revealjs and Quarto: https://quarto.org/docs/presentations/revealjs/

Slide background image taken from Danielle Navarro

Access slides as PDF on GitHub or on https://ghe-open.ch/slides/

All material is licensed under Creative Commons Attribution Share Alike 4.0 International.

References

Massari, Nicolo, Lars Schöbitz, and Elizabeth Tilley. 2025. “Ethord: ETH Board Open Research Data (ORD) Program Project Metadata and Report Data.” https://doi.org/10.5281/zenodo.15554776.