Tom Sühr is a doctoral candidate at the Max Planck Institute for Intelligent Systems in the Human Aspects of Machine Learning group led by Samira Samadi. He completed his M.Sc. in Information Systems Management at TU Berlin while working as a research assistant at Harvard Business School. He is currently working on research projects with Harvard University and the University of Tübingen. During his undergraduate studies, he was a research intern at the Max Planck Institute for Software Systems, advised by Krishna Gummadi and Asia Biega, and a student researcher at the chair of complex distributed systems (CIT), advised by Meike Zehlike and Carlos Castillo. He is broadly interested in machine learning and human-AI collaboration.

Currently thinking about

  • Dynamic systems of humans and AI
  • Latent traits of large language models
  • Performative prediction
  • Human-AI collaboration with label noise
  • Human-AI collaboration with unidentifiable ground truth
  • Decision-making for ML deployment/training 
  • Measurement theory for benchmarks
  • Psychometrics/latent variable modelling for AI
  • Item response theory and test theory for benchmarks


  • New preprint “A Dynamic Model of Performative Human-ML Collaboration: Theory and Empirical Evidence” available on arXiv and here. This is joint work with my advisor Samira Samadi and my co-advisor Chiara Farronato.
  • New preprint “Challenging the Validity of Personality Tests for LLMs” available on arXiv and here. This is joint work with Florian Dorner, Samira Samadi and Augustin Kelava.
  • I updated the arXiv version of our workshop paper “Do personality Tests generalize to Large Language Models?” to the extended full-paper version. You can still download the workshop version here.
  • I started a “News” section 😉


A Dynamic Model of Performative Human-ML Collaboration: Theory and Empirical Evidence
Tom Sühr, Samira Samadi, Chiara Farronato
In submission

Challenging the Validity of Personality Tests for LLMs
Tom Sühr, Florian E. Dorner, Samira Samadi, Augustin Kelava
In submission

Do personality Tests generalize to Large Language Models?
Florian E. Dorner, Tom Sühr, Samira Samadi, Augustin Kelava
Socially Responsible Language Modelling Research (SoLaR) 2023 at NeurIPS 2023

Does fair ranking improve minority outcomes? Understanding the interplay of human and algorithmic biases in online hiring
Tom Sühr, Sophie Hilgard, Himabindu Lakkaraju
Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 989-999

Fair Top-k Ranking with multiple protected groups
Meike Zehlike, Tom Sühr, Ricardo Baeza-Yates, Francesco Bonchi, Carlos Castillo, Sara Hajian
Information Processing & Management 59 (1), 102707

A Note on the Significance Adjustment for FA*IR with Two Protected Groups
Meike Zehlike, Tom Sühr, Carlos Castillo
arXiv preprint arXiv:2012.12795

Fairsearch: A tool for fairness in ranked search results
Meike Zehlike, Tom Sühr, Carlos Castillo, Ivan Kitanovski
Companion Proceedings of the Web Conference 2020, 172-175

Two-sided fairness for repeated matchings in two-sided markets: A case study of a ride-hailing platform
Tom Sühr, Asia J. Biega, Meike Zehlike, Krishna P. Gummadi, Abhijnan Chakraborty
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining