Shaden Shaar

PhD Candidate @ Cornell University

I am a PhD candidate in Computer Science at Cornell University, advised by Prof. Claire Cardie. I work on long-form generation in multi-modal settings, specifically long-form question-answering and summarization over videos. In parallel, I collaborate with NewYork-Presbyterian Hospital on clinical NLP for heart failure and heart transplant patients. Previously, at the Qatar Computing Research Institute (QCRI, HBKU), I worked with Prof. Preslav Nakov on automated fact-checking, propaganda detection, and COVID-19 misinformation.

Research Interests

Multi-modal long-form generation. Long-form question-answering and summarization over videos and other multi-modal narratives, with an emphasis on coherence across extended outputs.
Clinical NLP. Using LLMs to surface clinical decision patterns from unstructured medical narratives, with a focus on heart failure and heart transplant care.
Automated fact-checking (prior). Detecting, verifying, and justifying claim veracity across multi-modal, variable-length inputs.
Propaganda detection (prior). Identifying propaganda and persuasion techniques across multi-modal inputs in news articles, memes, and other media.
COVID-19 misinformation (prior). Analyzing misinformation, vaccine-related fake news, and the broader infodemic on social platforms during the COVID-19 pandemic.

Highlights

h-index 24 with ~2,858 citations on Google Scholar.
Best Demo Award (Honorable Mention) at ACL 2020 for Prta, a system for analyzing propaganda techniques in news.
University Fellowship at Cornell University (2021) for exceptional preparation and promise.
University Honors at Carnegie Mellon University (2019); 50% Academic Merit Scholarship at CMU (2015); five-time Dean’s List awardee.
Co-organized multiple shared tasks at CLEF–CheckThat! (2020, 2021) and SemEval (2021) on automated fact-checking and propaganda detection.

Work Experience

May 2025 — Aug 2025
Applied Scientist Intern

Zillow Group · Remote, USA
- Built conversational assistive AI agents for Zillow's real-estate platform using reinforcement-learning methods.
- Designed reward-modeling and evaluation pipelines for grounded multi-turn property-search dialogue.
Jan 2025 — May 2025
Machine Learning Research Engineer Intern

Scale AI, Inc. · New York City, NY, USA
- Built an Arabic-language AI assistant for the Qatari judicial department to aid judges reviewing active cases.
- Implemented dense and sparse retrieval over Arabic cassation and supreme-court rulings for relevant-precedent lookup.
- Designed Arabic case-summarization and event-extraction pipelines feeding the judge-facing platform.
May 2022 — Aug 2022
AI/ML Research Intern

Apple · Seattle, WA, USA
- Worked with Dr. Alex Churchill on zero-shot multi-turn conversation data generation.
- Built a model that generates conditioned multi-turn dialogue datasets for downstream training.
Jul 2019 — Aug 2021
Research Assistant

Qatar Computing Research Institute (QCRI), HBKU · Doha, Qatar
- Worked with Dr. Preslav Nakov and Prof. Giovanni Da San Martino on fact-checking and propaganda detection.
- Introduced the task of detecting previously fact-checked claims; methods adopted by leading fact-checking organizations.
- Built fact-checking and analysis pipelines for COVID-19 misinformation across social platforms.
- Co-organized shared tasks at CLEF–CheckThat! (2020, 2021) and SemEval–2021 Task 6 on persuasion in memes.
- Built and shipped Prta, a public propaganda-analysis demo (ACL 2020 Best Demo, Honorable Mention).

Selected Publications

Selected recent and impactful publications, out of 32 in total. Full list on the publications page or Google Scholar.

Thematic Analysis of Accepted Exception Requests for Heart Transplant Candidates Using a Large Language Model

J. Frye, Shaden Shaar, C. Cardie, E. DeFilippis, D. Estrin, G. Sayer, N. Uriel, et al.

JHLT · 2026 · Journal

Uses an LLM to perform thematic analysis of accepted exception requests for heart transplant candidates, surfacing the clinical rationales that drive decisions in a setting where manual review at scale is infeasible.

Clinical
MovieRecapsQA: A Multimodal Open-Ended Video Question-Answering Benchmark

Shaden Shaar, B. Thymes, S. Chaixanien, C. Cardie, B. Hariharan

CVPR · 2026 · Conference

An open-ended video-QA benchmark built from movie recaps that stress-tests whether models can reason over long-form narrative, not just short clips. Paired with baselines that expose a large gap between human and model performance on grounded, cross-modal questions.

Long-form Generation
Are Triggers Needed for Document-Level Event Extraction?

Shaden Shaar, W. Chen, M. Chatterjee, B. Wang, W. Zhao, C. Cardie

TACL · 2025 · Journal · 2 citations

Revisits a long-standing assumption in event extraction — that explicit trigger annotations are required — and shows that trigger-free formulations can match or exceed trigger-based pipelines at the document level.

Events
Assisting the Human Fact-Checkers: Detecting All Previously Fact-Checked Claims in a Document

Shaden Shaar, N. Georgiev, F. Alam, G. Da San Martino, A. Mohamed, P. Nakov

EMNLP · 2022 · Findings · Conference · 45 citations

Scales fact-checked-claim detection from isolated sentences to full documents, where each claim must be located and matched jointly. Introduces a document-level dataset and retrieval+ranking system tuned for real fact-checker workflows.

Fact-checking
Prta: A System to Support the Analysis of Propaganda Techniques in the News

G. Da San Martino, Shaden Shaar, Y. Zhang, S. Yu, A. Barrón-Cedeño, P. Nakov

ACL · 2020 · Conference · 93 citations

An end-to-end system for highlighting 18 propaganda techniques in news articles, paired with a public web interface. Recognized with an Honorable Mention for Best Demo at ACL 2020.

Propaganda

That Is a Known Lie: Detecting Previously Fact-Checked Claims

Shaden Shaar, G. Da San Martino, N. Babulkov, P. Nakov

ACL · 2020 · Conference · 241 citations

Formalizes "previously fact-checked claim detection" as a ranking task and releases the first dataset for it, showing that reusing existing fact-checks is a practical alternative to verifying every claim from scratch.

Fact-checking

Education

Aug. 2021 — Present

Ph.D. in Computer Science

Cornell University · Ithaca, NY, USA

Minor: Applied Mathematics.
Aug. 2021 — May 2024

M.S. in Computer Science

Cornell University · Ithaca, NY, USA
Aug. 2015 — May 2019

B.S. in Computer Science

Carnegie Mellon University · Doha, Qatar

Minor in Mathematics; University Honors

Teaching Experience

Cornell University

Fall 2022

CS 4740: Natural Language Processing
Spring 2022

CS 5780: Introduction to Machine Learning

Carnegie Mellon University

Fall 2018 & Spring 2019

11-785: Introduction to Deep Learning
Spring 2018 & Spring 2019

15-251: Great Theoretical Ideas in Computer Science
Fall 2017

15-213: Introduction to Computer Systems
Fall 2016 & Fall 2017

15-112: Fundamentals of Programming
Spring 2016

21-127: Concepts of Mathematics

Academic Services

2021

Co-organizer. SemEval–2021 Task 6: Detection of Persuasion Techniques in Texts and Images.
2021

Co-organizer. CLEF–2021 CheckThat! Lab on detecting check-worthy and previously fact-checked claims (Bucharest, Romania).
2020

Co-organizer. CLEF–2020 CheckThat! Lab on automatic identification and verification of claims.
2020

Reviewer. ACL, EMNLP, NAACL, EACL, COLING, AAAI, CIKM (Program Committee).

Contact

Happy to hear from potential collaborators or anyone curious about the work. The best way to reach me is by email: sshaar31@gmail.com. A printable CV is available as a PDF.
