Shaden Shaar
PhD Candidate @ Cornell University
I am a PhD candidate in Computer Science at Cornell University, advised by Prof. Claire Cardie, working on long-form generation in multi-modal settings, specifically long-form question answering and summarization over videos. In parallel, I collaborate with NewYork-Presbyterian Hospital on clinical NLP for heart failure and heart transplant patients. Previously, at the Qatar Computing Research Institute (QCRI, HBKU), I worked with Prof. Preslav Nakov on automated fact-checking, propaganda detection, and COVID-19 misinformation.
Research Interests
- Multi-modal long-form generation. Long-form question answering and summarization over videos and other multi-modal narratives, with an emphasis on coherence across extended outputs.
- Clinical NLP. Using LLMs to surface clinical decision patterns from unstructured medical narratives, with a focus on heart failure and heart transplant care.
- Automated fact-checking (prior work). Detecting, verifying, and justifying claim veracity across multi-modal, variable-length inputs.
- Propaganda detection (prior work). Identifying propaganda and persuasion techniques in news articles, memes, and other multi-modal media.
- COVID-19 misinformation (prior work). Analyzing misinformation, vaccine-related fake news, and the broader infodemic on social platforms during the COVID-19 pandemic.
Highlights
- h-index 24 with ~2,858 citations on Google Scholar.
- Best Demo Award (Honorable Mention) at ACL 2020 for Prta, a system for analyzing propaganda techniques in news.
- University Fellowship at Cornell University (2021) for exceptional preparation and promise.
- University Honors at Carnegie Mellon University (2019); 50% Academic Merit Scholarship at CMU (2015); five-time Dean’s List awardee.
- Co-organized multiple shared tasks at CLEF–CheckThat! (2020, 2021) and SemEval (2021) on automated fact-checking and propaganda detection.
Work Experience
Applied Scientist Intern · Zillow Group · Remote, USA · May 2025 – Aug 2025
- Built conversational assistive AI agents for Zillow's real-estate platform using reinforcement learning.
- Designed reward-modeling and evaluation pipelines for grounded, multi-turn property-search dialogue.
Machine Learning Research Engineer Intern · ScaleAI, Inc. · New York City, NY, USA · Jan 2025 – May 2025
- Built an Arabic-language AI assistant for the Qatari judicial department to help judges review active cases.
- Implemented dense and sparse retrieval over Arabic cassation and supreme-court rulings to look up relevant precedents.
- Designed Arabic case-summarization and event-extraction pipelines that feed the judge-facing platform.
AI/ML Research Intern · Apple · Seattle, WA, USA · May 2022 – Aug 2022
- Worked with Dr. Alex Churchill on zero-shot generation of multi-turn conversation data.
- Built a model that generates conditioned multi-turn dialogue datasets for downstream training.
Research Assistant · Qatar Computing Research Institute (QCRI), HBKU · Doha, Qatar · Jul 2019 – Aug 2021
- Worked with Dr. Preslav Nakov and Prof. Giovanni Da San Martino on fact-checking and propaganda detection.
- Introduced the task of detecting previously fact-checked claims; the resulting methods have been adopted by leading fact-checking organizations.
- Built fact-checking and analysis pipelines for COVID-19 misinformation across social platforms.
- Co-organized shared tasks at CLEF–CheckThat! (2020, 2021) and SemEval–2021 Task 6 on persuasion in memes.
- Built and shipped Prta, a public propaganda-analysis demo (ACL 2020 Best Demo, Honorable Mention).
Selected Publications
Selected recent and impactful publications (32 in total). The full list is on the publications page and on Google Scholar.
- J. Frye, Shaden Shaar, C. Cardie, E. DeFilippis, D. Estrin, G. Sayer, N. Uriel, et al. · JHLT · 2026 · Journal
  Uses an LLM to perform thematic analysis of accepted exception requests for heart transplant candidates, surfacing the clinical rationales that drive decisions in a setting where manual review at scale is infeasible.
- Shaden Shaar, B. Thymes, S. Chaixanien, C. Cardie, B. Hariharan · CVPR · 2026 · Conference
  An open-ended video-QA benchmark built from movie recaps that stress-tests whether models can reason over long-form narrative, not just short clips. It is paired with baselines that expose a large gap between human and model performance on grounded, cross-modal questions.
- Shaden Shaar, W. Chen, M. Chatterjee, B. Wang, W. Zhao, C. Cardie · TACL · 2025 · Journal · 2 citations
  Revisits a long-standing assumption in event extraction, namely that explicit trigger annotations are required, and shows that trigger-free formulations can match or exceed trigger-based pipelines at the document level.
- Shaden Shaar, N. Georgiev, F. Alam, G. Da San Martino, A. Mohamed, P. Nakov · EMNLP · 2022 · Findings · Conference · 45 citations
  Scales fact-checked-claim detection from isolated sentences to full documents, where each claim must be located and matched jointly. Introduces a document-level dataset and a retrieval-and-ranking system tuned for real fact-checker workflows.
- G. Da San Martino, Shaden Shaar, Y. Zhang, S. Yu, A. Barrón-Cedeño, P. Nakov · ACL · 2020 · Conference · 93 citations
  An end-to-end system for highlighting 18 propaganda techniques in news articles, paired with a public web interface. Recognized with an Honorable Mention for Best Demo at ACL 2020.
- Shaden Shaar, G. Da San Martino, N. Babulkov, P. Nakov · ACL · 2020 · Conference · 241 citations
  Formalizes "previously fact-checked claim detection" as a ranking task and releases the first dataset for it, showing that reusing existing fact-checks is a practical alternative to verifying every claim from scratch.
Education
- Ph.D. in Computer Science · Cornell University · Ithaca, NY, USA · Aug 2021 – Present
  Minor in Applied Mathematics.
- M.S. in Computer Science · Cornell University · Ithaca, NY, USA · Aug 2021 – May 2024
- B.S. in Computer Science · Carnegie Mellon University · Doha, Qatar · Aug 2015 – May 2019
  Minor in Mathematics; University Honors.
Teaching Experience
Cornell University
- Fall 2022 · CS 4740: Natural Language Processing
- Spring 2022 · CS 5780: Introduction to Machine Learning
Carnegie Mellon University
- Fall 2018 & Spring 2019 · 11-785: Introduction to Deep Learning
- Spring 2018 & Spring 2019 · 15-251: Great Theoretical Ideas in Computer Science
- Fall 2017 · 15-213: Introduction to Computer Systems
- Fall 2016 & Fall 2017 · 15-112: Fundamentals of Programming
- Spring 2016 · 21-127: Concepts of Mathematics
Academic Services
- 2021 · Co-organizer. SemEval–2021 Task 6: Detection of Persuasion Techniques in Texts and Images.
- 2021 · Co-organizer. CLEF–2021 CheckThat! Lab on detecting check-worthy and previously fact-checked claims (Bucharest, Romania).
- 2020 · Co-organizer. CLEF–2020 CheckThat! Lab on automatic identification and verification of claims.
- 2020 · Reviewer. ACL, EMNLP, NAACL, EACL, COLING, AAAI, CIKM (Program Committee).
Contact
I am happy to hear from potential collaborators or anyone curious about my work.
The best way to reach me is by email: sshaar31@gmail.com.
A printable CV is available as a PDF.