TACL

Red Teaming Language Model Detectors with Language Models
Kai-Wei Chang; Zhouxing Shi; Yihan Wang; Fan Yin; Xiangning Chen; Cho-Jui Hsieh

Semantics of Multiword Expressions in Transformer-Based Models: A Survey
Filip Miletić; Sabine Schulte im Walde

ConvoSense: Overcoming Monotonous Commonsense Inferences for Conversational AI
Sarah E Finch; Jinho Choi

Revisiting Meta-evaluation for Grammatical Error Correction
Masamune Kobayashi; Masato Mita; Mamoru Komachi

L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models
Ansong Ni; Pengcheng Yin; Yilun Zhao; Martin Riddell; Troy Feng; Rui Shen; Stephen Yin; Ye Liu; Semih Yavuz; Caiming Xiong; Shafiq Joty; Yingbo Zhou; Dragomir Radev; Arman Cohan

AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning
Han Zhou; Xingchen Wan; Ivan Vulić; Anna Korhonen

ARN: Analogical Reasoning on Narratives
Zhivar Sourati; Filip Ilievski; Pia Sommerauer; Yifan Jiang

Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
Anders Søgaard; Jiaang Li; Yova Kementchedjhieva; Constanza Fierro

Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization
Nikolaos Aletras; Zhixue Zhao; George Chrysostomou; Miles Williams

Context-Aware Machine Translation with Source Coreference Explanation
Huy Hien Vu; Hidetaka Kamigaito; Taro Watanabe

Do LLMs Exhibit Human-like Response Biases? A Case Study in Survey Design
Lindia Tjuatja; Valerie Chen; Tongshuang Wu; Ameet Talwalkar; Graham Neubig

Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding
Ukyo Honda; Tatsushi Oka; Peinan Zhang; Masato Mita

Grammatical Gender’s Influence on Distributional Semantics: A Causal Perspective
Karolina Ewa Stanczak; Kevin Du; Adina Williams; Isabelle Augenstein; Ryan Cotterell

Retrieval-Pretrained Transformer: Long-range Language Modeling with Self-retrieval
Ohad Rubin; Jonathan Berant

Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers
Melanie Subbiah; Sean Zhang; Lydia Chilton; Kathleen McKeown

Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs
Tanise Ceron; Neele Falk; Ana Barić; Dmitry Nikolaev; Sebastian Padó

Are Language Models More Like Libraries or Like Librarians? Bibliotechnism, the Novel Reference Problem, and the Attitudes of LLMs
Kyle Mahowald; Harvey Lederman

Self-supervised Topic Taxonomy Discovery in the Box Embedding Space
Yuyin Lu; Hegang Chen; Pengbo Mao; Yanghui Rao; Haoran Xie; Fu Lee Wang; Qing Li

Hierarchical Indexing for Retrieval-Augmented Opinion Summarization
Tom Hosking; Hao Tang; Mirella Lapata

SPIRIT-LM: Interleaved Spoken and Written Language Model
Tu Anh Nguyen; Benjamin Muller; Bokai Yu; Marta Costa-jussà; Maha Elbayad; Sravya Popuri; Paul-Ambroise Duquenne; Robin Algayres; Ruslan Mavlyutov; Itai Gat; Gabriel Synnaeve; Juan Pino; Benoît Sagot; Emmanuel Dupoux; Christophe Ropers; Mary Williamson

FINCH: Key-Value Cache Compression for Large Language Model’s Semantic Memory
Giulio Corallo; Paolo Papotti

Holmes Benchmark Linguistic Knowledge in Language Models
Andreas Waldis; Yotam Perlitz; Leshem Choshen; Yufang Hou; Iryna Gurevych

Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?
Vagrant Gautam; Eileen Bingert; Dawei Zhu; Anne Lauscher; Dietrich Klakow

Conformal Prediction for Natural Language Processing: A Survey
Margarida M. Campos; António Farinhas; Chrysoula Zerva; Mário A. T. Figueiredo; André F. T. Martins

Toward Robust RALMs: Revealing the Impact of Imperfect Retrieval on Retrieval-Augmented Language Models
Jay Lee; Seongil Park

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Ryo Kamoi; Yusen Zhang; Nan Zhang; Jiawei Han; Rui Zhang

Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence
Abhinav Patil; Jaap Jumelet; Yu Ying Chiu; Andy Lapastora; Peter Shen; Lexie Wang

Investigating Critical Period Effects in Language Acquisition through Neural Language Models
Alex Warstadt; Ionut Constantinescu; Tiago Pimentel; Ryan Cotterell

Rescue Conversations from Dead-ends: Efficient Exploration for Task-oriented Dialogue Policy Optimization
Yangyang Zhao; Mehdi Dastani; Jinchuan Long; Zhenyu Wang; Shihan Wang

IndoCulture: Exploring Geographically-Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces
Fajri Koto; Rahmad Mahendra; Nurul Aisyah; Timothy Baldwin

DOLOMITES: Domain-Specific Long-Form Methodical Tasks
Chaitanya Malaviya; Priyanka Agrawal; Kuzman Ganchev; Pranesh Srinivasan; Fantine Huot; Jonathan Berant; Mark Yatskar; Dipanjan Das; Mirella Lapata; Chris Alberti

CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems
Sara Rosenthal; Avirup Sil; Radu Florian; Salim Roukos