Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics
arXiv cs.CL (Computation and Language)
75 items · Foundation Models & Frontier AI Labs
Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning
Improving Heart-Focused Medical Question Answering in LLMs via Variance-Aware Rubric Rewards with GRPO
Generic Triple-Latent Compression with Gated Associative Retrieval
PEFT of SLM for Telecommunications Customer Support: A Comparative Study of LoRA Configurations with Energy Consumption Analysis
MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models
Efficient Punctuation Restoration via Weighted Lookahead Scoring Method for Streaming ASR Systems
From Scoring to Explanations: Evaluating SHAP and LLM Rationales for Rubric-based Teaching Quality Assessment
Multi-Granularity Reasoning for Natural Language Inference
LANTERN: Layered Archival and Temporal Episodic Retrieval Network for Long-Context LLM Conversations
The Granularity Gap: A Multi-Dimensional Longitudinal Audit of Sycophancy in Gemini Models
LoRi: Low-Rank Distillation for Implicit Reasoning
A Model of Multi-turn Human Persuadability Using Probabilistic Belief Tracing
Self-supervised User Profile Generation for Personalization
Trajectory Dynamics in Language Model Hidden States Predict Human Processing Costs Beyond Surprisal
POLARIS: Guiding Small Models to Write Long Stories
Discourse-Role Labels as Presentation-Time Variables for Context Use in Language Models
Computational conceptual history of scientific concepts: From early digital methods to LLMs
SaliMory: Orchestrating Cognitive Memory for Conversational Agents
When Retrieval Doesn't Help: A Large-Scale Study of Biomedical RAG
Expert-Aware Refusal Steering
A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models
ACAT: A Collaborative Platform for Efficient Aspect-Based Sentiment Dataset Annotation
Cross-Prompt Generalization in Detecting AI-Generated Fake News Using Interpretable Linguistic Features
MM-BizRAG: Rethinking Multimodal Retrieval-Augmented Generation for General Purpose Enterprise Q&A
Supportive Token Revealing for Fast Diffusion Language Model Decoding
Can I Take Another Dose? Evaluating LLM Decision-Making Under Temporal Uncertainty in OTC Dosing QA
Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit
Using Text-Based Causal Inference to Disentangle Factors Influencing Online Review Ratings
LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding
IdiomX: A Multilingual Benchmark for Idiom Understanding, Retrieval, and Interpretation
Greener Than Humans? Environmental Attitudes in Large Language Models
On the Persistent Effects of Lexicality in Large Language Models
Topics as Proxies for Sociodemographics: How Conversational Context Affects LLM Answers
Do Value Vectors in Deep Layers Need Context from the Residual Stream?
Translating Classical Poetry into Modern Prose
Fixing FOLIO and MALLS: Verified Annotations and an LLM-assisted Framework to Focus Human Relabeling
Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions
Adaptive Latent Agentic Reasoning
Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States
WRIT: Write-Read Intensive Trajectory Synthesis for Multi-Turn User-Facing Agents
The Ghost Annotator: a Framework to Explore Human Label Variation in Content Moderation through Conformal Prediction
Linguistic Productivity in Large Language Models: Models Coerce, but do not Preempt
Fast-dLLM++: Fréchet Profile Decoding for Faster Diffusion LLM Inference
EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction
DraDDP: A Multimodal Multi-Party Dialogue Discourse Parsing Dataset
Toward Robust In-Context Learning: Leveraging Out-of-distribution Proxies for Target Inaccessible Demonstration Retrieval
AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection
CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards
SENSE: Semantic Embedding Navigation with Soft-gated Evaluation for Retrieval-based Speculative Decoding
lmfaoooo at SemEval-2026 Task 1: Humor Is an Audience. Preference Modeling for Constrained Humor Generation
TrustLDM: Benchmarking Trustworthiness in Language Diffusion Models
ART: Attention Run-time Termination for Efficient Large Language Model Decoding
Cognitive-Linguistic Indicators of Depression in Online Communities: Analysed by DistilBERT and Holographic Reduced Representation
A Multi-Domain Red Teaming Framework for Safety, Robustness, and Fairness Evaluation of Medical Large Language Models
TCAR-Gen: Temporal Graph Retrieval with Evidence Fusion for Knowledge-Grounded Generation
LLMs for Cardiovascular Risk Prediction from Structured Clinical Data
Graph-Augmented Retrieval for Cross-Entity Financial Sentiment Analysis: A Comparative Study
DLLM-JEPA: Joint Embedding Predictive Architectures for Masked Diffusion Language Models
Agreement Metrics for LLM-as-Judge Evaluation: What to Report and Why
Protocol for evaluating ChatGPT in biomedical association generation and verification using a RAG-enabled, cross-model majority voting workflow
Exploring Autonomous Agentic Data Engineering for Model Specialization
Domain Adaptation and Reasoning Frameworks in Language Models: A Controlled Experiment with Historical Cosmology
Cross-Lingual Steering for Figurative Language Generation
Can LLM Teams Play What? Where? When?
Knowledge Graph-Enhanced Zero-Shot Topic Classification: A Multi-Strategy Comparative Study
Your Multimodal Speech Model Says I Have a Face for Radio
When English Rewrites Local Knowledge: Global Narrative Dominance in Large Language Models
Configurable Reward Model for Balanced Safety Alignment
CanLegalRAGBench: Evaluating Retrieval-Augmented Generation on Canadian Case Law
Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs
Auditing LLM Benchmarks with Item Response Theory
Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs
Generalistic or Specific Embeddings, Which is Better? An Empirical Study on Search for Clinical Coding in Non-English Languages
Refining Word-Based Grammatical Error Annotation for L2 Korean