How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment
arXiv cs.AI (Artificial Intelligence)
75 items · Knowledge Graphs, Ontologies & Intent/Semantic Modeling
What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
I Know What You Meme, Even If it Emerged Today: Understanding Evolving Memes through Open-World Knowledge Acquisition
GITCO: Gated Inference-Time Context Optimization in TSFMs
Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory
SentinelBench: A Benchmark for Long-Running Monitoring Agents
An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)
Synthetic Contrastive Reasoning for Multi-Table Q&A
Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges
Residual Modeling for High-Fidelity Learned Compression of Scientific Data
LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization
Harnessing Generalist Agents for Contextualized Time Series
Agents' Last Exam
Mutation Without Variation: Convergence Dynamics in LLM-Driven Program Evolution
A Motivational Architecture for Conversational AGI
Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification
Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection
Thinking Through Signs: PEEL as a Semiotic Scaffolding for Epistemically Accountable AI-Enabled Research
SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models
Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal
VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark
StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis
Can Generalist Agents Automate Data Curation?
Characterizing initial human-AI proof formalization workflows
The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents
Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline
The Digital Apprentice: A Framework for Human-Directed Agentic AI Development
Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval
Not All Errors Are Equal: Consequence-Aware Reasoning Compute Allocation
Trivium: Temporal Regret as a First-Class Objective for Causal-Memory Controllers
Visual Graph Scaffolds for Structural Reasoning in Large Language Models
AURA: Action-Gated Memory for Robot Policies at Constant VRAM
Evaluating Transformer and LSTM Frameworks for Prediction in Ungauged Basins
BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces
ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning
Traj-Evolve: A Self-Evolving Multi-Agent System for Patient Trajectory Modeling in Lung Cancer Early Detection
An Exploration of Collision-based Enemy Morphology Generation
Thinking Past the Answer: Evaluating Harmful Overthinking in Large Reasoning Models
Toward a Modular Architecture for Embedded AI Agent Systems at the Edge
Don't Gamble, GAMBLe: An Analytical Framework for AI-Driven Research Systems
When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning
Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks
Large AI Models in Dental Healthcare: From General-Purpose Systems to Domain-Specific Foundation Models
What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents
WISE-HAR: A Generalizable Ensemble Deep Learning Framework for WiFi-Based Human Activity Recognition
Position Paper: Post-Solve Robustness in Decision Engines: Feasible Regions and Smoothness Under Perturbations
Emergent Collaborative Deliberation in Multi-Model AI Systems: A BFT-Derived Protocol for Epistemic Synthesis
Deliberative Curation: A Protocol for Multi-Agent Knowledge Bases
Agents on a Tree: Pathwise Coordination for Multi-Objective Molecular Optimization
Optimal Transport-based Permutation-Invariant Bayesian Optimization of Offshore Wind Farm Layouts
MindGames Arena Generalization Track: In2AI Solution with Delayed Per-Step Reward Attribution
Universal Quantum Transformer
Grokers: Bottom-Up Inductive Comprehension and Write-Time Intelligence over Typed Knowledge Graphs
Product-Aware Deep Autoencoders for Robust Process Monitoring in Multi-Product Cyber-Physical Systems
On the evolution of the concept of probability as a mirror of the evolution of reason
Evaluating Interactive Reasoning in Large Language Models: A Hierarchical Benchmark with Executable Games
A Multi-AI-agent Framework Enabling End-to-end Finite Element Analysis for Solid Mechanics Problems
CAST: Non-Privileged Clipped Asymmetric Self-Teaching with Advantage Flipping for GRPO
TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal Generation
MindZero: Learning Online Mental Reasoning With Zero Annotations
PhyDrawGen: Physically Grounded Diagram Generation from Natural Language
Physically Viable World Models: A Case for Query-Conditioned Embodied AI
Transforming and Encoding FTS for SAT Solving: What Helps, What Hurts (Extended Version)
Procedural Generation of First Person Shooter Maps using Map-Elites
Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving
Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents
EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs
Structure-Induced Information for Rerooting Levin Tree Search
Healthcare Mechanisms from Policy-as-Code Search under Strategic Provider Response
MAVEN: Improving Generalization in Agentic Tool Calling
Generating Graph-like Rules for Knowledge Graph Reasoning via Diffusion Models
Learning Agent-Compatible Context Management for Long-Horizon Tasks
PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges
Planner-Centric Reinforcement Learning for Deep Research with Structure-Aware Reward
SLAT: Segment-Level Adaptive Trimming for Efficient CoT Reasoning