arXiv — Sound (cs.SD)

75 items · Game Audio Craft & Adaptive-Audio Tech

Task-Vector Arithmetic for Emotional Expressivity Control in Language-Model-Based Text-to-Speech

arXiv — Sound (cs.SD) 8h

nnAudio 2: Overcoming Dynamic Compilation Barriers and Transform Inconsistencies

arXiv — Sound (cs.SD) 8h

Exploring LLMs for South Asian Music Understanding and Generation

arXiv — Sound (cs.SD) 8h

Probing Spatial Structure in Pretrained Audio Representations

arXiv — Sound (cs.SD) 8h

Sound Effects Dataset Unification With the Universal Category System

arXiv — Sound (cs.SD) 8h

SB-RF: Schrödinger Bridge Rectified Flow for One-Step Robust Speech Enhancement

arXiv — Sound (cs.SD) 8h

Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition

arXiv — Sound (cs.SD) 8h

Do speech foundation models perceive speaker similarity as humans do?

arXiv — Sound (cs.SD) 8h

Sagnac-Assisted Enhanced OTDR for Distributed Acoustic Sensing: A Standardized Benchmark and Engineering Evaluation Framework

arXiv — Sound (cs.SD) 8h

UniVoice: A Unified Model for Speech and Singing Voice Generation

arXiv — Sound (cs.SD) 8h

GLASS: GRPO-Trained LoRA for Acoustic Style Steering in Zero-Shot Text-to-Speech

arXiv — Sound (cs.SD) 8h

Beyond WER: A Paired Acoustic Stress Test for Ambient Clinical Scribes

arXiv — Sound (cs.SD) 8h

DBHN-Net: Dual-Branch Hybrid Neural Network For Low-Complexity Monaural Speech Enhancement

arXiv — Sound (cs.SD) 8h

SpeechJBB: Probing Safety Alignment and Comprehension in Large Audio Language Models under Code-Switched Speech

arXiv — Sound (cs.SD) 8h

Learning Emotion-discriminative Representations for Zero-Shot Cross-lingual Speech Emotion Recognition

arXiv — Sound (cs.SD) 8h

Channel-Oriented Design for EEG-to-Music Reconstruction

arXiv — Sound (cs.SD) yest

The Differentiable Auditory Loop (DAL): An ML Framework for Hyper-Personalized Hearing Aids

arXiv — Sound (cs.SD) yest

Feasibility of Time-Domain DNN-Based Speech Enhancement on Embedded FPGA for Hearing Aid

arXiv — Sound (cs.SD) yest

Gauss Circle Lattices with Geometric Convolutions for Synthesizing High Dimensional Image-Source Room Impulse Responses

arXiv — Sound (cs.SD) yest

CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding

arXiv — Sound (cs.SD) yest

A Second-Order Cepstral Signature of Contact-Vibration Sounds Reproduced by Laptop Loudspeakers: A Synthetic Case Study

arXiv — Sound (cs.SD) yest

Flow-HOA: Generative Joint Optimization for Ambisonics Encoding via Flow Matching

arXiv — Sound (cs.SD) yest

SHB-AE: Spherical harmonic beamforming based Ambisonics encoding and upscaling method for smartphone microphone array

arXiv — Sound (cs.SD) yest

Drift-Augmented Scoring: Text-Derived Noise Robustness for Zero-Shot Audio-Language Classification

arXiv — Sound (cs.SD) yest

SURF: Separation via Unsupervised Remixing Flow

arXiv — Sound (cs.SD) yest

FoeGlass: Simple In-Context Learning Is Enough for Red Teaming Audio Deepfake Detectors

arXiv — Sound (cs.SD) yest

Audio Interaction Model

arXiv — Sound (cs.SD) yest

Beyond Text Following: Repairable Arbitration Reversals in Audio-Language Models

arXiv — Sound (cs.SD) yest

DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities

arXiv — Sound (cs.SD) yest

Representation Matters in Randomized Smoothing for Audio Classification

arXiv — Sound (cs.SD) yest

SegTune: Structured and Fine-Grained Control for Song Generation

arXiv — Sound (cs.SD) Jun 3

EntangleCodec: A Unified Discrete Audio Tokenizer via Semantic-Acoustic Entanglement

arXiv — Sound (cs.SD) Jun 3

A Training-Efficient Transformer-Based Anti-Spoofing Network for Logical Access in ASVspoof 5

arXiv — Sound (cs.SD) Jun 3

Audio Spotforming via Post-Filtering Using Cross-Array Non-target Estimates

arXiv — Sound (cs.SD) Jun 3

SketchSong: Hierarchical Song Generation with Sketch Planning and Fine-Grained Multi-Track Modeling

arXiv — Sound (cs.SD) Jun 3

Speech Emotion Recognition using Attention-based LSTM-Network with Residual Connection

arXiv — Sound (cs.SD) Jun 3

Tonal parsimony in chord-sequence analysis: combining modulation cost and tonal vocabulary

arXiv — Sound (cs.SD) Jun 3

Foley-Omni: A Unified Multimodal Generation Model from Task-Level Audio Synthesis to Complete Video Soundtrack Generation

arXiv — Sound (cs.SD) Jun 3

LiveBand: Live Accompaniment Generation in the Audio Domain

arXiv — Sound (cs.SD) Jun 3

FSA-GRPO: Teaching Auditory LLMs to Use Few-shot Demonstrations

arXiv — Sound (cs.SD) Jun 3

Wavelet as Tokenizer: Preliminary Results on a Shared Wavelet Token Schema for Natural Signals

arXiv — Sound (cs.SD) Jun 3

SVHalluc: Benchmarking Speech-Vision Hallucination in Audio-Visual Large Language Models

arXiv — Sound (cs.SD) Jun 3

Before Fusion, Ask What to Keep: Contextual Calibration of Multimodal Signals

arXiv — Sound (cs.SD) Jun 3

A Comparison of Generative and Discriminative Methods for Speech Enhancement: Robustness, Complexity, and Hallucination

arXiv — Sound (cs.SD) Jun 3

AnyAudio-Judge: A Dynamic Rubric-Based Benchmark and Evaluator for Audio Instruction Following

arXiv — Sound (cs.SD) Jun 3

DUET: Unified Dual-Space Emotion Control for Diffusion and Flow-Matching Driven Text-to-Speech

arXiv — Sound (cs.SD) Jun 2

Quality Audio Prototyping: a prototype system for unified sound retrieval and procedural generation

arXiv — Sound (cs.SD) Jun 2

Beyond the Mouth: Upper-Face Affective Cues in Audiovisual Sentence Recognition under Acoustic Uncertainty

arXiv — Sound (cs.SD) Jun 2

Sympatheia: Emotionally Adaptive Voice Assistant with Continuous Affect Conditioning

arXiv — Sound (cs.SD) Jun 2

MelT: GEMM-Native NDFT for Efficient Single-Stage Audio Frontends on Modern Accelerators

arXiv — Sound (cs.SD) Jun 2

A Lightweight Slot-Attention Framework for Multi-Instrument Multi-Pitch Estimation

arXiv — Sound (cs.SD) Jun 2

UniVocal: Unified Speech-Singing Code-Switching Synthesis

arXiv — Sound (cs.SD) Jun 2

HAIM: Human-AI Music Datasets for AI Music Production Tracking Benchmark

arXiv — Sound (cs.SD) Jun 2

JenBridge: Adaptive Long-Form Video Soundtracking across Scene Transitions

arXiv — Sound (cs.SD) Jun 2

MOSS-Audio Technical Report

arXiv — Sound (cs.SD) Jun 2

Echo: A Joint-Embedding Predictive Architecture for Speaker Diarization and Speech Recognition in a Shared Latent Space

arXiv — Sound (cs.SD) Jun 2

C2GA: A Class-Controllable Generative Augmentation Framework for Respiratory Sound Classification

arXiv — Sound (cs.SD) Jun 2

Parameter-efficient Dual-encoder Architecture with Differentiable Choquet Integral Fusion for Underwater Acoustic Classification

arXiv — Sound (cs.SD) Jun 2

DAStatFormer: A Hybrid Multibranch Transformer with Statistical Feature Integration for DAS-Based Pattern Recognitions

arXiv — Sound (cs.SD) Jun 2

Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection

arXiv — Sound (cs.SD) Jun 2

Mental Damage: Caption Poisoning Attacks on Retrieval-Augmented Text-to-Music Generation

arXiv — Sound (cs.SD) Jun 1

3DAE: Binaural Quality Assessment for Audio Novel View Synthesis with Spatial Maps and Benchmark

arXiv — Sound (cs.SD) Jun 1

Chatterbox-Flash: Prior-Calibrated Block Diffusion for Streaming Zero-Shot TTS

arXiv — Sound (cs.SD) Jun 1

AnchorSteer: Self-Discovered Concept Injection for Structure-Preserving Music Editing

arXiv — Sound (cs.SD) Jun 1

Sound effects in media: A comparative analysis of recorded and synthetic samples in live-action and animation

arXiv — Sound (cs.SD) Jun 1

MindVoice: Reconstructing Intelligible Speech from Non-invasive Neural Signals with Pretrained Priors

arXiv — Sound (cs.SD) Jun 1

Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation

arXiv — Sound (cs.SD) Jun 1

Escaping the Linearity Trap: Manifold Detours for Black-Box Adversarial Attacks on Singing Audio Deepfake Detection

arXiv — Sound (cs.SD) Jun 1

Audio Pirates: Black-box Audio Watermark Removal via Diffusion Priors

arXiv — Sound (cs.SD) Jun 1

GaMi: Geometry-Agnostic Material Identification via Cross-Modal Subtractive Disentanglement

arXiv — Sound (cs.SD) Jun 1

A Unified and Reproducible Experimentation Framework for Speech Understanding

arXiv — Sound (cs.SD) Jun 1

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

arXiv — Sound (cs.SD) Jun 1

DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

arXiv — Sound (cs.SD) Jun 1

Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus

arXiv — Sound (cs.SD) Jun 1

UniAudio-Token: Empowering Semantic Speech Tokenizers with General Audio Perception

arXiv — Sound (cs.SD) Jun 1
