arXiv — Sound (cs.SD)

75 items · Game Audio Craft & Adaptive-Audio Tech

Task-Vector Arithmetic for Emotional Expressivity Control in Language-Model-Based Text-to-Speech arXiv — Sound (cs.SD) 8h
nnAudio 2: Overcoming Dynamic Compilation Barriers and Transform Inconsistencies arXiv — Sound (cs.SD) 8h
Exploring LLMs for South Asian Music Understanding and Generation arXiv — Sound (cs.SD) 8h
Probing Spatial Structure in Pretrained Audio Representations arXiv — Sound (cs.SD) 8h
Sound Effects Dataset Unification With the Universal Category System arXiv — Sound (cs.SD) 8h
SB-RF: Schrödinger Bridge Rectified Flow for One-Step Robust Speech Enhancement arXiv — Sound (cs.SD) 8h
Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition arXiv — Sound (cs.SD) 8h
Do speech foundation models perceive speaker similarity as humans do? arXiv — Sound (cs.SD) 8h
Sagnac-Assisted Enhanced OTDR for Distributed Acoustic Sensing: A Standardized Benchmark and Engineering Evaluation Framework arXiv — Sound (cs.SD) 8h
UniVoice: A Unified Model for Speech and Singing Voice Generation arXiv — Sound (cs.SD) 8h
GLASS: GRPO-Trained LoRA for Acoustic Style Steering in Zero-Shot Text-to-Speech arXiv — Sound (cs.SD) 8h
Beyond WER: A Paired Acoustic Stress Test for Ambient Clinical Scribes arXiv — Sound (cs.SD) 8h
DBHN-Net: Dual-Branch Hybrid Neural Network For Low-Complexity Monaural Speech Enhancement arXiv — Sound (cs.SD) 8h
SpeechJBB: Probing Safety Alignment and Comprehension in Large Audio Language Models under Code-Switched Speech arXiv — Sound (cs.SD) 8h
Learning Emotion-discriminative Representations for Zero-Shot Cross-lingual Speech Emotion Recognition arXiv — Sound (cs.SD) 8h
Channel-Oriented Design for EEG-to-Music Reconstruction arXiv — Sound (cs.SD) yest
The Differentiable Auditory Loop (DAL): An ML Framework for Hyper-Personalized Hearing Aids arXiv — Sound (cs.SD) yest
Feasibility of Time-Domain DNN-Based Speech Enhancement on Embedded FPGA for Hearing Aid arXiv — Sound (cs.SD) yest
Gauss Circle Lattices with Geometric Convolutions for Synthesizing High Dimensional Image-Source Room Impulse Responses arXiv — Sound (cs.SD) yest
CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding arXiv — Sound (cs.SD) yest
A Second-Order Cepstral Signature of Contact-Vibration Sounds Reproduced by Laptop Loudspeakers: A Synthetic Case Study arXiv — Sound (cs.SD) yest
Flow-HOA: Generative Joint Optimization for Ambisonics Encoding via Flow Matching arXiv — Sound (cs.SD) yest
SHB-AE: Spherical harmonic beamforming based Ambisonics encoding and upscaling method for smartphone microphone array arXiv — Sound (cs.SD) yest
Drift-Augmented Scoring: Text-Derived Noise Robustness for Zero-Shot Audio-Language Classification arXiv — Sound (cs.SD) yest
SURF: Separation via Unsupervised Remixing Flow arXiv — Sound (cs.SD) yest
FoeGlass: Simple In-Context Learning Is Enough for Red Teaming Audio Deepfake Detectors arXiv — Sound (cs.SD) yest
Audio Interaction Model arXiv — Sound (cs.SD) yest
Beyond Text Following: Repairable Arbitration Reversals in Audio-Language Models arXiv — Sound (cs.SD) yest
DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities arXiv — Sound (cs.SD) yest
Representation Matters in Randomized Smoothing for Audio Classification arXiv — Sound (cs.SD) yest
SegTune: Structured and Fine-Grained Control for Song Generation arXiv — Sound (cs.SD) Jun 3
EntangleCodec: A Unified Discrete Audio Tokenizer via Semantic-Acoustic Entanglement arXiv — Sound (cs.SD) Jun 3
A Training-Efficient Transformer-Based Anti-Spoofing Network for Logical Access in ASVspoof 5 arXiv — Sound (cs.SD) Jun 3
Audio Spotforming via Post-Filtering Using Cross-Array Non-target Estimates arXiv — Sound (cs.SD) Jun 3
SketchSong: Hierarchical Song Generation with Sketch Planning and Fine-Grained Multi-Track Modeling arXiv — Sound (cs.SD) Jun 3
Speech Emotion Recognition using Attention-based LSTM-Network with Residual Connection arXiv — Sound (cs.SD) Jun 3
Tonal parsimony in chord-sequence analysis: combining modulation cost and tonal vocabulary arXiv — Sound (cs.SD) Jun 3
Foley-Omni: A Unified Multimodal Generation Model from Task-Level Audio Synthesis to Complete Video Soundtrack Generation arXiv — Sound (cs.SD) Jun 3
LiveBand: Live Accompaniment Generation in the Audio Domain arXiv — Sound (cs.SD) Jun 3
FSA-GRPO: Teaching Auditory LLMs to Use Few-shot Demonstrations arXiv — Sound (cs.SD) Jun 3
Wavelet as Tokenizer: Preliminary Results on a Shared Wavelet Token Schema for Natural Signals arXiv — Sound (cs.SD) Jun 3
SVHalluc: Benchmarking Speech-Vision Hallucination in Audio-Visual Large Language Models arXiv — Sound (cs.SD) Jun 3
Before Fusion, Ask What to Keep: Contextual Calibration of Multimodal Signals arXiv — Sound (cs.SD) Jun 3
A Comparison of Generative and Discriminative Methods for Speech Enhancement: Robustness, Complexity, and Hallucination arXiv — Sound (cs.SD) Jun 3
AnyAudio-Judge: A Dynamic Rubric-Based Benchmark and Evaluator for Audio Instruction Following arXiv — Sound (cs.SD) Jun 3
DUET: Unified Dual-Space Emotion Control for Diffusion and Flow-Matching Driven Text-to-Speech arXiv — Sound (cs.SD) Jun 2
Quality Audio Prototyping: a prototype system for unified sound retrieval and procedural generation arXiv — Sound (cs.SD) Jun 2
Beyond the Mouth: Upper-Face Affective Cues in Audiovisual Sentence Recognition under Acoustic Uncertainty arXiv — Sound (cs.SD) Jun 2
Sympatheia: Emotionally Adaptive Voice Assistant with Continuous Affect Conditioning arXiv — Sound (cs.SD) Jun 2
MelT: GEMM-Native NDFT for Efficient Single-Stage Audio Frontends on Modern Accelerators arXiv — Sound (cs.SD) Jun 2
A Lightweight Slot-Attention Framework for Multi-Instrument Multi-Pitch Estimation arXiv — Sound (cs.SD) Jun 2
UniVocal: Unified Speech-Singing Code-Switching Synthesis arXiv — Sound (cs.SD) Jun 2
HAIM: Human-AI Music Datasets for AI Music Production Tracking Benchmark arXiv — Sound (cs.SD) Jun 2
JenBridge: Adaptive Long-Form Video Soundtracking across Scene Transitions arXiv — Sound (cs.SD) Jun 2
MOSS-Audio Technical Report arXiv — Sound (cs.SD) Jun 2
Echo: A Joint-Embedding Predictive Architecture for Speaker Diarization and Speech Recognition in a Shared Latent Space arXiv — Sound (cs.SD) Jun 2
C2GA: A Class-Controllable Generative Augmentation Framework for Respiratory Sound Classification arXiv — Sound (cs.SD) Jun 2
Parameter-efficient Dual-encoder Architecture with Differentiable Choquet Integral Fusion for Underwater Acoustic Classification arXiv — Sound (cs.SD) Jun 2
DAStatFormer: A Hybrid Multibranch Transformer with Statistical Feature Integration for DAS-Based Pattern Recognitions arXiv — Sound (cs.SD) Jun 2
Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection arXiv — Sound (cs.SD) Jun 2
Mental Damage: Caption Poisoning Attacks on Retrieval-Augmented Text-to-Music Generation arXiv — Sound (cs.SD) Jun 1
3DAE: Binaural Quality Assessment for Audio Novel View Synthesis with Spatial Maps and Benchmark arXiv — Sound (cs.SD) Jun 1
Chatterbox-Flash: Prior-Calibrated Block Diffusion for Streaming Zero-Shot TTS arXiv — Sound (cs.SD) Jun 1
AnchorSteer: Self-Discovered Concept Injection for Structure-Preserving Music Editing arXiv — Sound (cs.SD) Jun 1
Sound effects in media: A comparative analysis of recorded and synthetic samples in live-action and animation arXiv — Sound (cs.SD) Jun 1
MindVoice: Reconstructing Intelligible Speech from Non-invasive Neural Signals with Pretrained Priors arXiv — Sound (cs.SD) Jun 1
Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation arXiv — Sound (cs.SD) Jun 1
Escaping the Linearity Trap: Manifold Detours for Black-Box Adversarial Attacks on Singing Audio Deepfake Detection arXiv — Sound (cs.SD) Jun 1
Audio Pirates: Black-box Audio Watermark Removal via Diffusion Priors arXiv — Sound (cs.SD) Jun 1
GaMi: Geometry-Agnostic Material Identification via Cross-Modal Subtractive Disentanglement arXiv — Sound (cs.SD) Jun 1
A Unified and Reproducible Experimentation Framework for Speech Understanding arXiv — Sound (cs.SD) Jun 1
Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer arXiv — Sound (cs.SD) Jun 1
DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs arXiv — Sound (cs.SD) Jun 1
Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus arXiv — Sound (cs.SD) Jun 1
UniAudio-Token: Empowering Semantic Speech Tokenizers with General Audio Perception arXiv — Sound (cs.SD) Jun 1
