arXiv eess.AS (Audio and Speech Processing)

75 items · Generative Audio & Music Models

Age-Aware Adapter Tuning for Children's Speech Recognition arXiv eess.AS (Audio and Speech Processing) 8h
Enhancing Audio Captioning with Auxiliary AudioSet Semantics arXiv eess.AS (Audio and Speech Processing) 8h
M2S-AVSR: Modality-aware Multi-view Self-supervised Representation for Robust Audio-Visual Speech Recognition arXiv eess.AS (Audio and Speech Processing) 8h
An Ultra-Low-Bitrate Neural Speech Codec with Plain-to-Pseudo Synergistic Vector Quantization arXiv eess.AS (Audio and Speech Processing) 8h
VoCodec: A Low-bitrate Streamable Neural Speech Codec with Voicing-driven Quantization arXiv eess.AS (Audio and Speech Processing) 8h
CoSTA: Cognitive-State-Conditioned TTS Data Augmentation Using ASR Transcripts for Alzheimer's Disease Detection arXiv eess.AS (Audio and Speech Processing) 8h
Revisiting Lexicon Evaluation in Unsupervised Word Discovery arXiv eess.AS (Audio and Speech Processing) 8h
USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding arXiv eess.AS (Audio and Speech Processing) 8h
MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models arXiv eess.AS (Audio and Speech Processing) 8h
Task-Vector Arithmetic for Emotional Expressivity Control in Language-Model-Based Text-to-Speech arXiv eess.AS (Audio and Speech Processing) 8h
nnAudio 2: Overcoming Dynamic Compilation Barriers and Transform Inconsistencies arXiv eess.AS (Audio and Speech Processing) 8h
Exploring LLMs for South Asian Music Understanding and Generation arXiv eess.AS (Audio and Speech Processing) 8h
Probing Spatial Structure in Pretrained Audio Representations arXiv eess.AS (Audio and Speech Processing) 8h
Domain-Aware Mispronunciation Detection and Diagnosis Using Language-Specific Statistical Graphs arXiv eess.AS (Audio and Speech Processing) 8h
Sound Effects Dataset Unification With the Universal Category System arXiv eess.AS (Audio and Speech Processing) 8h
Representation Matters in Randomized Smoothing for Audio Classification arXiv eess.AS (Audio and Speech Processing) yest
Masked Wavelet Scattering Transform Neural Field for Sound Field Reconstruction arXiv eess.AS (Audio and Speech Processing) yest
Read What You Hear: Reference-Free Hypotheses Evaluation with Acoustic Discrepancy arXiv eess.AS (Audio and Speech Processing) yest
UAT: Unified Audio-Text Diffusion for Audio Generation, Editing, and Captioning arXiv eess.AS (Audio and Speech Processing) yest
Differentiable Articulatory Copy-Synthesis of Biphonic Singing arXiv eess.AS (Audio and Speech Processing) yest
Channel-Oriented Design for EEG-to-Music Reconstruction arXiv eess.AS (Audio and Speech Processing) yest
The Differentiable Auditory Loop (DAL): An ML Framework for Hyper-Personalized Hearing Aids arXiv eess.AS (Audio and Speech Processing) yest
Feasibility of Time-Domain DNN-Based Speech Enhancement on Embedded FPGA for Hearing Aid arXiv eess.AS (Audio and Speech Processing) yest
Gauss Circle Lattices with Geometric Convolutions for Synthesizing High Dimensional Image-Source Room Impulse Responses arXiv eess.AS (Audio and Speech Processing) yest
CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding arXiv eess.AS (Audio and Speech Processing) yest
Entity Binding Failures in Speech LLM Reasoning: Diagnosis and Chain-of-Thought Intervention arXiv eess.AS (Audio and Speech Processing) yest
Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026 arXiv eess.AS (Audio and Speech Processing) yest
SURF: Separation via Unsupervised Remixing Flow arXiv eess.AS (Audio and Speech Processing) yest
Audio Interaction Model arXiv eess.AS (Audio and Speech Processing) yest
A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References arXiv eess.AS (Audio and Speech Processing) yest
FSA-GRPO: Teaching Auditory LLMs to Use Few-shot Demonstrations arXiv eess.AS (Audio and Speech Processing) Jun 3
Wavelet as Tokenizer: Preliminary Results on a Shared Wavelet Token Schema for Natural Signals arXiv eess.AS (Audio and Speech Processing) Jun 3
SVHalluc: Benchmarking Speech-Vision Hallucination in Audio-Visual Large Language Models arXiv eess.AS (Audio and Speech Processing) Jun 3
A Comparison of Generative and Discriminative Methods for Speech Enhancement: Robustness, Complexity, and Hallucination arXiv eess.AS (Audio and Speech Processing) Jun 3
AnyAudio-Judge: A Dynamic Rubric-Based Benchmark and Evaluator for Audio Instruction Following arXiv eess.AS (Audio and Speech Processing) Jun 3
SpeakerCard-1M: An Evidence-Grounded Speaker Card Corpus for In-the-Wild Speaker Verification arXiv eess.AS (Audio and Speech Processing) Jun 3
WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling arXiv eess.AS (Audio and Speech Processing) Jun 3
Stable Hybrid Cross-Attention Fusion for Audio-Visual Event Recognition arXiv eess.AS (Audio and Speech Processing) Jun 3
In-the-Loop Training of Deep Feedback Cancellation for Hearing Aids arXiv eess.AS (Audio and Speech Processing) Jun 3
SegTune: Structured and Fine-Grained Control for Song Generation arXiv eess.AS (Audio and Speech Processing) Jun 3
Before Fusion, Ask What to Keep: Contextual Calibration of Multimodal Signals arXiv eess.AS (Audio and Speech Processing) Jun 3
EntangleCodec: A Unified Discrete Audio Tokenizer via Semantic-Acoustic Entanglement arXiv eess.AS (Audio and Speech Processing) Jun 3
CoughSense: Five-Class Respiratory Disease Classification via Whisper Encoder Fine-Tuning and Dual-Encoder Cross-Attention Fusion with Balanced Contrastive Learning arXiv eess.AS (Audio and Speech Processing) Jun 3
Inference-Time Scaling for Joint Audio-Video Generation arXiv eess.AS (Audio and Speech Processing) Jun 3
Benchmarking Speech-to-Speech Translation Models arXiv eess.AS (Audio and Speech Processing) Jun 3
Privacy-preserving Prosody Representation Learning arXiv eess.AS (Audio and Speech Processing) Jun 2
Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection arXiv eess.AS (Audio and Speech Processing) Jun 2
Context-aware child-directed speech detection from long-form recordings arXiv eess.AS (Audio and Speech Processing) Jun 2
Description and Discussion on DCASE 2026 Challenge Task 2: Noise-aware Unsupervised Anomalous Sound Detection for Machine Condition Monitoring arXiv eess.AS (Audio and Speech Processing) Jun 2
RRP-Voice: A Longitudinal Dataset and Benchmark for Recurrent Respiratory Papillomatosis Detection arXiv eess.AS (Audio and Speech Processing) Jun 2
Kinship Verification Using Voice arXiv eess.AS (Audio and Speech Processing) Jun 2
SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing arXiv eess.AS (Audio and Speech Processing) Jun 2
Advancing Electrolaryngeal Speech Enhancement Through Speech-Text Representation Learning arXiv eess.AS (Audio and Speech Processing) Jun 2
Localizing broadband noise sources using the Loève spectrum and a 2.5D approach arXiv eess.AS (Audio and Speech Processing) Jun 2
Domain-Agnostic Incremental Learning for Sound Classification. A DCASE 2026 Challenge task arXiv eess.AS (Audio and Speech Processing) Jun 2
Breaking the Pair: Evaluating Dyadic Interaction via Speaker Switching arXiv eess.AS (Audio and Speech Processing) Jun 2
SiamCTC: Learning Speech Representations through Monotonic Temporal Alignment arXiv eess.AS (Audio and Speech Processing) Jun 2
Exploiting Noise Inseparability for Weakly-Supervised Discriminative Speech Denoising Using Noisy Targets arXiv eess.AS (Audio and Speech Processing) Jun 2
SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription arXiv eess.AS (Audio and Speech Processing) Jun 2
DUET: Unified Dual-Space Emotion Control for Diffusion and Flow-Matching Driven Text-to-Speech arXiv eess.AS (Audio and Speech Processing) Jun 2
Extracting accent features in spoken Brazilian Portuguese without sociolinguistic labels arXiv eess.AS (Audio and Speech Processing) Jun 1
FiPA-SR -- FiLM-Conditioned Perceptually Informed Audio Super-Resolution arXiv eess.AS (Audio and Speech Processing) Jun 1
OpenSTBench: Beyond Semantic Evaluation for Speech Translation arXiv eess.AS (Audio and Speech Processing) Jun 1
A Unified and Reproducible Experimentation Framework for Speech Understanding arXiv eess.AS (Audio and Speech Processing) Jun 1
Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer arXiv eess.AS (Audio and Speech Processing) Jun 1
ImmersiveTTS: Environment-Aware Text-to-Speech with Multimodal Diffusion Transformer and Domain-Specific Representation Alignment arXiv eess.AS (Audio and Speech Processing) Jun 1
SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue arXiv eess.AS (Audio and Speech Processing) Jun 1
On the Use of Dereverberation for Acoustic Feedback Cancellation arXiv eess.AS (Audio and Speech Processing) Jun 1
Improving acoustic drone detection generalization through pretraining and data augmentation arXiv eess.AS (Audio and Speech Processing) Jun 1
UNISON: A Unified Sound Generation and Editing Framework via Deep LLM Fusion arXiv eess.AS (Audio and Speech Processing) Jun 1
Mental Damage: Caption Poisoning Attacks on Retrieval-Augmented Text-to-Music Generation arXiv eess.AS (Audio and Speech Processing) Jun 1
Escaping the Linearity Trap: Manifold Detours for Black-Box Adversarial Attacks on Singing Audio Deepfake Detection arXiv eess.AS (Audio and Speech Processing) Jun 1
Chatterbox-Flash: Prior-Calibrated Block Diffusion for Streaming Zero-Shot TTS arXiv eess.AS (Audio and Speech Processing) Jun 1
Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus arXiv eess.AS (Audio and Speech Processing) Jun 1
Acoustic Simulation Framework for Multi-channel Replay Speech Detection arXiv eess.AS (Audio and Speech Processing) Jun 1
