arXiv eess.AS (Audio and Speech Processing)

75 items · Generative Audio & Music Models

Age-Aware Adapter Tuning for Children's Speech Recognition

arXiv eess.AS (Audio and Speech Processing) 8h

Enhancing Audio Captioning with Auxiliary AudioSet Semantics

arXiv eess.AS (Audio and Speech Processing) 8h

M2S-AVSR: Modality-aware Multi-view Self-supervised Representation for Robust Audio-Visual Speech Recognition

arXiv eess.AS (Audio and Speech Processing) 8h

An Ultra-Low-Bitrate Neural Speech Codec with Plain-to-Pseudo Synergistic Vector Quantization

arXiv eess.AS (Audio and Speech Processing) 8h

VoCodec: A Low-bitrate Streamable Neural Speech Codec with Voicing-driven Quantization

arXiv eess.AS (Audio and Speech Processing) 8h

CoSTA: Cognitive-State-Conditioned TTS Data Augmentation Using ASR Transcripts for Alzheimer's Disease Detection

arXiv eess.AS (Audio and Speech Processing) 8h

Revisiting Lexicon Evaluation in Unsupervised Word Discovery

arXiv eess.AS (Audio and Speech Processing) 8h

USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding

arXiv eess.AS (Audio and Speech Processing) 8h

MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models

arXiv eess.AS (Audio and Speech Processing) 8h

Task-Vector Arithmetic for Emotional Expressivity Control in Language-Model-Based Text-to-Speech

arXiv eess.AS (Audio and Speech Processing) 8h

nnAudio 2: Overcoming Dynamic Compilation Barriers and Transform Inconsistencies

arXiv eess.AS (Audio and Speech Processing) 8h

Exploring LLMs for South Asian Music Understanding and Generation

arXiv eess.AS (Audio and Speech Processing) 8h

Probing Spatial Structure in Pretrained Audio Representations

arXiv eess.AS (Audio and Speech Processing) 8h

Domain-Aware Mispronunciation Detection and Diagnosis Using Language-Specific Statistical Graphs

arXiv eess.AS (Audio and Speech Processing) 8h

Sound Effects Dataset Unification With the Universal Category System

arXiv eess.AS (Audio and Speech Processing) 8h

Representation Matters in Randomized Smoothing for Audio Classification

arXiv eess.AS (Audio and Speech Processing) yest

Masked Wavelet Scattering Transform Neural Field for Sound Field Reconstruction

arXiv eess.AS (Audio and Speech Processing) yest

Read What You Hear: Reference-Free Hypotheses Evaluation with Acoustic Discrepancy

arXiv eess.AS (Audio and Speech Processing) yest

UAT: Unified Audio-Text Diffusion for Audio Generation, Editing, and Captioning

arXiv eess.AS (Audio and Speech Processing) yest

Differentiable Articulatory Copy-Synthesis of Biphonic Singing

arXiv eess.AS (Audio and Speech Processing) yest

Channel-Oriented Design for EEG-to-Music Reconstruction

arXiv eess.AS (Audio and Speech Processing) yest

The Differentiable Auditory Loop (DAL): An ML Framework for Hyper-Personalized Hearing Aids

arXiv eess.AS (Audio and Speech Processing) yest

Feasibility of Time-Domain DNN-Based Speech Enhancement on Embedded FPGA for Hearing Aid

arXiv eess.AS (Audio and Speech Processing) yest

Gauss Circle Lattices with Geometric Convolutions for Synthesizing High Dimensional Image-Source Room Impulse Responses

arXiv eess.AS (Audio and Speech Processing) yest

CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding

arXiv eess.AS (Audio and Speech Processing) yest

Entity Binding Failures in Speech LLM Reasoning: Diagnosis and Chain-of-Thought Intervention

arXiv eess.AS (Audio and Speech Processing) yest

Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026

arXiv eess.AS (Audio and Speech Processing) yest

SURF: Separation via Unsupervised Remixing Flow

arXiv eess.AS (Audio and Speech Processing) yest

Audio Interaction Model

arXiv eess.AS (Audio and Speech Processing) yest

A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References

arXiv eess.AS (Audio and Speech Processing) yest

FSA-GRPO: Teaching Auditory LLMs to Use Few-shot Demonstrations

arXiv eess.AS (Audio and Speech Processing) Jun 3

Wavelet as Tokenizer: Preliminary Results on a Shared Wavelet Token Schema for Natural Signals

arXiv eess.AS (Audio and Speech Processing) Jun 3

SVHalluc: Benchmarking Speech-Vision Hallucination in Audio-Visual Large Language Models

arXiv eess.AS (Audio and Speech Processing) Jun 3

A Comparison of Generative and Discriminative Methods for Speech Enhancement: Robustness, Complexity, and Hallucination

arXiv eess.AS (Audio and Speech Processing) Jun 3

AnyAudio-Judge: A Dynamic Rubric-Based Benchmark and Evaluator for Audio Instruction Following

arXiv eess.AS (Audio and Speech Processing) Jun 3

SpeakerCard-1M: An Evidence-Grounded Speaker Card Corpus for In-the-Wild Speaker Verification

arXiv eess.AS (Audio and Speech Processing) Jun 3

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

arXiv eess.AS (Audio and Speech Processing) Jun 3

Stable Hybrid Cross-Attention Fusion for Audio-Visual Event Recognition

arXiv eess.AS (Audio and Speech Processing) Jun 3

In-the-Loop Training of Deep Feedback Cancellation for Hearing Aids

arXiv eess.AS (Audio and Speech Processing) Jun 3

SegTune: Structured and Fine-Grained Control for Song Generation

arXiv eess.AS (Audio and Speech Processing) Jun 3

Before Fusion, Ask What to Keep: Contextual Calibration of Multimodal Signals

arXiv eess.AS (Audio and Speech Processing) Jun 3

EntangleCodec: A Unified Discrete Audio Tokenizer via Semantic-Acoustic Entanglement

arXiv eess.AS (Audio and Speech Processing) Jun 3

CoughSense: Five-Class Respiratory Disease Classification via Whisper Encoder Fine-Tuning and Dual-Encoder Cross-Attention Fusion with Balanced Contrastive Learning

arXiv eess.AS (Audio and Speech Processing) Jun 3

Inference-Time Scaling for Joint Audio-Video Generation

arXiv eess.AS (Audio and Speech Processing) Jun 3

Benchmarking Speech-to-Speech Translation Models

arXiv eess.AS (Audio and Speech Processing) Jun 3

Privacy-preserving Prosody Representation Learning

arXiv eess.AS (Audio and Speech Processing) Jun 2

Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection

arXiv eess.AS (Audio and Speech Processing) Jun 2

Context-aware child-directed speech detection from long-form recordings

arXiv eess.AS (Audio and Speech Processing) Jun 2

Description and Discussion on DCASE 2026 Challenge Task 2: Noise-aware Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

arXiv eess.AS (Audio and Speech Processing) Jun 2

RRP-Voice: A Longitudinal Dataset and Benchmark for Recurrent Respiratory Papillomatosis Detection

arXiv eess.AS (Audio and Speech Processing) Jun 2

Kinship Verification Using Voice

arXiv eess.AS (Audio and Speech Processing) Jun 2

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

arXiv eess.AS (Audio and Speech Processing) Jun 2

Advancing Electrolaryngeal Speech Enhancement Through Speech-Text Representation Learning

arXiv eess.AS (Audio and Speech Processing) Jun 2

Localizing broadband noise sources using the Loève spectrum and a 2.5D approach

arXiv eess.AS (Audio and Speech Processing) Jun 2

Domain-Agnostic Incremental Learning for Sound Classification. A DCASE 2026 Challenge task

arXiv eess.AS (Audio and Speech Processing) Jun 2

Breaking the Pair: Evaluating Dyadic Interaction via Speaker Switching

arXiv eess.AS (Audio and Speech Processing) Jun 2

SiamCTC: Learning Speech Representations through Monotonic Temporal Alignment

arXiv eess.AS (Audio and Speech Processing) Jun 2

Exploiting Noise Inseparability for Weakly-Supervised Discriminative Speech Denoising Using Noisy Targets

arXiv eess.AS (Audio and Speech Processing) Jun 2

SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription

arXiv eess.AS (Audio and Speech Processing) Jun 2

DUET: Unified Dual-Space Emotion Control for Diffusion and Flow-Matching Driven Text-to-Speech

arXiv eess.AS (Audio and Speech Processing) Jun 2

Extracting accent features in spoken Brazilian Portuguese without sociolinguistic labels

arXiv eess.AS (Audio and Speech Processing) Jun 1

FiPA-SR -- FiLM-Conditioned Perceptually Informed Audio Super-Resolution

arXiv eess.AS (Audio and Speech Processing) Jun 1

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

arXiv eess.AS (Audio and Speech Processing) Jun 1

A Unified and Reproducible Experimentation Framework for Speech Understanding

arXiv eess.AS (Audio and Speech Processing) Jun 1

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

arXiv eess.AS (Audio and Speech Processing) Jun 1

ImmersiveTTS: Environment-Aware Text-to-Speech with Multimodal Diffusion Transformer and Domain-Specific Representation Alignment

arXiv eess.AS (Audio and Speech Processing) Jun 1

SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue

arXiv eess.AS (Audio and Speech Processing) Jun 1

On the Use of Dereverberation for Acoustic Feedback Cancellation

arXiv eess.AS (Audio and Speech Processing) Jun 1

Improving acoustic drone detection generalization through pretraining and data augmentation

arXiv eess.AS (Audio and Speech Processing) Jun 1

UNISON: A Unified Sound Generation and Editing Framework via Deep LLM Fusion

arXiv eess.AS (Audio and Speech Processing) Jun 1

Mental Damage: Caption Poisoning Attacks on Retrieval-Augmented Text-to-Music Generation

arXiv eess.AS (Audio and Speech Processing) Jun 1

Escaping the Linearity Trap: Manifold Detours for Black-Box Adversarial Attacks on Singing Audio Deepfake Detection

arXiv eess.AS (Audio and Speech Processing) Jun 1

Chatterbox-Flash: Prior-Calibrated Block Diffusion for Streaming Zero-Shot TTS

arXiv eess.AS (Audio and Speech Processing) Jun 1

Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus

arXiv eess.AS (Audio and Speech Processing) Jun 1

Acoustic Simulation Framework for Multi-channel Replay Speech Detection

arXiv eess.AS (Audio and Speech Processing) Jun 1
