Researchers explore whether task-vector arithmetic, a technique for controlling emotional intensity in text-to-speech (TTS) systems, can be applied to large language model-based TTS systems such as Qwen3-TTS. They test this through systematic experiments that use LoRA fine-tuning and codec embeddings to manipulate emotional expressivity at different levels of the model.
If task-vector arithmetic proves effective for fine-grained emotional control in LM-based TTS, it could enable Daedalus's audio agent to generate more nuanced character voice variations and dynamic dialogue delivery with minimal additional training overhead.
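The summary above does not spell out the exact formulation the researchers use. As a rough illustration of the general idea behind task-vector arithmetic, the sketch below assumes a task vector is the per-parameter difference between a fine-tuned model and its base, and that emotional intensity is adjusted by a scalar `alpha` when adding that difference back. The function names, the parameter name `layer.weight`, and the toy weight values are all hypothetical and are not taken from Qwen3-TTS or the study.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def task_vector(base: Weights, finetuned: Weights) -> Weights:
    """Per-parameter difference: fine-tuned weights minus base weights."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def apply_task_vector(base: Weights, tau: Weights, alpha: float) -> Weights:
    """Return base + alpha * tau; alpha is the assumed intensity scale."""
    return {
        name: [b + alpha * t for b, t in zip(base[name], tau[name])]
        for name in base
    }


# Hypothetical weights for a neutral base model and an emotion-tuned variant.
base = {"layer.weight": [0.0, 1.0, 2.0]}
emotional = {"layer.weight": [1.0, 1.0, 4.0]}

tau = task_vector(base, emotional)
half_intensity = apply_task_vector(base, tau, 0.5)
full_intensity = apply_task_vector(base, tau, 1.0)
```

Under these assumptions, `alpha = 0` leaves the base model unchanged and `alpha = 1` recovers the fine-tuned weights, so intermediate values interpolate between the two. Whether that interpolation maps to perceptually graded emotion in an LM-based TTS model is the open question the experiments address.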