Researchers explore whether task-vector arithmetic, a technique for controlling emotional intensity in speech synthesis, can be applied to text-to-speech systems built on large language models (LM-TTS). They test the approach on Qwen3-TTS by systematically comparing the operand types the arithmetic acts on (model weights, embeddings, tokens, and speaker parameters) to understand how control over emotional expressivity transfers to LM-based TTS architectures.
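For readers unfamiliar with the idea, the core arithmetic can be sketched as follows. This is a minimal illustration of generic task arithmetic, assuming the operand is a set of real-valued vectors: an emotion vector is the difference between emotional and neutral parameters, and a scalar controls how much of it is added back. The dict-of-lists representation and the names `task_vector` and `apply_task_vector` are hypothetical. They are not Qwen3-TTS's checkpoint format or API, and not the researchers' code. The summary does not say how the vector is formed for each operand type (for example, discrete tokens).

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
# Stands in for weights, embeddings, or speaker parameters; this is NOT
# the Qwen3-TTS checkpoint format.
State = Dict[str, List[float]]


def task_vector(base: State, emotional: State) -> State:
    """Emotion task vector: tau = theta_emotional - theta_base, per parameter."""
    return {
        name: [e - b for b, e in zip(base[name], emotional[name])]
        for name in base
    }


def apply_task_vector(base: State, tau: State, scale: float) -> State:
    """Edited parameters: theta_base + scale * tau.

    scale = 0 returns the base parameters, scale = 1 returns the emotional
    ones, and other values interpolate (0 < scale < 1) or extrapolate
    (scale > 1) the emotional shift.
    """
    return {
        name: [b + scale * t for b, t in zip(base[name], tau[name])]
        for name in base
    }


if __name__ == "__main__":
    neutral = {"w": [0.0, 1.0, 2.0]}  # toy "neutral" parameters
    angry = {"w": [1.0, 1.0, 4.0]}    # toy "angry" parameters
    tau = task_vector(neutral, angry)
    print(tau)                                   # {'w': [1.0, 0.0, 2.0]}
    print(apply_task_vector(neutral, tau, 0.5))  # {'w': [0.5, 1.0, 3.0]}
```

The scale is the intensity knob. Because the same subtraction and scaled addition apply to any vector-valued operand, comparing operand types amounts to asking where in the model this shift is most effective and best behaved.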
If task-vector arithmetic proves effective for fine-grained emotional control in LM-TTS, Daedalus's audio agent could generate more nuanced character voice variations during game iteration without retraining full models.