🎧 Audio-Omni

Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing

Project Page | Code | arXiv | Model

Understanding

Multi-turn conversation with the model. Each turn can freely combine Text / Audio / Video inputs. Text history is preserved across turns for context.
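
The turn structure described above can be sketched as a small data model. This is only an illustration, not the Audio-Omni API: the names `Turn`, `Conversation`, `build_request`, and `record` are hypothetical. The sketch also assumes that audio and video are supplied per turn and that only text is carried forward, which is one reading of "text history is preserved".

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Turn:
    """One user turn; any subset of the three inputs may be supplied (hypothetical type)."""
    text: Optional[str] = None
    audio_path: Optional[str] = None
    video_path: Optional[str] = None

    def modalities(self) -> list[str]:
        # List the inputs that are actually present in this turn.
        present = []
        if self.text is not None:
            present.append("text")
        if self.audio_path is not None:
            present.append("audio")
        if self.video_path is not None:
            present.append("video")
        return present


@dataclass
class Conversation:
    """Keeps the running text history (hypothetical type)."""
    # Assumption: only text survives across turns; media is attached per turn.
    text_history: list[tuple[str, str]] = field(default_factory=list)

    def build_request(self, turn: Turn) -> dict:
        # Combine the prior text history with the current turn's inputs.
        if not turn.modalities():
            raise ValueError("a turn needs at least one of text, audio, or video")
        return {
            "history": list(self.text_history),  # copy, so later turns do not mutate it
            "text": turn.text,
            "audio": turn.audio_path,
            "video": turn.video_path,
        }

    def record(self, turn: Turn, reply: str) -> None:
        # Store the user's text (empty if the turn was media-only) and the model's reply.
        self.text_history.append(("user", turn.text or ""))
        self.text_history.append(("assistant", reply))
```

In use, a caller would build a request for each turn, send it to the model, and then call `record` with the reply. A later media-only turn, such as `Turn(video_path="scene.mp4")`, would then still carry the earlier text exchange in its `history` field.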