RAVDESS Emotion Recognition - Dual Model

Multimodal (audio-visual) emotion recognition on the RAVDESS dataset, using two models that differ in their audio branch:

  1. HuBERT + Wav2Vec2: uses raw audio with a transformer architecture
  2. Attention Fusion: uses mel-spectrograms with a CNN architecture

Both models use MobileNetV2 to extract visual features and cross-modal attention to fuse them with the audio features.
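
The fusion step can be illustrated with a minimal NumPy sketch of cross-modal attention, in which audio frames attend over visual frames. This is not the project's implementation: the function name, the feature shapes, and the mean-pooling at the end are assumptions made for illustration, and the learned query/key/value projections that a trained model would normally include are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, context_feats):
    """Scaled dot-product attention from one modality over another.

    query_feats:   (T_q, d) features of the attending modality (e.g. audio)
    context_feats: (T_c, d) features of the attended modality (e.g. video)
    Returns the attended features (T_q, d) and the weights (T_q, T_c).
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    attended = weights @ context_feats   # weighted mix of context frames
    return attended, weights

# Hypothetical shapes: 50 audio frames and 8 video frames, both 128-dim.
rng = np.random.default_rng(0)
audio = rng.standard_normal((50, 128))
video = rng.standard_normal((8, 128))

audio_ctx, weights = cross_modal_attention(audio, video)

# One possible fused clip-level vector: pooled audio plus pooled visual context.
fused = np.concatenate([audio.mean(axis=0), audio_ctx.mean(axis=0)])
```

Each audio frame ends up with a summary of the video frames most similar to it, so the classifier sees audio features alongside the visual context aligned to them.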

Emotions: neutral, calm, happy, sad, angry, fearful, disgust, surprised
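
RAVDESS encodes the emotion as the third of seven hyphen-separated fields in each filename, numbered 01 to 08 in the order listed above. The sketch below shows one way to turn a filename into a class index and label; the names EMOTIONS and emotion_from_filename are hypothetical and not taken from this project.

```python
from pathlib import Path

# Order matches RAVDESS emotion codes 01..08.
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]

def emotion_from_filename(path):
    """Return (class_index, label) for a RAVDESS file such as
    '03-01-06-01-02-01-12.wav', where the third field is the emotion code."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS-style filename: {path}")
    code = int(fields[2])
    if not 1 <= code <= len(EMOTIONS):
        raise ValueError(f"unknown emotion code {code} in {path}")
    # Codes are 1-based; class indices are 0-based.
    return code - 1, EMOTIONS[code - 1]

index, label = emotion_from_filename("Actor_12/03-01-06-01-02-01-12.wav")
```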