IEEE SPCup'22 - 5th Place

  | #DeepLearning #SignalProcessing #AdversarialLearning

Description

Our task in SPCup'22 was to distinguish synthetically generated human voices from authentic human voice clips. The challenge was compounded by heavy post-processing of the signals, including reverb, dry/wet mixing, and echo, which made the distinction considerably harder. We began with a transformer model for an initial classification of these processed signals, and then added a Fast Fourier Transform (FFT)-based CNN model, which further sharpened the separation between authentic and synthetic voices.
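To make the FFT-based CNN stage concrete, here is a minimal PyTorch sketch, not the competition model: it computes a log-magnitude spectrogram with torch.stft and classifies it with a small CNN. All layer sizes, the STFT parameters, and the log_spectrogram/SpectrogramCNN names are illustrative assumptions.

```python
import torch
import torch.nn as nn

def log_spectrogram(wave: torch.Tensor, n_fft: int = 512, hop: int = 128) -> torch.Tensor:
    """FFT front end: (batch, samples) -> (batch, 1, freq_bins, frames)."""
    spec = torch.stft(
        wave, n_fft=n_fft, hop_length=hop,
        window=torch.hann_window(n_fft), return_complex=True,
    )
    # log1p compresses dynamic range so quiet detail (e.g. reverb tails) survives
    return torch.log1p(spec.abs()).unsqueeze(1)

class SpectrogramCNN(nn.Module):
    """Small CNN over the log-magnitude spectrogram; sizes are placeholders."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # pool to a fixed-size embedding
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, wave: torch.Tensor) -> torch.Tensor:
        x = log_spectrogram(wave)
        x = self.features(x).flatten(1)
        return self.classifier(x)  # logits: authentic vs. synthetic

model = SpectrogramCNN()
logits = model(torch.randn(4, 16000))  # 4 one-second clips at 16 kHz
```

Log compression of the magnitude spectrum is a common choice for this kind of input because it keeps low-energy structure visible without letting loud frames dominate the convolution filters.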

Key Achievements

  • Effective Signal Classification: We employed a transformer model for the initial classification of voice signals, providing a solid foundation for the project (a minimal sketch follows this list).
  • Robust Signal Processing: Dealing with heavily processed audio signals underscored the importance of robust signal processing techniques.
  • Model Versatility: Combining a transformer model with an FFT-based CNN illustrated the value of diverse methodologies in tackling complex audio classification tasks.
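
As a companion to the CNN sketch above, the sketch below shows one common way to apply a transformer to the same task: treat spectrogram frames as a token sequence, encode them, and mean-pool the encoder output into real-versus-synthetic logits. This is an assumed illustration rather than our competition architecture; FrameTransformer and every dimension in it are placeholders.

```python
import torch
import torch.nn as nn

class FrameTransformer(nn.Module):
    """Transformer encoder over spectrogram frames; all sizes are placeholders."""
    def __init__(self, n_bins: int = 257, d_model: int = 128, n_classes: int = 2):
        super().__init__()
        self.proj = nn.Linear(n_bins, d_model)  # embed each frame as a token
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, frames, freq_bins) from any STFT front end
        h = self.encoder(self.proj(spec))
        return self.head(h.mean(dim=1))  # mean-pool over time, then classify

model = FrameTransformer()
logits = model(torch.randn(4, 126, 257))  # 4 clips, 126 frames, 257 bins
```

The (4, 126, 257) input here matches the spectrogram shape the earlier sketch produces for one-second clips at 16 kHz, so the two stages can share a single FFT front end.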