IEEE SPCup'22 - 5th Place
Description
For SPCup'22, our task was to distinguish synthetically generated human voices from authentic human voice clips. The voice signals were heavily processed with effects such as reverberation, dry/wet mixing, and echo, which made the distinction considerably harder. We began with a transformer model for an initial classification of these signals, then added a Fast Fourier Transform-based CNN model to further improve the separation of authentic and synthetic voices.
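The FFT-based CNN stage consumes a time-frequency representation of each clip rather than the raw waveform. As an illustration only (the frame length, hop size, and windowing here are our assumptions, not the competition configuration), a log-magnitude spectrogram can be computed with a short-time FFT like this:

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=256):
    """Log-magnitude spectrogram via a short-time FFT.

    Returns an array of shape (n_frames, frame_len // 2 + 1),
    which can be fed to a CNN as a single-channel image.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.abs(np.fft.rfft(frames, axis=1))  # magnitude per frame
    return np.log1p(spectrum)                       # compress dynamic range

# Toy example: one second of a 440 Hz tone sampled at 16 kHz.
t = np.linspace(0, 1, 16000, endpoint=False)
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (61, 257)
```

Working in the frequency domain makes processing artifacts such as reverberation tails and synthesis-specific spectral patterns visible as 2D structure, which is what a CNN is well suited to pick up.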
Technologies
Key Achievements
- Effective Signal Classification: We successfully employed a transformer model for the initial classification of voice signals, providing a solid foundation for our project.
- Robust Signal Processing: Dealing with heavily processed audio signals underscored the importance of robust signal processing techniques.
- Model Versatility: Combining transformer models with FFT-based CNNs illustrated the power of diverse methodologies in tackling complex audio classification tasks.
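The transformer stage rests on self-attention, which lets every audio frame weigh information from every other frame. A minimal NumPy sketch of scaled dot-product attention (a generic illustration, not the competition code; the frame count and embedding size below are arbitrary) looks like this:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Core transformer operation: each query row attends over all keys."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # pairwise similarity
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # weighted mix of values

# Toy input: 4 "frames", each with an 8-dimensional embedding.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)       # self-attention: q = k = v
print(out.shape)  # (4, 8)
```

Because attention compares frames globally, it complements the CNN's local time-frequency filters, which is one reason the two model families work well together on this task.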