In the domain of audio signal processing, accurate and efficient diarization of conversational speech remains a challenging task, particularly in environments with significant speaker overlap and diverse acoustic conditions. This paper introduces a comprehensive speaker diarization pipeline that substantially improves both performance and efficiency in processing conversational speech. Our pipeline comprises several key components: Voice Activity Detection (VAD), Speaker Overlap Detection (SOD), speaker separation models, robust speaker embeddings, clustering algorithms, and post-processing techniques. The pipeline begins with VAD, which discriminates between speech and non-speech segments and thereby reduces downstream processing overhead. Following VAD, the SOD component identifies segments containing overlapping speakers.
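To make the stage ordering concrete, the following Python sketch shows one way such a pipeline could be wired together. Every function name (detect_speech, detect_overlap, separate, embed, cluster, diarize) and the toy return values are hypothetical placeholders used only to illustrate the data flow between stages; they are not the components proposed in this paper.

```python
"""Illustrative sketch of the diarization stage ordering described above.
Every function is a toy stub, not the paper's actual model; it only shows
how data flows between VAD, SOD, separation, embedding, clustering,
and post-processing."""

from typing import List, Tuple

Segment = Tuple[float, float]  # (start_time, end_time) in seconds


def detect_speech(audio_path: str) -> List[Segment]:
    # VAD stub: keep only regions that contain speech.
    return [(0.0, 5.0), (6.2, 12.4)]


def detect_overlap(audio_path: str, speech: List[Segment]) -> List[Segment]:
    # SOD stub: flag regions where more than one speaker is active.
    return [(6.2, 7.1)]


def separate(audio_path: str, overlaps: List[Segment]) -> List[Segment]:
    # Separation stub: each overlapped region yields one segment per speaker.
    return [seg for seg in overlaps for _ in range(2)]


def embed(audio_path: str, seg: Segment) -> List[float]:
    # Embedding stub: a fixed-length speaker representation per segment.
    return [seg[0], seg[1]]


def cluster(embeddings: List[List[float]]) -> List[int]:
    # Clustering stub: assign a speaker index to each embedding.
    return list(range(len(embeddings)))


def diarize(audio_path: str) -> List[Tuple[Segment, int]]:
    """Run VAD -> SOD -> separation -> embedding -> clustering -> post-processing."""
    speech = detect_speech(audio_path)
    overlaps = detect_overlap(audio_path, speech)
    all_segments = speech + separate(audio_path, overlaps)
    labels = cluster([embed(audio_path, s) for s in all_segments])
    # Post-processing stub: sort by time and pair segments with speaker labels.
    return sorted(zip(all_segments, labels))


if __name__ == "__main__":
    print(diarize("conversation.wav"))
```

The key design point the sketch captures is sequencing: VAD prunes non-speech before any heavier model runs, and SOD routes only the overlapped regions to the (comparatively expensive) separation stage, so embedding and clustering operate on segments assumed to contain a single speaker each.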