Cardiovascular diseases remain the leading cause of global mortality, driving research into accessible and efficient diagnostic technologies such as the electrocardiogram (ECG). This study investigates the feasibility of combining Vision Transformers with Convolutional Neural Networks (CNNs) for arrhythmia classification of ECG signals. Two lightweight approaches were developed: CNN_ViT, which augments a customized CNN with a self-attention layer, and MobCCT, which combines MobileNetV2 with a Compact Convolutional Transformer. Both approaches were evaluated on an imbalanced paper-based ECG dataset across three distinct classification tasks, using a 70/10/20 split for the training, validation, and test sets, respectively. The proposed models achieved maximum accuracies of 99.5% in binary classification between COVID-19 and normal cases, 99.4% in three-class classification among COVID-19, normal, and abnormal heartbeats, and 98.3% in four-class classification distinguishing COVID-19, normal, myocardial infarction (MI), and abnormal heartbeats (AHB). These results indicate that the proposed approaches are suitable for integration with portable ECG devices or mobile application platforms. Additionally, the models' robust generalization was demonstrated on a second independent dataset, where they exceeded prior studies with an accuracy of 98.41% in four-class cardiac classification. Overall, the results indicate that the proposed approaches are both accurate and robust.
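The 70/10/20 train/validation/test split mentioned above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the sample count and random seed are assumptions, and the indices stand in for ECG image samples.

```python
import numpy as np

# Hypothetical dataset size; the real paper-based ECG dataset differs.
n_samples = 1000

# Shuffle indices once with a fixed seed for reproducibility.
rng = np.random.default_rng(42)
idx = rng.permutation(n_samples)

# 70% train, 10% validation, 20% test (stratification omitted here,
# though an imbalanced dataset would typically call for it).
n_train = int(0.70 * n_samples)
n_val = int(0.10 * n_samples)

train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]

print(len(train_idx), len(val_idx), len(test_idx))  # 700 100 200
```

In practice, a stratified split (e.g. scikit-learn's `train_test_split` with the `stratify` argument) would be preferable for an imbalanced dataset so that each class appears proportionally in all three sets.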