Histopathological Breast Cancer Classification Using a Hybrid Swin Transformer and ConvNeXt Architecture

Materials and Methods
Dataset
The open-access BreaKHis (Breast Cancer Histopathological Image Classification) dataset (Spanhol et al., 2016) was used. It consists of 7,909 hematoxylin-and-eosin (H&E) stained histopathological images collected from 82 patients at four magnification levels (40X, 100X, 200X, 400X). Images were stored in PNG format at 700×460 pixels in RGB color space and were resized to 224×224 pixels for model compatibility.
Preprocessing: For binary classification, each color channel was normalized with a mean of 0.5 and a standard deviation of 0.5.
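The normalization step above can be sketched in NumPy. This is a minimal, library-agnostic illustration (the text does not state which transform library was used, and the function name is illustrative); the 224×224 resize itself would normally be performed with an image-processing library.

```python
import numpy as np

def normalize_image(img_uint8, mean=0.5, std=0.5):
    # img_uint8: (H, W, 3) array of 8-bit RGB pixel values.
    # Scale pixel values to [0, 1], then normalize each channel with
    # mean 0.5 and std 0.5, mapping intensities into [-1, 1].
    x = img_uint8.astype(np.float32) / 255.0
    return (x - mean) / std
```

With these parameters, a black pixel (0) maps to −1.0 and a white pixel (255) maps to 1.0.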
Hybrid Model Architecture
A hybrid deep learning model was developed by combining the Swin Transformer and ConvNeXt architectures to leverage their complementary strengths. ConvNeXt, a modernized CNN architecture, incorporates Transformer-inspired design elements, including large-kernel 7×7 depthwise convolutions, Layer Normalization, GELU activations, and residual connections, and captures spatial dependencies through parameterized convolutional filters.
The Swin Transformer addresses the scalability limitations of the Vision Transformer through Window-based Multi-head Self-Attention (W-MSA), which restricts attention to local windows and thereby achieves computational complexity linear in image size; the Shifted Window technique enables information exchange between neighboring windows, allowing both local and global contextual representations to be learned.
ConvNeXt Block Structure
Each block applies a depthwise convolution to the multi-channel input, followed by normalization, a pointwise expansion, a nonlinear activation, and a pointwise projection, with a residual connection:

y = x + W2 · GELU(W1 · LN(DWConv(x)))

where LN denotes Layer Normalization, DWConv is Depthwise Convolution, GELU is the Gaussian Error Linear Unit activation function, and W1 and W2 are the weights of the pointwise (1×1) expansion and projection layers.
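The block structure described above can be sketched in NumPy. This is an illustrative reference implementation under stated assumptions (channels-last layout, a 7×7 depthwise kernel, tanh-approximated GELU, and a 4× pointwise expansion as in the original ConvNeXt design), not the authors' exact code.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the Gaussian Error Linear Unit.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # Normalize over the channel (last) axis, as in ConvNeXt.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def depthwise_conv(x, w):
    # x: (H, W, C); w: (k, k, C) -- one filter per channel, 'same' padding.
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    H, W, C = x.shape
    out = np.zeros((H, W, C))
    for i in range(H):
        for j in range(W):
            out[i, j, :] = np.sum(xp[i:i + k, j:j + k, :] * w, axis=(0, 1))
    return out

def convnext_block(x, w_dw, w1, w2):
    # y = x + W2 . GELU(W1 . LN(DWConv(x)))
    h = depthwise_conv(x, w_dw)   # 7x7 depthwise convolution
    h = layer_norm(h)             # LN over channels
    h = gelu(h @ w1)              # pointwise expansion (C -> 4C) + GELU
    h = h @ w2                    # pointwise projection (4C -> C)
    return x + h                  # residual connection
```

Note the residual path: if all block weights are zero, the block reduces to the identity, which is what makes these blocks easy to stack deeply.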
Window-based Multi-head Self-Attention (W-MSA)
The window-based attention mechanism computes, within each non-overlapping window, a weighted combination of value vectors based on the similarity between the query and key vectors of each patch:

Attention(Q, K, V) = SoftMax(QKᵀ / √d) V

where Q, K, and V are the query, key, and value matrices of the patches in a window and d is the dimension of a single head; H such attention heads are computed in parallel and their outputs concatenated, with H representing the number of attention heads.
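The windowed attention computation can be sketched in NumPy. This is a simplified illustration under stated assumptions (identity Q/K/V projections, no relative position bias, window size M dividing the feature-map size), intended only to show the window partitioning and per-window multi-head attention, not a full Swin layer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(x, M):
    # (H, W, C) -> (num_windows, M*M, C): non-overlapping M x M windows.
    H, W, C = x.shape
    x = x.reshape(H // M, M, W // M, M, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, M * M, C)

def w_msa(x, M, num_heads):
    # Window-based multi-head self-attention (identity projections for brevity).
    windows = window_partition(x, M)              # (nW, N, C), N = M*M
    nW, N, C = windows.shape
    d = C // num_heads
    qkv = windows.reshape(nW, N, num_heads, d).transpose(0, 2, 1, 3)
    q = k = v = qkv                               # (nW, heads, N, d)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))
    out = attn @ v                                # (nW, heads, N, d)
    return out.transpose(0, 2, 1, 3).reshape(nW, N, C)
```

Because attention is computed only among the N = M² patches of each window, the cost grows linearly with the number of windows, which is the source of the linear complexity noted above.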
Cross-Entropy Loss Function
For binary classification, this function minimizes the discrepancy between the predicted class probability (ŷ) and the true label (y):

L(y, ŷ) = −[y log(ŷ) + (1 − y) log(1 − ŷ)]

where y represents the true class label (0 or 1) and ŷ is the predicted probability of the positive class.
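The loss can be written directly from its definition; a minimal NumPy sketch (the clipping constant is a standard numerical-stability detail, not from the text):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # L = -[y log(y_hat) + (1 - y) log(1 - y_hat)], averaged over the batch.
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```

The loss approaches 0 when predictions match the labels and grows without bound as a confident prediction becomes wrong; an uninformative prediction of 0.5 yields ln 2 ≈ 0.693.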
Step Decay Learning Rate Scheduler
This scheduling method reduces the learning rate by a factor of γ after every k epochs:

ηt = η0 · γ^⌊t/k⌋

where η0 is the initial learning rate and ηt represents the learning rate at epoch t.
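The schedule is a one-line function of the epoch index; a minimal sketch (parameter names mirror the symbols above):

```python
def step_decay(eta0, gamma, k, t):
    # eta_t = eta0 * gamma ** floor(t / k): drop the rate by gamma every k epochs.
    return eta0 * gamma ** (t // k)
```

For example, with η0 = 0.1, γ = 0.5, and k = 10, the learning rate stays at 0.1 for epochs 0–9, drops to 0.05 at epoch 10, and to 0.025 at epoch 20.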
Performance Evaluation
The performance of the proposed hybrid model was evaluated for binary classification using standard evaluation metrics including accuracy, precision, recall, and F1-score (Equations 5, 6, 7, and 8). The model demonstrated exceptional performance in distinguishing benign from malignant breast tumors.
Accuracy

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision

Precision = TP / (TP + FP)

Recall

Recall = TP / (TP + FN)

F1-Score

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
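All four metrics follow directly from the confusion-matrix counts; a minimal sketch (function name is illustrative):

```python
def classification_metrics(tp, tn, fp, fn):
    # Standard binary-classification metrics from confusion-matrix counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1
```

Because F1 is the harmonic mean of precision and recall, it equals both whenever the two coincide, and is otherwise pulled toward the smaller of the two.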
The hybrid model achieved outstanding results on the test dataset with an overall accuracy of 98.86%. The precision was calculated as 99.07%, recall as 99.25%, and F1-score as 99.16%, indicating the model's reliable ability to distinguish between benign and malignant tumors with high confidence. The confusion matrix revealed very low misclassification rates, with rapid learning convergence and stable performance observed throughout the training epochs.