Transformer-Based Approaches in Steganalysis with Architectural Trends and Advantages


Konyar M. Z.

10th International Conference on Natural and Engineering Sciences, 20-21 December 2025, pp. 1-2 (Abstract)

  • Publication Type: Conference Paper / Abstract
  • Pages: pp. 1-2
  • Kocaeli University Affiliated: Yes

Abstract

Steganalysis is the field concerned with detecting hidden information that steganography methods have embedded in digital media such as images, audio, or video. Its primary goal is to determine whether hidden data (stego) is embedded within a given medium by examining deviations from its natural statistical and structural characteristics. In most cases, steganalysis does not aim to recover the content of the hidden message but only to reveal its existence; it therefore focuses on distinguishing embedding traces that have low signal-to-noise ratios and are intentionally obscured. These characteristics make steganalysis a technically challenging area of research. In this context, steganalysis is a critical information security discipline that combines signal processing, statistical modeling, and, in recent years, deep learning-based methods to achieve reliable detection across different embedding algorithms and payload levels. The development of adaptive steganography methods, in particular, has necessitated more complex and globally contextualized analytical approaches.
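To illustrate why embedding traces are hard to see in pixel values yet show up as deviations from natural statistics, the following minimal sketch applies a second-order high-pass residual to a toy image. The filter, the synthetic ramp image, and the helper names (`horizontal_residual`, `residual_energy`) are assumptions chosen for illustration; they are not taken from any method discussed in this study.

```python
def horizontal_residual(image):
    """Second-order horizontal high-pass residual (assumed filter):
    r[i][j] = x[i][j-1] - 2*x[i][j] + x[i][j+1]."""
    return [
        [row[j - 1] - 2 * row[j] + row[j + 1] for j in range(1, len(row) - 1)]
        for row in image
    ]


def residual_energy(image):
    """Sum of squared residual values; zero for a perfectly linear ramp."""
    return sum(v * v for row in horizontal_residual(image) for v in row)


# Toy "cover": a smooth linear ramp, so its second difference vanishes.
cover = [[10 + 2 * j + i for j in range(8)] for i in range(8)]

# Toy "stego": one pixel changed by +1, mimicking a single embedding change.
stego = [row[:] for row in cover]
stego[3][4] += 1

cover_energy = residual_energy(cover)
stego_energy = residual_energy(stego)
print(cover_energy, stego_energy)
```

The single +1 change is negligible next to the pixel values themselves, but the residual suppresses the smooth image content and leaves the change exposed. Real images are far from a perfect ramp, which is why learned detectors are needed in practice.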

In recent years, Transformer-based models and Vision Transformer (ViT) hybrids have significantly accelerated the development of steganalysis by integrating global context modeling capabilities through attention mechanisms. This study systematically examines Transformer-based steganalysis approaches, analyzing the architectural trends, strengths, and limitations of existing methods from a holistic perspective. The study's fundamental contribution to the literature is its presentation of Transformer-based steganalysis methods under a structured classification based on architectural families, domain-specific components, and efficiency-focused modules.
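The global context modeling attributed to attention mechanisms can be sketched with plain scaled dot-product self-attention over a handful of patch tokens. This is a simplified illustration under stated assumptions: a single head, identity projections (queries, keys, and values all equal the input tokens), and made-up two-dimensional token values. It does not reproduce the architecture of any model surveyed here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections (an assumption):
    Q = K = V = tokens, weights = softmax(Q K^T / sqrt(d))."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(row[t] * tokens[t][d] for t in range(len(tokens))) for d in range(dim)]
        )
    return outputs, weights


# Hypothetical patch embeddings: four image patches, two features each.
patch_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
attended, attention_weights = self_attention(patch_tokens)
```

Every row of `attention_weights` assigns a nonzero weight to every patch, so each output token mixes information from the whole image in a single layer, whereas a convolution only sees its local receptive field.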

In this study, CVTStego Net, which combines convolutional residual preprocessing with ViT-based attention mechanisms, is considered a representative method that stands out with its high accuracy rates for WOW and S-UNIWARD algorithms on the BOSSbase dataset. SwT-SN is noteworthy for its Swin Transformer-based structure, its support for arbitrary-sized images, its shorter training time, and its high detection performance. Specifically in JPEG steganalysis, SF3Net, which integrates spatial-frequency fusion with Swin blocks, offers superior performance with fewer parameters, significantly contributing to architectural efficiency. Furthermore, the Auxiliary U-Net + Attention approach demonstrates the impact of auxiliary networks and attention mechanisms through its significant performance improvements in advanced adaptive steganography methods. As a domain-specific example, GTSCT-Net reports the lowest detection error on medical images, demonstrating the generalizability of Transformer-based steganalysis approaches to different data domains.

Overall, this study demonstrates the gains that Transformer-based and hybrid steganalysis methods offer over traditional CNN-based approaches in terms of global dependency modeling and local-global feature fusion. The paper also highlights open research problems such as computational cost, strong preprocessing requirements, and sensitivity to embedding methods. The study therefore provides a comprehensive guiding reference for future research into the development of efficient, generalizable, and domain-specific Transformer architectures in the field of steganalysis.