10th International Conference on Natural and Engineering Sciences, 20-21 December 2025, pp. 1-2 (Abstract)
Steganalysis is the field of analysis concerned with detecting hidden
information that steganography methods have embedded in digital media such as
images, audio, or video. Its primary goal is to determine whether hidden data
(stego) is embedded within a given medium by examining deviations from its
natural statistical and structural characteristics. In most cases, steganalysis
does not aim to recover the content of the hidden message, only to reveal its
existence; it therefore focuses on detecting embedding traces that have low
signal-to-noise ratios and are intentionally obscured. These characteristics
make steganalysis a technically challenging area of research. In this context,
steganalysis is a critical
information security discipline that combines signal processing, statistical
modeling, and, in recent years, deep learning-based methods to achieve reliable
detection under different embedding algorithms and payload levels. The
development of adaptive steganography methods, in particular, has necessitated
more complex and globally contextualized analytical approaches.
In recent years, Transformer-based models and Vision Transformer (ViT)
hybrids have significantly accelerated the development of steganalysis by
integrating global context modeling capabilities through attention mechanisms.
This study systematically examines Transformer-based steganalysis approaches,
analyzing the architectural trends, strengths, and limitations of existing
methods from a holistic perspective. The study's fundamental contribution to
the literature is its presentation of Transformer-based steganalysis methods
under a structured classification based on architectural families,
domain-specific components, and efficiency-focused modules.
In this study, CVTStego Net, which combines convolutional residual
preprocessing with ViT-based attention mechanisms, is considered a
representative method that stands out with its high accuracy rates for WOW and
S-UNIWARD algorithms on the BOSSbase dataset. SwT-SN is noteworthy for its Swin
Transformer-based structure, its support for arbitrary-sized images, its
shorter training time, and its high detection performance. Specifically in JPEG
steganalysis, SF3Net, which integrates spatial-frequency fusion with Swin
blocks, offers superior performance with fewer parameters, significantly
contributing to architectural efficiency. Furthermore, the Auxiliary U-Net +
Attention approach demonstrates the impact of auxiliary networks and attention
mechanisms through its significant performance improvements in advanced
adaptive steganography methods. As a domain-specific example, GTSCT-Net reports
the lowest detection error on medical images, demonstrating the
generalizability of Transformer-based steganalysis approaches to different data
domains.
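The hybrid pattern summarized above (a high-pass residual stage followed by attention over image patches) can be illustrated with a minimal sketch. It is not the implementation of CVTStego Net or of any other method named here: the 3x3 kernel, the patch size, the toy images, and the function names residual, patchify, and attention_weights are all assumptions chosen for illustration, and the attention step uses identity projections rather than learned weights.

```python
import math

# Illustrative 3x3 high-pass kernel (coefficients sum to zero, so flat and
# linearly varying regions map to zero). Assumed for this sketch only.
HIGH_PASS = [[-1, 2, -1],
             [2, -4, 2],
             [-1, 2, -1]]


def residual(image):
    """Valid-mode filtering of a 2D list with HIGH_PASS, scaled by 1/4."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - 2):
        row = []
        for x in range(w - 2):
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += HIGH_PASS[dy][dx] * image[y + dy][x + dx]
            row.append(acc / 4.0)
        out.append(row)
    return out


def patchify(feature_map, size):
    """Split a 2D map into non-overlapping size x size patches, flattened."""
    h, w = len(feature_map), len(feature_map[0])
    tokens = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            tokens.append([feature_map[y + i][x + j]
                           for i in range(size) for j in range(size)])
    return tokens


def attention_weights(tokens):
    """Row-wise softmax of scaled dot products (identity projections)."""
    d = len(tokens[0])
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        m = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy 10x10 "cover" (a smooth gradient) and a "stego" copy with one pixel
# changed by 1, standing in for a low-amplitude embedding change.
cover = [[x + y for x in range(10)] for y in range(10)]
stego = [row[:] for row in cover]
stego[4][4] += 1

cover_tokens = patchify(residual(cover), 4)
stego_tokens = patchify(residual(stego), 4)
weights = attention_weights(stego_tokens)
```

In this toy setting the residual stage suppresses the smooth image content entirely, so only the single modified pixel leaves a trace in the tokens. Each row of weights then describes how strongly one patch attends to every other patch, which is the global-context step that the attention-based methods surveyed here add on top of local filtering.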
Overall, this study clearly demonstrates the gains offered by
Transformer-based and hybrid steganalysis methods compared to traditional
CNN-based approaches in terms of global dependency modeling and local-global
feature fusion. The paper also highlights open research problems such as
computational cost, strong preprocessing requirements, and sensitivity to
embedding methods. Therefore, the study provides a comprehensive and guiding
reference for future research into the development of efficient, generalizable,
and domain-specific Transformer architectures in the field of steganalysis.