In this paper, we propose a clustering algorithm based on a two-phase neural network architecture. We combine the strength of an autoencoder-like network for unsupervised representation learning with the discriminative power of a support vector machine (SVM) network for fine-tuning the initial clusters. The first network, referred to as the prototype encoding network, minimizes the data reconstruction error in an unsupervised manner. The second, the SVM network, maximizes the margins between clusters in a supervised manner, using the output of the first network. Both networks update the cluster centroids successively by establishing a topology-preserving scheme, similar to a self-organizing map, on the latent space of each network. Cluster fine-tuning is accomplished within the network structure by alternating between the encoding parts of the two networks. In the experiments, we use challenging data sets from two popular repositories that differ in their patterns, dimensionality, and number of clusters. The proposed hybrid architecture achieves better results, both visually and analytically, than previous neural network-based approaches in the literature.
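The two-phase idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes simple stand-ins throughout: a linear autoencoder (computed via PCA) in place of the prototype encoding network, a linear one-vs-rest hinge-loss classifier in place of the SVM network, and a 1-D chain topology for the self-organizing-map-style centroid update. All function names and hyperparameters are hypothetical, and the sketch runs a single pass rather than alternating between the two encoders.

```python
import numpy as np

def phase1_encode(X, dim):
    """Stand-in for the prototype encoding network: a linear autoencoder,
    whose minimum-reconstruction-error solution is given by PCA."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:dim].T

def som_update(centroids, z, lr, sigma):
    """One SOM-style step: move the best-matching centroid and, more weakly,
    its neighbours on a 1-D chain (assumed topology) toward latent point z."""
    bmu = np.argmin(np.linalg.norm(centroids - z, axis=1))
    idx = np.arange(len(centroids))
    h = np.exp(-((idx - bmu) ** 2) / (2.0 * sigma ** 2))
    return centroids + lr * h[:, None] * (z - centroids)

def assign(Z, centroids):
    """Assign each latent point to its nearest centroid."""
    d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def phase2_svm(Z, labels, k, epochs=200, lr=0.1, reg=1e-3):
    """Stand-in for the SVM network: linear one-vs-rest hinge loss trained by
    subgradient descent on the pseudo-labels produced by phase 1."""
    n, d = Z.shape
    W, b = np.zeros((k, d)), np.zeros(k)
    Y = -np.ones((n, k))
    Y[np.arange(n), labels] = 1.0
    for _ in range(epochs):
        scores = Z @ W.T + b
        viol = (Y * scores < 1.0).astype(float)  # points inside the margin
        W -= lr * (reg * W - (viol * Y).T @ Z / n)
        b -= lr * (-(viol * Y).sum(axis=0) / n)
    return np.argmax(Z @ W.T + b, axis=1)

def two_phase_cluster(X, k, dim=2, som_epochs=5, seed=0):
    """Phase 1: encode and place centroids; phase 2: refine with max-margin."""
    rng = np.random.default_rng(seed)
    Z = phase1_encode(X, dim)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for epoch in range(som_epochs):
        for i in rng.permutation(len(Z)):
            centroids = som_update(centroids, Z[i], lr=0.5 / (epoch + 1), sigma=0.5)
    labels = assign(Z, centroids)
    refined = phase2_svm(Z, labels, k)
    return labels, refined
```

Calling `two_phase_cluster(X, k)` on an `(n_samples, n_features)` array returns the initial centroid-based assignment and the margin-refined assignment. The proposed architecture would replace both stand-ins with trained networks and iterate the two phases.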