DISCRETE NONNEGATIVE SPECTRAL CLUSTERING

 

ABSTRACT

Spectral clustering has been playing a vital role in various research areas. Most traditional spectral clustering algorithms comprise two independent stages (e.g., first learning continuous labels and then rounding the learned labels into discrete ones), which may cause unpredictable deviation of the resultant cluster labels from the genuine ones, thereby leading to severe information loss and performance degradation. In this work, we study how to achieve discrete clustering as well as reliably generalize to unseen data. We propose a novel spectral clustering scheme which deeply explores cluster label properties, including discreteness, nonnegativity and discrimination, as well as learns robust out-of-sample prediction functions. Specifically, we explicitly enforce a discrete transformation on the intermediate continuous labels, which leads to a tractable optimization problem with a discrete solution. Besides, we preserve the natural nonnegative characteristic of the clustering labels to enhance the interpretability of the results. Moreover, to further compensate for the unreliability of the learned clustering labels, we integrate an adaptive robust module with an ℓ2,p loss to learn a prediction function for grouping unseen data. We also show that the out-of-sample component can inject discriminative knowledge into the learning of cluster labels under certain conditions. Extensive experiments conducted on various data sets have demonstrated the superiority of our proposal as compared to several existing clustering approaches.

EXISTING SYSTEM:

It is well known that optimizing spectral clustering models leads to an NP-hard problem due to the discrete constraint on the clustering labels. To achieve a feasible approximate solution, most spectral clustering algorithms follow a common practical paradigm. It first relaxes the discrete constraint to allow the clustering label matrix to be continuous-valued and performs eigenvalue decomposition on a specific Laplacian matrix to generate an approximate indicator with continuous values. Then, the clustering label matrix is discretized by some independent technique, such as k-means (see the sketch after this list). Furthermore, to enable clustering of new unseen data, one may learn an additional prediction function in an independent stage (module). In prior work, the out-of-sample problem has been addressed by introducing a regression learning module and injecting discriminative information into the construction of the similarity matrix to improve clustering performance. Although existing spectral clustering has been widely applied in practice, it may easily achieve poor performance due to the following drawbacks:

• High risk of severe deviation of the approximate solution from the genuine discrete clustering labels;

• Information loss across the separate independent stages, e.g., continuous label generation, label discretization and prediction function learning; and

• Unreliability of the predicted cluster labels, leading to poor prediction functions.
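Below is a minimal Python sketch of this two-stage relax-then-discretize pipeline, assuming numpy, scipy and scikit-learn are available. The Gaussian similarity with width sigma and the k-means rounding step are common illustrative choices, not the exact setup of any particular method discussed here.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def two_stage_spectral_clustering(X, k, sigma=1.0):
    """Classic relax-then-discretize spectral clustering (illustrative)."""
    # Gaussian similarity matrix with zeroed diagonal.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(X.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Stage 1 (relaxation): the k smallest eigenvectors of L serve as
    # continuous cluster labels; rows are normalized to unit length.
    _, F = eigh(L, subset_by_index=[0, k - 1])
    F /= np.maximum(np.linalg.norm(F, axis=1, keepdims=True), 1e-12)

    # Stage 2 (rounding): an independent k-means discretizes the labels.
    return KMeans(n_clusters=k, n_init=10).fit_predict(F)

For instance, labels = two_stage_spectral_clustering(X, k=3) partitions the rows of a data matrix X into three groups; the final k-means call is precisely the independent rounding stage where the deviation and information loss described above can arise.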

PROPOSED SYSTEM:

We propose a novel Discrete Nonnegative spectral Clustering (DNC) scheme which directly learns discrete clustering labels and robust out-of-sample prediction functions in a unified manner. Specifically, in order to alleviate the information loss caused by the relaxation in traditional spectral clustering, we deliberately recover the abandoned discrete constraint with a smooth transformation (e.g., rotation) from the relaxed continuous clustering labels to a discrete solution. In this sense, the continuous clustering labels just serve as an intermediate product. We integrate a discrete rotation functionality, which guarantees a tractable optimization problem with a discrete solution. Besides, in order to enhance interpretability, we extend our previous work by preserving the natural nonnegative characteristic of cluster labels. The nonnegative constraint can further narrow down the gap between relaxed solutions and the genuine labels. Moreover, to further compensate for the unreliability of the learned labels, we integrate an adaptive robust module to learn a prediction function for unseen data. In particular, we devise a novel noise modelling approach by utilizing an effective ℓ2,p loss term over the prediction error residual to capture the unreliability of clustering labels in a more adaptive way. The ℓ2,p loss is capable of inducing sample-wise sparsity, which naturally identifies unreliable predicted labels. Besides, different choices of p enable sufficient control over unpredictable conditions of label noise. Due to the existence of the nonnegative constraint in the formulation, we develop an effective algorithm to optimize the overall model. We further prove that the out-of-sample component can actually introduce additional discriminative knowledge into the learning of cluster labels under certain conditions. A sketch of the discrete rotation step and the ℓ2,p loss is given below.
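As an illustration, here is a minimal numpy sketch of two of the ingredients described above, under simplified assumptions: an alternating discrete-rotation step (an orthogonal rotation R found by Procrustes, and a nonnegative binary indicator Y found by row-wise argmax), and an ℓ2,p loss over a residual matrix E, defined as sum_i ||E_i||_2^p. The function names discrete_rotation and l2p_loss, the fixed iteration count, and the argmax-based Y-step are illustrative choices of ours, not the paper's full DNC formulation, which additionally couples the nonnegativity constraint and the out-of-sample prediction learning into one objective.

import numpy as np

def discrete_rotation(F, n_iter=30):
    """Alternately rotate the continuous embedding F (n x k) and snap it
    to a discrete, nonnegative cluster indicator Y with Y ~= F @ R."""
    n, k = F.shape
    R = np.eye(k)  # initial rotation
    Y = np.zeros((n, k))
    for _ in range(n_iter):
        # Y-step: with R fixed, the best one-hot indicator takes the
        # row-wise argmax of the rotated embedding.
        FR = F @ R
        Y[:] = 0.0
        Y[np.arange(n), FR.argmax(axis=1)] = 1.0
        # R-step: with Y fixed, the best orthogonal R is the Procrustes
        # solution R = U V^T, where F^T Y = U S V^T is an SVD.
        U, _, Vt = np.linalg.svd(F.T @ Y)
        R = U @ Vt
    return Y

def l2p_loss(E, p=0.5, eps=1e-12):
    """l2,p loss of a residual matrix E: sum_i ||E_i||_2^p."""
    return np.sum((np.linalg.norm(E, axis=1) + eps) ** p)

Here F would typically be the continuous eigenvector embedding, and E the residual of a linear prediction function, e.g., Y - X @ W. For p < 2 each sample's contribution ||E_i||_2^p grows sublinearly, so samples with large residuals are down-weighted, which is how smaller p values make the module more tolerant of unreliable labels.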

CONCLUSION

In this work, we coped with a problem existing in most traditional spectral clustering algorithms, namely relaxing discrete constraints to continuous ones, which involves two independent stages (e.g., first learning continuous labels and then rounding the learned labels into discrete ones). In order to reduce information loss and performance degradation, we proposed a unified spectral clustering approach to directly learn discrete clustering labels and robust out-of-sample prediction functions. To be more specific, our proposed approach explicitly rotates continuous labels to discrete ones. Meanwhile, we also deliberately preserve the nonnegative property of the cluster label matrix to get closer to the genuine partition of the data. To handle noisy clustering labels, we integrated an adaptive robust module to learn a prediction function for unseen data. Extensive experiments on six data sets demonstrated the promising performance of our proposal as compared to existing clustering approaches.

REFERENCES:

[1] A. Jain, M. Murty, and P. Flynn, “Data clustering: a review,” ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.

[2] A. Rodriguez and A. Laio, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.

[3] T. Bühler and M. Hein, “Spectral clustering based on the graph p-Laplacian,” in ICML, 2009, pp. 81–88.

[4] M. Belkin, L. Rademacher, and J. Voss, “The hidden convexity of spectral clustering,” AAAI, 2016.

[5] X. Zhu, C. C. Loy, and S. Gong, “Constrained clustering with imperfect oracles,” IEEE TNNLS, 2014.

[6] P. Felzenszwalb and D. Huttenlocher, “Efficient graph-based image segmentation,” IJCV, vol. 59, no. 2, pp. 167–181, 2004.

[7] D. Jiang, C. Tang, and A. Zhang, “Cluster analysis for gene expression data: A survey,” IEEE TKDE, vol. 16, no. 11, pp. 1370–1386, 2004.

[8] E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, theory, and applications,” IEEE TPAMI, vol. 35, no. 11, pp. 2765–2781, 2013.

[9] S. Gordon, H. Greenspan, and J. Goldberger, “Applying the information bottleneck principle to unsupervised clustering of discrete and continuous image representations,” in CVPR, 2003, pp. 370–377.

[10] X. Wang, L. Zhang, X. Li, and W. Ma, “Annotating images by mining image search results,” IEEE TPAMI, vol. 30, no. 11, pp. 1919–1932, 2008.

[11] S. Yang, Z. Yi, X. He, and X. Li, “A class of manifold regularized multiplicative update algorithms for image clustering,” IEEE TIP, 2015.