Data mining (DM) based on Bayesian neural networks (BNN) is widely used in chemometrics to explore consistent patterns and systematic relationships among variables. A combination of unsupervised principal component similarity (PCS) analysis with random-centroid optimization for site-directed mutagenesis of amino acid sequences (RCG) is proposed to correlate sequence data with the functions of proteins. PCS, which classifies samples by similarity, was superior to classifications based on dissimilarity as represented by multidimensional distances. Important factors (independent variables) to be used in the optimization could be selected by processing the data with PCS, which improves the reliability of function prediction. Dimensionality reduction with PCS, which eliminates minor factors, and trend-line drawing on the response surface maps in RCG, which determines the direction in which the search should shift toward the global optimum, are useful, alone or together, for approximating the underlying response surfaces. Applying PCS to sequences was useful for elucidating the usually unknown mechanisms underlying the functions of a protein from its amino acid sequence. The functions in question may then be predicted using a modern version of neural networks.
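To make the PCS step more concrete, the following minimal sketch assumes one common formulation of PCS that is not spelled out in the text above: the principal component scores of each sample, weighted by the proportion of variance each component explains, are regressed against those of a reference sample, and the resulting slope and coefficient of determination serve as the sample's two similarity coordinates. The function name `pcs_coordinates`, its parameters, and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
def pcs_coordinates(reference_scores, sample_scores, variance_ratios):
    """Return (slope, r_squared) of a sample relative to a reference.

    Assumed formulation: PC scores are weighted by the proportion of
    variance explained, then the sample is regressed on the reference
    by ordinary least squares.
    """
    # Weight each PC score by the variance proportion of its component.
    x = [s * w for s, w in zip(reference_scores, variance_ratios)]
    y = [s * w for s, w in zip(sample_scores, variance_ratios)]

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products for the least-squares fit.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    slope = sxy / sxx
    # A sample with constant weighted scores has no variance to explain.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared


# Hypothetical scores on three retained components.
reference = [2.0, -1.0, 0.5]
ratios = [0.6, 0.3, 0.1]
same = pcs_coordinates(reference, reference, ratios)
scaled = pcs_coordinates(reference, [2 * s for s in reference], ratios)
```

Under this assumed formulation, a sample identical to the reference lies at slope 1 and coefficient of determination 1, so samples plotted close to that point are the most similar to the reference. Minor components can be dropped before the regression simply by omitting their scores and variance ratios from the input lists.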