It has been demonstrated that random similarity of sequences or sequence sections can impede phylogenetic analyses or the identification of gene homology. Additionally, randomly similar sequences or ambiguously aligned sequence sections can interfere with the estimation of substitution model parameters. Phylogenetic studies have shown that biases in model estimation and tree reconstruction do not disappear even with large data sets, but can in fact become more pronounced. It is therefore important to identify possible random similarity within sequence alignments before model estimation and tree reconstruction.
Different approaches have already been suggested to identify and treat problematic alignment sections. We propose an alternative method that identifies random similarity within multiple sequence alignments based on Monte Carlo resampling within a sliding window. The method infers similarity profiles from pairwise sequence comparisons and subsequently calculates a consensus profile. Consequently, consensus profiles identify dominating patterns of non-random similarity or randomness within sections of multiple sequence alignments.
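
The general idea of a sliding-window Monte Carlo test followed by a consensus profile can be illustrated with a minimal Python sketch. It is not the implementation described here: the +1/-1 match scoring, the 95% cutoff, the window size, the null model (resampling characters from each sequence's own composition), and all function names are assumptions chosen for illustration, and gaps are not treated specially.

```python
import random
from itertools import combinations


def match_score(a, b):
    """Assumed scoring: +1 for each matching position, -1 for each mismatch."""
    return sum(1 if x == y else -1 for x, y in zip(a, b))


def pairwise_profile(seq_a, seq_b, window=6, n_resamples=100, rng=None):
    """Return +1 / -1 per window position for one aligned sequence pair.

    +1: the observed window score exceeds the assumed 95% cutoff of the
        resampled (random) scores, i.e. similarity looks non-random.
    -1: the observed score is indistinguishable from random similarity.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    pool_a, pool_b = list(seq_a), list(seq_b)
    profile = []
    for start in range(len(seq_a) - window + 1):
        observed = match_score(seq_a[start:start + window],
                               seq_b[start:start + window])
        # Monte Carlo null: random windows drawn from each sequence's composition
        null = sorted(
            match_score(rng.choices(pool_a, k=window),
                        rng.choices(pool_b, k=window))
            for _ in range(n_resamples)
        )
        cutoff = null[int(0.95 * n_resamples)]
        profile.append(1 if observed > cutoff else -1)
    return profile


def consensus_profile(alignment, **kwargs):
    """Average the pairwise profiles over all sequence pairs, per window position."""
    profiles = [pairwise_profile(a, b, **kwargs)
                for a, b in combinations(alignment, 2)]
    return [sum(column) / len(profiles) for column in zip(*profiles)]


if __name__ == "__main__":
    toy_alignment = ["ACGTACGTACGTACGT",
                     "ACGTACGTACGTACGT",
                     "ACGTACGTTTGTACGA"]
    print(consensus_profile(toy_alignment, window=6))
```

In this sketch, consensus values near 1 mark windows where most sequence pairs are more similar than expected by chance, and values near -1 mark windows dominated by random similarity. A real analysis would need an explicit treatment of gaps and ambiguity characters.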