Resource Summary
The Python code parses a FASTA file for existing gene sequences, then computes the minimum Levenshtein distance between different gene sequences. It also decodes each gene sequence with a Hidden Markov Model, using both the Viterbi and the posterior decoding methods, and detects the positions of CpG islands.
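To illustrate the distance computation mentioned above, here is a minimal sketch of the standard dynamic-programming Levenshtein distance with unit costs. The function name levenshtein is hypothetical and is not taken from the project's code, whose actual implementation may differ.

```python
def levenshtein(a, b):
    """Edit distance between a and b (unit-cost insert, delete, substitute).

    Hypothetical helper for illustration only; not the project's own function.
    """
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

The minimum distance over a set of sequences could then be taken with min() over all pairs, assuming that is what "minimum Levenshtein distance" refers to here.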
ps1.py:
The function readSeqFromFile(filename) parses a FASTA file and detects the genome sequences present in the file. We use the regular expression "[^ACGT][\n][ACTG\n]+ACTG[\n]" to detect the genome sequences, and then remove the newline characters from the genomes.
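The parsing step can be sketched as follows. This is a simplified illustration that splits on FASTA header lines (those starting with ">") instead of using the regular expression above, so it is not the author's implementation; the names parse_fasta and read_seq_from_file are hypothetical.

```python
def parse_fasta(text):
    """Return the sequences found in FASTA-formatted text, newlines removed.

    Assumption: every record starts with a '>' header line.
    """
    sequences, chunks = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if chunks:
                sequences.append("".join(chunks))
            chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        sequences.append("".join(chunks))
    return sequences


def read_seq_from_file(filename):
    """Hypothetical file-based wrapper around parse_fasta."""
    with open(filename) as handle:
        return parse_fasta(handle.read())
```

Joining the per-line chunks has the same effect as removing the newline characters from each matched genome.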
The function translateSeq(seq) translates a gene sequence into its amino acid sequence. The function readFrame(seq) detects genes in the genome sequence. Any detected sequence that is a substring of a longer one is discarded.
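A minimal sketch of the translation and substring-filtering steps is given below. It assumes the standard genetic code, writes stop codons as "*", and ignores trailing bases that do not form a full codon; the names translate_seq and drop_substrings are hypothetical and the project's own functions may behave differently.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_seq(seq):
    """Translate a DNA sequence codon by codon ('*' marks a stop codon)."""
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def drop_substrings(seqs):
    """Keep only sequences that are not contained in a longer sequence."""
    kept = []
    # Visit longest first, so any container is already in `kept`.
    for s in sorted(set(seqs), key=len, reverse=True):
        if not any(s in longer for longer in kept):
            kept.append(s)
    return kept
```

For example, translate_seq("ATGGCCTAA") yields "MA*", and drop_substrings(["ATGAAATAA", "AAA"]) keeps only the longer sequence.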
To run the program, pass the name of the FASTA file (or any other input file) as the argument, e.g.,
"python3.3 ps1.py filename"