The ICE database of human splice sites aims to give users the coordinates of a gene's exon-intron boundaries on its respective genomic contig. In other words, a user can obtain the primary structure of a gene on the genome. The coordinates of each splice site on the contig are determined by aligning mRNA/cDNA sequence data to the genomic contigs. Thus, unlike other splice site databases, our determination of individual splice sites is not based on short EST sequences.
It is common to find more than one mRNA sequence in GenBank for a particular gene, because different labs have cloned and sequenced the gene independently. In addition, certain 'full-length' (sometimes uncharacterized) mRNA/cDNA transcripts from the NEDO human cDNA sequencing project (denoted by the prefix "FLJ", generated by the "oligo-capping" method), large cDNAs (> 4 kb) from the Kazusa human cDNA sequencing project (denoted by the prefix "KIAA", generated by conventional methods), cDNA sequences from the German Cancer Research Center (DKFZ) and those from the IMAGE Consortium have also been made available in GenBank. All these mRNA sequences are aligned to the genomic contigs in NCBI's LocusLink. We have therefore created a program similar to FIE2 to extract the coordinates of the exonic boundaries (splice sites) of the genes and to provide an 82-nucleotide sequence around each splice site.
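The extraction step can be illustrated with a minimal sketch. It is not the program described above: the function names are hypothetical, exon coordinates are assumed to be 1-based, inclusive and on the forward strand of the contig, and the 82-nucleotide window is assumed to be split evenly (41 nucleotides on each side of the junction), which the text does not specify.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive contig coordinates


def splice_sites(exons: List[Exon]) -> List[Tuple[str, int]]:
    """List internal exon boundaries of a forward-strand transcript.

    A donor site is reported as the last exonic base before an intron,
    an acceptor site as the first exonic base after an intron.
    """
    sites = []
    last = len(exons) - 1
    for i, (start, end) in enumerate(exons):
        if i > 0:
            sites.append(("acceptor", start))
        if i < last:
            sites.append(("donor", end))
    return sites


def site_window(contig: str, kind: str, pos: int, half: int = 41) -> str:
    """Return up to 2 * half nucleotides centred on the exon-intron junction.

    ASSUMPTION: the window is symmetric (41 + 41 = 82 by default).
    The window is truncated if the junction lies near a contig end.
    """
    # 0-based index of the first base to the right of the junction
    cut = pos if kind == "donor" else pos - 1
    lo = max(0, cut - half)
    return contig[lo:cut + half]


# Toy contig: 50 exonic bases, a 50-base intron, 50 exonic bases
contig = "E" * 50 + "i" * 50 + "E" * 50
exons = [(1, 50), (101, 150)]
for kind, pos in splice_sites(exons):
    print(kind, pos, site_window(contig, kind, pos))
```

In this toy contig, exonic bases are written as `E` and intronic bases as `i`, so each printed window shows the junction at its midpoint. Reverse-strand genes would additionally require reverse-complementing the window and swapping the donor and acceptor roles, which the sketch omits.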