Genes encoding proteins expressed at high and at low levels in plants were classified into two separate data sets, which were analysed to identify conserved nucleotide sequences that may distinguish genes with contrasting levels of expression. The AUG context characteristic of the highly expressed genes is (A/C)N2AAN3(A/T)T(A/C)AACAATGGCTNCC(T/A)CNA(C/T)(A/C). In the data set of highly expressed genes, codons for alanine are overrepresented at the second position after the translation initiation codon, and codons for serine at the third and fourth positions. The characteristic transcription initiation site in the highly expressed genes is CAN(A/C)(A/C)(C/A)C(C/A)N2A(C/A). Their promoter region is characterized by two tandemly repeated TATA elements, which sometimes carry one point mutation and rarely two. Besides these tandem TATA elements, the promoter context of the highly expressed genes shows overrepresentation of C, C and G at positions -3, -1 and +9, respectively. The characteristic TATA motif in the highly expressed plant genes is (T/C)(T/A)N2TCACTATATATAG. Most of these features are absent from genes ubiquitously expressed at low levels in plants.