Sequences that facilitate high fidelity of pairing by RecA: A model

G. Karthikeyan and Basuthkar J. Rao

Department of Biological Sciences, Tata Institute of Fundamental
Research, Colaba, Mumbai 400 005, India

Homologues of E. coli RecA in eukaryotes (Rad51) are conserved during evolution in their structural and physical properties. They form structurally similar presynaptic filaments on single-stranded DNA, and they bind with higher affinity to certain G- and T-rich sequences. Hot-spots of recombination in E. coli are embedded in GT-rich stretches. The DNA bases in the presynaptic filament show a high degree of promiscuous pairing, except for C residues, which pair with high fidelity. In this study we propose a model in which binding preference and pairing fidelity are two separate parameters that might together ensure proper recombinational pairing at hot-spots.

RECOMBINATION hot-spots are well characterized in E. coli and S. cerevisiae at the genetic and molecular levels1,2. In higher eukaryotes such as mammals and plants, a few candidate sequence motifs have been described as recombination hot-spots3–5. In spite of a wealth of information on hot-spots in E. coli and S. cerevisiae, there is no obvious consensus at the DNA level as to what makes a region a recombination ‘hot-spot’. In this paper we address this issue and propose a molecular model for it, based on our work on E. coli RecA as well as work published by Stephen Kowalczykowski’s laboratory6,7. A genetic hot-spot is characterized by extrinsic and intrinsic factors. The former include accessibility to the recombination machinery and chromatin structure. Intrinsically, a ‘hot-spot’ should contain DNA sequences that might have higher affinity for RecA protein and thereby promote a relatively stable RecA nucleoprotein filament that initiates recombination at a higher frequency. It should also contain DNA sequences that pair well with homologous sequences. Recent work addresses the issue of RecA affinity6, whereas our results provide insight into the pairing preferences of RecA. In this communication, we focus on these intrinsic factors.

An in vitro selection was performed on a random pool of 10¹⁴ 70-mer oligonucleotides and on a pool of 10¹¹ 18-mers to select sequences with higher affinity for RecA binding6. Both selections were done with limiting concentrations of RecA. Several cycles of selection and PCR amplification (eight for the 70-mer pool and five for the 18-mer pool) yielded sequences that were substantially rich in G and T bases. The average base percentages of several such clones were %G = 33.5, %A = 17.9, %T = 30.7 and %C = 17.9 (70-mer pool), and %G = 38.3, %A = 12.7, %T = 37.3 and %C = 11.2 (18-mer pool). An independent selection from a random pool of 18-mers yielded single strands with a similar sequence bias when yeast Rad51 was used instead of E. coli RecA8, which underscores the generality of such sequence biases. It is therefore clear that E. coli RecA and its eukaryotic counterparts have evolved to bind G and T residues more strongly than C and A residues (G ≈ T > A ≈ C).
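The composition figures above can be checked with simple arithmetic: an equimolar random pool is expected to contain about 50% G + T, whereas the selected 70-mer and 18-mer clones contain 64.2% and 75.6% G + T, respectively. The Python sketch below shows one way to tabulate base percentages for a set of selected clones and express the G + T excess. The function names are hypothetical, and the sketch assumes the clones are given as plain A/C/G/T strings (any other character is ignored).

```python
from collections import Counter

BASES = "ACGT"


def base_percentages(seqs):
    """Percentage of A, C, G and T pooled over all clones (non-ACGT characters ignored)."""
    counts = Counter()
    for seq in seqs:
        counts.update(ch for ch in seq.upper() if ch in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T characters found")
    return {b: 100.0 * counts[b] / total for b in BASES}


def gt_excess(percent, expected=50.0):
    """Percentage points of G + T above the value assumed for an equimolar random pool."""
    return percent["G"] + percent["T"] - expected


# Reported average composition of the selected 70-mer clones (from the text)
pool70 = {"G": 33.5, "A": 17.9, "T": 30.7, "C": 17.9}
print(round(gt_excess(pool70), 1))  # 14.2
```

The same call on the 18-mer averages gives an excess of 25.6 percentage points, consistent with the stronger G/T bias reported for the shorter pool.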

We have been interested in quantitating the intrinsic ability of A, G, C and T bases in the RecA filament to choose their complementary base from a milieu of mispairs during homology search. Two observations prompted us to study two-stranded complementary pairing by RecA to address this issue: (i) E. coli recombination ensues in spite of high sequence divergence between recombining partners (as in conjugational mating between E. coli and Salmonella typhimurium when mismatch repair genes are mutated)9; and (ii) an in vitro counterpart of such a reaction, which is also more tolerant of mismatches, occurs only when pairing leads to D-loop complexes10–12. D-loop complexes are essentially sustained by complementary pairing between the filament strand and its complement in the superhelical duplex. We monitored complementary pairing between an 83-mer oligo and a 33-mer (containing all four bases in roughly equal proportion) in an assay in which one base was replaced by another at every position where it occurred in the 33-mer. Because each of the four bases was replaced in turn by each of the other three, this gave rise to 12 different 33-mer tester sequences13, each of which forms one specific type of mispair on pairing with the 83-mer. In each case, RecA was coated on the 83-mer, which is long enough to promote RecA binding in the presence of ATP. To elucidate the effects of specific sets of non-Watson–Crick base pairs, targeted recognition was monitored as a ligatable alignment between a tester and a reference tether immediately upstream of it13. The tether is a 25-mer, fully complementary to one end of the 83-mer; the tester is a 33-mer carrying base substitutions that reduce Watson–Crick complementarity between the pairing substrates. In this way, we assessed the effects of all 12 possible kinds of mispairs by quantitating the ligation products on a denaturing polyacrylamide gel. To minimize the effects of blunt-ended ligations in this assay, we used E. coli ligase in the present set of experiments instead of T4 ligase.

The pairing hierarchy was expressed as the percentage of tester oligo that was ligated due to targeted pairing (Figure 1; the first base of each mispair is from the RecA–83-mer filament and the second from the 33-mer tester). By and large, the hierarchy was similar to that observed before and confirmed across two different sequence contexts13. However, the E. coli ligase assay was more discriminatory than the T4 ligase assay and was able to capture even small differences in the steady-state level of pairing across AG, AA, AC, GT and TG mispairs. This enabled us to detect a fine intrinsic hierarchy within this set, which had earlier been clustered together as equally good13.

Figure 1.  PhosphorImager analysis of targeted pairing with the single-substitution testers, as described13. Ligations were done for 30 min with E. coli DNA ligase. WT = wild type. The first base of each mispair is on the RecA presynaptic filament.

The overall hierarchy of ten of the twelve mispairs measured by the E. coli ligase assay was similar to that obtained with the T4 ligase assay, the two reversals being GG and CT: in the E. coli ligase assay, GG was better than TT, whereas CT was not as good as CA. This discrepancy has no bearing on our conclusions, because the relative gradation of promiscuity among A, G, C and T bases, which is the main point of discussion here (see below), is the same in both assays. What is clear from this hierarchy is that A residues in the filament are the most promiscuous and C residues the least, while G and T fall in between. In other words, different bases in the filament have different degrees of pairing promiscuity, which can be captured by a parameter we call the ‘promiscuity index’ (Table 1). The higher the promiscuity index, the greater the tendency of that base to engage in biologically unproductive mispairing that would eventually be eliminated by mismatch repair proteins14. A filament that is rich in bases with a high promiscuity index is more prone to get ‘bogged down’ in unproductive pairings. A filament of bases with a low promiscuity index, on the other hand, has a higher chance of encountering the right (complementary and productive) partner.
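The logic of the tester design can be made explicit with a short sketch. It assumes, for illustration only, that each tester is derived from a fully complementary reference 33-mer by replacing every occurrence of one base with another. A tester base X that becomes Y then leaves the filament base complementary to X facing Y, and the mispair is named filament base first, as in Figure 1. The helpers `make_testers` and `mispaired_positions` are hypothetical, and the reference string below is a toy sequence, not the 33-mer used in the experiments.

```python
from itertools import permutations

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def make_testers(reference: str) -> dict[str, str]:
    """Map each mispair name (filament base first) to its tester sequence.

    Replacing every tester base `old` with `new` places the filament base
    COMPLEMENT[old] opposite `new` at each substituted position.
    """
    testers = {}
    for old, new in permutations("ACGT", 2):  # 4 x 3 = 12 ordered substitutions
        testers[COMPLEMENT[old] + new] = reference.replace(old, new)
    return testers


def mispaired_positions(reference: str, tester: str) -> list[int]:
    """Positions at which a tester departs from the Watson-Crick reference."""
    return [i for i, (r, t) in enumerate(zip(reference, tester)) if r != t]


# Toy reference (hypothetical; not the experimental 33-mer)
ref = "ACGTACGTACGT"
testers = make_testers(ref)
print(len(testers))     # 12
print(testers["AA"])    # ACGAACGAACGA  (T -> A: filament A now faces tester A)
```

Because every occurrence of one base is changed, each tester carries a single mispair type at several positions, which is what allows one ligation value to be attributed to one mispair class.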

Table 1.  Promiscuity index of each filament base, expressed as the ratio of the total percentage of promiscuous pairings involving that base to its wild-type complementary pairing (as measured in the targeted ligation assay13, Figure 1)

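Following the definition in the Table 1 caption, the promiscuity index of a filament base X can be written as PI(X) = [P(XY₁) + P(XY₂) + P(XY₃)] / P(WT), where Y₁, Y₂ and Y₃ are the three bases not complementary to X, P(XY) is the percentage of tester ligated for mispair XY, and P(WT) is the percentage ligated for the fully complementary tester. The Python sketch below computes this ratio. It assumes, as an interpretation rather than a statement of the authors' exact procedure, that the single wild-type value serves as the denominator for all four bases. The function names are hypothetical, and no measured values are reproduced here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def promiscuity_index(base: str, ligated: dict[str, float], wild_type: float) -> float:
    """Sum of ligation percentages for the three mispairs formed by a filament
    base, divided by the wild-type (Watson-Crick) ligation percentage.

    `ligated` maps two-letter mispair names (filament base first, as in
    Figure 1) to the percentage of tester ligated.
    """
    if wild_type <= 0:
        raise ValueError("wild-type ligation must be positive")
    partners = [b for b in "ACGT" if b != COMPLEMENT[base]]
    return sum(ligated[base + p] for p in partners) / wild_type


def rank_by_promiscuity(ligated: dict[str, float], wild_type: float) -> list[str]:
    """Filament bases ordered from most to least promiscuous."""
    indices = {b: promiscuity_index(b, ligated, wild_type) for b in "ACGT"}
    return sorted(indices, key=indices.get, reverse=True)
```

Given the twelve ligation percentages from a gel such as Figure 1, `rank_by_promiscuity` returns the ordering discussed in the text; for the data reported here that ordering places A first and C last.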

We hypothesize that E. coli recombination hot-spots might face two types of evolutionary pressure: one for better binding by RecA and the other for better chances of finding complementary partners (bases with a lower promiscuity index). RecA binds some bases better but pairs other bases with higher fidelity: G and T richness in a hot-spot confers much better binding affinity for RecA, whereas C richness confers a much better pairing success rate. A residues are selected against on both counts, namely poor binding and poor pairing fidelity.

In a recent paper, Kowalczykowski and coworkers analysed the sequences flanking all 1009 chi sites (5'-GCTGGTGG-3'), the E. coli recombination hot-spots, and searched for statistically significant sequence bias around them7. The deviations from the genomic mean of A, C, G and T residues at 50 positions surrounding the 1009 aligned chi sites revealed a striking pattern: A residues are highly under-represented, whereas G and T residues are highly over-represented, throughout the vicinity of chi. C residues are distributed fairly evenly except at a few positions immediately adjacent to chi.
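The positional analysis described above can be sketched as follows: find every chi site, align the occurrences, tally the bases at each offset from chi, and subtract the genome-wide percentage of each base. This is a simplified illustration, not the published procedure7. It scans only the strand supplied (chi acts in one orientation, so a full analysis would also scan the reverse complement), it assumes a symmetric window set by a hypothetical `flank` parameter, and it applies no statistical test. Negative offsets lie 5' of chi, offsets 0 to 7 are chi itself, and larger offsets lie 3' of chi.

```python
from collections import Counter

CHI = "GCTGGTGG"


def chi_positions(genome: str) -> list[int]:
    """Start index of every chi occurrence on the given strand."""
    hits, start = [], genome.find(CHI)
    while start != -1:
        hits.append(start)
        start = genome.find(CHI, start + 1)
    return hits


def flank_deviation(genome: str, flank: int = 25) -> dict[int, dict[str, float]]:
    """Per-offset base percentage around aligned chi sites minus genome-wide percentage."""
    genome = genome.upper()
    total = Counter(b for b in genome if b in "ACGT")
    n = sum(total.values())
    background = {b: 100.0 * total[b] / n for b in "ACGT"}
    hits = chi_positions(genome)
    profile = {}
    for offset in range(-flank, len(CHI) + flank):
        column = Counter()
        for h in hits:
            i = h + offset
            if 0 <= i < len(genome) and genome[i] in "ACGT":
                column[genome[i]] += 1
        m = sum(column.values())
        if m:  # skip offsets that fall outside the sequence for every site
            profile[offset] = {b: 100.0 * column[b] / m - background[b] for b in "ACGT"}
    return profile
```

A positive value at an offset means that base is over-represented there relative to the whole sequence, which is the kind of deviation reported for G and T around chi.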

Several enzymes play a role in initiating recombination at chi sequences. The RecBCD enzyme complex initiates recombination by processively degrading double-stranded DNA until it encounters a chi sequence, where its 3'-5' exonuclease activity is attenuated15. The enzyme continues to degrade the other strand, now leaving a 3' single-stranded tail to which RecA binds16. It is also known that RecBCD enzyme promotes the loading of RecA protein, in preference to E. coli SSB, onto chi and its adjoining sequences (3' to 5') rather than onto other, non-chi sequences17. Such facilitated loading of RecA onto chi sequences depends on the simultaneous action of both RecA and RecBCD proteins and demonstrates a new level of coordination during the initiation of recombination17. At the DNA sequence level, there is a distinct under-representation of C in a short stretch immediately following chi. This might provide a cue for the attenuation of the 3'-5' exonuclease and the accentuation of the 5'-3' exonuclease of RecBCD enzyme. RecA coats the single strand, forming a presynaptic complex, and begins the process of homology search. The stretch of DNA flanking chi is, moreover, over-represented in G and T residues arranged in a triplet pattern of GGT7. This DNA is a good substrate for RecA binding. However, the triplet arrangement poses a problem for homology search, as several frames would exist in which this sequence can pair in a complementary manner. It is here that the unbiased, near-average distribution of C residues plays a role. C is the base with the lowest promiscuity index; in other words, it pairs with the highest fidelity. This would ensure that the right frame of alignment is fixed by C residues amidst a sea of GGT triplets. Indeed, recent in vitro experiments on RecA pairing with dinucleotide repeats have borne out this notion18. RecA–ssDNA filaments encompassing continuous repeats of either GT or CA form stable joints and strand-exchange products less efficiently than mixed-sequence controls, even though the repeat stretches bind RecA measurably better than the mixed-sequence control18. A simple explanation for this effect could be that, within pure repeat sequences, the RecA reaction cannot decide on the right frame of alignment, leading to shorter joints that are unstable to deproteinization and slow in strand exchange18. This result strongly underscores the need for interspersed bases that act as frame-fixers, even in sequences that bind RecA very well, to aid RecA pairing in the right (productive) frame of alignment. C residues, being the least promiscuous in pairing, serve this function well in the GGT-rich milieu of the chi islands of E. coli7.
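The frame-fixing argument can be illustrated with a toy string-matching model, in which a filament segment is slid along a homologous target and every offset with at most `max_mismatches` differences counts as a candidate alignment frame. This is a deliberate simplification: it assumes that pairing with the complementary strand can be represented by identity with the homologous strand, it ignores RecA energetics entirely, and the sequences are hypothetical GGT repeats rather than real chi-island sequences. The helper `aligned_frames` is likewise hypothetical.

```python
def aligned_frames(probe: str, target: str, max_mismatches: int = 0) -> list[int]:
    """Offsets in `target` where `probe` aligns with at most `max_mismatches` mismatches."""
    frames = []
    for start in range(len(target) - len(probe) + 1):
        window = target[start:start + len(probe)]
        mismatches = sum(p != t for p, t in zip(probe, window))
        if mismatches <= max_mismatches:
            frames.append(start)
    return frames


# Pure GGT repeat: the probe fits equally well in every third frame
print(aligned_frames("GGT" * 4, "GGT" * 10))  # [0, 3, 6, 9, 12, 15, 18]

# One interspersed C fixes a single frame when pairing is strict
target = "GGT" * 4 + "GCT" + "GGT" * 5
probe = target[9:21]  # 12-nt segment carrying the C
print(aligned_frames(probe, target))  # [9]
```

Tolerating one mismatch (`max_mismatches=1`) lets the C-containing probe align in several frames again, which is the toy-model counterpart of the point that only a base paired with high fidelity can serve as a frame-fixer.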

What are the implications of this argument? In simple organisms such as E. coli, the molecular determinants that might have shaped hot-spots are largely recombinase-based (RecA, in this case). Extrinsic and higher-order structural elements such as chromatin accessibility or nuclear-matrix anchorage may play a much smaller role. In eukaryotes, where regulation is much more complex, extrinsic components become more important. Nevertheless, it is interesting to note that the yeast homologue of E. coli RecA, Rad51, has DNA-binding preferences very similar to those of RecA8 and perhaps has similar pairing preferences too.

  1. Lam, S. T., Stahl, M. M., McMilin, K. D. and Stahl, F. W., Genetics, 1974, 77, 425–433.
  2. Nicolas, A., Treco, D., Schultes, N. P. and Szostak, J. W., Nature, 1989, 338, 35–39.
  3. Steinmetz, M., Stephan, D. and Lindahl, K. F., Cell, 1986, 44, 895–904.
  4. Nachman, M. W. and Churchill, G. A., Genetics, 1996, 142, 537–548.
  5. Patterson, G. I., Kubo, K. M., Shroyer, T. and Chandler, V. L., Genetics, 1995, 140, 1389–1406.
  6. Tracy, R. B. and Kowalczykowski, S. C., Genes Dev., 1996, 10, 1890–1903.
  7. Tracy, R. B., Chedin, F. and Kowalczykowski, S. C., Cell, 1997, 90, 205–206.
  8. Tracy, R. B., Baumohl, J. K. and Kowalczykowski, S. C., Genes Dev., 1997, 11, 3423–3431.
  9. Rayssiguier, C., Thaler, D. S. and Radman, M., Nature, 1989, 342, 396–401.
  10. Beattie, K. L., Wiegand, R. C. and Radding, C. M., J. Mol. Biol., 1977, 116, 783–803.
  11. Hsieh, P., Camerini-Otero, C. S. and Camerini-Otero, R. D., Proc. Natl. Acad. Sci., 1992, 89, 6492–6496.
  12. Adzuma, K., Genes Dev., 1992, 6, 1679–1694.
  13. Karthikeyan, G., Wagle, M. D. and Rao, B. J., FEBS Lett., 1998, 425, 45–51.
  14. Worth, L., Jr., Clark, S., Radman, M. and Modrich, P., Proc. Natl. Acad. Sci., 1994, 91, 3238–3241.
  15. Dixon, D. A. and Kowalczykowski, S. C., Cell, 1991, 66, 361–371.
  16. Dixon, D. A. and Kowalczykowski, S. C., Cell, 1993, 72, 87–96.
  17. Anderson, D. G. and Kowalczykowski, S. C., Cell, 1997, 90, 77–86.
  18. Dutreix, M., J. Mol. Biol., 1997, 273, 105–113.

Received 19 August 1998; revised accepted 1 December 1998