• PARTHA P. MAJUMDER

• Statistical analysis of family data on complex disorders in man

A genetic model is discussed for recessively inherited disorders that do not follow a single-locus Mendelian pattern of inheritance. Further complexity arising from variable age of onset is also discussed. Methods of statistical analysis of family data using the likelihood principle are described for such complex disorders. The methods are exemplified using data on families of prelingual deafness and vitiligo.

• Pedigree analysis of vitiligo: Further support for multilocus involvement

Vitiligo is a dermatological disorder in man that shows familial aggregation. We performed segregation analysis on data pertaining to vitiligo on members of 147 pedigrees each ascertained through a single proband, and tested various non-genetic, and one-locus and two-locus genetic models. Non-genetic and one-locus genetic models were rejected in favour of a two-locus model postulating epistatic interaction of recessive alleles in the aetiology of vitiligo. The present results show that vitiligo is not a single-locus disorder and substantiate our earlier inference, drawn on the basis of nuclear-family data, of multilocus involvement in the pathogenesis of vitiligo.

• How useful are microsatellite loci in recovering short-term evolutionary history?

Because microsatellite loci are abundant in the human genome and are highly polymorphic in most global populations, such loci have become very popular in studies on reconstructing evolutionary relationships among contemporary human populations. We have made an assessment of the efficiency of recovery of true evolutionary relationships using simulated data of microsatellite loci and a variety of distance measures. We find that allele frequency data on about 30 microsatellite loci and the use of $D_A$ (Nei et al. 1983) or $D_c$ (Cavalli-Sforza and Edwards 1967) distance measures with the UPGMA clustering algorithm can recover true short-term evolutionary relationships with a high degree of accuracy, unless the effective sizes of the populations or mutation rates or both are very small.
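
As a rough illustration of the two distance measures and the clustering step named above, the following minimal sketch computes Nei's D_A as 1 minus the per-locus average of the sum over alleles of sqrt(x*y), and the chord distance in the multi-locus form (2/(pi*L)) times the sum over loci of sqrt(2*(1 - sum over alleles of sqrt(x*y))). The allele frequencies, population names and function names are hypothetical and are not taken from the paper; the UPGMA routine is a plain textbook average-linkage implementation, not the authors' simulation code.

```python
import math
from itertools import combinations

def nei_da(pop_x, pop_y):
    """Nei et al. (1983) D_A between two populations.
    Each population is a list of loci; each locus is a list of allele frequencies."""
    total = sum(
        sum(math.sqrt(x * y) for x, y in zip(fx, fy))
        for fx, fy in zip(pop_x, pop_y)
    )
    return 1.0 - total / len(pop_x)

def chord_dc(pop_x, pop_y):
    """Cavalli-Sforza and Edwards (1967) chord distance, averaged over loci."""
    total = 0.0
    for fx, fy in zip(pop_x, pop_y):
        # Clamp to 1.0 to guard against floating-point overshoot.
        cos_theta = min(1.0, sum(math.sqrt(x * y) for x, y in zip(fx, fy)))
        total += math.sqrt(2.0 * (1.0 - cos_theta))
    return 2.0 * total / (math.pi * len(pop_x))

def upgma(names, dist):
    """UPGMA clustering. dist[(i, j)] with i < j is the distance between
    names[i] and names[j]. Returns the tree as nested tuples."""
    trees = {i: name for i, name in enumerate(names)}
    sizes = {i: 1 for i in trees}
    d = dict(dist)
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(d, key=d.get)  # closest pair of current clusters
        for k in trees:
            if k in (i, j):
                continue
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # Size-weighted average distance to the merged cluster.
            d[(k, next_id)] = (sizes[i] * dik + sizes[j] * djk) / (sizes[i] + sizes[j])
        del d[(i, j)]
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))

# Hypothetical allele frequencies: three populations, two loci, two alleles each.
pops = {
    "A": [[0.5, 0.5], [0.2, 0.8]],
    "B": [[0.6, 0.4], [0.25, 0.75]],
    "C": [[0.1, 0.9], [0.9, 0.1]],
}
names = list(pops)
da = {(i, j): nei_da(pops[names[i]], pops[names[j]])
      for i, j in combinations(range(len(names)), 2)}
tree = upgma(names, da)
print(tree)
```

In this toy data set, populations A and B have similar frequencies at both loci, so they are joined first and C attaches last. Replacing `nei_da` with `chord_dc` when building the distance table gives the same topology here.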

• Congruence of genomic and ethnolinguistic affinities among five tribal populations of Madhya Pradesh (India)

The central Indian state of Madhya Pradesh is home to a large number of tribal populations of diverse linguistic and ethnic backgrounds. With a view to examining how well genomic affinities among tribal populations of this state correspond with their ethnic and linguistic affinities, we analysed DNA samples of individuals drawn from five tribes with diverse, but reasonably well-documented, ethnohistorical and linguistic backgrounds. Each DNA sample was scored at 16 biallelic DNA marker loci. On the basis of these data, genomic affinities among these populations were estimated. We have found an extremely good correspondence between the genomic and ethnolinguistic affinities.

• An improved procedure of mapping a quantitative trait locus via the EM algorithm using posterior probabilities

Mapping a locus controlling a quantitative genetic trait (e.g. blood pressure) to a specific genomic region is of considerable contemporary interest. Data on the quantitative trait under consideration and several codominant genetic markers with known genomic locations are collected from members of families and statistically analysed to estimate the recombination fraction, θ, between the putative quantitative trait locus and a genetic marker. One of the major complications in estimating θ for a quantitative trait in humans is the lack of haplotype information on members of families. We have devised a computationally simple two-stage method of estimation of θ in the absence of haplotypic information using the expectation-maximization (EM) algorithm. In the first stage, parameters of the quantitative trait locus (QTL) are estimated on the basis of data of a sample of unrelated individuals, and Bayes' rule is used to classify each parent into a QTL genotypic class. In the second stage, we have proposed an EM algorithm for obtaining the maximum-likelihood estimate of θ based on data of informative families (which are identified upon inferring parental QTL genotypes in the first stage). The purpose of this paper is to investigate whether, instead of using genotypically ‘classified’ data of parents, the use of posterior probabilities of QTL genotypes of parents at the second stage yields better estimators. We show, using simulated data, that the proposed procedure using posterior probabilities is statistically more efficient than our earlier classification procedure, although it is computationally heavier.
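
The contrast between hard classification and posterior probabilities can be illustrated with a small sketch. It assumes, purely for illustration, a biallelic QTL in Hardy-Weinberg proportions and normally distributed trait values with genotype-specific means and a common standard deviation; the parameter values and function names are hypothetical, and the second-stage EM estimation of θ is not reproduced here.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def qtl_posteriors(y, p, means, sd):
    """Bayes' rule: posterior probability of each QTL genotype given trait value y.
    p is the frequency of allele Q; means maps genotype -> trait mean."""
    q = 1.0 - p
    # Assumption: genotype priors follow Hardy-Weinberg proportions.
    priors = {"QQ": p * p, "Qq": 2.0 * p * q, "qq": q * q}
    joint = {g: priors[g] * normal_pdf(y, means[g], sd) for g in priors}
    total = sum(joint.values())
    return {g: joint[g] / total for g in joint}

def classify(posteriors):
    """Hard classification: the genotype with the largest posterior probability."""
    return max(posteriors, key=posteriors.get)

# Hypothetical first-stage estimates for a blood-pressure-like trait.
means = {"QQ": 120.0, "Qq": 110.0, "qq": 100.0}
post = qtl_posteriors(118.0, p=0.5, means=means, sd=5.0)
print(classify(post), post)
```

With these illustrative values the parent is classified as QQ, yet a substantial share of the posterior mass remains on Qq. A hard classification discards that uncertainty, whereas weighting by the posterior probabilities retains it.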

• High-resolution analysis of Y-chromosomal polymorphisms reveals signatures of population movements from central Asia and West Asia into India

Linguistic evidence suggests that West Asia and Central Asia have been the two major geographical sources of genes in the contemporary Indian gene pool. To test the nature and extent of similarities in the gene pools of these regions we have collected DNA samples from four ethnic populations of northern India, and have screened these samples for a set of 18 Y-chromosome polymorphic markers (12 unique event polymorphisms and six short tandem repeats). These data from Indian populations have been analysed in conjunction with published data from several West Asian and Central Asian populations. Our analyses have revealed traces of population movement from Central Asia and West Asia into India. Two haplogroups, HG-3 and HG-9, which are known to have arisen in the Central Asian region, are found in reasonably high frequencies (41.7% and 14.3% respectively) in the study populations. The ages estimated for these two haplogroups are less in the Indian populations than those estimated from data on Middle Eastern populations. A neighbour-joining tree based on Y-haplogroup frequencies shows that the North Indians are genetically placed between the West Asian and Central Asian populations. This is consistent with gene flow from West Asia and Central Asia into India.

• A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences

We have compared two statistical methods of estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences, which have been proposed by Templeton (1993) and Bandelt et al. (1995). Monte-Carlo simulations were used for generating DNA sequence data. Different evolutionary scenarios were simulated and the estimation procedures were evaluated. We have found that for both methods (i) the estimates are insensitive to demographic parameters and (ii) the standard deviations of the estimates are too high for these methods to be reliably used in practice.

• C. C. Li (1912–2003): His science and his spirit

• Patterns of nucleotide sequence variation in ICAM1 and TNF genes in twelve ethnic groups of India: roles of demographic history and natural selection

We have studied DNA sequence variation in and around the genes ICAM1 and TNF, which play functional and correlated roles in inflammatory processes and immune cell responses, in 12 diverse ethnic groups of India, with a view to investigating the relative roles of demographic history and natural selection in shaping the observed patterns of variation. The total numbers of single nucleotide polymorphisms (SNPs) detected at the ICAM1 and TNF loci were 29 and 12, respectively. Haplotype and allele frequencies differed significantly across populations. The site frequency spectra at these loci were significantly different from those expected under neutrality, and showed an excess of intermediate-frequency variants consistent with balancing selection. However, as expected under balancing selection, there was no significant reduction of $F_{ST}$ values compared to neutral autosomal loci. Mismatch distributions were consistent with population expansion for both loci. On the other hand, the phylogenetic network among haplotypes for the TNF locus was similar to expectations under population expansion, while that for the ICAM1 locus was as expected under balancing selection. Nucleotide diversity at the ICAM1 locus was an order of magnitude lower in the promoter region than in the introns or exons, but no such difference was noted for the TNF gene. Thus, we conclude that the pattern of nucleotide variation in these genes has been modulated by both demographic history and selection. This is not surprising in view of the known allelic associations of several polymorphisms in these genes with various diseases, both infectious and noninfectious.

• Lack of association of PTPN1 gene polymorphisms with type 2 diabetes in south Indians

• Comparative analyses of genetic risk prediction methods reveal extreme diversity of genetic predisposition to nonalcoholic fatty liver disease (NAFLD) among ethnic populations of India

Nonalcoholic fatty liver disease (NAFLD) is a distinct pathologic condition characterized by a disease spectrum ranging from simple steatosis to steatohepatitis, cirrhosis and hepatocellular carcinoma. Prevalence of NAFLD varies among ethnic groups, ranging from 12% in Chinese to 45% in Hispanics. Among Indian populations, the diversity in prevalence is high, ranging from 9% in rural populations to 32% in urban populations, with geographic differences as well. Here, we wished to find out if this difference is reflected in their genetic makeup. To date, several candidate-gene studies and a few genomewide association studies (GWAS) have been carried out, and many associations between single nucleotide polymorphisms (SNPs) and NAFLD have been observed. In this study, the risk allele frequencies (RAFs) of NAFLD-associated SNPs in 20 Indian ethnic populations (376 individuals) were analysed. We used two different measures for calculating genetic risk scores and compared their performance. The correlation of additive risk scores of NAFLD for three HapMap populations with their weighted mean prevalence was found to be high ($R^2 = 0.93$). We then used this method to compare NAFLD risk among ethnic Indian populations. Based on our observations, the Indian caste populations have high risk scores compared to Caucasians, who are often used as a surrogate for, and considered similar to, Indian caste populations in disease gene association studies; the risk scores of the caste populations are also significantly higher than those of the Indian tribal populations.
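
The abstract does not specify the two risk-score measures, so the following is only a hedged sketch of how population-level additive scores are commonly formed from risk allele frequencies: an unweighted score (the expected number of risk alleles per individual, 2 × RAF summed over SNPs) and a version weighted by the log odds ratio of each SNP. The RAFs, odds ratios and function names below are hypothetical and are not taken from the study.

```python
import math

def unweighted_score(rafs):
    """Expected count of risk alleles per diploid individual, summed over SNPs."""
    return sum(2.0 * f for f in rafs)

def weighted_score(rafs, odds_ratios):
    """Same sum, with each SNP weighted by the log of its (assumed) odds ratio."""
    return sum(2.0 * f * math.log(o) for f, o in zip(rafs, odds_ratios))

# Hypothetical risk allele frequencies at three SNPs in two populations.
odds_ratios = [1.5, 1.2, 1.1]
populations = {
    "population_1": [0.40, 0.30, 0.55],
    "population_2": [0.20, 0.25, 0.50],
}
for name, rafs in populations.items():
    print(name, round(unweighted_score(rafs), 3), round(weighted_score(rafs, odds_ratios), 3))
```

Both scores rank the first hypothetical population above the second. The weighted version lets SNPs with larger assumed effects contribute more to the difference.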

• Detecting cognizable trends of gene expression in a time series RNA-sequencing experiment: a bootstrap approach

Study of temporal trajectory of gene expression is important. RNA sequencing is popular in genome-scale studies of transcription. Because of high expenses involved, many time-course RNA sequencing studies are challenged by inadequacy of sample sizes. This poses difficulties in conducting formal statistical tests of significance of null hypotheses. We propose a bootstrap algorithm to identify ‘cognizable’ ‘time-trends’ of gene expression. Properties of the proposed algorithm are derived using a simulation study. The proposed algorithm captured known ‘time-trends’ in the simulated data with a high probability of success, even when sample sizes were small (n<10). The proposed statistical method is efficient and robust to capture ‘cognizable’ ‘time-trends’ in RNA sequencing data.
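
The abstract does not give the details of the algorithm, so the sketch below shows only one simple way a bootstrap could be used to flag a time-trend with few replicates. It resamples replicates with replacement within each time point, labels each bootstrap replicate by the direction of its mean trajectory, and reports a trend only if it recurs in a large fraction of resamples. The labels, the 0.8 support threshold and the expression values are assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import Counter

def trend_label(means):
    """Label a trajectory of time-point means as 'up', 'down' or 'other'."""
    steps = [b - a for a, b in zip(means, means[1:])]
    if all(s > 0 for s in steps):
        return "up"
    if all(s < 0 for s in steps):
        return "down"
    return "other"

def bootstrap_trend(samples_by_time, n_boot=1000, threshold=0.8, seed=0):
    """Return (trend or None, support) for one gene.
    samples_by_time holds one list of replicate expression values per time point."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        means = []
        for values in samples_by_time:
            # Resample replicates with replacement within this time point.
            resample = rng.choices(values, k=len(values))
            means.append(sum(resample) / len(resample))
        counts[trend_label(means)] += 1
    label, count = counts.most_common(1)[0]
    support = count / n_boot
    trend = label if label != "other" and support >= threshold else None
    return trend, support

# Hypothetical expression values: three time points, three replicates each.
gene = [[1.0, 1.2, 0.9], [2.1, 1.9, 2.2], [3.0, 3.3, 2.8]]
print(bootstrap_trend(gene))
```

Because the replicate ranges at the three time points do not overlap in this toy example, every resample yields an increasing trajectory and the trend is reported with full support. Noisier or overlapping data would lower the support and could leave the gene unflagged.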

