About the Author
Shizhong Xu, PhD, University of California, Department of Botany and Plant Sciences, Riverside, CA, USA
Part I Genetic Linkage Map
1 Map Functions
  1.1 Physical map and genetic map
  1.2 Derivation of map functions
  1.3 Haldane map function
  1.4 Kosambi map function
2 Recombination Fraction
  2.1 Mating designs
  2.2 Maximum likelihood estimation of recombination fraction
  2.3 Standard error and significance test
  2.4 Fisher's scoring algorithm for estimating
  2.5 EM algorithm for estimating
3 Genetic Map Construction
  3.1 Criteria of optimality
  3.2 Search algorithms
    3.2.1 Exhaustive search
    3.2.2 Heuristic search
    3.2.3 Simulated annealing
    3.2.4 Branch and bound
  3.3 Bootstrap confidence of a map
4 Multipoint Analysis of Mendelian Loci
  4.1 Joint distribution of multiple locus genotype
    4.1.1 BC design
    4.1.2 F2 design
    4.1.3 Four-way cross design
  4.2 Incomplete genotype information
    4.2.1 Partially informative genotype
    4.2.2 BC and F2 are special cases of FW
    4.2.3 Dominance and missing markers
  4.3 Conditional probability of a missing marker genotype
  4.4 Joint estimation of recombination fractions
  4.5 Multipoint analysis for m markers
  4.6 Map construction with unknown recombination fractions
Part II Analysis of Quantitative Traits
5 Basic Concepts of Quantitative Genetics
  5.1 Gene frequency and genotype frequency
  5.2 Genetic effects and genetic variance
  5.3 Average effect of allelic substitution
  5.4 Genetic variance components
  5.5 Heritability
  5.6 An F2 family is in Hardy-Weinberg equilibrium
6 Major Gene Detection
  6.1 Estimation of major gene effect
    6.1.1 BC design
    6.1.2 F2 design
  6.2 Hypothesis tests
    6.2.1 BC design
    6.2.2 F2 design
  6.3 Scale of the genotype indicator variable
  6.4 Statistical power
    6.4.1 Type I error and statistical power
    6.4.2 Wald-test statistic
    6.4.3 Size of a major gene
    6.4.4 Relationship between W-test and Z-test
    6.4.5 Extension to dominance effect
7 Segregation Analysis
  7.1 Gaussian mixture distribution
  7.2 EM algorithm
    7.2.1 Closed form solution
    7.2.2 EM steps
    7.2.3 Derivation of the EM algorithm
    7.2.4 Proof of the EM algorithm
  7.3 Hypothesis tests
  7.4 Variances of estimated parameters
  7.5 Estimation of the mixing proportions
8 Genome Scanning for Quantitative Trait Loci
  8.1 The mouse data
  8.2 Genome scanning
  8.3 Missing genotypes
  8.4 Test statistics
  8.5 Bonferroni correction
  8.6 Permutation test
  8.7 Piepho's approximate critical value
  8.8 Theoretical consideration
9 Interval Mapping
  9.1 Least squares method
  9.2 Weighted least squares
  9.3 Fisher scoring
  9.4 Maximum likelihood method
    9.4.1 EM algorithm
    9.4.2 Variance-covariance matrix of the estimates
    9.4.3 Hypothesis test
  9.5 Remarks on the four methods of interval mapping
10 Interval Mapping for Ordinal Traits
  10.1 Generalized linear model
  10.2 ML under homogeneous variance
  10.3 ML under heterogeneous variance
  10.4 ML under mixture distribution
  10.5 ML via the EM algorithm
  10.6 Logistic analysis
  10.7 Example
11 Mapping Segregation Distortion Loci
  11.1 Probabilistic model
    11.1.1 The EM algorithm
    11.1.2 Hypothesis test
    11.1.3 Variance matrix of the estimated parameters
    11.1.4 Selection coefficient and dominance
  11.2 Liability model
    11.2.1 EM algorithm
    11.2.2 Variance matrix of estimated parameters
    11.2.3 Hypothesis test
  11.3 Mapping QTL under segregation distortion
    11.3.1 Joint likelihood function
    11.3.2 EM algorithm
    11.3.3 Variance-covariance matrix of estimated parameters
    11.3.4 Hypothesis tests
    11.3.5 Example
12 QTL Mapping in Other Populations
  12.1 Recombinant inbred lines
  12.2 Double haploids
  12.3 Four-way crosses
  12.4 Full-sib family
  12.5 F2 population derived from outbreds
  12.6 Example
13 Random Model Approach to QTL Mapping
  13.1 Identity-by-descent (IBD)
  13.2 Random effect genetic model
  13.3 Sib-pair regression
  13.4 Maximum likelihood estimation
    13.4.1 EM algorithm
    13.4.2 EM algorithm under singular value decomposition
    13.4.3 Multiple siblings
  13.5 Estimating the IBD value for a marker
  13.6 Multipoint method for estimating the IBD value
  13.7 Genome scanning and hypothesis tests
  13.8 Multiple QTL model
  13.9 Complex pedigree analysis
14 Mapping QTL for Multiple Traits
  14.1 Multivariate model
  14.2 EM algorithm for parameter estimation
  14.3 Hypothesis tests
  14.4 Variance matrix of estimated parameters
  14.5 Derivation of the EM algorithm
  14.6 Example
15 Bayesian Multiple QTL Mapping
  15.1 Bayesian regression analysis
  15.2 Markov chain Monte Carlo
  15.3 Mapping multiple QTL
    15.3.1 Multiple QTL model
    15.3.2 Prior, likelihood and posterior
    15.3.3 Summary of the MCMC process
    15.3.4 Post MCMC analysis
  15.4 Alternative methods of Bayesian mapping
    15.4.1 Reversible jump MCMC
    15.4.2 Stochastic search variable selection
    15.4.3 Lasso and Bayesian Lasso
  15.5 Example: Arabidopsis data
16 Empirical Bayesian QTL Mapping
  16.1 Classical mixed model
    16.1.1 Simultaneous updating for matrix G
    16.1.2 Coordinate descent method
    16.1.3 Block coordinate descent method
    16.1.4 Bayesian estimates of QTL effects
  16.2 Hierarchical mixed model
    16.2.1 Inverse chi-square prior
    16.2.2 Exponential prior
    16.2.3 Dealing with sparse models
  16.3 Infinitesimal model for whole genome sequence data
    16.3.1 Data trimming
    16.3.2 Concept of continuous genome
  16.4 Example: Simulated data
Part III Microarray Data Analysis
17 Microarray Differential Expression Analysis
  17.1 Data preparation
    17.1.1 Data transformation
    17.1.2 Data normalization
  17.2 F-test and t-test
  17.3 Type I error and false discovery rate
  17.4 Selection of differentially expressed genes
    17.4.1 Permutation test
    17.4.2 Selecting genes by controlling FDR
    17.4.3 Problems of the previous methods
    17.4.4 Regularized t-test
  17.5 General linear model
    17.5.1 Fixed model approach
    17.5.2 Random model approach
18 Hierarchical Clustering of Microarray Data
  18.1 Distance matrix
  18.2 UPGMA
  18.3 Neighbor joining
    18.3.1 Principle of neighbor joining
    18.3.2 Computational algorithm
  18.4 Other methods
  18.5 Bootstrap confidence
19 Model-Based Clustering of Microarray Data
  19.1 Cluster analysis with the K-means method
  19.2 Cluster analysis under Gaussian mixture
    19.2.1 Multivariate Gaussian distribution
    19.2.2 Mixture distribution
    19.2.3 The EM algorithm
    19.2.4 Supervised cluster analysis
    19.2.5 Semi-supervised cluster analysis
  19.3 Inferring the number of clusters
  19.4 Microarray experiments with replications
20 Gene Specific Analysis of Variances
  20.1 General linear model
  20.2 The SEM algorithm
  20.3 Hypothesis testing
21 Factor Analysis of Microarray Data
  21.1 Background of factor analysis
    21.1.1 Linear model of latent factors
    21.1.2 EM algorithm
    21.1.3 Number of factors
  21.2 Cluster analysis
  21.3 Differential expression analysis
  21.4 MCMC algorithm
22 Classification of Tissue Samples Using Microarrays
  22.1 Logistic regression
  22.2 Penalized logistic regression
  22.3 The coordinate descent algorithm
  22.4 Cross validation
  22.5 Prediction of disease outcome
  22.6 Multiple category classification
23 Time-Course Microarray Data Analysis
  23.1 Gene expression profiles
  23.2 Orthogonal polynomial
  23.3 B-spline
  23.4 Mixed effect model
  23.5 Mixture mixed model
  23.6 EM algorithm
  23.7 Best linear unbiased prediction
  23.8 SEM algorithm
    23.8.1 Monte Carlo sampling
    23.8.2 SEM steps
24 Quantitative Trait Associated Microarray Data Analysis
  24.1 Linear association
    24.1.1 Linear model
    24.1.2 Cluster analysis
    24.1.3 Three-cluster analysis
    24.1.4 Differential expression analysis
  24.2 Polynomial and B-spline
  24.3 Multiple trait association
25 Mapping Expression Quantitative Trait Loci
  25.1 Individual marker analysis
    25.1.1 SEM algorithm
    25.1.2 MCMC algorithm
  25.2 Joint analysis of all markers
    25.2.1 Multiple eQTL model
    25.2.2 SEM algorithm
    25.2.3 MCMC algorithm
    25.2.4 Hierarchical evolutionary stochastic search (HESS)
Statistical genomics is a rapidly developing field that is attracting a growing number of researchers. However, the lack of reference books and textbooks that synthesize statistical genomics has become a major hurdle to the development of the field. Although many bioinformatics books have been published recently, most of them emphasize DNA sequence analysis under a deterministic approach.
Principles of Statistical Genomics synthesizes the state-of-the-art statistical methodologies (stochastic approaches) applied to genome study. It explains the statistical models and methods behind the major bioinformatics software packages, which helps researchers choose the optimal algorithm for analyzing their data and better interpret the results of their analyses. Understanding existing statistical models and algorithms also helps researchers develop improved statistical methods that extract the maximum information from their data.
Resourceful and easy to use, Principles of Statistical Genomics is a comprehensive reference for researchers and graduate students studying statistical genomics.
Covers microarray data analysis in addition to QTL mapping, a topic absent from both competing books
Introduces Bayesian methods, which are not available in either competing book
Uses more rigorous mathematical approaches to derive the statistical methods