A Course in Statistics with R

Tattar, Prabhanjan N.
Ramaiah, Suresh
Manjunath, B.G.

98,28 € (VAT incl.)

Integrates the theory and applications of statistics using R.

A Course in Statistics with R has been written to bridge the gap between theory and applications and to explain how mathematical expressions are converted into R programs. The book is designed primarily as a companion for a Master's student during each semester of the course, but it will also help applied statisticians revisit the underpinnings of the subject. With this dual goal in mind, the book begins with R basics and quickly covers visualization and exploratory analysis. Probability and statistical inference, including the classical, nonparametric, and Bayesian schools, are developed with definitions, motivations, mathematical expressions, and R programs in a way that helps the reader understand both the mathematical development and the R implementation. Linear regression models, experimental designs, multivariate analysis, and categorical data analysis are treated through practical applications that make effective use of visualization and the statistical techniques underlying it, helping the reader achieve a clear understanding of the associated statistical models.

Key features:
  • Integrates R basics with statistical concepts
  • Provides graphical presentations inclusive of mathematical expressions
  • Aids understanding of limit theorems of probability with and without the simulation approach
  • Presents detailed algorithmic development of statistical models from scratch
  • Includes practical applications with over 50 data sets

Contents:

Part I: The Preliminaries

1 Why R?
1.1 Why R?
1.2 R Installation
1.3 There is Nothing Such as PRACTICALS
1.4 Data Sets in R and Internet
1.4.1 List of Web-sites Containing DATA SETS
1.4.2 Antique Datasets
1.5 http://cran.r-project.org
1.5.1 http://r-project.org
1.5.2 http://www.cran.r-project.org/web/views/
1.5.3 Is subscribing to R-Mailing List useful?
1.6 R and Its Interface with Other Software
1.7 help and/or ?
1.8 R Books
1.9 A Road Map

2 The R Basics
2.1 Introduction
2.2 Simple Arithmetics and a Little Beyond
2.2.1 Absolute Values, Remainders, etc
2.2.2 round, floor, etc
2.2.3 Summary Functions
2.2.4 Trigonometric Functions
2.2.5 Complex Numbers?
2.2.6 Special Mathematical Functions
2.3 Some Basic R Functions
2.3.1 Summary Statistics
2.3.2 is, as, is.na, etc
2.3.3 factors, levels, etc
2.3.4 Control Programming
2.3.5 Other Useful Functions
2.3.6 Calculus?
2.4 Vectors and Matrices in R
2.4.1 Vectors
2.4.2 Matrices
2.5 Data Entering and Reading from Files
2.5.1 Data Entering
2.5.2 Reading Data from External Files
2.6 Working with Packages
2.7 R Session Management
2.8 Bibliography
2.9 Complements, Problems, and Programs

3 Data Preparation and Other Tricks
3.1 Introduction
3.2 Manipulation with Complex Format Files
3.3 Reading Datasets of Foreign Formats
3.4 Displaying R Objects
3.5 Manipulation Using R Functions
3.6 Working with Time and Date
3.7 Text Manipulations
3.8 Scripts and Text Editors for R
3.8.1 Text Editors for Linuxians
3.9 Bibliography
3.10 Complements, Problems, and Programs

4 Exploratory Data Analysis
4.1 Introduction: The Tukey's School of Statistics
4.2 Essential Summaries of EDA
4.3 Graphical Techniques in EDA
4.3.1 Boxplot
4.3.2 Histogram
4.3.3 Histogram Extensions and the Rootogram
4.3.4 Pareto Chart
4.3.5 Stem-and-Leaf Plot
4.3.6 Run Chart
4.3.7 Scatter Plot
4.4 Quantitative Techniques in EDA
4.4.1 Trimean
4.4.2 Letter Values
4.5 Exploratory Regression Models
4.5.1 Resistant Line
4.5.2 Median Polish
4.6 Bibliography
4.7 Complements, Problems, and Programs

Part II: Probability and Inference

5 Probability Theory
5.1 Introduction
5.2 Sample Space, Set Algebra, and Elementary Probability
5.3 Counting Methods
5.3.1 Sampling: The Diverse Ways
5.3.2 The Binomial Coefficients and the Pascal's Triangle
5.3.3 Some Problems Based on Combinatorics
5.4 Probability: A Definition
5.4.1 The Prerequisites
5.4.2 The Kolmogorov Definition
5.5 Conditional Probability and Independence
5.6 Bayes Formula
5.7 Random Variables, Expectations, and Moments
5.7.1 The Definition
5.7.2 Expectation of Random Variables
5.8 Distribution Function, Characteristic Function, and Moment Generation Function
5.9 Inequalities
5.9.1 The Markov Inequality
5.9.2 The Jensen's Inequality
5.9.3 The Chebyshev Inequality
5.10 Convergence of Random Variables
5.10.1 Convergence in Distributions
5.10.2 Convergence in Probability
5.10.3 Convergence in rth Mean
5.10.4 Almost Sure Convergence
5.11 The Law of Large Numbers
5.11.1 The Weak Law of Large Numbers
5.12 The Central Limit Theorem
5.12.1 The de Moivre-Laplace Central Limit Theorem
5.12.2 CLT for iid Case
5.12.3 The Lindeberg-Feller CLT
5.12.4 The Liapounov CLT
5.13 Bibliography
5.13.1 Intuitive, Elementary, and First Course Source
5.13.2 The Classics and Second Course Source
5.13.3 The Problem Books
5.13.4 Other Useful Source
5.13.5 R for Probability
5.14 Complements, Problems, and Programs

6 Probability and Sampling Distributions
6.1 Introduction
6.2 Discrete Univariate Distributions
6.2.1 The Discrete Uniform Distribution
6.2.2 The Binomial Distribution
6.2.3 The Geometric Distribution
6.2.4 The Negative Binomial Distribution
6.2.5 Poisson Distribution
6.2.6 The Hypergeometric Distribution
6.3 Continuous Univariate Distributions
6.3.1 The Uniform Distribution
6.3.2 The Beta Distribution
6.3.3 The Exponential Distribution
6.3.4 The Gamma Distribution
6.3.5 The Normal Distribution
6.3.6 The Cauchy Distribution
6.3.7 The t-Distribution
6.3.8 The Chi-square Distribution
6.3.9 The F-Distribution
6.4 Multivariate Probability Distributions
6.4.1 The Multinomial Distribution
6.4.2 Dirichlet Distribution
6.4.3 The Multivariate Normal Distribution
6.4.4 The Multivariate t Distribution
6.5 Populations and Samples
6.6 Sampling from the Normal Distributions
6.7 Some Finer Aspects of Sampling Distributions
6.7.1 Sampling Distribution of Median
6.7.2 Sampling Distribution of Mean of Standard Distributions
6.8 Multivariate Sampling Distributions
6.8.1 Noncentral Univariate Chi-square, t, and F Distributions
6.8.2 Wishart Distribution
6.8.3 Hotelling's T² Distribution
6.9 Bayesian Sampling Distributions
6.10 Bibliography
6.11 Complements, Problems, and Programs

7 Parametric Inference
7.1 Introduction
7.2 Families of Distribution
7.2.1 The Exponential Family
7.2.2 Pitman Family
7.3 Loss Functions
7.4 Data Reduction
7.4.1 Sufficiency
7.4.2 Minimal Sufficiency
7.5 Likelihood and Information
7.5.1 The Likelihood Principle
7.5.2 The Fisher Information
7.6 Point Estimation
7.6.1 Maximum Likelihood Estimation
7.6.2 Method of Moments Estimator
7.7 Comparison of Estimators
7.7.1 Unbiased Estimators
7.7.2 Improving Unbiased Estimators
7.8 Confidence Intervals
7.9 Testing Statistical Hypotheses – The Preliminaries
7.10 The Neyman-Pearson Lemma
7.11 Uniformly Most Powerful Tests
7.12 Uniformly Most Powerful Unbiased Tests
7.12.1 Tests for the Means: One- and Two-Sample t-Test
7.13 Likelihood Ratio Tests
7.13.1 Normal Distribution: One-Sample Problems
7.13.2 Normal Distribution: Two-Sample Problem for the Mean
7.14 Behrens-Fisher Problem
7.15 Multiple Comparison Tests
7.15.1 Bonferroni's Method
7.15.2 Holm's Method
7.16 The EM Algorithm ?
7.16.1 Introduction
7.16.2 The Algorithm
7.16.3 Introductory Applications
7.17 Bibliography
7.17.1 Early Classics
7.17.2 Texts From the Last 30 Years
7.18 Complements, Problems, and Programs

8 Nonparametric Inference
8.1 Introduction
8.2 Empirical Distribution Function and Its Applications
8.2.1 Statistical Functionals
8.3 The Jackknife and Bootstrap Methods
8.3.1 The Jackknife
8.3.2 The Bootstrap
8.3.3 Bootstrapping Simple Linear Model?
8.4 Nonparametric Smoothing
8.4.1 Histogram Smoothing
8.4.2 Kernel Smoothing
8.4.3 Nonparametric Regression Models?
8.5 Nonparametric Tests
8.5.1 The Wilcoxon Signed-Ranks Test
8.5.2 The Mann-Whitney test
8.5.3 The Siegel-Tukey Test
8.5.4 The Wald-Wolfowitz Run Test
8.5.5 The Kolmogorov-Smirnov Test
8.5.6 Kruskal-Wallis Test?
8.6 Bibliography
8.7 Complements, Problems, and Programs

9 Bayesian Inference
9.1 Introduction
9.2 Bayesian Probabilities
9.3 The Bayesian Paradigm for Statistical Inference
9.3.1 Bayesian Sufficiency and the Principle
9.3.2 Bayesian Analysis and Likelihood Principle
9.3.3 Informative and Conjugate Prior
9.3.4 Noninformative Prior
9.4 Bayesian Estimation
9.4.1 Inference for Binomial Distribution
9.4.2 Inference for the Poisson Distribution
9.4.3 Inference for Uniform Distribution
9.4.4 Inference for Exponential Distribution
9.4.5 Inference for Normal Distributions
9.5 The Credible Intervals
9.6 Bayes Factors for Testing Problems
9.7 Bibliography
9.8 Complements, Problems, and Programs

Part III: Stochastic Processes and Monte Carlo

10 Stochastic Processes
10.1 Introduction
10.2 Kolmogorov's Consistency Theorem
10.3 Markov Chains
10.3.1 The m-Step TPM
10.3.2 Classification of States
10.3.3 Canonical Decomposition of an Absorbing Markov Chain
10.3.4 Stationary Distribution and Mean First Passage Time of an Ergodic Markov Chain
10.3.5 Time Reversible Markov Chain
10.4 Application of Markov Chains in Computational Statistics
10.4.1 The Metropolis-Hastings Algorithm
10.4.2 Gibbs Sampler
10.4.3 Illustrative Examples
10.5 Bibliography
10.6 Complements, Problems, and Programs

11 Monte Carlo Computations
11.1 Introduction
11.2 Generating the (Pseudo-) Random Numbers
11.2.1 Useful Random Generators
11.2.2 Probability Through Simulation
11.3 Simulation from Probability Distributions and Some Limit Theorems
11.3.1 Simulation from Discrete Distributions
11.3.2 Simulation from Continuous Distributions
11.3.3 Understanding Limit Theorems Through Simulation
11.3.4 Understanding The Central Limit Theorem
11.4 Monte Carlo Integration
11.5 The Accept-Reject Technique
11.6 Application to Bayesian Inference
11.7 Bibliography
11.8 Complements, Problems, and Programs

Part IV: Linear Models

12 Linear Regression Models
12.1 Introduction
12.2 Simple Linear Regression Model
12.2.1 Fitting a Linear Model
12.2.2 Confidence Intervals
12.2.3 The Analysis of Variance (ANOVA)
12.2.4 The Coefficient of Determination
12.2.5 The lm Function from R
12.2.6 Residuals for Validation of the Model Assumptions
12.2.7 Prediction for the Simple Regression Model
12.2.8 Regression Through the Origin
12.3 The Anscombe Warnings and Regression Abuse
12.4 Multiple Linear Regression Model
12.4.1 Scatter Plots: A First Look
12.4.2 Other Useful Graphical Methods
12.4.3 Fitting a Multiple Linear Regression Model
12.4.4 Testing Hypotheses and Confidence Intervals
12.5 Model Diagnostics for the Multiple Regression Model
12.5.1 Residuals
12.5.2 Influence and Leverage Diagnostics
12.6 Multicollinearity
12.6.1 Variance Inflation Factor
12.6.2 Eigen System Analysis
12.7 Data Transformations
12.7.1 Linearization
12.7.2 Variance Stabilization
12.7.3 Power Transformation
12.8 Model Selection
12.8.1 Backward Elimination
12.8.2 Forward and Stepwise Selection
12.9 Bibliography
12.9.1 Early Classics
12.9.2 Industrial Applications
12.9.3 Regression Details
12.9.4 Modern Regression Texts
12.9.5 R for Regression
12.10 Complements, Problems, and Programs

13 Experimental Designs
13.1 Introduction
13.2 Principles of Experimental Design
13.3 Completely Randomized Designs
13.3.1 The CRD Model
13.3.2 Randomization in CRD
13.3.3 Inference for the CRD Models
13.3.4 Validation of Model Assumptions
13.3.5 Contrasts and Multiple Testing for the CRD Model
13.4 Block Designs
13.4.1 Randomization and Analysis of Balanced Block Designs
13.4.2 Incomplete Block Designs
13.4.3 Latin Square Design
13.4.4 Graeco Latin Square Design
13.5 Factorial Designs
13.5.1 Two Factorial Experiment
13.5.2 Three Factorial Experiment
13.5.3 Blocking in Factorial Experiments
13.6 Bibliography
13.7 Complements, Problems, and Programs

14 Multivariate Statistical Analysis – I
14.1 Introduction
14.2 Graphical Plots for Multivariate Data
14.3 Definitions, Notations, and Summary Statistics for Multivariate Data
14.3.1 Definitions and Data Visualization
14.3.2 Early Outlier Detection
14.4 Testing for Mean Vectors: One Sample
14.4.1 Testing for Mean Vector with Known Variance-Covariance Matrix
14.4.2 Testing for Mean Vectors with Unknown Variance-Covariance Matrix
14.5 Testing for Mean Vectors: Two-Samples
14.6 Multivariate Analysis of Variance
14.6.1 Wilks' Test Statistic
14.6.2 Roy's Test
14.6.3 Pillai's Test Statistic
14.6.4 The Lawley-Hotelling Test Statistic
14.7 Testing for Variance-Covariance Matrix: One Sample
14.7.1 Testing for Sphericity
14.8 Testing for Variance-Covariance Matrix: k-Samples
14.9 Testing for Independence of Sub-vectors
14.10 Bibliography
14.11 Complements, Problems, and Programs

15 Multivariate Statistical Analysis – II
15.1 Introduction
15.2 Classification and Discriminant Analysis
15.2.1 Discrimination Analysis
15.2.2 Classification
15.3 Canonical Correlations
15.4 Principal Component Analysis – Theory and Illustration
15.4.1 The Theory
15.4.2 Illustration Through a Data Set
15.5 Applications of Principal Component Analysis
15.5.1 PCA for Linear Regression
15.5.2 Biplots
15.6 Factor Analysis
15.6.1 The Orthogonal Factor Analysis Model
15.6.2 Estimation of Loadings and Communalities
15.7 Bibliography
15.7.1 The Classics and Applied Perspectives
15.7.2 Multivariate Analysis and Software
15.8 Complements, Problems, and Programs

16 Categorical Data Analysis
16.1 Introduction
16.2 Graphical Methods for CDA
16.2.1 Bar and Stacked Bar Plots
16.2.2 Spine Plots
16.2.3 Mosaic Plots
16.2.4 Pie Charts and Dot Charts
16.2.5 Four Fold Plots
16.3 The Odds Ratio
16.4 The Simpson's Paradox
16.5 The Binomial, Multinomial, and Poisson Models
16.5.1 The Binomial Model
16.5.2 The Multinomial Model
16.5.3 The Poisson Model
16.6 The Problem of Overdispersion
16.7 The χ² Tests of Independence
16.8 Bibliography
16.9 Complements, Problems, and Programs

17 Generalized Linear Models
17.1 Introduction
17.2 Regression Problems in Count/Discrete Data
17.3 Exponential Family and the GLM
17.4 The Logistic Regression Model
17.5 Inference for the Logistic Regression Model
17.5.1 Estimation of the Regression Coefficients and Related Parameters
17.5.2 Estimation of the Variance-Covariance Matrix of β
17.5.3 Confidence Intervals and Hypotheses Testing for the Regression Coefficients
17.5.4 Residuals for the Logistic Regression Model
17.5.5 Deviance Test and Hosmer-Lemeshow Goodness-of-Fit Test
17.6 Model Selection in Logistic Regression Models
17.7 Probit Regression
17.8 Poisson Regression Model
17.9 Bibliography
17.10 Complements, Problems, and Programs

Appendix A Open Source Software – An Epilogue
Appendix B The Statistical Tables
Bibliography
Author Index
Subject Index
R Codes

  • ISBN: 978-1-119-15272-9
  • Publisher: Wiley-Blackwell
  • Binding: Hardcover
  • Pages: 768
  • Publication date: 22/04/2016
  • No. of volumes: 1
  • Language: English