STATS 102A: Introduction to Computational Statistics with R
This course introduces computational statistics through numerical and computationally intensive methods for statistical problems. Topics include statistical graphics, root finding, simulation, randomization testing, and bootstrapping. The course also covers intermediate to advanced programming in R.
Motivation and Synopsis
During the twentieth century, the development of statistical computing played a crucial role in facilitating the growth of the statistics discipline and the adoption of statistical methods within the scientific community and beyond. In the twenty-first-century digital age, the amount of data available for statistical analysis has grown tremendously, yielding new opportunities for statistical computing as well as new challenges. Statistical computing is an important part of a statistics education and is highly valuable for statisticians in both academia and industry.
This course is designed to provide upper-division statistics students with the fundamentals of statistical computing, particularly through the R language.
The course is thematically split into two parts. The first part focuses on the tools and skills needed to perform computational statistics. Students will learn intermediate to advanced R programming and how to use some of its functions and packages. They will also learn how to develop functions and packages for the management, pre-processing, and analysis of statistical data. The second part focuses on foundational methods in computational statistics. These include numerical methods such as root finding, numerical integration, and mathematical optimization. The course then covers the generation of random variables, simulation, and Monte Carlo methods for answering statistical questions.
The computer is the scientific laboratory of the statistician. It plays the same role for statistical research as traditional laboratories play for physics and chemistry research. As such, this course should allow students to develop a degree of comfort and competence "in the lab."
The primary purpose of this course is to provide students with a common core of knowledge about statistical computing for their coursework and research. The course has an applied focus on tools: it involves the practical application of the ideas of statistical computing and their implementation in statistical software, particularly R.
Syllabus of the Course
Lecture | Topics |
1 | Introduction, the R language and ecosystem |
2 | Data structures and their management |
3 | R programming and writing functions |
4 | Importing data, web scraping, manipulating data with tidyr and dplyr |
5 | Visualization and graphics (ggplot2) |
6 | Numerical methods: floating point arithmetic, root finding |
7 | Numerical methods: basic optimization |
8 | Random numbers, random variables, and simulation in R |
9 | Randomization tests, permutation tests, and bootstrapping |
10 | Additional topics: Monte Carlo integration, kernel density estimation |
A detailed description of the class is available here.