Introduction to R & Setup
What is R?
R is a powerful open-source language and environment for statistical computing, data analysis, and visualization. It is widely used in academia, data science, bioinformatics, and industry.
Why R?
- Best-in-class tools for statistics and data visualization
- Thousands of specialized packages on CRAN (the Comprehensive R Archive Network)
- Excellent for exploratory data analysis
- Integration with Python (via the reticulate package) and with databases
Installation
Download R from cran.r-project.org and RStudio (recommended IDE) from posit.co.
> print("Hello, R Mastery!")
[1] "Hello, R Mastery!"
> version$version.string
[1] "R version 4.4.1 (2024-...)"
Exercise: Install R and RStudio. Create a new script and run code that prints your name, today's date, and the result of 17 * 23.
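One possible solution to the exercise, as a sketch. The variable names are illustrative and "R Student" is a placeholder for your own name.

```r
# Placeholder: replace with your own name
my_name <- "R Student"
today <- Sys.Date()          # current date as a Date object
product <- 17 * 23

print(paste("My name is", my_name))
print(paste("Today is", format(today, "%d %B %Y")))
print(product)               # [1] 391
```

Save it as a .R file and run it with the Run or Source button in RStudio.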
Variables & Data Types
Assignment Operator
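In R, the conventional assignment operator is <-. The = sign also works for top-level assignment, but <- is the established style, and = is usually reserved for naming function arguments.
x <- 42               # numeric
name <- "R Student"   # character (string)
is_valid <- TRUE      # logical (boolean)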
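To see which data type each variable holds, R provides class() and typeof(). This sketch repeats the assignments above so it runs on its own; n_items is a hypothetical extra variable added to show an integer.

```r
x <- 42
name <- "R Student"
is_valid <- TRUE
n_items <- 7L      # hypothetical: the L suffix makes an integer

# class() reports the high-level type of each value
class(x)           # "numeric"
class(name)        # "character"
class(is_valid)    # "logical"
class(n_items)     # "integer"

# typeof() shows the storage type; plain numbers are stored as doubles
typeof(x)          # "double"

# Convert between types explicitly
as.character(x)    # "42"
as.integer("7")    # 7
```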
Operators & Expressions
Basic Math
5 + 10   # Addition: 15
10 / 2   # Division: 5
2 ^ 3    # Power: 8
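Beyond basic arithmetic, R has integer-division, modulo, comparison, and logical operators. The following is a short sampler of the built-in ones, with results shown as comments.

```r
# Integer division and remainder (modulo)
17 %/% 5        # [1] 3
17 %% 5         # [1] 2

# Comparison operators return logical values
3 > 2           # [1] TRUE
3 == 4          # [1] FALSE
3 != 4          # [1] TRUE

# Logical operators: & (and), | (or), ! (not)
TRUE & FALSE    # [1] FALSE
TRUE | FALSE    # [1] TRUE
!TRUE           # [1] FALSE

# Precedence: ^ binds tighter than *, which binds tighter than +
2 + 3 * 2 ^ 2   # [1] 14
```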
Control Flow
If Statements
if (x > 10) {
  print("Large")
} else {
  print("Small")
}
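Two common extensions: chaining several conditions with else if, and handling a whole vector at once. if() expects a single TRUE or FALSE, so for element-wise decisions the vectorized ifelse() is used instead. The values of x, size, and values here are illustrative.

```r
x <- 7

# Chain conditions with else if
if (x > 10) {
  size <- "Large"
} else if (x > 5) {
  size <- "Medium"
} else {
  size <- "Small"
}
print(size)   # [1] "Medium"

# ifelse() tests every element of a vector
values <- c(3, 8, 15)
labels <- ifelse(values > 10, "Large", "Small")
print(labels) # [1] "Small" "Small" "Large"
```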
Loops & Functions
For Loops
for (i in 1:5) {
  print(i)
}
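Functions are defined with function(), and the last evaluated expression is returned. This sketch combines a user-defined function with a for loop and a while loop; square() and greet() are hypothetical names chosen for illustration.

```r
# The last evaluated expression is the return value
square <- function(n) {
  n^2
}

# Arguments can have default values
greet <- function(name = "R Student") {
  paste("Hello,", name)
}

squares <- numeric(5)          # preallocate the result vector
for (i in 1:5) {
  squares[i] <- square(i)
}
print(squares)                 # 1 4 9 16 25

# while repeats until its condition becomes FALSE
total <- 0
k <- 1
while (total < 10) {
  total <- total + k
  k <- k + 1
}
print(total)                   # [1] 10
print(greet())                 # [1] "Hello, R Student"
```

Preallocating squares with numeric(5) avoids growing the vector on every iteration.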
Data Structures & Vectors
Creating Vectors
v <- c(1, 2, 3, 4, 5)
v * 2   # Vectorized operation: doubles every element, giving 2 4 6 8 10
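Vectors are indexed from 1, and they can be subset by position, by a logical condition, or by negative indices that drop elements. The sketch below also creates a matrix and a list, two other core structures; m and lst are illustrative names.

```r
v <- c(1, 2, 3, 4, 5)

# Indexing is 1-based
v[1]            # [1] 1
v[2:4]          # [1] 2 3 4
v[v > 3]        # logical indexing: [1] 4 5
v[-1]           # negative index drops elements: [1] 2 3 4 5

# Element-wise arithmetic between two vectors of the same length
v + c(10, 20, 30, 40, 50)   # [1] 11 22 33 44 55

# Other core structures
m <- matrix(1:6, nrow = 2)                # 2 x 3 matrix, filled column by column
lst <- list(id = 1, tags = c("a", "b"))   # a list can mix types
dim(m)                                    # [1] 2 3
lst$tags                                  # [1] "a" "b"
```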
Data Frames & dplyr
The Pipe Operator
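library(dplyr)   # provides filter() and select(), and re-exports the %>% pipe

# Assumes df is a data frame with columns name, age, and salary
df %>%
  filter(age > 18) %>%
  select(name, salary)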
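For comparison, base R (version 4.1 or later) has a native pipe, |>, and subset() can filter rows and select columns in one call. This sketch builds a small data frame whose names and values are invented for illustration.

```r
# Hypothetical example data with the columns used in the dplyr pipeline
df <- data.frame(
  name   = c("Ana", "Ben", "Cara", "Dev"),
  age    = c(17, 25, 34, 16),
  salary = c(0, 52000, 61000, 0)
)

# Keep rows with age > 18, then keep only the name and salary columns
adults <- df |>
  subset(age > 18, select = c(name, salary))

print(adults)
```

The dplyr version is usually easier to extend with further steps, while this one needs no extra package.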
Data Import & Cleaning
Learn how to read CSV and Excel files and clean messy data using tidyr.
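A minimal base R sketch of reading and cleaning a CSV file. To stay self-contained, it first writes a small, deliberately messy file to a temporary path; the columns and values are invented. For Excel files, the readxl package's read_excel() is a common choice, and tidyr's drop_na() does the same job as the complete.cases() step here.

```r
# Write a messy CSV to a temporary file so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,age,city",
  " Ana ,34,Lisbon",
  "Ben,,Porto",
  "Cara,29,  lisbon"
), csv_path)

raw <- read.csv(csv_path, stringsAsFactors = FALSE)   # blank numeric fields become NA

# Trim stray whitespace and normalise case in text columns
raw$name <- trimws(raw$name)
raw$city <- tolower(trimws(raw$city))

# Drop rows that contain any missing value
clean <- raw[complete.cases(raw), ]
print(clean)
```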
Visualization with ggplot2
Grammar of Graphics
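library(ggplot2)
# Assumes df has numeric columns age and height
ggplot(data = df, aes(x = age, y = height)) +
  geom_point() +               # scatter plot: one point per row
  geom_smooth(method = "lm")   # fitted linear-regression line with a confidence band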
Statistical Analysis
Performing hypothesis testing and correlation analysis in R.
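A short sketch using functions from the built-in stats package and the built-in mtcars dataset. The variable pairs tested here (weight vs. fuel economy, and mpg by transmission type) are chosen only for illustration.

```r
# Correlation between car weight and fuel economy
r  <- cor(mtcars$wt, mtcars$mpg)
ct <- cor.test(mtcars$wt, mtcars$mpg)   # Pearson correlation test by default

# Two-sample t-test: does mpg differ between automatic (am = 0) and manual (am = 1)?
tt <- t.test(mpg ~ am, data = mtcars)   # Welch's t-test by default

print(round(r, 3))   # strongly negative: heavier cars use more fuel
print(ct$p.value)
print(tt$p.value)
```

Both test functions return an object of class "htest", so the estimate, confidence interval, and p-value can be extracted with $.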
Advanced Data Manipulation
Using purrr for functional programming and stringr for text processing.
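Before reaching for purrr and stringr, it helps to see the base R functions they streamline. This sketch uses only base R, and the comments name the roughly corresponding purrr or stringr function; the scores and emails data are invented.

```r
# Functional programming: apply a function to every element of a list
scores <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
totals <- vapply(scores, sum, numeric(1))   # one double per element, like purrr::map_dbl()
print(totals)

# Fold a vector into a single value
running <- Reduce(`+`, c(1, 2, 3, 4))      # like purrr::reduce(); gives 10
print(running)

# Text processing with base regular expressions
emails  <- c("ana@example.com", "ben_at_example.com", "cara@test.org")
valid   <- grepl("^[^@]+@[^@]+\\.[a-z]+$", emails)   # like stringr::str_detect()
domains <- sub(".*@", "", emails[valid])             # like stringr::str_remove()
print(toupper(domains))                              # like stringr::str_to_upper()
```

The stringr equivalents take the string first and the pattern second, which makes them easier to chain with a pipe.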
Capstone Project: Data Analysis
Perform an end-to-end data analysis on a real-world dataset, from cleaning and exploration to modeling and visualization.
# Project goals:
# 1. Load and clean a dataset
# 2. Perform exploratory data analysis (EDA)
# 3. Create publication-quality plots
# 4. Build a predictive model
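A compressed skeleton of the four goals, as a sketch only. It assumes the built-in mtcars dataset stands in for a real-world dataset, and the columns, plot, and model formula are illustrative choices rather than the course's prescribed solution. It uses base graphics so it runs without extra packages; ggplot2 would be the usual tool for publication-quality plots.

```r
# 1. Load and clean (a real project would start from read.csv() or similar)
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("automatic", "manual"))
dat <- dat[complete.cases(dat), ]

# 2. Exploratory data analysis
summary(dat[, c("mpg", "wt", "hp")])
aggregate(mpg ~ am, data = dat, FUN = mean)

# 3. Plot to a PDF file in the session's temporary directory
plot_path <- file.path(tempdir(), "mpg_vs_wt.pdf")
pdf(plot_path)
plot(mpg ~ wt, data = dat, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = dat))
dev.off()

# 4. Predictive model: linear regression of mpg on weight and horsepower
model <- lm(mpg ~ wt + hp, data = dat)
r_squared <- summary(model)$r.squared
pred <- predict(model, newdata = data.frame(wt = 3, hp = 150))
print(r_squared)
print(pred)
```

A full project would add a train/test split and residual diagnostics before trusting the model's predictions.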