
Hands-on Virtual Workshop on R for Data Science & Machine Learning

We will cover the sections Probability Distributions & Inferential Statistics
in the Oct – Dec, 21 session (Batch-2).


Section Title: Probability Distributions


Day-1: Environment Setup

  1. Difference between R and RStudio
    1. What is R?
    2. What is RStudio?
  2. Why should you use RStudio?
  3. Uninstalling the older version of R 
  4. Why should you install R before installing RStudio?
  5. Uninstalling the older version of RStudio
  6. Downloading R from https://www.r-project.org/
  7. Installing R
  8. Downloading RStudio from https://www.rstudio.com/
  9. Installing RStudio
  10. Introducing a cloud drive for uploading classwork & homework
  11. Solving learners' problems
  12. Saving the classwork on Cloud Drive
  13. Collecting the classwork & homework links
  14. Assigning the handout of the day
  15. Recording the attendance of the learners

Day-2: An Introduction to the For Loop

  1. Understanding the for loop
    1. Vectors and variables in R
    2. General concept of the for loop
    3. Storing the values resulting from a for loop
  2. Collecting the Day-1 handout & Assigning the Day-2 handout
  3. Solving learners' problems
  4. Checking the classwork/homework of the learners
  5. Taking the attendance of the learners
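
A minimal sketch of storing the values produced by a for loop in a vector. The names n and squares are illustrative choices, not taken from the workshop handouts.

```r
n <- 5                        # number of iterations
squares <- numeric(n)         # pre-allocate a numeric vector of length n (all zeros)
for (i in 1:n) {
  squares[i] <- i^2           # store the result of iteration i at position i
}
squares                       # 1 4 9 16 25
```

Pre-allocating the vector and filling it by index avoids growing the object inside the loop.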

Day-3: Central limit theorem demonstration using R (Part-1)

  1. What is a uniform distribution?
  2. Introduction to the Central Limit Theorem
  3. Generating 10000 numbers from a uniform distribution
  4. Storing the 10000 uniformly distributed numbers in a vector
  5. Creating a histogram for the 10000 numbers
  6. Calculating the mean & standard deviation of the numbers
  7. Understanding standard deviation
  8. Understanding the standard error of the mean
  9. Drawing a sample of a specific size from the population
  10. Calculating the mean of the sample
  11. Generating 1000 sample means using a for loop
  12. Creating a histogram for the 1000 sample means
  13. Collecting the Day-2 handout & Assigning the Day-3 handout
  14. Solving learners' problems
  15. Checking the classwork/homework of the learners
  16. Taking the attendance of the learners
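
A minimal sketch of the Day-3 steps in R. The seed, the uniform range 0 to 1, and the sample size of 30 are assumptions chosen for illustration. The theoretical standard error of the mean is the population standard deviation divided by the square root of the sample size, and the spread of the simulated sample means should come out close to it.

```r
set.seed(42)                                   # make the random draws reproducible
population <- runif(10000, min = 0, max = 1)   # 10000 numbers from a uniform distribution
hist(population, main = "Uniform population", xlab = "Value")
mean(population)                               # close to 0.5
sd(population)                                 # close to 1/sqrt(12), about 0.289

sample_size <- 30                              # assumed sample size for illustration
one_sample <- sample(population, size = sample_size)   # one sample, without replacement
mean(one_sample)

sample_means <- numeric(1000)                  # storage for 1000 sample means
for (i in 1:1000) {
  sample_means[i] <- mean(sample(population, size = sample_size))
}
hist(sample_means, main = "1000 sample means (n = 30)", xlab = "Sample mean")
sd(population) / sqrt(sample_size)             # standard error of the mean (theory)
sd(sample_means)                               # spread of the simulated sample means
```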

Day-4: Central limit theorem demonstration using R (Part-2)

  1. Explaining the central limit theorem by gradually increasing the sample size
  2. Plotting a 2 by 2 grid of histograms to summarize the results
  3. Adding a density curve over the histogram
  4. Collecting the Day-3 handout & Assigning the Day-4 handout
  5. Solving learners' problems
  6. Checking the classwork/homework of the learners
  7. Taking the attendance of the learners
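
One way to sketch the Day-4 comparison, assuming sample sizes of 1, 5, 10 and 30 and a uniform population like the one used on Day-3. It uses replicate() as a compact alternative to the for loop; freq = FALSE draws the histogram on the density scale so that the density curve lines up with the bars.

```r
set.seed(1)
population <- runif(10000)                 # uniform population on [0, 1], as on Day-3
sizes <- c(1, 5, 10, 30)                   # assumed sample sizes to compare
par(mfrow = c(2, 2))                       # arrange the next four plots in a 2 by 2 grid
for (n in sizes) {
  means <- replicate(1000, mean(sample(population, n)))  # 1000 sample means of size n
  hist(means, freq = FALSE, main = paste("n =", n), xlab = "Sample mean")
  lines(density(means), col = "red", lwd = 2)            # density curve over the histogram
}
par(mfrow = c(1, 1))                       # restore the single-plot layout
```

As n grows, the histograms become narrower and more bell-shaped, which is the pattern the central limit theorem predicts.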

Day-5: Some Important Concepts of a Normal Probability Distribution

  1. Some important characteristics of the normal probability distribution curve
  2. How is a normal distribution curve defined by its mean and standard deviation?
  3. Some more properties of a normal probability distribution
  4. Equation for the normal distribution curve
  5. What is a Z score?
  6. Introducing the Z table for standard normal distribution
  7. Collecting the Day-4 handout & Assigning the Day-5 handout
  8. Solving learners' problems
  9. Checking the classwork/homework of the learners
  10. Taking the attendance of the learners
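
For reference, the normal density with mean μ and standard deviation σ, and the Z score that standardizes an observation x, can be written as:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty

z = \frac{x - \mu}{\sigma}
```

A Z score therefore counts how many standard deviations x lies above (positive z) or below (negative z) the mean, which is what the Z table is indexed by.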

Day-6: Four Important Functions for Normal Probability Distribution

  1. Understanding the function rnorm() and its R implementation
  2. Understanding the function pnorm() and its R implementation
  3. Understanding the function qnorm() and its R implementation
  4. Understanding the function dnorm() and its R implementation
  5. Plotting a normal distribution using R functions
  6. Saving the classwork
  7. Collecting the Day-5 handout & Assigning the Day-6 handout
  8. Solving learners' problems
  9. Checking the classwork/homework of the learners
  10. Taking the attendance of the learners
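
A short sketch of the four Day-6 functions. They use the standard normal distribution unless a mean and sd are supplied; the values 1.96, 100 and 15 are illustrative.

```r
set.seed(123)
x <- rnorm(5, mean = 100, sd = 15)   # rnorm(): 5 random draws from N(100, 15^2)
x
pnorm(1.96)                          # pnorm(): P(Z <= 1.96), about 0.975
pnorm(130, mean = 100, sd = 15)      # P(X <= 130) for X ~ N(100, 15^2); same as pnorm(2)
qnorm(0.975)                         # qnorm(): inverse of pnorm, about 1.96
dnorm(0)                             # dnorm(): height of the curve at 0, 1/sqrt(2*pi)

# Plotting the standard normal curve with dnorm()
z <- seq(-4, 4, by = 0.01)
plot(z, dnorm(z), type = "l", lwd = 2,
     xlab = "z", ylab = "Density", main = "Standard normal distribution")
```

pnorm() and qnorm() are inverses of each other, so qnorm(pnorm(q)) returns q.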

Day-7: Binomial Probability Distribution (Theory Part)

  1. Difference between continuous and discrete data
  2. Probability distribution for discrete data
  3. Properties of binomial distribution
  4. Probability distribution of flipping two coins at the same time
  5. Formula for binomial distribution
    1. What is the probability of getting 1 head when you flip a coin 10 times?
  6. Collecting the Day-6 handout & Assigning the Day-7 handout
  7. Solving learners' problems
  8. Checking the classwork/homework of the learners
  9. Taking the attendance of the learners
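
The Day-7 coin question can be checked with the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). A minimal sketch, assuming a fair coin (p = 0.5):

```r
n <- 10      # number of flips
k <- 1       # number of heads of interest
p <- 0.5     # probability of a head (fair coin assumed)
prob <- choose(n, k) * p^k * (1 - p)^(n - k)   # binomial formula
prob                                           # 0.009765625, i.e. 10/1024
dbinom(k, size = n, prob = p)                  # the same value from R's built-in function

dbinom(0:2, size = 2, prob = 0.5)              # two coins: P(0, 1, 2 heads) = 0.25 0.50 0.25
```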

Day-8: Introducing the “Visualize” Package for the Normal Probability Distribution

  1. What is an R package?
  2. Installation of the “visualize” package
  3. Importing a package
  4. Understanding the “visualize” package for the standard normal distribution
  5. Understanding the “section” argument for the standard normal distribution
  6. Understanding the “visualize” package for a different normal distribution
  7. Understanding the “section” argument for a different normal distribution
  8. Collecting the Day-7 handout & Assigning the Day-8 handout
  9. Solving learners' problems
  10. Checking the classwork/homework of the learners
  11. Taking the attendance of the learners

Day-9: Four Important Functions for Binomial Probability Distribution

  1. Understanding the function rbinom() and its R implementation
  2. Understanding the function pbinom() and its R implementation
  3. Understanding the function qbinom() and its R implementation
  4. Understanding the function dbinom() and its R implementation
  5. Collecting the Day-8 handout & Assigning the Day-9 handout
  6. Solving learners' problems
  7. Checking the classwork/homework of the learners
  8. Taking the attendance of the learners
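
A sketch of the four Day-9 binomial functions for 10 tosses of a fair coin (the seed and the particular probabilities queried are illustrative):

```r
set.seed(10)
rbinom(5, size = 10, prob = 0.5)     # rbinom(): 5 simulated head counts from 10 tosses each
dbinom(5, size = 10, prob = 0.5)     # dbinom(): P(X = 5) = 252/1024, about 0.246
pbinom(5, size = 10, prob = 0.5)     # pbinom(): P(X <= 5) = 638/1024, about 0.623
qbinom(0.5, size = 10, prob = 0.5)   # qbinom(): smallest x with P(X <= x) >= 0.5, here 5
```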

Day-10: Visualization of the Binomial Probability Distribution

  1. Reviewing the dbinom() function
  2. Plotting the binomial distribution using R functions
  3. Visualizing the probability of getting 5 or fewer heads in 10 coin tosses
  4. Visualizing the binomial distribution using the “visualize” package
  5. Collecting the Day-9 handout & Assigning the Day-10 handout
  6. Solving learners' problems
  7. Checking the classwork/homework of the learners
  8. Taking the attendance of the learners
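
A base-R sketch of the Day-10 visualization: a bar plot of the Binomial(10, 0.5) probabilities with the outcomes of 5 or fewer heads shaded. The colors are arbitrary choices; the shaded bars add up to pbinom(5, 10, 0.5).

```r
x <- 0:10                                        # possible numbers of heads
y <- dbinom(x, size = 10, prob = 0.5)            # probability of each outcome
cols <- ifelse(x <= 5, "steelblue", "grey80")    # shade the outcomes with 5 or fewer heads
barplot(y, names.arg = x, col = cols,
        xlab = "Number of heads", ylab = "Probability",
        main = "Binomial(10, 0.5): P(X <= 5) shaded")
sum(y[x <= 5])                                   # equals pbinom(5, 10, 0.5)
```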

Day-11: Poisson Probability Distribution

  1. Properties of the Poisson distribution
  2. Example of a Poisson distribution: on weekends, an average of 3.6 people arrive at a booking counter every 10 minutes. What is the probability that exactly 7 people arrive in 10 minutes?
  3. Solving the problem by using the formula of the Poisson distribution
  4. Solving the problem using the dpois() function
  5. Understanding the ppois() function
  6. Collecting the Day-10 handout & Assigning the Day-11 handout
  7. Solving learners' problems
  8. Checking the classwork/homework of the learners
  9. Taking the attendance of the learners
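
The booking-counter example can be worked both ways in R: with the Poisson formula P(X = k) = e^(-λ) λ^k / k! and with dpois(). ppois() gives cumulative probabilities. A minimal sketch:

```r
lambda <- 3.6   # average number of arrivals per 10 minutes
k <- 7          # number of arrivals of interest
by_formula <- exp(-lambda) * lambda^k / factorial(k)   # Poisson formula
by_dpois   <- dpois(k, lambda = lambda)                # built-in probability mass function
by_formula                                             # about 0.0425
by_dpois                                               # the same value
ppois(7, lambda = 3.6)                      # P(X <= 7)
ppois(6, lambda = 3.6, lower.tail = FALSE)  # P(X >= 7), i.e. P(X > 6)
```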

Day-12: Plotting & Visualization of the Poisson Distribution

  1. Understanding the problem: on weekends, an average of 3.6 people arrive at a booking counter every 10 minutes. What is the probability that exactly 7 people arrive in 10 minutes?
  2. Solving the problem using the dpois() function
  3. Reviewing the ppois() function
  4. Plotting the Poisson distribution with R functions
    • Creating a vector x for a specific sequence of numbers
    • Creating a probability vector y using dpois() for the vector x
    • Creating a barplot for x
    • Creating a plot for x and y
  5. Installing and loading the “visualize” package
  6. Visualization of the Poisson Distribution
    • Visualizing the probability of getting a specific number or fewer (section = “lower”)
    • Visualizing the probability of getting a specific number or more (section = “upper”)
    • Visualizing the probability of getting fewer than one specific number or more than another (section = “tails”)
    • Visualizing the probability of getting a number between two specific numbers (section = “bounded”)
  7. Collecting the Day-11 handout & Assigning the Day-12 handout
  8. Solving learners' problems
  9. Checking the classwork/homework of the learners
  10. Taking the attendance of the learners
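
A sketch of the Day-12 plotting steps with λ = 3.6. The range 0 to 15 for x is an assumption, and the outline's "barplot for x" is read here as bars of the probabilities y labelled by x.

```r
x <- 0:15                               # assumed range of counts to display
y <- dpois(x, lambda = 3.6)             # probability of each count
barplot(y, names.arg = x,
        xlab = "People in 10 minutes", ylab = "Probability")
plot(x, y, type = "h", lwd = 3,         # vertical spikes at each count
     xlab = "People in 10 minutes", ylab = "Probability",
     main = "Poisson distribution, lambda = 3.6")
points(x, y, pch = 19)                  # mark the top of each spike
```

type = "h" draws a spike for each count, which suits a discrete distribution better than a connected line.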

Section Title: Inferential Statistics


  1. One-Sample t-Test in R
  2. Two-Sample t-Test in R
  3. Mann-Whitney U Test (aka Wilcoxon Rank-Sum Test) in R
  4. Bootstrap Hypothesis Testing in R
  5. Bootstrap Confidence Interval with R
  6. Permutation Hypothesis Test in R
  7. Paired t-Test in R
  8. Wilcoxon Signed-Rank Test in R
  9. ANOVA, Multiple Comparisons & Kruskal-Wallis in R
  10. Chi-Square Test, Fisher's Exact Test, and Cross Tabulations in R
  11. Calculate Odds Ratio and Relative Risk in R
  12. Correlation and Covariance in R
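
As a taste of the inferential-statistics topics, several of the listed tests can be run on R's built-in mtcars data. The dataset, the variables and the hypothesized mean of 20 are illustrative assumptions, not the workshop's own examples.

```r
t.test(mtcars$mpg, mu = 20)                         # one-sample t-test: is mean mpg different from 20?
t.test(mpg ~ am, data = mtcars)                     # two-sample (Welch) t-test: automatic vs manual
wilcox.test(mpg ~ am, data = mtcars, exact = FALSE) # Mann-Whitney U / Wilcoxon rank-sum test
fisher.test(table(mtcars$am, mtcars$vs))            # Fisher's exact test on a 2 x 2 cross tabulation
cov(mtcars$wt, mtcars$mpg)                          # covariance
cor(mtcars$wt, mtcars$mpg)                          # Pearson correlation
```

exact = FALSE asks wilcox.test() for the normal approximation, since mpg contains tied values.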

Section Title: R Programming Basics


Day – 1: Environment Setup

  1. Difference between R and RStudio
    1. What is R?
    2. What is RStudio?
  2. Why should you use RStudio?
  3. Uninstalling the older version of R 
  4. Why should you install R before installing RStudio?
  5. Uninstalling the older version of RStudio
  6. Downloading R from https://www.r-project.org/
  7. Installing R
  8. Downloading RStudio from https://www.rstudio.com/
  9. Installing RStudio
  10. Issuing the handout
  11. Uploading the classwork on Cloud Drive

Day – 2: Basic Arithmetic Functions & Coding

  1. Assigning a value to an object in R
  2. Printing the value of an object in R
  3. Case sensitivity
  4. Overwriting the value of an object
  5. R workspace memory
  6. Observing the list of the objects using the ls() command
  7. Removing an object from the workspace memory using the rm() command
  8. Object naming rules: https://www.w3schools.com/r/r_variables.asp
  9. Assigning character values to objects
  10. Performing arithmetic operations in R: addition, subtraction, multiplication, division, squares, square roots, logarithms, exponents, and logarithms to other bases
  11. Calculating absolute value using abs() command
  12. Incomplete commands in R
  13. Accessing the previously entered commands using the Arrow Keys
  14. Writing notes or comments in R
  15. Uploading the classwork on Cloud Drive
  16. Assigning the Day-2 handout
  17. Solving learners' problems

Day – 3: Creating Vectors, Matrices, & Performing Some Simple Operations on Them

  1. Clearing the console of RStudio
  2. Creating a vector of numbers using the c() command
  3. Creating a vector of character elements using the c() command
  4. Creating a sequence of integer values
  5. Creating a sequence of integer/non-integer values using the seq() command
  6. Creating a vector of repeated numbers or characters using the rep() command
  7. Repeating a sequence of integer values multiple times
  8. Repeating a sequence of non-integer values multiple times
  9. Repeating a sequence of characters multiple times
  10. Adding/subtracting/multiplying/dividing a value to each element of a vector
  11. Extracting elements from a vector
  12. Creating a matrix of values
  13. Storing a matrix in an object
  14. Extracting elements from a matrix
  15. Adding/subtracting/multiplying/dividing a value to each element of a matrix 
  16. Saving all the console inputs in a text file
  17. Uploading the classwork on Cloud Drive
  18. Assigning the Day-3 handout
  19. Solving learners' problems
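
A compact sketch of the Day-3 vector and matrix commands (all values are illustrative):

```r
v <- c(2, 4, 6, 8)                  # numeric vector created with c()
fruits <- c("apple", "banana")      # character vector
1:5                                 # integer sequence: 1 2 3 4 5
seq(0, 1, by = 0.25)                # non-integer sequence: 0.00 0.25 0.50 0.75 1.00
rep(7, times = 3)                   # repeated number: 7 7 7
rep(c(1, 2), times = 3)             # repeated sequence: 1 2 1 2 1 2
rep(c("a", "b"), each = 2)          # repeated characters: "a" "a" "b" "b"
v * 10                              # arithmetic is applied to each element: 20 40 60 80
v[2]                                # extract the second element: 4
m <- matrix(1:6, nrow = 2)          # 2 x 3 matrix, filled column by column
m[2, 3]                             # element in row 2, column 3: 6
m + 1                               # adds 1 to every element of the matrix
```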

Day – 4: Importing & Copying Data from Excel to R

  1. Downloading the necessary resources
  2. Importing a CSV file into RStudio
    1. Using the read.csv() command
    2. Using the read.table() command
  3. Importing a tab-delimited text file into RStudio
    1. Using the read.delim() command
    2. Using the read.table() command
  4. Importing data from Excel (both xls and xlsx formats) into R using RStudio (readxl package)
  5. Solving learners' problems
  6. Uploading the classwork on Cloud Drive
  7. Assigning the Day-4 handout

Day – 5: Checking the Imported Data & Working with Variables

  1. Downloading the necessary resources
  2. Importing a dataset
  3. Understanding the dataset
  4. Checking the dimensions of the dataset using the dim() command
  5. Observing the first 6 rows using the head() command
  6. Observing the last 6 rows using the tail() command
  7. Observing the other rows of the dataset using square brackets
  8. Observing the variable names using the names() command
  9. Extracting a variable from a dataset
  10. Attaching a dataset in the workspace memory using the attach() command
  11. Detaching a dataset using the detach() command
  12. Checking the variable type using the class() command
  13. Observing the categories of a variable using the levels() command
  14. Converting a character type variable to a factor type variable using the as.factor() command
  15. Observing the general summary of the dataset using the summary() command
  16. Changing the data type of a variable while importing a dataset
  17. Uploading the classwork on Cloud Drive
  18. Assigning the Day-5 handout

Day – 6: Subsetting Data Based on Conditions & Logical Statements

  1. Downloading the necessary resources
  2. Observing the number of observations in an object or variable using the length() command
  3. Subsetting data using square brackets for a single variable
  4. Subsetting one variable based on another variable (e.g., calculating the mean age for males or females only)
  5. Creating an object for specific categories
  6. Creating an object by subsetting data on two variables (e.g., creating a data frame of females/males over 15 years old)
  7. Creating a logical vector or variable
  8. Converting a logical vector or variable to numeric (1/0) using the as.numeric() command
  9. Creating a logical vector for multiple conditions
  10. Appending a logical vector as a new column of the original dataset using the cbind() command
  11. Uploading the classwork on Cloud Drive
  12. Assigning the Day-6 handout
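
A sketch of the Day-6 subsetting steps on a small hypothetical data frame. The column names age and gender and all values are invented for illustration and do not come from the workshop's resource files.

```r
# Hypothetical data for illustration only
df <- data.frame(
  age    = c(12, 17, 25, 14, 30, 16),
  gender = c("female", "male", "female", "male", "male", "female")
)
length(df$age)                                          # number of observations: 6
mean(df$age[df$gender == "male"])                       # mean age of males only
over15_f <- df[df$age > 15 & df$gender == "female", ]   # females older than 15 (two conditions)
is_adult <- df$age >= 18                                # logical vector
as.numeric(is_adult)                                    # TRUE/FALSE converted to 1/0
df <- cbind(df, is_adult)                               # append it as a new column
df
```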

Day – 7: Setting Up a Working Directory

  1. Downloading the necessary resources
  2. Observing the current working directory using the getwd() command
  3. Changing the current working directory using the setwd() command
  4. Changing the current working directory from the RStudio menu
  5. Saving the current workspace using the save.image() command
  6. Saving the current workspace from the RStudio menu
  7. Clearing the workspace from the RStudio menu
  8. Loading the previous workspace image using the load() command
  9. Loading the previous workspace image using the RStudio menu
  10. Uploading the classwork on Cloud Drive
  11. Assigning the Day-7 handout

Day – 8: History, Scripts, & Installing Packages

  1. Downloading the necessary resources
  2. Loading history from an existing file
  3. Sending commands from history to console
  4. Sending commands from history to script
  5. Removing the selected history entries
  6. Clearing all history entries
  7. Creating, opening, and saving R scripts
  8. Running commands from R script
  9. Installing a new package using the install.packages() command
  10. Removing a package using the remove.packages() command
  11. Installing/Removing packages from the RStudio menu
  12. Uploading the classwork on Cloud Drive
  13. Assigning the Day-8 handout
  14. Solving learners' problems

Day – 9: Customizing the Look of RStudio & Introducing the apply() Function

  1. Downloading the necessary resources
  2. Changing the default working directory
  3. Changing the appearance of RStudio
  4. Customizing the pane layout
  5. Changing the primary CRAN repository
  6. Introducing the apply() function
  7. Uploading the classwork on Cloud Drive
  8. Assigning the Day-9 handout
  9. Solving learners' problems

Day – 10: More with the apply() Function

  1. Downloading the necessary resources
  2. Calculating percentiles for each column
  3. Plotting each column as a line
  4. Calculating the sum of each row
  5. Calculating the sum of each row using the rowSums() command
  6. Creating a plot against the market value of each day
  7. Adding colored points to the plots
  8. Uploading the classwork on Cloud Drive
  9. Assigning the Day-10 handout
  10. Solving learners' problems
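
A sketch of the Day-9 and Day-10 apply() ideas on a small hypothetical matrix of daily values (rows are days, columns are stocks; all numbers are invented). MARGIN = 2 works column by column and MARGIN = 1 works row by row; extra arguments such as probs are passed on to the function being applied.

```r
# Hypothetical daily values of three stocks (rows = days, columns = stocks)
prices <- matrix(c(10, 12, 11, 13,
                   20, 19, 21, 22,
                    5,  6,  7,  6),
                 nrow = 4,
                 dimnames = list(paste("Day", 1:4), c("A", "B", "C")))
apply(prices, 2, quantile, probs = c(0.25, 0.5, 0.75))  # percentiles of each column
daily_total <- apply(prices, 1, sum)                    # sum of each row
all.equal(daily_total, rowSums(prices))                 # rowSums() gives the same result
plot(daily_total, type = "l", xlab = "Day", ylab = "Total value")
points(daily_total, pch = 19, col = "darkorange")       # colored points on the line
```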

Day – 11: tapply() Function

  1. Downloading the necessary resources
  2. How to use the tapply() function
  3. How to use the tapply() function on subsets of a variable or vector
  4. Use of the simplify argument in the tapply() function
  5. Including the summary function in the tapply() function
  6. Applying the quantile function in the tapply() function
  7. Passing a list of factors to the INDEX argument of the tapply() function
  8. Uploading the classwork on Cloud Drive
  9. Assigning the Day-11 handout
  10. Solving learners' problems
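
A sketch of tapply() on the built-in mtcars data (the dataset and variables are illustrative choices): the INDEX argument groups mpg by cylinder count, extra arguments are passed on to FUN, and a list of factors produces a two-way table.

```r
tapply(mtcars$mpg, mtcars$cyl, mean)                    # mean mpg for 4, 6 and 8 cylinders
tapply(mtcars$mpg, mtcars$cyl, summary)                 # a list with one summary per group
tapply(mtcars$mpg, mtcars$cyl, mean, simplify = FALSE)  # keep the results as a list
tapply(mtcars$mpg, mtcars$cyl, quantile, probs = 0.9)   # extra arguments go to quantile()
tapply(mtcars$mpg, list(mtcars$cyl, mtcars$am), mean)   # two factors give a cylinders x am table
```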

Day – 12: R Data Frames

  1. Downloading the necessary resources
  2. Introduction to R Data Frames
  3. Learning the basics of data frames in R
  4. Learning how to extract data from a data frame in R
  5. Getting an overview of the variety of operations you can perform on a data frame in R
  6. Data frame training exercise
  7. Uploading the classwork on Cloud Drive
  8. Assigning the handout
  9. Solving learners' problems

Section Title: Data Visualization and Descriptive Statistics with R


Day – 1: Making Barcharts & Piecharts

  1. Downloading the necessary resources
  2. What is a bar chart?
  3. Creating a frequency table using the table() command
  4. Calculating the relative frequency of the categories of a categorical variable
  5. Producing a bar chart using the barplot() command
  6. Adding a title and labels to a bar chart
  7. Rotating the values of the y-axis using the las argument
  8. Changing the labels of the bars in a bar chart using the names.arg argument
  9. Rotating the bar chart horizontally using the horiz argument
  10. Producing a pie chart using the pie() command
  11. Adding a title to the pie chart using the main argument
  12. Adding a box around the pie chart using the box() command
  13. Uploading the classwork on Cloud Drive
  14. Assigning the Day-12 handout
  15. Solving learners' problems

Day – 2: Making Boxplots & Stratified Boxplots

  1. Downloading the necessary resources
  2. Producing a boxplot using the boxplot() command
  3. Understanding the boxplot using the quantile() command
  4. Adding labels to the boxplot
  5. Setting limits to the y-axis of the boxplot using the ylim argument
  6. Rotating the values of the y-axis using the las argument
  7. Comparing groups of a categorical variable using boxplots
  8. Producing a boxplot for one or more groups of a categorical variable
  9. Creating stratified boxplots
  10. Uploading the classwork on Cloud Drive
  11. Assigning the Day-13 handout
  12. Solving learners' problems

Day – 3: Histograms in R

  1. Downloading the necessary resources
  2. Understanding the hist() command from the help menu
  3. Producing a histogram using the hist() command
  4. Changing the histogram plot from the default values
  5. Converting the y-axis from frequency to probability density
  6. An alternative way of converting the y-axis from frequency to probability density
  7. Changing the y-axis limit using the ylim argument
  8. Changing the x-axis limit using the xlim argument
  9. Changing the bin width using the breaks argument
  10. Specifying the breakpoints using the breaks argument
  11. Using the seq() command in the breaks argument
  12. Changing the axis labels and title of the histogram
  13. Rotating the y-axis labels using the las argument
  14. Adding a density curve over the histogram using the lines() command
  15. Changing the color and thickness of the density curve using the col and lwd arguments respectively
  16. Uploading the classwork on Cloud Drive
  17. Assigning the Day-14 handout
  18. Solving learners' problems

Day – 4: Making Stacked Barcharts, Clustered Barcharts, & Scatterplots

  1. Downloading the necessary resources
  2. Graphically examining the relationship between two categorical variables using barplots
  3. Producing a contingency table using the table() command before producing a barchart
  4. Producing a barplot using the created contingency table
  5. Transforming a stacked barchart to a clustered barchart using the beside argument
  6. Adding legends to a barplot using the legend.text argument
  7. Transforming legends from their defaults
  8. Changing the defaults of a barchart
  9. Running a Pearson correlation to gauge the strength of the linear relationship between two numeric variables
  10. Creating a scatter plot using the plot() command
  11. Changing the defaults of a scatterplot
  12. Resizing the plotting dots from the defaults using the cex argument
  13. Changing the plotting characters using the pch argument
  14. Adding a linear regression line in a scatterplot
  15. Solving learners' problems
  16. Uploading the classwork on Cloud Drive
  17. Assigning the Day-15 handout
  18. Taking learners' attendance
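
A base-R sketch of the Day-4 scatterplot steps on mtcars (an illustrative choice of data): the Pearson correlation, a scatterplot with custom pch and cex values, and a least-squares line added with abline().

```r
r <- cor(mtcars$wt, mtcars$mpg)          # Pearson correlation (the default method)
r                                        # strongly negative: heavier cars, lower mpg
plot(mtcars$wt, mtcars$mpg,
     pch = 19, cex = 1.3,                # filled circles, 30% larger than the default
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Weight vs. fuel economy")
abline(lm(mpg ~ wt, data = mtcars), col = "red", lwd = 2)  # linear regression line
```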

Day – 5: Producing Numeric Summaries for Specific Variables

  1. Summarizing a categorical variable
    1. Creating a frequency table for a categorical variable using the table() command
    2. Observing the proportions of the categories of a categorical variable
    3. Producing a contingency table for two variables
  2. Summarizing a numeric variable
    1. Calculating arithmetic mean using the mean() command
    2. Calculating trimmed means using the trim argument
    3. Calculating the median using the median() command
    4. Calculating the variance of a variable using the var() command
    5. Calculating the standard deviation of a variable using the sd() command
    6. Calculating the minimum observation using the min() command
    7. Calculating the maximum observation using the max() command
    8. Calculating the range of a variable using the range() command
    9. Calculating percentiles using the quantile() command
    10. Calculating the sum of all the observed values of a variable using the sum() command
    11. Calculating the Pearson correlation coefficient using the cor() command
    12. Calculating the Spearman correlation using the method argument in the cor() command
  3. Solving learners' problems
  4. Uploading the classwork on Cloud Drive
  5. Assigning the Day-16 handout
  6. Taking learners' attendance
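
A sketch of the Day-5 summaries applied to the built-in mtcars data (the dataset and variables are illustrative choices, not the workshop's practice files):

```r
table(mtcars$cyl)                               # frequency table: 11, 7 and 14 cars
prop.table(table(mtcars$cyl))                   # proportions of each category
table(mtcars$cyl, mtcars$am)                    # contingency table of two variables
mean(mtcars$mpg)                                # arithmetic mean
mean(mtcars$mpg, trim = 0.1)                    # 10% trimmed mean
median(mtcars$mpg)
var(mtcars$mpg)                                 # variance
sd(mtcars$mpg)                                  # standard deviation = sqrt(variance)
min(mtcars$mpg); max(mtcars$mpg)
range(mtcars$mpg)                               # min and max together
quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75))
sum(mtcars$mpg)
cor(mtcars$wt, mtcars$mpg)                      # Pearson (default)
cor(mtcars$wt, mtcars$mpg, method = "spearman") # Spearman rank correlation
```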

Day – 6: Customizing and Modifying the Look of Plots in RStudio (Part I)

  1. Creating a simple scatter plot
  2. Changing the size of the text and value labels on a plot using the “cex” argument
  3. Changing the fonts of a plot using the “font” argument
  4. Changing colors on plots using the “col” argument
  5. Changing plotting character using the “pch” argument
  6. Adding a regression line to the scatter plot using the abline() command
  7. Changing the regression line's color, type, and width
  8. Solving learners' problems
  9. Uploading the classwork on Cloud Drive
  10. Assigning the Day-17 handout
  11. Taking learners' attendance

Day – 7: Customizing and Modifying the Look of Plots in RStudio (Part II)

  1. Downloading the necessary practice files
  2. Identifying the category of a variable on the same plot using plotting characters & colors
  3. Creating separate plots on one screen in R
  4. Relabeling the axis of a plot in R
  5. Solving learners' problems
  6. Uploading the classwork on Cloud Drive
  7. Assigning the Day-18 handout
  8. Taking learners' attendance

Day – 8: Adding Text to Plots & Modifying Text in R

  1. Adding text to a plot using the text() command
  2. Controlling the text position in the x and y coordinates using the “adj” argument
  3. Changing the size, color, and font of the text
  4. Creating a horizontal line at the mean of the y-axis variable
  5. Adding text for the horizontal line
  6. Adding text to the margins of the plot
  7. Uploading the classwork on Cloud Drive
  8. Assigning the Day-19 handout
  9. Taking learners' attendance

Day – 9: Adding Legends to Plots in R

  1. Adding a legend to the plot using the legend() command
  2. Customizing legends using the pch argument
  3. Removing the box from the legend using the “bty” argument
  4. Adding legends for lines
  5. Changing the line types in the legend
  6. Uploading the classwork on Cloud Drive
  7. Assigning the handout
  8. Taking learners' attendance

Day – 10: Getting Started with Data Visualization in R with ggplot2 by Creating Scatterplots

  1. Installing the necessary packages 
  2. Importing the necessary libraries
  3. Observing the list of the existing R data sets
  4. Understanding the background of an existing dataset
  5. Creating a simple geometric canvas/map using ggplot
  6. A shorter way of creating a simple geometric canvas/map using ggplot
  7. Working with more example datasets for understanding ggplot
  8. Working with the “pipe” operator
  9. Uploading the classwork on Cloud Drive
  10. Assigning the handout
  11. Taking learners' attendance

Day – 11: Creating Boxplots Using ggplot2

  1. Creating boxplots
  2. Adding points to the boxplots
  3. Changing the size and color of the points based on the values of other variables
  4. Changing the transparency of the points
  5. Flipping the orientation of the boxplots
  6. Producing separate boxplots based on a categorical variable
  7. Changing the theme of a boxplot
  8. Adding a title to the boxplot
  9. Uploading the classwork on Cloud Drive
  10. Assigning the handout
  11. Taking learners' attendance

Day – 12: Creating Histograms Using ggplot2

  1. Installing the package “ggplot2movies”
  2. Introducing the RStudio ggplot cheat sheet
  3. Importing the necessary libraries
  4. Viewing the “movies” dataset
  5. Creating an object for the main ggplot aesthetic
  6. Creating a histogram in the main ggplot aesthetic
  7. Setting specific bin width for the histogram
  8. Changing the border color of the histogram’s bins
  9. Changing the bins’ fill color of the histogram
  10. Changing the transparency of the fill color
  11. Changing the labels of the plot
  12. Adding a title to the plot
  13. Uploading the classwork on Cloud Drive
  14. Assigning the handout
  15. Taking learners' attendance

Section Title: Data Preprocessing in R


Section Title: Regression


  1. Simple Linear Regression in R
  2. Checking Linear Regression Assumptions in R
  3. Multiple Linear Regression in R
  4. Changing Numeric Variable to Categorical in R
  5. Creating Dummy Variables or Indicator Variables in R
  6. Change Reference (Baseline) Category in Regression Model with R
  7. Including Variables/Factors in Regression with R, Part I
  8. Including Variables/Factors in Regression with R, Part II
  9. Multiple Linear Regression with Interaction in R
  10. Interpreting Interaction in Linear Regression with R
  11. Partial F-Test for Variable Selection in Linear Regression with R
  12. Polynomial Regression in R
  13. Multiple Linear Regression in R
  14. Polynomial Regression in R
  15. Support Vector Regression in R
  16. Decision Tree Regression in R
  17. Random Forest Regression in R
  18. Evaluating Regression Models Performance
  19. Regression Model Selection in R
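
A sketch touching several of the regression topics above, using the built-in mtcars data as an illustrative assumption. It fits a simple linear regression, checks one diagnostic plot, converts a numeric variable to a factor (which lm() expands into dummy variables), changes the reference category with relevel(), runs a partial F-test with anova(), and adds an interaction and a polynomial term.

```r
fit1 <- lm(mpg ~ wt, data = mtcars)                 # simple linear regression
summary(fit1)                                       # coefficients, R-squared, p-values
plot(fit1, which = 1)                               # residuals vs fitted: checks linearity

mtcars$cyl_f <- factor(mtcars$cyl)                  # numeric -> categorical (dummy variables in lm)
mtcars$cyl_f <- relevel(mtcars$cyl_f, ref = "8")    # make 8 cylinders the baseline category
fit2 <- lm(mpg ~ wt + cyl_f, data = mtcars)         # multiple linear regression
anova(fit1, fit2)                                   # partial F-test: does cyl_f improve the fit?

fit3 <- lm(mpg ~ wt * hp, data = mtcars)            # main effects plus wt:hp interaction
fit4 <- lm(mpg ~ wt + I(wt^2), data = mtcars)       # polynomial (quadratic) regression
```

I() is needed so that wt^2 is treated as arithmetic rather than as formula syntax.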

Section Title: Classification


  1. Logistic Regression in R
  2. K-Nearest Neighbors in R
  3. Support Vector Machine in R
  4. Kernel SVM in R
  5. Naive Bayes in R
  6. Decision Tree Classification in R
  7. Random Forest Classification in R
  8. Evaluating Classification Models Performance

Section Title: Clustering


  1. K-Means Clustering in R
  2. Hierarchical Clustering in R

Section Title: Association Rule Learning


  1. Apriori in R
  2. Eclat in R

Section Title: Reinforcement Learning


  1. Upper Confidence Bound in R
  2. Thompson Sampling

Section Title: Natural Language Processing in R


Section Title: Deep Learning


  1. Artificial Neural Networks in R
  2. Convolutional Neural Networks

Section Title: Dimensionality Reduction


  1. Principal Component Analysis
  2. Linear Discriminant Analysis
  3. Kernel PCA

Section Title: Model Selection & Boosting


  1. Model Selection
  2. XGBoost