Simple Linear Regression

  • Assumptions to check:

    • Independence of observations,

    • Normality: the data in the underlying population from which each sample is derived are normally distributed (testable by e.g. the Kolmogorov–Smirnov test, Shapiro–Wilk test, Anderson–Darling test, or a Q-Q plot)

    • Equal variances (homoscedasticity) of the group samples (testable by e.g. F-test, Levene’s test, Bartlett’s test, Brown-Forsythe test)

    • Linearity (checked by e.g. inspecting a scatterplot)

    • Independence of residuals (testable by e.g. the Durbin–Watson test; a code sketch of these assumption checks follows the example below)

  • Report on design in Method section:

    • Report on variables

    • Name the statistical package or program used in the analysis

  • Report statistics in Results section:

    • Regression equation, R^2, F-statistic, degrees of freedom, p-value

    • Report on assumptions

  • Example:

    • Method section: We used simple linear regression to determine whether age can effectively predict BMI in the studied population. We used GraphPad Prism (RRID:SCR_002798) to perform the analysis.

    • Results section: The simple linear regression model was significant (Adjusted-R2 = .32, F(1,98) = 47.57, p < .001), indicating that AGE can predict BMI (t = 6.90, p < .001). The fitted regression equation is BMI = 23.60 + 0.13 * AGE. The scatterplot shows a linear relationship between the variables. The residuals of the model follow the normal distribution (Shapiro-Wilk W = .98, p = .203), are homoscedastic (Breusch-Pagan χ2 = 1.92, p = .166), and are independent (Durbin-Watson D = 1.85, p = .486).
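
    • Code sketch (illustrative): The statistics and assumption checks above come from GraphPad Prism; a roughly equivalent workflow can be sketched in Python with statsmodels and scipy. The variable names and simulated data below are hypothetical, not from the study.

      # Minimal sketch, assuming numpy, scipy, and statsmodels are installed.
      # Not the GraphPad Prism workflow from the example; data are simulated for illustration.
      import numpy as np
      import statsmodels.api as sm
      from scipy.stats import shapiro
      from statsmodels.stats.diagnostic import het_breuschpagan
      from statsmodels.stats.stattools import durbin_watson

      rng = np.random.default_rng(0)
      age = rng.uniform(20, 70, size=100)                   # hypothetical predictor
      bmi = 23.6 + 0.13 * age + rng.normal(0, 2, size=100)  # hypothetical outcome

      X = sm.add_constant(age)        # design matrix with an intercept column
      model = sm.OLS(bmi, X).fit()

      # Statistics to report: equation, adjusted R^2, F, degrees of freedom, p-values
      print(model.params)             # intercept and slope of the fitted equation
      print(model.rsquared_adj, model.fvalue, model.df_model, model.df_resid, model.f_pvalue)
      print(model.tvalues, model.pvalues)

      # Assumption checks on the residuals
      print(shapiro(model.resid))                # normality (Shapiro-Wilk W and p)
      print(het_breuschpagan(model.resid, X))    # homoscedasticity (Breusch-Pagan LM, p, F, p)
      print(durbin_watson(model.resid))          # independence (Durbin-Watson statistic only)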

Multiple Linear Regression

  • Assumptions to check:

    • Independence of observations,

    • Linearity (checked by e.g. inspecting a scatterplot)

    • Normality (testable by e.g. the Kolmogorov–Smirnov test, Shapiro–Wilk test, Anderson–Darling test, or a Q-Q plot)

    • Equal variances (homoscedasticity) of the group samples (testable by e.g. the F-test, Levene's test, Bartlett's test, or the Brown–Forsythe test)

    • Independence of residuals (testable by e.g. the Durbin–Watson test)

    • Absence of multicollinearity (testable by e.g. the variance inflation factor (VIF) or tolerance; see the code sketch after the example below)

  • Report design in Methods section

    • Report on variables

    • Name the statistical package or program used in the analysis

  • Report statistics in Results section

    • Regression equation, R^2, F-statistic, degrees of freedom, p-value

    • Report on assumptions

  • Example: 

    • Method section: We used multiple linear regression to determine whether age and hours of sleep are good predictors of BMI. We used GraphPad Prism (RRID:SCR_002798) to perform the analysis.

    • Results section: The multiple linear regression model was significant (Adjusted-R2 = .52, F(2,97) = 53.47, p < .001), indicating that both AGE (t = 6.15, p < .001) and hours of SLEEP (t = -6.35, p < .001) are predictors of BMI. The equation obtained from the analysis is BMI = 28.62 + 0.10 * AGE - 0.58 * SLEEP. The adjusted coefficient of determination (Adjusted-R2), which measures the model's ability to explain the observed values, was 0.52 (52%).

      In the scatterplots, we can observe the linear relationship between each independent variable and the dependent variable. The residuals of the model fit the normal distribution (Shapiro-Wilk W = .99, p = .362), are homoscedastic (Breusch-Pagan χ2 = .87, p = .647), and are independent (Durbin-Watson DW = 1.88, p = .647). Both collinearity measures indicate no multicollinearity (VIF = 1.08, Tolerance = .93).
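
    • Code sketch (illustrative): As with the simple regression example, the reported output comes from GraphPad Prism; a rough equivalent of the fit and the multicollinearity check in Python with statsmodels is sketched below, again with hypothetical variable names and simulated data.

      # Minimal sketch, assuming numpy and statsmodels are installed; data are simulated.
      import numpy as np
      import statsmodels.api as sm
      from statsmodels.stats.outliers_influence import variance_inflation_factor

      rng = np.random.default_rng(1)
      age = rng.uniform(20, 70, size=100)     # hypothetical predictors
      sleep = rng.uniform(4, 9, size=100)
      bmi = 28.62 + 0.10 * age - 0.58 * sleep + rng.normal(0, 2, size=100)

      X = sm.add_constant(np.column_stack([age, sleep]))   # intercept, AGE, SLEEP
      model = sm.OLS(bmi, X).fit()

      # Statistics to report: equation, adjusted R^2, F, degrees of freedom, p-values
      print(model.params)             # intercept and the two slopes of the fitted equation
      print(model.rsquared_adj, model.fvalue, model.df_model, model.df_resid, model.f_pvalue)
      print(model.tvalues, model.pvalues)

      # Multicollinearity: VIF per predictor (column 0 is the intercept); tolerance = 1 / VIF
      for i in (1, 2):
          vif = variance_inflation_factor(X, i)
          print(vif, 1 / vif)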