Simple Linear Regression
Assumptions to check:
Independence of observations
Normality of residuals (testable by e.g. the Kolmogorov–Smirnov test, Shapiro–Wilk test, Anderson-Darling test, or a Q-Q plot)
Equal variances (homoscedasticity) of the residuals (testable by e.g. the Breusch-Pagan test, or by inspecting a plot of residuals against fitted values)
Linearity (e.g. by inspecting a scatterplot)
Independence of residuals (testable by e.g. the Durbin-Watson test)
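As an illustration of the last check, the Durbin-Watson statistic can be computed directly from the model residuals: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, and values near 2 suggest little first-order autocorrelation. The sketch below is a minimal pure-Python version; the function name and the residual values are hypothetical, and a statistical package would normally supply both the statistic and its p-value.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals ordered by observation."""
    # Numerator: squared differences between successive residuals
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals, for illustration only
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.2]
dw = durbin_watson(residuals)
print(f"Durbin-Watson D = {dw:.2f}")
```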
Report on design in Method section:
Report on variables
Name the statistical package or program used in the analysis
Report statistics in Results section:
Regression equation, R^2, F-statistic, degrees of freedom, p-value
Report on assumptions
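To show where the reported quantities come from, here is a minimal pure-Python sketch that fits a simple linear regression by ordinary least squares and returns the regression equation, R^2, adjusted R^2, the F-statistic, and its degrees of freedom. The function name and the data are hypothetical. The p-value is not computed here because it requires the F distribution, which a statistical package provides.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Total and residual sums of squares
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    r2 = 1 - ss_resid / ss_total

    # One predictor: model df = 1, residual df = n - 2
    df_model, df_resid = 1, n - 2
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    f_stat = (r2 / df_model) / ((1 - r2) / df_resid)
    return {
        "intercept": intercept,
        "slope": slope,
        "r2": r2,
        "adj_r2": adj_r2,
        "f": f_stat,
        "df": (df_model, df_resid),
    }


# Hypothetical data, for illustration only
age = [22, 30, 38, 45, 51, 60, 67]
bmi = [24.1, 27.9, 26.0, 30.2, 28.5, 32.4, 31.0]
fit = simple_linear_regression(age, bmi)
print(f"BMI = {fit['intercept']:.2f} + {fit['slope']:.2f} * AGE")
print(f"R^2 = {fit['r2']:.2f}, adjusted R^2 = {fit['adj_r2']:.2f}")
print(f"F({fit['df'][0]},{fit['df'][1]}) = {fit['f']:.2f}")
```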
Example:
Method section: We used simple linear regression to determine if age can effectively predict BMI in the studied population. We used GraphPad Prism (RRID:SCR_002798) to perform the analysis.
Results section: The simple linear regression model was found to be significant (Adjusted-R2 = .32, F(1,98) = 47.57, p < .001), indicating that AGE can predict BMI (t = 6.90, p < .001). The fitted regression model equation is BMI = 23.60 + 0.13 * AGE. The scatterplot shows a linear relationship between the variables. The residuals of the model follow the normal distribution (Shapiro-Wilk W = .98, p = .203), are homoscedastic (Breusch-Pagan χ2 = 1.92, p = .166), and independent (Durbin-Watson D = 1.85, p = .486).
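The reported values can be sanity-checked with a few lines of code. In simple linear regression the model F-statistic equals the square of the slope's t-statistic, so the two reported numbers should agree up to rounding. The sketch below checks this and uses the fitted equation to predict BMI; the function name and the example age are hypothetical.

```python
# Coefficients taken from the fitted equation BMI = 23.60 + 0.13 * AGE
INTERCEPT = 23.60
SLOPE_AGE = 0.13


def predict_bmi(age):
    """Predicted BMI for a given age under the fitted model."""
    return INTERCEPT + SLOPE_AGE * age


# Consistency check: F should equal t^2 within the rounding of t
t_slope = 6.90
f_reported = 47.57
t_low, t_high = t_slope - 0.005, t_slope + 0.005
f_consistent = t_low ** 2 <= f_reported <= t_high ** 2

print(f"Predicted BMI at age 40: {predict_bmi(40):.2f}")
print(f"F consistent with t^2: {f_consistent}")
```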
Multiple Linear Regression
Assumptions to check:
Independence of observations
Linearity (e.g. by inspecting scatterplots)
Normality of residuals (testable by e.g. the Kolmogorov–Smirnov test, Shapiro–Wilk test, Anderson-Darling test, or a Q-Q plot)
Equal variances (homoscedasticity) of the residuals (testable by e.g. the Breusch-Pagan test, or by inspecting a plot of residuals against fitted values)
Independence of residuals (testable by e.g. the Durbin-Watson test)
Absence of multicollinearity (testable by e.g. the variance inflation factor (VIF) or tolerance)
Report on design in Method section:
Report on variables
Name the statistical package or program used in the analysis
Report statistics in Results section:
Regression equation, R^2, F-statistic, degrees of freedom, p-value
Report on assumptions
Example:
Method section: We used multiple linear regression to determine if age and hours of sleep are good predictors of BMI. We used GraphPad Prism (RRID:SCR_002798) to perform the analysis.
Results section: The multiple linear regression model was significant (Adjusted-R2 = .52, F(2,97) = 53.47, p < .001), indicating that both AGE (t = 6.15, p < .001) and hours of SLEEP (t = -6.35, p < .001) are predictors of BMI. The fitted regression model equation is BMI = 28.62 + 0.10 * AGE - 0.58 * SLEEP. The adjusted coefficient of determination (Adjusted-R2), which measures the proportion of variance in BMI explained by the model after adjusting for the number of predictors, was .52 (52%).
The scatterplots show a linear relationship between each independent variable and the dependent variable. The residuals of the model follow the normal distribution (Shapiro-Wilk W = .99, p = .362), are homoscedastic (Breusch-Pagan χ2 = .87, p = .647), and are independent (Durbin-Watson D = 1.88, p = .647). Both collinearity measures indicate no multicollinearity (VIF = 1.08, Tolerance = .93).
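The two collinearity measures are reciprocals of each other: Tolerance = 1 - R_j^2 and VIF = 1 / Tolerance, where R_j^2 comes from regressing predictor j on the other predictors. A tolerance of .93 therefore corresponds to a VIF of about 1.08. With exactly two predictors, R_j^2 is simply their squared Pearson correlation, which the minimal pure-Python sketch below uses. The function names and data are hypothetical, and models with more predictors need a full auxiliary regression for each predictor.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """Return (VIF, tolerance) for a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    tolerance = 1 - r ** 2  # share of variance not shared between predictors
    return 1 / tolerance, tolerance


# Hypothetical predictor values, for illustration only
age = [25, 32, 41, 47, 53, 60, 68]
sleep = [7.5, 6.0, 8.0, 6.5, 7.0, 5.5, 7.0]
vif, tolerance = vif_two_predictors(age, sleep)
print(f"VIF = {vif:.2f}, Tolerance = {tolerance:.2f}")
```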