Data Analysis

5 Variable Transformations to Improve Your Regression Model

In this article, we will discuss how you can use the following transformations to build better regression models: log transformation, square root transformation, polynomial transformation, standardization, and centering by subtracting the mean. Compared to fitting a model using variables in their raw form, transforming them can help: Make the model’s coefficients more interpretable. Meet the model’s …
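As a rough illustration of the five transformations listed above, here is a minimal sketch using only Python's standard library. The function names and the sample values in `x` are made up for this example and do not come from the article. The sketch also assumes that standardization uses the sample standard deviation.

```python
import math
import statistics


def log_transform(values):
    """Natural log of each value; assumes every value is strictly positive."""
    return [math.log(v) for v in values]


def sqrt_transform(values):
    """Square root of each value; assumes every value is non-negative."""
    return [math.sqrt(v) for v in values]


def polynomial_terms(values, degree=2):
    """For each value x, return the list [x, x**2, ..., x**degree]."""
    return [[v ** d for d in range(1, degree + 1)] for v in values]


def center(values):
    """Subtract the mean so the transformed variable has mean 0."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical predictor values, chosen only to keep the arithmetic simple.
x = [1.0, 4.0, 9.0, 16.0]

print(sqrt_transform(x))
print(center(x))
print(standardize(x))
```

Each function returns a new list, so the raw variable stays available for comparison with its transformed version.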

What is a Good R-Squared Value? [Based on Real-World Data]

I analyzed the content of 43,110 randomly chosen research papers from PubMed to learn more about R-squared. Specifically, I wanted to answer the following questions: What is a good value for R-squared? What is a low value for R-squared? Is a higher R-squared always better? Is a low R-squared necessarily bad? Let’s start with a …

Statistical Power: What It Is and How It Is Used in Practice

Statistical power is a measure of study efficiency, calculated before conducting the study to estimate the chance of discovering a true effect rather than obtaining a false negative result, or worse, overestimating the effect by detecting the noise in the data. Here are 5 seemingly different, but actually similar, ways of describing statistical power: Definition …
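To make the idea of computing power before running a study more concrete, here is a small sketch for one specific case: a two-sided, one-sample z-test with a known standard deviation. That choice of test, the function name `z_test_power`, and the numbers in the usage lines are assumptions made for this example, not something taken from the article.

```python
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect: assumed true difference between the mean and its null value
    sd:     assumed known standard deviation of the outcome
    n:      planned sample size
    alpha:  significance level
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sd  # true effect in standard-error units
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical planning question: how does power change with sample size?
for n in (10, 20, 32, 50):
    print(n, round(z_test_power(effect=0.5, sd=1.0, n=n), 3))
```

When the true effect is zero, the function returns `alpha`, which is the false positive rate of the test. For a fixed non-zero effect, power grows with the sample size.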

Identify Variable Types in Statistics (with Examples)

Here’s a table that summarizes the types of variables:

Quantitative (a.k.a. Numerical)
    Continuous: Consists of numerical values that can be measured but not counted.
    Discrete: Consists of numerical values that can be counted.
Qualitative (a.k.a. Categorical)
    Ordinal: Consists of text or labels that have a logical order.
    Nominal: Consists of text or labels that …

Assess Variable Importance in Linear and Logistic Regression

In this article, we will be concerned with the following question: Given a regression model, which of the predictors X1, X2, X3, etc. has the most influence on the outcome Y? In general, assessing the relative importance of predictors by directly comparing their (unstandardized) regression coefficients is not a good idea because: For numerical predictors: …

How to Report a Chi-Square Test

The 3 main types of Chi-square tests are: Chi-square goodness-of-fit test: used to compare the distribution of a categorical variable (with more than 2 levels) to a hypothetical distribution. Chi-square homogeneity test: used to test whether 2 groups (coming from 2 different samples) have the same distribution regarding a certain categorical variable. Chi-square independence test: …
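A report of a Chi-square test usually includes the test statistic and its degrees of freedom. As a hedged sketch, the function below computes both for a contingency table of observed counts, as used in the homogeneity and independence tests. The function name `chi_square_statistic` and the counts are invented for this example. No continuity correction is applied and no p-value is computed.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a table of observed counts.

    table: list of rows, each row a list of counts, all rows the same length.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical 2x2 table: rows are 2 groups, columns are 2 categories.
observed = [[10, 20],
            [20, 10]]
stat, df = chi_square_statistic(observed)
print(f"chi-square({df}) = {stat:.2f}")
```

In practice a statistics package would also supply the p-value; this sketch only shows where the two reported numbers come from.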

Correlation vs Collinearity vs Multicollinearity

Here’s a table that summarizes the differences between correlation, collinearity and multicollinearity:

Definition
    Correlation: Correlation refers to the linear relationship between 2 variables.
    Collinearity: Collinearity refers to a problem when running a regression model where 2 or more independent variables (a.k.a. predictors) have a strong linear relationship.
    Multicollinearity: Multicollinearity is a special case of …

Standardized vs Unstandardized Regression Coefficients

Here’s a table that summarizes the similarities and differences between standardized and unstandardized linear regression coefficients:

Definition
    Unstandardized β: Unstandardized coefficients are obtained after running a regression model on variables measured in their original scales.
    Standardized β: Standardized coefficients are obtained after running a regression model on standardized variables (i.e. rescaled variables that have …
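The two definitions can be compared on a toy example. The sketch below fits a simple (one-predictor) linear regression twice: once on the raw variables and once on standardized copies. It assumes that standardizing means subtracting the mean and dividing by the sample standard deviation. The helper names and the data are made up for illustration.

```python
import statistics


def slope(x, y):
    """Least-squares slope of y on x in a one-predictor linear regression."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def standardize(values):
    """Rescale by subtracting the mean and dividing by the sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical data: x in hours, y in points.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [52.0, 55.0, 61.0, 60.0, 68.0]

b_unstd = slope(x, y)                          # points per hour
b_std = slope(standardize(x), standardize(y))  # SDs of y per SD of x

print(b_unstd, b_std)
```

With a single predictor, the standardized coefficient equals the unstandardized one multiplied by the ratio of the standard deviation of x to that of y, which in this setting is also the Pearson correlation between x and y.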
