George Choueiry

I am George Choueiry, PharmD, MPH. My objective is to help you conduct studies, from conception to publication.

Correlation Coefficient vs Regression Coefficient

Both the correlation and regression coefficients rely on the assumption that the data can be represented by a straight line. They are similar in many ways, but they serve different purposes. Here’s a table that summarizes the similarities and differences between the correlation coefficient, r, and the regression coefficient, β: Correlation coefficient: r Regression coefficient: …
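
To make the link between r and β concrete, here is a minimal sketch in Python. The data are made up for illustration, and the variable names are my own. It computes both coefficients for a simple linear regression of y on x and checks the standard identity β = r × (SD of y / SD of x).

```python
from math import sqrt

# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Pearson correlation coefficient: unitless, always between -1 and 1.
r = sxy / sqrt(sxx * syy)

# Least-squares slope of y on x: expressed in units of y per unit of x.
beta = sxy / sxx

# Sample standard deviations, used to link the two coefficients.
sd_x = sqrt(sxx / (n - 1))
sd_y = sqrt(syy / (n - 1))

print(f"r = {r:.4f}, beta = {beta:.4f}, r * sd_y / sd_x = {r * sd_y / sd_x:.4f}")
```

The last two printed values agree, which shows that the slope is the correlation rescaled by the spread of the two variables.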

An Example of Using Marginal and Conditional Distributions

The conditional distribution of a variable, for example heights, is the distribution of heights given the value of another variable, for example gender. Plotting the conditional distribution of heights given gender is a way of visualizing the relationship between the 2 variables. The marginal distribution of heights is the distribution of heights for everybody, independent …
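
As a rough illustration, the sketch below computes a marginal and a conditional distribution in Python from a handful of made-up records. Heights are grouped into two hypothetical categories ("short" and "tall") only to keep the example small.

```python
from collections import Counter, defaultdict

# Hypothetical records of (gender, height category), made up for illustration.
records = [
    ("F", "short"), ("F", "short"), ("F", "tall"),
    ("M", "short"), ("M", "tall"), ("M", "tall"),
]

def proportions(values):
    """Return the share of each distinct value in an iterable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Marginal distribution of height: gender is ignored.
marginal = proportions(height for _, height in records)

# Conditional distribution of height given gender: one distribution per gender.
heights_by_gender = defaultdict(list)
for gender, height in records:
    heights_by_gender[gender].append(height)
conditional = {g: proportions(h) for g, h in heights_by_gender.items()}

print("Marginal:", marginal)
print("Conditional:", conditional)
```

In this toy data set the marginal distribution is split evenly, while the conditional distributions differ by gender, which is the kind of relationship the comparison is meant to reveal.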

How to Handle Missing Data in Practice: Guide for Beginners

Handling missing data involves 2 steps: (1) determining the type of missing data, which can be missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR); and (2) choosing a method to deal with these missing values, such as deleting variables (i.e. columns) that contain missing values, or deleting observations (i.e. rows) whose values are …

Solve a Polynomial in R

A polynomial p(x) is an expression of the form \(p(x) = a_0 + a_1x + a_2x^2 + a_3x^3 + \dots + a_nx^n\), where n is any non-negative integer. To solve the equation \(p(x) = 0\) in R, we can use the function polyroot. For example, let’s solve the equation: …

How to Solve an Equation in R

In this article, we will use the uniroot.all() function from the rootSolve package to find all the solutions of an equation over a given interval (or domain). Input: uniroot.all() takes 2 arguments: a function f and an interval. How it works: It searches the interval for all possible roots of f. Output: uniroot.all() returns a vector …

Front-Door Criterion to Adjust for Unmeasured Confounding

Suppose we conducted an observational study to estimate the causal effect of some depression treatment on the quality of life of patients: The problem is that the relationship between the two is confounded by the severity of depression: The arrows in the diagram reflect causal associations: The arrow from “depression severity” to “treatment” reflects the …

How to Start an Introduction? Examples from 98,093 Research Papers

The examples below are from 98,093 full-text PubMed research papers that I analyzed in order to explore common ways to start the Introduction section. The research papers included in this analysis were selected at random from those uploaded to PubMed Central between the years 2016 and 2021. Note that I used the BioC API to …

Meta-Analysis Software Popularity in 1,321 Research Papers

I analyzed a random sample of 1,957 meta-analysis full-text research papers, uploaded to PubMed Central between the years 2016 and 2021, in order to check the popularity of meta-analysis software packages among medical researchers. (I used the BioC API to download the articles; see the References section below.) Out of these 1,957 meta-analysis papers, …

Does the Number of Authors Matter? Data from 101,580 Research Papers

I analyzed a random sample of 101,580 full-text research papers, uploaded to PubMed Central between the years 2016 and 2021, in order to explore the influence of the number of authors of a research paper on its quality. I used the BioC API to download the data (see the References section below). Here’s a summary …

“I” & “We” in Academic Writing: Examples from 9,830 Studies

I analyzed a random sample of 9,830 full-text research papers, uploaded to PubMed Central between the years 2016 and 2021, in order to explore whether and how first-person pronouns are used in the scientific literature. I used the BioC API to download the data (see the References section below). Popularity of first-person pronouns in the …

How Long Should the Discussion Section Be? Data from 61,517 Examples

I analyzed a random sample of 61,517 full-text research papers, uploaded to PubMed Central between the years 2016 and 2021, in order to answer the questions: What is the typical length of a discussion section, and which factors influence it? I used the BioC API to download the data (see the References section below). Here’s …

How Long Should the Results Section Be? Data from 61,458 Examples

I analyzed a random sample of 61,458 full-text research papers, uploaded to PubMed Central between the years 2016 and 2021, in order to answer the questions: What is the typical length of a results section, and which factors influence it? I used the BioC API to download the data (see the References section below). Here’s …

How Long Should the Methods Section Be? Data from 61,514 Examples

I analyzed a random sample of 61,514 full-text research papers, uploaded to PubMed Central between the years 2016 and 2021, in order to answer the questions: What is the typical length of a methods section, and which factors influence it? I used the BioC API to download the data (see the References section below). Here’s …

How Long Should the Introduction of a Research Paper Be? Data from 61,518 Examples

I analyzed a random sample of 61,518 full-text research papers, uploaded to PubMed Central between the years 2016 and 2021, in order to answer the questions: What is the typical length of an introduction section, and which factors influence it? I used the BioC API to download the data (see the References section below). Here’s …

5 Variable Transformations to Improve Your Regression Model

In this article, we will discuss how you can use the following transformations to build better regression models: log transformation, square root transformation, polynomial transformation, standardization, and centering by subtracting the mean. Compared to fitting a model using variables in their raw form, transforming them can help: Make the model’s coefficients more interpretable. Meet the model’s …
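
The five transformations named above can be sketched in a few lines of Python. The predictor values are made up, and squaring is used as one simple case of a polynomial transformation; this is an illustration, not a recommendation of which transformation to apply to a given model.

```python
from math import log, sqrt
from statistics import mean, stdev

# Hypothetical positive-valued predictor, made up for illustration.
x = [2.0, 4.0, 4.0, 5.0, 10.0]

x_bar = mean(x)   # sample mean
s = stdev(x)      # sample standard deviation (n - 1 denominator)

logged = [log(xi) for xi in x]                   # log transformation (requires x > 0)
rooted = [sqrt(xi) for xi in x]                  # square root transformation (requires x >= 0)
squared = [xi ** 2 for xi in x]                  # polynomial (quadratic) term
centered = [xi - x_bar for xi in x]              # centering: new mean is 0
standardized = [(xi - x_bar) / s for xi in x]    # standardization: mean 0, SD 1

print("centered mean:", round(mean(centered), 10))
print("standardized SD:", round(stdev(standardized), 10))
```

After centering, a value of 0 corresponds to the average of the original variable; after standardization, a one-unit change corresponds to one standard deviation.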

7 Different Ways to Control for Confounding

Confounding can be controlled in the design phase of the study by using random assignment, restriction, or matching, or in the data analysis phase by using stratification, regression, inverse probability weighting, or instrumental variable estimation. Here’s a quick summary of the similarities and differences between these methods: Study Phase Method Can easily control for multiple confounders Can …

List of All Biases [Sorted by Popularity in Research Papers]

I analyzed the content of 98,709 randomly chosen research papers from PubMed to learn more about bias. Specifically, I wanted to do 2 things: Rank 64 types of biases by popularity, in order to determine on which ones professional researchers focus the most in practice. Test the hypothesis that addressing bias issues is a sign …

What is a Good R-Squared Value? [Based on Real-World Data]

I analyzed the content of 43,110 randomly chosen research papers from PubMed to learn more about R-squared. Specifically, I wanted to answer the following questions: What is a good value for R-squared? What is a low value for R-squared? Is a higher R-squared always better? Is a low R-squared necessarily bad? Let’s start with a …

Statistical Power: What It Is and How It Is Used in Practice

Statistical power is a measure of study efficiency, calculated before conducting the study to estimate the chance of discovering a true effect rather than obtaining a false negative result, or worse, overestimating the effect by detecting the noise in the data. Here are 5 seemingly different, but actually similar, ways of describing statistical power: Definition …
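
As a rough sketch of what such a pre-study calculation can look like, the Python snippet below approximates the power of a two-sided, two-sample comparison of means using the normal approximation. The function name approx_power, the standardized effect size of 0.5, and the 64 participants per group are my own illustrative assumptions; a t-based calculation would give a slightly lower value.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (difference / SD).
    This is a normal approximation, not an exact t-test calculation.
    """
    z = NormalDist()                             # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)            # critical value, about 1.96 for alpha = 0.05
    shift = effect_size * sqrt(n_per_group / 2)  # expected value of the test statistic
    # Probability that the statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

power = approx_power(effect_size=0.5, n_per_group=64)
print(f"Approximate power: {power:.2f}")
```

With no true effect (effect_size=0), the function returns alpha, the false positive rate, which is a quick sanity check on the formula.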

Matched Pairs Design vs Randomized Block Design

In a matched pairs design, treatment options are randomly assigned to pairs of similar participants, whereas in a randomized block design, treatment options are randomly assigned to groups of similar participants. The objective of both is to balance baseline confounding variables by distributing them evenly between the treatment and the control group. Matched pairs design …
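
The difference between the two assignment schemes can be sketched in Python as follows. The participant labels, the blocking variable (age group), and both function names are hypothetical, and the block sketch assumes an even number of participants per block and 2 treatment options.

```python
import random

def assign_matched_pairs(pairs, rng):
    """Within each pair of similar participants, randomly pick one for treatment."""
    assignment = {}
    for first, second in pairs:
        treated = rng.choice([first, second])
        control = second if treated == first else first
        assignment[treated] = "treatment"
        assignment[control] = "control"
    return assignment

def assign_randomized_blocks(blocks, rng):
    """Within each block of similar participants, randomly split members in half."""
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]        # copy so the input is left untouched
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for person in shuffled[:half]:
            assignment[person] = "treatment"
        for person in shuffled[half:]:
            assignment[person] = "control"
    return assignment

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical participants: pairs are matched one-to-one, blocks are larger groups.
pairs = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]
blocks = {"young": ["Y1", "Y2", "Y3", "Y4"], "old": ["O1", "O2", "O3", "O4"]}

pair_assignment = assign_matched_pairs(pairs, rng)
block_assignment = assign_randomized_blocks(blocks, rng)
print(pair_assignment)
print(block_assignment)
```

In both cases randomization happens within a set of similar participants; a matched pair is simply the special case of a block of size 2.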

Randomized Block Design vs Completely Randomized Design

A randomized block design differs from a completely randomized design by ensuring that an important predictor of the outcome is evenly distributed between study groups in order to force them to be balanced, something that a completely randomized design cannot guarantee. A completely randomized design uses simple randomization to assign participants to different treatment options …

Identify Variable Types in Statistics (with Examples)

Here’s a table that summarizes the types of variables. Quantitative (a.k.a. numerical) variables are either continuous (numerical values that can be measured but not counted) or discrete (numerical values that can be counted). Qualitative (a.k.a. categorical) variables are either ordinal (text or labels that have a logical order) or nominal (text or labels that …

Pretest-Posttest Control Group Design: An Introduction

The pretest-posttest control group design, also called the pretest-posttest randomized experimental design, is a type of experiment where participants get randomly assigned to either receive an intervention (the treatment group) or not (the control group). The outcome of interest is measured 2 times, once before the treatment group gets the intervention — the pretest — …

Assess Variable Importance in Linear and Logistic Regression

In this article, we will be concerned with the following question: Given a regression model, which of the predictors X1, X2, X3, etc. has the most influence on the outcome Y? In general, assessing the relative importance of predictors by directly comparing their (unstandardized) regression coefficients is not a good idea because: For numerical predictors: …

Separate-Sample Pretest-Posttest Design: An Introduction

The separate-sample pretest-posttest design is a type of quasi-experiment where the outcome of interest is measured 2 times: once before and once after an intervention, each time on a separate group of randomly chosen participants. The difference between the pretest and posttest measures will estimate the intervention’s effect on the outcome. The intervention can be: …

How to Report a Chi-Square Test

The 3 main types of Chi-square tests are: Chi-square goodness-of-fit test: used to compare the distribution of a categorical variable (with more than 2 levels) to a hypothetical distribution. Chi-square homogeneity test: used to test whether 2 groups (coming from 2 different samples) have the same distribution regarding a certain categorical variable. Chi-square independence test: …

Checking the Popularity of 125 Statistical Tests and Models

I analyzed the methods sections of 43,110 randomly chosen research papers, uploaded to PubMed Central between the years 2016 and 2021, in order to check the popularity of 125 statistical methods in medical research. I used the BioC API to download the articles (see the References section below). Here’s a summary of the key findings …

How Many References Should a Research Paper Have? Study of 96,685 Articles

I analyzed a random sample of 96,685 full-text research papers, uploaded to PubMed Central between the years 2016 and 2021, in order to answer the question: How many references should you cite when writing a research article? I used the BioC API to download the data (see the References section below). Here’s a summary of …

Statistical Software Popularity in 40,582 Research Papers

I analyzed a random sample of 76,147 full-text research papers, uploaded to PubMed Central between the years 2016 and 2021, in order to check the popularity of statistical software among medical researchers. (I used the BioC API to download the articles — see the References section below). Out of these 76,147 research papers, only 40,582 …
