weighted.mean
Usage
Used to calculate the weighted mean of a variable.
Usage 1:
weighted.mean(dataframe$variable, dataframe$weights, na.rm=TRUE)
- This calculates the weighted mean of variable in dataframe using weights as weights.
- We always include na.rm=TRUE as the third input to tell R to remove any NA values of variable from the calculation. NA values in the weights are not removed and still produce an NA result.
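A minimal sketch of what na.rm=TRUE changes. The vectors x and w are made-up toy data, not taken from any real dataset:

```r
# Toy data (made up for illustration): one missing value in x
x <- c(10, 20, NA, 40)
w <- c(1, 2, 3, 4)

# Without na.rm=TRUE, a single NA in x makes the whole result NA
weighted.mean(x, w)

# With na.rm=TRUE, the NA observation and its weight are dropped first:
# (10*1 + 20*2 + 40*4) / (1 + 2 + 4) = 210 / 7 = 30
weighted.mean(x, w, na.rm=TRUE)
```

The third observation is skipped along with its weight of 3, so the denominator is 7 rather than 10.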
Usage 2:
some_dataframe_name <- dataframe %>%
  group_by(groupvar1, groupvar2, ...) %>%
  summarize(
    avg_variable = weighted.mean(variable, weights, na.rm=TRUE)
  )
- This calculates the weighted mean of variable in dataframe within each possible combination of values for groupvar1, groupvar2, …
- See the vignette on summary statistics for more details.
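The grouped pattern can be tried on a small hand-made data frame before running it on real data. This is only a sketch: the data frame toy and its columns group, value, and weight are hypothetical names invented for illustration.

```r
library(dplyr)

# Toy data frame (made up for illustration)
toy <- data.frame(
  group  = c("a", "a", "b", "b"),
  value  = c(1, 3, 10, 20),
  weight = c(1, 3, 1, 1)
)

# One weighted mean per value of group
toy_means <- toy %>%
  group_by(group) %>%
  summarize(
    avg_value = weighted.mean(value, weight, na.rm=TRUE)
  )

# group "a": (1*1 + 3*3) / (1 + 3) = 10 / 4 = 2.5
# group "b": (10*1 + 20*1) / (1 + 1) = 30 / 2 = 15
print(toy_means)
```

The result has one row per group, which makes it easy to check each number by hand.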
Example 1
rm(list=ls())
df <- read.csv("IPUMS_ACS2019_CA_1.csv")
weighted.mean(df$AGE, df$PERWT, na.rm=TRUE)
This calculates the weighted mean of AGE in df using PERWT as weights.
Example 2
rm(list=ls())
library(dplyr)
df <- read.csv("IPUMS_ACS2019_CA_1.csv")
df$EMPLOYED <- df$EMPSTAT==1
emprate_by_race <- df %>%
  group_by(RACHSING) %>%
  summarize(
    EMPLOYMENT_RATE = weighted.mean(EMPLOYED, PERWT, na.rm=TRUE)
  )
This calculates the weighted mean of EMPLOYED (i.e. the employment rate) within each value of RACHSING (i.e. by race), using PERWT as the weights.
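The weighted mean of a TRUE/FALSE variable is a rate because R treats TRUE as 1 and FALSE as 0 in arithmetic. A small sketch with made-up values (the vectors employed and perwt are hypothetical, not drawn from the IPUMS file):

```r
# Toy data (made up): three people, the second carries twice the weight
employed <- c(TRUE, FALSE, TRUE)
perwt    <- c(1, 2, 1)

# TRUE counts as 1 and FALSE as 0, so this is the weighted share employed:
# (1*1 + 0*2 + 1*1) / (1 + 2 + 1) = 2 / 4 = 0.5
weighted.mean(employed, perwt)
```

Two of the three people are employed, but the weighted rate is 0.5 rather than 2/3 because the non-employed person represents more of the population.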
Mathematical Background
Suppose you have \(N\) samples of a variable:
\[(x_1, x_2, x_3, \ldots, x_N)\]
Along with \(N\) weights:
\[(w_1, w_2, w_3, \ldots, w_N)\]
The weighted mean of \(x\) using \(w\) as weights is:
\[\frac{ \sum_{i=1}^{N} x_i w_i }{ \sum_{i=1}^{N} w_i }\]
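As a quick check, the formula can be computed by hand in R and compared with weighted.mean. The values of x and w below are made up for illustration:

```r
# Toy data (made up for illustration)
x <- c(2, 4, 6)
w <- c(1, 1, 2)

# Numerator: sum of x_i * w_i; denominator: sum of w_i
by_hand  <- sum(x * w) / sum(w)   # (2 + 4 + 12) / 4 = 4.5
built_in <- weighted.mean(x, w)

# Both approaches should give the same number
all.equal(by_hand, built_in)
```

When every weight is the same, the weights cancel and the formula reduces to the ordinary mean, mean(x).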