Generalized Linear Model

Objective:

  • A generalized linear model (GLM) is used to predict response variables that follow either a continuous or a discrete distribution. The response can have a linear or non-linear relationship with the explanatory variables.

Model structure:

  • A GLM has three parts: a random component, a systematic component, and a link function.

Assumption:

  • The observations $y_i$ are independent and typically assumed to follow an exponential family distribution with mean $\mu_i$.
  • A GLM assumes that the transformed mean of the dependent variable (through the link function) has a linear relationship with the independent variables.
  • The errors $\epsilon_i$ should be independent but do not need to be normally distributed.

Parameter estimate:

  • $x_i$ are the covariates and $\beta$ are the coefficients to be estimated.

Model selection:

  • feature selection

Model fit:

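  • Maximum likelihood, usually computed by iteratively reweighted least squares (IRLS). With a normal response and an identity link this reduces to ordinary least squares.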
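
As an illustration of how a GLM can be fitted, here is a minimal sketch of iteratively reweighted least squares (IRLS), a weighted version of least squares that is repeated until the coefficients settle. It assumes a Poisson response with a log link and a single covariate. The function name `fit_poisson_irls` and the data are hypothetical and not from the original post.

```python
import math

def fit_poisson_irls(x, y, n_iter=25):
    """Fit log(mu_i) = b0 + b1 * x_i by iteratively reweighted least squares."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]          # linear predictor
        mu = [math.exp(e) for e in eta]           # inverse log link
        w = mu                                    # Poisson + log link: working weights equal mu
        # Working response: eta + (y - mu) * d(eta)/d(mu)
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Closed-form weighted least squares for intercept and slope
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        b1 = (sw * swxz - swx * swz) / det
        b0 = (swz - b1 * swx) / sw
    return b0, b1

# Hypothetical count data, for illustration only
x_data = [0, 1, 2, 3, 4]
y_data = [1, 3, 4, 9, 15]
b0_hat, b1_hat = fit_poisson_irls(x_data, y_data)
print(b0_hat, b1_hat)
```

At convergence the fitted means satisfy the likelihood equations, so the residuals `y - mu` sum to zero both unweighted and weighted by `x`.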

Generalized Linear Models (GLM)

| Model | Random component | Link | Systematic component |
|---|---|---|---|
| Linear regression | Normal | Identity | Continuous |
| ANOVA | Normal | Identity | Categorical |
| Logistic regression | Binomial | Logit | Mixed |
| Loglinear | Poisson | Log | Categorical |
| Poisson regression | Poisson | Log | Mixed |
| Multinomial response | Multinomial | Generalized logit | Mixed |
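
The link functions named in the table can be sketched in code as pairs of a link $g$ and its inverse $g^{-1}$. This is only an illustration: the dictionary `LINKS` and the helper `round_trip` are hypothetical names, and the generalized logit is left out for brevity.

```python
import math

# Each entry is (link g, inverse link g^{-1})
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
}

def round_trip(name, mu):
    """Apply the link and then its inverse; the result should equal mu."""
    g, g_inv = LINKS[name]
    return g_inv(g(mu))

print(round_trip("identity", 1.7))
print(round_trip("logit", 0.3))
print(round_trip("log", 2.5))
```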

The three components of any GLM:
random component: the probability distribution of the response variable $Y$.

systematic component: specifies the explanatory variables $(x_1, x_2, \dots, x_k)$ in the model. A GLM makes the strong assumption that $\eta = \beta x$, which means that the natural parameter $\eta$ of the exponential family equals the linear predictor (a design choice).

link function: $\eta = g(\mu)$, the link between the random and systematic components. In other words, it shows how the mean of the response changes with the explanatory variables. It can be a non-linear function. In addition, the inverse function $g^{-1}(\eta) = \mu$ is the response function, which maps the linear predictor back to the mean of the target $y$.
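
For reference, the exponential family mentioned above is commonly written in the form below. The symbols $T(y)$, $b(y)$ and $a(\eta)$ are not in the original text. They follow one common textbook convention and are used here as an assumption. The Bernoulli case is worked out to show where a natural parameter comes from.

```latex
% General exponential family form with natural parameter \eta
p(y; \eta) = b(y) \exp\left( \eta \, T(y) - a(\eta) \right)

% Bernoulli example with success probability \pi
p(y; \pi) = \pi^{y} (1 - \pi)^{1 - y}
          = \exp\left( y \log\frac{\pi}{1 - \pi} + \log(1 - \pi) \right)

% Matching terms: T(y) = y, b(y) = 1,
% \eta = \log\frac{\pi}{1 - \pi}, \quad a(\eta) = -\log(1 - \pi) = \log(1 + e^{\eta})
```

In this example the natural parameter is the log odds, which is why the logit appears as the link for a Bernoulli response.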

Next, we look at some examples of GLM components:

Simple linear regression

$y_i = \beta_0 + \beta x_i + \epsilon_i$

  • Random component: $Y$ is the dependent variable, with $y_i \sim N(\eta_i, \sigma^2)$ and $\epsilon_i \sim N(0, \sigma^2)$.
  • Systematic component: the independent variable $X$ is typically continuous, and the linear predictor is $\eta_i = \beta_0 + \beta x_i$.
  • Link function: identity link, $\eta_i = g(E(y_i)) = E(y_i)$. This is the simplest link function, which models the mean directly. Hypothesis: $h(x_i) = E(y_i \mid x_i, \beta) = \mu_i = g^{-1}(\eta_i) = \eta_i$.
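
A minimal sketch of the simple linear regression case: with normal errors and the identity link, the maximum likelihood estimates coincide with the closed-form least squares solution. The function name `fit_simple_ols` and the data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares estimates for y_i = beta0 + beta * x_i + eps_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / sxx
    beta0 = y_bar - beta * x_bar
    return beta0, beta

# Hypothetical noise-free data generated from y = 1 + 2 * x
x_data = [1.0, 2.0, 3.0, 4.0]
y_data = [3.0, 5.0, 7.0, 9.0]
beta0_hat, beta_hat = fit_simple_ols(x_data, y_data)

# Identity link: the predicted mean is the linear predictor itself
mu_hat = [beta0_hat + beta_hat * xi for xi in x_data]
print(beta0_hat, beta_hat, mu_hat)
```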

Logistic regression

$\operatorname{logit}(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta x_i$

This is the log odds of “success” as a function of the explanatory variables.

  • Random component: the distribution of $Y$ is assumed to be $\text{Bernoulli}(\pi_i)$, where $\pi_i$ is the probability of “success”.
  • Systematic component: the independent variables $X$ can be continuous or discrete, and the linear predictor is $\eta_i = \beta_0 + \beta x_i$.
  • Link function: logit link, $\eta_i = \operatorname{logit}(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right)$, which models the log odds of the mean ($\mu_i = \pi_i$). Hypothesis: $h(x_i) = E(y_i \mid \beta, x_i) = \operatorname{sigmoid}(\eta_i)$.
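
To make the logit link concrete, the sketch below turns a linear predictor into a probability with the sigmoid and checks the usual interpretation that a one-unit increase in $x$ multiplies the odds by $e^{\beta}$. The coefficient values and helper names are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def predict_prob(x, beta0, beta):
    """Probability of success for covariate value x."""
    eta = beta0 + beta * x  # linear predictor
    return sigmoid(eta)

def odds(p):
    return p / (1.0 - p)

beta0, beta = -1.0, 0.5            # hypothetical coefficients
p_at_2 = predict_prob(2.0, beta0, beta)
p_at_3 = predict_prob(3.0, beta0, beta)
odds_ratio = odds(p_at_3) / odds(p_at_2)  # equals exp(beta) up to rounding
print(p_at_2, p_at_3, odds_ratio)
```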

Log-linear model

$\log(\mu_{ij}) = \lambda + \lambda_i^A + \lambda_j^B + \lambda_{ij}^{AB}$

  • Random component: the cell counts follow a Poisson distribution.
  • Systematic component: the independent variables $X$ are discrete, and the predictor is linear in the parameters, $\lambda + \lambda_i^{x_i} + \dots + \lambda_n^{x_n}$.
  • Link function: log link, $\eta = \log(\mu)$, which models the log of the mean.
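
The log-linear equation above can be illustrated on a two-way table. In the saturated model the fitted means equal the observed counts, so the terms can be read off from the log counts. This sketch assumes sum-to-zero constraints on the effects and strictly positive counts. The function name and the table are hypothetical.

```python
import math

def loglinear_saturated(counts):
    """Split log cell counts into lambda, lambda^A_i, lambda^B_j, lambda^AB_ij
    under sum-to-zero constraints (saturated model, all counts > 0)."""
    n_rows, n_cols = len(counts), len(counts[0])
    log_mu = [[math.log(c) for c in row] for row in counts]
    lam = sum(sum(row) for row in log_mu) / (n_rows * n_cols)   # grand mean
    lam_a = [sum(log_mu[i]) / n_cols - lam for i in range(n_rows)]
    lam_b = [sum(log_mu[i][j] for i in range(n_rows)) / n_rows - lam
             for j in range(n_cols)]
    lam_ab = [[log_mu[i][j] - lam - lam_a[i] - lam_b[j]
               for j in range(n_cols)] for i in range(n_rows)]
    return lam, lam_a, lam_b, lam_ab

table = [[10, 20], [30, 40]]  # hypothetical 2x2 contingency table
lam, lam_a, lam_b, lam_ab = loglinear_saturated(table)
print(lam, lam_a, lam_b, lam_ab)
```

Adding the four terms back together for any cell recovers the log of that cell's count. Setting every $\lambda_{ij}^{AB}$ to zero would give the independence model instead.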

Advantages of GLM over OLS (ordinary least squares)

  • There is no need to transform the response $Y$ to make it normally distributed.
  • The variance does not need to be constant.
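
The point about non-constant variance can be made concrete with variance functions: in a GLM the variance of the response may depend on its mean. The sketch below lists the standard variance functions for three families. The dictionary name `VARIANCE` and the `sigma2` parameter are hypothetical.

```python
# Variance of the response as a function of the mean mu
VARIANCE = {
    "normal": lambda mu, sigma2=1.0: sigma2,     # constant, as OLS assumes
    "bernoulli": lambda mu: mu * (1.0 - mu),     # largest at mu = 0.5
    "poisson": lambda mu: mu,                    # grows with the mean
}

for family, var_fn in VARIANCE.items():
    print(family, var_fn(0.5))
```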
Author: shixuan liu
Link: http://tedlsx.github.io/2019/08/08/glm/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.