Objective:
- Generalized linear models (GLMs) are used to predict response variables with either continuous or discrete distributions; the response can have a linear or non-linear relationship with the explanatory variables.
Model structure:
- A GLM has three parts: a random component, a systematic component, and a link function.
Assumption:
- $Y$ is independent and typically assumed to follow an exponential family distribution with mean $\mu$.
- GLM assumes that the transformed mean of the dependent variable (through the link function) has a linear relationship with the independent variables.
- Errors should be independent but need not be normally distributed.
Parameter estimate:
- $X$ as covariates and $\beta$ for coefficients
Model selection:
- feature selection
Model fit:
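- Maximum likelihood, typically computed by iteratively reweighted least squares; for a normal response with identity link this reduces to ordinary least squares.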
Generalized Linear Models (GLM)
| Model | Random | Link | Systematic |
|---|---|---|---|
| linear regression | normal | identity | continuous |
| ANOVA | normal | identity | categorical |
| logistic regression | binomial | logit | mixed |
| loglinear | poisson | log | categorical |
| poisson regression | poisson | log | mixed |
| multinomial response | multinomial | generalized logit | mixed |
Introduction to the 3 components of any GLM:
random component: the probability distribution of the response variable $Y$.
systematic component: specifies the explanatory variables in the model. GLM makes the strong assumption that $\eta = X\beta$, which means the natural parameter of the exponential family equals the linear predictor (a design choice).
link function: $\eta$ or $g(\mu)$, the link between the random and systematic components. In other words, it shows how the expected response changes with the explanatory variables. It can be a non-linear function. In addition, its inverse $g^{-1}$ is the response function, which maps the linear predictor $\eta$ to the target mean $\mu = E(Y)$.
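As a rough illustration of the link and response functions described above, the following Python sketch pairs three common links with their inverses. The names `LINKS`, `logit`, and `inv_logit` are illustrative choices for this sketch, not taken from any particular library.

```python
import math

def logit(mu):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Response function for the logit link: maps the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Each entry is (link g, response function g^{-1}).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (logit, inv_logit),
    "log": (math.log, math.exp),
}

# Applying g and then g^{-1} should recover the original mean.
for name, (g, g_inv) in LINKS.items():
    mu = 0.3
    eta = g(mu)
    print(f"{name}: mu={mu}, eta={eta:.4f}, recovered mu={g_inv(eta):.4f}")
```

The link moves the mean onto the scale of the linear predictor, and the response function moves it back, so a round trip returns the starting value.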
Next, we look at some examples of GLM components:
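Simple linear regression
- Random component: $Y$ is the dependent variable and is normally distributed, $Y \sim N(\mu, \sigma^2)$.
- Systematic component: the independent variable $X$ is typically continuous, and the linear predictor is $\eta = \beta_0 + \beta_1 X$.
- Link function: identity link, $\eta = g(\mu) = \mu$ with $\mu = E(Y)$. This is the simplest link function, which models the mean directly. Hypothesis: $h(x) = E(Y \mid x) = \beta_0 + \beta_1 x$.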
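For a normal response with the identity link, the maximum likelihood estimates of the coefficients coincide with the least-squares estimates. The sketch below computes them in closed form for one covariate; the function name `fit_simple_linear` and the data are hypothetical, chosen only for illustration.

```python
def fit_simple_linear(x, y):
    """Least-squares estimates (b0, b1) for the model E(y) = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_simple_linear(x, y)

def h(x_new):
    """Identity link: the predicted mean is the linear predictor itself."""
    return b0 + b1 * x_new

print(b0, b1, h(4.0))
```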
Logistic regression
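$\text{logit}(\pi) = \log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X$, which is the log odds of the probability of “success” as a function of the explanatory variables.
- Random component: the distribution of $Y$ is assumed to be $\text{Binomial}(n, \pi)$, where $\pi$ is the probability of “success”.
- Systematic component: the independent variables $X$ can be discrete, continuous, or both, and the linear predictor is $\eta = \beta_0 + \beta_1 X$.
- Link function: logit link, $\eta = \log\left(\frac{\pi}{1-\pi}\right)$, which models the log odds of the mean ($\pi$). Hypothesis: $h(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$.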
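A minimal sketch of how such a model might be fitted, assuming a single covariate and using Newton's method on the binomial log-likelihood (with the logit link this is equivalent to iteratively reweighted least squares). The function `fit_logistic`, the fixed iteration count, and the tiny data set are assumptions made for illustration.

```python
import math

def sigmoid(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(x, y, n_iter=25):
    """Newton's method for intercept b0 and slope b1 of a logistic regression."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix entries
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)    # binomial variance at the current fit
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
    return b0, b1

# Hypothetical binary data: 1 success in 4 trials at x = 0, 3 in 4 at x = 1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
print(sigmoid(b0), sigmoid(b0 + b1))  # fitted success probabilities per group
```

With one binary covariate and an intercept, the fitted probabilities match the observed success proportions in each group, which gives a simple check on the fit.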
Log-linear model
- Random component: the counts $Y$ follow a Poisson distribution, $Y \sim \text{Poisson}(\mu)$.
- Systematic component: the independent variable $X$ is discrete, and the predictor is linear in the parameters, $\eta = \beta_0 + \beta_1 X$.
- Link function: log link, $\eta = \log(\mu)$, which models the log of the mean.
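The same fitting idea carries over to count data. The sketch below assumes a single covariate and applies Newton's method to the Poisson log-likelihood with a log link; `fit_poisson`, the starting values, and the counts are hypothetical choices for illustration.

```python
import math

def fit_poisson(x, y, n_iter=25):
    """Newton's method for intercept b0 and slope b1 of a Poisson log-link model."""
    # Start from the intercept-only fit: log of the overall mean count.
    b0 = math.log(sum(y) / len(y))
    b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix entries
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)  # inverse log link keeps the mean positive
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += mu
            h01 += mu * xi
            h11 += mu * xi * xi
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
    return b0, b1

# Hypothetical counts: group x = 0 has mean 2, group x = 1 has mean 6.
x = [0, 0, 1, 1]
y = [1, 3, 4, 8]
b0, b1 = fit_poisson(x, y)
print(math.exp(b0), math.exp(b1))  # baseline mean and multiplicative effect of x
```

Because the model is linear on the log scale, exponentiating the slope gives the factor by which the expected count is multiplied when the covariate increases by one unit.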
Advantages of GLM over OLS (ordinary least squares)
- There is no need to transform the response to make it normally distributed.
- The variance does not need to be constant.
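To make the point about non-constant variance concrete, the sketch below tabulates how the response variance depends on the mean for three families. The dictionary name `VARIANCE` and the fixed normal variance of 1 are assumptions made for illustration.

```python
# Variance of the response as a function of its mean, per family.
# The normal case assumes a fixed, hypothetical sigma^2 = 1.
VARIANCE = {
    "normal": lambda mu: 1.0,               # constant, as OLS requires
    "binomial": lambda mu: mu * (1 - mu),   # single Bernoulli trial
    "poisson": lambda mu: mu,               # variance equals the mean
}

for mu in (0.1, 0.5, 0.9):
    row = {name: round(v(mu), 2) for name, v in VARIANCE.items()}
    print(mu, row)
```

Only the normal family keeps the variance fixed as the mean changes; the binomial and Poisson families let it vary with the mean, which a GLM accommodates directly.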


