Linear Regression Model

Objective:

  • Predict a continuous variable $Y$ as a linear function of one or more continuous variables $x$.

Model structure:

  • $Y_i=\beta_0 + \beta_1 x_i + \epsilon_i$

Assumption:

  • $Y$ follows a normal distribution; the errors $\epsilon_i$ are independent with $\epsilon_i \sim N(0,\sigma^2)$. The data $X$ are treated as fixed.

Parameter estimate:

  • $\beta_0$ is the intercept and $\beta_1$ is the slope (closed-form least-squares estimates are given below).
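
For the simple model above, the least-squares estimates have a standard closed form, stated here for reference (the general matrix derivation is given in the multiple-regression section below):

$$\hat{\beta}_1=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2},\qquad \hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}$$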

Model selection:

  • feature selection

Model fit:

  • $R^2$ (defined below)
  • residual analysis
  • F-statistic
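
For reference, the $R^2$ in the list above compares the residual sum of squares with the total sum of squares (standard definition, added for completeness):

$$R^2 = 1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$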

Multiple Linear Regression Model

Multiple linear regression is a linear model with more than one explanatory variable. These explanatory variables are called independent variables (predictors), and the variable being predicted is called the dependent variable.

The formula for the multiple linear regression model is:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i$$

for $i = 1,\ldots,n$, where $n$ is the number of observations and $p$ is the number of independent variables.

Hence the matrix notation for multiple linear regression is:

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

This can also be written as the simple statement:

$$y = X\beta + \epsilon$$

Using least squares, we need to minimise the sum of squared residuals $\|y - X\beta\|^2$ to find the 'best' $\beta$.

One way is to look at the relation between the residuals $y-\hat{y}$ and the columns of $X$.

Geometrically, the residuals $y-\hat{y}$ are orthogonal to the columns of $X$.

From this orthogonality we can define the normal equations:

$$X^T(y - X\hat{\beta}) = 0$$

We can also use least squares as our loss (error) function, which minimises the Euclidean distance between the predicted $\hat{y}$ and the actual $y$:

$$L = \frac{1}{2}\sum_{i=1}^{n}(y_i-\beta^Tx_i)^2=\frac{1}{2}\|y-X\beta\|^2 = \frac{1}{2}(y-X\beta)^T(y-X\beta)$$

To find the minimum of the loss function, we differentiate with respect to $\beta$ and set the derivative to zero:

$$\frac{dL}{d\beta}=-X^Ty+X^TX\beta=0$$

Solving, we again obtain the result:

$$\hat{\beta}=(X^TX)^{-1}X^Ty$$

This of course works only if the inverse exists. If the inverse does not exist, the normal equations can still be solved, but the solution may not be unique.
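
In that case a common workaround in numpy is the Moore-Penrose pseudo-inverse (np.linalg.pinv) or np.linalg.lstsq, both of which return the minimum-norm least-squares solution. This is a minimal sketch with made-up collinear data, not part of the original post:

import numpy as np

## design matrix with an exactly collinear column (third column = 2 * second),
## so X^T X is singular and np.linalg.inv would fail
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 6.0],
              [1.0, 4.0, 8.0]])
y = np.array([1.0, 2.0, 2.0, 4.0])

## minimum-norm least-squares solution via the pseudo-inverse
beta_pinv = np.linalg.pinv(X) @ y

## the same solution from numpy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_pinv)   # one valid solution among infinitely many
print(beta_lstsq)  # matches the minimum-norm solution above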

For the fitted values $\hat{y}$, we can plug in $\hat{\beta}$:

$$\hat{y} = X\hat{\beta} = X(X^TX)^{-1}X^Ty = Hy$$

The matrix $H = X(X^TX)^{-1}X^T$ (the hat matrix) is an $n \times n$ matrix; it maps the observed values $y$ onto the fitted values $\hat{y}$.

And the residuals can be written as

$$e = y - \hat{y} = (I - H)y$$
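
As a quick numpy check (a tiny illustration with made-up data, not from the original derivation), the hat matrix, fitted values and residuals can be computed directly, and the residuals are indeed orthogonal to the columns of $X$:

import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])  # intercept column plus one predictor
y = np.array([1.1, 1.9, 3.2, 3.8])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix, n x n
y_hat = H @ y                          # fitted values
e = y - y_hat                          # residuals, e = (I - H) y

print(np.allclose(X.T @ e, 0))         # True: residuals are orthogonal to the columns of X
print(np.trace(H))                     # equals the number of parameters (here 2)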

Here is a comparison between my own implementation built with numpy and the LinearRegression class in scikit-learn. First, the numpy version:

import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# y = 1 * x_0 + 2 * x_1 + 3
y = np.dot(X, np.array([1, 2])) + 3

## add a column of ones on the left of X for the intercept
X_with_constant = np.insert(X, 0, 1, axis=1)
X_t = np.transpose(X_with_constant)

## normal equations: beta_hat = (X^T X)^{-1} X^T y
beta = np.linalg.inv(X_t.dot(X_with_constant)).dot(X_t).dot(y)
## beta is array([3., 1., 2.])

## predict for the new point x_0 = 3, x_1 = 5 (with a leading 1 for the intercept)
y_pred = np.array([[1, 3, 5]]).dot(beta)

y_pred
## array([16.])
And the same fit using scikit-learn:
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# y = 1 * x_0 + 2 * x_1 + 3
y = np.dot(X, np.array([1, 2])) + 3

reg = LinearRegression().fit(X, y)
reg.score(X, y)
## 1.0
reg.coef_
## array([1., 2.])
reg.intercept_
## 3.0000...
reg.predict(np.array([[3, 5]]))
## array([16.])

Compared with the scikit-learn implementation, both versions produce the same correct prediction. Other aspects of performance can be considered later…

Leverage and Influence

To assess the leverage and influence of the observations, we compute the leverage values $h_{ii}$ (the diagonal elements of the hat matrix $H$) and Cook's D-values. An easy way to obtain these in R is the influence() function, which returns a list with the following components:

• $hat is the diagonal of the hat matrix

• The rows of $coefficients contain the differences $\hat{\beta} - \hat{\beta}_{(i)}$ between the full parameter estimate and the estimate when observation $i$ is omitted.

• $sigma contains the estimated values of $\sigma$ from the model with observation $i$ omitted.

The function cooks.distance() can be used to compute Cook's D-values $D_1^2, \ldots, D_n^2$.

To look for potential outliers, we find the observations with the largest residuals, leverages and D-values.

Both rows are identical, so the two ways to compute $\hat{\beta} - \hat{\beta}_{(21)}$ give the same result. Finally, Cook's D-value $D_{21}^2$ is defined to be $$D_{21}^2 = \frac{(\hat{\beta} - \hat{\beta}_{(21)})^T X^T X (\hat{\beta} - \hat{\beta}_{(21)})}{(p + 1)\hat{\sigma}^2}$$
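
Outside R, the same diagnostics can be computed by hand. The numpy sketch below uses the standard leverage-based identity for Cook's distance, which is algebraically equivalent to the definition above; the function name and toy data are my own assumptions, not the R functions themselves:

import numpy as np

def leverage_and_cooks_d(X, y):
    """Leverage h_ii and Cook's D for each observation.

    X is the n x (p+1) design matrix including the intercept column.
    Uses D_i = e_i^2 / ((p+1) * sigma^2) * h_ii / (1 - h_ii)^2.
    """
    n, k = X.shape                                   # k = p + 1 parameters
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)                                   # leverages h_ii
    e = y - H @ y                                    # residuals
    sigma2 = e @ e / (n - k)                         # estimate of sigma^2
    d = (e**2 / (k * sigma2)) * h / (1 - h)**2       # Cook's D-values
    return h, d

## toy data with one unusual x-value
X = np.column_stack([np.ones(6), np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 8.0])
h, d = leverage_and_cooks_d(X, y)
print(h)   # the last observation has by far the largest leverage
print(d)   # ...and the largest Cook's D-value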

Model Selection

To build up a model stepwise, we include columns one by one, until all F-values are below a pre-specified value $F_{IN}$. While there are variables with $F_j \geq F_{IN}$, we add a new variable. There are two choices:

a) We can either add the variable which leads to the largest $R^2$-value; or

b) we can add the variable with the largest F-value.

These describe forward stepwise selection; backward stepwise selection works in the opposite direction, starting from the full model and removing variables with small F-values. A small numpy sketch of the forward variant follows.
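
Here is a rough numpy sketch of the forward variant implementing choice (b), adding at each step the remaining variable with the largest partial F-value; the threshold f_in and the toy data are my own choices, not from the post:

import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

def forward_stepwise(X, y, f_in=4.0):
    """Add one column of X at a time while some partial F-value is at least f_in."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    current = np.ones((n, 1))                        # start from the intercept-only model
    while remaining:
        rss_cur = rss(current, y)
        best_j, best_f = None, -np.inf
        for j in remaining:
            cand = np.column_stack([current, X[:, j]])
            rss_cand = rss(cand, y)
            f = (rss_cur - rss_cand) / (rss_cand / (n - cand.shape[1]))  # partial F for adding column j
            if f > best_f:
                best_j, best_f = j, f
        if best_f < f_in:                            # stop: no remaining variable is worth adding
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current = np.column_stack([current, X[:, best_j]])
    return selected

## toy example: y really depends on columns 0 and 2 only
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(scale=0.5, size=50)
print(forward_stepwise(X, y))   # columns 0 and 2 should be picked first; a noise column may occasionally sneak in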

Robustness

The kinds of questions typically asked in the robustness literature are:

  1. Is the procedure sensitive to small departures from the model?
  2. To first order, what is the sensitivity?
  3. How wrong can the model be before the procedure produces garbage?

The first issue is that of qualitative robustness; the second is quantitative robustness; the third is the “breakdown point”.

Resistance and Breakdown Point

A statistic is resistant if arbitrary changes to a small part of the data (e.g. a few outliers) do not change the result by much.

Suppose we are allowed to change the values of the observations in the sample. What is the smallest fraction we would need to change to make the estimator take an arbitrary value? The answer is the breakdown point of the estimator.
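
A tiny numpy illustration of the contrast (my own toy example, not from the post): corrupting a single value can move the sample mean arbitrarily far, while the median barely reacts, reflecting breakdown points of 0 and roughly 1/2 respectively.

import numpy as np

x = np.arange(1.0, 11.0)               # 1, 2, ..., 10
x_bad = x.copy()
x_bad[0] = 1e6                         # corrupt a single observation

print(np.mean(x), np.mean(x_bad))      # 5.5 vs 100005.4: the mean breaks down
print(np.median(x), np.median(x_bad))  # 5.5 vs 6.5: the median barely moves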

M-Estimation

Linear least-squares estimates can behave badly when the error distribution is not normal, particularly when the errors are heavy-tailed. One remedy is to remove influential observations from the least-squares fit. Another approach, termed robust regression, is to use a fitting criterion that is not as vulnerable as least squares to unusual data.

This class of estimators can be regarded as a generalization of maximum-likelihood estimation, hence the term “M”-estimation.[2]
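
As one concrete instance, an M-estimator with the Huber loss can be fitted by iteratively reweighted least squares. The sketch below is a bare-bones illustration of that idea (the tuning constant, scale estimate and toy data are my own assumptions, not the procedure from [2]):

import numpy as np

def huber_irls(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimation of a linear model by iteratively reweighted least squares.

    X is the n x (p+1) design matrix (intercept column included), c the tuning constant.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # start from ordinary least squares
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r)) / 0.6745 + 1e-12     # rough MAD-based scale estimate
        u = np.maximum(np.abs(r / scale), 1e-12)
        w = np.minimum(1.0, c / u)                        # Huber weights: 1 for small residuals, downweight large ones
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

## straight-line data with one gross outlier in y
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[9] += 30.0

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_ols)          # pulled towards the outlier
print(huber_irls(X, y))  # close to the true coefficients (1, 2)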

Reference

[1] http://mezeylab.cb.bscb.cornell.edu/labmembers/documents/supplement 5 - multiple regression.pdf

[2] http://users.stat.umn.edu/~sandy/courses/8053/handouts/robust.pdf

Author: shixuan liu
Link: http://tedlsx.github.io/2019/08/05/linear-model/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.