# Machine Learning - Andrew Ng on Coursera (Cont.)

Linear Regression with multiple variables

## Multiple features

• Notation: n = number of features; x^(i) = input (features) of the i-th training example; x_j^(i) = value of feature j in the i-th training example.
• hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn
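With the convention x0 = 1, the hypothesis above collapses to a single dot product θᵀx. A minimal NumPy sketch (function name and example values are illustrative, not from the course):

```python
import numpy as np

def hypothesis(theta, X):
    """h_theta(x) = theta0*x0 + theta1*x1 + ... + thetan*xn for each row of X.

    X must already include the x0 = 1 column, so the sum is just X @ theta.
    """
    return X @ theta

# One training example with x1 = 4, x2 = 5 (leading 1 is x0):
theta = np.array([1.0, 2.0, 3.0])
X = np.array([[1.0, 4.0, 5.0]])
print(hypothesis(theta, X))  # 1 + 2*4 + 3*5 = 24
```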

## Feature Scaling

• Idea: make sure features are on a similar scale.

• E.g. x1 = size (0–2000 feet²) ⇒ x1 = size (feet²) / 2000; x2 = number of bedrooms (1–5) ⇒ x2 = number of bedrooms / 5

• Get every feature into approximately a −1 ≤ xi ≤ 1 range.

• Mean normalization: replace xi with xi − μi to make features have approximately zero mean (do not apply to x0 = 1).
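The two bullets above (scale by the range, subtract the mean) combine into one transform, xi ← (xi − μi) / range_i. A sketch in NumPy, assuming X holds only the real features (no x0 column):

```python
import numpy as np

def mean_normalize(X):
    """Mean-normalize each column: subtract its mean, divide by its range.

    Result has roughly zero mean per feature and values roughly in [-1, 1].
    Returns mu and rng so the same transform can be applied to new examples.
    """
    mu = X.mean(axis=0)
    rng = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / rng, mu, rng
```

At prediction time the saved `mu` and `rng` must be reused on the new input; re-computing them would put training and test features on different scales.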

## Learning Rate

• Making sure gradient descent is working correctly: J(θ) should decrease after every iteration.

• Example automatic convergence test: declare convergence if J(θ) decreases by less than 10⁻³ in one iteration.

• If gradient descent is not working (J(θ) is increasing), use a smaller ⍺.

• For sufficiently small ⍺, J(θ) should decrease on every iteration. But if ⍺ is too small, gradient descent can be slow to converge.
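The monitoring loop above can be sketched as batch gradient descent that tracks J(θ) each iteration and stops when the decrease falls below a tolerance (the 10⁻³ threshold from the convergence test; names and defaults here are illustrative):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, tol=1e-3, max_iters=10000):
    """Batch gradient descent for linear regression with a convergence test on J.

    X includes the x0 = 1 column. Stops when J(theta) decreases by less
    than tol in one iteration (this also halts if J starts increasing,
    which signals that alpha is too large).
    """
    m, n = X.shape
    theta = np.zeros(n)
    prev_cost = np.inf
    for _ in range(max_iters):
        error = X @ theta - y
        cost = (error @ error) / (2 * m)        # J(theta)
        if prev_cost - cost < tol:              # declare convergence
            break
        theta -= (alpha / m) * (X.T @ error)    # simultaneous update of all theta_j
        prev_cost = cost
    return theta
```

Plotting the recorded costs against iteration number is the visual version of this check: the curve should slope down monotonically.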

## Features and Polynomial Regression

• E.g. housing price prediction: hθ(x) = θ0 + θ1 × frontage + θ2 × depth

• Define a new feature area = frontage × depth, then fit a quadratic or cubic function of area instead of the previous hypothesis to model the data better.

## Normal equation and non-invertibility

• Normal equation: method to solve for θ analytically.

• θ = (XᵀX)⁻¹Xᵀy

• What if XᵀX is non-invertible?

• Redundant features (linearly dependent): delete one of the dependent features.

• Too many features: delete some features, or use regularization.

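The closed-form solution θ = (XᵀX)⁻¹Xᵀy is a one-liner in NumPy. Using the pseudo-inverse (`pinv`) rather than a plain inverse sidesteps the non-invertibility issue above, since it still returns a sensible θ when XᵀX is singular:

```python
import numpy as np

def normal_equation(X, y):
    """Solve theta = (X^T X)^(-1) X^T y analytically.

    pinv (pseudo-inverse) works even when X^T X is non-invertible,
    e.g. with redundant (linearly dependent) features.
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```

Unlike gradient descent, this needs no choice of ⍺ and no iteration, but computing the inverse is slow when n (the number of features) is very large.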