Machine Learning - Andrew Ng on Coursera (Cont.)
Linear Regression with multiple variables
Multiple features
 Notation: n = number of features, x^{(i)} = input (features) of i^{th} training example. x^{(i)}_{j} = value of feature j in i^{th} training example.
 h_{θ}(x) = θ_{0} + θ_{1}x_{1} + θ_{2}x_{2} + … + θ_{n}x_{n}
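The hypothesis above is just a dot product between the parameter vector θ and the feature vector x (with the convention x_{0} = 1 prepended). A minimal NumPy sketch with made-up values for θ and x:

```python
import numpy as np

# Hypothetical values: theta holds theta_0..theta_n, and x is one training
# example with x_0 = 1 prepended by convention.
theta = np.array([1.0, 0.5, 2.0])   # theta_0, theta_1, theta_2
x = np.array([1.0, 3.0, 4.0])       # x_0 = 1, x_1 = 3, x_2 = 4

# h_theta(x) = theta_0*x_0 + theta_1*x_1 + ... + theta_n*x_n = theta^T x
h = theta @ x                       # 1*1 + 0.5*3 + 2*4 = 10.5
```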
Feature Scaling

Idea: make sure features are on a similar scale.

E.g. x_{1} = size (0–2000 feet^{2}) => x_{1} = size (feet^{2}) / 2000; x_{2} = number of bedrooms (1–5) => x_{2} = number of bedrooms / 5

Get every feature into approximately a -1 <= x_{i} <= 1 range.

Mean normalization: replace x_{i} with x_{i} - u_{i} to make features have approximately zero mean (do not apply to x_{0} = 1).
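Both ideas can be combined by subtracting each feature's mean and dividing by its range. A sketch with a hypothetical design matrix of house sizes and bedroom counts (the x_{0} = 1 column is excluded, since scaling is never applied to it):

```python
import numpy as np

# Hypothetical examples: rows are training examples, columns are features
# (size in feet^2, number of bedrooms). No x_0 column here.
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])

mu = X.mean(axis=0)                 # per-feature mean u_i
s = X.max(axis=0) - X.min(axis=0)   # per-feature range (std also works)

X_scaled = (X - mu) / s             # zero mean, roughly within [-1, 1]
```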
Learning Rate

Making sure gradient descent is working correctly: J(θ) should decrease after every iteration.

Example automatic convergence test: declare convergence if J(θ) decreases by less than 10^{-3} in one iteration.

If gradient descent is not working (J(θ) increasing or oscillating), use a smaller ⍺.

For sufficiently small ⍺, J(θ) should decrease on every iteration. But if ⍺ is too small, gradient descent can be slow to converge.
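The convergence test above drops straight into a batch gradient descent loop. A sketch (the function name and default values are illustrative, not from the course; it assumes X already includes the x_{0} = 1 column):

```python
import numpy as np

def gradient_descent(X, y, alpha, tol=1e-3, max_iters=10000):
    """Batch gradient descent for linear regression (a sketch; X must
    include the x_0 = 1 column). Declares convergence when J(theta)
    decreases by less than `tol` in one iteration."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    cost = lambda t: ((X @ t - y) ** 2).sum() / (2 * m)
    J_prev = cost(theta)
    J_history = [J_prev]                 # plot this to debug alpha
    for _ in range(max_iters):
        theta -= (alpha / m) * (X.T @ (X @ theta - y))
        J = cost(theta)
        J_history.append(J)
        if J_prev - J < tol:             # also triggers if J increases (alpha too big)
            break
        J_prev = J
    return theta, J_history
```

Plotting `J_history` against the iteration number is the visual version of the check: the curve should slope steadily downward.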
Features and Polynomial Regression

E.g. housing price prediction: h_{θ}(x) = θ_{0} + θ_{1} × frontage + θ_{2} × depth

Instead, define area = frontage × depth and use it as a single feature; then a quadratic or cubic function of area (e.g. h_{θ}(x) = θ_{0} + θ_{1}(area) + θ_{2}(area)^{2}) may fit the data better than the original hypothesis.
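Building the derived feature and its polynomial terms is a couple of lines of NumPy. A sketch with hypothetical frontage/depth values (note that the polynomial columns end up on wildly different scales, which is exactly where the feature scaling above becomes important):

```python
import numpy as np

# Hypothetical frontage/depth data for three houses.
frontage = np.array([50.0, 60.0, 40.0])
depth = np.array([100.0, 80.0, 120.0])

# Combine into one derived feature, then add polynomial terms of it.
area = frontage * depth
X_poly = np.column_stack([area, area ** 2, area ** 3])
```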
Normal equation and noninvertibility

Normal equation: a method to solve for θ analytically (no iterations, no need to choose ⍺).

θ = (X^{T}X)^{-1}X^{T}y

What if X^{T}X is noninvertible? Common causes:

Redundant features (linearly dependent).

Too many features (delete some features, or use regularization).
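A direct NumPy sketch of the formula, on hypothetical housing data (X includes the x_{0} = 1 column). Using `np.linalg.pinv` (pseudo-inverse) rather than a plain inverse still returns a sensible θ even when X^{T}X is noninvertible:

```python
import numpy as np

# Hypothetical training set: x_0 = 1, size (feet^2), number of bedrooms.
X = np.array([[1.0, 2104.0, 5.0],
              [1.0, 1416.0, 3.0],
              [1.0, 1534.0, 3.0],
              [1.0,  852.0, 2.0]])
y = np.array([460.0, 232.0, 315.0, 178.0])   # price

# theta = (X^T X)^{-1} X^T y, with pinv to cover the noninvertible case.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
```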