Machine Learning - Andrew Ng on Coursera (Cont.)

Linear Regression with multiple variables

Multiple features

  • Notation: n = number of features; x^(i) = input (features) of the i-th training example; x_j^(i) = value of feature j in the i-th training example.
  • hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn
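
With x0 = 1, the hypothesis is just the inner product of θ and the feature vector, hθ(x) = θᵀx. A minimal NumPy sketch (the θ and feature values below are made up purely for illustration):

```python
import numpy as np

# Hypothetical parameters and one training example with two features
# (x_0 = 1, x_1 = size in feet^2, x_2 = number of bedrooms).
theta = np.array([50.0, 0.1, 25.0])
x = np.array([1.0, 2000.0, 3.0])

# h_theta(x) = theta_0*x_0 + theta_1*x_1 + theta_2*x_2 = theta . x
h = theta @ x
print(h)  # 50 + 0.1*2000 + 25*3 = 325.0
```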

Feature Scaling

  • Idea: make sure features are on a similar scale, so that gradient descent converges faster.

  • E.g. x1 = size (0-2000 feet²) => x1 = size (feet²) / 2000; x2 = number of bedrooms (1-5) => x2 = number of bedrooms / 5

  • Get every feature into approximately a -1 ≤ xi ≤ 1 range.

  • Mean normalization: replace xi with xi - μi (where μi is the mean of feature i) to make features have approximately zero mean (do not apply to x0 = 1); see the sketch below.
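
A minimal NumPy sketch of mean normalization combined with dividing by the feature range (the training values below are hypothetical, and the x0 = 1 column is left out):

```python
import numpy as np

# Hypothetical training set: columns are size (feet^2) and number of bedrooms.
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])

mu = X.mean(axis=0)                  # mean of each feature
rng = X.max(axis=0) - X.min(axis=0)  # range of each feature (std also works)

# Mean normalization + scaling: puts each feature roughly in [-1, 1].
X_scaled = (X - mu) / rng
print(X_scaled)
```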

Learning Rate

  • Making sure gradient descent is working correctly: the goal is to check that J(θ) decreases after every iteration (e.g. by plotting J(θ) against the number of iterations).

  • Example automatic convergence test: declare convergence if J(θ) decreases by less than 10⁻³ in one iteration.

  • If gradient descent is not working (e.g. J(θ) is increasing or oscillating), use a smaller α.

  • For sufficiently small α, J(θ) should decrease on every iteration; but if α is too small, gradient descent can be slow to converge. A sketch follows below.
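
A minimal sketch of batch gradient descent for linear regression that records J(θ) every iteration and applies the 10⁻³ convergence test above (the data and α = 0.1 are hypothetical):

```python
import numpy as np

# Hypothetical scaled training data with a column of ones for x_0.
X = np.array([[1.0, 0.5], [1.0, -0.2], [1.0, 0.9], [1.0, -0.8]])
y = np.array([2.0, 0.7, 2.8, -0.3])
m = len(y)

def cost(theta):
    # J(theta) = 1/(2m) * sum((h_theta(x) - y)^2)
    err = X @ theta - y
    return (err @ err) / (2 * m)

theta = np.zeros(2)
alpha = 0.1
J_history = [cost(theta)]
for _ in range(1000):
    grad = X.T @ (X @ theta - y) / m   # gradient of J(theta)
    theta -= alpha * grad
    J_history.append(cost(theta))
    # Automatic convergence test: stop if J decreases by less than 1e-3.
    if J_history[-2] - J_history[-1] < 1e-3:
        break

print(theta, J_history[-1])
```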

Features and Polynomial Regression

  • E.g. housing price prediction: hθ(x) = θ0 + θ1 × frontage + θ2 × depth

  • Define a new feature area = frontage × depth and use it instead, then fit a quadratic or cubic function of area (e.g. hθ(x) = θ0 + θ1·area + θ2·area² + θ3·area³) so the model fits the data better; a sketch follows below.
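
A minimal sketch of constructing the area feature and its quadratic/cubic terms, scaled so the polynomial terms stay in a comparable range (the frontage/depth values are made up):

```python
import numpy as np

# Hypothetical frontage and depth measurements for four houses.
frontage = np.array([60.0, 45.0, 80.0, 50.0])
depth = np.array([100.0, 90.0, 120.0, 75.0])

area = frontage * depth                       # combined feature

# Cubic model in area: h(x) = t0 + t1*area + t2*area^2 + t3*area^3.
# Scale each polynomial term so they end up on a similar scale.
terms = np.column_stack([area, area**2, area**3])
terms = (terms - terms.mean(axis=0)) / (terms.max(axis=0) - terms.min(axis=0))

X = np.column_stack([np.ones(len(area)), terms])  # prepend x_0 = 1
print(X)
```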

Normal equation and non-invertibility

  • Normal equation: a method to solve for θ analytically in one step (no iterations): θ = (XᵀX)⁻¹Xᵀy.

  • What if XᵀX is non-invertible (singular)?

  • Redundant features (linearly dependent): e.g. x1 = size in feet² and x2 = size in m²; delete one of them.

  • Too many features (e.g. m ≤ n): delete some features, or use regularization.

  • Gradient descent vs. normal equation: gradient descent needs to choose α and takes many iterations, but works well even when n is large; the normal equation needs no α and no iterations, but computing (XᵀX)⁻¹ is roughly O(n³), so it becomes slow when n is very large. A sketch of the normal equation follows below.
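
A minimal NumPy sketch of the normal equation; np.linalg.pinv is used here so it still returns a sensible θ when XᵀX is non-invertible (the data below is hypothetical):

```python
import numpy as np

# Hypothetical design matrix (first column is x_0 = 1) and target prices.
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1600.0, 3.0],
              [1.0, 2400.0, 4.0],
              [1.0, 1416.0, 2.0]])
y = np.array([400.0, 330.0, 369.0, 232.0])

# theta = (X^T X)^(-1) X^T y; pinv also handles a singular X^T X.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)
```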