- Machine learning
- What is machine learning?
- Arthur Samuel (1959)
- Field of study that gives computers the ability to learn without being explicitly programmed.
- Tom Mitchell (1998)
- A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Machine learning algorithms
- Supervised learning
- Unsupervised learning
- Others: Reinforcement learning, recommender systems.
- Supervised learning: “right answers” given
- Regression: predict a continuous-valued output (e.g., price)
- Classification: predict a discrete-valued output (e.g., 0 or 1)
- Unsupervised learning: given only the data, find structure in the data set (e.g., clustering)
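As a sketch of what clustering does, here is a minimal 1-D k-means in plain Python. The algorithm choice and the toy data are illustrative assumptions; the notes only name clustering as an example of finding structure:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D k-means: group points into k clusters by nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its assigned points.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups around 1 and 10; no "right answers" (labels) are given.
print(kmeans([0.9, 1.0, 1.1, 9.9, 10.0, 10.1], 2))  # ≈ [1.0, 10.0]
```

Note that, unlike supervised learning, the data here carries no labels; the structure (two groups) is discovered by the algorithm itself.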
- Cocktail party problem
- One line of Octave: [W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
Notation:
- m = number of training examples
- x’s = “input” variable / features
- y’s = “output” variable / “target” variable
Idea: Choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y)
J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))^2, sometimes called the squared error cost function
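The squared error cost can be sketched in plain Python. The linear hypothesis hθ(x) = θ0 + θ1·x and the toy data are illustrative assumptions, not from the notes:

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) = (1/2m) * sum of (h(x) - y)^2,
    with the linear hypothesis h(x) = theta0 + theta1 * x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Toy data lying exactly on y = 2x: the cost is zero at theta0 = 0, theta1 = 2.
print(cost(0, 2, [1, 2, 3], [2, 4, 6]))  # 0.0
```

Any other choice of θ0, θ1 on this data gives a strictly positive cost, which is why minimizing J picks out the best-fitting line.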
For fixed θ1, hθ(x) is a function of x; J(θ1) is a function of the parameter θ1
⍺ is the learning rate. If ⍺ is too small, gradient descent can be slow; if ⍺ is too large, gradient descent can overshoot the minimum, and it may fail to converge or even diverge.
Simultaneous update: compute the update terms for all parameters from the current values first, then reassign them all at once.
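A minimal Python sketch of the correct simultaneous update; the derivative functions here are hypothetical placeholders, purely to show the order of operations:

```python
alpha = 0.1
theta0, theta1 = 1.0, 2.0

# Hypothetical partial derivatives of J, for illustration only.
def d_theta0(t0, t1):
    return t0 + t1

def d_theta1(t0, t1):
    return t0 * t1

# Correct: evaluate BOTH derivative terms with the old parameter values,
# then reassign the parameters together.
temp0 = theta0 - alpha * d_theta0(theta0, theta1)
temp1 = theta1 - alpha * d_theta1(theta0, theta1)
theta0, theta1 = temp0, temp1
print(theta0, theta1)  # theta0 ≈ 0.7, theta1 ≈ 1.8
```

The common bug is updating theta0 first and then computing d_theta1 from the already-updated theta0; that is a different (non-simultaneous) algorithm and not what the gradient descent formula specifies.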
The derivative term after ⍺ is the slope of the tangent line at the current point. For example, if the slope is a positive number, then θ1 := θ1 − ⍺ × (positive number) decreases θ1, moving it left toward the minimum, which is the right direction.
Gradient descent can converge to a local minimum even with the learning rate ⍺ fixed, because the derivative term gets smaller as we approach the minimum, so the steps automatically shrink. There is no need to decrease ⍺ over time.
Batch gradient descent: each step of gradient descent uses all the training examples.
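Putting the pieces together, here is a self-contained Python sketch of batch gradient descent for the linear hypothesis hθ(x) = θ0 + θ1·x; the toy data and hyperparameter values are illustrative assumptions:

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iters=5000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent.

    "Batch" means each step sums the error over ALL m training examples.
    """
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        # Simultaneous update: both parameters use the same old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# On data lying on y = 2x, the parameters converge to theta0 ≈ 0, theta1 ≈ 2.
t0, t1 = batch_gradient_descent([1, 2, 3], [2, 4, 6])
print(t0, t1)
```

Because every step touches the full training set, each iteration costs O(m); for large data sets, variants that use subsets of the data exist, but the notes cover only the batch form.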
Linear algebra: notation and the set of operations you can perform on matrices and vectors.