The process of gradient descent in machine learning

Optimization is a very important part of machine learning: at the core of almost every machine learning algorithm is an optimization problem.

In this article, I will introduce one of the most commonly used and easiest-to-master optimization algorithms in the field of machine learning. After reading this article you will understand:

• What is a gradient descent method?

• How to apply the gradient descent method to a linear regression model?

• How to use gradient descent to process large-scale data?

• Some tips for gradient descent

Let's start!

Gradient descent

The gradient descent method is an optimization algorithm for finding the parameter values that minimize a cost function. When we cannot find the optimal solution of a function analytically (for example, through linear algebra), we can use the gradient descent method instead.

Intuition for the Gradient Descent Method

Imagine a large bowl, like one you might eat cereal from or store fruit in. The shape of the cost function is similar to the shape of this bowl.


Any random position on the bowl's surface represents the cost value for the current coefficients, and the bottom of the bowl represents the cost value for the optimal set of coefficients. The goal of the gradient descent method is to keep trying different coefficient values, evaluate the cost function for each, and select the values that reduce the cost. Repeating these steps until convergence yields the optimal solution, i.e., the coefficients with the minimum cost function value.

The gradient descent procedure

The gradient descent method starts from an initial parameter value. Normally we set the initial value to zero (coefficient = 0). We then calculate the cost function, cost = f(coefficient), or cost = evaluate(f(coefficient)), compute the derivative of the function (a concept from calculus that gives the slope of the function at a given point), and set a value for the learning rate parameter (alpha). Calling the derivative value delta, the coefficient is then updated as:

coefficient = coefficient − (alpha * delta)

The above process is repeated until the parameter values converge, at which point we have obtained the optimal solution of the function.

You can see how simple the gradient descent method is: you only need to know the gradient of the cost function, or of whatever function you want to optimize. Next I will introduce how to apply the gradient descent method in the field of machine learning.
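To make the procedure concrete, here is a minimal Python sketch (not from the original article) that minimizes the toy cost function f(coefficient) = coefficient². The starting value, learning rate, and iteration count are all illustrative choices; note that for this particular function the derivative at zero is zero, so the sketch starts slightly away from the minimum:

```python
# Minimal gradient descent on the toy cost f(coefficient) = coefficient^2.
# All values here (start point, alpha, iteration count) are illustrative.

def f(coefficient):
    return coefficient ** 2          # cost = f(coefficient)

def derivative(coefficient):
    return 2.0 * coefficient         # slope of f at this coefficient

coefficient = 0.5  # start near, but not at, the minimum (f'(0) = 0 would stall)
alpha = 0.1        # learning rate

for i in range(20):
    delta = derivative(coefficient)
    coefficient = coefficient - alpha * delta  # coefficient = coefficient - (alpha * delta)
    print(f"iteration {i:2d}: coefficient={coefficient:.5f}, cost={f(coefficient):.5f}")
```

Each iteration multiplies the coefficient by (1 − 2·alpha) = 0.8 here, so the cost shrinks steadily toward the minimum at zero.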

Batch Gradient Descent

The goal of all supervised machine learning algorithms is to use known independent variables (X) to predict the value of a dependent variable (Y). All classification and regression models deal with this problem.

Machine learning algorithms use an objective function to characterize how well the model fits the data. Although different algorithms represent the objective function differently and arrive at different coefficient values, they share a common goal: to obtain the optimal parameter values by optimizing the objective function.

Linear regression and logistic regression are classic cases of using the gradient descent method to find optimal parameter values.

We can use a variety of measures to assess how well a machine learning model fits the data. A cost function measures the fit by computing the degree of difference between the predicted and true values over the training set, for example the sum of squared residuals.

We can calculate the derivative of the cost function with respect to each parameter, and then iterate using the update equation above.

At each step of the gradient descent method, we need to compute the cost function and its derivatives over the entire training set. Each such pass is called a batch, so this form of gradient descent is also called the batch gradient descent method.

Batch gradient descent is a common form of gradient descent in the field of machine learning.
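As an illustration, here is a hedged sketch of batch gradient descent for a simple linear regression model y = b0 + b1·x, using the sum of squared residuals as the cost. The toy dataset and hyperparameters are made up for this example and are not from the article:

```python
# Batch gradient descent for simple linear regression (y_hat = b0 + b1 * x).
# The dataset, learning rate, and epoch count are illustrative choices.

X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [1.2, 1.9, 3.1, 4.2, 4.8]
n = len(X)

b0, b1 = 0.0, 0.0   # initialize coefficients to zero, as described above
alpha = 0.01        # learning rate
n_epochs = 1000

for epoch in range(n_epochs):
    # One "batch": gradients are accumulated over the entire training set
    # before any coefficient is updated.
    grad_b0 = sum((b0 + b1 * x - y) for x, y in zip(X, Y)) * 2.0 / n
    grad_b1 = sum((b0 + b1 * x - y) * x for x, y in zip(X, Y)) * 2.0 / n
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1

sse = sum((b0 + b1 * x - y) ** 2 for x, y in zip(X, Y))
print(f"b0={b0:.3f}, b1={b1:.3f}, sum of squared residuals={sse:.4f}")
```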

Stochastic Gradient Descent

Gradient descent can be very inefficient on large-scale data, because each iteration requires computing predictions for the entire training set, which takes a long time when the amount of data is very large. In that case you can use the stochastic gradient descent method to improve computational efficiency. This algorithm differs from the gradient descent method described above in that it performs a coefficient update for each individual training sample, rather than waiting until the whole batch of samples has been processed.

The first step of the stochastic gradient descent method is to randomly shuffle the samples of the training set, which randomizes the order of the coefficient updates. Because the coefficients are updated after every training instance, both the coefficient values and the cost will jump around randomly. By shuffling the order of the updates, we can use this random-walk property to avoid the problem of the model failing to converge.

Apart from when the cost is evaluated, the coefficient update of stochastic gradient descent is exactly the same as in the gradient descent method described above. For large-scale data, stochastic gradient descent often converges noticeably faster; usually only a small number of passes through the data are needed to obtain a reasonably good set of parameters.
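The sketch below reworks the linear regression example from the previous section into stochastic form: the training set is shuffled every epoch and the coefficients are updated after each individual sample. Again, the data and hyperparameters are illustrative:

```python
import random

# Stochastic gradient descent for the same toy linear regression problem.
# Coefficients are updated after every training instance, not per batch.

data = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 4.2), (5.0, 4.8)]
b0, b1 = 0.0, 0.0
alpha = 0.01

for epoch in range(100):
    random.shuffle(data)              # randomize the order of the updates
    for x, y in data:
        error = (b0 + b1 * x) - y     # prediction error for one sample
        b0 -= alpha * error           # update immediately, per instance
        b1 -= alpha * error * x

print(f"b0={b0:.3f}, b1={b1:.3f}")
```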

Some suggestions for gradient descent

This section lists several techniques that can help you make better use of the gradient descent algorithm in machine learning:

• Plot the cost function over time: Collect and plot the cost value obtained at each iteration. For a well-behaved gradient descent run, the cost should decrease at every iteration. If the cost is not decreasing, try reducing the learning rate.

• Learning rate: The learning rate in the gradient descent algorithm is typically a small value such as 0.1, 0.001, or 0.0001. You can try different values and choose the one that works best.

• Standardize inputs: Gradient descent converges more quickly when the cost function is not skewed or distorted. Accordingly, you can standardize the input variables in advance, for example by rescaling them to the same range.

• Plot the average cost trend: Stochastic gradient descent updates usually introduce some random noise, so consider measuring the convergence of the algorithm by observing the average cost over the most recent 10, 100, or 1000 updates, as in the sketch after this list.
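As a small sketch of that last tip, the helper below averages per-update costs over fixed windows. The `costs` list stands in for whatever per-update cost values your training loop records, and the window size and fabricated data are illustrative choices:

```python
import random

# Average noisy per-update costs over fixed-size windows so the overall
# convergence trend is visible despite per-update noise.

def windowed_averages(costs, window=100):
    return [sum(costs[i:i + window]) / len(costs[i:i + window])
            for i in range(0, len(costs), window)]

# Fabricated example: a decaying cost curve with added noise.
costs = [1.0 / (1 + 0.01 * i) + random.gauss(0.0, 0.05) for i in range(1000)]
print(windowed_averages(costs, window=100))
```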

Summary

This article mainly introduces the gradient descent method in machine learning. After reading this article, you learned that:

• Optimization theory is a very important part of machine learning.

• Gradient descent is a simple optimization algorithm that you can apply to many machine learning algorithms.

• The batch gradient descent method computes the derivatives over the entire training set before performing each parameter update.

• Stochastic gradient descent computes the derivative from each training instance and performs a parameter update immediately.
