Introduction: Machine learning is a multidisciplinary field that has been developing for more than 20 years, drawing on probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and more. Machine learning theory is mainly concerned with designing and analyzing algorithms that allow computers to "learn" automatically. A machine learning algorithm automatically discovers regularities in data and uses those regularities to make predictions about unseen data. Because learning algorithms rely heavily on statistical theory, machine learning is especially closely tied to statistical inference, and is also known as statistical learning theory. On the algorithm-design side, machine learning theory focuses on learning algorithms that are achievable and efficient. Many inference problems are intractable to solve exactly, so part of machine learning research is devoted to developing tractable approximation algorithms.
Machine learning has been widely applied in data mining, computer vision, natural language processing, biometrics, search engines, medical diagnosis, credit card fraud detection, stock market analysis, DNA sequencing, speech and handwriting recognition, strategy games, and robotics.
What is machine learning?
Machine learning is a method of data analysis that automates the building of analytical models. Using iterative learning algorithms, machine learning lets computers find hidden insights without being explicitly programmed where to look.
Iteration is central to machine learning: because models learn iteratively, they can adapt independently when exposed to new data, learning from previous computations to produce reliable, repeatable decisions and results. Machine learning is not a new discipline - it is a discipline that has gained fresh momentum.
Because of new computing technologies, machine learning today is very different from the machine learning of the past. Although many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data - over and over, faster and faster - is a recent development. Here are a few widely publicized examples of machine learning applications you may be familiar with:
· The heavily hyped Google self-driving car? The essence of machine learning.
· Online recommendation services such as Amazon and Netflix? Machine learning applied to everyday life.
· Knowing what customers are saying about you on Twitter? Machine learning combined with linguistic rules.
· Fraud detection? One of the more obvious and important uses in our lives today.
Why are more and more people interested in machine learning?
The resurgence of interest in machine learning is due to the same factors that have made data mining and Bayesian analysis more popular than ever: growing volumes and varieties of available data, computing that is cheaper and more powerful, and data storage that is more affordable.
All of these factors mean that machine learning can quickly and automatically produce models that analyze bigger, more complex data and deliver faster, more accurate results - even at a very large scale. The result? High-value predictions that can guide better decisions and smarter actions in the real world, without human intervention.
Building models automatically is key to producing smart actions in the real world. Analytics thought leader Thomas H. Davenport wrote in The Wall Street Journal that with ever-changing, ever-growing data, "... you need fast-moving modeling streams to keep up", and that you can do this with machine learning. He also said that "humans can typically create one or two good models a week; machine learning can create thousands of models a week."
What is the current application of machine learning?
Have you ever wondered how an online retailer can instantly show you offers for products you might be interested in? Or how a lender can respond to your loan request in near real time? Many of our daily activities are powered by machine learning algorithms.
What are the most popular learning methods in machine learning?
The two most widely adopted machine learning methods are supervised learning and unsupervised learning. Most machine learning (about 70 percent) is supervised learning; unsupervised learning accounts for roughly 10 to 20 percent. Semi-supervised learning and reinforcement learning are two other techniques that are sometimes used.
· Supervised learning algorithms are trained on labeled examples, that is, inputs whose desired outputs are known. For example, a piece of equipment could have data points labeled either "F" (failed) or "R" (runs). The learning algorithm receives a set of inputs along with the corresponding correct outputs, and it learns by comparing its actual output with the correct outputs to find errors and modify the model accordingly. Through methods such as classification, regression, prediction, and gradient boosting, supervised learning uses patterns to predict the labels of additional unlabeled data. Supervised learning is commonly used in applications where historical data predicts likely future events. For example, it can anticipate when a credit card transaction is likely to be fraudulent, or which insurance customer is likely to file a claim.
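The "F"/"R" equipment example above can be sketched in a few lines. This is a minimal supervised learner (a 1-nearest-neighbour classifier, one of the simplest possible choices) trained on made-up sensor readings; the readings and the single temperature-like feature are illustrative assumptions, not from any real device.

```python
# Minimal supervised learning: 1-nearest-neighbour classification of
# readings labelled "R" (runs) or "F" (failed). Illustrative data only.

def nearest_neighbour(train, query):
    """Predict the label of `query` from labelled (value, label) pairs
    by copying the label of the closest training value."""
    value, label = min(train, key=lambda pair: abs(pair[0] - query))
    return label

# Labelled training instances: (sensor reading, status label)
train = [(20.0, "R"), (22.5, "R"), (25.1, "R"), (70.3, "F"), (75.8, "F")]

print(nearest_neighbour(train, 23.0))  # a reading near the "R" examples
print(nearest_neighbour(train, 72.0))  # a reading near the "F" examples
```

The point is the workflow, not the algorithm: known correct outputs supervise the model, which then labels new, unlabeled readings.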
· Unsupervised learning, by contrast, is used on data that has no historical labels. The system is not told the "right answer"; the algorithm must work out what is being shown. The goal is to explore the data and find some structure within it. Unsupervised learning works well on transactional data. For example, it can identify segments of customers with similar attributes who can then be treated alike in marketing campaigns, or it can find the main attributes that separate customer segments from one another. Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering, and singular value decomposition. These algorithms are also used to segment text topics, recommend items, and identify data outliers.
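To make the customer-segmentation example concrete, here is a minimal sketch of k-means clustering (one of the techniques named above) on one-dimensional customer-spend figures. The data, the choice of k = 2, and the single attribute are illustrative assumptions; real uses would cluster many attributes at once.

```python
# Minimal k-means (k = 2) on 1-D customer-spend data: no labels are
# given, the algorithm discovers the two segments on its own.
import random

def kmeans_1d(points, k=2, iters=10, seed=0):
    """Cluster 1-D points into k groups; return the sorted centroids."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: abs(p - centroids[j]))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

spend = [10.0, 12.0, 11.0, 90.0, 95.0, 93.0]   # two hidden segments
print(kmeans_1d(spend))                         # centroids of the segments
```

No "correct answer" is supplied anywhere; the alternating assign/update steps are what finds the internal structure.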
· Semi-supervised learning is used for the same applications as supervised learning, but it uses both labeled and unlabeled data for training, typically a small amount of labeled data with a large amount of unlabeled data (because unlabeled data is less expensive and takes less effort to acquire). This type of learning can use methods such as classification, regression, and prediction. Semi-supervised learning is useful when the cost of fully labeling a training set is too high. Early examples include identifying a person's face on a webcam.
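One common semi-supervised recipe is self-training: fit a model on the scarce labeled data, use it to pseudo-label the cheap unlabeled data, and fold those pseudo-labels back into the training set. The sketch below applies that recipe to the nearest-neighbour idea; the data, the "cat"/"dog" labels, and the single self-training pass are all illustrative assumptions.

```python
# Minimal self-training: a tiny labelled set seeds the model, which then
# pseudo-labels the plentiful unlabelled points and learns from them too.

def predict(labelled, x):
    """1-nearest-neighbour prediction from (value, label) pairs."""
    return min(labelled, key=lambda pair: abs(pair[0] - x))[1]

labelled = [(1.0, "cat"), (9.0, "dog")]     # scarce hand-labelled data
unlabelled = [1.5, 2.0, 8.0, 8.5, 9.5]      # cheap unlabelled data

# Pseudo-label the unlabelled pool with the current model, then treat
# those pseudo-labels as extra training data.
labelled += [(x, predict(labelled, x)) for x in unlabelled]

print(predict(labelled, 2.4))   # now backed by pseudo-labelled neighbours
```

The two hand-labeled points do the supervising; the five unlabeled points densify the decision boundary at almost no labeling cost.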
· Reinforcement learning is often used for robotics, gaming, and navigation. With reinforcement learning, the algorithm discovers through trial and error which actions yield the greatest rewards. This type of learning has three primary components: the agent (the learner or decision maker), the environment (everything the agent interacts with), and actions (what the agent can do). The objective is for the agent to choose actions that maximize the expected reward over a given period of time. Following a good policy lets the agent reach the goal much faster, so the goal of reinforcement learning is to learn the best policy.
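The agent/environment/action/reward loop described above can be sketched with tabular Q-learning, a standard reinforcement-learning algorithm (the article does not name a specific one; this is my choice). The 5-cell corridor, the reward scheme, and the hyperparameters are all illustrative assumptions.

```python
# Toy reinforcement learning: Q-learning on a 5-cell corridor. The agent
# starts in cell 0; the environment pays reward 1 only for stepping right
# out of cell 4, which ends the episode. Trial and error plus the Q-update
# propagate that reward back until "always go right" emerges as the policy.
import random

N_STATES, ACTIONS = 5, ("left", "right")
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration

def step(state, action):
    """Environment dynamics: deterministic moves, reward for exiting right."""
    if action == "right":
        if state == N_STATES - 1:
            return None, 1.0             # terminal: goal reached
        return state + 1, 0.0
    return max(state - 1, 0), 0.0

def greedy(q, s):
    """Best-known action in state s, breaking ties at random."""
    best = max(q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(s, a)] == best])

def train(episodes=500, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(100):             # cap episode length
            a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(q, s)
            s2, r = step(s, a)
            # Q-update: move Q(s, a) toward the reward plus the discounted
            # value of the best action available in the next state.
            target = r if s2 is None else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            if s2 is None:
                break
            s = s2
    return q

q = train()
policy = [greedy(q, s) for s in range(N_STATES)]
print(policy)   # the learned policy, expected to settle on "right" everywhere
```

All three components from the text appear explicitly: `train` is the agent, `step` is the environment, and `ACTIONS` is what the agent can do; the learned `policy` is the "best strategy" the paragraph refers to.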
What is the difference between data mining, machine learning and deep learning?
How machine learning differs from other statistical and learning approaches, such as data mining, is another hot topic of debate. In simple terms, while machine learning uses many of the same algorithms and techniques as data mining, one difference lies in what the two disciplines predict:
· Data mining is the discovery of previously unknown patterns and knowledge.
· Machine learning reproduces known patterns and knowledge, automatically applies them to other data, and then automatically applies the results to decision making and actions.
The increasing power of computers has also stimulated the evolution of data mining into machine learning. For example, neural networks have long been used in data mining applications. As computing power has grown, it has become possible to build neural networks with many layers, known in machine learning parlance as "deep neural networks." It is this increase in computing power that lets automated learning work through many neural network layers quickly.
To go a step further, artificial neural networks (ANNs) are simply sets of algorithms loosely modeled on our understanding of the brain. ANNs can, in theory, model any kind of relationship in a data set, but getting reliable results out of a neural network is very tricky in practice. Research in artificial intelligence dates back to the 1950s and has been marked by the successes and failures of neural networks.
Today, a new area of neural network research called "deep learning" is achieving great success in many areas where past artificial intelligence approaches failed.
Deep learning combines advances in computing power with special types of neural networks to learn complicated patterns in large amounts of data. Deep learning techniques are currently state of the art for tasks such as recognizing objects in images and words in sounds. Researchers are now looking to extend these pattern-recognition successes to more complex tasks such as automatic language translation, medical diagnosis, and many other important social and business problems.
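To show concretely what "many layers" buys, here is a minimal feed-forward network: each layer is a linear map followed by a ReLU nonlinearity, and stacking layers lets it compute XOR, a function no single linear layer can represent. The hand-set weights are a well-known illustrative construction, not learned from data as a real network's would be.

```python
# Minimal multi-layer ("deep") feed-forward network in pure Python.
# With hand-chosen weights it computes XOR, which a single linear
# layer provably cannot, illustrating why layered networks matter.

def relu(v):
    return [max(0.0, x) for x in v]

def linear(weights, bias, v):
    """One layer: matrix-vector product plus bias."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def forward(layers, v):
    """Run the input through every (weights, bias) layer in turn."""
    for weights, bias in layers[:-1]:
        v = relu(linear(weights, bias, v))
    weights, bias = layers[-1]
    return linear(weights, bias, v)       # no ReLU on the output layer

layers = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),   # hidden layer
    ([[1.0, -2.0]], [0.0]),                    # output layer
]
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), forward(layers, [a, b])[0])  # 0.0, 1.0, 1.0, 0.0: XOR
```

Real deep learning stacks many more such layers and learns the weights by gradient descent on large data sets; the structure, however, is exactly this repeated linear-then-nonlinear composition.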
Machine Learning Algorithms and Processes
Algorithms
SAS graphical user interfaces help you build machine learning models and implement an iterative machine learning process. You don't have to be a senior statistician. A comprehensive selection of machine learning algorithms, included in many SAS products, can help you quickly get value from your big data. SAS machine learning algorithms include:
Tools and Processes
As we now know, it is not just about the algorithms. Ultimately, the secret to getting the most value from your big data lies in pairing the best algorithms with the task at hand:
SAS machine learning experience and expertise
SAS continually seeks out and evaluates new methods and has a long history of implementing statistical methods to solve the problems you face most. SAS combines the rich, sophisticated heritage of statistics and data mining with new, state-of-the-art architectures to ensure your models run as fast as possible, even in huge enterprise environments.
We understand that fast time to value means not only fast, automated model performance but also minimizing the time it takes to move data between platforms, especially with big data. High-performance, distributed analytics techniques take advantage of massively parallel processing and pair Hadoop with all major data platforms, so you can cycle through all the steps of the modeling process quickly, without moving the data.
Via:SAS
PS: This article was compiled by Lei Feng Network (search the "Lei Feng Network" public account) and may not be reproduced without permission.