Baidu's Wu Enda: How to Use GPUs to Build Artificial Intelligence "Rockets"

Editor's note: This article is compiled from the talk given by Baidu's chief scientist, Wu Enda, at GTC China 2016. At this year's GTC China, Wu Enda's talk covered what exactly a neural network is and why GPUs are well suited to neural network training.

Hello everyone. People are now saying that artificial intelligence is the new electricity. Electricity changed many different industries, and I think artificial intelligence will bring equally big changes to many industries. Today most artificial intelligence computation relies on the GPU. I would like to share with you why that is, and how artificial intelligence may affect your work.

Those of us in the field of artificial intelligence are very fortunate. Early on, when it was not yet clear how important deep learning would be, Huang Renxun did a lot of work to build a GPU platform for artificial intelligence, which has enabled Baidu and other companies to achieve a great deal.

Two weeks ago, Baidu released a number of technologies as services for everyone. Many of them use machine learning and deep learning, especially GPU-based learning. So what exactly is deep learning? We tend to compare it to the neurons in the brain, but I would like to go into somewhat deeper technical questions: What exactly is a neural network? And why do we think GPUs are so well suited to training these neural networks?

About 99% of the work in deep learning comes down to multiplying matrices together, or multiplying matrices by vectors. GPUs have been very efficient at matrix multiplication from day one, so the entire field uses the GPU platform to do its work.
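To make the matrix-multiplication point concrete, here is a minimal sketch of one neural network layer written as a matrix-vector product. The weights, bias, and input are made-up numbers, and the function names are hypothetical. Plain Python is used only so the arithmetic is visible; a real system would run the same multiplication on a GPU.

```python
# Hypothetical toy example: one fully connected layer computed as a
# matrix-vector product followed by a ReLU nonlinearity.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def relu(vector):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in vector]

def dense_layer(weights, bias, inputs):
    """Compute relu(W x + b) for one layer."""
    pre_activation = [z + b for z, b in zip(matvec(weights, inputs), bias)]
    return relu(pre_activation)

# Made-up parameters: 3 inputs, 2 outputs.
W = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5]]
b = [0.0, -1.0]
x = [2.0, 1.0, 3.0]

hidden = dense_layer(W, b, x)
print(hidden)
```

A deep network stacks many such layers, which is why nearly all of the computing time goes into products like `matvec`.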

At present, virtually all of the economic value of AI comes from one type of model: supervised learning.

What is supervised learning? It means learning a mapping from an input to an output. Suppose you want to train a face recognition system. First you need some data, for example face images with their labels. We then train a neural network, which performs many matrix multiplications, on that data; this is how we do face recognition. Much of the economic value of deep learning lies in finding clever ways to apply supervised learning. There are many more examples, such as identifying spam. Or, if you have enough data, as Baidu does with its large volume of user and advertising information, you can train a model to predict whether a user will click on an ad. Finding a clever place to apply a supervised learning model is what creates a lot of economic value. We are also doing a lot of basic research, on supervised learning as well as reinforcement learning and many other kinds of learning, and I hope that in the coming years these other areas will develop too. But the supervised learning formula alone is enough to bring a lot of change to your current work.
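As a rough illustration of the input-to-output idea, the sketch below trains a tiny spam classifier from labeled examples. Everything in it is an assumption made for illustration: the two features, the six made-up emails, and the choice of logistic regression trained by gradient descent. It is not Baidu's system, only the smallest supervised learner that shows the pattern.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Estimated probability that the example belongs to class 1 (spam)."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)

def train(examples, labels, learning_rate=0.1, epochs=1000):
    """Fit logistic regression with per-example gradient descent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            error = predict_proba(weights, bias, features) - label
            weights = [w - learning_rate * error * f
                       for w, f in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

# Made-up inputs: [count of suspicious words, count of links].
emails = [[3.0, 2.0], [4.0, 1.0], [0.0, 0.0],
          [1.0, 0.0], [5.0, 3.0], [0.0, 1.0]]
# Made-up outputs (labels): 1 = spam, 0 = not spam.
is_spam = [1, 1, 0, 0, 1, 0]

weights, bias = train(emails, is_spam)
predictions = [1 if predict_proba(weights, bias, e) >= 0.5 else 0
               for e in emails]
print(predictions)
```

The same pattern applies to the other examples in the talk: swap the emails for face images or ad impressions, and the labels for identities or clicks.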

All the details of the technologies I have talked about already existed 20 years ago. So why is deep learning only really taking off now?

There are several major trends. The first reason deep learning has taken off only in recent years is scale. I like to use an analogy: building artificial intelligence is like building a rocket. What is a rocket? To build one, you first need a big engine, and then you need a lot of rocket fuel. Both have to be big. If the engine is very large but there is very little fuel, the rocket cannot fly far. If the engine is very small and there is a lot of fuel, the rocket may not be able to take off at all. Only with a very large engine and a great deal of fuel can you build a very good rocket. The neural network is like the rocket engine: we can build neural networks at this scale today because of the development of the GPU. The rocket fuel is the data that many Internet companies collect today.

Algorithmic innovation is also very important. But for us, the foundation is first to build a very good network and then to have enough data. The trend I have seen over the past few years is scale. About ten years ago we did deep learning on ordinary CPUs. At that time networks had roughly 1 million connections, and progress was very slow. In 2008 we wrote the first paper on training neural networks with CUDA. That work, done at Stanford University, brought a 10-fold change.

In 2011 I led a team at Google, and we used CPU computing to scale further, using a large number of CPUs. But we soon realized that cloud computing with many CPUs did not really advance deep learning. At Stanford, and later at Baidu, we realized that we should use HPC (high-performance computing). The recent use of supercomputers has further advanced deep learning algorithms, so the most advanced deep learning systems have begun to use high-performance computing. Training one speech recognition model takes 20 billion computations. We need to spend $1 million to train a model. One of our researchers needs to spend $100 on a model, and it needs 4 trillion bytes of data.

Baidu is the first company in the world to have built a GPU cluster for deep learning. We use it not only for training but also in production. We invested early because we believed that GPUs could help us lead in this area and advance the development of AI capabilities.

Next, I would like to share an example with everyone to explain why deep learning has changed many Baidu AI applications.

Earlier speech recognition systems were divided into many stages: take an audio input, extract features from the audio, obtain the phonemes, apply a language model, and then produce the transcription. In 2011, when we built a speech recognition system at Baidu, the field had spent several decades doing speech recognition this way, but we replaced the entire pipeline with a neural network. We found that a large neural network, the equivalent of a big rocket engine, trained end to end gives us the best speech recognition system.

Last month, working with Stanford University and the University of Washington, we found that entering text on a mobile phone with speech recognition can be 3 times faster than using a keyboard. These results rely on our Deep Speech system.

Earlier we talked about the importance of scale, both of computation and of data, for training these deep learning systems. Here I would like to introduce a simple recipe for improving the performance of a machine learning system. It is a bit oversimplified, but when my team asks me how to improve their machine learning system, this is the first thing I tell them.

First, I ask them: are you doing well on the training data? If not, I tell them the neural network needs to be larger, that is, the rocket engine needs to be stronger and bigger. Keep improving in that direction until you perform well on the training data. After that, ask how well you perform on the test data. If the answer is not well, I tell them they need more data, that is, more rocket fuel. Keep improving in that direction until you perform well on the test data. This is a very simple formula, and the real world is more complicated, but even so simple a method is of great help. It has helped us improve the performance of our systems, and I believe it can help improve the performance of your machine learning systems as well.
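The recipe can be written down as a small decision function. This is only a sketch: the function name, the error values, and the single `target_error` threshold are hypothetical, and the talk itself notes that real projects are more complicated.

```python
def next_step(train_error, test_error, target_error):
    """Return the next action suggested by the simple recipe.

    All three arguments are error rates between 0 and 1. The threshold
    target_error is a made-up stand-in for "performing well".
    """
    if train_error > target_error:
        # Not fitting the training data yet: build a bigger engine.
        return "bigger model"
    if test_error > target_error:
        # Fits the training data but not the test data: add more fuel.
        return "more data"
    return "done"

print(next_step(0.20, 0.25, 0.05))
print(next_step(0.03, 0.15, 0.05))
print(next_step(0.03, 0.04, 0.05))
```

The order of the two checks matters: the training-data question is always asked first, and the test-data question only once the training error is acceptable.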

Over the past few years, many performance improvements have come from increases in the scale of computation and data, and the increase in the scale of computation comes from the emergence of GPU computing. The reality is much more complicated than this. If you want to understand in detail how to improve the performance of machine learning, you can refer to a book I have written, which is available for free from this site.

I have talked about using GPUs for training, and I have seen how much this helps Baidu's work and the work of many other companies. Another trend is using GPUs not only for training but also for online serving. Once we had trained a huge neural network with HPC, we ran into a problem: how do we put such a large neural network on a server to provide online services?

Consider the traditional framework for online serving, the CPU server architecture. A CPU server runs several threads. When a user sends some data, say a 4-by-1 vector of voice data, it is handed to one thread, which computes the output. When a second user arrives, a second thread does the computation for that user, and likewise for the third and fourth. This is how a traditional CPU architecture provides online services. Because we train very large neural networks on supercomputers with many GPUs, we found it very difficult to deploy these very large models on traditional CPUs; the architecture is not a good fit.

Baidu was the first large company to announce that it was putting GPUs into production for inference, that is, for serving, and not just for training.

We have a dedicated technique called Batch Dispatch. Data arrives at our data center. When a user shows up with some input data, we make that data wait briefly. Once several users have arrived, each with their own data, we group them into a batch. We stack their vectors, the first, second, third, and fourth, into a matrix, which becomes a 4-by-4 matrix, and hand it to the GPU. The GPU processes the data of all four users at the same time, and the results come out together. The GPU has very strong parallel processing capability and can do this very efficiently. We then take the result, split it apart, and return each part to its user.
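The stacking step can be sketched in a few lines. This is an illustration of the data layout only, not Baidu's Batch Dispatch implementation: the weight matrix and the four request vectors are made up, the function names are hypothetical, the logic that waits for requests to arrive is left out, and plain Python stands in for the GPU's matrix multiply.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a]

def serve_one_by_one(weights, requests):
    """Traditional serving: one matrix-vector product per user."""
    return [matvec(weights, request) for request in requests]

def serve_batched(weights, requests):
    """Batched serving: stack the users' vectors as the columns of one
    matrix, do a single matrix-matrix product, then split the result
    back into one output vector per user."""
    batch = [list(row) for row in zip(*requests)]   # column j = user j
    outputs = matmul(weights, batch)                # one big multiply
    return [list(column) for column in zip(*outputs)]

# Made-up model: maps a 4-element input to a 2-element output.
W = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, -1.0]]

# Four users, each sending a 4-by-1 vector; stacked, they form a 4-by-4 matrix.
requests = [[1.0, 2.0, 3.0, 4.0],
            [0.0, 1.0, 0.0, 1.0],
            [2.0, 2.0, 2.0, 2.0],
            [5.0, 0.0, 1.0, 0.0]]

batched = serve_batched(W, requests)
print(batched)
```

Both serving functions return the same numbers. The difference is that the batched version reaches the hardware as a single large matrix multiplication, which is the kind of operation a GPU executes efficiently.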

We have found that this lets us run larger models and serve more users at lower cost. Yesterday I was with the people in charge of Baidu's data centers, and we see a trend at Baidu: we are using more and more GPUs and HPC in our data centers. As a result, our teams are redesigning the data centers. To make better use of high-density computing, we have teams redesigning the power supply and cooling so that we can fit higher-density compute into our data centers for both training and inference. Some of you may work in data centers. There is a lot of work to be done here in redesigning the data center architecture to use these high-density GPUs.

The first trend in deep learning that I talked about was scale, of both computation and data. The second trend I have seen over the past few years is that deep learning can now produce more complex outputs. What I mean is that most machine learning five years ago output just an integer. For example, the input is an email and the output is 0 or 1: spam or not spam. Images were the same; the output was an integer. That has now changed, and more and more deep learning systems can output very complex results, such as a sentence or an image. Our Deep Speech system takes an audio clip as input and can output an English or Chinese sentence. As the picture shows, we can also input an image and output a caption describing it, such as "a yellow car driving on the road." So neural networks can now output complex things like sentences and image captions, not just integers. The same goes for translation: you can input a sentence in English and output a sentence in Chinese. You can also correct grammar, inputting grammatically incorrect text and outputting grammatically correct sentences. This important trend can also be applied cleverly to get even more value from AI and deep learning.

Of course, we also know that the main limitation of AI today lies in this way of learning: supervised learning requires a lot of labeled data. I hope we can make breakthroughs in unsupervised learning in the future, but for now supervised learning can already transform many industries and drive great progress.

We just mentioned that scale is very important: we need to use a lot of data to train big models. But there is another reason why AI needs so much computation.

Let's look at a simple example of a neural network. It takes a lot of time and a lot of experiments to discover the right structure for a neural network. I have been working in this field for perhaps 25 years, and when I start on a new problem, I still do not know what kind of network will be suitable. Researchers need to run many experiments, dozens or even hundreds of models, before they find a good model for the task. And there is a great deal of training data: a speech recognition system has 50,000 hours of data, so a single training run might take 3 months, which is not a good use of researchers' time. This is another reason Baidu puts so much effort into optimizing developer efficiency. When you are building a model, you do not know in advance exactly what kind of model you need; you have to run many experiments to find out what works. We have found that investing in computing systems speeds up this process of experimentation and trial and error and makes researchers more efficient, giving them more time to invent new ideas.

For this reason, we emphasize two things. First, we invest in GPU-based HPC computing platforms. Second, we invest heavily in developing easy-to-use deep learning tools. We have open-sourced our own deep learning platform, called PaddlePaddle. It is easy to use, so anyone can readily try out deep learning models and find out which model best suits their application. PaddlePaddle supports multiple GPUs. We no longer run computations on just one GPU; we can run experiments on 32, 64, or 128 GPUs at a time.

I have high hopes for the future of AI and full confidence in it. I hope that in a few years we can use artificial intelligence for companion robots, personalized tutoring, music composition, and robot doctors. These products and technologies can bring immense change to many industries and great value to humanity. Many of these projects are still in the research phase. But in the era of artificial intelligence, the future we are talking about will arrive soon.

I would like to show you an example. We are working on a project called Baidu Medical Brain, which is still in research. Please take a look at this video. Suppose you enter a question such as "my baby has a fever and a lot of rash." The Baidu Medical Brain software understands your question and asks you a number of follow-up questions. As you answer them one by one, it can work out what your condition might be, and it can also output some information and suggestions about your condition. Of course, this software is not a substitute for a doctor. A patient who wants to use this information should first discuss it with a doctor. The technology is still in the research stage, and I hope that in the future it can bring a lot of useful information to patients and doctors.

I think we are very fortunate to have such a good GPU platform and to be able to develop many AI applications on it. I am very excited about the AI tools being developed at Baidu, which help not only us but many other industries as well. At Baidu, we hope to develop AI tools based on our hardware to help everyone. Thank you all!
