Advice from top AI researchers: avoid these four pitfalls!

Machine learning is so popular that it is practically treated as a synonym for AI, and deep learning even more so. Happily, your startup has just been funded, or your team's budget has just been approved. Now you are about to dive into deep learning.

You have already had fun with AI tools and resources such as Keras and ImageNet. This is very exciting! However, when you actually start applying AI in a business, there are several things you must consider.

Below, I illustrate my suggestions with a few stories from my time working on self-driving cars at comma.ai with George Hotz earlier this year.

Help, AI! Where do I go from here?

1. Don't keep your engineers out of touch with the data

Deep learning is a data-first science. The whole point of your team or startup is to make sense of data. Think about it: only once you can make sense of text can you build your AI Bitcoin chatbot! Only by understanding the content of images and videos can you build the next automated, Snapchat Stories-style multimedia collage.

You should treat data processing as a core part of your work and do it well. For example, if you think it "only takes 15 minutes" to prepare and load the dataset, remember that you will wait those 15 minutes again every time you find a better model architecture or fix a bug in your TensorFlow code.

The rule is simple: version your dataset, preprocess it once, and reuse it. Tools such as Celery and Luigi can help.
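A minimal sketch of "version once, preprocess once" using only the Python standard library, assuming the raw data sits in local files. The cache key is a hash of the raw bytes plus a hand-maintained preprocessing version, so changing either one triggers a rebuild while every other run reuses the cached result. `PREPROCESS_VERSION`, `preprocess`, and `load_dataset` are hypothetical names. Tools like Luigi or Celery would manage this kind of pipeline at larger scale.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical version tag: bump it whenever the preprocessing code changes.
PREPROCESS_VERSION = "v1"


def dataset_fingerprint(raw_files, version=PREPROCESS_VERSION):
    """Hash file names and contents together with the preprocessing version."""
    h = hashlib.sha256(version.encode())
    for path in sorted(Path(p) for p in raw_files):
        h.update(path.name.encode())
        h.update(path.read_bytes())
    return h.hexdigest()[:16]


def preprocess(raw_files):
    """Hypothetical stand-in for the expensive step (decoding video, resizing...)."""
    paths = sorted(Path(p) for p in raw_files)
    return [len(path.read_bytes()) for path in paths]


def load_dataset(raw_files, cache_dir="cache"):
    """Return (data, cache_hit); preprocessing runs only once per dataset version."""
    key = dataset_fingerprint(raw_files)
    cache_path = Path(cache_dir) / f"dataset-{key}.pkl"
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f), True
    data = preprocess(raw_files)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(data, f)
    return data, False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "drive_0001.bin"
        raw.write_bytes(b"\x00" * 1024)
        print(load_dataset([raw], cache_dir=Path(tmp) / "cache")[1])  # False: first run computes
        print(load_dataset([raw], cache_dir=Path(tmp) / "cache")[1])  # True: second run reuses cache
```

For very large raw files, hashing every byte on each run is itself slow. Hashing file names, sizes, and modification times gives a cheaper, if less strict, key.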

If you work in a large team where all jobs are submitted to a cluster, consider a data-serving solution that streams batches to the workers training the models. Don't make team members wait for the entire dataset to load before they can find out that their model has a bug.
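The story that follows mentions a ZMQ-based server. As a single-process stand-in, this sketch swaps in a background thread and a bounded `queue.Queue` from Python's standard library. Batches reach the training loop as soon as each chunk is loaded, so the first training step no longer waits for the whole dataset. `load_chunk` is a hypothetical loader. A real cluster setup would put a network socket between the producer and the trainers instead of an in-process queue.

```python
import queue
import threading
import time

_SENTINEL = object()  # marks the end of the stream


def load_chunk(i):
    """Hypothetical loader: in practice this would read one video segment from disk."""
    time.sleep(0.01)  # simulate I/O latency
    return [i * 10 + k for k in range(4)]


def batch_producer(num_chunks, out_q):
    """Load chunks one at a time and hand each one over as a batch."""
    for i in range(num_chunks):
        out_q.put(load_chunk(i))  # blocks while the queue is full (backpressure)
    out_q.put(_SENTINEL)


def stream_batches(num_chunks, prefetch=2):
    """Yield batches as they are loaded instead of loading everything first."""
    q = queue.Queue(maxsize=prefetch)
    worker = threading.Thread(target=batch_producer, args=(num_chunks, q), daemon=True)
    worker.start()
    while True:
        batch = q.get()
        if batch is _SENTINEL:
            break
        yield batch
    worker.join()


if __name__ == "__main__":
    for step, batch in enumerate(stream_batches(num_chunks=5)):
        # A train_step(batch) call would go here; step 0 starts after one chunk loads.
        print(step, batch)
```

The bounded queue keeps the loader only a few batches ahead of training, so memory use stays small however large the dataset is.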

Story: comma.ai probably has the second- or third-largest driving dataset in the world. In comma.ai's early days, training the driving model meant loading hours of video onto big machines with more than 700 GB of RAM. Whenever George needed more training data, he would simply add another 100 GB of RAM. My main job when I joined was to develop a better version of this model, but I didn't want to wait 15 minutes for the data to load. Instead, I used a simple ZMQ-based data server, which is open source. From then on, data was no longer something we had to worry about. We could scale up training and use cheaper machines. Now the only remaining bottlenecks in model training are the GPUs and the developers.

2. Start with what you can visualize

In deep learning, we are fortunate to have TensorBoard, the recently released Visdom, and other tools for visualizing results. I believe data science in general is best done in a visualization-driven way, because visualization lets you see and deal with the problems you hit at every step of development. Unless you're a JavaScript fanatic, you don't need to learn d3.js to get useful visualizations.

Story: In my exit interview, I asked George for advice on becoming a more productive engineer (believe me, he is the most productive person I have ever met, and I take every chance I get to learn from him). His advice was to first build things that let me visualize what I'm doing. That is what George himself does: all of his IPython notebooks have slider widgets that quickly show how parameters affect the results while prototyping.
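The story doesn't say how George's sliders were built. The sketch below assumes a Jupyter notebook with the `ipywidgets` package and uses its `interact` function to attach a slider to one parameter: the smoothing factor of an exponential moving average over a loss curve. `ema_smooth` and `show_loss_slider` are hypothetical names. The `ipywidgets` import sits inside the function, so the smoothing logic runs anywhere.

```python
def ema_smooth(values, smoothing):
    """Exponential moving average: a higher `smoothing` gives a flatter curve."""
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    smoothed, last = [], None
    for v in values:
        last = v if last is None else smoothing * last + (1.0 - smoothing) * v
        smoothed.append(last)
    return smoothed


def show_loss_slider(losses):
    """In a Jupyter notebook, attach a slider to the smoothing parameter."""
    from ipywidgets import interact  # requires the ipywidgets package

    def render(smoothing=0.6):
        curve = ema_smooth(losses, smoothing)
        # Printing keeps the sketch dependency-free; a plot would go here instead.
        print(" ".join(f"{x:.2f}" for x in curve))

    # A (min, max, step) tuple of floats makes interact() build a float slider;
    # the function's default (0.6) becomes the slider's initial value.
    interact(render, smoothing=(0.0, 0.95, 0.05))
```

In a notebook cell, calling `show_loss_slider(train_losses)` renders the slider. Dragging it re-runs `render` and shows immediately how the parameter changes the curve.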

3. Define your validation and hard-case datasets as early as possible

I put the fun part, visualization, second so that you could take a short break after being scared by "data preparation." However, if you don't want to end up like a monkey at a typewriter, randomly adding more layers to your neural network, you must learn how to measure progress.

Ask yourself which metrics best reflect a good product, and which data you should track them on.

This may go beyond simply "holding out a random 10% of the data for validation." Ideally, the validation set should have the same statistical properties as the data your product will see. The product itself can also be used to collect hard cases, edge cases, and outright failures to build future validation sets. Your validation set will therefore keep evolving, and it should be versioned just like the training set.
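Here is one way to make both ideas concrete, sketched in standard-library Python under stated assumptions. First, every example is assumed to have a stable string ID, and it is assigned to a split by hashing that ID. Unlike a fresh random 10% sample, this keeps each clip in the same split as the dataset grows. Second, hard cases collected from the product (failures, edge cases) are written out as new, numbered versions rather than edited in place. The function names and the `hard_cases_vNNNN.json` layout are hypothetical. MD5 is used here only for bucketing, not security.

```python
import hashlib
import json
import tempfile  # used only by the demo below
from pathlib import Path


def split_for(example_id, val_fraction=0.10):
    """Assign an example to 'train' or 'val' from a hash of its ID (stable across runs)."""
    digest = hashlib.md5(example_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # roughly uniform in [0, 1)
    return "val" if bucket < val_fraction else "train"


def add_hard_cases(hard_dir, new_ids):
    """Write a new, immutable version of the hard-case set and return its path."""
    hard_dir = Path(hard_dir)
    hard_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded version numbers keep lexicographic order equal to numeric order.
    versions = sorted(hard_dir.glob("hard_cases_v*.json"))
    current = json.loads(versions[-1].read_text()) if versions else []
    merged = sorted(set(current) | set(new_ids))
    path = hard_dir / f"hard_cases_v{len(versions) + 1:04d}.json"
    path.write_text(json.dumps(merged))
    return path


if __name__ == "__main__":
    print(split_for("drive_0042_seg_7"))
    with tempfile.TemporaryDirectory() as tmp:
        print(add_hard_cases(tmp, ["seg_7", "seg_9"]).name)  # hard_cases_v0001.json
        print(add_hard_cases(tmp, ["seg_12"]).name)          # hard_cases_v0002.json
```

Because old versions are never overwritten, a metric reported against "hard cases v3" stays reproducible after v4 is added.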

Story: I learned that for self-driving, the moments when the driver has to take over control of the vehicle make good hard cases for the validation set. However, the best validation test is to have an experienced control engineer drive on real roads and judge the quality of the autopilot system. If you are in this industry, you had better poach some engineers from Tesla (just kidding).

4. Premature scaling is the main reason early startups fail

When you hear this advice, you might say, "Don't try to teach me this. I've heard more startup stories than you have!" True, but here is something new: you should weigh GPUs and training hardware the same way you weigh hiring employees. Once you have hired or bought more than you need, you will spend a lot of effort keeping the extra resources busy. Managing a cluster is hard, and large-scale HPC for deep learning is a research topic in its own right.

My advice: before buying a new GPU, make sure all your existing GPUs are fully utilized. You can of course be as aggressive as Google, provided you are as productive and profitable as Google.
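One quick way to check whether the GPUs you already have are busy, assuming NVIDIA hardware with `nvidia-smi` on the PATH, is to query per-GPU utilization as CSV and flag the idle ones. The 10% threshold and the function names are illustrative assumptions. A single snapshot can also mislead, so in practice you would sample over hours or days before deciding to buy more hardware.

```python
import subprocess


def parse_utilization(csv_text):
    """Parse 'index, utilization' rows as printed by nvidia-smi with csv,noheader,nounits."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        rows.append((int(index), int(util)))
    return rows


def idle_gpus(rows, threshold=10):
    """Return the indices of GPUs whose utilization (%) is below `threshold`."""
    return [index for index, util in rows if util < threshold]


def query_gpus():
    """Take a one-off utilization snapshot via nvidia-smi (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_utilization(out)


if __name__ == "__main__":
    try:
        rows = query_gpus()
    except (OSError, subprocess.CalledProcessError):
        rows = parse_utilization("0, 97\n1, 3\n")  # hypothetical sample output
    print("idle GPUs:", idle_gpus(rows))
```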

If your team or company is large enough, you must seriously consider hiring infrastructure engineers. If you have ten times as many researchers as infrastructure staff and the researchers are stuck waiting, then in the best case they will build their own infrastructure, and in the worst case they will simply quit. That is certainly not what you want.

Story: Once, when I left the office without keeping all my GPUs busy, Niel (comma's VP of mobile apps) gave me a very disappointed look. It left me with an "idle GPU phobia." These days, it is a very common affliction.

That's it! Working in AI is both challenging and fun. Put some thought into how you will handle resources and visualization, and you will be fine.
