---
title: MindSpore Made Easy
description: Take a deep dive into the deep learning networks.
date: 2022-06-14
cover: https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/07/18/c87abaa90dad4dffab916adbef748182.png
type: technology-blogs
category: Basics
---

# MindSpore Made Easy Deep Learning Series - Confusing Concepts in Deep Learning

In this article, we will talk about some confusing concepts in deep learning.

**1. Training Set, Validation Set, and Test Set**

A correctly configured training set, validation set (dev set), and test set can greatly help you build efficient neural networks. However, even deep learning experts find it almost impossible to guess the best parameter choices for an optimal fit on the first try. Building a deep learning network is an iterative process: you start with an idea, code a network, and run experiments. The efficiency of this cycle determines how fast a project progresses.
High-quality training, validation, and test sets help this cycle iterate more efficiently.

![](https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/07/18/9c6b33eb4bb44ef5a561e40a61ef3117.png)

In the early days of machine learning, when the amount of data was small, the common practice was to use 70% of the data as the training set and the remaining 30% as the test set. If a separate validation set was needed, the data was split into training, validation, and test sets at a 3:1:1 ratio. In the big data era, however, the validation and test sets take up a much smaller share of the total data. A validation set is used to compare different algorithms and determine the most effective one. This requires the validation set to be large enough in absolute terms, but when the entire dataset is huge, a small fraction is sufficient. For a large dataset, the validation and test sets can therefore each be far less than 20% or even 10% of the total data.

Here I'd like to share two tips:

1. Make sure the data in the validation set and the test set comes from the same distribution.

2. It might be OK not to have a separate test set.
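For a concrete picture of such a split, here is a minimal sketch; the 98/1/1 ratio is an illustrative assumption for a large dataset, not a prescription from the article:

```python
import numpy as np

def split_dataset(n_samples, train_frac=0.98, val_frac=0.01, seed=0):
    """Shuffle sample indices and split them into train/val/test subsets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]  # whatever remains
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_dataset(1_000_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 980000 10000 10000
```

With a million examples, 1% is still 10,000 samples, which is usually plenty to compare algorithms on, which is exactly why the classic 70/30 or 3:1:1 ratios are unnecessary at this scale.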
In that case, you can train on the training set, try different model architectures, evaluate the models on the validation set, and then iterate until you arrive at an appropriate model.

**1.2 Bias and Variance**

![](https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/07/18/0306a71bd0cb44b8b8e007586a919523.png)

For the dataset in the figure, we can draw a straight line to obtain a logistic regression fit, but it is not a good fit because the bias is high (underfitting).

If we apply an overly complex classifier to the dataset, the variance will be high, causing overfitting.

Somewhere in between, there may be a classifier that produces a reasonable fit to the data, that is, the "just right" fit.

The keys to understanding bias and variance are the train set error and the dev set error.

1. If a model has a train set error of 1% and a dev set error of 11%, it is doing well on the training set but poorly on the validation set. The model is probably over-fitted to the training set; the variance is high.

2. If a model has a train set error of 15% and a dev set error of 16%, it has not been trained well and is under-fitted to the training set; the bias is high.
At the same time, such a model produces a reasonable result on the validation set, since the dev set error is only 1% higher than the train set error.

3. If a model has a train set error of 15% and a dev set error of 30%, the bias and variance are both high.

4. If a model has a train set error of 0.5% and a dev set error of 1%, it is "just right": the bias and variance are both low.

**1.3 Solutions to Underfitting and Overfitting**

When the initial model training is complete, first check the bias of the model. If the bias is high, or the model even fails to fit the training set, do not hesitate to try one or more alternative networks.

You can also try a larger network or train for a longer time until the bias is reduced enough for the model to fit the data.

Once the bias is reduced to an acceptable value, check the model's variance on the validation set. If the variance is high, feeding more data to the model can solve the problem.
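The four error-pattern cases from section 1.2 can be condensed into a small diagnostic helper. This is only a sketch: the `target_err` baseline (standing in for human-level error, taken as 0 here) and the 5% gap threshold are illustrative assumptions:

```python
def diagnose(train_err, dev_err, target_err=0.0, gap_threshold=0.05):
    """Classify a model's fit from its train and dev set errors.

    High bias:     train error far above the target error (underfitting).
    High variance: dev error far above the train error (overfitting).
    """
    high_bias = (train_err - target_err) > gap_threshold
    high_variance = (dev_err - train_err) > gap_threshold
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "just right"

print(diagnose(0.01, 0.11))   # high variance (overfitting)
print(diagnose(0.15, 0.16))   # high bias (underfitting)
print(diagnose(0.15, 0.30))   # high bias and high variance
print(diagnose(0.005, 0.01))  # just right
```

The four calls reproduce the four numbered cases above; each verdict then tells you which remedy from this section to reach for.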
If no more data is available, you can add regularization to reduce overfitting.

**1.4 Why Regularization Reduces Overfitting**

![](https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/07/18/e76d325df24447848fb23ad79f5102da.png)

Assume a neural network that is currently overfitting. Its cost function J contains the parameters W and b. We can add a regularization term to J to prevent the weight matrices from becoming too large.

If the regularization parameter λ is set to a large enough value, the weight matrices W are pushed toward values close to 0. When the weights of many hidden units are close to 0, the influence of those hidden units is effectively zeroed out, and the neural network becomes a much smaller one, almost like logistic regression stacked over multiple layers. The overfitting network thus turns into a high-bias network similar to the left figure above. In between, however, there is an intermediate value of λ that takes the network to the "just right" state.

Now assume that the network uses the hyperbolic tangent activation function, g(z) = tanh(z).
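The regularization term added to J above can be sketched as an L2 (Frobenius-norm) penalty on the weight matrices; the squared-error base cost and the layer shapes below are illustrative assumptions, not the article's exact formulation:

```python
import numpy as np

def l2_regularized_cost(y_hat, y, weights, lam):
    """Base cost plus an L2 penalty: (lam / (2 m)) * sum of ||W||_F^2 over layers."""
    m = y.shape[0]                       # number of examples
    base = np.mean((y_hat - y) ** 2)     # plain squared-error cost (illustrative)
    penalty = (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weights)
    return base + penalty

# A larger lambda penalizes large weight matrices more heavily.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
y_hat, y = np.array([0.9, 0.1]), np.array([1.0, 0.0])
print(l2_regularized_cost(y_hat, y, [W1, W2], lam=0.0) <
      l2_regularized_cost(y_hat, y, [W1, W2], lam=0.7))  # True
```

Because the penalty grows with the squared entries of W, gradient descent on this cost is what pushes the weights toward 0 as λ increases.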
If z takes on only a small range of values, g(z) = tanh(z) stays close to a linear function.

![](https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/07/18/dfc7dbf1b0344140b477154e0d56f95d.png)

If the regularization parameter λ is large, the weight parameters W become relatively small, and because z is computed from W, z is also small.

![](https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/07/18/4b6eb9b21bdd4dcc99164f3e7f8d1568.png)

In conclusion, if the regularization term is very large, the parameters W are small, and z takes on a small range of values (ignoring the effect of b). The tanh activation function is then close to linear, so each layer computes something close to a linear function, and the whole network behaves almost linearly, which avoids overfitting.
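The near-linear behavior of tanh for small z can be checked numerically; the sample points below are arbitrary:

```python
import numpy as np

# For small z, tanh(z) is close to z (its first-order Taylor expansion),
# so a unit operating in this regime behaves almost linearly.
for z in (0.01, 0.1, 1.0, 3.0):
    rel_dev = abs(np.tanh(z) - z) / z   # relative deviation from linearity
    print(f"z = {z:4}: tanh(z) = {np.tanh(z):.4f}, deviation = {rel_dev:.1%}")
```

The relative deviation from the identity grows from a tiny fraction of a percent at z = 0.01 to well over 50% at z = 3, matching the argument that small weights keep each unit in its linear regime.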