5 Techniques to Prevent Overfitting in Neural Networks

By Abhinav Sagar, VIT Vellore

I’ve been working on deep learning for more than a year now. In this time, I’ve used many kinds of neural networks, such as Convolutional Neural Networks, Recurrent Neural Networks, Autoencoders etcetera. One of the most common problems I encountered while training deep neural networks is overfitting.

Overfitting occurs when a model tries to predict a trend in data that is too noisy, so that it ends up fitting the noise rather than the underlying pattern. This is typically caused by an overly complex model with too many parameters. An overfitted model is inaccurate because the trend it has learned does not reflect the reality present in the data. Overfitting can be detected when the model produces good results on the seen data (training set) but performs poorly on the unseen data (test set). The goal of a machine learning model is to generalize well from the training data to any data from the problem domain. This is very important, as we want our model to make predictions in the future on data that it has never seen before.
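To make the train/test symptom concrete, here is a toy sketch in plain Python. It is not a neural network: a 1-nearest-neighbour "memorizer" stands in for an overly complex model and a single-threshold classifier stands in for a simple one. The data, including the two deliberately mislabelled training points, is made up purely for illustration.

```python
# Toy illustration with hypothetical data: a model that memorizes the training
# set scores perfectly on it but does worse on unseen data than a simpler model.

def true_label(x):
    """Underlying rule we want a model to learn: class 1 if x >= 10."""
    return 1 if x >= 10 else 0

# Training inputs 0, 2, ..., 18 with two noisy (flipped) labels at x=4 and x=14.
train_x = list(range(0, 20, 2))
noisy = {4, 14}
train_y = [1 - true_label(x) if x in noisy else true_label(x) for x in train_x]

# Unseen test inputs 0.4, 2.4, ..., 18.4 with clean labels.
test_x = [x + 0.4 for x in train_x]
test_y = [true_label(x) for x in test_x]

def accuracy(predict, xs, ys):
    """Fraction of inputs whose prediction matches the label."""
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(ys)

def memorizer(x):
    """Overly complex model: 1-nearest-neighbour, which copies every training
    label, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def fit_threshold(xs, ys):
    """Simple model: pick the threshold (from the training inputs) that makes
    the fewest training errors."""
    def errors(t):
        return sum((1 if x >= t else 0) != y for x, y in zip(xs, ys))
    return min(xs, key=errors)

threshold = fit_threshold(train_x, train_y)

def simple_model(x):
    return 1 if x >= threshold else 0

for name, model in [("memorizer", memorizer), ("threshold", simple_model)]:
    # Prints: name, training accuracy, test accuracy
    print(name, accuracy(model, train_x, train_y), accuracy(model, test_x, test_y))
# memorizer 1.0 0.8
# threshold 0.8 1.0
```

The memorizer's perfect training score paired with a lower test score is exactly the "good on seen data, poor on unseen data" pattern described above; the simpler model tolerates the two noisy labels and generalizes better.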

In this article, I will present five techniques to prevent overfitting while training neural networks.


1. Simplifying The Model

The first step when dealing with overfitting is to decrease the complexity of the model. To decrease the complexity, we can simply remove layers or reduce the number of neurons to make the network smaller. While doing this, it is important to calculate the input and output dimensions of the various layers involved in the neural network so that they still fit together. There is no general rule on how much to remove or how large your network should be. However, if your neural network is overfitting, try making it smaller.
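One way to see how much "smaller" a network becomes is to count its trainable parameters while checking that each layer's output dimension matches the next layer's input. The sketch below assumes a plain fully connected network (no convolutions); the function names and the two layer configurations are hypothetical examples, not taken from the article.

```python
def count_dense_params(layer_sizes):
    """Trainable parameters of a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    A dense layer mapping n_in -> n_out has an n_in x n_out weight matrix
    plus n_out biases.
    """
    if len(layer_sizes) < 2:
        raise ValueError("need at least an input size and an output size")
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def layer_shapes(layer_sizes):
    """(input_dim, output_dim) of each dense layer; consecutive layers chain
    because one layer's output_dim is the next layer's input_dim."""
    return list(zip(layer_sizes, layer_sizes[1:]))

big = [784, 512, 512, 10]  # hypothetical MLP for 28x28 images, two hidden layers
small = [784, 128, 10]     # one hidden layer removed and fewer neurons

print(count_dense_params(big))    # 669706
print(count_dense_params(small))  # 101770
print(layer_shapes(small))        # [(784, 128), (128, 10)]
```

Removing one hidden layer and shrinking the other cuts the parameter count by roughly a factor of six here. In Keras, `model.summary()` reports the same per-layer output shapes and parameter counts for a built model.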


2. Early Stopping

Early stopping is a form of regularization used when training a model with an iterative method, such as gradient descent. Since neural networks are almost always trained with gradient descent or one of its variants, early stopping is applicable to virtually all of them. Such methods update the model so that it fits the training data better with each iteration. Up to a point, this improves the model’s performance on held-out data. Past that point, however, improving the model’s fit to the training data leads to increased generalization error. Early stopping rules provide guidance on how many iterations can be run before the model begins to overfit.


Early Stopping


This technique is shown in the above diagram. As we can see, after some iterations the test (validation) error starts to increase while the training error is still decreasing. At that point the model is overfitting, so to combat this we stop training at the point where this starts to happen.
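A minimal sketch of a common "patience" stopping rule, in plain Python: it watches the per-epoch validation loss and stops once the loss has failed to improve for `patience` consecutive epochs. The function name, the `patience`/`min_delta` parameters and the loss values are hypothetical; in a real training loop you would also restore the weights saved at the best epoch.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops after `patience` consecutive epochs in which the loss did not
    improve on the best value so far by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: remember it (a real loop would checkpoint weights here).
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran to the last epoch.
    return len(val_losses) - 1, best_epoch

# Hypothetical validation curve: falls, bottoms out at epoch 3, then rises.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
print(early_stopping(val_losses, patience=3))  # (6, 3)
```

Keras provides this rule as a built-in callback, for example `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)`; the sketch above only mimics its basic behaviour.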


3. Use Data Augmentation

In the case of neural networks, data augmentation simply means increasing the size of the dataset, that is, increasing the number of images it contains by adding modified copies of the existing ones. Some of the popular image augmentation techniques are flipping, translation, rotation, scaling, changing brightness, adding noise etcetera. For a more complete reference, feel free to check out albumentations and imgaug.


Data Augmentation


This technique is shown in the above diagram. As we can see, data augmentation can generate many similar images from a single one. This increases the dataset size and thus reduces overfitting. The reason is that, as we add more data, the model is unable to overfit all the samples and is forced to generalize.
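Here is a minimal sketch of a few of the augmentations listed above (flips, brightness change, additive noise), written in plain Python on a tiny grayscale image stored as nested lists of intensities in [0, 1]. It is only meant to show the idea; the function names and parameter values are hypothetical, and it does not reproduce the API of albumentations or imgaug.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image, pixel intensities in [0, 1]

def horizontal_flip(img: Image) -> Image:
    return [row[::-1] for row in img]

def vertical_flip(img: Image) -> Image:
    return [row[:] for row in img[::-1]]

def adjust_brightness(img: Image, delta: float) -> Image:
    # Shift every pixel, then clip back into the valid [0, 1] range.
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in img]

def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def augment(img: Image, n_copies: int, seed: int = 0) -> List[Image]:
    """Generate n_copies randomly transformed variants of one image."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_copies):
        out = img
        if rng.random() < 0.5:
            out = horizontal_flip(out)
        if rng.random() < 0.5:
            out = vertical_flip(out)
        out = adjust_brightness(out, rng.uniform(-0.2, 0.2))
        out = add_gaussian_noise(out, sigma=0.05, rng=rng)
        variants.append(out)
    return variants

image = [[0.0, 0.5, 1.0],
         [0.2, 0.4, 0.6]]
print(horizontal_flip(image))           # [[1.0, 0.5, 0.0], [0.6, 0.4, 0.2]]
print(len(augment(image, n_copies=8)))  # 8
```

Which transformations are safe depends on the data: a vertical flip is harmless for many natural images but can change the meaning of, say, handwritten digits, so the set of augmentations should be chosen per problem.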


4. Use Regularization

Regularization is a technique to reduce the complexity of the model. It does so by adding a penalty term to the loss function. The most common techniques are known as L1 and L2 regularization:

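  • The L1 penalty aims to minimize the absolute value of the weights. This is shown mathematically in the formula below, where L(w) is the original loss, w_i are the weights and λ controls the strength of the penalty.

$$J(w) = L(w) + \lambda \sum_{i} |w_i|$$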


  • The L2 penalty aims to minimize the squared magnitude of the weights. This is shown mathematically in the formula below, using the same notation.

$$J(w) = L(w) + \lambda \sum_{i} w_i^2$$
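The two penalties translate directly into code. This is a plain-Python sketch of the cost J(w) = L(w) + penalty for a flat list of weights; the function names and the example numbers are hypothetical, and some libraries and texts scale the L2 term by 1/2, which only rescales λ.

```python
def l1_penalty(weights, lam):
    """lam * sum(|w_i|): the L1 term added to the data loss."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """lam * sum(w_i^2): the L2 term added to the data loss."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Total cost J(w) = L(w) + penalty, where data_loss is L(w)."""
    if kind == "l1":
        return data_loss + l1_penalty(weights, lam)
    if kind == "l2":
        return data_loss + l2_penalty(weights, lam)
    raise ValueError(f"unknown regularization kind: {kind!r}")

weights = [0.5, -1.5, 2.0]
print(regularized_loss(0.3, weights, lam=0.01, kind="l1"))  # ≈ 0.34
print(regularized_loss(0.3, weights, lam=0.01, kind="l2"))  # ≈ 0.365
```

In Keras, the equivalent penalties can be attached to a layer's weights with `kernel_regularizer=tf.keras.regularizers.l1(0.01)` or `tf.keras.regularizers.l2(0.01)`.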


The table below compares the two regularization techniques.


L1 vs L2 Regularization


So which technique is better at avoiding overfitting? The answer is: it depends. If the data is too complex to be modelled accurately, then L2 is a better choice, as it is able to learn the inherent patterns present in the data. L1, on the other hand, is better if the data is simple enough to be modelled accurately. For most of the computer vision problems that I have encountered, L2 regularization almost always gives better results. However, L1 has the added advantages of producing sparse weights (it tends to drive many weights to exactly zero) and of being more robust to outliers. So the right choice of regularization depends on the problem we are trying to solve.
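A commonly cited practical difference between the two penalties is that L1 drives small weights to exactly zero, while L2 only shrinks them proportionally. The sketch below isolates the penalty term (the data loss is ignored) and applies a few update steps to three hypothetical weights. For L1 it uses soft-thresholding, the proximal step used by lasso-style solvers, because a plain subgradient step would oscillate around zero rather than land on it; that choice is an assumption of this sketch, not something the article prescribes.

```python
def l1_proximal_step(w, lr, lam):
    """Soft-thresholding: exact proximal step for lam*|w|.
    Moves w toward 0 by lr*lam and snaps it to exactly 0 if it would cross."""
    shrink = lr * lam
    if w > shrink:
        return w - shrink
    if w < -shrink:
        return w + shrink
    return 0.0

def l2_step(w, lr, lam):
    """Gradient step on lam*w^2 alone (gradient 2*lam*w):
    scales w by (1 - 2*lr*lam), so it shrinks but never reaches 0."""
    return w - lr * 2 * lam * w

weights = [0.03, -0.8, 1.2]  # hypothetical weights
lr, lam = 0.1, 0.5
l1_w, l2_w = list(weights), list(weights)
for _ in range(5):
    l1_w = [l1_proximal_step(w, lr, lam) for w in l1_w]
    l2_w = [l2_step(w, lr, lam) for w in l2_w]

print(l1_w)  # first weight is exactly 0.0, the others moved 0.25 toward zero
print(l2_w)  # every weight scaled by 0.9 ** 5, none exactly zero
```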


5. Use Dropouts

Dropout is a regularization technique that prevents neural networks from overfitting. Regularization methods like L1 and L2 reduce overfitting by modifying the cost function. Dropout, on the other hand, modifies the network itself. It randomly drops neurons from the neural network during training in each iteration. When we drop different sets of neurons, it is equivalent to training different neural networks. The different networks will overfit in different ways, so the net effect of dropout will be to reduce overfitting. At test time no neurons are dropped, so the full network behaves like an average of these thinned networks.


Using Dropouts


This technique is shown in the above diagram. As we can see, dropout randomly removes neurons while the neural network is being trained. This technique has been shown to reduce overfitting in a variety of problems involving image classification, image segmentation, word embeddings, semantic matching etcetera.
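The mechanism fits in a few lines. Below is a plain-Python sketch of "inverted" dropout applied to one layer's activations: during training each unit is zeroed with probability `p` and the survivors are scaled by 1 / (1 - p) so the expected activation stays the same; at inference the input passes through unchanged. The function name, its signature and the example activations are hypothetical.

```python
import random

def dropout(activations, p, training, rng=None):
    """Inverted dropout on a list of activations.

    Training: each unit is zeroed with probability p, and survivors are scaled
    by 1 / (1 - p) so the expected value of each activation is unchanged.
    Inference: activations are returned untouched.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - p
    # A fresh random mask on every call = a different "thinned" network each step.
    return [a / keep_prob if rng.random() >= p else 0.0 for a in activations]

rng = random.Random(42)
hidden = [0.2, 1.5, 0.7, 0.9, 1.1]
print(dropout(hidden, p=0.5, training=True, rng=rng))  # each value is 0.0 or doubled
print(dropout(hidden, p=0.5, training=False))          # unchanged at inference
```

This is the same scaling convention used by PyTorch's `nn.Dropout` and the Keras `Dropout` layer, which is why neither needs to rescale activations at inference time.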



As a quick recap, I explained what overfitting is and why it is a common problem in neural networks. I followed this up by presenting five of the most common ways to prevent overfitting while training neural networks: simplifying the model, early stopping, data augmentation, regularization and dropouts.


References/Further Readings

Why dropouts prevent overfitting in Deep Neural Networks

A comparison of methods to avoid overfitting in neural networks training in the case of catchment…

How to Avoid Overfitting in Deep Learning Neural Networks

Deep neural networks: preventing overfitting.

imgaug: a Python library for augmenting images for machine learning projects




If you wish to keep up to date with my latest articles and projects, follow me on Medium.

Happy reading, happy learning and happy coding!

Bio: Abhinav Sagar is a senior-year undergraduate at VIT Vellore. He is interested in data science, machine learning and their applications to real-world problems.

Original. Reposted with permission.

