Transfer Learning: Short Explanation

Transfer learning is a machine learning technique in which knowledge gained from solving one problem is applied to a different but related problem. For instance, a model trained to identify birds in images can be reused as the starting point for a model that recognizes other objects, such as animals.
Transfer learning is thus defined as reusing the knowledge gained in a previous machine learning task. The method has become popular because it allows models to be developed quickly, even when little data is available. It is used in many computer vision and natural language processing applications.
Transfer learning is also closely related to other machine learning problems, such as multi-task learning and concept drift.

What is transfer learning?

Machine learning models take a long time to train and require huge volumes of data to reach a good level of accuracy. Since many AI and machine learning approaches try to mimic how the human brain learns and transfers knowledge, it makes sense to look for a way to apply what one model has learned to another.
Humans learn about many subjects, correlate what they have learned, and apply similar solutions to multiple problems. For instance, someone who has learned to ride a bicycle finds it easier to learn to ride a motorbike. When the same idea is applied to machine learning, it is called transfer learning. Similarly, a model trained to recognize food items in images can be reused to build a model that also detects drinks.
Transfer learning thus goes beyond the isolated learning of traditional machine learning models and allows knowledge acquired on one task to be applied to another, similar task.
The approach is especially popular in deep learning, where many complex problems are tackled with neural network models.
Because the idea is widely applicable and actively researched, several related terms, such as learning to learn, knowledge consolidation, and inductive transfer, are used more or less synonymously with transfer learning.

How does transfer learning work?

Transfer learning can be simplified into the following steps:

  • Train a machine learning model to solve a particular task A, for example, recognizing birds in pictures.
  • Use the knowledge gained from this model to perform another task B, for example, recognizing animals in pictures. The weights the network learned while training on task A are transferred to the model for task B, which speeds up its training. What is transferred from task A to task B depends on the problem and the available data: it can be the fitted model, features learned from the training data, the algorithms used, and so on.
  • This is especially advantageous when task B has little data to work with. The knowledge from task A can be exploited without training the task B model on the enormous amount of data that would otherwise be needed.
  • Because the idea is conceptual, its scope extends beyond any single model. It can also be seen as a design methodology in fields such as active learning, and it is applied in cognitive science as well.
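The steps above can be sketched with a deliberately tiny example. This is a minimal illustration under stated assumptions, not a production recipe: it uses a plain linear model trained with gradient descent in pure Python, and the synthetic tasks (`task_a`, `task_b_train`, `task_b_test`) and helper names are hypothetical stand-ins for task A and task B. The only thing transferred is the learned weights, which become the starting point for task B.

```python
import random


def predict(w, b, x):
    """Linear model: dot product of weights and features, plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mse(w, b, data):
    """Mean squared error on a list of (features, target) pairs."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)


def train(data, w=None, b=0.0, lr=0.1, epochs=100):
    """Full-batch gradient descent on MSE. Passing w and b warm-starts the model."""
    n = len(data)
    w = list(w) if w is not None else [0.0] * len(data[0][0])
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in data:
            err = predict(w, b, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += 2 * err * xi / n
            grad_b += 2 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def make_task(true_w, true_b, n, seed):
    """Hypothetical synthetic data drawn from a known linear function."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in true_w]
        data.append((x, predict(true_w, true_b, x)))
    return data


# Task A has plenty of data; task B is related but has only 10 examples.
task_a = make_task([2.0, -3.0, 1.5], 0.0, n=200, seed=0)
task_b_train = make_task([2.2, -2.8, 1.5], 0.5, n=10, seed=1)
task_b_test = make_task([2.2, -2.8, 1.5], 0.5, n=200, seed=2)

# Step 1: train on task A.
w_a, b_a = train(task_a, epochs=300)

# Step 2: same small budget of 20 updates on task B, with and without transfer.
w_scratch, b_scratch = train(task_b_train, epochs=20)
w_transfer, b_transfer = train(task_b_train, w=w_a, b=b_a, epochs=20)

print("from scratch:", round(mse(w_scratch, b_scratch, task_b_test), 4))
print("transferred: ", round(mse(w_transfer, b_transfer, task_b_test), 4))
```

With the same budget of updates, the warm-started model should reach a much lower test error because it begins close to the task B solution. In a deep learning framework, the same idea corresponds to loading pre-trained weights before fine-tuning.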

Inductive transfer

Inductive transfer is the process by which general features suitable for both the base and target tasks are transferred from a base network to a target network. The base network is first trained on the base dataset, and then the results and learned features from this task are repurposed for the target network.
This type of transfer learning is used in deep learning. Using a model fit on a related base task narrows the scope of possible target models (the model bias) in a beneficial way, which improves the learning process of the target model.
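One concrete way to see how a base model narrows the model bias is to penalize the target model for moving away from the base model's weights, a technique known in the literature as L2-SP. The sketch below is an illustrative assumption rather than the method described here: a one-parameter linear model, hypothetical data, and a hand-picked penalty strength `lam`. Setting the derivative of the penalized loss to zero gives a closed-form solution.

```python
def fit_toward_source(xs, ys, w_source, lam):
    """Closed-form minimizer of sum((w*x - y)^2) + lam * (w - w_source)^2.

    Setting the derivative to zero gives
        w = (sum(x*y) + lam * w_source) / (sum(x*x) + lam).
    lam = 0 is ordinary least squares on the target data alone;
    a large lam keeps w close to the source (base) weight.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (sxy + lam * w_source) / (sxx + lam)


# Hypothetical example: the base task learned w = 2.0,
# and the target task has only two labeled points.
xs, ys = [1.0, 2.0], [2.5, 3.5]
w_source = 2.0

for lam in (0.0, 5.0, 1000.0):
    print(lam, round(fit_toward_source(xs, ys, w_source, lam), 4))
```

Here the target data alone gives w = 9.5 / 5 = 1.9, while lam = 5 gives (9.5 + 10) / (5 + 5) = 1.95: the base model pulls the estimate toward its own weight, and the pull grows with `lam`.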
To use transfer learning in predictive modeling, you can follow one of these common approaches:

  • Develop model approach
    In this approach, a base or source task is first identified and a model is developed for it. This model then serves as the starting point from which models for other related tasks can be adapted or refined.
  • Pre-trained model approach
    In this approach, an existing pre-trained source model is selected and used as the base on which the new models are built. This is the most common approach in deep learning.
  • Feature extraction
    Another approach is to extract the most important features of a problem and use them to build a better model. This approach is also called representation learning and usually yields better model design and performance than manual feature engineering.
    Although neural networks can learn good representative features on their own, reusing features learned for another task saves computing resources and time. Feature extraction also reduces the data size and overall computation time, which makes it suitable for traditional algorithms as well.
    Programmatically, developers can use available transfer learning libraries such as ADAPT (Python), TLib (Python), and the Domain Adaptation Toolbox (Matlab).
    While traditional machine learning systems learn only from their own input dataset, transfer learning uses both the knowledge gained by a previous system and the input dataset to develop a model. Compared with the isolated learning carried out by individual models, transfer learning can therefore solve problems with less data, less computing power, and in less time.
    Mathematically, transfer learning can be described as follows.
    A domain $\mathcal{D}$ consists of two components, $\mathcal{D} = \{\mathcal{X}, P(X)\}$, where $\mathcal{X}$ is the feature space and $P(X)$ is the marginal probability distribution over a sample $X = \{x_1, \ldots, x_n\} \in \mathcal{X}$.
    Each $x_i$ is an individual feature vector. A task $\mathcal{T}$ to be completed by the model is likewise defined by two components:
    $$\mathcal{T} = \{\mathcal{Y}, P(Y \mid X)\} = \{\mathcal{Y}, \eta\}, \qquad Y = \{y_1, \ldots, y_n\} \in \mathcal{Y}$$
    Here $\mathcal{Y}$ is the label space and $\eta$ is the predictive function.
    With these representations, Sebastian Ruder explains transfer learning as follows:
    “Given a source domain $\mathcal{D}_s$, a corresponding source task $\mathcal{T}_s$, as well as a target domain $\mathcal{D}_t$ and a target task $\mathcal{T}_t$, the objective of transfer learning now is to enable us to learn the target conditional probability distribution $P(Y_t \mid X_t)$ in $\mathcal{D}_t$ with the information gained from $\mathcal{D}_s$ and $\mathcal{T}_s$ where $\mathcal{D}_s \neq \mathcal{D}_t$ or $\mathcal{T}_s \neq \mathcal{T}_t$.”
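The feature extraction approach described earlier can be sketched in a few lines. This is an illustrative assumption, not any library's API: a hand-set `PRETRAINED_W` stands in for a layer of a network trained on the source task, it stays frozen, and only a new logistic-regression head is trained on a handful of hypothetical target examples.

```python
import math

# Hypothetical stand-in for a layer of a network pre-trained on the source task.
# In practice these weights would be loaded from the source model, not typed in.
PRETRAINED_W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]


def extract_features(x):
    """Frozen feature extractor: tanh(W x). Nothing below updates PRETRAINED_W."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def head_prob(head, bias, f):
    """Probability of class 1 from the new, trainable logistic-regression head."""
    return sigmoid(sum(h * fi for h, fi in zip(head, f)) + bias)


def log_loss(head, bias, feats, labels):
    total = 0.0
    for f, y in zip(feats, labels):
        p = head_prob(head, bias, f)
        total -= math.log(p if y == 1 else 1.0 - p)
    return total / len(labels)


def train_head(feats, labels, lr=0.5, epochs=500):
    """Gradient descent on the log loss, updating only the head and its bias."""
    head, bias, n = [0.0] * len(feats[0]), 0.0, len(labels)
    for _ in range(epochs):
        grad, grad_b = [0.0] * len(head), 0.0
        for f, y in zip(feats, labels):
            err = head_prob(head, bias, f) - y
            grad = [g + err * fi / n for g, fi in zip(grad, f)]
            grad_b += err / n
        head = [h - lr * g for h, g in zip(head, grad)]
        bias -= lr * grad_b
    return head, bias


# Hypothetical target task with only six labeled examples: label 1 when x0 > x1.
target_x = [[1.0, 0.0], [0.8, -0.5], [2.0, 1.0], [0.0, 1.0], [-0.5, 0.8], [1.0, 2.0]]
target_y = [1, 1, 1, 0, 0, 0]

# Features are computed once with the frozen extractor and reused every epoch.
feats = [extract_features(x) for x in target_x]
head, bias = train_head(feats, target_y)


def predict(x):
    return int(head_prob(head, bias, extract_features(x)) > 0.5)


print(predict([3.0, 0.0]), predict([0.0, 3.0]))
```

Because the extractor is frozen, the features can be computed once, and training touches only four numbers. This is where the savings in computation time come from.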

Types of deep transfer learning

  • Domain adaptation
    Domain adaptation refers to adapting a model from a source domain to a target domain whose feature space or data distribution differs. This type of transfer learning is common in computer vision projects.
  • Domain confusion
    To make the source and target domains resemble each other at least to some degree, an additional objective is added when training on the source domain that encourages the learned representations not to distinguish between the two domains. This deliberate confusion makes samples from both domains look more similar to the model, which eases transfer across the domains.
  • Multi-task learning
    In this type of learning, multiple tasks are learned simultaneously without distinguishing between the source and targets. This helps develop a richer combined feature vector that can be applied to various problems within the domain, allowing for a shared knowledge system.
  • One-shot learning
    One-shot learning is used for classification tasks such as facial recognition, where a class must be recognized from only one or very few labeled examples. It relies on knowledge transferred from a model trained on related data.
  • Zero-shot learning
    Zero-shot learning is used in cases such as machine translation, where no labeled data is available for the target language. It is designed to deal with classes or tasks not seen during training and therefore relies on additional information supplied during training.
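One-shot classification, as in the facial recognition example, is often done by comparing a new input against a single stored reference per class in an embedding space learned on related data. The sketch below assumes exactly that setup; `embed` is a hypothetical placeholder (it only normalizes the input) standing in for a network pre-trained on a related task, and the names and vectors are invented.

```python
import math


def embed(x):
    """Hypothetical stand-in for a pre-trained embedding network:
    here it only scales the input vector to unit length."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def cosine(a, b):
    # Embeddings are unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


# One labeled example per class: the "one shot".
support = {
    "alice": embed([0.9, 0.1, 0.2]),
    "bob": embed([0.1, 0.8, 0.3]),
    "carol": embed([0.2, 0.2, 0.9]),
}


def classify(x):
    """Assign x to the class whose single reference embedding is most similar."""
    e = embed(x)
    return max(support, key=lambda name: cosine(e, support[name]))


print(classify([0.85, 0.15, 0.25]))  # -> alice
```

Adding a new class only requires storing one more reference embedding; nothing is retrained, because the knowledge needed to compare inputs comes from the transferred embedding model.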

Transfer learning applications

  • Transfer learning with image data
    Image-based predictive models are a good example of how transfer learning works in deep learning. Developers can take existing image models such as the Oxford VGG model, the Google Inception model, or the Microsoft ResNet model and incorporate them into their target tasks. Because these pre-trained models have already learned to extract features from huge volumes of images, they can be used effectively in a new model.
  • Transfer learning with language data
    Natural language processing is another domain where transfer learning is extremely useful. Words with similar meanings can be identified by comparing their mappings in a base model. This is done with word embeddings, in which words are mapped to vectors in a continuous vector space so that words with similar meanings have similar vector representations. New models can reuse these mappings transferred from a base model instead of learning them from scratch. Several efficient algorithms make this possible: distributed word representation models such as Google’s word2vec and Stanford’s GloVe can be downloaded and used to develop deep learning language models faster and more efficiently.
  • Transfer learning from simulations
    Many commercial ML applications use simulations to gather data and train models before deploying them. A simulation provides a controlled, low-risk way to collect data that a machine learning system can learn from, and the knowledge gained is then transferred to the real-world task. Udacity’s open-source simulator and OpenAI’s Universe are examples of simulation systems used in this way.
    Because transfer learning works well with image data, it has found great use in medical diagnosis, with applications in cancer subtype discovery, medical imaging, and more. It has also been applied to general game playing, text classification, and digit recognition.
    Transfer learning is also used in cognitive science and is being researched for interpreting EMG signals and EEG brainwaves. Applying transfer learning to convolutional neural networks (CNNs) and other neural networks has also proved more efficient than training them from scratch.
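The word embedding idea from the language bullet can be illustrated with cosine similarity. The three-dimensional vectors below are invented for illustration and are not taken from word2vec or GloVe, whose real vectors are learned from large corpora and have hundreds of dimensions. The sketch only shows how similarity between transferred vectors is measured.

```python
import math

# Toy "embeddings" invented for illustration; a downstream model would
# normally load pre-trained vectors instead of defining them by hand.
embeddings = {
    "king": [0.80, 0.65, 0.10],
    "queen": [0.78, 0.70, 0.12],
    "apple": [0.10, 0.05, 0.90],
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Return the other vocabulary word whose vector points in the closest direction."""
    return max(
        (w for w in embeddings if w != word),
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )


print(most_similar("king"))  # -> queen
```

A new language model that starts from such vectors inherits these similarity relationships instead of having to discover them from its own, usually smaller, training set.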


Advantages and challenges of transfer learning

Transfer learning can save much of the time that goes into training similar models. By applying what the first model has learned, the second model can be trained more efficiently and may also achieve better performance.
Here are some of the advantages that transfer learning provides:

  • Speeds up training and progress in machine learning projects, provided that a model for a similar task already exists.
  • Target models trained with transfer learning often converge to better final performance than models trained from scratch.
  • Allows you to create models even when little data is available. Training machine learning models usually requires a large amount of data, but where data is scarce, transfer learning can help develop models that would otherwise be impossible to build.
  • Improves the efficiency of reinforcement learning techniques.
  • Helps reduce the computational power required to develop new models and train them.

In general, then, transfer learning is the right choice when you are pressed for time and need better performance. Whenever a pre-trained model for a similar task already exists, it makes sense, both financially and logically, to reuse it for better accuracy and a faster time to market. Transfer learning is also much easier when both models take the same kind of input.

Limitations of transfer learning

Having listed the advantages of transfer learning, it is also necessary to note that it cannot be applied in every case. Transfer learning has limitations, such as those explained below.

  • Transfer learning works only for generic features that the two tasks share. Features that are unique to one model or task cannot be found in other models, so transfer learning is of little use in such cases.
  • You may need additional preprocessing steps if the input size expected by the model pre-trained on task A differs from the input size of the new task B.
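One common preprocessing step for mismatched input sizes is resizing. The sketch below is a minimal nearest-neighbour resize for a grayscale image stored as a list of rows; the 2x2 and 4x4 sizes are hypothetical, and real pipelines would typically use an image library's resize function with smoother interpolation.

```python
def resize_nearest(image, new_height, new_width):
    """Nearest-neighbour resize of a 2D grid (list of rows) to a new shape.

    Each output pixel copies the input pixel at the proportionally
    corresponding position, so no new pixel values are invented.
    """
    old_height, old_width = len(image), len(image[0])
    return [
        [
            image[r * old_height // new_height][c * old_width // new_width]
            for c in range(new_width)
        ]
        for r in range(new_height)
    ]


# Hypothetical: task B supplies 2x2 images, but the task A model expects 4x4.
small = [[1, 2],
         [3, 4]]
for row in resize_nearest(small, 4, 4):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```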

History and future of transfer learning

The first mathematical model for transfer learning was put forth by Stevo Bozinovski and Ante Fulgosi in 1976. In 1981, the applicability of transfer learning was demonstrated experimentally. Several transfer learning algorithms have been published since, notably the discriminability-based transfer (DBT) algorithm in 1993. Work by Pratt and Sebastian Thrun refined the concept of transfer learning and advanced related techniques such as multi-task learning.
A major impetus for AI-based transfer learning methods came from the Neural Information Processing Systems (NIPS) 1995 workshop “Learning to Learn: Knowledge Consolidation and Transfer in Inductive Systems.”


Transfer learning is a growing research topic among AI engineers and machine learning experts, given its ability to produce models quickly and efficiently. Andrew Ng has famously predicted that, after supervised learning, transfer learning will be the next driver of commercial machine learning success.
Although it is still far from how knowledge transfer actually happens in human education, transfer learning offers a viable way for AI systems to learn from each other and become more capable over time. It is used in many kinds of machine learning applications and is also applied in cognitive science.

FAQs on Transfer Learning

When should transfer learning be used?

Transfer learning can be used when you are pressed for time and require better performance. Whenever you have an existing pre-trained model already in use for a similar task, it makes sense financially and logically to reuse it for better accuracy and faster time to market.

What is the principle of transfer learning?

Transfer learning is the process of reusing a machine learning model that has already been trained on one problem to solve a separate but related problem.

What is the history of transfer learning?

  • The first mathematical model for transfer learning was put forth by Stevo Bozinovski and Ante Fulgosi in 1976.
  • Later in 1981, the applicability of transfer learning was experimentally demonstrated.