– What is Natural Language Processing (NLP) ? List some of the common NLP tasks
– What are transformers or transformer models?
– What are some of the most common practical, real world applications of NLP?
Pretraining, in the context of machine learning and neural networks, refers to training a language model from scratch. Starting with randomly initialized weights, the model is trained on a massive corpus of text data. Pretraining methods hide parts of the input from the model, and train the model to reconstruct those parts. During the training phase, the model learns to understand the structure of language, the relationships between words, and other linguistic features , thereby constructing a pretrained model (also known as foundational models) with updated weights.
Pretrained models have been remarkably efficient at building strong representations of language, initializing parameters for NLP models and creating probability distributions over language, from which we can generate samples. For instance, with the emergence of transformer models, a paradigm of pretraining followed by finetuning has developed. In this approach, pretrained models are fine-tuned on smaller, task-specific datasets to tailor them for NLP tasks such as sentiment analysis, text classification, and machine translation.
Pretraining a language model, however, can be a very resource intensive process, both in terms of time and money. It could take several days/weeks and significant financial investments, depending on the size of data to build a pretrained model. For instance, BERT, which has 340 milion parameters, was pretrained with 64 TPU chips for a total of 4 days [source]. Another example is GPT-3 model, which has 175 billion parameters, was trained on 10,000 V100 GPUs, for 14.8 days [source] with an estimated training cost of over $4.6 million [source].
Note: ChatGPT was powered by the original GPT-3 model when launched in Nov 2022 and has been evolving ever since
Fine-tuning, in the context of machine learning and deep learning, refers to the process of taking a pretrained model and further training it on a smaller, task-specific dataset. The goal of fine-tuning is to adapt the model’s learned representations to perform well on a specific task without starting the training process from scratch.
What is the advantage of fine-tuning instead of training a model from scratch?
The advantages of the pretraining followed by fine-tuning paradigm are substantial including reduced data requirements, shorter training times, decreased computational resources, and consequently, lowered costs.
Training a model from scratch often requires a substantial amount of data to learn meaningful representations. It is is computationally expensive and time-consuming. Fine-tuning, on the other hand, can work effectively with a smaller amount of task-specific data since it starts with knowledge learned by the pretrained model. Fine-tuning builds upon a pretrained model, and adapts it to the specific task, potentially leading to quicker convergence and better performance. This significantly reduces the amount of data, training time and computational resources needed.
Note: It is important to choose a pretrained model that bears some similarity to the task-specific dataset
Transfer learning is a machine learning technique where knowledge gained from solving one task is applied to a different, but related, task. Transfer learning can be particularly useful in scenarios where you have limited data for the target task but ample data for a related task.
Pretraining + Finetuning is one of the best examples for transfer learning. This involves training a model on a source task [Pretrained model] and then using the learned knowledge to improve the model’s performance on a target task [Finetuned model].