A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT
Abstract
- Pretrained foundation models (PFMs) are used for various downstream tasks with different data modalities
- Pretraining provides a reasonable parameter initialization for a wide range of applications
- GPT and BERT use Transformers to train on large datasets
- AI has made waves in a variety of fields over the past few years
- This study provides a comprehensive review of recent research advancements, current and future challenges, and opportunities for PFMs

Paper Content

Introduction
- PFMs are essential components of AI in the era of big data
- PFMs are studied in the three major AI fields: natural language processing (NLP), computer vision (CV), and graph learning (GL)
- PFMs are powerful general models that are effective within a single field or across fields
- PFMs have demonstrated great potential for learning feature representations in various learning tasks
- PFMs perform well when trained on multiple tasks with a large-scale corpus and then fine-tuned on similar small-scale tasks

PFMs and pretraining
- PFMs are built on pretraining, a technique that uses large amounts of data and tasks
- Pretraining originates from transfer learning in CV tasks
- When applied to NLP, language models (LMs) capture rich knowledge that benefits downstream tasks
- Pretraining data can be derived from any unlabeled text corpus
- Early pretraining was static, but dynamic pretraining techniques have since been proposed
- PFMs are used for text, image, and graph tasks
- PFMs have two major advantages: they need only minor fine-tuning, and they have already been vetted for quality
- Related work focuses on model efficiency, security, and compression

Contribution and organization
- Several survey studies have reviewed pretrained models for specific areas
- Bommasani et....