1. Introduction
Language model pre-training has been shown to improve many natural language processing tasks. These include sentence-level tasks such as natural language inference and paraphrasing.
2. Related Work
Pre-training general language representations has a long history in NLP. Non-neural approaches that pre-train word embeddings have been widely used for years.
3. BERT
We introduce BERT and its detailed implementation in this section. There are two steps in our framework: pre-training and fine-tuning.
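The two-step structure can be illustrated with a small toy sketch. It is not BERT's implementation: the names ToyModel, pretrain, and fine_tune are hypothetical, and simple token counts stand in for learned parameters. The sketch shows only the control flow, assuming that the second step starts from the parameters produced by the first and then updates them on labeled data.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    # Hypothetical stand-ins for real network weights.
    encoder: dict = field(default_factory=dict)  # shared parameters
    head: dict = field(default_factory=dict)     # task-specific parameters


def pretrain(unlabeled_corpus):
    """Step 1: fit the shared parameters on unlabeled text (toy: token counts)."""
    model = ToyModel()
    for sentence in unlabeled_corpus:
        for token in sentence.split():
            model.encoder[token] = model.encoder.get(token, 0) + 1
    return model


def fine_tune(pretrained, labeled_examples):
    """Step 2: start from the pre-trained parameters and update on labeled data."""
    # Copy so the same pre-trained model can be reused for other tasks.
    model = copy.deepcopy(pretrained)
    for sentence, label in labeled_examples:
        for token in sentence.split():
            # Shared parameters keep changing during fine-tuning.
            model.encoder[token] = model.encoder.get(token, 0) + 1
            # The task head is new and is learned only from labeled data.
            model.head[token] = model.head.get(token, 0) + (1 if label == 1 else -1)
    return model


pretrained = pretrain(["the cat sat", "the dog ran"])
tuned = fine_tune(pretrained, [("the cat", 1), ("the dog", 0)])
```

Copying the pre-trained model before fine-tuning is a design choice of this sketch: it keeps the result of the first step intact, so one pre-trained model can serve as the starting point for several downstream tasks.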
4. Experiments
We evaluate BERT on 11 NLP tasks. The GLUE benchmark includes a diverse set of sentence-level classification tasks.
5. Conclusion
We have presented BERT, a new model for pre-training and fine-tuning NLP systems. Our approach achieves state-of-the-art results on a wide array of sentence-level and token-level tasks.