Attention Is All You Need

2017

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin

Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.

🔒 Full Content Protected

Complete the challenge below to view the full paper content.

📚 Paper Vault

Attention Is All You Need

🔒 Full Content Protected

Verify you're human

Full Paper Content

1. Introduction

2. Background

3. Model Architecture

4. Why Self-Attention

5. Training