Welcome, computer science and information computing enthusiasts!

Tag: Language modelling

GLU Variants Improve Transformer

Noam Shazeer, Google, noam@google.com. Abstract: Gated Linear Units [Dauphin et al., 2016] consist of the component-wise produc……

Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking

Samuel Broscheit, Data and Web Science Group, University of Mannheim, Germany, broscheit@informati……

Exploration Based Language Learning for Text-Based Games

Andrea Madotto, HKUST; Mahdi Namazifar, Uber AI; Joost Huizinga, Uber AI; Piero Molino, Uber AI; Adrien Ecoffet, Uber……

Cross-Lingual Ability of Multilingual BERT: An Empirical Study

Karthikeyan K, Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, Uttar Prade……

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

This entry is part 1 of 1 in the series Language modelling. 17 Sep 2019. Mohammad Shoeybi • Mostofa Patwary • Raul Puri • Patrick LeGresley……