BERT (language model)

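BERT (Bidirectional Encoder Representations from Transformers) is a family of masked language models introduced in 2018 by researchers at Google.^[1]^[2] A 2020 literature survey concluded that "in a little over a year, BERT has become a ubiquitous baseline in Natural Language Processing (NLP) experiments counting over 150 research publications analyzing and improving the model."^[3]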

BERT was originally implemented in English at two model sizes:^[1] (1) BERT_BASE: 12 encoders with 12 bidirectional self-attention heads, totaling 110 million parameters, and (2) BERT_LARGE: 24 encoders with 16 bidirectional self-attention heads, totaling 340 million parameters. Both models were pre-trained on the Toronto BookCorpus (800M words)^[4] and English Wikipedia (2,500M words).
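The parameter totals above can be roughly reproduced with a little arithmetic. The following is a minimal sketch, not the reference implementation. It assumes values that are not stated in this article: hidden sizes of 768 and 1024, a feed-forward width of four times the hidden size, a 30,522-token vocabulary, 512 positions, 2 segment types, and a final pooling layer, as in the publicly released English checkpoints. The names `BertConfig` and `count_parameters` are hypothetical helpers for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertConfig:
    """Illustrative hyperparameters; the defaults are assumptions (see lead-in)."""
    num_layers: int            # number of stacked encoders
    num_heads: int             # self-attention heads per encoder
    hidden_size: int           # assumed: 768 (BASE) or 1024 (LARGE)
    vocab_size: int = 30522    # assumed vocabulary size
    max_positions: int = 512   # assumed maximum sequence length
    type_vocab_size: int = 2   # assumed number of segment types


def count_parameters(cfg: BertConfig) -> int:
    """Count weights and biases of an encoder stack with embeddings and pooler."""
    h = cfg.hidden_size
    ffn_width = 4 * h          # assumed feed-forward width
    layer_norm = 2 * h         # scale and shift vectors

    # Token, position, and segment embeddings, followed by one layer norm.
    embeddings = (cfg.vocab_size + cfg.max_positions + cfg.type_vocab_size) * h
    embeddings += layer_norm

    # Query, key, value, and output projections (weights + biases).
    # The head count only splits these matrices, so it does not change the total.
    attention = 4 * (h * h + h) + layer_norm

    # Two dense layers (h -> ffn_width -> h) plus a layer norm.
    feed_forward = (h * ffn_width + ffn_width) + (ffn_width * h + h) + layer_norm

    # Dense pooling layer applied to the first token's representation.
    pooler = h * h + h

    return embeddings + cfg.num_layers * (attention + feed_forward) + pooler


BERT_BASE = BertConfig(num_layers=12, num_heads=12, hidden_size=768)
BERT_LARGE = BertConfig(num_layers=24, num_heads=16, hidden_size=1024)

for name, cfg in [("BERT_BASE", BERT_BASE), ("BERT_LARGE", BERT_LARGE)]:
    print(f"{name}: {count_parameters(cfg) / 1e6:.1f}M parameters")
```

Under these assumptions the counts come out at about 109.5 million and 335.1 million, close to the rounded figures of 110 million and 340 million quoted above.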

Performance

When BERT was released, it achieved state-of-the-art performance on a number of natural language understanding tasks:

GLUE (General Language Understanding Evaluation) task set (consisting of 9 tasks)
SQuAD (Stanford Question Answering Dataset^[5]) v1.1 and v2.0
SWAG (Situations With Adversarial Generations^[6])

References

[1] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].
[2] "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. 2 November 2018. Retrieved 27 November 2019.
[3] Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What We Know About How BERT Works". Transactions of the Association for Computational Linguistics. 8: 842–866. arXiv:2002.12327. doi:10.1162/tacl_a_00349. S2CID 211532403.
[4] Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books". arXiv:1506.06724 [cs.CV].
[5] Rajpurkar, Pranav; Zhang, Jian; Lopyrev, Konstantin; Liang, Percy (10 October 2016). "SQuAD: 100,000+ Questions for Machine Comprehension of Text". arXiv:1606.05250 [cs.CL].
[6] Zellers, Rowan; Bisk, Yonatan; Schwartz, Roy; Choi, Yejin (15 August 2018). "SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference". arXiv:1808.05326 [cs.CL].
