Vanishing gradient problem

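The vanishing gradient problem is a phenomenon in which, as the derivatives of a neural network's activation functions are multiplied together over and over, the gradient of the output with respect to the weights approaches zero and becomes too small for the weights to be updated.^[1] In the worst case, training of the network can stop altogether.^[1] Proposed remedies include improved activation functions such as ReLU, ResNet, which adds connections that skip layers, and batch normalization.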
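The compounding of activation derivatives can be illustrated numerically. The following is a minimal sketch, not taken from the cited sources. It assumes a chain with one unit per layer, every weight equal to 1, and the same pre-activation value at every layer, so the backpropagated gradient reduces to the activation derivative raised to the power of the depth. All function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained_gradient(local_grad, pre_activation: float, depth: int) -> float:
    """Product of `depth` identical activation derivatives.

    Simplifying assumption: all weights are 1 and every layer sees the
    same pre-activation, so the chain rule gives local_grad(x) ** depth.
    """
    return local_grad(pre_activation) ** depth


depths = (1, 5, 10, 20)

# Best case for the sigmoid (x = 0): the factor is 0.25 per layer.
sigmoid_chain = [chained_gradient(sigmoid_grad, 0.0, d) for d in depths]

# ReLU on a positive input: the factor is exactly 1 per layer.
relu_chain = [chained_gradient(relu_grad, 1.0, d) for d in depths]

for d, s, r in zip(depths, sigmoid_chain, relu_chain):
    print(f"depth={d:2d}  sigmoid={s:.3e}  relu={r:.1f}")
```

Even in the sigmoid's most favorable case, the product shrinks by a factor of four per layer, while the ReLU factor on an active unit stays at 1. This is one way to see why changing the activation function is among the remedies listed above; real networks also involve weight matrices, which this sketch deliberately leaves out.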

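Backpropagation allowed researchers to train supervised deep artificial neural networks from scratch, but early attempts met with little success. In 1991, Sepp Hochreiter formally identified the cause of these failures as the "vanishing gradient problem".^[2]^[3] The problem affects not only multilayer feedforward networks^[4] but also recurrent neural networks.^[5]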

Conversely, when the gradient values are amplified repeatedly instead of shrinking, the exploding gradient problem occurs.
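The contrast between the two regimes can be sketched with a single repeated multiplication. This is an illustrative toy under stated assumptions, not a model of any particular network: each layer is assumed to scale the backpropagated gradient by the same scalar factor, and the function name is hypothetical.

```python
from typing import List


def backprop_gradient_magnitudes(per_layer_factor: float, depth: int) -> List[float]:
    """Gradient magnitude after each of `depth` layers.

    Assumption: every layer multiplies the incoming gradient by the same
    scalar `per_layer_factor` (standing in for weight times activation
    derivative), starting from a gradient of 1.0 at the output.
    """
    grad = 1.0
    magnitudes = []
    for _ in range(depth):
        grad *= per_layer_factor
        magnitudes.append(abs(grad))
    return magnitudes


# Factor below 1: the gradient vanishes as depth grows.
vanishing = backprop_gradient_magnitudes(0.5, 30)

# Factor above 1: the gradient explodes as depth grows.
exploding = backprop_gradient_magnitudes(1.5, 30)

print(f"factor 0.5, depth 30: {vanishing[-1]:.3e}")
print(f"factor 1.5, depth 30: {exploding[-1]:.3e}")
```

With a factor below 1 the magnitude decays geometrically, and with a factor above 1 it grows geometrically, so the same mechanism produces both problems depending on the size of the per-layer factor.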

See also

Spectral radius

References

[1] Basodi, Sunitha; Ji, Chunyan; Zhang, Haiping; Pan, Yi (September 2020). "Gradient amplification: An efficient way to train deep neural networks". Big Data Mining and Analytics 3 (3): 198. doi:10.26599/BDMA.2020.9020004. ISSN 2096-0654. S2CID 219792172.
[2] Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen (PDF) (Diplom thesis). Institut f. Informatik, Technische Univ. Munich.
[3] Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. (2001). "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies". In Kremer, S. C.; Kolen, J. F. A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press. ISBN 0-7803-5369-2.
[4] Goh, Garrett B.; Hodas, Nathan O.; Vishnu, Abhinav (15 June 2017). "Deep learning for computational chemistry". Journal of Computational Chemistry 38 (16): 1291–1307. arXiv:1701.04503. Bibcode:2017arXiv170104503G. doi:10.1002/jcc.24764. PMID 28272810. S2CID 6831636.
[5] Pascanu, Razvan; Mikolov, Tomas; Bengio, Yoshua (21 November 2012). "On the difficulty of training Recurrent Neural Networks". arXiv:1211.5063 [cs.LG].

