t-distributed stochastic neighbor embedding

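t-distributed stochastic neighbor embedding (t-SNE) is a machine learning algorithm for dimensionality reduction. It builds on stochastic neighbor embedding (SNE), which Sam Roweis and Geoffrey Hinton developed in 2002;^[1] the t-distributed variant was proposed later by Laurens van der Maaten and Hinton. t-SNE is a nonlinear dimensionality reduction technique that is particularly useful for visualizing high-dimensional data by reducing it to two or three dimensions. Specifically, t-SNE maps similar data points to nearby points in the two- or three-dimensional space and dissimilar data points to distant points.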

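The t-SNE algorithm proceeds in two stages. First, it constructs a joint probability distribution over pairs of data points, designed so that similar points have a very high probability of being picked together while dissimilar points have a very low probability. Second, it defines a similar probability distribution over the points in the low-dimensional map and moves those points so as to minimize the Kullback-Leibler divergence between the two distributions.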
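The pairwise joint distribution described above can be illustrated with a minimal sketch in plain Python. It is not a full t-SNE implementation: it assumes one fixed Gaussian bandwidth `sigma` for every point (practical implementations tune a bandwidth per point from a perplexity setting), and it only evaluates the Kullback-Leibler divergence for two hand-made two-dimensional layouts instead of optimizing the layout by gradient descent. The function names and the toy data are hypothetical and chosen for illustration only.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def high_dim_joint(points, sigma=1.0):
    """Joint probabilities p_ij over pairs of input points.

    Assumption: a single fixed Gaussian bandwidth `sigma` for all points.
    """
    n = len(points)
    cond = []
    for i in range(n):
        # Gaussian similarity of point i to every other point (zero to itself).
        weights = [
            0.0 if j == i
            else math.exp(-squared_distance(points[i], points[j]) / (2 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)
        cond.append([w / total for w in weights])  # conditional p_{j|i}
    # Symmetrize the conditionals so that the whole matrix sums to 1.
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def low_dim_joint(embedding):
    """Joint probabilities q_ij over map points, using a Student-t kernel (1 d.o.f.)."""
    n = len(embedding)
    weights = [
        [0.0 if i == j else 1.0 / (1.0 + squared_distance(embedding[i], embedding[j]))
         for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def kl_divergence(p, q):
    """KL(P || Q), skipping entries where p_ij is zero."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0.0
    )


# Toy data: two tight clusters in three dimensions.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
p = high_dim_joint(points)

# A layout that keeps each cluster together, and one that splits both clusters.
faithful_map = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0)]
scrambled_map = [(0.0, 0.0), (3.0, 3.0), (0.1, 0.0), (3.1, 3.0)]

kl_faithful = kl_divergence(p, low_dim_joint(faithful_map))
kl_scrambled = kl_divergence(p, low_dim_joint(scrambled_map))
print("KL for faithful layout: ", kl_faithful)
print("KL for scrambled layout:", kl_scrambled)
```

In this toy setting the layout that keeps neighbors together yields the lower divergence, which is the quantity t-SNE reduces when it positions the low-dimensional points.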

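t-SNE is used for data visualization in a wide range of fields, including computer security,^[2] music analysis,^[3] cancer research,^[4] bioinformatics,^[5] and biomedical signal processing.^[6] It is also used to visualize the high-level representations learned by artificial neural networks.^[7]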

Footnotes

[1] Roweis, Sam; Hinton, Geoffrey (January 2002). "Stochastic neighbor embedding" (PDF). Neural Information Processing Systems.
[2] Gashi, I.; Stankovic, V.; Leita, C.; Thonnard, O. (2009). "An Experimental Study of Diversity with Off-the-shelf AntiVirus Engines". Proceedings of the IEEE International Symposium on Network Computing and Applications: 4–11.
[3] Hamel, P.; Eck, D. (2010). "Learning Features from Music Audio with Deep Belief Networks". Proceedings of the International Society for Music Information Retrieval Conference: 339–344.
[4] Jamieson, A.R.; Giger, M.L.; Drukker, K.; Lui, H.; Yuan, Y.; Bhooshan, N. (2010). "Exploring Nonlinear Feature Space Dimension Reduction and Data Representation in Breast CADx with Laplacian Eigenmaps and t-SNE". Medical Physics 37 (1): 339–351. doi:10.1118/1.3267037. PMC 2807447. PMID 20175497.
[5] Wallach, I.; Lilien, R. (2009). "The Protein-Small-Molecule Database, A Non-Redundant Structural Resource for the Analysis of Protein-Ligand Binding". Bioinformatics 25 (5): 615–620. doi:10.1093/bioinformatics/btp035. PMID 19153135.
[6] Birjandtalab, J.; Pouyan, M. B.; Nourani, M. (1 February 2016). "Nonlinear dimension reduction for EEG-based epileptic seizure detection". 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). pp. 595–598. doi:10.1109/BHI.2016.7455968. ISBN 978-1-5090-2455-1. S2CID 8074617.
[7] "Visualizing Representations: Deep Learning and Human Beings". Christopher Olah's blog, 2015.

External links

Visualizing Data Using t-SNE, Google Tech Talk about t-SNE
Implementations of t-SNE in various languages, A link collection maintained by Laurens van der Maaten
