Distributional semantics

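Distributional semantics^[1] is a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data. Its basic idea can be summed up in the so-called distributional hypothesis: linguistic items with similar distributions have similar meanings.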

Distributional hypothesis

The distributional hypothesis in linguistics is derived from the semantic theory of language usage: words that are used and occur in the same contexts tend to have similar meanings.^[2]

The underlying idea that "a word is characterized by the company it keeps" was popularized by John Rupert Firth in the 1950s.^[3]

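The distributional hypothesis is the basis for statistical semantics. Although it originated in linguistics,^[4] it is now receiving attention in cognitive science, especially regarding the context of word use.^[5]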

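In recent years, the distributional hypothesis has provided the basis for the theory of similarity-based generalization in language learning: the idea that children can figure out how to use words they have rarely encountered before by generalizing about their use from the distributions of similar words.^[6]^[7]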

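The distributional hypothesis suggests that the more semantically similar two words are, the more distributionally similar they will be in turn, and thus the more they will tend to occur in similar linguistic contexts.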
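One common way to make the hypothesis concrete is to represent each word by counts of the words that occur near it and to compare those count vectors. The following is a minimal sketch under stated assumptions, not a method taken from the cited sources: the toy corpus, the window size of 2, the helper names `context_vectors` and `cosine`, and the choice of cosine similarity as the distributional similarity measure are all illustrative.

```python
from collections import Counter
from math import sqrt

# Toy corpus (hypothetical, for illustration only).
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs fuel",
]


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Neighbours on both sides, excluding the target word itself.
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = context_vectors(CORPUS)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])
print(f"cat~dog: {sim_cat_dog:.3f}")
print(f"cat~car: {sim_cat_car:.3f}")
```

On this toy corpus, "cat" and "dog" share the contexts "the", "drinks" and "chases" and score about 0.917, whereas "cat" and "car" share only "the" and score 0.5. Raw counts let frequent function words such as "the" dominate the comparison; practical systems usually reweight the counts, which this sketch leaves out for brevity.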

Notes

[1] Lenci, Alessandro; Sahlgren, Magnus (2023). Distributional Semantics. Cambridge University Press. ISBN 9780511783692.
[2] Harris 1954
[3] Firth 1957
[4] Sahlgren 2008
[5] McDonald & Ramscar 2001
[6] Gleitman 2002
[7] Yarlett 2008

Sources

Harris, Z. (1954). "Distributional structure". Word. 10 (2–3): 146–162. doi:10.1080/00437956.1954.11659520.
Firth, J.R. (1957). "A synopsis of linguistic theory 1930-1955". Studies in Linguistic Analysis: 1–32. Reprinted in F.R. Palmer, ed. (1968). Selected Papers of J.R. Firth 1952-1959. London: Longman.
Lenci, Alessandro; Sahlgren, Magnus (2023). Distributional Semantics. Cambridge University Press. ISBN 9780511783692.
Sahlgren, Magnus (2008). "The Distributional Hypothesis" (PDF). Rivista di Linguistica. 20 (1): 33–53. Archived from the original (PDF) on 15 March 2012. Retrieved 10 December 2010.
McDonald, S.; Ramscar, M. (2001). "Testing the distributional hypothesis: The influence of context on judgements of semantic similarity". Proceedings of the 23rd Annual Conference of the Cognitive Science Society. pp. 611–616. CiteSeerX 10.1.1.104.7535.
Gleitman, Lila R. (2002). "Verbs of a feather flock together II". The Legacy of Zellig Harris. Current Issues in Linguistic Theory. Vol. 1. pp. 209–229. doi:10.1075/cilt.228.17gle. ISBN 978-90-272-4736-0.
Yarlett, D. (2008). Language Learning Through Similarity-Based Generalization (PDF) (Thesis). Stanford University. Archived from the original (PDF) on 19 April 2014. Retrieved 12 July 2012.
Rieger, Burghard B. (1991). On Distributed Representations in Word Semantics (PDF) (Report). ICSI Berkeley 12-1991. CiteSeerX 10.1.1.37.7976. Archived from the original (PDF) on 27 April 2024. Retrieved 24 March 2024.
Deerwester, Scott; Dumais, Susan T.; Furnas, George W.; Landauer, Thomas K.; Harshman, Richard (1990). "Indexing by Latent Semantic Analysis" (PDF). Journal of the American Society for Information Science. 41 (6): 391–407. CiteSeerX 10.1.1.33.2447. doi:10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9. Archived from the original (PDF) on 17 July 2012.
Padó, Sebastian; Lapata, Mirella (2007). "Dependency-based construction of semantic space models". Computational Linguistics. 33 (2): 161–199. doi:10.1162/coli.2007.33.2.161. S2CID 7747235.
Schütze, Hinrich (1993). "Word Space". Advances in Neural Information Processing Systems 5. pp. 895–902. CiteSeerX 10.1.1.41.8856.
Sahlgren, Magnus (2006). The Word-Space Model (PDF) (Thesis). Stockholm University. Archived from the original (PDF) on 19 June 2012. Retrieved 26 November 2012.
Landauer, Thomas; Dumais, Susan T. "A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge". Retrieved 2 July 2007.
Lund, Kevin; Burgess, Curt; Atchley, Ruth Ann (1995). Semantic and associative priming in a high-dimensional semantic space. Cognitive Science Proceedings. pp. 660–665.
Lund, Kevin; Burgess, Curt (1996). "Producing high-dimensional semantic spaces from lexical co-occurrence". Behavior Research Methods, Instruments, and Computers. 28 (2): 203–208. doi:10.3758/bf03204766.

External links

Zellig S. Harris
