Feature selection

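Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Stylometry and DNA microarray analysis are two cases where feature selection is used. It should be distinguished from feature extraction.^[1]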

Feature selection techniques are used for several reasons:

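To simplify models so that they are easier for researchers and users to interpret^[2]
To shorten training times^[3]
To avoid the curse of dimensionality^[4]
To improve the data's compatibility with a learning model class^[5]
To encode inherent symmetries present in the input space^[6]^[7]^[8]^[9]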

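The central premise of feature selection is that the data contain some features that are either redundant or irrelevant, and that can therefore be removed without much loss of information.^[10] Redundancy and irrelevance are two distinct notions, since a relevant feature may be redundant in the presence of another relevant feature with which it is strongly correlated.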
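The distinction between redundancy and irrelevance can be illustrated with a small synthetic sketch. Everything here is an assumption made for illustration: the feature names `x1`, `x2`, `x3`, the noise levels, and the use of absolute Pearson correlation as a stand-in for relevance (it only captures linear dependence). `x2` is relevant to the target but nearly a copy of `x1`, so it is redundant; `x3` is simply irrelevant.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 500

x1 = [rng.gauss(0, 1) for _ in range(n)]        # relevant feature
x2 = [2 * v + rng.gauss(0, 0.05) for v in x1]   # relevant, but almost a rescaled copy of x1
x3 = [rng.gauss(0, 1) for _ in range(n)]        # irrelevant: generated independently of y
y = [3 * v + rng.gauss(0, 0.1) for v in x1]     # hypothetical target, driven by x1 only

# Relevance proxy: absolute correlation between each feature and the target.
relevance = {
    name: abs(pearson(col, y))
    for name, col in [("x1", x1), ("x2", x2), ("x3", x3)]
}

# Redundancy proxy: absolute correlation between two features.
redundancy_x1_x2 = abs(pearson(x1, x2))

print(relevance)
print("x1 vs x2:", redundancy_x1_x2)
```

Under these assumptions both `x1` and `x2` score as highly relevant, yet keeping both adds little information because they are strongly correlated with each other, while `x3` can be dropped because it carries no information about the target.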

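Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features. Feature selection techniques are often used in domains where there are many features and comparatively few samples (or data points).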
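The contrast between the two operations can be sketched in a few lines. The data, the column names, and the helper functions `select_features` and `extract_bmi` are hypothetical and exist only for this illustration: selection returns some of the original columns unchanged, while extraction computes a new column as a function of the originals.

```python
# Hypothetical toy dataset: one dict per sample.
rows = [
    {"height_cm": 170.0, "weight_kg": 65.0, "shoe_size": 42.0},
    {"height_cm": 182.0, "weight_kg": 90.0, "shoe_size": 45.0},
]


def select_features(rows, keep):
    """Feature selection: return the chosen original columns, values unchanged."""
    return [{name: row[name] for name in keep} for row in rows]


def extract_bmi(rows):
    """Feature extraction: derive a new feature as a function of the originals."""
    return [
        {"bmi": row["weight_kg"] / (row["height_cm"] / 100) ** 2}
        for row in rows
    ]


selected = select_features(rows, ["height_cm", "weight_kg"])
extracted = extract_bmi(rows)

print(selected)   # a subset of the original columns
print(extracted)  # a column that did not exist in the input
```

Every key in `selected` already exists in the input, whereas `bmi` in `extracted` does not. That is the property that separates a subset from a derived representation.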

References

[1] Sarangi, Susanta; Sahidullah, Md; Saha, Goutam (September 2020). "Optimization of data-driven filterbank for automatic speaker verification". Digital Signal Processing. 104: 102795. arXiv:2007.10729. doi:10.1016/j.dsp.2020.102795. S2CID 220665533.
[2] James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2013). An Introduction to Statistical Learning. Springer. p. 204.
[3] Brank, Janez; Mladenić, Dunja; Grobelnik, Marko; Liu, Huan; Flach, Peter A.; Garriga, Gemma C.; Toivonen, Hannu (2011). "Feature Selection". In Sammut, Claude; Webb, Geoffrey I. (eds.). Encyclopedia of Machine Learning. Boston, MA: Springer US. pp. 402–406. doi:10.1007/978-0-387-30164-8_306. ISBN 978-0-387-30768-8. Retrieved 13 July 2021.
[4] Kramer, Mark A. (1991). "Nonlinear principal component analysis using autoassociative neural networks". AIChE Journal. 37 (2): 233–243. doi:10.1002/aic.690370209. ISSN 1547-5905.
[5] Kratsios, Anastasis; Hyndman, Cody (2021). "NEU: A Meta-Algorithm for Universal UAP-Invariant Feature Representation". Journal of Machine Learning Research. 22 (92): 1–51. ISSN 1533-7928.
[6] Persello, Claudio; Bruzzone, Lorenzo (July 2014). "Relevant and invariant feature selection of hyperspectral images for domain generalization". 2014 IEEE Geoscience and Remote Sensing Symposium (PDF). IEEE. pp. 3562–3565. doi:10.1109/igarss.2014.6947252. ISBN 978-1-4799-5775-0. S2CID 8368258.
[7] Hinkle, Jacob; Muralidharan, Prasanna; Fletcher, P. Thomas; Joshi, Sarang (2012). "Polynomial Regression on Riemannian Manifolds". In Fitzgibbon, Andrew; Lazebnik, Svetlana; Perona, Pietro; Sato, Yoichi; Schmid, Cordelia (eds.). Computer Vision – ECCV 2012. Lecture Notes in Computer Science. Vol. 7574. Berlin, Heidelberg: Springer. pp. 1–14. arXiv:1201.2395. doi:10.1007/978-3-642-33712-3_1. ISBN 978-3-642-33712-3. S2CID 8849753.
[8] Yarotsky, Dmitry (30 April 2021). "Universal Approximations of Invariant Maps by Neural Networks". Constructive Approximation. 55: 407–474. arXiv:1804.10306. doi:10.1007/s00365-021-09546-1. ISSN 1432-0940. S2CID 13745401.
[9] Hauberg, Søren; Lauze, François; Pedersen, Kim Steenstrup (1 May 2013). "Unscented Kalman Filtering on Riemannian Manifolds". Journal of Mathematical Imaging and Vision. 46 (1): 103–120. doi:10.1007/s10851-012-0372-9. ISSN 1573-7683. S2CID 8501814.
[10] Bermingham, Mairead L.; et al. (2015). "Application of high-dimensional feature selection: evaluation for genomic prediction in man". Scientific Reports. 5: 10312. Bibcode:2015NatSR...510312B. doi:10.1038/srep10312. PMC 4437376. PMID 25988841.

External links

Feature Selection Package, Arizona State University (Matlab Code)
NIPS challenge 2003 (see also NIPS)
Naive Bayes implementation with feature selection in Visual Basic, archived 2009-02-14 at the Wayback Machine (includes executable and source code)
Minimum-redundancy-maximum-relevance (mRMR) feature selection program
FEAST (Open source Feature Selection algorithms in C and MATLAB)
