Edwith "Tracing the Thread of Deep Learning Through Papers" Lecture Notes: Step 3 (Regularization)

koos808 2020. 10. 14. 19:56

※ STEP 3 : Regularization to prevent overfitting

Keywords
```
  Regularization

  Overfitting
```
The main purpose of regularization is to avoid overfitting.
- Overfitting means trusting the training data so much that the model fails to predict the test data well.
- Overfitting
  - Literally, fitting the data more than is warranted.
  - Things get worse with noise!
- Noise
  - Stochastic Noise ::: Comes from random measurement error (observation error)
  - Deterministic Noise ::: Cannot model this type of error (the part of the target that the model cannot represent)
  - Real noise is a mixture of these two kinds, so it cannot be perfectly decomposed.
  - Mathematically
    - VC dimension - complexity of the algorithm (e.g. the number of layers in a neural net)
    - in-sample error - training error
    - out-of-sample error - test error
    - The difference between the in-sample error and the out-of-sample error is commonly called the generalization error (gap). The smaller this gap, the better the generalization performance, and good generalization performance is what really matters most!
    - When the generalization error grows, the model is overfitting. In other words, as the VC dimension grows (the more complex the neural net), overfitting becomes more likely.
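
The relationship between in-sample error, out-of-sample error, and model complexity described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the target `y = 2x`, the noise level, and the two toy models (a nearest-neighbor `memorizer` standing in for a high-capacity model, and a fitted `line` standing in for a low-capacity one). None of it comes from the lecture.

```python
import random

def make_data(n, rng):
    """Noisy samples of an assumed target y = 2x (the added term is stochastic noise)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys

def mse(model, xs, ys):
    """Mean squared error of `model` on the given samples."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
train_x, train_y = make_data(30, rng)    # small training set
test_x, test_y = make_data(1000, rng)    # stand-in for unseen data

def memorizer(x):
    """High-capacity model: return the label of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Low-capacity model: least-squares line through the origin.
slope = sum(x * y for x, y in zip(train_x, train_y)) / sum(x * x for x in train_x)

def line(x):
    return slope * x

results = {}
for name, model in [("memorizer", memorizer), ("line", line)]:
    e_in = mse(model, train_x, train_y)    # in-sample (training) error
    e_out = mse(model, test_x, test_y)     # out-of-sample (test) error
    results[name] = (e_in, e_out, e_out - e_in)
    print(f"{name:9s} in-sample={e_in:.3f} out-of-sample={e_out:.3f} gap={e_out - e_in:.3f}")
```

The memorizer reproduces every training label, noise included, so its in-sample error is zero while its out-of-sample error is not. That gap is the overfitting the notes describe.
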
- Preventing overfitting?
  - Approach 1 : Get more data - the most important one!! If data is scarce, use data augmentation to inflate the dataset.
  - Approach 2 : Use a model with the right capacity - choose a model whose capacity suits the problem.
  - Approach 3 : Average many different models (Ensemble) - use an ensemble! Bagging is the representative example. An ensemble typically improves performance by about 1~2%.
  - Approach 4 : Use DropOut, DropConnect, or BatchNorm - these techniques come in after the three approaches above have been tried.
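
Approach 3 above (averaging many models, with Bagging as the example) might look roughly like the following sketch. The one-parameter model `fit_slope`, the helper names, and the synthetic data are all hypothetical, chosen only to keep the example short. The point is the bootstrap resampling and the averaging of predictions.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope of a line through the origin (toy base model)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def bagged_slopes(xs, ys, n_models, rng):
    """Fit one model per bootstrap sample (sampling with replacement)."""
    n = len(xs)
    slopes = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes

def ensemble_predict(slopes, x):
    """Average the predictions of all ensemble members."""
    return sum(s * x for s in slopes) / len(slopes)

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]   # assumed target y = 2x plus noise

slopes = bagged_slopes(xs, ys, n_models=10, rng=rng)
print("member slopes:", [round(s, 3) for s in slopes])
print("ensemble prediction at x=1:", round(ensemble_predict(slopes, 1.0), 3))
```

Each member sees a slightly different dataset, so their individual errors partly cancel when averaged.
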
- Limiting the Capacity
  - Limiting capacity ultimately means shrinking the network, but early stopping also counts as a way of limiting capacity.
  - Architecture : Limit the number of hidden layers and units per layer
  - Early stopping : Stop the learning before it overfits using validation sets
  - Weight-decay : Penalize large weights using penalties or constraints on their squared values (L2 penalty) or absolute values (L1 penalty) - the idea is that we do not want the learned parameters to become too large, so a term is added that keeps the weights from growing too much.
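
Early stopping as described above can be sketched as a simple rule over per-epoch validation losses. The `patience` parameter and the loss values below are hypothetical. A patience window is one common way to decide that the model has started to overfit, not necessarily the one used in the lecture.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation losses.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Never triggered: training ran through every epoch.
    return best_epoch, len(val_losses) - 1

# Hypothetical validation curve: improves, then starts to overfit.
val_losses = [0.90, 0.70, 0.60, 0.55, 0.57, 0.60, 0.66, 0.70]
best, stopped = early_stopping_epoch(val_losses, patience=3)
print(f"best epoch = {best}, stopped at epoch = {stopped}")
```

In practice the weights saved at `best_epoch` are the ones kept, since later epochs only fit the training set more closely.
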
- DropOut
  - DropOut increases the generalization performance of the neural network by restricting the model capacity! - For a given layer, randomly switch off some of its nodes (set them to 0 during training only; use all of them at test time).
  - The advantage is that every mini-batch effectively trains a different architecture.
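
A minimal sketch of the DropOut behavior described above, operating on a plain list of activations. The function name and the `drop_prob` parameter are made up for illustration. The rescaling of surviving activations by `1 / keep_prob` ("inverted dropout") is an assumption, a common convention that lets test time use all nodes unchanged.

```python
import random

def dropout(activations, drop_prob, training, rng=random):
    """Zero each activation with probability `drop_prob` during training only."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training:
        # Test time: every node is used as-is.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Survivors are scaled up so the expected activation is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

layer_output = [1.0, 2.0, 3.0, 4.0]
print("train:", dropout(layer_output, 0.5, training=True, rng=random.Random(0)))
print("test: ", dropout(layer_output, 0.5, training=False))
```

Because a fresh mask is drawn on every call, each mini-batch sees a different thinned network.
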
- DropConnect
  - Instead of turning the neurons off (DropOut), DropConnect disconnects the connections between neurons. - Rather than setting a layer's values to 0, it cuts individual weights (sets them to 0).
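
The difference from DropOut can be made concrete with a sketch that masks individual weights instead of activations. The function below is hypothetical. Its test-time behavior (simply using all weights, with training-time rescaling by `1 / keep_prob`) is a simplifying assumption and not the inference procedure of the original DropConnect paper.

```python
import random

def dropconnect_forward(x, weights, drop_prob, training, rng=random):
    """Compute y = W x, dropping each weight with probability `drop_prob` in training.

    `weights` is a list of rows, one row per output unit.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    outputs = []
    for row in weights:
        total = 0.0
        for w, xi in zip(row, x):
            if training:
                # Mask the connection itself, not the neuron's activation.
                kept = rng.random() < keep_prob
                w = w / keep_prob if kept else 0.0
            total += w * xi
        outputs.append(total)
    return outputs

x = [1.0, 2.0]
W = [[1.0, 0.0],
     [0.5, 0.5]]
print("train:", dropconnect_forward(x, W, 0.5, training=True, rng=random.Random(0)))
print("test: ", dropconnect_forward(x, W, 0.5, training=False))
```
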
- Batch Normalization (important!) -> use it wherever you can; it applies to most problems.
  - In mini-batch learning, each batch is normalized so that its statistics are aligned.
  - Benefits of BN
    1. Increase learning rate (the learning rate can be raised) - BN reduces internal covariate shift. Because the statistics differ across the data (for example from batch to batch), training generally fails when the learning rate is too large. Batch Normalization normalizes this problem away, so training can use a higher learning rate.
    2. Remove Dropout - Dropout may no longer be needed.
    3. Reduce L2 weight decay - L2 weight decay can be reduced or left out.
    4. Accelerate learning rate decay - training still works well when the learning rate is decayed faster.
    5. Remove Local Response Normalization - Local Response Normalization is no longer needed.
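
The per-batch normalization described above can be sketched for a single feature as follows. This is a training-time forward pass only. The names `gamma`, `beta`, and `eps` follow common usage, but the defaults are assumptions, and the running statistics normally used at inference are left out.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)   # biased batch variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# Two hypothetical mini-batches with very different statistics.
batch_a = [1.0, 2.0, 3.0, 4.0]
batch_b = [100.0, 200.0, 300.0, 400.0]
print([round(v, 3) for v in batch_norm(batch_a)])
print([round(v, 3) for v in batch_norm(batch_b)])
```

Both batches map to nearly the same normalized values, which is the sense in which BN aligns statistics across batches.
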
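- Conclusion
  - Overfitting occurs no matter what problem you are solving, so regularization should always be considered. The order to follow:
    1. Data augmentation -> increase the amount of data.
    2. Train while gradually increasing the layer size. When data is scarce, choosing a suitable network architecture is really important.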

Book review - Ian Goodfellow et al.'s Deep Learning book - http://www.deeplearningbook.org/ ::: Regularization, chapter 7

List

1. Parameter Norm Penalties

2. Dataset Augmentation

3. Noise Robustness : to input, weights, and output

4. Semi-Supervised Learning = learning a "representation"

5. Multitask learning

6. Early Stopping

7. Parameter Tying and Parameter Sharing

8. Sparse representation

9. Bagging and Other Ensemble Methods

10. Dropout

11. Adversarial Training

Parameter Norm Penalties
- Many regularization approaches are based on limiting the capacity of models by adding a parameter norm penalty to the objective function.
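
As a rough illustration of adding a parameter norm penalty to the objective, the sketch below defines L2 and L1 penalties and one gradient step on an L2-penalized loss. The coefficient name `lam`, the factor of 0.5 in the L2 term, and the plain-SGD update are assumptions chosen for simplicity.

```python
def l2_penalty(weights, lam):
    """0.5 * lam * sum of squared weights."""
    return 0.5 * lam * sum(w * w for w in weights)

def l1_penalty(weights, lam):
    """lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def regularized_loss(data_loss, weights, lam):
    """Objective = data loss + L2 parameter norm penalty."""
    return data_loss + l2_penalty(weights, lam)

def sgd_step_with_weight_decay(weights, grads, lr, lam):
    """One SGD step on the L2-penalized objective.

    The penalty contributes lam * w to each gradient, pulling weights toward zero.
    """
    return [w - lr * (g + lam * w) for w, g in zip(weights, grads)]

weights = [3.0, -4.0]
print("L2 penalty:", l2_penalty(weights, lam=0.1))
print("L1 penalty:", l1_penalty(weights, lam=0.1))
print("after one step:", sgd_step_with_weight_decay(weights, grads=[0.0, 0.0], lr=0.1, lam=0.5))
```

Even with zero data gradient, the step shrinks every weight, which is how the penalty limits effective capacity.
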
