1. PLM (Pre-trained Language Model)

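Transfer Learning
- Fine-tune a model pretrained on a large dataset on the target task
Self-supervised Learning
- Train on an unlabeled dataset using a supervised learning scheme
- Predict the remaining information from a part of the information
Use self-supervised learning to obtain a good seed for the weight parameters, then apply transfer learning to get much better performance even on a limited dataset.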
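As a rough illustration of "predict the remaining information from a part of the information", the sketch below turns unlabeled tokens into (input, label) pairs by hiding some of them. Token masking is only one possible pretext task and is an assumption here; the `[MASK]` string, the function name, and the `mask_prob` default are all hypothetical choices for the example.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token

def make_masked_example(tokens, mask_prob=0.15, seed=0):
    """Hide some tokens; the hidden tokens become the prediction targets."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)   # the model sees only the placeholder
            labels.append(tok)    # and is asked to recover the original token
        else:
            inputs.append(tok)
            labels.append(None)   # no loss is computed at visible positions
    return inputs, labels

tokens = "the cat sat on the mat".split()
inputs, labels = make_masked_example(tokens, mask_prob=0.5, seed=1)
print(inputs)
print(labels)
```

No human annotation is needed: the labels come from the text itself, which is what lets a supervised training loop run on an unlabeled corpus.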
Transformer
- Proposed by Google in 2017
- Proposed as a replacement for the existing seq2seq architecture
- Architecture built using attention only
  - Outperforms existing RNN-based approaches (e.g., LSTM)
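To make "attention only" a little more concrete, here is a minimal pure-Python sketch of scaled dot-product attention, the core operation of the Transformer. It covers a single head with no masking and no learned projections, and the function names and toy matrices are made up for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

# Toy inputs: 2 positions, dimension 2 (values are arbitrary).
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, w = attention(Q, K, V)
print(w)
print(out)
```

Every position attends to every other position in one step, which is the main contrast with an RNN that has to pass information along the sequence one token at a time.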
PLM in NLP
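- Can we learn a general representation and use it for transfer learning?
- Performance gains through PLMs
  - Feature-based Approach
    - Give the model better input representations
  - Fine-tuning Approach
    - Give the model a better weight parameter seed
Advantages of PLMs
- Performance close to SOTA is easy to reach
Limitations/drawbacks of PLMs
- Just a scale-up race rather than new model architectures or algorithms?
- The resulting environmental damage (higher carbon emissions) and a widening gap between rich and poor
- The model has not learned knowledge of the world; it merely imitates humans.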
Typical NLP input/output
- Many to One
  - Text Classification
- One to Many
  - NLG, Machine Translation
- Many to Many
  - POS Tagging, MRC (Machine Reading Comprehension)
Benchmark Tests
- Benchmark test datasets make it possible to gauge real problem-solving ability and to check a PLM's performance.
- Datasets for quantitative performance evaluation
- GLUE : General Language Understanding Evaluation
  - Textual Entailment
    - MNLI(Multi-Genre Natural Language Inference)
      - entailment classification task
      - Does one sentence follow from the other? Provides a measure of contextual understanding
    - RTE(Recognizing Textual Entailment)
      - binary entailment task
  - Text Comparison
    - QQP(Quora Question Pairs)
      - binary classification task
      - are the two questions semantically equivalent?
    - STS-B(Semantic Textual Similarity Benchmark)
      - score from 1 to 5 denoting how similar the two sentences are
    - MRPC(Microsoft Research Paraphrase Corpus)
      - are the two sentences semantically equivalent?
  - Question Answering
    - QNLI(Question Natural Language Inference)
      - converted to a binary classification task
      - Understanding of the question and of the answer
  - Sentiment Classification
    - SST-2(The Stanford Sentiment Treebank)
      - binary single-sentence classification task
  - Linguistic Acceptability
    - CoLA(Corpus of Linguistic Acceptability)
      - binary single-sentence classification task
      - Does this sentence make sense? → about grasping context
- SQuAD 1.1 & 2.0 (Stanford Question Answering Dataset)
  - predict the answer text span in the passage
  - A test that returns the answer portion of a document
How to Test
- Hugging Face
- ParlAI
Korean Benchmark Test Datasets
- NSMC
  - Sentiment classification of Naver movie reviews
- KLUE (Korean Language Understanding Evaluation)
Wrap-up
- Transfer Learning
  - Fine-tune a model pretrained on a related dataset to achieve higher performance.
  - How to
    1. Set seed weights and train as normal
    2. Fix loaded weights and train unloaded parts
    3. Train with different learning rate on each part
- Self-supervised Learning
  - Train on an unlabeled dataset using a supervised learning scheme
    - Predict the remaining information from a part of the information
  - Can we learn a general representation and use it for Transfer Learning?
- Conclusion
  - Use the Transformer as the backbone to learn a general representation, then fine-tune on the target task based on it.
    1. Select part of the Transformer model (Encoder or Decoder) depending on the target task
    2. Learn a general representation from a large unlabeled corpus → pretrain
    3. Then fine-tune on the target task
  - Pros and cons of PLMs
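
The three "How to" options listed under Transfer Learning above differ only in which parameters are updated and how fast. The framework-agnostic sketch below shows this with a plain SGD step in which each parameter group has its own learning rate. The split into "encoder" and "head", the gradients, and the learning rates are all made-up values for illustration.

```python
def sgd_step(params, grads, lr_by_group):
    """One SGD update; a learning rate of 0.0 keeps that group frozen."""
    return {
        name: [w - lr_by_group[name] * g for w, g in zip(params[name], grads[name])]
        for name in params
    }

# Hypothetical model split: a pretrained "encoder" plus a freshly added task "head".
params = {"encoder": [0.5, -0.2], "head": [0.0, 0.0]}
grads = {"encoder": [1.0, 1.0], "head": [1.0, 1.0]}  # made-up gradients

# 1. Set seed weights and train as normal: everything moves at the same rate.
full = sgd_step(params, grads, {"encoder": 0.1, "head": 0.1})

# 2. Fix loaded weights and train unloaded parts: only the head changes.
frozen = sgd_step(params, grads, {"encoder": 0.0, "head": 0.1})

# 3. Train with a different learning rate on each part.
mixed = sgd_step(params, grads, {"encoder": 0.01, "head": 0.1})

print(full, frozen, mixed, sep="\n")
```

In PyTorch, for instance, option 2 is commonly expressed by setting `requires_grad = False` on the loaded parameters, and option 3 by passing separate parameter groups with their own `lr` to the optimizer.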