[PAPER REVIEW] Diffusion Models Beat GANs on Image Synthesis

by koharin 2023. 7. 31. 22:02

Background

Diffusion Model

A likelihood-based model
Shown to generate high-quality images
Limitation: on difficult datasets such as LSUN and ImageNet, it still generates worse images than GANs

Summary

Motivation

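Limitations of the existing state-of-the-art generative models (GANs)
- GANs capture less diversity than state-of-the-art likelihood-based models
- They are difficult to train
- They collapse unless hyperparameters and regularizers are chosen carefully
⇒ These limitations make GANs hard to scale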
Limitations of likelihood-based models
- Strength: likelihood-based models have raised sample quality to a level close to that of GANs
- Weaknesses
  - Likelihood-based models offer diversity but fall short in visual fidelity
  - Sampling is slower than with GANs
Bringing the strengths of GANs to diffusion models
1. Model architectures in recent GAN research have been heavily explored and refined
  - UNet architecture (Ho et al.)
2. Recent GANs can trade off diversity for fidelity, producing high-quality samples without covering the whole distribution

Contribution

Shows that diffusion models achieve better image sample quality than the existing state-of-the-art generative models (GANs)
Architecture improvement
- Unconditional image generation
Classifier guidance
- Class-conditional
- Trades off diversity for fidelity
- Narrows the sampling-time gap between GANs and diffusion models, even though diffusion sampling requires multiple forward passes
- Combined with upsampling, improves sample quality at high resolution

Approach

1. unconditional image synthesis: model architecture

Finds a better model architecture through an ablation study

2. conditional image synthesis: trade off diversity & fidelity

Uses gradients from a classifier to trade off diversity for fidelity efficiently
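
The idea of guiding with classifier gradients can be sketched as follows. In the paper's guided sampling step, the mean of the Gaussian denoising step is shifted by the classifier gradient, scaled by the step variance and a guidance scale s: x_{t-1} ~ N(μ + s·Σ·∇ log p(y | x_t), Σ). The code below is only a toy illustration under loud assumptions: the "classifier" is a hypothetical logistic model with weights `w` and bias `b`, the covariance is diagonal, and `mu`, `var`, and `x_t` are made-up values rather than the outputs of a real diffusion model.

```python
import math
import random

def classifier_log_prob_grad(x, w, b):
    """Gradient of log p(y=1 | x) w.r.t. x for a toy logistic classifier.

    p(y=1 | x) = sigmoid(w . x + b), so d/dx log p(y=1 | x) = (1 - p) * w.
    """
    logit = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-logit))
    return [(1.0 - p) * wi for wi in w]

def guided_mean(mu, var, grad, scale):
    """Shift the denoising mean: mu + scale * var * grad (diagonal covariance)."""
    return [m + scale * v * g for m, v, g in zip(mu, var, grad)]

def guided_step(mu, var, x_t, w, b, scale, rng):
    """Sample x_{t-1} ~ N(mu + scale * var * grad, var)."""
    grad = classifier_log_prob_grad(x_t, w, b)
    shifted = guided_mean(mu, var, grad, scale)
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(shifted, var)]

# Hypothetical toy values: a 2-D "image" and a classifier that favours large x[0].
x_t = [0.0, 0.0]
mu, var = [0.0, 0.0], [0.1, 0.1]
w, b = [2.0, 0.0], 0.0

grad = classifier_log_prob_grad(x_t, w, b)
weak = guided_mean(mu, var, grad, scale=1.0)     # small shift toward the class
strong = guided_mean(mu, var, grad, scale=10.0)  # larger shift: more fidelity, less diversity
sample = guided_step(mu, var, x_t, w, b, scale=10.0, rng=random.Random(0))
```

A larger `scale` pushes every sample harder toward regions the classifier is confident about, which is the diversity-for-fidelity trade-off described above.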

Main Results

Dataset	ImageNet 128×128	ImageNet 256×256	ImageNet 512×512
FID	2.97	4.59	7.72

Classifier guidance combines well with upsampling diffusion models and further improves FID
Results are reported from two angles: unconditional image synthesis (improved model architecture) and conditional image synthesis (classifier guidance)
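
For readers unfamiliar with the metric: FID (Fréchet Inception Distance) fits a Gaussian to Inception features of real images and another to features of generated images, then measures the Fréchet distance between the two; lower is better. The sketch below is a simplified illustration, not the evaluation code used in the paper. It assumes diagonal covariances (real FID uses full covariance matrices and a matrix square root), and the `real` and `fake` feature vectors are made-up numbers.

```python
import math

def mean_and_var(features):
    """Per-dimension mean and population variance of a list of feature vectors."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n for d in range(dim)]
    return mean, var

def frechet_distance_diag(mean_a, var_a, mean_b, var_b):
    """Frechet distance between two Gaussians with diagonal covariances.

    ||mean_a - mean_b||^2 + sum(var_a + var_b - 2 * sqrt(var_a * var_b))
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    cov_term = sum(a + b - 2.0 * math.sqrt(a * b) for a, b in zip(var_a, var_b))
    return mean_term + cov_term

# Made-up 2-D "features"; real FID uses Inception network activations.
real = [[0.0, 1.0], [2.0, 3.0]]
fake = [[1.0, 1.0], [3.0, 3.0]]

mean_r, var_r = mean_and_var(real)
mean_f, var_f = mean_and_var(fake)
distance = frechet_distance_diag(mean_r, var_r, mean_f, var_f)
same = frechet_distance_diag(mean_r, var_r, mean_r, var_r)
```

Identical feature statistics give a distance of zero, so the scores in the table can be read as how far the generated distribution still is from the real one.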

Discussion

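Limitation

Among generative models, diffusion models are still slower than GANs at sampling time because of their many denoising steps
The forward passes of the diffusion model proposed in the paper take 5-20 times longer than a GAN generator. ⇒ The DDIM sampling process can be reduced to a single step, but that approach still performs worse than existing GANs (the sampling speed gap needs further improvement)
Diffusion models do not learn an explicit latent representation. DDIM encodes images into an implicit latent space, but it is unclear how semantically meaningful this latent representation is compared with that of GANs, Glow, and VAEs
The proposed classifier guidance technique is limited to labeled datasets → it needs to be extended so that it also applies to unlabeled datasets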
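
To make the sampling-cost limitation concrete, the following sketch counts network forward passes in a deterministic DDIM-style sampler (eta = 0) that visits only an evenly strided subset of the training timesteps. It is a toy under stated assumptions: `toy_eps_model` is a hypothetical stand-in for the trained noise-prediction network, the linear beta schedule and its endpoints are assumed defaults, and the step counts are illustrative values.

```python
import math

def make_alpha_bar(num_train_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bar, running = [], 1.0
    for t in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_train_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def ddim_sample(x_T, eps_model, alpha_bar, num_sample_steps):
    """Deterministic DDIM (eta = 0) over an evenly strided subset of timesteps."""
    T = len(alpha_bar)
    stride = T // num_sample_steps
    timesteps = list(range(T - 1, -1, -stride))[:num_sample_steps]
    x, forward_passes = list(x_T), 0
    for i, t in enumerate(timesteps):
        eps = eps_model(x, t)  # one network forward pass per sampling step
        forward_passes += 1
        a_t = alpha_bar[t]
        # After the last step there is no noise left, so alpha_bar is taken as 1.
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        # Predict the clean image, then re-noise it to the previous timestep.
        x0 = [(xi - math.sqrt(1 - a_t) * ei) / math.sqrt(a_t) for xi, ei in zip(x, eps)]
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1 - a_prev) * ei for x0i, ei in zip(x0, eps)]
    return x, forward_passes

def toy_eps_model(x, t):
    """Hypothetical placeholder network: always predicts zero noise."""
    return [0.0 for _ in x]

alpha_bar = make_alpha_bar(1000)
_, passes_long = ddim_sample([0.5, -0.5], toy_eps_model, alpha_bar, num_sample_steps=250)
sample_short, passes_short = ddim_sample([0.5, -0.5], toy_eps_model, alpha_bar, num_sample_steps=25)
```

A GAN generator produces a sample with a single forward pass, whereas the loop above costs one pass per sampling step, which is why reducing the number of steps matters so much for diffusion models.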


License: Attribution, NonCommercial, NoDerivatives
