[Paper] SRGAN Review: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network


슈퍼짱짱 2018. 11. 5. 15:24

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network(SRGAN)

arXiv : 25 May 2017

Paper link: https://arxiv.org/pdf/1609.04802.pdf

Abstract

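Despite the gains in accuracy and speed that faster and deeper CNNs have brought to single image super-resolution, one central problem remains:

: how do we recover the finer texture details at large upscaling factors?

* upscaling: e.g., 4x upscaling produces 16x as many pixels (4x along each dimension).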

Recent work has largely optimized super-resolution methods by minimizing the mean squared reconstruction error (MSE).

The resulting estimates have high peak signal-to-noise ratios (PSNR, a standard metric for evaluating super-resolution), but they lack high-frequency details and are perceptually unsatisfying.

* In other words, the metric used to evaluate super-resolution can be high while the image still does not look high-resolution to the eye.

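The paper proposes SRGAN, a generative adversarial network (GAN) for image super-resolution (SR).
To the authors' knowledge, it is the first framework capable of inferring photo-realistic natural images at 4x upscaling factors.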

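The paper proposes a perceptual loss function that consists of an adversarial loss and a content loss.
The adversarial loss pushes the solution toward natural images using a discriminator network that is trained to distinguish super-resolved images from original photo-realistic images.
The content loss is driven by perceptual similarity instead of similarity in pixel space.
The authors' deep residual network can recover photo-realistic textures from heavily downsampled images.

* In other words, it can restore a low-resolution image to high resolution.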

1. Introduction

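Estimating a high-resolution (HR) image from its low-resolution (LR) counterpart is called super-resolution (SR).
The problem with SR shows up especially at high upscaling factors, where texture detail is missing from the reconstructed image.
The optimization target of SR algorithms is commonly to minimize the MSE between the recovered HR image and the original photo-realistic image.
However, because MSE and PSNR are defined on pixel-wise image differences, their ability to capture perceptually relevant differences, such as high texture detail, is limited.

* In other words, as Figure 2 shows, PSNR does not reflect the perceptual quality of SR.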
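To make this limitation concrete, here is a minimal Python sketch (my own illustration, not code from the paper). It computes PSNR as 10 * log10(MAX^2 / MSE) and compares two made-up 2x2 reconstructions of the same reference: one is uniformly brighter, the other has a checkerboard texture error. The images and helper names are assumptions chosen only for the demo.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized 2D images (lists of rows)."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    return total / count

def psnr(a, b, max_value=255.0):
    """PSNR in dB, defined as 10 * log10(MAX^2 / MSE)."""
    err = mse(a, b)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)

# Made-up reference and two visually different reconstructions.
hr = [[100, 100], [100, 100]]
flat_shift = [[110, 110], [110, 110]]   # uniformly brighter, still smooth
checker = [[110, 90], [90, 110]]        # adds a texture that is not in hr

psnr_flat = psnr(hr, flat_shift)
psnr_checker = psnr(hr, checker)
```

Both reconstructions have the same squared error per pixel, so they receive the same PSNR even though they look different. The metric only measures the size of pixel differences, not their structure.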

What sets this paper apart from previous work is a new perceptual loss that combines the high-level feature maps of the VGG network with a discriminator.

1.1 Related work

1.1.3 Loss functions

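Because MSE is a pixel-wise average loss, its solutions are overly smooth and therefore have poor perceptual quality.

As Figure 3 shows, the MSE-based solution is overly smooth because it averages over the possible solutions, whereas the GAN drives the reconstruction toward natural images and so produces a perceptually more convincing solution.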
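The averaging effect can be seen in a toy example. The sketch below is my own illustration, not code from the paper. It assumes two equally plausible sharp textures for the same row of pixels and shows that the estimate with the lowest expected squared error is their pixel-wise mean, which is a flat gray row. The candidate values and function names are made up.

```python
def mse_optimal_estimate(candidates):
    """Pixel-wise mean of equally likely candidates (minimizes expected squared error)."""
    n = len(candidates)
    return [sum(pixels) / n for pixels in zip(*candidates)]

def expected_mse(estimate, candidates):
    """Average, over all candidates, of the per-pixel mean squared error."""
    total = 0.0
    for cand in candidates:
        total += sum((e - c) ** 2 for e, c in zip(estimate, cand)) / len(cand)
    return total / len(candidates)

# Two made-up, equally plausible high-frequency textures for one row of pixels.
candidate_a = [0, 255, 0, 255]
candidate_b = [255, 0, 255, 0]
candidates = [candidate_a, candidate_b]

blurry = mse_optimal_estimate(candidates)          # flat gray row
loss_blurry = expected_mse(blurry, candidates)
loss_sharp = expected_mse(candidate_a, candidates)
```

Here `loss_blurry` is lower than `loss_sharp`, even though `candidate_a` is a valid sharp texture and `blurry` contains no texture at all. A loss that rewards looking like a natural image has no such incentive to average.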

1.2 Contribution

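GANs move the reconstruction toward regions of the search space that are highly likely to contain natural images.

* Presumably because a GAN is a model that estimates the data distribution!

The paper lists its contributions as follows: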

We set a new state of the art for image SR with high upscaling factors(4X) as measured by PSNR and structural similarity(SSIM) with our 16 blocks deep ResNet(SRResNet) optimized for MSE.
We propose SRGAN which is a GAN-based network optimized for a new perceptual loss. Here we replace the MSE-based content loss with a loss calculated on feature maps of the VGG network, which are more invariant to changes in pixel space.
We confirm with an extensive mean opinion score(MOS) test on images from three public benchmark datasets that SRGAN is the new state of the art, by a large margin, for the estimation of photo-realistic SR images with high upscaling factors(4X).

* In other words, SRResNet sets the best PSNR and SSIM scores, while the GAN-based SRGAN is proposed for perceptual quality and is rated the new state of the art in the MOS test.

2. Method

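The goal of single image super-resolution (SISR) is to estimate a high-resolution, super-resolved image $I^{SR}$ from a low-resolution input image $I^{LR}$.
The ultimate goal is to train a generating function $G$ that, for a given LR input image, estimates its corresponding HR counterpart. With generator parameters $\theta_G$ and $N$ training pairs $(I_n^{HR}, I_n^{LR})$:

$$\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\left(G_{\theta_G}(I_n^{LR}), I_n^{HR}\right)$$

$l^{SR}$: the perceptual loss designed in this paper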

2.1 Adversarial network architecture

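The GAN loss proposed by Goodfellow et al., applied to SR:

$$\min_{\theta_G} \max_{\theta_D} \; \mathbb{E}_{I^{HR} \sim p_{train}(I^{HR})}\left[\log D_{\theta_D}(I^{HR})\right] + \mathbb{E}_{I^{LR} \sim p_G(I^{LR})}\left[\log\left(1 - D_{\theta_D}(G_{\theta_G}(I^{LR}))\right)\right]$$

An image generated by G is judged by D as either a real image or a generated one. Training G to fool D is what distinguishes this approach from SR methods that minimize a pixel-wise error such as MSE.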
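As a rough illustration of this min-max game, the sketch below (my own, not from the paper) estimates the GAN objective E[log D(real)] + E[log(1 - D(generated))] from lists of discriminator outputs. The probability values and the function name are made up. In practice D and G are neural networks trained in alternation.

```python
import math

def gan_value(d_real, d_fake):
    """Sample estimate of E[log D(I_HR)] + E[log(1 - D(G(I_LR)))].

    d_real: discriminator outputs in (0, 1) for real HR images.
    d_fake: discriminator outputs in (0, 1) for generated SR images.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Made-up discriminator outputs.
confident = gan_value(d_real=[0.9, 0.9], d_fake=[0.1, 0.1])  # D separates well
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])     # D cannot tell
```

D tries to raise this value (`confident` is higher than `fooled`), while G tries to lower it by producing images that D rates as real.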

model architecture :

We increase the resolution of the input image with two trained sub-pixel convolution layers as proposed by Shi et al.

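* Super-resolution obviously has to increase the number of pixels, but an image passed through ordinary CNN filters generally keeps or shrinks its spatial dimensions. The sub-pixel convolution appears to be how the pixel count, and therefore the resolution, is increased here. The paper gives it only this single line, so I looked up the sub-pixel convolution algorithm separately.

It comes from a super-resolution paper presented at CVPR 2016 and posted to arXiv in September 2016. (Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network - https://arxiv.org/abs/1609.05158)

I did not read that paper closely and only brought over its key figure.

As far as I can tell, the feature maps computed from the input image are rearranged so that the number of pixels grows: for an upscaling factor r, r^2 feature maps of size H x W are interleaved into one map of size rH x rW.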
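The rearrangement can be sketched in plain Python. This is my own minimal illustration, not the implementation from either paper. It assumes a channel-first layout (C*r*r, H, W) and the same index ordering as PyTorch's `PixelShuffle`, and the function name and toy input are made up.

```python
def pixel_shuffle(features, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Assumed ordering: out[c][h*r + i][w*r + j] = features[c*r*r + i*r + j][h][w].
    """
    channels_in = len(features)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height, width = len(features[0]), len(features[0][0])
    channels_out = channels_in // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for h in range(height):
            for w in range(width):
                for i in range(r):
                    for j in range(r):
                        src = c * r * r + i * r + j
                        out[c][h * r + i][w * r + j] = features[src][h][w]
    return out

# Four 1x1 feature maps with values 0..3 become a single 2x2 map.
features = [[[k]] for k in range(4)]
upscaled = pixel_shuffle(features, 2)  # [[[0, 1], [2, 3]]]
```

No values are invented by this step. The preceding convolution learns to produce r^2 channels per output channel, and the shuffle only moves them into spatial positions. Two such layers with r = 2 give the 4x upscaling used in SRGAN.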

2.2 Perceptual loss function

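The perceptual loss $l^{SR}$ is defined as the weighted sum of a content loss $l_X^{SR}$ and an adversarial loss $l_{Gen}^{SR}$:

$$l^{SR} = l_X^{SR} + 10^{-3}\, l_{Gen}^{SR}$$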

2.2.1 Content loss

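The pixel-wise MSE loss is as follows, where $r$ is the upscaling factor and $W \times H$ is the size of the LR image:

$$l_{MSE}^{SR} = \frac{1}{r^2 W H} \sum_{x=1}^{rW} \sum_{y=1}^{rH} \left(I_{x,y}^{HR} - G_{\theta_G}(I^{LR})_{x,y}\right)^2$$

Although this achieves a high PSNR, the results are overly smooth and lack high-frequency content.
Therefore, instead of the pixel-wise loss, the paper defines a VGG loss based on the ReLU activation layers of the pre-trained 19-layer VGG network.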

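$\phi_{i,j}$: the feature map obtained by the j-th convolution (after activation) before the i-th max-pooling layer within the VGG19 network
The VGG loss is the Euclidean distance between the feature representations (VGG feature maps) of the reconstructed image $G_{\theta_G}(I^{LR})$ and the reference image $I^{HR}$, where $W_{i,j}$ and $H_{i,j}$ are the dimensions of the feature map:

$$l_{VGG/i,j}^{SR} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left(\phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}(G_{\theta_G}(I^{LR}))_{x,y}\right)^2$$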
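The idea of measuring distance in feature space rather than pixel space can be sketched without loading VGG19. In the hypothetical example below (my own, not from the paper), a simple horizontal-gradient function stands in for the VGG feature extractor, and the loss is the mean squared difference between feature maps. The stand-in extractor, the images, and all names are assumptions made for illustration; real VGG features behave differently in detail.

```python
def feature_mse(phi, hr_image, sr_image):
    """Mean squared difference between phi(hr_image) and phi(sr_image)."""
    f_hr, f_sr = phi(hr_image), phi(sr_image)
    total, count = 0.0, 0
    for row_hr, row_sr in zip(f_hr, f_sr):
        for a, b in zip(row_hr, row_sr):
            total += (a - b) ** 2
            count += 1
    return total / count

def identity(image):
    """Pixel space: the feature map is the image itself."""
    return image

def horizontal_gradient(image):
    """Hypothetical stand-in for a VGG feature map: differences of neighbors."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in image]

# Made-up images: sr is hr with every pixel brightened by 5.
hr = [[10, 20, 30], [30, 20, 10]]
sr = [[15, 25, 35], [35, 25, 15]]

pixel_loss = feature_mse(identity, hr, sr)                # 25.0
feature_loss = feature_mse(horizontal_gradient, hr, sr)   # 0.0
```

The brightness shift changes every pixel, so the pixel-wise loss is nonzero, yet the edge structure is untouched and the feature-space loss is zero. This is a small-scale picture of what it means for a feature-based loss to be more invariant to changes in pixel space.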

2.2.2 Adversarial loss

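By trying to fool the discriminator network, the generator is pushed to produce images that resemble natural images.
$l_{Gen}^{SR}$ is defined from the probabilities $D_{\theta_D}(G_{\theta_G}(I^{LR}))$ that the discriminator classifies the reconstructed image as a natural HR image, summed over all $N$ training samples:

$$l_{Gen}^{SR} = \sum_{n=1}^{N} -\log D_{\theta_D}(G_{\theta_G}(I^{LR}))$$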
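A minimal sketch of this loss in Python follows (my own illustration, not code from the paper). It computes the generator's adversarial loss as the sum of -log D(G(I_LR)) over samples and combines it with a content loss using the 10^-3 weight from the paper's perceptual loss. The discriminator outputs and the content loss value are made-up numbers.

```python
import math

def adversarial_loss(d_sr):
    """Sum over samples of -log D(G(I_LR)).

    d_sr: discriminator outputs in (0, 1] for super-resolved images.
    """
    return sum(-math.log(p) for p in d_sr)

def perceptual_loss(content_loss, d_sr, adversarial_weight=1e-3):
    """Content loss plus the weighted adversarial loss."""
    return content_loss + adversarial_weight * adversarial_loss(d_sr)

# Made-up discriminator outputs for two generators.
loss_convincing = adversarial_loss([0.9, 0.8])  # D mostly believes the images
loss_detected = adversarial_loss([0.1, 0.2])    # D spots the fakes
total = perceptual_loss(content_loss=2.0, d_sr=[0.9, 0.8])
```

The loss shrinks toward zero as the discriminator's output approaches 1, so the generator is rewarded for images the discriminator accepts as natural.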

3. Experiments
