Users prefer Guetzli JPEG over same-sized libjpeg

J. Alakuijala, R. Obryk, Z. Szabadka, and J. Wassenberg
Google Research
March 14, 2017

https://arxiv.org/abs/1703.04416v1

Abstract

We report on pairwise comparisons by human raters of JPEG images from libjpeg and our new Guetzli encoder.

Although both files are size-matched, 75% of ratings are in favor of Guetzli.

This implies the Butteraugli psychovisual image similarity metric which guides Guetzli is reasonably close to human perception at high quality levels.

We provide access to the raw ratings and source images for further analysis and study.

1 Introduction

Guetzli is a new JPEG encoder that uses the recently introduced Butteraugli psychovisual similarity metric to make rate/distortion decisions.

Thus, Guetzli produces images that Butteraugli believes to be ‘better’ than standard libjpeg.

We undertook an experiment to see whether humans agree with this assessment. The settings are very simple: we have pairwise comparisons between size-matched images, both in standard JPEG format, and see whether human raters prefer one or the other.

2 Materials and Methods

2.1 Source images

To ensure a meaningful evaluation of compressor quality and output size, we create a 31-image dataset with known source and processing.

The images are publicly available [1].

We reduce JPEG artifacts by capturing images at the highest JPEG quality level using a Canon EOS 600D camera and downsampling the resulting images by 4x4 using Lanczos resampling, as implemented in GIMP.

Photographers often apply unsharp masking to compensate for downsampling, so we also apply it in most of the images before downsampling.

The degree of unsharp masking is chosen arbitrarily, but before any compression experiments.

The images are chosen to cover a wide range of contents, including nature, humans, smooth gradients, high-frequency detail, with relatively thorough coverage of the sRGB gamut.

2.2 Image degradation method

We are testing two compressors which offer a distortion vs size trade-off via a single quality parameter.

Guetzli is designed for high-quality, visually lossless compression, so we choose its quality parameter to be 94, which results in a rate of approximately 2.6 bits per pixel.
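
The rate quoted above is the compressed file size in bits divided by the number of pixels. A minimal Python sketch of that arithmetic follows; the function name and the example dimensions and file size are hypothetical, chosen only so the result lands on the 2.6 bits per pixel mentioned in the text.

```python
def bits_per_pixel(file_size_bytes: int, width: int, height: int) -> float:
    """Return the compressed rate in bits per pixel."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # 8 bits per byte, spread over every pixel of the image.
    return 8 * file_size_bytes / (width * height)


# Hypothetical example: a 1000x1000 image stored in 325,000 bytes.
example_rate = bits_per_pixel(325_000, 1000, 1000)
print(f"{example_rate:.2f} bits per pixel")
```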

For each guetzli-compressed image, we generate a JPEG image for comparison by invoking ImageMagick 6.7 via convert -sampling-factor 1x1 with decreasing quality parameter until the libjpeg output is smaller than the guetzli JPEG.

The final libjpeg JPEG file is generated at the next higher quality level, which guarantees it is at least as large as (and typically larger than) the Guetzli file.
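
The size-matching search described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' script: `match_libjpeg_quality` and the `encoded_size` callback are hypothetical names, and in a real run the callback would wrap the convert invocation and return the size of the file it writes. Here a toy size model stands in for the encoder.

```python
from typing import Callable


def match_libjpeg_quality(
    target_size: int,
    encoded_size: Callable[[int], int],
    start_quality: int = 100,
    min_quality: int = 1,
) -> int:
    """Return the lowest quality whose output is still >= target_size.

    encoded_size(q) is assumed to return the libjpeg file size in bytes
    at quality q, and to shrink as q decreases.
    """
    quality = start_quality
    # Step down until the libjpeg output becomes smaller than the target.
    while quality >= min_quality and encoded_size(quality) >= target_size:
        quality -= 1
    # Step back up to the next higher quality level.
    chosen = quality + 1
    if chosen > start_quality:
        raise ValueError("libjpeg output is smaller than the target at every quality")
    return chosen


# Toy stand-in for the encoder (assumption): 1000 bytes per quality step.
def toy_size(quality: int) -> int:
    return 1000 * quality


chosen_quality = match_libjpeg_quality(89_500, toy_size)
print(chosen_quality, toy_size(chosen_quality))
```

Because the search steps back up by one level, the chosen file can never be smaller than the target, which is the property the text relies on.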

Note that the scale of the resulting libjpeg quality parameters is slightly different; we see a minimum of 83, maximum 93, average 89.4 and median 90.

Both compressors produce a standard-conformant JPEG bitstream.

Before display, we upsample both images by a factor of two using nearest neighbor sampling (i.e. pixel replication) and crop them to 900x900 pixels starting at the top-left corner so that the images fit on our screen.
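
As an illustration of this display preparation, the sketch below replicates each pixel into a 2x2 block and then crops from the top-left corner. It uses plain nested lists so that it has no dependencies; the function names are hypothetical, and the viewer's actual implementation is not described in the text.

```python
from typing import List, Sequence, TypeVar

Pixel = TypeVar("Pixel")


def upsample_nearest_2x(image: Sequence[Sequence[Pixel]]) -> List[List[Pixel]]:
    """Double width and height by replicating every pixel into a 2x2 block."""
    result: List[List[Pixel]] = []
    for row in image:
        doubled = [pixel for pixel in row for _ in range(2)]
        result.append(doubled)
        result.append(list(doubled))  # second copy of the same row
    return result


def crop_top_left(
    image: Sequence[Sequence[Pixel]], width: int = 900, height: int = 900
) -> List[List[Pixel]]:
    """Keep at most width x height pixels, starting at the top-left corner."""
    return [list(row[:width]) for row in image[:height]]


tiny = [[1, 2], [3, 4]]
upsampled = upsample_nearest_2x(tiny)
print(upsampled)
print(crop_top_left(upsampled, width=3, height=2))
```

With a factor of two, the 900x900 displayed crop corresponds to the top-left 450x450 pixels of each compressed image.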

2.3 Viewing environment

To reduce the variability of the comparison results, e.g. due to differences in monitor gamut, panel bit depth and processing/LUTs, we perform all tests on a single monitor. We use a calibrated 27” NEC PA272, which has a 10-bit panel while still being reasonably commercially available.

We choose viewing conditions that match a typical office environment.

2.4 Experiment design

We choose a pairwise comparison model. To avoid the need for a break, we present each of the 31 images once in a single session, which typically lasts 20 to 30 minutes.

We provide a custom OpenGL viewer that alternates between the two compressed variants on the right half of the screen, while displaying the uncompressed original image on the left side.

The presentation order of each image is chosen randomly, and we swap between them at a fixed rate of 0.44 Hz.

To reset perception between the two images, we fade the screen to mid-gray over 250 ms and hold it at gray for 600 ms. We instruct subjects to click on the less preferable image, at the location of the most visible artifact.

To reduce the likelihood of random guesses, we leave open the option of skipping images when there is no discernible difference.

Subjects are asked to sit at a distance of 3-4 picture heights from the screen in order to reduce eye movements when comparing with the original image.

We provide a brief explanation and training, including the example of clicking on a location.

2.5 Subjects

23 raters participated in our experiment. Their age range spans 25-46 years (median 31). 56.5% (13) are women.

All but three subjects report corrected-to-normal vision; one is red-green color blind, one has slightly higher visual acuity and one has slightly lower acuity.

Subjects are recruited via convenience sampling from Google employees working nearby, or known to us.

We attempt to equalize the gender ratio and include several experienced photographers, but most subjects are not experienced in the field of image compression and only informed that they are comparing two compressors.

3 Results

Each subject generated 11 to 31 answers, with a mean of 26 and median of 29.

The raw ratings are listed below. Overall, 75% of decisions were in favor of Guetzli, i.e. the rater decided that the corresponding libjpeg image was worse.

There was only moderate variation among images; the interquartile range is 68%-85%.

In one extreme, only 22% preferred the Guetzli encoding of the ‘cloth’ image, apparently due to loss of detail in the pink goggles. Conversely, raters unanimously preferred the Guetzli encoding of ‘bees’.

The number of ratings for each image was relatively consistent (quartiles 19, 20, 21), which implies raters skipped different images.

Assuming image preferences are ‘uncertain’ if the image was in the lower quartile of number of ratings, discarding those images raises the median preference for Guetzli to 80%.

Note that this analysis method was devised after the data was collected, so it is possible the sampling method is biased towards a particular outcome.

Although discarding images that received the fewest ratings seems reasonable, we must ignore this conclusion and only report the 75% preference from the full dataset, listed below for completeness.
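
For readers who want to reproduce this kind of post-hoc filter, here is a minimal Python sketch. It rests on assumptions the text does not settle: "in the lower quartile" is read as a rating count at or below the first quartile, the quartile is computed with the inclusive method of `statistics.quantiles`, and the input pairs are made-up toy values rather than the study's data.

```python
from statistics import median, quantiles
from typing import List, Tuple


def median_preference_after_filter(images: List[Tuple[int, float]]) -> float:
    """Median preference after dropping images with few ratings.

    images holds (number_of_ratings, guetzli_preference) pairs, where the
    preference is a fraction between 0 and 1.
    """
    counts = [count for count, _ in images]
    # First quartile of the rating counts (assumed definition, see above).
    first_quartile = quantiles(counts, n=4, method="inclusive")[0]
    kept = [pref for count, pref in images if count > first_quartile]
    return median(kept)


# Toy data (hypothetical): (ratings received, share preferring Guetzli).
toy_images = [
    (18, 0.22), (19, 0.60), (20, 0.70), (20, 0.75),
    (21, 0.80), (21, 0.85), (22, 0.90), (23, 1.00),
]
print(median(pref for _, pref in toy_images))
print(median_preference_after_filter(toy_images))
```

A different quartile convention, or a strict versus non-strict cutoff, can change which images are dropped, which is one more reason to treat the filtered figure with caution.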

The ratings are listed in matrix form, one row per image, in decreasing order of total rating decisions.

Each column (corresponding to the rater’s index) indicates which is worse: L for libjpeg, G for Guetzli or blank if the image was skipped.

out-of-focus: LLLLLLGLLLLLLLLLGG LLLL
white-yellow: LLLGLLGLL LLLLLG LLLLGG
 brake-light: LLLLLLLLG LGLGLLLG GL L
 pink-flower: LLLLLLLLL  LGLGG GLLLLL
    wollerau: LLLLGLGLLGLLLLLGLLLLLLL
 red-flowers: LLGGLGLLL LLL LLLL LLGG
   geranium2: LLLLGL LLLLLLGLGLLLLLLL
  green-rose: LGLLLLGGL G LLLG G LLLL
    bicycles: LLLLLLLLLGLLG GLL LLLLL
   blue-rose: LLLGLLGLLLG GLLL L LLLG
  pimpinelli: LLLGGL LL L LL GLLLLLLL
  minerology: LLLLLL LL L GLLG LLLLGG
    red-room: LLLLGLGLGL LL LL GLLLLL
     yellow2: LLLLLL LLGLLLGLL GLLLLL
        stp2: LLLLGLLLLLLLLLLLLLLLLLL
     vflower: LLLLLL LLLLLGLGG LLLLG
        port: LLLLGLLLLGLLLLLGLL LLLL
     station: LGLGLLGGLLGGLLG    LLLL
      yellow: LGLGLLLGLGLLLLG   LLLLG
      bench2: LGLGGLLLLL  GLGL LGLLLG
    red-rose: LLGLLL LGL GGGGG L GL
    geranium: LGLLGL LLGG GG L   GLLL
         stp: LGLLLLGLLGGLLGLL LLLLLL
     rainbow: GGGGLG LGLG G LL G GGLG
       bench: LLLGGLLLLL  GLGL L LLGG
         rgb: LLLLGG LLGLLLGL LLLLLLG
       cloth: GLGGGGGGLLLGG  G G GG G
      lichen: LGLGLLLLL GLG L  L  LGL
        bees: LLLLLLLLL LLL LL  LLLLL
       green: LLLLLL LL LLLLL    LLLG
        hand: GGL GL LL LGGLLLGL L LL
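
The matrix above can be tallied with a few lines of Python. The sketch below assumes each row has the form `name: marks`, counts L (libjpeg judged worse, so Guetzli preferred) and G marks, and ignores blanks; the function names are hypothetical. Only two rows are pasted in as a demonstration, so the overall share it prints covers those two rows, not the full dataset.

```python
from typing import Iterable, Tuple


def tally_row(row: str) -> Tuple[str, int, int]:
    """Return (image name, votes preferring Guetzli, votes preferring libjpeg)."""
    name, _, marks = row.partition(":")
    # 'L' marks libjpeg as the worse image, i.e. a vote for Guetzli.
    return name.strip(), marks.count("L"), marks.count("G")


def guetzli_share(rows: Iterable[str]) -> float:
    """Fraction of all non-skipped decisions that favor Guetzli."""
    for_guetzli = 0
    total = 0
    for row in rows:
        _, prefer_guetzli, prefer_libjpeg = tally_row(row)
        for_guetzli += prefer_guetzli
        total += prefer_guetzli + prefer_libjpeg
    if total == 0:
        raise ValueError("no rating decisions found")
    return for_guetzli / total


sample_rows = [
    "        cloth: GLGGGGGGLLLGG  G G GG G",
    "         bees: LLLLLLLLL LLL LL  LLLLL",
]
for row in sample_rows:
    name, prefer_guetzli, prefer_libjpeg = tally_row(row)
    share = prefer_guetzli / (prefer_guetzli + prefer_libjpeg)
    print(f"{name}: {share:.0%} prefer Guetzli")
print(f"two-row share: {guetzli_share(sample_rows):.0%}")
```

For these two rows the tallies match the extremes reported in the text: 4 of 18 decisions (22%) favor Guetzli for ‘cloth’, and all 19 favor it for ‘bees’.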