Encoding parameters

Introduction

After editing my DV recordings in iMovie I keep the DV-quality sources that are used as input to iDVD. This should allow me to reprocess or re-import them later, taking advantage of future video formats and filters without quality loss. Unfortunately, those files take a lot of space and an iMovie project often needs to be split across several DVDs. Besides, the files can barely be played directly from the DVD.

I have been trying to encode those files to MPEG-4 using, for example, 3IVX or XVID, with very bad results, not even good enough for sharing over the Internet. It should however be possible to achieve encodings that are good enough for archiving, since it works for professional movies. What is different in a DV recording is typically interlacing, bad exposure, a high noise level, low resolution, unsteady shooting...

Before trying to find a way to compensate for those factors and preserve interlacing in a DV encode, I feel that I first need to be able to encode a good source (like a DVD movie). This is where this document starts. I will write down my notes here as I make progress, in the hope that others will find them useful and contribute to them. This document is a work in progress, but there is already a useful conclusion.

Note: since my 800MHz iMac is obviously not a good platform for video encoding experiments, I used an Athlon PC running Linux. All the tools I used should however run under Mac OS X. By the way, if you don't have time to read this page, you should probably just get HandBrake for your Mac!

Evaluation

Having heard a lot about MEncoder and libavcodec (lavc), I started by collecting sets of encoding parameters for lavc in MEncoder. XVID is fast and doesn't seem to have as many parameters to tweak, so I just gave it a try with the settings I expected to give the best results. The aim of this first evaluation is to identify the parameters that should always be used, in order to build a good base for testing.

Test conditions

The source I picked for the test should be a good compromise for a start: not interlaced, with pictures that are not too complex yet not "flat" (somewhere between a movie and a cartoon), and with a few tricky parts, such as swarms and steam. The sequence is about 8 minutes long and there are very few easy parts where bits can be saved, so the default rate of 800 kbit/s is surely stressing the codecs.

According to the "0.2-0.25 bit per pixel" rule, I should be encoding at a bitrate of about 1400 kbit/s to obtain good quality (the sequence is 624x368 px at 25 fps). In any case, most of the parameter sets in this test are meant to be used with high bitrates and some parameters might only be effective in that situation.
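As a quick check of the rule of thumb above, here is a minimal Python sketch of the arithmetic. It assumes that 1 kbit means 1000 bits; the function names are only illustrative.

```python
def bitrate_kbps(width, height, fps, bits_per_pixel):
    """Bitrate in kbit/s that spends `bits_per_pixel` bits on every pixel of every frame."""
    # Assumption: 1 kbit = 1000 bits.
    return width * height * fps * bits_per_pixel / 1000.0


def bits_per_pixel(width, height, fps, kbps):
    """Inverse of bitrate_kbps: bits available per pixel at a given bitrate."""
    return kbps * 1000.0 / (width * height * fps)


# The test sequence is 624x368 at 25 fps.
low = bitrate_kbps(624, 368, 25, 0.20)
high = bitrate_kbps(624, 368, 25, 0.25)
print(f"rule of thumb: {low:.0f} to {high:.0f} kbit/s")
print(f"800 kbit/s gives {bits_per_pixel(624, 368, 25, 800):.3f} bit per pixel")
```

So 1400 kbit/s sits near the top of the recommended range, while 800 kbit/s is well below it.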

MEncoder can be configured to return a value that represents the quality of an encoding: the PSNR, or peak signal-to-noise ratio. This value will be used to rank the different results. The sound is completely disabled during the encoding, since it is not part of the problem I want to solve. Also note that I tried the TURBO option for the first pass but decided not to use it, because it introduced PSNR variations with a magnitude too close to the difference between two sets of parameters.

Results presentation

The following table is meant to give an overview of all sets of parameters in the test while making it easy to compare them. The sets of parameters are given names which come from the original documents where I found them. But to simplify the document organization they are also given an ID number. The parameter names are linked to their respective online documentation in order to simplify navigation in this dense source of information. Since I have also gathered notes about them, the default values link to further unofficial information. Finally, most of the encodings are run in 2 passes, and in those cases where the parameters are not the same in pass 1 and pass 2, I used the "pass1;pass2" notation.

The script used to run the different encodings can be found here.

Column headers: Parameter (notes and doc), Default, Tuxrip_Lavc, basic, good, fast, slow, "reasonably fast, good quality for anime", "Somewhat better, very slow", "Absolutely best but awfully slow", "some other better options", "HP example from the doc".
Sub-headers (second header row): standard, extreme, old extreme-1-, pass 1, pass 2, pass 3.

For each parameter, the default is followed by the non-empty cells of the sets in column order (empty cells, where the default is used, are omitted; "x" marks a flag that is set):

Parameter       Default   Set values
vqmin           2         11
vqscale         0         2;2;2;2; 2;2;2 2;2; 2;2;
vqmax           31        20202020 55
vmax_b_frames   0         11 1 1111111 11
mbd             0         12121 2;2 2222222
v4mv            n.a.      xxxx x;x xxxxxxx
vpass           n.a.      1;21;21;21;2 1;21;21331;21;21;2 1;21;2
vbitrate        800       ;-;-;-;- -;-;- --;-;- ;-;--
vqblur          0.5
vqcomp          0.5       ;0.6 0.60.6
vlelim          0         -4
vcelim          0         7 7 7
lumi_mask       0         0.05
dark_mask       0         0.01
scplx_mask      0
mbcmp           0         2;2 3
precmp          0         2 2 62
cmp             0         2 2 2;2 2226 23
subcmp          0         2 2 2;222626 623
predia          1         3;3 33 3
dia             1         -1 -12-13 3
trell           n.a.      xxx x;x xxxx xxx
cbp             n.a.      x x x x
mv0             n.a.      x x x x
qprd            n.a.      x x
last_pred       0         21;212 323
preme           1         2;2 222
qns             0         2 2
vqdiff          3
qpel            n.a.      x x x
Test sequence 1, results per set (ID 8 is the 3-pass encode and has one PSNR value for pass 2 and one for pass 3):

ID   PSNR @ 800 kb/s   U.time @ 800 kb/s   PSNR @ 1400 kb/s   U.time @ 1400 kb/s
1    38.08             7m15s               41.66              7m18s
2    39.20             27m42s              42.70              28m46s
3    39.46             29m23s              42.96              30m30s
4    39.46             39m24s              43.00              40m31s
5    37.87             3m19s               41.17              3m24s
6    39.75             47m34s              43.20              47m43s
7    39.36             14m32s              42.80              15m22s
8    39.14 / 39.04     94m28s              42.88 / 42.88      92m53s
9    39.48             32m26s              42.90              33m21s
10   39.39             135m26s             42.75              146m40s
11   39.30             95m51s              42.79              100m04s
12   39.21             25m16s              42.73              26m20s
13   38.39             23m27s              42.20              25m15s

In addition to lavc, one set of parameters using XVID was run: max_bframes=1, gmc, trellis, me_quality=6, vhq=4. It was given ID 14:

at 800 kbit/s at 1400 kbit/s

Results analysis

The graphic below helps to compare the results with respect to quality and cost. Here is the GNUPlot source.

At this point I wanted to rank the results and find a set of base parameters to always include. Then I could have chosen a shorter list of parameters to experiment with, and also continued with different kinds of sources. But something is not right with the PSNR values I'm getting.

The 3 pass encode (ID 8 @ 800kb/s), for example, gives a lower PSNR after the third pass compared to the second one. In my opinion, the third one looked better by a very small margin. There is a bigger difference between ID 11 and 6 at 1400kb/s: the reasonably fast encode has a much better PSNR than the far-too-slow one. But I find that ID 11 is sharper than 6 while presenting the same quality level.

So the conclusion so far is that I can't reach a conclusion using just the PSNR. I need to find an alternative method to evaluate encoding quality.

Quality index

Visual evaluation

The best way to evaluate the quality of an encoding should be to use your own subjective judgment, I guess. But I cannot watch all sequences, one after the other, and then rank them. I would have to at least play them two by two and probably step frame by frame through the complicated parts of the movie. That would take far too much time and be too inaccurate.

There is an ImageMagick example in image comparing at http://www.cit.gu.edu.au/~anthony/graphics/imagick6/compare/. In short, you could use the following command to highlight the differences between a frame from an encoded movie and the respective frame from the original material: convert a.png b.png -compose difference -composite -normalize x:. The "normalize" option is used to make sure you can locate the differences even when they are very small. The drawback of normalization is that you will no longer see how big the difference is.

In order to take advantage of normalization without losing the ability to rank the differences, I normalized the mosaic of all the non-normalized differences from the 14 experiments. Picking every fifth frame and animating the result back with MEncoder would have produced a navigable source to study quality through the entire sequence. Unfortunately, this method still doesn't make it possible to rank the subtle artifacts of the best encodes. In addition to this, MEncoder failed to produce a valid movie out of the mosaic.

This method might however become useful when trying to visualize the effect of a specific parameter, so the commands used are provided here...
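The difference/normalize step and the mosaic trick described above can be illustrated with a small Python sketch on tiny grayscale "frames". It is a simplification: the normalization here is a plain min-max stretch, whereas ImageMagick's -normalize also clips a small share of the darkest and brightest pixels. All names and pixel values are made up for the illustration.

```python
def difference(a, b):
    """Per-pixel absolute difference of two equally sized grayscale frames."""
    return [[abs(x - y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def normalize(img, max_value=255):
    """Stretch values linearly so the smallest becomes 0 and the largest max_value."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in img]
    return [[round((v - lo) * max_value / (hi - lo)) for v in row] for row in img]


def mosaic(images):
    """Place frames side by side so that they share one normalization."""
    return [sum((img[r] for img in images), []) for r in range(len(images[0]))]


original = [[100, 100], [100, 100]]
good = [[100, 101], [99, 100]]   # hypothetical encode with tiny errors
bad = [[100, 120], [80, 100]]    # hypothetical encode with large errors

diff_good = difference(original, good)
diff_bad = difference(original, bad)

# Normalized separately, both differences look identical: the magnitude is lost.
print(normalize(diff_good) == normalize(diff_bad))

# Normalized together as a mosaic, the good encode stays dark and the bad one bright.
print(normalize(mosaic([diff_good, diff_bad])))
```

Normalizing the mosaic keeps the differences visible while preserving their relative size between encodes.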

More statistics

So ImageMagick couldn't help with a visual evaluation, but it has more to offer: it can return difference measurements other than PSNR (peak signal-to-noise ratio). The other metrics available are MAE (mean absolute error), MSE (mean squared error), PSE and RMSE (root mean squared error). I am not sure what PSE stands for; since its values in the data below are all multiples of 257 (an 8-bit step on a 16-bit scale), it is most likely a peak error, that is, the largest single-pixel difference in the frame. MSE and RMSE are directly linked to PSNR, so I will only consider PSNR, MAE and PSE in the future. The script I used to compare the different encodings to the original according to the different metrics is available here. It assumes that you have previously extracted the frames from the source as shown before and stored them in a directory called "opng".
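To make the relation between these metrics concrete, here is a hedged Python sketch that computes them for two lists of pixel values. It assumes a 16-bit scale (peak value 65535), which is what the sample numbers further down appear to use. The "PEAK" entry is the largest single error; treating it as the equivalent of PSE is only a guess.

```python
import math

QUANTUM_MAX = 65535  # assumption: values are on a 16-bit scale


def metrics(reference, encoded, peak=QUANTUM_MAX):
    """Return MAE, MSE, RMSE, PSNR and the peak error between two pixel lists."""
    errors = [abs(r - e) for r, e in zip(reference, encoded)]
    n = len(errors)
    mae = sum(errors) / n
    mse = sum(err * err for err in errors) / n
    rmse = math.sqrt(mse)
    psnr = float("inf") if mse == 0 else 20 * math.log10(peak / rmse)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "PSNR": psnr, "PEAK": max(errors)}


def psnr_from_rmse(rmse, peak=QUANTUM_MAX):
    """PSNR in dB follows directly from RMSE, so the two carry the same information."""
    return 20 * math.log10(peak / rmse)


print(metrics([0, 0, 0, 0], [0, 2, 0, 2]))
# RMSE of the first frame in the sample data file of this document:
print(round(psnr_from_rmse(1051.31), 4))
```

Feeding the RMSE column of the sample data into psnr_from_rmse gives back the PSNR column, which is why only one of the two needs to be kept.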

In order to easily compile all the numbers that will be extracted and create readable graphics out of them I used "R" and GNUPlot. The script mentioned above is meant to generate data tables that can be imported into R or GNUPlot for further processing. In the case of ID 5 @ 800kbit it produces a file (a05-800.data) like this:

Test Frame MAE MSE PSE PSNR RMSE
05a 00000001 780.049 1.10525e+06 14392 35.8949 1051.31
05a 00000002 788.262 1.12225e+06 14392 35.8286 1059.37
05a 00000003 669.871 816500 10280 37.2099 903.604
...

In R you can import it with the commands test05b <- read.table("a05-1400.data", header=TRUE) and test05b$Test <- factor(test05b$Test). Then you can, for example, compare the different metrics to PSNR with commands like: plot(test05b$PSNR,test05b$MAE). You can save the graphics to an EPS file by surrounding the plot command with postscript("plot1.eps", horizontal=FALSE, onefile=FALSE ) and dev.off(). You can also generate better graphics with this GNUPlot script; the results are shown below. Obviously, PSE will be an interesting complement to PSNR.

PSNR versus MAE PSNR versus MSE PSNR versus PSE PSNR versus RMSE

In order to import all the statistics from the different encodings into R, you can merge the data files. But there must be only one title line:

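# keep the header line of the first file only
grep '^Test' a01-800.data > t800.data
# append the data lines of every file, skipping their header lines
grep -hv '^Test' a??-800.data >> t800.data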

Then you can import it with t800 <- read.table("t800.data", header=TRUE) and t800$Test <- factor(t800$Test). Setting the "Test" column as a factor allows indexing the data in subsets. You can then use the "tapply" function to compute, for example, the average PSNR of each test instead of an average over all the tests: tapply(t800$PSNR,t800$Test,mean). In practice the "summary" function is used instead, since it provides the minimum and maximum in addition to the mean. The standard deviation can also be computed with tapply(t800$PSNR,t800$Test,sd).
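For readers without R, the per-test grouping can be sketched in Python as well. This is only an illustration of the same idea, not the script used for this document: it groups the rows of a merged data file by the "Test" column and reports min, mean, max and the sample standard deviation (the same n-1 definition as R's sd). The three sample rows are the ones shown earlier.

```python
import statistics
from collections import defaultdict


def per_test_stats(lines, column="PSNR"):
    """Group the rows by the Test column and summarize one metric per test."""
    header = lines[0].split()
    test_idx, col_idx = header.index("Test"), header.index(column)
    groups = defaultdict(list)
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or truncated lines
        groups[fields[test_idx]].append(float(fields[col_idx]))
    return {
        test: {
            "min": min(values),
            "mean": statistics.mean(values),
            "max": max(values),
            "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
        for test, values in groups.items()
    }


sample = [
    "Test Frame MAE MSE PSE PSNR RMSE",
    "05a 00000001 780.049 1.10525e+06 14392 35.8949 1051.31",
    "05a 00000002 788.262 1.12225e+06 14392 35.8286 1059.37",
    "05a 00000003 669.871 816500 10280 37.2099 903.604",
]
stats = per_test_stats(sample)
print(stats["05a"])
```

Passing column="MAE" or column="PSE" gives the same summary for the other metrics.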

With R it is possible to quickly generate relevant graphs of t800.data using commands like plot(t800$Test,t800$PSNR, main="comparison of PSNR variations for all experiments", xlab="Experiment No.", ylab="PSNR"). But GNUPlot generates nicer views with more flexibility, so it will be used instead. Another type of graph that can be interesting shows the frequency of PSNR values instead of just min, max, mean and sd. Isolate the PSNR (for example) with x <- tapply(t800$PSNR,t800$Test,'+') and plot a histogram for the set 08a (for example) with hist(x[["08a"]],nclass=100,prob=TRUE).

New evaluation

Results presentation

Now that I have a unified way to compute the average PSNR, it is easier to include the XVID set (ID 14) and compare it to the rest. Unfortunately, MEncoder doesn't seem to compute the PSNR like ImageMagick does. Either that or I don't compute the average value like MEncoder does. Anyway the results are similar, and the ranking is nearly the same.
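One possible reason for such a discrepancy (an assumption, not something verified here) is the way the average is taken: averaging the per-frame PSNR values is not the same as computing one PSNR from the average squared error of all frames. The sketch below uses made-up per-frame errors on an 8-bit scale only to show that the two numbers differ.

```python
import math


def psnr(mse, peak=255.0):
    """PSNR in dB for a given mean squared error."""
    return 10 * math.log10(peak * peak / mse)


# Hypothetical per-frame mean squared errors: two easy frames and one hard frame.
frame_mse = [4.0, 9.0, 100.0]

mean_of_psnr = sum(psnr(m) for m in frame_mse) / len(frame_mse)
psnr_of_mean = psnr(sum(frame_mse) / len(frame_mse))

print(f"mean of per-frame PSNR: {mean_of_psnr:.2f} dB")
print(f"PSNR of the mean error: {psnr_of_mean:.2f} dB")
```

The mean of per-frame PSNR is never lower than the PSNR of the mean error, because a few bad frames weigh more in the second form. Differences in colour space between the two tools could also play a role.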

set   Min.    Mean    Max.    SD
01a   25.04   33.73   40.76   2.551
02a   26.48   34.19   40.70   2.024
03a   26.22   34.47   40.71   1.969
04a   26.54   34.36   41.68   1.967
05a   25.54   33.45   40.05   2.567
06a   26.30   34.65   41.68   1.927
07a   27.24   34.30   42.99   1.826
08a   26.08   34.15   41.95   1.971
09a   25.98   34.48   41.40   1.986
10a   28.30   34.39   42.84   1.926
11a   26.50   34.34   40.70   1.993
12a   26.48   34.27   40.70   2.005
13a   24.69   33.87   41.30   2.441
14a   28.71   34.01   41.85   1.426
01b   27.86   36.07   41.10   1.801
02b   29.88   36.59   42.19   1.480
03b   29.75   36.80   41.71   1.426
04b   30.21   36.77   42.22   1.418
05b   29.19   35.79   40.99   2.083
06b   29.86   36.95   42.30   1.404
07b   30.32   36.69   42.21   1.425
08b   30.82   36.71   42.21   1.359
09b   29.62   36.77   42.36   1.469
10b   31.18   36.64   42.36   1.373
11b   29.97   36.70   42.42   1.467
12b   29.72   36.66   42.11   1.483
13b   28.54   36.34   42.29   1.676
14b   30.91   36.51   42.20   1.261

Note that, on the graph, the sets have been sorted according to the average PSNR results from the 800kbit run.

The same process is used to produce MAE statistics, and the sets are kept in the PSNR order as before.

set   Min.    Mean    Max.     SD
01a   430.5   940.4   2491.0   273.4321
02a   467.8   894.7   2123.0   195.5090
03a   439.4   872.5   2175.0   187.4920
04a   410.7   878.3   2110.0   186.3319
05a   456.3   973.3   2302.0   279.3477
06a   409.4   852.2   2183.0   178.0003
07a   330.8   881.7   1961.0   173.6005
08a   388.7   873.0   2103.0   181.1457
09a   420.0   866.5   2258.0   187.5298
10a   342.7   856.2   1597.0   177.3255
11a   465.0   884.3   2119.0   192.0117
12a   468.8   887.5   2123.0   192.3748
13a   393.6   929.1   2540.0   254.2437
14a   399.7   899.1   1645.0   144.1259
01b   437.6   723.9   1842.0   146.45086
02b   388.9   684.9   1432.0   109.52070
03b   412.4   672.2   1458.0   103.14029
04b   387.5   671.6   1392.0   102.03105
05b   440.5   755.5   1639.0   182.45569
06b   382.5   659.2   1441.0   99.10351
07b   388.0   677.3   1366.0   103.85231
08b   387.4   666.5   1278.0   95.47843
09b   379.5   671.6   1478.0   106.53219
10b   378.2   670.2   1202.0   96.41297
11b   377.0   678.7   1423.0   107.34539
12b   392.0   679.9   1464.0   109.13061
13b   379.6   705.7   1704.0   132.51930
14b   388.3   685.2   1278.0   92.81201

Note that with MAE, smaller is better.

The same process is used to produce PSE statistics, where smaller is also better.

set   Min.   Mean    Max.    SD
01a   5140   20270   46000   6315.201
02a   4626   18100   53710   5509.423
03a   4369   17800   50630   5397.179
04a   4626   17740   49860   5385.221
05a   4883   20360   48320   6366.190
06a   4369   16820   49860   5072.728
07a   4626   17530   50890   4996.278
08a   5397   20550   50630   6977.532
09a   4369   17150   50120   5136.363
10a   5140   18890   51400   5540.230
11a   4369   17500   53200   5321.470
12a   4112   17860   51400   5463.287
13a   5911   19270   47290   6486.815
14a   4883   17990   36750   4228.322
01b   5140   14710   31870   4038.446
02b   4369   13090   36490   3339.131
03b   3855   12890   34180   3261.226
04b   4112   12830   34180   3237.822
05b   4883   15050   34180   4354.195
06b   4112   12370   34180   3106.595
07b   4369   12790   36240   3183.759
08b   4626   13720   34700   3537.770
09b   4369   12630   35470   3195.715
10b   4883   13860   34180   3636.962
11b   4369   12790   34950   3281.364
12b   4112   12940   34950   3311.374
13b   4883   13550   32130   3649.528
14b   3598   13220   32900   3243.092

While providing interesting details, the candle sticks make it difficult to compare the different quality metrics. I found the quality/cost diagram more practical for that:

Results analysis

PSE

Let's start with a major simplification: PSE is giving a really bad index to 08a compared to 13a or 08b compared to 07b, for example. And that is so obviously wrong when you check the actual sequence that I will not consider PSE any longer.

PSNR

Here is a comparison of the rankings obtained with MEncoder's PSNR and ImageMagick's PSNR...

07a and 11a, like 07b and 11b, 04a and 10a, or 10b and 12b, are permuted but were very close. 04b, on the other hand, was given a lower mark by ImageMagick than by MEncoder. When comparing 04b and 09b I find that 04b seems to have fewer artifacts while 09b looks sharper; it is very close. When comparing 04b and 03b instead, I still find that 04b looks better than 03b, and it is even closer. This leads to the conclusion that the PSNR from MEncoder is more relevant and that PSNR differences of 0.05 dB are not decisive.

MAE

Here is a comparison of the rankings obtained with ImageMagick's PSNR and MAE...

07a and 11a, like 07b and 11b, are permuted but were very close. 08a, 08b, 10a and 10b were given better marks by MAE than by PSNR. When comparing 08a and 02a or 07a, I would say that MAE is fairer than PSNR. Comparing 10b and 07b also leads to the same conclusion. 03b was placed behind 04b and 09b by MAE. When comparing 09b and 03b, MAE is again fairer.

XVID

The results from the XVID test have been ranked quite low. 07, for example, is ranked higher than 14 for a comparable cost. 14 appears to have a very small standard deviation, however. XVID may provide a more homogeneous quality at the expense of average quality. But it seems that a similar effect can be obtained in lavc with 3-pass encoding. In general, it looks like lavc allows you to reach higher quality than XVID... given enough time if needed.

qpel

The effect of qpel is easy to evaluate with 02 and 04, since 04 is like 02 with qpel added. It gives a huge quality boost according to PSNR and MAE, and a visual evaluation reveals that 04 is much sharper. 04a is not as well ranked as 04b; the documentation explains this by the fact that qpel requires more bits and may hurt at low bitrates. Looking at the results of 03, it seems that qpel helps a lot just by itself.

09 and 06 are also very similar, differing only in qpel and predia. They show that qpel can still give a quality boost even with a set of parameters that already yields high quality.

Extended tests

vqscale

13 is the only set using a given bitrate instead of vqscale during pass 1 out of 2. It was meant to be used in a very high quality encode (at a much higher bitrate than 1400kbit) and the result is a disappointment. Let's see what difference vqscale can make here and derive a set called 13v from 13 by adding vqscale=2 in pass 1:

set     User time   PSNR    ImageMagick PSNR                 ImageMagick MAE
                            Min.    Mean    Max.    SD       Min.    Mean    Max.     SD
13a     23m27s      38.39   24.69   33.87   41.30   2.441    393.6   929.1   2540.0   254.24
13va    26m20s      38.40   24.92   33.92   41.68   2.480    399.7   899.1   1645.0   259.35
13b     25m15s      42.20   28.54   36.34   42.29   1.676    379.6   705.7   1704.0   132.52
13vb    27m03s      42.11   28.52   36.29   42.11   1.722    389.9   709.5   1706.0   137.21

It looks like vqscale is not doing any good, except at low bit rates. There is only a significant difference between 13a and 13va where vqscale seems to help, but after a visual check I would say I prefer 13a. Now let's double check this on 09 and derive a set called 09v by removing vqscale from 09:

set     User time   PSNR    ImageMagick PSNR                 ImageMagick MAE
                            Min.    Mean    Max.    SD       Min.    Mean    Max.     SD
09a     32m26s      39.48   25.98   34.48   41.40   1.986    420.0   866.5   2258.0   187.53
09va    25m41s      39.50   26.00   34.48   40.91   1.975    460.0   866.6   2226.0   186.09
09b     33m21s      42.90   29.62   36.77   42.36   1.469    379.5   671.6   1478.0   106.53
09vb    27m36s      42.90   29.86   36.77   42.24   1.452    387.0   671.4   1436.0   105.61

09v costs less for an equal quality with a smaller standard deviation. After a visual comparison of 09vb and 09b I decide to avoid using vqscale.

cbp, mv0

According to an earlier investigation, cbp and mv0 should yield better results but 09 is not using them. Let's derive 09c by adding cbp and mv0 to 09:

set     User time   PSNR    ImageMagick PSNR                 ImageMagick MAE
                            Min.    Mean    Max.    SD       Min.    Mean    Max.     SD
09a     32m26s      39.48   25.98   34.48   41.40   1.986    420.0   866.5   2258.0   187.53
09ca    34m39s      39.49   25.98   34.46   41.69   1.987    408.7   869.9   2258.0   188.1
09b     33m21s      42.90   29.62   36.77   42.36   1.469    379.5   671.6   1478.0   106.53
09cb    35m46s      42.88   29.62   36.74   42.29   1.454    384.0   674.4   1476.0   106.0

The stats are not helping here. After a visual evaluation, 09c appears to be sharper. This must come at some cost: pixel-size "ringing" artifacts, but I decide to keep cbp and mv0 for the sharpness.

vqdiff

The documentation suggests using 2 instead of 3 for vqdiff. Let's derive 09f by setting vqdiff to 2 in 09:

set     User time   PSNR    ImageMagick PSNR                 ImageMagick MAE
                            Min.    Mean    Max.    SD       Min.    Mean    Max.     SD
09a     32m26s      39.48   25.98   34.48   41.40   1.986    420.0   866.5   2258.0   187.53
09fa    32m35s      39.48   25.98   34.47   41.57   1.981    413.6   866.9   2258.0   187.1
09b     33m21s      42.90   29.62   36.77   42.36   1.469    379.5   671.6   1478.0   106.53
09fb    33m34s      42.90   29.62   36.77   41.98   1.467    396.0   671.7   1478.0   106.5

The stats are not helping here either. And like earlier a visual evaluation reveals a sharper picture. This time the artifacts are more important at low bitrate but high bitrate looks ok.

vqcomp

The documentation suggests testing vqcomp in the range [0.5,0.7]. Let's derive 09p6 and 09p7 by setting vqcomp to 0.6 and 0.7 respectively in 09:

set     User time   PSNR    ImageMagick PSNR                 ImageMagick MAE
                            Min.    Mean    Max.    SD       Min.    Mean    Max.     SD
09a     32m26s      39.48   25.98   34.48   41.40   1.986    420.0   866.5   2258.0   187.53
09p6a   32m34s      39.50   26.74   34.42   40.71   1.842    423.1   870.7   2083.0   174.4
09p7a   32m34s      39.49   27.65   34.34   40.89   1.689    476.5   876.5   1864.0   161.9
09b     33m21s      42.90   29.62   36.77   42.36   1.469    379.5   671.6   1478.0   106.53
09p6b   33m31s      42.89   30.27   36.74   42.24   1.404    387.2   673.4   1384.0   101.7
09p7b   33m27s      42.86   30.80   36.70   42.27   1.349    385.1   676.1   1293.0   98.03

Higher vqcomp is reducing standard deviation but it is hard to say more without visual evaluation... As before, it increases sharpness and artifacts but 0.6 seems to be a good balance.

vqmax

A few examples set vqmax to 20, and the documentation even suggests 6. Let's derive 09x2 and 09x6 by setting vqmax to 20 and 6 respectively in 09:

set     User time   PSNR    ImageMagick PSNR                 ImageMagick MAE
                            Min.    Mean    Max.    SD       Min.    Mean    Max.     SD
09a     32m26s      39.48   25.98   34.48   41.40   1.986    420.0   866.5   2258.0   187.53
09x2a   32m34s      39.48   25.98   34.48   41.40   1.986    420.0   866.5   2258.0   187.5
09x6a   34m26s      39.28   26.06   34.32   41.66   1.967    409.4   879.6   2195.0   186.4
09b     33m21s      42.90   29.62   36.77   42.36   1.469    379.5   671.6   1478.0   106.53
09x2b   33m33s      42.90   29.62   36.77   42.36   1.469    379.5   671.6   1478.0   106.5
09x6b   33m35s      42.90   29.69   36.77   42.14   1.464    391.5   671.9   1478.0   106.4

vqmax=20 made no difference at all (it produced an identical .avi). vqmax=6 seems to worsen both PSNR and MAE at low bitrate but, when comparing visually, it shows far fewer ringing artifacts as well as a blurrier picture. At higher bitrate, on the other hand, I wouldn't use this parameter. vqmax=20 is probably a reasonable and harmless limit to use.

precmp

Since cmp is set to 2 for 09, let's try precmp=2 and call it 09pre:

set      User time   PSNR    ImageMagick PSNR                 ImageMagick MAE
                             Min.    Mean    Max.    SD       Min.    Mean    Max.     SD
09a      32m26s      39.48   25.98   34.48   41.40   1.986    420.0   866.5   2258.0   187.53
09prea   33m8s       39.53   25.99   34.51   41.32   1.985    422.5   863.2   2256.0   186.9
09b      33m21s      42.90   29.62   36.77   42.36   1.469    379.5   671.6   1478.0   106.53
09preb   34m10s      42.93   29.63   36.79   42.08   1.466    393.8   670.3   1475.0   106.3

That was a cheap and tangible improvement.

dia

Let's try 1 instead of -1 for 09 and call it 09dia:

set      User time   PSNR    ImageMagick PSNR                 ImageMagick MAE
                             Min.    Mean    Max.    SD       Min.    Mean    Max.     SD
09a      32m26s      39.48   25.98   34.48   41.40   1.986    420.0   866.5   2258.0   187.53
09diaa   26m42s      39.33   26.50   34.36   40.70   1.981    429.3   878.6   2118.0   188.0
09b      33m21s      42.90   29.62   36.77   42.36   1.469    379.5   671.6   1478.0   106.53
09diab   27m45s      42.81   29.74   36.71   42.24   1.469    386.0   675.9   1460.0   107.1

Oops.

B frames

According to the investigation mentioned earlier, 1 B frame is good but none can be better on anime. Let's derive 09xb by removing B frames from 09:

set     User time   PSNR    ImageMagick PSNR                 ImageMagick MAE
                            Min.    Mean    Max.    SD       Min.    Mean    Max.     SD
09a     32m26s      39.48   25.98   34.48   41.40   1.986    420.0   866.5   2258.0   187.53
09xba   22m32s      38.67   25.10   34.12   40.87   2.467    434.0   905.6   2483.0   253.9
09b     33m21s      42.90   29.62   36.77   42.36   1.469    379.5   671.6   1478.0   106.53
09xbb   23m27s      42.28   28.71   36.41   41.97   1.707    397.0   700.1   1673.0   134.1

B frames are good.

vlelim, vcelim

I read once that vlelim=-4 and vcelim=7 was recommended by the Joint Video Team. Let's test this by deriving 09m:

set     User time   PSNR    ImageMagick PSNR                 ImageMagick MAE
                            Min.    Mean    Max.    SD       Min.    Mean    Max.     SD
09a     32m26s      39.48   25.98   34.48   41.40   1.986    420.0   866.5   2258.0   187.53
09ma    32m55s      39.41   26.23   34.38   41.64   2.016    410.9   875.4   2189.0   192.7
09b     33m21s      42.90   29.62   36.77   42.36   1.469    379.5   671.6   1478.0   106.53
09mb    33m57s      42.84   29.56   36.70   41.99   1.477    398.1   676.7   1483.0   108.6

Not a good move.

New selection

Putting all this together would be like vmax_b_frames=1:mbd=2:v4mv:trell:precmp=2:cmp=2:subcmp=2:dia=-1:predia=1:last_pred=2:preme=2:vqmax=20:vqcomp=0.6:cbp:mv0. Let's give it ID 15. Since I am not sure about vqcomp=0.6, I also run another test without it: 15p...

set     User time   PSNR    ImageMagick PSNR                 ImageMagick MAE
                            Min.    Mean    Max.    SD       Min.    Mean    Max.     SD
09a     32m26s      39.48   25.98   34.48   41.40   1.986    420.0   866.5   2258.0   187.53
15a     27m52s      39.57   26.83   34.43   42.35   1.828    378.7   871.6   2038.0   174.7
15pa    28m13s      39.56   26.07   34.50   42.04   1.959    393.0   866.0   2212.0   185.3
09b     33m21s      42.90   29.62   36.77   42.36   1.469    379.5   671.6   1478.0   106.53
15b     30m00s      42.91   30.39   36.73   42.23   1.389    387.2   674.7   1376.0   101.7
15pb    30m01s      42.93   30.06   36.77   42.28   1.432    384.1   672.2   1429.0   104.2

This set of parameters gives good quality in a very reasonable time. vqcomp=0.6 is good to use at low bitrate but in general it is better without. So the following tests based on 15 will use vqcomp=0.6 at 800 kbit/s but not at 1400 kbit/s.

Computing PSNR and MAE statistics with ImageMagick takes most of the test processing time, so better use should be made of the psnr*.log file. This file even contains the quantizer of each frame, and it can be interesting to examine its frequency. In order to import it into R, it is enough to give it a header and strip the commas with this command: sed -e 's/,/ /g' psnr_235453.log.
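The same clean-up and a quantizer frequency count can be sketched in Python. The column layout is an assumption: the sketch supposes that each log line holds comma-separated fields with the frame number first and the quantizer second, so the quantizer_column argument has to be adjusted to the real file. The sample lines are invented for the illustration.

```python
from collections import Counter


def parse_log_line(line):
    """Replace commas by spaces and split, like the sed command does."""
    return line.replace(",", " ").split()


def quantizer_histogram(lines, quantizer_column=1):
    """Count how many frames were encoded with each (rounded) quantizer value."""
    counts = Counter()
    for line in lines:
        fields = parse_log_line(line)
        if len(fields) <= quantizer_column:
            continue  # skip blank or truncated lines
        counts[round(float(fields[quantizer_column]))] += 1
    return counts


# Hypothetical log lines; the real layout of psnr*.log may differ.
sample = [
    "     1,  2.00,  12345, 45.10, 47.20, 47.90, 45.60 I",
    "     2,  6.00,   3456, 40.20, 44.10, 44.80, 41.00 P",
    "     3,  2.00,   2345, 44.90, 47.00, 47.50, 45.40 B",
]
histogram = quantizer_histogram(sample)
for quantizer in sorted(histogram):
    print(quantizer, histogram[quantizer])
```

A histogram like this shows at a glance how many frames would be affected by a given vqmax.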

last_pred

According to documentation, last_pred should be chosen between 1 and 3. Let's check values from 0 to 4 by deriving 15 into 15L0, ..., 15L4:

set     User time  PSNR   Log PSNR                      ImageMagick MAE                 Log quantizer
                          Min.   Mean   Max.   SD      Min.   Mean   Max.    SD      Min.  Mean   Max.  SD
15L0a   26m38s     39.51  32.05  39.99  49.95  2.051   458.3  873.5  2047.0  176.1   2     6.829  16    2.015
15L1a   26m41s     39.53  32.11  40.00  51.75  2.044   393.4  872.8  2037.0  175.2   2     6.803  16    2.008
15a     27m52s     39.57  -      -      -      -       378.7  871.6  2038.0  174.7   -     -      -     -
15L3a   29m07s     39.60  32.13  40.07  49.97  2.030   469.9  867.2  2034.0  174.7   2     6.721  15    1.982
15L4a   30m55s     39.62  32.15  40.09  49.57  2.014   469.4  865.3  2031.0  173.2   2     6.703  15    1.974
15L0b   28m34s     42.88  36.00  43.22  52.12  1.659   388.6  674.4  1435.0  105.9   2     4.158  9     1.336
15L1b   28m59s     42.89  36.00  43.22  51.72  1.656   383.5  674.0  1435.0  105.6   2     4.152  9     1.333
15pb    30m01s     42.93  -      -      -      -       384.1  672.2  1429.0  104.2   -     -      -     -
15L3b   31m21s     42.94  36.27  43.26  51.45  1.640   395.7  671.7  1407.0  104.0   2     4.122  9     1.328
15L4b   33m07s     42.96  36.47  43.28  51.65  1.637   396.2  670.9  1375.0  103.4   2     4.112  8     1.324

last_pred=4 is making a very small difference but is not as expensive as the doc said. 3 would be enough but 4 is safe.

preme

According to documentation, preme has to be chosen between 0 and 2. Let's check the other values by deriving 15 into 15m0 and 15m1:

set     User time  PSNR   Log PSNR                      ImageMagick MAE                 Log quantizer
                          Min.   Mean   Max.   SD      Min.   Mean   Max.    SD      Min.  Mean   Max.  SD
15m0a   26m35s     39.49  32.02  39.98  49.98  2.080   459.0  874.3  2051.0  177.5   2     6.855  16    2.025
15m1a   26m40s     39.51  32.14  39.99  51.76  2.056   395.2  873.5  2033.0  176.1   2     6.844  15    2.022
15a     27m52s     39.57  -      -      -      -       378.7  871.6  2038.0  174.7   -     -      -     -
15m0b   28m52s     42.87  35.99  43.21  52.10  1.664   389.3  674.8  1436.0  106.1   2     4.172  9     1.340
15m1b   28m54s     42.87  35.99  43.21  52.11  1.661   390.1  674.7  1436.0  106.1   2     4.167  9     1.339
15pb    30m01s     42.93  -      -      -      -       384.1  672.2  1429.0  104.2   -     -      -     -


preme=2 is best.

vqmax again

Now that I have started to look into statistics for the quantizer, I'm curious to experiment again with vqmax. Since the maximum we reach in this example is 16, there is no doubt left that vqmax=20 has no effect. Let's check the quantizer frequency of the two best sets, 15L4a and 15L4b, with and without limiting it at 10 and 6 respectively:

set      User time  PSNR   Log PSNR                      ImageMagick MAE                 Log quantizer
                           Min.   Mean   Max.   SD      Min.   Mean   Max.    SD      Min.  Mean   Max.  SD
15L4a    30m55s     39.62  32.15  40.09  49.57  2.014   469.4  865.3  2031.0  173.2   2     6.703  15    1.974
15L4qa   31m03s     39.63  32.54  40.10  49.79  2.016   469.4  865.1  1947.0  173.5   2     6.669  10    1.897
15L4b    33m07s     42.96  36.47  43.28  51.65  1.637   396.2  670.9  1375.0  103.4   2     4.112  8     1.324
15L4qb   33m13s     42.96  36.53  43.28  51.75  1.630   383.3  671.1  1370.0  103.3   2     4.104  6     1.300

Setting vqmax to avoid a few frames with a higher quantizer seems to have little impact on the rest of the movie. Unfortunately it is difficult to choose an appropriate value before encoding the movie. Let's assume that vqmax=12 is a more relevant value than 20.

qns

Now that I have turned everything to the max, let's experiment with qns. 15L4q will be used to derive 16x, 16y and 16z using qns equal to 1, 2 and 3 respectively.

set      User time  PSNR   Log PSNR                      ImageMagick MAE               Log quantizer
                           Min.   Mean   Max.   SD      Min.   Mean   Max.  SD      Min.  Mean   Max.  SD
16xa     82m43s     39.52  31.9   40     50.2   2.068   470.1  848    2000  169.8   2     6.199  10    1.802
16ya     96m14s     39.51  31.99  40     50.01  2.088   467.3  849.3  1985  170.8   2     6.283  10    1.823
16za     131m21s    39.50  31.98  40     50.17  2.091   467.5  849.4  1983  171.2   2     6.297  10    1.828
15L4qa   31m03s     39.63  32.54  40.10  49.79  2.016   469.4  865.1  1947  173.5   2     6.669  10    1.897
16xb     93m2s      42.84  36.74  43.18  51.73  1.664   380.1  665.5  1341  101.1   2     3.903  6     1.272
16yb     110m56s    42.88  36.37  43.22  51.81  1.692   368.9  663.7  1364  102.1   2     3.952  6     1.277
16zb     147m44s    42.88  36.37  43.23  51.78  1.689   397    663.3  1365  102     2     3.959  6     1.277
15L4qb   33m13s     42.96  36.53  43.28  51.75  1.630   383.3  671.1  1370  103.3   2     4.104  6     1.300

It is an interesting case: PSNR says that qns is bad but MAE and the average quantizer say it is good. The cost is for sure very high but after a visual check I would say that qns can do almost the same magic as qpel.

Conclusion

In order to further improve the result, it would be time to systematically test variations of all *dia and *cmp. This would be very time consuming and I would rather focus on that when encoding actual DV. The aim for now is to gather very good (yet fast) sets of parameters that worked on different styles of video and use that base for further experiments on DV.

The set of parameters that was finally selected for this type of video was... vmax_b_frames=1:mbd=2:v4mv:trell:precmp=2:cmp=2:subcmp=2:dia=-1:predia=1:last_pred=4:preme=2:cbp:mv0:vqcomp=0.6:vqmax=12.

You may add qns=1 if you have time!
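As an illustration of how the selected set could be turned into a two-pass encode, here is a hedged Python sketch that only builds the MEncoder command lines (it does not run them). The file names are placeholders, and vcodec=mpeg4 is an assumption since the codec is not spelled out in the parameter list above. The bitrate is given in both passes, in line with the earlier finding that vqscale in pass 1 brings nothing.

```python
# The parameter set selected in this document.
LAVC_OPTIONS = (
    "vmax_b_frames=1:mbd=2:v4mv:trell:precmp=2:cmp=2:subcmp=2:dia=-1:predia=1"
    ":last_pred=4:preme=2:cbp:mv0:vqcomp=0.6:vqmax=12"
)


def mencoder_command(source, output, bitrate, vpass):
    """Build the argument list for one pass of a two-pass lavc encode."""
    # Assumption: MPEG-4 video; "psnr" asks lavc to report the PSNR.
    opts = f"vcodec=mpeg4:vbitrate={bitrate}:vpass={vpass}:psnr:{LAVC_OPTIONS}"
    # The first pass only collects statistics, so its video output is discarded.
    target = "/dev/null" if vpass == 1 else output
    return ["mencoder", source, "-nosound", "-ovc", "lavc",
            "-lavcopts", opts, "-o", target]


# Placeholder file names.
for vpass in (1, 2):
    print(" ".join(mencoder_command("source.avi", "encoded.avi", 800, vpass)))
```

The resulting lists can be passed to subprocess.run as they are. To try qns, append ":qns=1" to the option string.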

Noise

lavc parameters are of course important to the encoding quality, but in some situations they need extra help to make the source easier to encode. Noise is an example of detail that can easily confuse the encoder and waste its efforts. It is recommended to filter out noise before encoding and add some back later during playback instead. The drawback is that you risk filtering out small details as well.

There must be some optimum compromise between the details that you filter out and the details that you save with a better encoding quality. Since I have no idea how to use the denoise filter, I could run a brute-force evaluation of all combinations in a given range. The problem is that MEncoder calculates the PSNR with the filtered original as a reference: the stronger the filter, the higher the PSNR. Imagemagick will allow me to calculate MAE using instead the unfiltered original as a reference. The material used for the test is similar to the one I evaluated before, with the difference that the source has lots of analogue film noise and artifacts.
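The brute-force evaluation can be organized as a simple grid over the filter settings. The Python sketch below only generates the hqdn3d option strings in the luma:chroma:time form used in this document and picks the setting with the lowest MAE from a result table. The ranges and the MAE numbers are placeholders, not measurements.

```python
from itertools import product


def hqdn3d_grid(luma_values, chroma_values, time_values):
    """All hqdn3d settings to try, as luma:chroma:time option strings."""
    return [
        f"hqdn3d={luma}:{chroma}:{time}"
        for luma, chroma, time in product(luma_values, chroma_values, time_values)
    ]


def best_filter(mean_mae_by_filter):
    """Setting with the lowest mean MAE against the unfiltered original."""
    return min(mean_mae_by_filter, key=mean_mae_by_filter.get)


# Hypothetical ranges for the grid.
filters = hqdn3d_grid(range(0, 3), range(0, 2), range(3, 6))
print(len(filters), "combinations, starting with", filters[0])

# Placeholder results; in practice one encode per setting fills this table.
results = {"hqdn3d=0:0:3": 690.0, "hqdn3d=0:0:4": 680.0, "hqdn3d=2:1:3": 685.0}
print("best:", best_filter(results))
```

Each string can be passed to MEncoder through its -vf option, and the MAE is then measured against frames extracted from the unfiltered source.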

In order to run the tests, I used a script. And in order to extract the statistics, I wrote a little C program that uses libmba to parse PSNR logs and uses the Imagemagick library to calculate MAE. The entire statistics extraction is run with a script.

High bitrate

At 1400 kbit/s, noise can be encoded correctly without much quality loss. Here we can find out if there is a good compromise with a very light filter.

Given time

Obviously, you shouldn't use luma=1. It looks like the best results are obtained for the smallest values of luma and chroma.

Given chroma

It seems that 0:0:4 will be best, even if there is a local minimum for a stronger filter.

Given luma

hqdn3d=0:0:4 is best!

Low bitrate

At 800 kbit/s, noise may need to be removed to prevent quality loss. Here we can find out if there is a possibility to compromise with the lightest possible filter.

Given chroma

You should still not use luma=1.

Given luma

hqdn3d=0:0:5 is best in this case, almost even with hqdn3d=0:0:4.

Recommendation

Chroma should be kept as low as possible in this case, and luma=1 should not be used. hqdn3d=0:0:4 gives the best results, and even 0:0:5 can be used to help the encoding further. There is a local optimum at 2:1:3, which is worth trying too.

Unfortunately, after I ran the same test on other types of cartoons, I realized that finding an optimum for hqdn3d=0:0:4 was probably just an exception. In general you would probably just find an optimum for 0:0:0! I believe that the recommendations above stay valid anyway: even when hqdn3d=0:0:4 is not an optimum it has at least a very low impact. One last note: hqdn3d=0:0:6 seems to be best on de-interlaced sources.