Introduction
After editing my DV recordings in iMovie I keep the DV-quality sources that are used as input to iDVD. This should allow me to reprocess or import them later, taking advantage of future video formats and filters without quality loss. Unfortunately, those files take a lot of space and an iMovie project often needs to be split across several DVDs. Besides, the files can barely be played directly from the DVD.
I have been trying to encode those files to MPEG4 using, for example, 3IVX or XVID with very bad results, not even good enough for sharing over the Internet. It should be possible, however, to achieve encodes that are good enough for archiving, since it works for professional movies. What typically differs in a DV recording is interlacing, bad exposure, high noise level, low resolution, unsteady shooting...
Before trying to find a way to compensate for those factors and preserve the interlacing of a DV encode, I feel that I first need to be able to encode a good source (like a DVD movie). This is where this document starts. I will write down my notes here as I make progress, in the hope that others will find them useful and contribute to them. This document is a work in progress, but there is already a useful conclusion.
Note: since my 800MHz iMac is obviously not a good platform for video encoding experiments, I used an Athlon PC running Linux. All the tools I used should however run under Mac OS X. By the way, if you don't have time to read this page, you should probably just get Handbrake for your Mac!
Evaluation
Having heard a lot about MEncoder and libavcodec (lavc), I started by collecting sets of encoding parameters for libavcodec in MEncoder. XVID is fast and doesn't seem to have as many parameters to tweak, so I just gave it a try with the settings I expected to give the best results. The aim of the first evaluation is to point out the parameters that should systematically be used in order to build a good base for testing.
Test conditions
The source I picked for the test should be a good compromise for a start: not interlaced, with pictures that are not too complex yet not "flat" (somewhere between a movie and a cartoon), and with a few tricky parts, swarms and steam for example. The sequence is about 8 minutes long and there are very few easy parts where bits can be saved, so the default 800 kbit/s bitrate is surely stressing the codecs.
According to the "0.2-0.25 bit per pixel" rule, I should be encoding with a bitrate of about 1400 kbit/s to obtain good quality (the sequence is 624x368 px at 25 fps). In any case, most of the parameter sets in this test are meant to be used with high bitrates and some parameters might only be effective in that situation.
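The arithmetic behind that rule can be sketched as follows. This is only an illustration: the function name is made up for the example, and it assumes that 1 kbit means 1000 bits.

```python
# Bits-per-pixel rule of thumb: bitrate = bpp * width * height * fps.
# Assumption: 1 kbit = 1000 bits.

def bitrate_kbps(width, height, fps, bits_per_pixel):
    """Return the bitrate in kbit/s suggested by a bits-per-pixel target."""
    return width * height * fps * bits_per_pixel / 1000.0

# The test sequence is 624x368 at 25 fps.
low = bitrate_kbps(624, 368, 25, 0.20)
high = bitrate_kbps(624, 368, 25, 0.25)
print(f"{low:.0f} to {high:.0f} kbit/s")
```

With these numbers, 1400 kbit/s sits near the upper end of the suggested range.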
MEncoder can be configured to return a value that represents the quality of an encoding: the PSNR or peak signal-to-noise ratio. This value will be used to rank the different results. The sound is completely disabled during the encoding, since it is not part of the problem I want to solve. Also note that I tried the TURBO option for the first pass but decided not to use it because it introduced variations of the PSNR with a magnitude too close to the difference between two sets of parameters.
Results presentation
The following table is meant to give an overview of all sets of parameters in the test while making it easy to compare them. The sets of parameters are given names which come from the original documents where I found them. But to simplify the document organization they are also given an ID number. The parameter names are linked to their respective online documentation in order to simplify the navigation in this dense source of information. Since I have also gathered notes about them, you can find the default value linking to further unofficial information. Finally, most of the encodes are run in 2 passes; in those cases where the parameters are not the same between pass 1 and 2, I used the "pass1;pass2" notation.
The script used to run the different encodings can be found here.
| Parameter (notes and doc) | Default | Tuxrip_Lavc | basic | good | fast | slow | reasonably fast, good quality for anime | Somewhat better, very slow | Absolutely best but awfully slow | some other better options | HP example from the doc | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| standard | extreme | old extreme | -1- | pass 1 | pass 2 | pass 3 | ||||||||||
| vqmin | 2 | 1 | 1 | |||||||||||||
| vqscale | 0 | 2; | 2; | 2; | 2; | 2; | 2; | 2 | 2; | 2; | 2; | 2; | ||||
| vqmax | 31 | 20 | 20 | 20 | 20 | 5 | 5 | |||||||||
| vmax_b_frames | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |||
| mbd | 0 | 1 | 2 | 1 | 2 | 1 | 2 | ;2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |
| v4mv | n.a. | x | x | x | x | x | ;x | x | x | x | x | x | x | x | ||
| vpass | n.a. | 1;2 | 1;2 | 1;2 | 1;2 | 1;2 | 1;2 | 1 | 3 | 3 | 1;2 | 1;2 | 1;2 | 1;2 | 1;2 | |
| vbitrate | 800 | ;- | ;- | ;- | ;- | - | ;- | ;- | - | - | ;- | ;- | ;- | ;- | - | |
| vqblur | 0.5 | |||||||||||||||
| vqcomp | 0.5 | ;0.6 | 0.6 | 0.6 | ||||||||||||
| vlelim | 0 | -4 | ||||||||||||||
| vcelim | 0 | 7 | 7 | 7 | ||||||||||||
| lumi_mask | 0 | 0.05 | ||||||||||||||
| dark_mask | 0 | 0.01 | ||||||||||||||
| scplx_mask | 0 | |||||||||||||||
| mbcmp | 0 | 2; | 2 | 3 | ||||||||||||
| precmp | 0 | 2 | 2 | 6 | 2 | |||||||||||
| cmp | 0 | 2 | 2 | 2 | ;2 | 2 | 2 | 2 | 6 | 2 | 3 | |||||
| subcmp | 0 | 2 | 2 | 2 | ;2 | 2 | 2 | 6 | 2 | 6 | 6 | 2 | 3 | |||
| predia | 1 | 3 | ;3 | 3 | 3 | 3 | ||||||||||
| dia | 1 | -1 | -1 | 2 | -1 | 3 | 3 | |||||||||
| trell | n.a. | x | x | x | x | ;x | x | x | x | x | x | x | x | |||
| cbp | n.a. | x | x | x | x | |||||||||||
| mv0 | n.a. | x | x | x | x | |||||||||||
| qprd | n.a. | x | x | |||||||||||||
| last_pred | 0 | 2 | 1;2 | 1 | 2 | 3 | 2 | 3 | ||||||||
| preme | 1 | 2 | ;2 | 2 | 2 | 2 | ||||||||||
| qns | 0 | 2 | 2 | |||||||||||||
| vqdiff | 3 | |||||||||||||||
| qpel | n.a. | x | x | x | ||||||||||||
| Test sequence 1 @ 800 kb/s | PSNR | 38.08 | 39.20 | 39.46 | 39.46 | 37.87 | 39.75 | 39.36 | 39.14 | 39.04 | 39.48 | 39.39 | 39.30 | 39.21 | 38.39 | |
| U.time | 7m15s | 27m42s | 29m23s | 39m24s | 3m19s | 47m34s | 14m32s | 94m28s | 32m26s | 135m26s | 95m51s | 25m16s | 23m27s | |||
| Test sequence 1 @ 1400 kb/s | PSNR | 41.66 | 42.70 | 42.96 | 43.00 | 41.17 | 43.20 | 42.80 | 42.88 | 42.88 | 42.90 | 42.75 | 42.79 | 42.73 | 42.20 | |
| U.time | 7m18s | 28m46s | 30m30s | 40m31s | 3m24s | 47m43s | 15m22s | 92m53s | 33m21s | 146m40s | 100m04s | 26m20s | 25m15s | |||
| ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | |||
In addition to libavcodec, one set of parameters using XVID was run: max_bframes=1, gmc, trellis, me_quality=6, vhq=4. It was given ID 14:
at 800 kbit/s
- Average PSNR y : 38.12 dB, u : 41.78 dB, v : 42.00 dB
- User time: 16m10s
at 1400 kbit/s
- Average PSNR y : 41.72 dB, u : 44.61 dB, v : 44.83 dB
- User time: 17m14s
Results analysis
The graphic below helps to compare the results with respect to quality and cost. Here is the GNUPlot source.

At this point I wanted to rank the results and find a set of base parameters to always include. Then I could have chosen a shorter list of parameters to experiment with, and also continued with different kinds of sources. But something is not right with the PSNR values I'm getting.
The 3-pass encode (ID 8 @ 800kb/s), for example, gives a lower PSNR after the third pass than after the second one. In my opinion, the third one looked better by a very small margin. There is a bigger difference between ID 11 and 6 at 1400kb/s: the reasonably fast encode has a much better PSNR than the far-too-slow one. But I find that ID 11 is sharper than 6 while presenting the same quality level.
So the conclusion so far is that I can't reach a conclusion using just the PSNR. I need to find an alternative method to evaluate encoding quality.
Quality index
Visual evaluation
The best way to evaluate the quality of an encoding should be to use your subjectivity, I guess. But I cannot watch all sequences, one after the other, and then rank them. I would have to at least play them two by two and probably step picture by picture through the complicated parts of the movie. That would take far too much time and be too inaccurate.
There is an ImageMagick example in image comparing at http://www.cit.gu.edu.au/~anthony/graphics/imagick6/compare/.
In short, you could use the following command to highlight the differences between a frame from an encoded movie and the respective frame from the original material:
convert a.png b.png -compose difference -composite -normalize x:
The "normalize" option is used to make sure you can locate the differences even when they are very small. The drawback of normalization is that you will no longer see how big the difference is.
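The drawback can be illustrated on a handful of pixel values. The sketch below is a simplified stand-in, not ImageMagick's actual -normalize (which may also clip a small share of the darkest and brightest pixels): it uses a plain min/max stretch, and the pixel values are invented for the example.

```python
# Simplified model: absolute difference followed by a min/max contrast stretch.
# Assumption: 8-bit values; this only mimics the idea of normalization.

def difference(a, b):
    """Per-pixel absolute difference between two frames."""
    return [abs(x - y) for x, y in zip(a, b)]

def normalize(pixels, max_value=255):
    """Stretch the values so that they span the whole 0..max_value range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)
    return [(p - lo) * max_value // (hi - lo) for p in pixels]

original = [100, 120, 140, 160]     # made-up pixel values
good_encode = [100, 121, 140, 162]  # errors of 0, 1, 0, 2
bad_encode = [100, 130, 140, 180]   # errors of 0, 10, 0, 20

print(normalize(difference(original, good_encode)))
print(normalize(difference(original, bad_encode)))
```

Both encodes give the same normalized picture although one is ten times further from the original, which is why the non-normalized differences have to be kept for ranking.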
In order to take advantage of normalization without losing the ability to rank the differences, I normalized the mosaic of all the non-normalized differences from the 14 experiments. Picking every fifth frame and assembling the results back into a movie with MEncoder would have produced a navigable source to study quality through the entire sequence. Unfortunately, this method still doesn't make it possible to rank the subtle artifacts of the best encodes. In addition to this, MEncoder failed to produce a valid movie out of the mosaic.
This method might however become useful when trying to visualize the effect of a specific parameter, so the commands used are provided here...
- First you will have to extract the frames from the source:
mplayer -dvd-device . dvd://1 -chapter 5-6 -nosound -vo png:z=9 -vf crop=680:572:20:0,scale=624:368 -frames 12252
- Use this kind of command to compose a 4x4 mosaic:
convert -background black -page +0+0 1.png -page +312+0 2.png -page +624+0 3.png -page +936+0 4.png -page +0+184 5.png -page +312+184 6.png -page +624+184 7.png -page +936+184 8.png -page +0+368 9.png -page +312+368 10.png -page +624+368 11.png -page +936+368 12.png -page +0+552 13.png -page +312+552 14.png -page +624+552 15.png -page +936+552 16.png -mosaic -normalize -resize 50% diff.png
The black background is needed if you have fewer than 16 pictures.
- A more complete script to extract the frames, generate half-sized differences (one of them is normalized), and compose a mosaic with the frame, the differences and some statistic data is available here. It also includes the command to generate an animation from the mosaic.
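Typing the sixteen -page offsets by hand is error prone, so the mosaic command above could also be generated. This is a minimal sketch: the helper name is made up, and it assumes 312x184 tiles named 1.png to 16.png as in the command above.

```python
# Build the argument list of the 4x4 mosaic command shown above.
# Assumption: tiles are 312x184 pixels and are named 1.png ... N.png.

def mosaic_pages(count, columns=4, tile_width=312, tile_height=184):
    """Return the '-page +X+Y file' arguments placing tiles row by row."""
    args = []
    for index in range(count):
        row, col = divmod(index, columns)
        offset = f"+{col * tile_width}+{row * tile_height}"
        args += ["-page", offset, f"{index + 1}.png"]
    return args

command = (["convert", "-background", "black"]
           + mosaic_pages(16)
           + ["-mosaic", "-normalize", "-resize", "50%", "diff.png"])
print(" ".join(command))
```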
More statistics
So ImageMagick couldn't help with a visual evaluation, but it has more to offer: it can return difference measurements other than PSNR (Peak Signal-to-Noise Ratio). The other metrics available are MAE (Mean Absolute Error), MSE (Mean Squared Error), PSE (which appears to be the peak absolute error, listed as PAE in later ImageMagick releases), and RMSE (Root Mean Squared Error). MSE and RMSE are directly linked to PSNR so I will only consider PSNR, MAE and PSE in the future. The script I used to compare the different encodings to the original according to the different metrics is available here. It assumes that you have previously extracted the frames from the source as shown before and stored them in a directory called "opng".
In order to easily compile all the numbers that will be extracted and create readable graphics out of them I used "R" and GNUPlot. The script mentioned above is meant to generate data tables that can be imported into R or GNUPlot for further processing. In the case of ID 5 @ 800kbit it produces a file (a05-800.data) like this:
Test Frame MAE MSE PSE PSNR RMSE
05a 00000001 780.049 1.10525e+06 14392 35.8949 1051.31
05a 00000002 788.262 1.12225e+06 14392 35.8286 1059.37
05a 00000003 669.871 816500 10280 37.2099 903.604
...
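The link between MSE, RMSE and PSNR mentioned above can be checked on the first sample row. This sketch assumes that the values are expressed on a 16-bit scale (peak value 65535), which is what the magnitude of the numbers suggests; the function names are only illustrative.

```python
import math

# Assumption: the statistics are expressed on a 16-bit scale.
QUANTUM_MAX = 65535

def rmse_from_mse(mse):
    """RMSE is simply the square root of MSE."""
    return math.sqrt(mse)

def psnr_from_mse(mse, peak=QUANTUM_MAX):
    """PSNR in dB: 10 * log10(peak^2 / MSE)."""
    return 10 * math.log10(peak ** 2 / mse)

mse = 1.10525e+06  # MSE of frame 00000001 in the sample above
print(round(rmse_from_mse(mse), 2), round(psnr_from_mse(mse), 4))
```

Under that assumption the computed values agree with the RMSE (1051.31) and PSNR (35.8949) columns of the same row, up to rounding.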
In R you can import it with the commands
test05b <- read.table("a05-1400.data", header=TRUE)
test05b$Test <- factor(test05b$Test)
Then you can, for example, compare the different metrics to PSNR with commands like:
plot(test05b$PSNR,test05b$MAE)
You can save the graphics to an EPS file by surrounding the plot command with postscript("plot1.eps", horizontal=FALSE, onefile=FALSE) and dev.off().
You can also generate better graphics with this GNUPlot script; the results are shown below. Obviously, PSE will be an interesting complement to PSNR.

In order to import all the statistics from the different encodings into R, you can merge the data files. But there must be only one title line:
grep '^Test' a01-800.data > t800.data
grep -hv '^Test' a??-800.data >> t800.data
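The same merge can be sketched in Python, which avoids depending on what the title line starts with. The function names are made up for the illustration, and the file pattern in the usage note only mirrors the naming used above.

```python
import glob

def merge_tables(tables):
    """Concatenate tables (lists of lines), keeping only the first title line."""
    merged = []
    for lines in tables:
        if not lines:
            continue
        header, rows = lines[0], lines[1:]
        if not merged:
            merged.append(header)
        merged.extend(rows)
    return merged

def merge_files(pattern, output):
    """Merge all files matching the pattern into one table file."""
    tables = []
    for name in sorted(glob.glob(pattern)):
        with open(name) as handle:
            tables.append(handle.read().splitlines())
    with open(output, "w") as handle:
        handle.write("\n".join(merge_tables(tables)) + "\n")

# Two rows of the sample above, split into two tables for the illustration.
sample = merge_tables([
    ["Test Frame MAE", "05a 00000001 780.049"],
    ["Test Frame MAE", "05a 00000002 788.262"],
])
print(sample)
```

Calling merge_files("a??-800.data", "t800.data") would then play the role of the two grep commands.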
Then you can import it with
t800 <- read.table("t800.data", header=TRUE)
t800$Test <- factor(t800$Test)
Setting the "Test" column as a factor allows indexing the data in subsets. You can then use the "tapply" function to compute, for example, the average PSNR of each test instead of an average over all the tests:
tapply(t800$PSNR,t800$Test,mean)
But the "summary" function is used instead since it provides the minimum and maximum in addition to the mean. The standard deviation can also be computed with
tapply(t800$PSNR,t800$Test,sd)
With R it is possible to quickly generate relevant graphs of t800.data using commands like
plot(t800$Test,t800$PSNR, main="comparison of PSNR variations for all experiments", xlab="Experiment No.", ylab="PSNR")
But GNUPlot generates nicer views with more flexibility so it will be used instead. Another type of graph that can be interesting shows the frequency of PSNR values instead of just min, max, mean and sd. Isolate the PSNR (for example) with
x <- tapply(t800$PSNR,t800$Test,'+')
and draw a histogram for the set 08a (for example) with
hist(x[["08a"]],nclass=100,prob=TRUE)
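For readers without R, the per-test statistics (minimum, mean, maximum and standard deviation) can be approximated with the Python standard library. This is a sketch: the helper names are made up, and it runs on the three sample rows shown earlier rather than on a full data file.

```python
import statistics
from collections import defaultdict

def read_table(lines):
    """Parse whitespace-separated lines with a title line into dictionaries."""
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

def summarize(rows, column):
    """Group one column by test and compute min, mean, max and sample SD."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Test"]].append(float(row[column]))
    return {
        test: {
            "min": min(values),
            "mean": statistics.mean(values),
            "max": max(values),
            # statistics.stdev divides by n-1, like R's sd()
            "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
        for test, values in groups.items()
    }

lines = [
    "Test Frame MAE MSE PSE PSNR RMSE",
    "05a 00000001 780.049 1.10525e+06 14392 35.8949 1051.31",
    "05a 00000002 788.262 1.12225e+06 14392 35.8286 1059.37",
    "05a 00000003 669.871 816500 10280 37.2099 903.604",
]
stats = summarize(read_table(lines), "PSNR")
print(stats["05a"])
```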
New evaluation
Results presentation

Now that I have a unified way to compute the average PSNR, it is easier to include the XVID set (ID 14) and compare it to the rest. Unfortunately, MEncoder doesn't seem to compute the PSNR like ImageMagick does. Either that or I don't compute the average value like MEncoder does. Anyway the results are similar and the ranking is nearly the same.
| set | Min. | Mean | Max. | SD |
|---|---|---|---|---|
| 01a | 25.04 | 33.73 | 40.76 | 2.551 |
| 02a | 26.48 | 34.19 | 40.70 | 2.024 |
| 03a | 26.22 | 34.47 | 40.71 | 1.969 |
| 04a | 26.54 | 34.36 | 41.68 | 1.967 |
| 05a | 25.54 | 33.45 | 40.05 | 2.567 |
| 06a | 26.30 | 34.65 | 41.68 | 1.927 |
| 07a | 27.24 | 34.30 | 42.99 | 1.826 |
| 08a | 26.08 | 34.15 | 41.95 | 1.971 |
| 09a | 25.98 | 34.48 | 41.40 | 1.986 |
| 10a | 28.30 | 34.39 | 42.84 | 1.926 |
| 11a | 26.50 | 34.34 | 40.70 | 1.993 |
| 12a | 26.48 | 34.27 | 40.70 | 2.005 |
| 13a | 24.69 | 33.87 | 41.30 | 2.441 |
| 14a | 28.71 | 34.01 | 41.85 | 1.426 |
| 01b | 27.86 | 36.07 | 41.10 | 1.801 |
| 02b | 29.88 | 36.59 | 42.19 | 1.480 |
| 03b | 29.75 | 36.80 | 41.71 | 1.426 |
| 04b | 30.21 | 36.77 | 42.22 | 1.418 |
| 05b | 29.19 | 35.79 | 40.99 | 2.083 |
| 06b | 29.86 | 36.95 | 42.30 | 1.404 |
| 07b | 30.32 | 36.69 | 42.21 | 1.425 |
| 08b | 30.82 | 36.71 | 42.21 | 1.359 |
| 09b | 29.62 | 36.77 | 42.36 | 1.469 |
| 10b | 31.18 | 36.64 | 42.36 | 1.373 |
| 11b | 29.97 | 36.70 | 42.42 | 1.467 |
| 12b | 29.72 | 36.66 | 42.11 | 1.483 |
| 13b | 28.54 | 36.34 | 42.29 | 1.676 |
| 14b | 30.91 | 36.51 | 42.20 | 1.261 |
Note that, on the graph, the sets have been sorted according to the average PSNR results from the 800kbit run.

The same process is used to produce MAE statistics, and the sets are kept in the PSNR order as before.
| set | Min. | Mean | Max. | SD |
|---|---|---|---|---|
| 01a | 430.5 | 940.4 | 2491.0 | 273.4321 |
| 02a | 467.8 | 894.7 | 2123.0 | 195.5090 |
| 03a | 439.4 | 872.5 | 2175.0 | 187.4920 |
| 04a | 410.7 | 878.3 | 2110.0 | 186.3319 |
| 05a | 456.3 | 973.3 | 2302.0 | 279.3477 |
| 06a | 409.4 | 852.2 | 2183.0 | 178.0003 |
| 07a | 330.8 | 881.7 | 1961.0 | 173.6005 |
| 08a | 388.7 | 873.0 | 2103.0 | 181.1457 |
| 09a | 420.0 | 866.5 | 2258.0 | 187.5298 |
| 10a | 342.7 | 856.2 | 1597.0 | 177.3255 |
| 11a | 465.0 | 884.3 | 2119.0 | 192.0117 |
| 12a | 468.8 | 887.5 | 2123.0 | 192.3748 |
| 13a | 393.6 | 929.1 | 2540.0 | 254.2437 |
| 14a | 399.7 | 899.1 | 1645.0 | 144.1259 |
| 01b | 437.6 | 723.9 | 1842.0 | 146.45086 |
| 02b | 388.9 | 684.9 | 1432.0 | 109.52070 |
| 03b | 412.4 | 672.2 | 1458.0 | 103.14029 |
| 04b | 387.5 | 671.6 | 1392.0 | 102.03105 |
| 05b | 440.5 | 755.5 | 1639.0 | 182.45569 |
| 06b | 382.5 | 659.2 | 1441.0 | 99.10351 |
| 07b | 388.0 | 677.3 | 1366.0 | 103.85231 |
| 08b | 387.4 | 666.5 | 1278.0 | 95.47843 |
| 09b | 379.5 | 671.6 | 1478.0 | 106.53219 |
| 10b | 378.2 | 670.2 | 1202.0 | 96.41297 |
| 11b | 377.0 | 678.7 | 1423.0 | 107.34539 |
| 12b | 392.0 | 679.9 | 1464.0 | 109.13061 |
| 13b | 379.6 | 705.7 | 1704.0 | 132.51930 |
| 14b | 388.3 | 685.2 | 1278.0 | 92.81201 |
Note that with MAE, smaller is better.

The same process is used to produce PSE statistics, where smaller is also better.
| set | Min. | Mean | Max. | SD |
|---|---|---|---|---|
| 01a | 5140 | 20270 | 46000 | 6315.201 |
| 02a | 4626 | 18100 | 53710 | 5509.423 |
| 03a | 4369 | 17800 | 50630 | 5397.179 |
| 04a | 4626 | 17740 | 49860 | 5385.221 |
| 05a | 4883 | 20360 | 48320 | 6366.190 |
| 06a | 4369 | 16820 | 49860 | 5072.728 |
| 07a | 4626 | 17530 | 50890 | 4996.278 |
| 08a | 5397 | 20550 | 50630 | 6977.532 |
| 09a | 4369 | 17150 | 50120 | 5136.363 |
| 10a | 5140 | 18890 | 51400 | 5540.230 |
| 11a | 4369 | 17500 | 53200 | 5321.470 |
| 12a | 4112 | 17860 | 51400 | 5463.287 |
| 13a | 5911 | 19270 | 47290 | 6486.815 |
| 14a | 4883 | 17990 | 36750 | 4228.322 |
| 01b | 5140 | 14710 | 31870 | 4038.446 |
| 02b | 4369 | 13090 | 36490 | 3339.131 |
| 03b | 3855 | 12890 | 34180 | 3261.226 |
| 04b | 4112 | 12830 | 34180 | 3237.822 |
| 05b | 4883 | 15050 | 34180 | 4354.195 |
| 06b | 4112 | 12370 | 34180 | 3106.595 |
| 07b | 4369 | 12790 | 36240 | 3183.759 |
| 08b | 4626 | 13720 | 34700 | 3537.770 |
| 09b | 4369 | 12630 | 35470 | 3195.715 |
| 10b | 4883 | 13860 | 34180 | 3636.962 |
| 11b | 4369 | 12790 | 34950 | 3281.364 |
| 12b | 4112 | 12940 | 34950 | 3311.374 |
| 13b | 4883 | 13550 | 32130 | 3649.528 |
| 14b | 3598 | 13220 | 32900 | 3243.092 |
While providing interesting details, the candle sticks make it difficult to compare the different quality metrics. I found the quality/cost diagram more practical for that:



Results analysis
PSE
Let's start with a major simplification: PSE is giving a really bad index to 08a compared to 13a or 08b compared to 07b, for example. And that is so obviously wrong when you check the actual sequence that I will not consider PSE any longer.
PSNR
Here is a comparison of the rankings obtained with MEncoder's PSNR and ImageMagick's PSNR...
- MEncoder PSNR: 05a, 01a, 13a, 08a, 02a, 12a, 11a, 07a, 10a, 04a, 03a, 09a, 06a, 05b, 01b, 13b, 02b, 12b, 10b, 11b, 07b, 08b, 09b, 03b, 04b, 06b.
- ImageMagick PSNR: 05a, 01a, 13a, 14a, 08a, 02a, 12a, 07a, 11a, 04a, 10a, 03a, 09a, 06a, 05b, 01b, 13b, 14b, 02b, 10b, 12b, 07b, 11b, 08b, 04b, 09b, 03b, 06b.
07a and 11a, like 07b and 11b, 04a and 10a, or 10b and 12b, are permuted but were very close. 04b, on the other hand, was given a lower mark by ImageMagick than by MEncoder. When comparing 04b and 09b I find that 04b seems to have fewer artifacts while 09b looks sharper; it is very close. When comparing 04b and 03b instead, I still find that 04b seems to look better than 03b, though it is even closer. This leads to the conclusion that the PSNR from MEncoder is more relevant and that PSNR differences of 0.05dB are not decisive.
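The permutations between the two rankings can also be listed mechanically. The sketch below takes the two rankings above (ordered from worst to best), ignores the sets that are missing from the MEncoder list (14a and 14b), and reports how many places each set moved; the function name is made up.

```python
mencoder = ("05a 01a 13a 08a 02a 12a 11a 07a 10a 04a 03a 09a 06a "
            "05b 01b 13b 02b 12b 10b 11b 07b 08b 09b 03b 04b 06b").split()
imagemagick = ("05a 01a 13a 14a 08a 02a 12a 07a 11a 04a 10a 03a 09a 06a "
               "05b 01b 13b 14b 02b 10b 12b 07b 11b 08b 04b 09b 03b 06b").split()

def rank_shifts(reference, other):
    """Position change of each set from 'reference' to 'other' (common sets only)."""
    ref_common = [name for name in reference if name in other]
    other_common = [name for name in other if name in reference]
    ref_pos = {name: i for i, name in enumerate(ref_common)}
    return {name: i - ref_pos[name] for i, name in enumerate(other_common)}

shifts = rank_shifts(mencoder, imagemagick)
# A negative shift means the set is ranked lower by ImageMagick.
moved = {name: s for name, s in shifts.items() if s != 0}
print(moved)
```

Only 04b moves by more than one place, which matches the observation above.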
MAE
Here is a comparison of the rankings obtained with ImageMagick's PSNR and MAE...
- PSNR: 05a, 01a, 13a, 14a, 08a, 02a, 12a, 07a, 11a, 04a, 10a, 03a, 09a, 06a, 05b, 01b, 13b, 14b, 02b, 10b, 12b, 07b, 11b, 08b, 04b, 09b, 03b, 06b.
- MAE: 05a, 01a, 13a, 14a, 02a, 12a, 11a, 07a, 04a, 08a, 03a, 09a, 10a, 06a, 05b, 01b, 13b, 14b, 02b, 12b, 11b, 07b, 03b, 04b, 09b, 10b, 08b, 06b.
07a and 11a, like 07b and 11b, are permuted but were very close. 08a, 08b, 10a and 10b were given better marks by MAE than by PSNR. When comparing 08a and 02a or 07a, I would say that MAE is fairer than PSNR. Comparing 10b and 07b also leads to the same conclusion. 03b was placed behind 04b and 09b by MAE. When comparing 09b and 03b, MAE is again fairer.
XVID
The results from the XVID test have been ranked quite low. 07, for example, is ranked higher than 14 for a comparable cost. 14 appears to have a very small standard deviation, however. XVID may provide a more homogeneous quality at the expense of average quality. But it seems that a similar effect can be obtained in lavc with 3-pass encoding. In general, it looks like lavc allows you to reach higher quality than XVID... given enough time if needed.
qpel
The effect of qpel is easily evaluated with 02 and 04, since 04 is 02 with qpel added. It gives a huge quality boost according to PSNR and MAE, and a visual evaluation reveals that 04 is much sharper. 04a is not as well ranked as 04b; the documentation explains this by the fact that qpel requires more bits and may hurt at low bitrates. Looking at the results of 03, it seems that qpel helps a lot just by itself.
09 and 06 are also very similar, differing only in qpel and predia. They show that qpel can still give a quality boost even with a set of parameters that already yields high quality.
Extended tests
vqscale
13 is the only set using a given bitrate instead of vqscale during pass 1 out of 2. It was meant to be used in a very high quality encode (at a much higher bitrate than 1400kbit) and the result is a disappointment. Let's see what difference vqscale can make here and derive a set called 13v from 13 by adding vqscale=2 in pass 1:
| set | User time | PSNR | ImageMagick PSNR Min. | ImageMagick PSNR Mean | ImageMagick PSNR Max. | ImageMagick PSNR SD | ImageMagick MAE Min. | ImageMagick MAE Mean | ImageMagick MAE Max. | ImageMagick MAE SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 13a | 23m27s | 38.39 | 24.69 | 33.87 | 41.30 | 2.441 | 393.6 | 929.1 | 2540.0 | 254.24 |
| 13va | 26m20s | 38.40 | 24.92 | 33.92 | 41.68 | 2.480 | 399.7 | 899.1 | 1645.0 | 259.35 |
| 13b | 25m15s | 42.20 | 28.54 | 36.34 | 42.29 | 1.676 | 379.6 | 705.7 | 1704.0 | 132.52 |
| 13vb | 27m03s | 42.11 | 28.52 | 36.29 | 42.11 | 1.722 | 389.9 | 709.5 | 1706.0 | 137.21 |
It looks like vqscale is not doing any good, except at low bit rates. There is only a significant difference between 13a and 13va where vqscale seems to help, but after a visual check I would say I prefer 13a. Now let's double check this on 09 and derive a set called 09v by removing vqscale from 09:
| set | User time | PSNR | ImageMagick PSNR Min. | ImageMagick PSNR Mean | ImageMagick PSNR Max. | ImageMagick PSNR SD | ImageMagick MAE Min. | ImageMagick MAE Mean | ImageMagick MAE Max. | ImageMagick MAE SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 09a | 32m26s | 39.48 | 25.98 | 34.48 | 41.40 | 1.986 | 420.0 | 866.5 | 2258.0 | 187.53 |
| 09va | 25m41s | 39.50 | 26.00 | 34.48 | 40.91 | 1.975 | 460.0 | 866.6 | 2226.0 | 186.09 |
| 09b | 33m21s | 42.90 | 29.62 | 36.77 | 42.36 | 1.469 | 379.5 | 671.6 | 1478.0 | 106.53 |
| 09vb | 27m36s | 42.90 | 29.86 | 36.77 | 42.24 | 1.452 | 387.0 | 671.4 | 1436.0 | 105.61 |
09v costs less for an equal quality with a smaller standard deviation. After a visual comparison of 09vb and 09b I decide to avoid using vqscale.
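How much cheaper 09v is can be worked out from the user times in the table. A small sketch, with made-up helper names, that only handles the "XmYs" notation used in these tables:

```python
import re

def seconds(user_time):
    """Convert a user time written like '32m26s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", user_time)
    if match is None:
        raise ValueError(f"unexpected time format: {user_time!r}")
    return int(match.group(1)) * 60 + int(match.group(2))

def saving_percent(base, variant):
    """Share of the base encoding time saved by the variant, in percent."""
    return 100.0 * (seconds(base) - seconds(variant)) / seconds(base)

print(round(saving_percent("32m26s", "25m41s"), 1))  # 09a vs 09va
print(round(saving_percent("33m21s", "27m36s"), 1))  # 09b vs 09vb
```

Dropping vqscale saves roughly a fifth of the encoding time at 800 kbit/s and a little less at 1400 kbit/s.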
cbp, mv0
According to an earlier investigation, cbp and mv0 should yield better results but 09 is not using them. Let's derive 09c by adding cbp and mv0 to 09:
| set | User time | PSNR | ImageMagick PSNR Min. | ImageMagick PSNR Mean | ImageMagick PSNR Max. | ImageMagick PSNR SD | ImageMagick MAE Min. | ImageMagick MAE Mean | ImageMagick MAE Max. | ImageMagick MAE SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 09a | 32m26s | 39.48 | 25.98 | 34.48 | 41.40 | 1.986 | 420.0 | 866.5 | 2258.0 | 187.53 |
| 09ca | 34m39s | 39.49 | 25.98 | 34.46 | 41.69 | 1.987 | 408.7 | 869.9 | 2258.0 | 188.1 |
| 09b | 33m21s | 42.90 | 29.62 | 36.77 | 42.36 | 1.469 | 379.5 | 671.6 | 1478.0 | 106.53 |
| 09cb | 35m46s | 42.88 | 29.62 | 36.74 | 42.29 | 1.454 | 384.0 | 674.4 | 1476.0 | 106.0 |
The stats are not helping here. After a visual evaluation, 09c appears to be sharper. This must come at some cost: pixel-size "ringing" artifacts, but I decide to keep cbp and mv0 for the sharpness.
vqdiff
The documentation suggests using 2 instead of 3 for vqdiff. Let's derive 09f by setting vqdiff to 2 in 09:
| set | User time | PSNR | ImageMagick PSNR Min. | ImageMagick PSNR Mean | ImageMagick PSNR Max. | ImageMagick PSNR SD | ImageMagick MAE Min. | ImageMagick MAE Mean | ImageMagick MAE Max. | ImageMagick MAE SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 09a | 32m26s | 39.48 | 25.98 | 34.48 | 41.40 | 1.986 | 420.0 | 866.5 | 2258.0 | 187.53 |
| 09fa | 32m35s | 39.48 | 25.98 | 34.47 | 41.57 | 1.981 | 413.6 | 866.9 | 2258.0 | 187.1 |
| 09b | 33m21s | 42.90 | 29.62 | 36.77 | 42.36 | 1.469 | 379.5 | 671.6 | 1478.0 | 106.53 |
| 09fb | 33m34s | 42.90 | 29.62 | 36.77 | 41.98 | 1.467 | 396.0 | 671.7 | 1478.0 | 106.5 |
The stats are not helping here either. As before, a visual evaluation reveals a sharper picture. This time the artifacts are more noticeable at low bitrate but high bitrate looks ok.
vqcomp
The documentation suggests testing vqcomp in the range [0.5,0.7]. Let's derive 09p6 and 09p7 by setting vqcomp to 0.6 and 0.7 respectively in 09:
| set | User time | PSNR | ImageMagick PSNR Min. | ImageMagick PSNR Mean | ImageMagick PSNR Max. | ImageMagick PSNR SD | ImageMagick MAE Min. | ImageMagick MAE Mean | ImageMagick MAE Max. | ImageMagick MAE SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 09a | 32m26s | 39.48 | 25.98 | 34.48 | 41.40 | 1.986 | 420.0 | 866.5 | 2258.0 | 187.53 |
| 09p6a | 32m34s | 39.50 | 26.74 | 34.42 | 40.71 | 1.842 | 423.1 | 870.7 | 2083.0 | 174.4 |
| 09p7a | 32m34s | 39.49 | 27.65 | 34.34 | 40.89 | 1.689 | 476.5 | 876.5 | 1864.0 | 161.9 |
| 09b | 33m21s | 42.90 | 29.62 | 36.77 | 42.36 | 1.469 | 379.5 | 671.6 | 1478.0 | 106.53 |
| 09p6b | 33m31s | 42.89 | 30.27 | 36.74 | 42.24 | 1.404 | 387.2 | 673.4 | 1384.0 | 101.7 |
| 09p7b | 33m27s | 42.86 | 30.80 | 36.70 | 42.27 | 1.349 | 385.1 | 676.1 | 1293.0 | 98.03 |
A higher vqcomp reduces the standard deviation but it is hard to say more without visual evaluation... As before, it increases sharpness and artifacts, but 0.6 seems to be a good balance.
vqmax
A few examples set vqmax to 20; the documentation even suggests 6. Let's derive 09x2 and 09x6 by setting vqmax to 20 and 6 respectively in 09:
| set | User time | PSNR | ImageMagick PSNR Min. | ImageMagick PSNR Mean | ImageMagick PSNR Max. | ImageMagick PSNR SD | ImageMagick MAE Min. | ImageMagick MAE Mean | ImageMagick MAE Max. | ImageMagick MAE SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 09a | 32m26s | 39.48 | 25.98 | 34.48 | 41.40 | 1.986 | 420.0 | 866.5 | 2258.0 | 187.53 |
| 09x2a | 32m34s | 39.48 | 25.98 | 34.48 | 41.40 | 1.986 | 420.0 | 866.5 | 2258.0 | 187.5 |
| 09x6a | 34m26s | 39.28 | 26.06 | 34.32 | 41.66 | 1.967 | 409.4 | 879.6 | 2195.0 | 186.4 |
| 09b | 33m21s | 42.90 | 29.62 | 36.77 | 42.36 | 1.469 | 379.5 | 671.6 | 1478.0 | 106.53 |
| 09x2b | 33m33s | 42.90 | 29.62 | 36.77 | 42.36 | 1.469 | 379.5 | 671.6 | 1478.0 | 106.5 |
| 09x6b | 33m35s | 42.90 | 29.69 | 36.77 | 42.14 | 1.464 | 391.5 | 671.9 | 1478.0 | 106.4 |
vqmax=20 made no difference at all (it produced an identical .avi). vqmax=6 seems to degrade both PSNR and MAE at low bitrate but, when comparing visually, it shows far fewer ringing artifacts as well as a blurrier picture. At higher bitrate, on the other hand, I wouldn't use this parameter. vqmax=20 is probably a reasonable/harmless limit to use.
precmp
Since cmp is set to 2 for 09, let's try precmp=2 and call it 09pre:
| set | User time | PSNR | ImageMagick PSNR Min. | ImageMagick PSNR Mean | ImageMagick PSNR Max. | ImageMagick PSNR SD | ImageMagick MAE Min. | ImageMagick MAE Mean | ImageMagick MAE Max. | ImageMagick MAE SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 09a | 32m26s | 39.48 | 25.98 | 34.48 | 41.40 | 1.986 | 420.0 | 866.5 | 2258.0 | 187.53 |
| 09prea | 33m8s | 39.53 | 25.99 | 34.51 | 41.32 | 1.985 | 422.5 | 863.2 | 2256.0 | 186.9 |
| 09b | 33m21s | 42.90 | 29.62 | 36.77 | 42.36 | 1.469 | 379.5 | 671.6 | 1478.0 | 106.53 |
| 09preb | 34m10s | 42.93 | 29.63 | 36.79 | 42.08 | 1.466 | 393.8 | 670.3 | 1475.0 | 106.3 |
That was a cheap and tangible improvement.
dia
Let's try 1 instead of -1 for 09 and call it 09dia:
| set | User time | PSNR | ImageMagick PSNR Min. | ImageMagick PSNR Mean | ImageMagick PSNR Max. | ImageMagick PSNR SD | ImageMagick MAE Min. | ImageMagick MAE Mean | ImageMagick MAE Max. | ImageMagick MAE SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 09a | 32m26s | 39.48 | 25.98 | 34.48 | 41.40 | 1.986 | 420.0 | 866.5 | 2258.0 | 187.53 |
| 09diaa | 26m42s | 39.33 | 26.50 | 34.36 | 40.70 | 1.981 | 429.3 | 878.6 | 2118.0 | 188.0 |
| 09b | 33m21s | 42.90 | 29.62 | 36.77 | 42.36 | 1.469 | 379.5 | 671.6 | 1478.0 | 106.53 |
| 09diab | 27m45s | 42.81 | 29.74 | 36.71 | 42.24 | 1.469 | 386.0 | 675.9 | 1460.0 | 107.1 |
Oops.
B frames
According to the investigation mentioned earlier, 1 B frame is good but none can be better on anime. Let's derive 09xb by removing B frames from 09:
| set | User time | PSNR | ImageMagick PSNR Min. | ImageMagick PSNR Mean | ImageMagick PSNR Max. | ImageMagick PSNR SD | ImageMagick MAE Min. | ImageMagick MAE Mean | ImageMagick MAE Max. | ImageMagick MAE SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 09a | 32m26s | 39.48 | 25.98 | 34.48 | 41.40 | 1.986 | 420.0 | 866.5 | 2258.0 | 187.53 |
| 09xba | 22m32s | 38.67 | 25.10 | 34.12 | 40.87 | 2.467 | 434.0 | 905.6 | 2483.0 | 253.9 |
| 09b | 33m21s | 42.90 | 29.62 | 36.77 | 42.36 | 1.469 | 379.5 | 671.6 | 1478.0 | 106.53 |
| 09xbb | 23m27s | 42.28 | 28.71 | 36.41 | 41.97 | 1.707 | 397.0 | 700.1 | 1673.0 | 134.1 |
B frames are good.
vlelim, vcelim
I read once that vlelim=-4 and vcelim=7 was recommended by the Joint Video Team. Let's test this by deriving 09m:
| set | User time | PSNR | ImageMagick PSNR Min. | ImageMagick PSNR Mean | ImageMagick PSNR Max. | ImageMagick PSNR SD | ImageMagick MAE Min. | ImageMagick MAE Mean | ImageMagick MAE Max. | ImageMagick MAE SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 09a | 32m26s | 39.48 | 25.98 | 34.48 | 41.40 | 1.986 | 420.0 | 866.5 | 2258.0 | 187.53 |
| 09ma | 32m55s | 39.41 | 26.23 | 34.38 | 41.64 | 2.016 | 410.9 | 875.4 | 2189.0 | 192.7 |
| 09b | 33m21s | 42.90 | 29.62 | 36.77 | 42.36 | 1.469 | 379.5 | 671.6 | 1478.0 | 106.53 |
| 09mb | 33m57s | 42.84 | 29.56 | 36.70 | 41.99 | 1.477 | 398.1 | 676.7 | 1483.0 | 108.6 |
Not a good move.
New selection
Putting all this together would be like vmax_b_frames=1:mbd=2:v4mv:trell:precmp=2:cmp=2:subcmp=2:dia=-1:predia=1:last_pred=2:preme=2:vqmax=20:vqcomp=0.6:cbp:mv0. Let's give it ID 15. Since I am not sure about vqcomp=0.6, I also run another test without it: 15p...
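Option strings of that length are easy to mistype, so the two variants can be generated from one list. This is only a sketch: the helper is made up, and the resulting string is meant to be passed to MEncoder's -lavcopts together with whatever codec, pass and bitrate options the encode needs.

```python
# Set 15 as listed above; None marks a flag without a value.
OPTIONS_15 = [
    ("vmax_b_frames", 1), ("mbd", 2), ("v4mv", None), ("trell", None),
    ("precmp", 2), ("cmp", 2), ("subcmp", 2), ("dia", -1), ("predia", 1),
    ("last_pred", 2), ("preme", 2), ("vqmax", 20), ("vqcomp", 0.6),
    ("cbp", None), ("mv0", None),
]

def lavcopts(options, without=()):
    """Join options into a colon-separated string, skipping excluded names."""
    parts = []
    for name, value in options:
        if name in without:
            continue
        parts.append(name if value is None else f"{name}={value}")
    return ":".join(parts)

set_15 = lavcopts(OPTIONS_15)
set_15p = lavcopts(OPTIONS_15, without=("vqcomp",))  # 15p: same set, no vqcomp
print(set_15)
print(set_15p)
```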
| set | User time | PSNR | ImageMagick PSNR Min. | ImageMagick PSNR Mean | ImageMagick PSNR Max. | ImageMagick PSNR SD | ImageMagick MAE Min. | ImageMagick MAE Mean | ImageMagick MAE Max. | ImageMagick MAE SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 09a | 32m26s | 39.48 | 25.98 | 34.48 | 41.40 | 1.986 | 420.0 | 866.5 | 2258.0 | 187.53 |
| 15a | 27m52s | 39.57 | 26.83 | 34.43 | 42.35 | 1.828 | 378.7 | 871.6 | 2038.0 | 174.7 |
| 15pa | 28m13s | 39.56 | 26.07 | 34.50 | 42.04 | 1.959 | 393.0 | 866.0 | 2212.0 | 185.3 |
| 09b | 33m21s | 42.90 | 29.62 | 36.77 | 42.36 | 1.469 | 379.5 | 671.6 | 1478.0 | 106.53 |
| 15b | 30m00s | 42.91 | 30.39 | 36.73 | 42.23 | 1.389 | 387.2 | 674.7 | 1376.0 | 101.7 |
| 15pb | 30m01s | 42.93 | 30.06 | 36.77 | 42.28 | 1.432 | 384.1 | 672.2 | 1429.0 | 104.2 |
This set of parameters gives good quality in a very reasonable time. vqcomp=0.6 is good to use at low bitrate but in general it is better without. So the following tests based on 15 will be using vqcomp=0.6 at 800kbit but not at 1400kbit.
Computing PSNR and MAE statistics with ImageMagick is taking most of the test processing time, so better use should be made of the psnr*.log file. This file even contains the quantizer of each frame and it can be interesting to examine its frequency. In order to import it into R, it would be enough to give it a header and strip the commas from it with this command:
sed -e 's/,/ /g' psnr_235453.log
last_pred
According to documentation, last_pred should be chosen between 1 and 3. Let's check values from 0 to 4 by deriving 15 into 15L0, ..., 15L4:
| set | User time | PSNR | Log PSNR Min. | Log PSNR Mean | Log PSNR Max. | Log PSNR SD | ImageMagick MAE Min. | ImageMagick MAE Mean | ImageMagick MAE Max. | ImageMagick MAE SD | Log quantizer Min. | Log quantizer Mean | Log quantizer Max. | Log quantizer SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15L0a | 26m38s | 39.51 | 32.05 | 39.99 | 49.95 | 2.051 | 458.3 | 873.5 | 2047.0 | 176.1 | 2 | 6.829 | 16 | 2.015 |
| 15L1a | 26m41s | 39.53 | 32.11 | 40.00 | 51.75 | 2.044 | 393.4 | 872.8 | 2037.0 | 175.2 | 2 | 6.803 | 16 | 2.008 |
| 15a | 27m52s | 39.57 | | | | | 378.7 | 871.6 | 2038.0 | 174.7 | | | | |
| 15L3a | 29m07s | 39.60 | 32.13 | 40.07 | 49.97 | 2.030 | 469.9 | 867.2 | 2034.0 | 174.7 | 2 | 6.721 | 15 | 1.982 |
| 15L4a | 30m55s | 39.62 | 32.15 | 40.09 | 49.57 | 2.014 | 469.4 | 865.3 | 2031.0 | 173.2 | 2 | 6.703 | 15 | 1.974 |
| 15L0b | 28m34s | 42.88 | 36.00 | 43.22 | 52.12 | 1.659 | 388.6 | 674.4 | 1435.0 | 105.9 | 2 | 4.158 | 9 | 1.336 |
| 15L1b | 28m59s | 42.89 | 36.00 | 43.22 | 51.72 | 1.656 | 383.5 | 674.0 | 1435.0 | 105.6 | 2 | 4.152 | 9 | 1.333 |
| 15pb | 30m01s | 42.93 | | | | | 384.1 | 672.2 | 1429.0 | 104.2 | | | | |
| 15L3b | 31m21s | 42.94 | 36.27 | 43.26 | 51.45 | 1.640 | 395.7 | 671.7 | 1407.0 | 104.0 | 2 | 4.122 | 9 | 1.328 |
| 15L4b | 33m07s | 42.96 | 36.47 | 43.28 | 51.65 | 1.637 | 396.2 | 670.9 | 1375.0 | 103.4 | 2 | 4.112 | 8 | 1.324 |
last_pred=4 makes only a very small difference, but it is not as expensive as the documentation suggests. A value of 3 would be enough, but 4 is safe.
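To put a number on "not as expensive", the user times in the table can be converted to seconds and compared. The sketch below only redoes arithmetic on figures copied from the table above; the helper names are mine.

```python
import re

def to_seconds(user_time):
    """Convert a string such as '26m38s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", user_time)
    if match is None:
        raise ValueError(f"unexpected time format: {user_time!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)

def overhead_percent(base, other):
    """Extra user time of `other` relative to `base`, in percent."""
    return 100.0 * (to_seconds(other) - to_seconds(base)) / to_seconds(base)

# (user time, MEncoder PSNR) for last_pred=0 and last_pred=4,
# copied from the 15L0/15L4 rows of the table above.
runs = {
    "a": {"L0": ("26m38s", 39.51), "L4": ("30m55s", 39.62)},
    "b": {"L0": ("28m34s", 42.88), "L4": ("33m07s", 42.96)},
}

for label, run in runs.items():
    extra = overhead_percent(run["L0"][0], run["L4"][0])
    gain = run["L4"][1] - run["L0"][1]
    print(f"{label}: +{extra:.1f}% time for +{gain:.2f} dB")
```

In both cases going from last_pred=0 to last_pred=4 costs roughly 16% more user time for about 0.1 dB of PSNR.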
preme
According to documentation, preme has to be chosen between 0 and 2. Let's check the other values by deriving 15 into 15m0 and 15m1:
| set | User time | PSNR | Log PSNR Min. | Log PSNR Mean | Log PSNR Max. | Log PSNR SD | Imagemagick MAE Min. | Imagemagick MAE Mean | Imagemagick MAE Max. | Imagemagick MAE SD | Log quantizer Min. | Log quantizer Mean | Log quantizer Max. | Log quantizer SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15m0a | 26m35s | 39.49 | 32.02 | 39.98 | 49.98 | 2.080 | 459.0 | 874.3 | 2051.0 | 177.5 | 2 | 6.855 | 16 | 2.025 |
| 15m1a | 26m40s | 39.51 | 32.14 | 39.99 | 51.76 | 2.056 | 395.2 | 873.5 | 2033.0 | 176.1 | 2 | 6.844 | 15 | 2.022 |
| 15a | 27m52s | 39.57 | | | | | 378.7 | 871.6 | 2038.0 | 174.7 | | | | |
| 15m0b | 28m52s | 42.87 | 35.99 | 43.21 | 52.10 | 1.664 | 389.3 | 674.8 | 1436.0 | 106.1 | 2 | 4.172 | 9 | 1.340 |
| 15m1b | 28m54s | 42.87 | 35.99 | 43.21 | 52.11 | 1.661 | 390.1 | 674.7 | 1436.0 | 106.1 | 2 | 4.167 | 9 | 1.339 |
| 15pb | 30m01s | 42.93 | | | | | 384.1 | 672.2 | 1429.0 | 104.2 | | | | |
preme=2 is best.
vqmax again
Now that I have started looking at quantizer statistics, I'm curious to experiment again with vqmax. Since the maximum reached in this example is 16, there is no doubt that vqmax=20 has no effect. Let's check the quantizer frequency of the two best sets, 15L4a and 15L4b, with and without limiting it to 10 and 6 respectively:
| set | User time | PSNR | Log PSNR Min. | Log PSNR Mean | Log PSNR Max. | Log PSNR SD | Imagemagick MAE Min. | Imagemagick MAE Mean | Imagemagick MAE Max. | Imagemagick MAE SD | Log quantizer Min. | Log quantizer Mean | Log quantizer Max. | Log quantizer SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15L4a | 30m55s | 39.62 | 32.15 | 40.09 | 49.57 | 2.014 | 469.4 | 865.3 | 2031.0 | 173.2 | 2 | 6.703 | 15 | 1.974 |
| 15L4qa | 31m03s | 39.63 | 32.54 | 40.10 | 49.79 | 2.016 | 469.4 | 865.1 | 1947.0 | 173.5 | 2 | 6.669 | 10 | 1.897 |
| 15L4b | 33m07s | 42.96 | 36.47 | 43.28 | 51.65 | 1.637 | 396.2 | 670.9 | 1375.0 | 103.4 | 2 | 4.112 | 8 | 1.324 |
| 15L4qb | 33m13s | 42.96 | 36.53 | 43.28 | 51.75 | 1.630 | 383.3 | 671.1 | 1370.0 | 103.3 | 2 | 4.104 | 6 | 1.300 |
Setting vqmax to avoid a few frames with a higher quantizer seems to have little impact on the rest of the movie. Unfortunately, it is difficult to choose an appropriate value before encoding the movie. Let's assume that vqmax=12 is a more relevant value than 20.
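If the quantizers from a previous, unconstrained encode are at hand, one possible way to pick a cap is to look at how many frames it would actually touch. The following is a heuristic sketch of my own, not something taken from the MEncoder documentation: the function names, the 1% tolerance and the sample distribution are all made up for illustration.

```python
import math

def share_above(quantizers, vqmax):
    """Fraction of frames whose quantizer exceeds `vqmax`."""
    if not quantizers:
        raise ValueError("no frames")
    return sum(1 for q in quantizers if q > vqmax) / len(quantizers)

def smallest_cap(quantizers, tolerated_share=0.01):
    """Smallest integer cap that would affect at most `tolerated_share`
    of the frames. The 1% default is arbitrary, not a recommendation."""
    low = math.floor(min(quantizers))
    high = math.ceil(max(quantizers))
    for cap in range(low, high + 1):
        if share_above(quantizers, cap) <= tolerated_share:
            return cap
    return high  # not reached: the cap `high` affects no frame at all

# Made-up distribution of 100 per-frame quantizers with one outlier at 16.
quantizers = [2] * 5 + [4] * 60 + [6] * 30 + [10] * 4 + [16] * 1

print("share above 10:", share_above(quantizers, 10))
print("suggested cap:", smallest_cap(quantizers))
```

This still requires one full encode before the value can be chosen, so it does not remove the difficulty mentioned above; it only makes the choice less arbitrary for a second run.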
qns
Now that I have turned everything up to the maximum, let's experiment with qns. 15L4q will be used to derive 16x, 16y and 16z, using qns equal to 1, 2 and 3 respectively.
| set | User time | PSNR | Log PSNR Min. | Log PSNR Mean | Log PSNR Max. | Log PSNR SD | Imagemagick MAE Min. | Imagemagick MAE Mean | Imagemagick MAE Max. | Imagemagick MAE SD | Log quantizer Min. | Log quantizer Mean | Log quantizer Max. | Log quantizer SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16xa | 82m43s | 39.52 | 31.9 | 40 | 50.2 | 2.068 | 470.1 | 848 | 2000 | 169.8 | 2 | 6.199 | 10 | 1.802 |
| 16ya | 96m14s | 39.51 | 31.99 | 40 | 50.01 | 2.088 | 467.3 | 849.3 | 1985 | 170.8 | 2 | 6.283 | 10 | 1.823 |
| 16za | 131m21s | 39.50 | 31.98 | 40 | 50.17 | 2.091 | 467.5 | 849.4 | 1983 | 171.2 | 2 | 6.297 | 10 | 1.828 |
| 15L4qa | 31m03s | 39.63 | 32.54 | 40.10 | 49.79 | 2.016 | 469.4 | 865.1 | 1947 | 173.5 | 2 | 6.669 | 10 | 1.897 |
| 16xb | 93m02s | 42.84 | 36.74 | 43.18 | 51.73 | 1.664 | 380.1 | 665.5 | 1341 | 101.1 | 2 | 3.903 | 6 | 1.272 |
| 16yb | 110m56s | 42.88 | 36.37 | 43.22 | 51.81 | 1.692 | 368.9 | 663.7 | 1364 | 102.1 | 2 | 3.952 | 6 | 1.277 |
| 16zb | 147m44s | 42.88 | 36.37 | 43.23 | 51.78 | 1.689 | 397 | 663.3 | 1365 | 102 | 2 | 3.959 | 6 | 1.277 |
| 15L4qb | 33m13s | 42.96 | 36.53 | 43.28 | 51.75 | 1.630 | 383.3 | 671.1 | 1370 | 103.3 | 2 | 4.104 | 6 | 1.300 |
This is an interesting case: PSNR says that qns is bad, but MAE and the average quantizer say it is good. The cost is certainly very high, but after a visual check I would say that qns can do almost the same magic as qpel.
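That PSNR and MAE can rank two encodes differently is not surprising in itself: PSNR is derived from the squared error, so a few large deviations weigh much more than in the absolute error. The toy example below uses invented pixel values and makes no claim about what qns actually does to the picture; it only shows that the two metrics can disagree.

```python
import math

def mae(reference, encoded):
    """Mean absolute error between two equally long pixel sequences."""
    return sum(abs(r - e) for r, e in zip(reference, encoded)) / len(reference)

def psnr(reference, encoded, peak=255):
    """Peak signal-to-noise ratio in dB, based on the mean squared error."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

reference = [100] * 16
# Encode A: every pixel is off by 2.
encode_a = [102] * 16
# Encode B: most pixels are exact, but two are off by 12.
encode_b = [100] * 14 + [112, 112]

for name, encoded in (("A", encode_a), ("B", encode_b)):
    print(f"{name}: MAE={mae(reference, encoded):.2f} "
          f"PSNR={psnr(reference, encoded):.2f} dB")
```

Encode B has the lower MAE but also the lower PSNR, the same kind of disagreement as in the table above.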
Conclusion
To improve the result further, it would be time to systematically test variations of all the *dia and *cmp options. This would be very time-consuming, and I would rather focus on it when encoding actual DV. The aim for now is to gather very good (yet fast) sets of parameters that worked on different styles of video and to use that base for further experiments on DV.
The set of parameters finally selected for this type of video is: vmax_b_frames=1:mbd=2:v4mv:trell:precmp=2:cmp=2:subcmp=2:dia=-1:predia=1:last_pred=4:preme=2:cbp:mv0:vqcomp=0.6:vqmax=12.
You may add qns=1 if you have time!
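When deriving many variants from one base set, it can help to keep the options in a dictionary and generate the colon-separated string from it. This is a small sketch; the helper name and the dictionary layout are my own, and only the option names and values come from the set above.

```python
def lavcopts(options):
    """Join options into the colon-separated form used above.

    A value of True stands for a bare flag such as v4mv.
    """
    parts = []
    for name, value in options.items():
        parts.append(name if value is True else f"{name}={value}")
    return ":".join(parts)

# The selected set, in the same order as in the text.
SELECTED = {
    "vmax_b_frames": 1, "mbd": 2, "v4mv": True, "trell": True,
    "precmp": 2, "cmp": 2, "subcmp": 2, "dia": -1, "predia": 1,
    "last_pred": 4, "preme": 2, "cbp": True, "mv0": True,
    "vqcomp": 0.6, "vqmax": 12,
}

# Variant for those who have time: the same set plus qns=1.
WITH_QNS = dict(SELECTED, qns=1)

print(lavcopts(SELECTED))
print(lavcopts(WITH_QNS))
```

Deriving a test set such as the last_pred variants then becomes a one-line override, for example dict(SELECTED, last_pred=3).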
Noise
lavc parameters are of course important to the encoding quality, but in some situations they need extra help to make the source easier to encode. Noise is an example of detail that can easily confuse the encoder and waste its effort. It is recommended to filter out noise before encoding and to add some back during playback instead. The drawback is that you risk filtering out small details as well.
There must be an optimal compromise between the details that you filter out and the details that you preserve thanks to a better encoding quality. Since I have no idea how to tune the denoise filter, I can run a brute-force evaluation of all combinations in a given range. The problem is that MEncoder calculates the PSNR with the filtered original as a reference: the stronger the filter, the higher the PSNR. Imagemagick allows me to calculate MAE using the unfiltered original as a reference instead. The material used for the test is similar to the one I evaluated before, except that the source has a lot of analogue film noise and artifacts.
To run the tests, I used a script. To extract the statistics, I wrote a little C program that uses libmba to parse the PSNR logs and the Imagemagick library to calculate MAE. The entire statistics extraction is run with a script.
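The brute-force part amounts to enumerating every luma:chroma:time combination in the chosen range and running one encode per combination. The sketch below only generates the filter strings; it is not the script mentioned above, and the ranges are assumptions picked for illustration.

```python
from itertools import product

def hqdn3d_grid(luma_values, chroma_values, time_values):
    """Yield one hqdn3d setting per luma:chroma:time combination."""
    for luma, chroma, time in product(luma_values, chroma_values, time_values):
        yield f"hqdn3d={luma}:{chroma}:{time}"

# Assumed ranges: luma and chroma from 0 to 3, time from 0 to 6.
settings = list(hqdn3d_grid(range(0, 4), range(0, 4), range(0, 7)))

print(len(settings), "combinations, from", settings[0], "to", settings[-1])
```

Each string would then be passed to one encode, and the MAE of the result computed against the unfiltered original so that stronger filters are not rewarded automatically.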
High bitrate
At 1400kbit, noise can be encoded correctly without much quality loss. Here we can find out whether there is a good compromise with a very light filter.
Given time
[Figure not available]
Obviously, you shouldn't use luma=1. It looks like the best results are obtained for the smallest values of luma and chroma.
Given chroma
[Figure not available]
It seems that 0:0:4 will be best, even if there is a local minimum for a stronger filter.
Given luma
[Figure not available]
hqdn3d=0:0:4 is best!
Low bitrate
At 800kbit, noise may need to be removed to prevent quality loss. Here we can find out whether a compromise is possible with the lightest possible filter.
Given chroma
[Figure not available]
You should still not use luma=1.
Given luma
[Figure not available]
hqdn3d=0:0:5 is best in this case, almost even with hqdn3d=0:0:4.
Recommendation
Chroma should be kept as low as possible in this case, and luma=1 should not be used. hqdn3d=0:0:4 gives the best results; even 0:0:5 can be used to help the encoder further. There is a local optimum at 2:1:3, which is worth trying too.
Unfortunately, after running the same test on other types of cartoons, I realized that finding an optimum at hqdn3d=0:0:4 was probably just an exception. In general you would probably find the optimum at 0:0:0! I believe that the recommendations above remain valid anyway: even when hqdn3d=0:0:4 is not the optimum, it has at least a very low impact. One last note: hqdn3d=0:0:6 seems to be best on de-interlaced sources.

