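1. What is ROI encoding
You have probably heard the term ROI a lot, especially in the FFmpeg WeChat group (@大师兄). There it usually means return on investment, whereas in video encoding ROI stands for region of interest. In short, ROI encoding raises the encoding quality of a specific region of the frame. In the vast majority of cases that region is a face.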
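2. How ROI encoding is implemented
ROI encoding takes two steps:
- Locate the region of interest
- Raise the encoding quality of that region
The first step is mostly handled by AI; the second is done by the encoder.
The encoder changes the quality of a region by adjusting its quantization parameter (QP). Simply put, a larger QP means larger quantization error and lower quality; a smaller QP means smaller quantization error and higher quality. To improve the quality of a region, lower its QP.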
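To make the QP-to-quality relationship concrete, here is a minimal sketch of the nominal H.264 quantizer step size as a function of QP for 8-bit video. In H.264 the step size doubles for every increase of 6 in QP. The function name `h264_qstep` is hypothetical and is not an x264 or FFmpeg API.

```c
#include <assert.h>

/* Illustrative helper (hypothetical name, not part of x264 or FFmpeg):
 * nominal H.264 quantizer step size for an 8-bit QP in [0, 51].
 * The step size doubles every time QP increases by 6. */
double h264_qstep(int qp)
{
    /* Step sizes for QP 0..5; higher QPs repeat this pattern scaled by 2. */
    static const double base[6] = { 0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125 };

    assert(qp >= 0 && qp <= 51);
    return base[qp % 6] * (double)(1 << (qp / 6));
}
```

For instance, `h264_qstep(30)` is 20.0 and `h264_qstep(24)` is 10.0, so lowering QP by 6 halves the step size and with it the quantization error.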
Here is the ROI-related API in x264:
typedef struct x264_image_properties_t
{
/* All arrays of data here are ordered as follows:
* each array contains one offset per macroblock, in raster scan order. In interlaced
* mode, top-field MBs and bottom-field MBs are interleaved at the row level.
* Macroblocks are 16x16 blocks of pixels (with respect to the luma plane). For the
* purposes of calculating the number of macroblocks, width and height are rounded up to
* the nearest 16. If in interlaced mode, height is rounded up to the nearest 32 instead. */
/* In: an array of quantizer offsets to be applied to this image during encoding.
* These are added on top of the decisions made by x264.
* Offsets can be fractional; they are added before QPs are rounded to integer.
* Adaptive quantization must be enabled to use this feature. Behavior if quant
* offsets differ between encoding passes is undefined. */
float *quant_offsets;
/* In: optional callback to free quant_offsets when used.
* Useful if one wants to use a different quant_offset array for each frame. */
void (*quant_offsets_free)( void* );
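quant_offsets is a table with one entry per macroblock in the frame. Passing the encoder a QP offset for each macroblock adjusts the encoding quality of that macroblock, which is exactly what ROI encoding needs.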
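As a rough illustration of how such a table could be filled, the sketch below builds a per-macroblock offset array for a progressive frame and applies one offset to every 16x16 macroblock that overlaps a pixel rectangle. The helper name `roi_quant_offsets` and its parameters are assumptions made for this example; only the raster-scan layout and the round-up-to-16 rule come from the header comment above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: build a per-macroblock QP offset table for a
 * progressive frame, applying `offset` to every 16x16 macroblock that
 * overlaps the pixel rectangle [x, x + w) x [y, y + h).
 * Returns a calloc'ed array of mb_w * mb_h floats in raster scan order
 * (NULL on allocation failure); the caller releases it with free(). */
float *roi_quant_offsets(int width, int height,
                         int x, int y, int w, int h,
                         float offset, int *mb_count)
{
    int mb_w = (width + 15) / 16;   /* width rounded up to whole macroblocks */
    int mb_h = (height + 15) / 16;  /* height rounded up to whole macroblocks */
    float *table;

    assert(width > 0 && height > 0 && w > 0 && h > 0);
    assert(x >= 0 && y >= 0 && x + w <= width && y + h <= height);

    table = calloc((size_t)mb_w * mb_h, sizeof(*table));
    if (!table)
        return NULL;

    /* Walk from the first to the last macroblock touched on each axis. */
    for (int my = y / 16; my <= (y + h - 1) / 16; my++)
        for (int mx = x / 16; mx <= (x + w - 1) / 16; mx++)
            table[my * mb_w + mx] = offset;

    if (mb_count)
        *mb_count = mb_w * mb_h;
    return table;
}
```

With x264, a table like this would be handed over through the `quant_offsets` field shown above, with `quant_offsets_free` set to `free` if the encoder should release it. Per the header comment, adaptive quantization has to be enabled for the offsets to take effect.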
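The ROI interfaces of other encoders are similar to that of x264. The encoder wrappers in FFmpeg's libavcodec also support ROI; the key data structure is AVRegionOfInterest, carried as AVFrame side data: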
/**
* Structure describing a single Region Of Interest.
*
* When multiple regions are defined in a single side-data block, they
* should be ordered from most to least important - some encoders are only
* capable of supporting a limited number of distinct regions, so will have
* to truncate the list.
*
* When overlapping regions are defined, the first region containing a given
* area of the frame applies.
*/
typedef struct AVRegionOfInterest {
/**
* Must be set to the size of this data structure (that is,
* sizeof(AVRegionOfInterest)).
*/
uint32_t self_size;
/**
* Distance in pixels from the top edge of the frame to the top and
* bottom edges and from the left edge of the frame to the left and
* right edges of the rectangle defining this region of interest.
*
* The constraints on a region are encoder dependent, so the region
* actually affected may be slightly larger for alignment or other
* reasons.
*/
int top;
int bottom;
int left;
int right;
/**
* Quantisation offset.
*
* Must be in the range -1 to +1. A value of zero indicates no quality
* change. A negative value asks for better quality (less quantisation),
* while a positive value asks for worse quality (greater quantisation).
*
* The range is calibrated so that the extreme values indicate the
* largest possible offset - if the rest of the frame is encoded with the
* worst possible quality, an offset of -1 indicates that this region
* should be encoded with the best possible quality anyway. Intermediate
* values are then interpolated in some codec-dependent way.
*
* For example, in 10-bit H.264 the quantisation parameter varies between
* -12 and 51. A typical qoffset value of -1/10 therefore indicates that
* this region should be encoded with a QP around one-tenth of the full
* range better than the rest of the frame. So, if most of the frame
* were to be encoded with a QP of around 30, this region would get a QP
* of around 24 (an offset of approximately -1/10 * (51 - -12) = -6.3).
* An extreme value of -1 would indicate that this region should be
* encoded with the best possible quality regardless of the treatment of
* the rest of the frame - that is, should be encoded at a QP of -12.
*/
AVRational qoffset;
} AVRegionOfInterest;
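The worked example in the qoffset comment can be reproduced with a few lines of arithmetic. The sketch below follows the linear reading of that comment, delta = qoffset * (qp_max - qp_min); real encoders may interpolate differently. `Rational` is a stand-in for AVRational so the code compiles without FFmpeg headers, and `roi_qp_delta` is a hypothetical name.

```c
#include <assert.h>

/* Stand-in for AVRational so the sketch compiles without FFmpeg headers. */
typedef struct Rational { int num, den; } Rational;

/* Hypothetical helper: approximate QP delta requested by a qoffset,
 * using the linear reading of the AVRegionOfInterest comment:
 *     delta = qoffset * (qp_max - qp_min)
 * This is an assumption; the interpolation is codec dependent. */
double roi_qp_delta(Rational qoffset, int qp_min, int qp_max)
{
    double q;

    assert(qoffset.den != 0 && qp_max > qp_min);
    q = (double)qoffset.num / qoffset.den;

    /* The documented valid range is -1 to +1; clamp anything outside it. */
    if (q < -1.0) q = -1.0;
    if (q >  1.0) q =  1.0;

    return q * (qp_max - qp_min);
}
```

Calling `roi_qp_delta((Rational){-1, 10}, -12, 51)` gives roughly -6.3, matching the 10-bit H.264 example in the comment, where a frame QP of about 30 becomes about 24 inside the region.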
You can check which encoders implement ROI:
grep 'AVRegionOfInterest' -rl libavcodec/
libavcodec/libx264.c
libavcodec/libvpxenc.c
libavcodec/libx265.c
libavcodec/vaapi_encode.c
libavcodec/qsvenc.c
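3. Testing ROI encoding
The FFmpeg command line can be used to test ROI encoding. libavfilter has AI support that can do face detection, but when I just tried it, OpenVINO crashed... libavfilter also has an addroi filter that marks a region by hand, which is simpler for testing.
A quality improvement is not always easy to see, while a deliberate degradation is much more obvious. For example: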
./ffmpeg -i input.mp4 -an -c:v libx264 -vf addroi=x=350:y=270:w=300:h=400:qoffset=0.9 -frames 1 output.mp4
This degrades the region [x, y, x + w, y + h]: with qoffset = 0.9 the content is essentially wiped out.
In production use, ROI encoding usually adjusts QP only slightly.
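The addroi options describe the region as x/y/w/h, while AVRegionOfInterest stores edge distances. A minimal sketch of the conversion, assuming the straightforward mapping implied by [x, y, x + w, y + h] above; `RoiEdges` and `roi_edges_from_rect` are hypothetical names used only for this example:

```c
#include <assert.h>

/* Hypothetical mirror of the top/bottom/left/right fields of
 * AVRegionOfInterest, so the sketch compiles without FFmpeg headers. */
typedef struct RoiEdges { int top, bottom, left, right; } RoiEdges;

/* Convert an x/y/w/h rectangle into edge distances measured from the
 * top and left edges of the frame. */
RoiEdges roi_edges_from_rect(int x, int y, int w, int h)
{
    RoiEdges r;

    assert(x >= 0 && y >= 0 && w > 0 && h > 0);
    r.top    = y;
    r.bottom = y + h;
    r.left   = x;
    r.right  = x + w;
    return r;
}
```

For the command above (x=350:y=270:w=300:h=400) this yields top 270, bottom 670, left 350, right 650.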
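4. Analyzing ROI encoding
As noted above, a quality improvement changes QP only slightly, so its effect needs careful comparison. Besides judging the result subjectively, we also want to look at how QP changes. A professional bitstream analyzer can do this; without one, FFmpeg itself can help.
Here is a quality-improvement example: change the earlier command to qoffset=-0.5 (for illustration only):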
./ffmpeg -i input.mp4 -an -c:v libx264 -vf addroi=x=350:y=270:w=300:h=400:qoffset=-0.5 -frames 3 out.mp4
Print the QP values:
./ffmpeg -threads 1 -debug qp -i out.mp4 -f null -
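As you can see, the QP in the ROI is very small. Zooming in:
Note that FFmpeg's -debug qp currently supports only H.264 and MPEG, not H.265 (TODO). Printing coordinates alongside the values is a feature I added recently, so testing with an FFmpeg nightly build is recommended.
The codecview filter in libavfilter can visualize QP using the chroma planes, but only for MPEG codecs. I tried adding H.264 support, but the result looked so poor that reading the printed QP values was better, so I dropped it.
Overlaying the QP values on the picture with drawtext might work better (TODO).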
Author: quink
Source: Fun With FFmpeg
Original: https://mp.weixin.qq.com/s/BYNgg00Z6E7criOF87Yywg