Inter-layer Prediction with Adaptive Motion Refinement in Scalable Video Coding


2006.03.27 ~ 2007.01.26


1. Introduction

Currently, ISO/IEC MPEG and ITU-T VCEG are jointly developing a scalable video coding (SVC) standard based on H.264/AVC. The objective of SVC is to generate a temporally, spatially, and SNR-scalable coded stream that provides users with QoS (Quality of Service) guaranteed streaming, independent of the consuming device, in heterogeneous network environments. A layered codec structure, hierarchical B-picture prediction, and FGS (Fine Granularity Scalability) coding are employed for spatial, temporal, and SNR scalability, respectively.

However, SVC sacrifices coding efficiency to enable scalability. One reason for the loss in coding efficiency is the layered coding structure used for spatial scalability, which is similar to the approach taken in MPEG-2, H.263, and MPEG-4 Visual. The video with the lowest resolution is encoded in the base layer, and video with a higher resolution is encoded in the next layer, which results in inter-layer redundancy. To reduce this redundancy, SVC uses inter-layer prediction of motion and of texture/residual: the motion, texture, and residual of the lower-resolution layer are upsampled and used as predictions for the higher-resolution layer. These inter-layer predictions increase the coding efficiency of SVC. Nevertheless, encoding with a multi-layer configuration still yields lower coding efficiency than encoding with a single-layer configuration, which means that inter-layer redundancy remains even with the current inter-layer coding scheme in SVC.
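
The idea of inter-layer texture/residual prediction can be illustrated with a toy example: the lower-resolution block is upsampled and subtracted from the co-located higher-resolution block, so only the difference remains to be coded. This is a minimal sketch assuming dyadic (2x) spatial scalability; the function names are hypothetical, and plain pixel replication stands in for the interpolation filters that SVC actually specifies.

```python
# Toy sketch of inter-layer texture/residual prediction for 2x spatial
# scalability. Pixel replication is an assumed stand-in for SVC's real
# upsampling filters; upsample2x and inter_layer_difference are hypothetical.

def upsample2x(block):
    """Upsample a 2-D list by 2 in each direction using pixel replication."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(2)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def inter_layer_difference(enh_block, base_block):
    """Return what is left to code in the higher layer after subtracting the
    upsampled lower-resolution block (the inter-layer prediction)."""
    pred = upsample2x(base_block)
    return [[e - p for e, p in zip(er, pr)] for er, pr in zip(enh_block, pred)]


base = [[10, 20],
        [30, 40]]
enh = [[10, 11, 20, 21],
       [10, 11, 20, 21],
       [30, 31, 40, 41],
       [30, 31, 40, 41]]
diff = inter_layer_difference(enh, base)
print(diff)  # [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1]]
```

The residual values are much smaller than the original samples, which is the redundancy that inter-layer prediction removes; whatever redundancy the prediction fails to capture is what the multi-layer configuration still pays for.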

Recently, adaptive motion refinement, which performs motion estimation/compensation in the FGS layer, has been proposed to enhance the performance of the FGS codec, and it has been adopted as an optional encoding scheme in SVC. When adaptive motion refinement is enabled, motion-compensated inter-frame coding is performed in the FGS layer with newly estimated motion data. Because this motion is estimated according to the bitrate of its FGS layer, it can serve as a new candidate for inter-layer motion prediction, in addition to the motion information of the base quality layer.
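
To make "motion estimation in the FGS layer" concrete, the sketch below shows generic integer full-search block matching with a sum-of-absolute-differences (SAD) criterion. It is a hypothetical stand-in, not the JSVM estimator: it assumes the reference frame is the reconstruction available at that FGS layer's bitrate, and it omits sub-sample search and rate-constrained costs.

```python
# Hypothetical sketch: integer full-search block matching (SAD), standing in
# for the motion estimation performed in an FGS layer under adaptive motion
# refinement. 'ref' is assumed to be the FGS-layer reconstruction.

def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the block at (bx, by) in cur and the block displaced
    by (dx, dy) in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Return (best_mv, best_sad) over integer displacements within
    +/- search_range that keep the reference block inside the frame."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= bx + dx and bx + dx + size <= w
                      and 0 <= by + dy and by + dy + size <= h)
            if not inside:
                continue  # skip candidates that leave the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Synthetic 8x8 frames: cur is ref shifted so the true motion is (2, 1).
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0 for x in range(8)]
       for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, size=2, search_range=2))  # ((2, 1), 0)
```

Since each FGS layer has a different reconstruction, running such a search per layer yields a different motion field per quality level, and that is why each refined field is a distinct candidate for inter-layer motion prediction.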

In this research, we propose an improved inter-layer motion prediction that exploits the motion refined in the FGS layers by adaptive motion refinement.

2. Inter-layer Prediction with Adaptive Motion Refinement in Scalable Video Coding

Figure 1 shows the structure of SVC with the proposed inter-layer motion prediction when video with two spatial resolutions is encoded and SNR scalability in the base layer is supported by FGS layers. Inter-layer motion prediction is performed in the shaded block of Figure 1, and the bold lines indicate the new information input to inter-layer motion prediction in the proposed method. With adaptive motion refinement, each FGS layer has its own motion information, so the motion information of the FGS layers becomes a new input to inter-layer prediction. Because an SNR layer must be selected for inter-layer motion prediction, a quality level is also input to the inter-layer motion prediction.
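
The new inputs described above (one motion field per SNR layer plus a quality level that selects among them) can be pictured as a small data structure. The following is a minimal sketch with hypothetical class and field names; it assumes quality level 0 denotes the base quality layer and levels 1..N denote FGS layers, which is an illustrative convention rather than the JSVM data layout.

```python
from dataclasses import dataclass, field


@dataclass
class MotionInfo:
    """Motion data of one macroblock in one SNR layer (hypothetical layout)."""
    partition: tuple  # partition size in luma samples, e.g. (8, 8)
    ref_idx: list     # reference index of each partition
    mvs: list         # (mvx, mvy) of each partition, in quarter-sample units


@dataclass
class LowerSpatialLayer:
    """Hypothetical container for the lower-resolution layer. Quality level 0
    is assumed to be the base quality layer; levels 1..N are FGS layers, each
    holding the motion produced by adaptive motion refinement."""
    motion_by_quality: dict = field(default_factory=dict)

    def select(self, quality_level):
        """Return the motion info of the SNR layer chosen by the quality level."""
        if quality_level not in self.motion_by_quality:
            raise KeyError(f"no motion stored for quality level {quality_level}")
        return self.motion_by_quality[quality_level]


layer = LowerSpatialLayer({
    0: MotionInfo((8, 8), [0], [(1, 1)]),   # base quality layer
    3: MotionInfo((8, 8), [1], [(4, -2)]),  # highest FGS layer
})
print(layer.select(3).mvs)  # [(4, -2)]
```

This selection step is the only extra work the decoder must do: it picks the motion of the signaled layer instead of always using the base quality layer.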

The shaded block of Figure 1 is depicted in detail in Figure 2, where the proposed inter-layer motion prediction with the refined motion of the FGS layers is performed. The motion information of each SNR layer is input to the inter-layer prediction module. This motion information consists of motion vectors, reference indexes, and the macroblock partition, and the motion information of the layer selected by the quality level is used for inter-layer prediction. Each element is processed as follows. The macroblock partition of the selected layer is scaled according to the resolution ratio between the base and enhancement layers; the reference indexes of the selected layer are used directly in the enhancement layer; and the motion vectors of the selected layer are upscaled by the same resolution ratio. Then, in the motion coding module of Figure 1, a quarter-sample motion vector refinement is searched around the predicted motion vector. The cost of inter-layer motion prediction is compared with those of the other block modes, and the mode with the minimum cost is used to encode the macroblock.
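
The per-element processing and the subsequent mode decision can be sketched as follows. This is a hypothetical illustration assuming a dyadic resolution ratio of 2 (QCIF to CIF) and motion vectors in quarter-sample units; the cost function and mode names are placeholders supplied by the caller, and JSVM's actual rate-distortion cost and partition-mapping rules are not reproduced.

```python
# Hypothetical sketch: derive the enhancement-layer motion predictor from the
# selected lower-layer motion, refine it by +/- one quarter sample, and pick
# the minimum-cost block mode. Ratio 2 (QCIF -> CIF) is assumed.

def derive_prediction(partition, ref_idx, mvs, ratio=2):
    """Scale partition and motion vectors by the resolution ratio and reuse
    the reference indexes unchanged."""
    scaled_partition = (partition[0] * ratio, partition[1] * ratio)
    scaled_mvs = [(mx * ratio, my * ratio) for (mx, my) in mvs]
    return scaled_partition, list(ref_idx), scaled_mvs


def refine(mv, cost_fn, radius=1):
    """Evaluate quarter-sample offsets within +/- radius around the predicted
    vector and return the candidate with the lowest cost."""
    candidates = [(mv[0] + dx, mv[1] + dy)
                  for dy in range(-radius, radius + 1)
                  for dx in range(-radius, radius + 1)]
    return min(candidates, key=cost_fn)


def choose_mode(mode_costs):
    """Return the block mode with the minimum cost."""
    return min(mode_costs, key=mode_costs.get)


part, refs, mvs = derive_prediction((8, 8), [0], [(3, -5)])
print(part, refs, mvs)  # (16, 16) [0] [(6, -10)]

# Toy cost: distance to an assumed "true" vector (7, -10).
best = refine(mvs[0], lambda v: abs(v[0] - 7) + abs(v[1] + 10))
print(best)  # (7, -10)

costs = {"inter_layer_motion": 120.0, "inter_16x16": 135.5, "intra_4x4": 210.0}
print(choose_mode(costs))  # inter_layer_motion
```

Because the motion vectors are scaled in quarter-sample units, a refinement radius of 1 corresponds to a quarter-sample correction on the enhancement-layer grid.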

Inter-layer motion prediction is also performed in the decoder. As shown in Figure 2, no additional computation is required in the decoder except selecting the motion information among the base and FGS layers of the lower-resolution layer.

Fig. 1. The structure of the SVC codec with the proposed inter-layer motion prediction

Fig. 2. The proposed inter-layer motion prediction

3. Experimental Results

The proposed method was implemented in JSVM (Joint Scalable Video Model) 6, the reference software of SVC. The SVC test sequences Bus, Soccer, Crew, and City were used in the experiment. We measured the bitrate savings of the bitstreams produced by our approach relative to those produced by JSVM 6. A two-layer configuration, {QCIF, 15 fps (frames per second)} and {CIF, 30 fps}, was used to encode the sequences, and the QCIF layer has three FGS layers with adaptive motion refinement enabled. The GOP (Group of Pictures) size was set to 16, and the quality level used for inter-layer prediction in the CIF layer was set to the highest FGS layer of QCIF.
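
For reference, the test configuration and the bitrate-saving measure can be summarized as below. This is a hypothetical summary, not JSVM configuration-file syntax; it assumes the saving is the usual relative rate difference at a matched quality point, and the numbers in the usage line are placeholders, not measured results. The quality level 3 assumes the base quality layer is counted as level 0, so the highest of three FGS layers is level 3.

```python
# Hypothetical summary of the experimental setup (not JSVM config syntax)
# and a relative bitrate-saving function. Example numbers are placeholders.

CONFIG = {
    "sequences": ["Bus", "Soccer", "Crew", "City"],
    "layers": [
        {"resolution": "QCIF", "fps": 15, "fgs_layers": 3,
         "adaptive_motion_refinement": True},
        # Assumed convention: level 0 = base quality, 3 = highest FGS layer.
        {"resolution": "CIF", "fps": 30, "inter_layer_quality_level": 3},
    ],
    "gop_size": 16,
}


def bitrate_saving(rate_reference, rate_proposed):
    """Relative bitrate saving (percent) of the proposed method over the
    reference (JSVM 6), assuming both rates are taken at the same quality."""
    if rate_reference <= 0:
        raise ValueError("reference bitrate must be positive")
    return 100.0 * (rate_reference - rate_proposed) / rate_reference


print(bitrate_saving(1000.0, 950.0))  # 5.0
```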


Fig. 3. Average bitrate reduction of the CIF layer according to the QPs in CIF and QCIF

Fig. 4. Average bitrate reduction of the CIF layer according to the number of FGS layers

Fig. 5. Rate-Distortion curves of JSVM 6 and proposed method: (a) Crew, (b) Soccer, (c) Bus, and (d) City sequences



Inter-layer Prediction with Adaptive Motion Refinement in Scalable Video Coding (submitted), ETRI Journal.

Improvement of Inter-layer Motion Prediction in Scalable Video Coding (submitted), IEICE Letter.

D.S. Jun, J.W. Kang, H. Choi, J.-G. Kim, T.M. Bae, and Y.M. Ro, "Dynamic spatial layer switching support at the point of IDR picture," ISO/IEC JTC1/SC29/WG11, JVT-R060, 2006.