Multiple ROI support in Scalable Video Coding
2005.04~current
This project proposes a new functionality for Scalable Video Coding (SVC): support for multiple ROIs for heterogeneous display resolutions. SVC aims to provide temporal, spatial, and quality scalability in the encoded bitstream. A region of interest (ROI) is an area that is semantically important to a particular user, which matters especially for users with heterogeneous display resolutions. A bitstream containing only the ROIs can be extracted without any transcoding operation, which may be one way to satisfy QoS requirements.
Introduction
A Region of Interest (ROI) can be regarded as a form of semantic scalability in the spatial dimension. Fig. 1 shows ROI-combined scalability in SVC.

 

Fig. 1. Spatial scalability with ROI

Due to restrictions on resolution or display size, as in mobile environments, a spatially downscaled video is provided to the user. However, the user may not be satisfied with the small video and may want to see the part he/she is interested in at a sufficiently large resolution. Defining an ROI in the picture and providing a video stream that contains only the ROI would be a good solution in that situation. In this case, even when the display size is the same, the semantically meaningful region can be provided at a better resolution. Most videos have an ROI that is semantically more meaningful than the other regions in the picture. For example, people in the picture are more meaningful than the background. For this reason, ROI support is one of the SVC requirements. Fig. 2 shows the usage of the ROI scalability.

Currently, MPEG and ITU-T are jointly developing a standard scalable video codec based on Motion Compensated Temporal Filtering (MCTF) and H.264. The objective of this codec is to generate a coded stream with temporal, spatial, and quality scalability, so that users can be provided with QoS-guaranteed streaming services independent of the video-consuming device in a heterogeneous network environment. Even under restricted resolution or display size, however, a user wants to see the region he/she is interested in at a sufficiently large resolution, and providing a video stream that contains only the Region of Interest (ROI) would be a good solution in that situation. We therefore consider the case where more than one ROI exists in the picture and each ROI can be decoded independently with spatial, temporal, and quality scalability. To accomplish this, we use FMO (Flexible Macroblock Ordering) in H.264 to describe ROIs. However, FMO alone is not enough for independent decoding of multiple ROIs. The first problem concerns overlap between ROIs. The second problem concerns slice group boundaries in independent decoding of an ROI. In this project, we present solutions to these difficulties.

1. ROI description in SVC
In our approach, we adopt FMO (Flexible Macroblock Ordering) to describe ROIs. FMO is an H.264 tool that groups macroblocks into slice groups, each of which can be decoded independently, so the remaining parts of a picture can still be decoded when one of its slice groups is lost. FMO provides seven types of macroblock-to-slice-group map (types 0 to 6). Among them, map type 2, named “foreground and leftover”, groups the macroblocks in rectangular regions into slice groups, and the macroblocks that belong to no rectangular region are grouped into one leftover slice group. We use map type 2 to describe ROIs in the picture.
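As an illustration of map type 2, the following is a minimal Python sketch, not the codec's implementation. The function name `foreground_leftover_map`, the `Rect` tuple layout, and the example rectangles are assumptions made for illustration. The fill order (highest rectangle id first, so that a lower id wins where rectangles overlap) follows our reading of the H.264 derivation for map type 2.

```python
from typing import List, Tuple

# Hypothetical rectangle layout: (left, top, right, bottom),
# in macroblock units, all bounds inclusive.
Rect = Tuple[int, int, int, int]


def foreground_leftover_map(width_mbs: int, height_mbs: int,
                            rects: List[Rect]) -> List[List[int]]:
    """Build a macroblock-to-slice-group map in the style of FMO map type 2.

    Rectangle i becomes slice group i; every macroblock outside all
    rectangles goes to the leftover group, whose id is len(rects).
    """
    leftover = len(rects)
    mb_map = [[leftover] * width_mbs for _ in range(height_mbs)]
    # Fill from the highest id down so that a lower id overrides a
    # higher one wherever two rectangles overlap.
    for group_id in range(len(rects) - 1, -1, -1):
        left, top, right, bottom = rects[group_id]
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                mb_map[y][x] = group_id
    return mb_map


if __name__ == "__main__":
    # A 6x4-macroblock picture with two illustrative ROI rectangles.
    for row in foreground_leftover_map(6, 4, [(0, 0, 2, 1), (2, 1, 4, 3)]):
        print(row)
```

In this example the macroblock at column 2, row 1 lies inside both rectangles and ends up in slice group 0, which shows why overlapping ROIs need separate treatment.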
If more than one ROI is defined in the picture, we must consider the region where ROIs overlap. The proposed approach assigns a separate slice group to the overlapped region and encodes/decodes that region earlier than the related ROIs, as shown in Fig. 2.

 

Fig. 2. ROI description: (a) ROI to slice group mapping, (b) assignment of slice group id
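The overlap handling in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper names `intersect` and `plan_slice_groups` are hypothetical, only two ROIs are handled, and giving the overlapped region the lowest slice group id is our way of modelling "coded earlier than the related ROIs". The actual id assignment of Fig. 2(b) may differ.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rectangle layout: (left, top, right, bottom),
# in macroblock units, all bounds inclusive.
Rect = Tuple[int, int, int, int]


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlapped rectangle of a and b, or None if disjoint."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    if left > right or top > bottom:
        return None
    return (left, top, right, bottom)


def plan_slice_groups(roi_a: Rect, roi_b: Rect
                      ) -> Tuple[List[Rect], Dict[str, List[int]]]:
    """Order rectangles so the overlap (if any) gets the lowest id.

    Returns the rectangle list (index = slice group id) and, for each
    ROI, the slice group ids that must be kept to decode that ROI.
    """
    rects: List[Rect] = []
    needed: Dict[str, List[int]] = {"roi_a": [], "roi_b": []}
    overlap = intersect(roi_a, roi_b)
    if overlap is not None:
        # Assumption: the overlapped region is slice group 0 and is
        # shared by both ROIs.
        rects.append(overlap)
        needed["roi_a"].append(0)
        needed["roi_b"].append(0)
    needed["roi_a"].append(len(rects))
    rects.append(roi_a)
    needed["roi_b"].append(len(rects))
    rects.append(roi_b)
    return rects, needed


if __name__ == "__main__":
    rects, needed = plan_slice_groups((0, 0, 2, 1), (2, 1, 4, 3))
    print(rects)
    print(needed)
```

With this ordering, extracting one ROI means keeping its own slice group plus the shared overlap group, so neither ROI depends on the non-shared part of the other.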
2. Independent ROI decoding

FMO alone is not enough for independent decoding of an ROI, because of the characteristics of predictive coding and inter-pixel dependent processing. To prevent decoding dependency between slice groups, FMO disables intra prediction from macroblocks outside the slice group. However, this only removes the decoding dependency within the current picture; a decoding dependency in the temporal direction remains through motion compensation. In addition, at the ROI boundary, half-sample interpolation for motion estimation (ME) / motion compensation (MC) and upsampling for Intra_Base mode also cause problems, because they create interdependency between slice groups.

Fig. 3 and Fig. 4 describe the handling of the ROI boundary for half-pel interpolation and upsampling.

 

Fig. 3. Handling ROI boundary in half-pel interpolation

Fig. 4. Handling ROI boundary in upsampling for Intra_Base mode
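One possible way to handle the ROI boundary in half-pel interpolation is sketched below for a single row of samples. The exact handling used in this project is not stated in the text, so the sketch assumes edge replication: any filter tap that would fall outside the ROI reads the nearest sample inside the ROI instead. The function name `half_pel_row` and its parameters are hypothetical. The filter taps (1, -5, 20, 20, -5, 1) with rounding and division by 32 are those of the H.264 half-sample luma filter.

```python
from typing import Sequence


def half_pel_row(row: Sequence[int], x: int,
                 roi_left: int, roi_right: int) -> int:
    """Half-sample value between row[x] and row[x + 1].

    Only samples with index in [roi_left, roi_right] are read; taps that
    would fall outside are clamped to the ROI edge (an assumed handling).
    """
    taps = (1, -5, 20, 20, -5, 1)
    acc = 0
    for k, tap in enumerate(taps):
        # Nominal tap positions are x - 2 .. x + 3; clamp them to the ROI.
        pos = min(max(x - 2 + k, roi_left), roi_right)
        acc += tap * row[pos]
    # Round, divide by 32, and clip to the 8-bit sample range.
    return min(max((acc + 16) >> 5, 0), 255)


if __name__ == "__main__":
    # Samples 4..11 belong to the ROI; the rest belong to other slice groups.
    row = [255] * 4 + [100] * 8 + [0] * 4
    print(half_pel_row(row, 4, 4, 11))   # reads ROI samples only
    print(half_pel_row(row, 4, 0, 15))   # unconstrained, reads outside the ROI
```

With clamping, the interpolated value at the ROI edge no longer changes when samples outside the ROI change, provided the encoder and the decoder apply the same rule.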

Tae Meon Bae, Truong Cong Thang, Duck Yeon Kim, Yong Man Ro, and Jae-Gon Kim, “Spatial Scalability of Multiple ROIs in Scalable Video Coding,” SPIE 2006 (accepted)
Truong Cong Thang, Tae Meon Bae, Yong Ju Jung, Yong Man Ro, Jae-Gon Kim, Haechul Choi, Jin-Woo Hong, “Spatial Scalability of Multiple ROIs in Surveillance Video,” ISO/IEC JTC1/SC29/WG11 M12010, April 2005, Busan, Korea
Truong Cong Thang, Tae Meon Bae, Yong Ju Jung, Yong Man Ro, Jung Won Kang, Haechul Choi, Jae-Gon Kim, Jin-Woo Hong, “SVC CE8 report: Spatial scalability of multiple ROIs,” m12321, July, 2005, Poznan, Poland
Truong Cong Thang, Tae Meon Bae, Yong Man Ro, Jung Won Kang, Jae-Gon Kim, “Improvements of Scalability Information SEI message,” m12635, Oct. Nice, France
Truong Cong Thang, Tae Meon Bae, Yong Man Ro, Jung Won Kang, Jae-Gon Kim, “Boundary handling for ROI scalability,” m12636, Oct. Nice, France
Truong Cong Thang, Tae Meon Bae, Yong Man Ro, Jung Won Kang, Jae-Gon Kim, “Show case of ROI extraction using scalability information SEI message,” m12637, Oct. Nice, France
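Duck Yeon Kim, Tae Meon Bae, Yong Man Ro, Jung Won Kang, Jae-Gon Kim, “A Study on ROI Extraction in Scalable Video Coding,” Korea Multimedia Society Fall Conference 2005, Nov. 2005 (accepted; in Korean)
Tae Meon Bae, Duck Yeon Kim, Yong Man Ro, Jung Won Kang, Jae-Gon Kim, “A Study on SVC Bitstreams with Real-Time Scalability Conversion,” Korean Society of Broadcast Engineers Conference 2005, pp. 163-166, Nov. 2005 (in Korean)
Duck Yeon Kim, Tae Meon Bae, Young Seok Kim (김영석), Yong Man Ro, Haechul Choi, Jae-Gon Kim, “A Study on an SVC Bitstream Extractor with Real-Time Variable Scalability for QoS,” Institute of Electronics Engineers of Korea Fall Conference 2005, Nov. 2005 (accepted; in Korean)
Duck Yeon Kim, Tae Meon Bae, Yong Man Ro, Jung Won Kang, Jae-Gon Kim, “Implementation of Multiple ROIs in Scalable Video Coding,” Proceedings of the 2005 Fall Joint Conference on Signal Processing, p. 114, Oct. 2005 (in Korean)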