▒ ▒ ▒ Project Demo : Semantic Features for Near-Duplicate Video Clip Detection and Video Annotation ▒ ▒ ▒

	Semantic Features for Near-Duplicate Video Clip Detection and Video Annotation

	2009 . 08 ~ Current

1. Introduction

Methods for video copy detection are typically based on the use of low-level visual features. However, low-level features may vary significantly for near-duplicates, which are video sequences that have been the subject of spatial or temporal modifications. As such, the use of low-level visual features may be inadequate for detecting near-duplicates. In this research, we present a new video copy detection method that aims to identify near-duplicates for a given query video sequence. More specifically, the proposed method is based on identifying semantic concepts along the temporal axis of a particular video sequence, resulting in the construction of a so-called semantic video signature. The semantic video signature is then used for the purpose of similarity measurement. The main advantage of the proposed method lies in the fact that the presence of semantic concepts is highly robust to spatial and temporal video transformations.

2. Near-duplicate video

A significant amount of user-generated video content is available on social media applications, such as ‘YouTube’ and ‘LiveLeak’. Since digital video content can be easily edited and redistributed, websites for video sharing commonly suffer from a high number of duplicates (identical copies) and near-duplicates (copies that were the subject of at least one transformation) [1]. The detection of duplicates and near-duplicates can limit redundancy when presenting video search results (e.g., by grouping duplicates and near-duplicates in the retrieval results). Further, the detection of duplicates and near-duplicates is important for the protection of intellectual property (IP).

• Near-duplicates video is approximately identical videos

– photometric variations

• e.g., change of color and lighting

– editing operations

• e.g., insertion of captions, logos, and borders

– speed changes

• e.g., e.g., addition or removal of frames

• Examples of Near-Duplicates

Fig. 1 Examples of Near-Duplicates

3. Video signature

The performance of video copy detection techniques is dependent on the representation of a video segment with a unique set of features. This representation is commonly referred to as a video signature. Video signatures need to be robust with respect to significant spatiotemporal transformations. Also, video signatures need to allow for efficient matching between a query video and video segments in the reference video database.

Video signatures are often created by extracting low-level visual features from video frames. The extracted low-level visual features may for instance describe color, motion, the spatial distribution of intensity information, or interest points. However, it is well-known that video signatures using low-level visual features are highly sensitive to spatiotemporal transformations. This implies that video signatures using low-level visual features do not perform well for the purpose of detecting near-duplicates. Fig. 2 illustrate the problem of video signatures using low-level visual features

Fig. 2. Use of Low-level Visual Features for Creating a Video Signature

4. Semantic Video Signature Creation

Although near-duplicates may have been the subject of several spatial and temporal transformations, they essentially do not contain any new or modified semantic information. In other words, although near-duplicates may have significantly modified low-level features (e.g., color manipulation), they still convey the same information from a high-level point-of-view (e.g., a scene depicting a beach). Therefore, it is worth investigating whether the detection of near-duplicates can be realized by relying on the semantic information conveyed by the video sequence. Consequently, taking into account the observation that the presence of semantic information is generally unaffected by low-level modifications of the video content, we propose a new video copy detection method that makes use of semantic information in order to construct a video signature.

Fig. 3. Use of Low-level Visual Features for Creating a Video Signature

Automatic detection of semantic concepts has already been extensively investigated. Conventional methods for semantic concept detection classify video clips into several predefined concepts. Based on the semantic annotations that are the result of semantic concept detection, users are then able to find video sequences that contain the predefined concepts. However, many video sequences exist that have similar semantic concepts. Moreover, the number of semantic concepts used in automatic concept detection is limited. Therefore, only using the number of semantic concepts for the purpose of detecting video copies is insufficient due to a lack of discriminative power.

To resolve the above problems, we propose a video sequence matching algorithm based on the detection of a number of popular semantic concepts along the temporal axis. Although many video sequences exist that contain similar semantic concepts, and despite the fact that the number of semantic concepts used is limited, the temporal pattern of the semantic concepts is different from video sequence to video sequence. The number of semantic concepts used in our research is 34, including concepts such as ‘road’, ‘sand’, and ‘snow’. These concepts typically represent background information (i.e., general concepts); they typically do not represent objects in the scene (i.e., specific concepts). The variation in terms of background information is usually smaller than the variation in terms of the objects appearing in a scene, intuitively leading to detection rates that are more stable. It should be clear that we treat the problem of video copy detection as a semantic concept sequence matching problem. Therefore, as our matching process is based on the use of semantic concepts, it is less sensitive to variation in terms of low-level features.

Fig. 4. Semantic Video Signature Creation

5. Signature Matching and Copy Detection

Video copy detection aims at determining whether a given query video sequence appears in a target video sequence, and if so, at what location. In a next step, similarity is measured between a query video sequence and target video sequences stored in a database. By matching the semantic signature of the different video sequences, we are able to measure the similarity between a query video sequence and the target video sequences. Using the outcome of the similarity measurement, the last step determines whether the input video is a near-duplicate. Specifically, if the similarity is higher than a predefined threshold, then the query video sequence is regarded as a near-duplicate of the target video sequence.

Fig. 5. Near-duplicate video detection using Semantic video signature

* Contact Person: Prof. Yong Man Ro (ymro@kaist.ac.kr)

	Hyun-seok Min, Se Min Kim, Wesley De Neve, and Yong Man Ro, "Video Copy Detection using Inclined Video Tomography and Bag-of-Visual-Words," Proceedings of the IEEE International Conference on Multimedia and Expo, 2012, Melbourne
	Hyun-seok Min, Jae Young Choi, Wesley De Neve, and Yong Man Ro, "Near-Duplicate Video Clip Detection Using Model-Free Semantic Concept Detection and Adaptive Semantic Distance Measurement," IEEE Transactions on Circuits and Systems for Video Technology, 2012
	Hyun-seok Min, Jae Young Choi, Wesley De Neve, and Yong Man Ro, "TOWARDS A BETTER UNDERSTANDING OF MODEL-FREE SEMANTIC CONCEPT DETECTION FOR ANNOTATION AND NEAR-DUPLICATE VIDEO CLIP DETECTION," IEEE International Conference on Image Processing, 2011, Brussels
	Hyun-seok Min, Jaeyoung Choi, Wesley De Neve, Yong Man Ro, "Leveraging and image folksonomy and the signature quadratic form distance for semantic-based detection of near-duplicate video clips," IEEE International Conference on Multimedia and Expo, 2011, Barcelona
	Hyun-seok Min, Jae Young Choi, Wesley De Neve, and Yong Man Ro, "Bimodal Fusion of Low-level Visual Features and High-level Semantic Features for Near-duplicate Video Clip Detection," Signal Processing: Image Communication, 2011
	Hyun-seok Min, Wesley De Neve, and Yong Man Ro, "Exploiting Collective Knowledge in an Image Folksonomy for Semantic-based Near-duplicate Video Detection," IEEE International Conference on Image Processing, 2010, Hong Kong
	Hyun-seok Min, Wesley De Neve, and Yong Man Ro, "Towards using semantic features for near~duplicate video detection," IEEE International Conference on Multimedia & Expo, 2010, Singapore
	Hyun-Seok Min, Wesley De Neve, and Yong Man Ro, "Use of Semantic Features for Filtering of Malicious Content in an IPTV Environment," IEEE International Conference on Consumer Electronics, 2010, Las Vegas
	Hyun-seok Min, Sihyoung Lee, Wesley De Neve, Yong Man Ro, "Semantic Concept Detection for User-generated Video Content using a Refined Image Folksonomy," The International Conference on Multi-Media Modeling, 2010, Chongquing
	Hyun-Seok Min, JaeYoung Choi, Wesley De Neve, and Yong Man Ro, “Near-duplicate video detection using temporal patterns of semantic concepts,” IEEE International Symposium on Multimedia, 2009, San diego, USA
	민현석, 최재영, Wesley De Neve, 노용만, "의미 기반 유사 복제 비디오 검출을 위한 의미간 거리 측정 방법의 비교," 대한전자공학회 하계종합학술대회, 2011, 제주
	민현석, Wesley De Neve, 노용만, "유사 중복 비디오 검출에서 저수준 비주얼 특징과 의미 특징 비교 연구," 전자공학회 하계학술발표대회, 2010, 제주
	민현석, 최재영, Wesley De Neve, 노용만, "효과적인 비디오 정합을 위한 의미기반 비디오 식별자에 관한 연구," 대한전자공학회 추계종합학술대회, 2009, 서울대학교
	민현석, 노용만, "압축 영역에서의 효율적인 얼굴 검출을 위한 조명 효과 개선에 관한 연구", 대한전자공학회 추계학술대회, 2009, 서울대학교