Semantic Features for Near-Duplicate Video Clip Detection
and Video Annotation
2009.08 ~ Current
1. Introduction
Methods for video copy detection are typically based on the use
of low-level visual features. However, low-level features may vary
significantly for near-duplicates, which are video sequences that have been
the subject of spatial or temporal modifications. As such, the use of
low-level visual features may be inadequate for detecting near-duplicates.
In this research, we present a new video copy detection method that aims to
identify near-duplicates for a given query video sequence. More
specifically, the proposed method is based on identifying semantic concepts
along the temporal axis of a particular video sequence, resulting in the
construction of a so-called semantic video signature. The semantic video
signature is then used for the purpose of similarity measurement. The main
advantage of the proposed method lies in the fact that the presence of
semantic concepts is highly robust to spatial and temporal video
transformations.
2. Near-duplicate video
A significant amount of user-generated video content is
available on social media applications, such as 'YouTube' and 'LiveLeak'. Since digital video content can be easily
edited and redistributed, websites for video sharing commonly suffer from
a high number of duplicates (identical copies) and near-duplicates
(copies that were the subject of at least one transformation) [1]. The detection of duplicates and
near-duplicates can limit redundancy when presenting video search results
(e.g., by grouping duplicates and near-duplicates in the retrieval
results). Further, the detection of duplicates and near-duplicates is
important for the protection of intellectual property (IP).
• Near-duplicate videos are approximately identical videos that differ
  from the original through transformations such as:
  – photometric variations (e.g., changes of color and lighting)
  – editing operations (e.g., insertion of captions, logos, and borders)
  – speed changes (e.g., addition or removal of frames)
Fig. 1. Examples of Near-Duplicates
3. Video signature
The performance of video copy detection techniques is
dependent on the representation of a video segment with a unique set of
features. This representation is commonly referred to as a video
signature. Video signatures need to be robust with respect to significant
spatiotemporal transformations. Also, video signatures need to allow for
efficient matching between a query video and video segments in the
reference video database.
Video signatures are often created by extracting low-level
visual features from video frames. The extracted low-level visual
features may for instance describe color, motion, the spatial
distribution of intensity information, or interest points. However, it is
well-known that video signatures using low-level visual features are
highly sensitive to spatiotemporal transformations. This implies that
video signatures using low-level visual features do not perform well for
the purpose of detecting near-duplicates. Fig. 2 illustrates this problem
for video signatures that use low-level visual features.
Fig. 2. Use of Low-level Visual Features for Creating a Video Signature
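The sensitivity of low-level features to a simple photometric change can be illustrated with a small sketch. It is not the feature set used in this research: the coarse intensity histogram, the histogram intersection measure, the 4x4 frame values, and all function names are assumptions chosen purely for illustration.

```python
def intensity_histogram(frame, bins=4, max_value=256):
    """Normalized histogram of pixel intensities for one grayscale frame."""
    counts = [0] * bins
    bin_width = max_value / bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[min(int(pixel / bin_width), bins - 1)] += 1
            total += 1
    return [count / total for count in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def brighten(frame, offset, max_value=255):
    """Photometric transformation: add a constant offset, clipped to the valid range."""
    return [[min(pixel + offset, max_value) for pixel in row] for row in frame]


# Hypothetical 4x4 grayscale frame (values invented for illustration).
original = [
    [10, 20, 30, 40],
    [15, 25, 35, 45],
    [50, 60, 70, 80],
    [55, 65, 75, 85],
]
# The near-duplicate shows the same scene, only brighter.
near_duplicate = brighten(original, offset=100)

similarity = histogram_intersection(
    intensity_histogram(original), intensity_histogram(near_duplicate)
)
print(f"histogram intersection after brightening: {similarity:.2f}")
```

Although both frames depict the same content, the brightened copy falls into different histogram bins, so the low-level similarity drops well below 1.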
4. Semantic Video Signature Creation
Although near-duplicates may have been the subject of
several spatial and temporal transformations, they essentially do not
contain any new or modified semantic information. In other words,
although near-duplicates may have significantly modified low-level
features (e.g., color manipulation), they still convey the same
information from a high-level point-of-view (e.g., a scene depicting a
beach). Therefore, it is worth investigating whether the detection of
near-duplicates can be realized by relying on the semantic information
conveyed by the video sequence. Consequently, taking into account the
observation that the presence of semantic information is generally
unaffected by low-level modifications of the video content, we propose a
new video copy detection method that makes use of semantic information in
order to construct a video signature.
Fig. 3. Use of Low-level Visual Features for
Creating a Video Signature
Automatic detection of semantic concepts has already been
extensively investigated. Conventional methods for semantic concept
detection classify video clips into several predefined concepts. Based on
the semantic annotations that are the result of semantic concept
detection, users are then able to find video sequences that contain the
predefined concepts. However, many video sequences exist that have
similar semantic concepts. Moreover, the number of semantic concepts used
in automatic concept detection is limited. Therefore, only using the
number of semantic concepts for the purpose of detecting video copies is
insufficient due to a lack of discriminative power.
To resolve the above problems, we propose a video sequence
matching algorithm based on the detection of a number of popular semantic
concepts along the temporal axis. Although many video sequences exist
that contain similar semantic concepts, and despite the fact that the
number of semantic concepts used is limited, the temporal pattern of the
semantic concepts is different from video sequence to video sequence. The
number of semantic concepts used in our research is 34, including
concepts such as 'road', 'sand', and 'snow'. These concepts typically
represent background information (i.e., general concepts); they typically
do not represent objects in the scene (i.e., specific concepts). The
variation in terms of background information is usually smaller than the
variation in terms of the objects appearing in a scene, intuitively
leading to detection rates that are more stable. It should be clear that
we treat the problem of video copy detection as a semantic concept
sequence matching problem. Therefore, as our matching process is based on
the use of semantic concepts, it is less sensitive to variation in terms
of low-level features.
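The idea of a signature built from semantic concepts along the temporal axis can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this research: the concept detector is replaced by hand-written scores, only three of the 34 concepts are listed, and the binary presence encoding, the 0.5 score threshold, and the name `semantic_signature` are all hypothetical.

```python
# Three of the 34 concepts named in the text, kept short for readability.
CONCEPTS = ["road", "sand", "snow"]


def semantic_signature(segment_scores, threshold=0.5):
    """Turn per-segment concept scores into a temporal sequence of binary vectors.

    segment_scores: one dict per temporal segment, mapping a concept name to a
    detector score in [0, 1] (the detector itself is assumed, not shown).
    """
    signature = []
    for scores in segment_scores:
        presence = tuple(int(scores.get(c, 0.0) >= threshold) for c in CONCEPTS)
        signature.append(presence)
    return signature


# Hypothetical detector outputs for three consecutive segments of a clip.
original_scores = [
    {"road": 0.9, "sand": 0.1, "snow": 0.0},
    {"road": 0.8, "sand": 0.7, "snow": 0.1},
    {"road": 0.2, "sand": 0.9, "snow": 0.0},
]
# The same clip after a color manipulation: scores shift slightly,
# but the concepts that are present in each segment stay the same.
recolored_scores = [
    {"road": 0.8, "sand": 0.2, "snow": 0.1},
    {"road": 0.7, "sand": 0.6, "snow": 0.0},
    {"road": 0.3, "sand": 0.8, "snow": 0.1},
]

print(semantic_signature(original_scores))
print(semantic_signature(recolored_scores))
```

In this toy example both clips yield the same temporal pattern of concepts, which is the property the semantic video signature relies on.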
Fig. 4. Semantic Video Signature Creation
5. Signature Matching and Copy Detection
Video copy detection aims at determining whether a given
query video sequence appears in a target video sequence, and if so, at
what location. Once the semantic video signatures have been created, we
measure the similarity between the query video sequence and each target
video sequence stored in a database by matching their semantic video
signatures. Using the outcome of the similarity measurement,
the last step determines whether the query video sequence is a near-duplicate.
Specifically, if the similarity is higher than a predefined threshold,
then the query video sequence is regarded as a near-duplicate of the
target video sequence.
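The matching and thresholding steps described above can be sketched as a sliding-window comparison. This is only an illustration under assumptions: signatures are taken to be sequences of binary concept vectors, similarity is the mean fraction of agreeing concepts, and the 0.8 threshold and the function names are hypothetical. The distance measures actually used in this research may differ.

```python
def segment_similarity(a, b):
    """Fraction of concepts on which two binary concept vectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def signature_similarity(query, window):
    """Mean segment similarity between two equally long signatures."""
    return sum(segment_similarity(q, w) for q, w in zip(query, window)) / len(query)


def detect_copy(query, target, threshold=0.8):
    """Slide the query signature over the target signature.

    Returns (is_near_duplicate, best_offset, best_similarity), where
    best_offset is the location in the target with the highest similarity.
    """
    if not query or len(query) > len(target):
        return False, None, 0.0
    best_offset, best_similarity = None, 0.0
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        similarity = signature_similarity(query, window)
        if similarity > best_similarity:
            best_offset, best_similarity = offset, similarity
    # Near-duplicate only if the similarity is higher than the threshold.
    return best_similarity > threshold, best_offset, best_similarity


# Hypothetical signatures over three concepts (e.g., road, sand, snow).
target = [(0, 0, 1), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 1)]
# Query taken from segments 1..3 of the target, with one concept
# flipped in the last segment to mimic a detection error.
query = [(1, 0, 0), (1, 1, 0), (0, 1, 1)]

is_duplicate, offset, similarity = detect_copy(query, target)
print(is_duplicate, offset, round(similarity, 3))
```

Returning the best offset alongside the decision covers both questions posed in this section: whether the query appears in the target, and at what location.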
Fig. 5. Near-duplicate Video Detection Using a Semantic Video Signature
* Contact Person: Prof. Yong Man Ro (ymro@kaist.ac.kr)
• Hyun-seok Min, Se Min Kim, Wesley De Neve, and Yong Man Ro, "Video Copy
  Detection using Inclined Video Tomography and Bag-of-Visual-Words,"
  Proceedings of the IEEE International Conference on Multimedia and Expo,
  2012, Melbourne
• Hyun-seok Min, Jae Young Choi, Wesley De Neve, and Yong Man Ro,
  "Near-Duplicate Video Clip Detection Using Model-Free Semantic Concept
  Detection and Adaptive Semantic Distance Measurement," IEEE Transactions
  on Circuits and Systems for Video Technology, 2012
• Hyun-seok Min, Jae Young Choi, Wesley De Neve, and Yong Man Ro, "Towards
  a Better Understanding of Model-Free Semantic Concept Detection for
  Annotation and Near-Duplicate Video Clip Detection," IEEE International
  Conference on Image Processing, 2011, Brussels
• Hyun-seok Min, Jae Young Choi, Wesley De Neve, and Yong Man Ro,
  "Leveraging an Image Folksonomy and the Signature Quadratic Form Distance
  for Semantic-based Detection of Near-duplicate Video Clips," IEEE
  International Conference on Multimedia and Expo, 2011, Barcelona
• Hyun-seok Min, Jae Young Choi, Wesley De Neve, and Yong Man Ro, "Bimodal
  Fusion of Low-level Visual Features and High-level Semantic Features for
  Near-duplicate Video Clip Detection," Signal Processing: Image
  Communication, 2011
• Hyun-seok Min, Wesley De Neve, and Yong Man Ro, "Exploiting Collective
  Knowledge in an Image Folksonomy for Semantic-based Near-duplicate Video
  Detection," IEEE International Conference on Image Processing, 2010,
  Hong Kong
• Hyun-seok Min, Wesley De Neve, and Yong Man Ro, "Towards Using Semantic
  Features for Near-duplicate Video Detection," IEEE International
  Conference on Multimedia and Expo, 2010, Singapore
• Hyun-seok Min, Wesley De Neve, and Yong Man Ro, "Use of Semantic Features
  for Filtering of Malicious Content in an IPTV Environment," IEEE
  International Conference on Consumer Electronics, 2010, Las Vegas
• Hyun-seok Min, Sihyoung Lee, Wesley De Neve, and Yong Man Ro, "Semantic
  Concept Detection for User-generated Video Content using a Refined Image
  Folksonomy," The International Conference on Multi-Media Modeling, 2010,
  Chongqing
• Hyun-seok Min, Jae Young Choi, Wesley De Neve, and Yong Man Ro,
  "Near-duplicate Video Detection using Temporal Patterns of Semantic
  Concepts," IEEE International Symposium on Multimedia, 2009, San Diego,
  USA
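• Hyun-seok Min, Jae Young Choi, Wesley De Neve, and Yong Man Ro,
  "Comparison of Inter-semantic Distance Measurement Methods for
  Semantic-based Near-duplicate Video Detection" (in Korean), Summer
  Conference of the Institute of Electronics Engineers of Korea, 2011, Jeju
• Hyun-seok Min, Wesley De Neve, and Yong Man Ro, "A Comparative Study of
  Low-level Visual Features and Semantic Features in Near-duplicate Video
  Detection" (in Korean), Summer Conference of the Institute of Electronics
  Engineers of Korea, 2010, Jeju
• Hyun-seok Min, Jae Young Choi, Wesley De Neve, and Yong Man Ro, "A Study
  on Semantic-based Video Identifiers for Effective Video Matching" (in
  Korean), Fall Conference of the Institute of Electronics Engineers of
  Korea, 2009, Seoul National University
• Hyun-seok Min and Yong Man Ro, "A Study on Improving Illumination Effects
  for Efficient Face Detection in the Compressed Domain" (in Korean), Fall
  Conference of the Institute of Electronics Engineers of Korea, 2009,
  Seoul National University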