• Overview: Enhanced Indexing Method: Feature Extraction from Compressed Domain

      In all these areas, one critical and challenging issue is to extract features from multimedia data. Low-level features are compact, mathematical representations of the physical properties of the video and audio data. They greatly reduce the amount of data to be analyzed, provide metrics for comparison, and serve as the foundation for indexing, analysis, high-level understanding, and classification. Much research work has been done to investigate various visual and audio features, their extraction methods, and their application in various domains. In fact, the ISO/IEC International Standard MPEG-7 has specified a set of descriptors (i.e., features) and description schemes for the description of multimedia content.

      We are interested in compressed-domain features for the following reasons.

    • Much of the multimedia content available today is in compressed format already, and most of the new video and audio data produced and distributed will be in standardized, compressed format. Using compressed-domain features directly makes it possible to build efficient and real-time video indexing and analysis systems.
    • Some features, such as motion information, are easier to extract from compressed data without the need of extra, expensive computation. Of course, most features can be obtained from uncompressed data as well, usually with a higher precision but at a much higher computational cost.
  • Fast MPEG-7 Descriptor Extraction from DCT Compressed Domain

      The MPEG-7 visual descriptors can be extracted from the DCT compressed domain with lower computational complexity. The procedure is shown in Fig. 1. First, DCT coefficients are extracted from a DCT-compressed image, such as a JPEG image, to obtain the desired visual features.

Fig. 1. Procedure of fast descriptor extraction

      The image can be resized directly from the obtained DCT coefficients. In this research, down-sampling with a ratio of 1/4 is performed to reduce the extraction time while maintaining accuracy, as shown in Fig. 2.

Fig. 2. Down-sampling over DCT compressed domain
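
The down-sampling step can be illustrated with a minimal sketch. It is not the exact procedure of Fig. 2; it uses one common DCT-domain technique, which keeps only the low-frequency corner of each 8x8 coefficient block and applies a smaller inverse DCT. The function names, the `keep` parameter, and the assumption that the coefficients come from an orthonormal, dequantized 8x8 DCT are all illustrative. With `keep=4` the result has 1/4 of the original pixel count; with `keep=2` each dimension shrinks to 1/4. Which of the two the ratio above refers to is not stated, so it is left as a parameter.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    x = np.arange(n).reshape(1, -1)   # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # DC row has its own normalization
    return c

def downsample_dct_blocks(coeffs, keep=4):
    """Down-sample directly from block DCT coefficients.

    coeffs : array of shape (rows, cols, n, n), one n x n DCT block per entry
             (hypothetical layout; assumes orthonormal, dequantized coefficients)
    keep   : number of low-frequency coefficients kept per dimension
    Returns an image of shape (rows * keep, cols * keep).
    """
    n = coeffs.shape[-1]
    # Keep the low-frequency corner and rescale so that block means are preserved.
    low = coeffs[..., :keep, :keep] * (keep / n)
    c = dct_matrix(keep)
    small = c.T @ low @ c             # inverse keep x keep DCT of every block
    rows, cols = small.shape[:2]
    # Stitch the small blocks back into one image.
    return small.transpose(0, 2, 1, 3).reshape(rows * keep, cols * keep)
```

Because only a `keep` x `keep` inverse transform is computed per block, the full 8x8 inverse DCT and a separate spatial resize are both avoided, which is where the time saving in such a scheme comes from.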

      The spatial-domain information is extracted from the down-sampled DCT coefficients by considering the relation between DCT blocks and their sub-blocks, as shown in Fig. 3. Because this lowers the computational complexity, the total feature extraction time is reduced.

Fig. 3. The extraction of spatial information from down-sampled compressed information

  • DC Sequence Extraction for Video Indexing
  • DC Image Extraction from I-Frame or JPEG compressed image

        ......................       eq. 1                              

dc: DC value of a DCT block; IDCT: a DCT block; i: pixel value of the 8x8 spatial image block

Conceptually, when the DCT (Discrete Cosine Transform) is applied to an 8x8 block, one DC value and 63 AC values are generated. The DC value equals the average of the 64 spatial values in the block up to a fixed scale factor (eight with the usual orthonormal 8x8 DCT, before quantization). We can therefore extract one pixel value from each DCT block by eq. 1. For the 8x8 block DCT, the DC image is 1/64 the size of the original (1/8 of the height and 1/8 of the width), as in Fig. 4.

 

Fig. 4. Example of DC Image
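
The relation between the DC value and the block average can be checked with a short sketch. It assumes an orthonormal 8x8 DCT, for which the DC coefficient is eight times the block mean, and it computes the coefficients from pixels only to make the example self-contained. In a real decoder the dequantized DC values would be read from the bitstream instead. The helper names are hypothetical.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def block_dct(image, n=8):
    """Split an image into n x n blocks and transform each block.

    Stands in for the coefficients a JPEG or I-frame decoder would provide.
    The image height and width are assumed to be multiples of n.
    """
    h, w = image.shape
    c = dct_matrix(n)
    blocks = image.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    return c @ blocks @ c.T           # shape (h / n, w / n, n, n)

def dc_image(coeffs):
    """Build the DC image: one pixel per block, equal to the block mean."""
    n = coeffs.shape[-1]
    return coeffs[..., 0, 0] / n      # orthonormal DC = n * block mean

# Usage: a 16 x 24 image yields a 2 x 3 DC image (1/8 height, 1/8 width).
image = np.arange(16 * 24, dtype=float).reshape(16, 24)
dc = dc_image(block_dct(image))
```

No inverse transform is needed to obtain the DC image, since only the first coefficient of each block is read.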

 

  • DC Sequence Extraction from B- and P-Frames

      The DC sequence for B- and P-frames is extracted by combining the motion vectors with the DC image from the I-frame, because an I-frame is intra-coded whereas B- and P-frames are inter-coded as difference images and motion vectors.

 

Fig. 5. Extraction of DC sequence from MPEG-1,2 compressed video
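
One way to combine the motion vector with the reference DC image is sketched below. It is a commonly used first-order approximation and is not necessarily the exact method of Fig. 5. The DC value of a motion-compensated block is estimated as the area-weighted average of the DC values of the (up to four) reference blocks that the predicted region overlaps, plus the DC value of the residual block. The function name, the whole-pixel `(dy, dx)` motion-vector convention, and the border clamping are assumptions made for illustration.

```python
import numpy as np

BLOCK = 8  # DCT block size in MPEG-1/2

def approx_inter_dc(ref_dc, by, bx, mv, residual_dc=0.0):
    """Approximate the DC value of one inter-coded block.

    ref_dc      : 2-D array, DC image of the reference frame (one value per block)
    by, bx      : block row and column in the current frame
    mv          : (dy, dx) motion vector in whole pixels (assumed convention)
    residual_dc : DC value of the prediction-error block, on the same scale
    """
    dy, dx = mv
    # Top-left pixel of the predicted region in the reference frame.
    y0 = by * BLOCK + dy
    x0 = bx * BLOCK + dx
    # First overlapped reference block and the offset inside it.
    r0, oy = divmod(y0, BLOCK)
    c0, ox = divmod(x0, BLOCK)
    rows, cols = ref_dc.shape
    dc = 0.0
    for r, h in ((r0, BLOCK - oy), (r0 + 1, oy)):
        for c, w in ((c0, BLOCK - ox), (c0 + 1, ox)):
            if h == 0 or w == 0:
                continue              # this neighbour is not overlapped
            rr = min(max(r, 0), rows - 1)   # clamp at the frame border
            cc = min(max(c, 0), cols - 1)
            dc += ref_dc[rr, cc] * (h * w) / (BLOCK * BLOCK)
    return dc + residual_dc

# Usage: a region shifted 4 pixels right straddles two reference blocks equally.
ref = np.array([[10.0, 20.0], [30.0, 40.0]])
value = approx_inter_dc(ref, 0, 0, (0, 4))   # 0.5 * 10 + 0.5 * 20
```

The sketch ignores half-pixel motion vectors. For a bidirectionally predicted B-frame block, one plausible extension is to average the forward and backward estimates before adding the residual.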

  • Conclusion & Future Work

   In practical systems, the trade-off between efficiency and accuracy can be explored. Compressed-domain and uncompressed-domain approaches can also be combined. For example, the compressed-domain approach can be used to select candidates, while the uncompressed-domain approach is used to find the most accurate results among them. Further experiments and research on compressed images and video will continue.
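
The combined approach can be sketched as a two-stage search. This is a toy illustration only: the function name, the mean-absolute-difference distance, and the shortlist size `k` are assumptions, and a real system would compare descriptors rather than raw pixels.

```python
import numpy as np

def two_stage_search(query_dc, query_full, db_dc, db_full, k=3):
    """Return the index of the best match in the database.

    query_dc, db_dc     : DC images (cheap, compressed-domain features)
    query_full, db_full : full-resolution images (costly, uncompressed domain)
    k                   : number of candidates kept after the first stage
    """
    # Stage 1: rank every item with the cheap compressed-domain distance.
    coarse = np.array([np.abs(query_dc - d).mean() for d in db_dc])
    candidates = np.argsort(coarse)[:k]
    # Stage 2: re-rank only the shortlisted candidates at full resolution.
    fine = [np.abs(query_full - db_full[i]).mean() for i in candidates]
    return int(candidates[int(np.argmin(fine))])
```

The costly comparison runs on only `k` items instead of the whole database, so `k` controls the trade-off between efficiency and accuracy.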