In all these areas, one critical and challenging issue is to extract features from multimedia data. Low-level features are compact, mathematical representations of the physical properties of the video and audio data. They greatly reduce the amount of data to be analyzed, provide metrics for comparison, and serve as the foundation for indexing, analysis, high-level understanding, and classification. Much research work has been done to investigate various visual and audio features, their extraction methods, and their application in various domains. In fact, the ISO/IEC International Standard MPEG-7 has specified a set of descriptors (i.e., features) and description schemes for the description of multimedia content.
We are interested in compressed-domain features for the following reasons.
The MPEG-7 visual descriptors can be extracted from the DCT compressed domain with lower computational complexity. The procedure is shown in Fig. 1. First, the DCT coefficients are extracted from the DCT-compressed image, such as a JPEG image, from which the visual features are to be obtained.
Fig. 1. Procedure of fast descriptor extraction
The image can be resized using the obtained DCT coefficients. In this research, down-sampling with a ratio of 1/4 is performed to reduce the extraction time while maintaining accuracy, as shown in Fig. 2.
Fig. 2. Down-sampling over DCT compressed domain
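The down-sampling step can be illustrated with a minimal sketch. It assumes an orthonormal 8x8 DCT-II and interprets the ratio 1/4 as a reduction per dimension, so that each 8x8 block becomes a 2x2 block. It keeps only the 2x2 low-frequency coefficients, rescales them, and applies a 2x2 inverse DCT. The function names (`dct2`, `idct2`, `downsample_block`) are hypothetical, and quantization and the JPEG level shift are ignored; this is one common way to resize in the DCT domain, not necessarily the exact method used in this research.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2-D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """2-D inverse DCT of a square block: C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def downsample_block(coeffs, m=2):
    """Reduce an n x n DCT block to m x m pixels without a full inverse DCT.

    Assumption: only the m x m low-frequency coefficients are kept and
    scaled by m / n so that the block mean is preserved.
    """
    n = len(coeffs)
    low = [[coeffs[u][v] * m / n for v in range(m)] for u in range(m)]
    return idct2(low)


# Usage: a block whose left half is dark and right half is bright.
step = [[0.0] * 4 + [200.0] * 4 for _ in range(8)]
small = downsample_block(dct2(step))
```

The inverse transform here works on a 2x2 matrix instead of an 8x8 one, which is where the saving in computation comes from in this sketch.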
The spatial-domain information is extracted from the down-sampled DCT coefficients by considering the relation between DCT blocks and their sub-blocks, as shown in Fig. 3. Because the computational complexity is lowered in this way, the total feature extraction time is reduced.
Fig. 3. The extraction of spatial information from down-sampled compressed information
...................... eq. 1
dc: DC value of a DCT block; IDCT: a DCT block; i: pixel value of the 8x8 spatial image block
Conceptually, when the DCT (Discrete Cosine Transform) is applied to an 8x8 block, one DC value and 63 AC values are generated. The DC value corresponds to the average of the 64 spatial values; it equals that average up to the normalization constant of the transform. So, we can extract a pixel value from a DCT block by eq. 1. For an 8x8 block DCT, the size of the DC image is 1/64 of the size of the original one (1/8 of the height and 1/8 of the width), as in Fig. 2.
Fig. 4. Example of DC Image
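The relation between the DC value and the block average can be checked with a short sketch. It assumes the orthonormal 8x8 DCT-II, for which the DC coefficient is the sum of the 64 pixels divided by 8, that is, 8 times the block mean. The helper names `dc_coefficients` and `dc_image` are hypothetical. In a real decoder the DC values would be read from the bitstream; here they are computed from pixels only to make the example self-contained, and quantization and level shifting are ignored.

```python
def dc_coefficients(pixels, n=8):
    """DC coefficient of every n x n block (orthonormal DCT-II: sum / n)."""
    rows, cols = len(pixels) // n, len(pixels[0]) // n
    grid = []
    for by in range(rows):
        row = []
        for bx in range(cols):
            total = sum(pixels[by * n + y][bx * n + x]
                        for y in range(n) for x in range(n))
            row.append(total / n)
        grid.append(row)
    return grid


def dc_image(dc_grid, n=8):
    """Turn DC coefficients into a DC image: one pixel (block mean) per block."""
    return [[dc / n for dc in row] for row in dc_grid]


# Usage: a 16x16 image made of four flat 8x8 quadrants.
image = [[(10 if x < 8 else 20) if y < 8 else (30 if x < 8 else 40)
          for x in range(16)] for y in range(16)]
thumbnail = dc_image(dc_coefficients(image))  # 2x2 image of block means
```

The resulting DC image has 1/8 of the width and 1/8 of the height of the input, matching the 1/64 size ratio stated above.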
The DC sequence for B- and P-frames is extracted by combining the motion vectors with the DC image of the I-frame, because I-frames are intra-coded whereas B- and P-frames are inter-coded, that is, represented by difference images and motion vectors.
Fig. 5. Extraction of DC sequence from MPEG-1,2 compressed video
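One way to combine motion vectors with a reference DC image is an area-weighted, first-order approximation that is common in the compressed-video literature: the region a motion vector points to overlaps up to four reference blocks, and the DC value of the predicted block is approximated by the overlap-weighted average of their DC values plus the DC value of the residual. The sketch below assumes integer-pixel motion vectors, a single reference frame, and DC values already normalized to block means; the function name `predicted_dc` and its parameters are hypothetical, and the source does not state that this exact formula is the one used.

```python
def predicted_dc(ref_dc, bx, by, mv, residual_dc=0.0, n=8):
    """Approximate the DC value of an inter-coded block.

    ref_dc      -- DC image of the reference frame (list of rows)
    bx, by      -- block column and row in the current frame
    mv          -- (dx, dy) motion vector in whole pixels (assumption)
    residual_dc -- DC value of the coded difference block
    """
    dx, dy = mv
    # Top-left pixel of the referenced n x n region.
    col, off_x = divmod(bx * n + dx, n)
    row, off_y = divmod(by * n + dy, n)
    total = 0.0
    for r, h in ((row, n - off_y), (row + 1, off_y)):
        for c, w in ((col, n - off_x), (col + 1, off_x)):
            if h == 0 or w == 0:
                continue  # no overlap with this neighbour
            # Clamp to the frame so vectors pointing outside stay valid.
            rr = min(max(r, 0), len(ref_dc) - 1)
            cc = min(max(c, 0), len(ref_dc[0]) - 1)
            total += ref_dc[rr][cc] * (w * h) / (n * n)
    return total + residual_dc


# Usage: a 2x2 reference DC image and a vector pointing half a block
# right and down, so all four reference blocks contribute equally.
reference = [[10.0, 20.0], [30.0, 40.0]]
value = predicted_dc(reference, bx=0, by=0, mv=(4, 4))
```

For B-frames, a similar estimate could be formed from each reference frame and averaged, which is again an assumption rather than something the text specifies.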
In practical systems, the trade-off between efficiency and accuracy can be explored. Compressed-domain and uncompressed-domain approaches can also be combined. For example, the compressed-domain approach can be used to select candidates, while the uncompressed-domain approach is used to find the most accurate results. Further experiments and research on compressed images and video will follow.
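The combined approach can be sketched as a two-stage search: a cheap compressed-domain distance selects a short list of candidates, and a more expensive uncompressed-domain distance re-ranks only that list. Everything in the example is illustrative: the function `two_stage_search`, the tuple layout of the items, and the two distance functions are assumptions, not part of any system described here.

```python
def two_stage_search(query, items, coarse_dist, fine_dist, k=10):
    """Select k candidates with a cheap distance, then re-rank them."""
    candidates = sorted(items, key=lambda item: coarse_dist(query, item))[:k]
    return sorted(candidates, key=lambda item: fine_dist(query, item))


# Hypothetical items: (name, compressed-domain feature, pixel-domain feature).
items = [("a", 1.0, 5.0), ("b", 1.1, 0.2), ("c", 9.0, 0.0), ("d", 1.2, 0.1)]
query = ("q", 1.0, 0.0)


def coarse(q, item):
    return abs(q[1] - item[1])  # cheap: compressed-domain feature only


def fine(q, item):
    return abs(q[2] - item[2])  # costly in practice: needs decoded pixels


ranked = [name for name, _, _ in two_stage_search(query, items, coarse, fine, k=3)]
```

Item "c" is dropped by the coarse stage even though its fine distance is the smallest, which shows the efficiency versus accuracy trade-off that the choice of `k` controls.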