In a video presentation about the future of video search, featured a few weeks ago in Google Tech Talks, John R. Smith (PM at IBM Research) talked about the need for a true classification method built around automatic machine learning of the content within the video itself, using multiple techniques. John also discussed the role of MPEG-7 as an ideal standard for storing metadata.
MPEG-7 is a standard developed by the Moving Picture Experts Group (MPEG); its formal name is Multimedia Content Description Interface. It standardizes descriptions of multimedia content that support some degree of interpretation of the information's meaning and that can be passed on to, or accessed by, a device or computer code. MPEG-7 is not targeted at any one application in particular; rather, it supports as broad a range of applications as possible. Accessing audio and video used to be simple, because access mechanisms were simple and sources were few. Today a large and rapidly growing amount of audiovisual information is available in digital form, in digital archives and on the Internet, which makes finding and managing it difficult. MPEG-7 was designed to address this problem.
MPEG-7 provides a rich set of standardized tools to describe multimedia content, both for human users and for automatic systems that process audiovisual information. The standard defines a comprehensive set of metadata elements, together with their structure and relationships, in the form of Descriptors and Description Schemes. These are used to create descriptions, which form the basis for applications that need effective and efficient access to multimedia content.
Audiovisual information plays an important role in our society. More and more of it is available from sources around the world, represented in various forms of media. Whereas audiovisual information used to be consumed directly by people, it is increasingly created, exchanged, retrieved, and re-used by computational systems. As audiovisual sources play an ever more pervasive role in our lives, there will be a growing need to process them further. This requires forms of representation that allow some degree of interpretation of the information's meaning and that can be passed on to, or accessed by, a device or computer code.
MPEG-7 is a standard for describing multimedia content data that supports these operational requirements, in both real-time and non-real-time applications. MPEG-7 does not standardize or evaluate applications. It is not aimed at any one application in particular; rather, the elements it standardizes support as wide a range of applications as possible.
Audiovisual content that has MPEG-7 descriptions associated with it may include various forms of media, such as still pictures, graphics, 3D models, audio, speech, and video, as well as composition information about how these elements are combined in a multimedia presentation. MPEG-7 descriptions do not depend on how the described content is coded or stored: it is possible to create an MPEG-7 description of an analogue movie or of a picture printed on paper in the same way as of digitized content. MPEG-7 allows descriptions at different granularities, so content can be described at different levels of detail. Although descriptions are independent of the coded representation, MPEG-7 can still exploit the benefits of MPEG-4 coded content, for example by attaching descriptions to individual objects within an MPEG-4 scene. Many low-level features can be extracted fully automatically, whereas high-level features need more human interaction. Besides a description of what is depicted in the content, other types of information about the multimedia data are also required, such as its form, the conditions for accessing the material, its classification, links to other relevant material, and its context.
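The idea of describing content at different granularities can be sketched as a tree of time segments: the whole video sits at the root, with scenes and shots below it. The following is a minimal Python illustration assuming a hypothetical `Segment` class with start times and durations in seconds. It is not MPEG-7's own segment description machinery, only the hierarchical idea behind it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """Hypothetical time segment of a video, with an optional free-text annotation."""
    label: str
    start: float          # seconds from the start of the video
    duration: float       # seconds
    annotation: str = ""
    children: List["Segment"] = field(default_factory=list)

    def end(self) -> float:
        return self.start + self.duration


def segments_at_level(root: Segment, level: int) -> List[Segment]:
    """Return every segment at the given depth (0 = whole video, 1 = scenes, 2 = shots)."""
    if level == 0:
        return [root]
    result: List[Segment] = []
    for child in root.children:
        result.extend(segments_at_level(child, level - 1))
    return result


def segments_covering(root: Segment, t: float) -> List[Segment]:
    """Return the chain of segments, coarsest first, whose time range contains t."""
    if not (root.start <= t < root.end()):
        return []
    path = [root]
    for child in root.children:
        sub = segments_covering(child, t)
        if sub:
            path.extend(sub)
            break
    return path


# Made-up example: a two-minute news video split into scenes and shots.
video = Segment("video", 0, 120, "news broadcast", [
    Segment("scene1", 0, 60, "studio intro", [
        Segment("shot1", 0, 20, "anchor close-up"),
        Segment("shot2", 20, 40, "wide studio shot"),
    ]),
    Segment("scene2", 60, 60, "field report"),
])

print([s.label for s in segments_at_level(video, 1)])   # ['scene1', 'scene2']
print([s.label for s in segments_covering(video, 25)])  # ['video', 'scene1', 'shot2']
```

A coarse query (level 1) returns scenes only, while a point-in-time lookup walks down to the finest segment that covers it.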
The MPEG-7 standard includes the following main elements: Descriptors (D), which define the syntax and the semantics of each metadata element; Description Schemes (DS), which specify the structure and semantics of the relationships between their components (Descriptors and other Description Schemes); a Description Definition Language (DDL), which defines the syntax of the Description Tools and allows new Description Schemes to be created and existing ones to be extended and modified; and System tools, which support a binary-coded representation for efficient storage and transmission, transmission mechanisms, multiplexing of descriptions, synchronization of descriptions with content, management and protection of intellectual property in MPEG-7 descriptions, etc.
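As a rough illustration of how a description nests simpler elements inside larger structures, this Python sketch uses the standard library's `xml.etree.ElementTree` to build a small description in the XML form MPEG-7 uses for textual descriptions. It assumes the MPEG-7 2001 schema namespace (`urn:mpeg:mpeg7:schema:2001`) and a handful of element names (`Description`, `MultimediaContent`, `CreationInformation`, `MediaTime`, and so on) chosen to resemble common MPEG-7 examples. The output has not been validated against the normative schema, and the title, annotation, and time values are made up.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: MPEG-7 2001 schema and the standard XML Schema instance namespace.
MPEG7_NS = "urn:mpeg:mpeg7:schema:2001"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
XSI_TYPE = f"{{{XSI_NS}}}type"

ET.register_namespace("", MPEG7_NS)   # serialize MPEG-7 elements without a prefix
ET.register_namespace("xsi", XSI_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the MPEG-7 namespace."""
    return f"{{{MPEG7_NS}}}{tag}"


def build_description(title: str, annotation: str, start: str, duration: str) -> ET.Element:
    """Build a simplified, unvalidated MPEG-7-style description of one video."""
    root = ET.Element(q("Mpeg7"))
    desc = ET.SubElement(root, q("Description"), {XSI_TYPE: "ContentEntityType"})
    content = ET.SubElement(desc, q("MultimediaContent"), {XSI_TYPE: "VideoType"})
    video = ET.SubElement(content, q("Video"))

    # Creation metadata grouped under its own structure.
    creation = ET.SubElement(ET.SubElement(video, q("CreationInformation")), q("Creation"))
    ET.SubElement(creation, q("Title")).text = title

    # Free-text annotation of what the content shows.
    text = ET.SubElement(video, q("TextAnnotation"))
    ET.SubElement(text, q("FreeTextAnnotation")).text = annotation

    # Temporal location of the described content.
    media_time = ET.SubElement(video, q("MediaTime"))
    ET.SubElement(media_time, q("MediaTimePoint")).text = start
    ET.SubElement(media_time, q("MediaDuration")).text = duration
    return root


doc = build_description("Evening News", "Anchor introduces the top stories", "T00:00:00", "PT2M")
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Keeping the structure in nested elements rather than flat key/value pairs is what lets a schema-aware tool know that, say, `Title` belongs to the creation information of this particular video.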
Descriptions of content in MPEG-7 may include information on: the creation and production processes of the content; the usage of the content; the storage features of the content; the structure of its spatial and temporal components; low-level features of the content; conceptual information about the reality captured by the content; how to browse the content efficiently; collections of objects; and the user's interaction with the content. All these descriptions are coded efficiently for searching and filtering.
To accommodate this variety of content descriptions, MPEG-7 approaches the description of content from several viewpoints. The Description Tools developed for those viewpoints are separate entities, but they are interrelated and can be combined in many ways. Depending on the application, some will be present and others absent or only partially present. MPEG-7 data may be physically located with the associated AV material, in the same data stream or on the same storage system, but the descriptions can also reside elsewhere. Because MPEG-7 addresses many different applications in varied circumstances, it must provide a flexible and extensible framework for describing audiovisual data; it therefore does not define a monolithic system for content description, but was designed to be generic. MPEG-7 uses XML for the textual representation of content descriptions, since the DDL, which is used for the syntactic definition of MPEG-7 Description Tools and for making them extensible, is based on XML Schema. Given the popularity of XML, this choice also eases interoperability with other metadata standards.
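Because descriptions may live apart from the content and are plain XML, ordinary XML tooling can read them and map them onto other metadata vocabularies. The sketch below parses a hand-written description that points to its video through a `MediaLocator`/`MediaUri` element and copies three fields onto Dublin Core-style keys. The URL, the field choice, and the mapping itself are hypothetical illustrations, not an official MPEG-7 to Dublin Core crosswalk, and the XML has not been validated against the MPEG-7 schema.

```python
import xml.etree.ElementTree as ET

NS = {"mpeg7": "urn:mpeg:mpeg7:schema:2001"}

# Hand-written description stored apart from the video it describes.
# The URL is a hypothetical placeholder; the XML is illustrative only.
DESCRIPTION = """<Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Description xsi:type="ContentEntityType">
    <MultimediaContent xsi:type="VideoType">
      <Video>
        <MediaLocator>
          <MediaUri>http://example.com/videos/keynote.mp4</MediaUri>
        </MediaLocator>
        <CreationInformation>
          <Creation>
            <Title>Keynote</Title>
          </Creation>
        </CreationInformation>
        <TextAnnotation>
          <FreeTextAnnotation>Opening keynote of the conference</FreeTextAnnotation>
        </TextAnnotation>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>"""


def to_simple_record(xml_text: str) -> dict:
    """Map a few MPEG-7 fields onto Dublin Core-style keys (hypothetical, partial mapping)."""
    root = ET.fromstring(xml_text)

    def text_of(path: str):
        node = root.find(path, NS)
        return node.text.strip() if node is not None and node.text else None

    return {
        "title": text_of(".//mpeg7:Creation/mpeg7:Title"),
        "description": text_of(".//mpeg7:FreeTextAnnotation"),
        "identifier": text_of(".//mpeg7:MediaLocator/mpeg7:MediaUri"),
    }


print(to_simple_record(DESCRIPTION))
```

The description only references the video by URI, so the same record could be served from a separate metadata server while the media file sits elsewhere.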
MPEG-7 addresses applications whose content can be stored on-line or off-line, and it can operate in both real-time and non-real-time environments. To fully exploit MPEG-7 descriptions, automatic feature extraction is extremely useful, but it is not always possible. The higher the level of abstraction, the more difficult automatic extraction becomes; in those cases, interactive extraction tools are of great help. Search engines, filter agents, and any other programs that make use of the descriptions are not specified within the scope of MPEG-7. The DDL allows the definition of the MPEG-7 description tools, both Descriptors and Description Schemes, providing the means to structure Ds into DSs. The DDL also allows particular DSs to be extended for specific applications.
An audiovisual description is obtained from the multimedia content through manual or semi-automatic extraction. The AV description may be stored or streamed directly. In a pull scenario, client applications submit queries to the description repository and receive a set of descriptions matching the query for browsing. In a push scenario, a filter selects descriptions from those available and then performs the programmed actions. In both scenarios, all modules may handle descriptions coded in MPEG-7 formats, but MPEG-7 conformance is required only at the interfaces designated as conformance points. The emphasis of MPEG-7 is on providing novel solutions for audiovisual content description, so addressing text-only documents was not among its goals. However, audiovisual content may include or refer to text in addition to its audiovisual information, so MPEG-7 has standardized Description Tools for textual annotation and controlled vocabularies.
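The difference between the pull and push scenarios can be sketched with a toy in-memory repository. This is a minimal Python illustration assuming a hypothetical flattened `Description` record (content id, genre, keywords) in place of real MPEG-7 XML; the class and method names are invented for the example. `pull` answers a client query on demand, while `subscribe` registers a filter whose action runs automatically whenever a matching description arrives.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Description:
    """Hypothetical, flattened stand-in for an MPEG-7 description."""
    content_id: str
    genre: str
    keywords: List[str]


class DescriptionRepository:
    def __init__(self) -> None:
        self._items: List[Description] = []
        self._filters: List[Callable[[Description], None]] = []

    def pull(self, keyword: str) -> List[Description]:
        """Pull: a client queries the repository and receives matching descriptions."""
        return [d for d in self._items if keyword in d.keywords]

    def subscribe(self, predicate: Callable[[Description], bool],
                  action: Callable[[Description], None]) -> None:
        """Push: register a filter that runs an action on each matching new description."""
        def run(d: Description) -> None:
            if predicate(d):
                action(d)
        self._filters.append(run)

    def add(self, d: Description) -> None:
        """Store a new description and pass it through every registered push filter."""
        self._items.append(d)
        for f in self._filters:
            f(d)


repo = DescriptionRepository()
recorded: List[str] = []
# Push filter: "record" every sports item as soon as its description arrives.
repo.subscribe(lambda d: d.genre == "sports", lambda d: recorded.append(d.content_id))

repo.add(Description("clip-1", "news", ["election", "debate"]))
repo.add(Description("clip-2", "sports", ["football", "final"]))
repo.add(Description("clip-3", "sports", ["tennis"]))

print([d.content_id for d in repo.pull("election")])  # ['clip-1']
print(recorded)                                        # ['clip-2', 'clip-3']
```

In both cases the modules only exchange descriptions; what the filter or the client does with them (browsing, recording) is application logic outside MPEG-7.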
The elements that MPEG-7 standardizes support a broad range of applications. MPEG-7 aims to make the web as searchable for multimedia content as it is for text today. This applies especially to large content archives that are being made accessible to the public, as well as to multimedia catalogues in which people can identify content for purchase. In addition, MPEG-7 descriptions will allow fast and cost-effective use of the underlying data by enabling semi-automatic multimedia presentation and editing.
All domains that make use of multimedia can benefit from MPEG-7, including: architecture, real estate, and interior design; radio and TV channels; cultural services such as history museums and art galleries; digital libraries such as image catalogues, musical dictionaries, bio-medical imaging catalogues, and film, video, and radio archives; e-commerce, such as personalized advertising, on-line catalogues, and directories of e-shops; education, such as repositories of multimedia courses and multimedia search for support material; home entertainment, such as systems for managing personal multimedia collections (including content manipulation, home video editing, searching for a game, and karaoke); investigation services, such as human characteristics recognition and forensics; journalism, for example searching for speeches by a certain politician using his name, his voice, or his face; multimedia directory services, such as yellow pages, tourist information guides, and geographical information systems; multimedia editing, such as personalized electronic news services and media authoring; remote sensing, such as cartography, ecology, and natural resources management; shopping, for example searching for clothes you like; social services such as dating; and surveillance, such as traffic control, surface transportation, and non-destructive testing in hostile environments.
The major functionalities offered by the different parts of the MPEG-7 standard are as follows. MPEG-7 Systems includes the binary format for encoding MPEG-7 descriptions and the terminal architecture. The DDL is based on the XML Schema language, with certain MPEG-7-specific extensions added. As a consequence, the DDL can be broken down into three logical normative components: the XML Schema structural language components, the XML Schema datatype language components, and the MPEG-7-specific extensions.
MPEG-7 Visual Description Tools consist of basic structures and Descriptors that cover basic visual features: color, texture, shape, motion, localization, and face recognition. Each category consists of elementary and sophisticated Descriptors. MPEG-7 Audio provides a set of low-level Descriptors for audio features that cut across many applications, as well as high-level Description Tools that are more specific to a set of applications. These high-level tools include general sound recognition and indexing Description Tools, instrumental timbre Description Tools, spoken content Description Tools, an audio signature Description Scheme, and melodic Description Tools to facilitate query-by-humming.
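To give a feel for what a low-level visual descriptor does, the sketch below computes a simple quantized RGB color histogram and compares two images by L1 distance, the kind of fully automatic feature extraction and similarity matching the visual tools enable. This is a deliberately simplified stand-in, not one of MPEG-7's normative color descriptors (such as Dominant Color, Scalable Color, or Color Layout), which specify their own color spaces, quantization, and encodings. Images are assumed to be plain lists of `(R, G, B)` tuples, and the sample "images" are made up.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Quantize each RGB channel into equal bins and return a normalized 3-D histogram."""
    if 256 % bins_per_channel:
        raise ValueError("bins_per_channel must divide 256")
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[index] += 1
    total = len(pixels)
    return [count / total for count in hist]


def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute bin differences: 0 for identical histograms, at most 2."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Made-up 100-pixel "images": mostly blue with some white, vs. mostly orange-red.
sky = [(30, 90, 200)] * 80 + [(250, 250, 250)] * 20
sea = [(20, 80, 210)] * 90 + [(240, 240, 240)] * 10
fire = [(230, 60, 10)] * 100

query = color_histogram(sky)
for name, image in [("sea", sea), ("fire", fire)]:
    print(name, round(l1_distance(query, color_histogram(image)), 2))
# sea 0.2
# fire 2.0
```

The histogram is extracted with no human input, which is why such low-level features are good candidates for automatic indexing, whereas a label like "sunset at the beach" still needs higher-level tools.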
MPEG-7 Multimedia Description Schemes (MDS) comprise the set of Description Tools dealing with generic as well as multimedia entities. Generic entities are features used in both audio and visual descriptions, and are therefore generic to all media. Apart from this set of generic Description Tools, more complex Description Tools are standardized; they are used whenever more than one medium needs to be described. These Description Tools can be grouped into five classes according to their functionality: content description (representation of perceivable information); content management (information about the media features, the creation, and the usage of the AV content); content organization (representation of the analysis and classification of several AV contents); navigation and access (specification of summaries and variations of the AV content); and user interaction (description of user preferences and usage history pertaining to the consumption of the multimedia material).
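The user interaction class can be illustrated by turning a usage history into preferences and using them to rank content. In MPEG-7 these are expressed as XML description schemes (such as UsageHistory and UserPreferences); the Python sketch below only shows the underlying idea with plain counts. The function names, the genre-only history, and the catalogue entries are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def preferences_from_history(history: List[str]) -> Dict[str, float]:
    """Turn a usage history (genres of consumed items) into normalized preference weights."""
    counts = Counter(history)
    total = sum(counts.values())
    return {genre: n / total for genre, n in counts.items()}


def rank_by_preference(items: Dict[str, str], prefs: Dict[str, float]) -> List[str]:
    """Order item ids by the preference weight of their genre (unseen genres score 0)."""
    return sorted(items, key=lambda item_id: prefs.get(items[item_id], 0.0), reverse=True)


history = ["documentary", "sports", "documentary", "news", "documentary", "sports"]
prefs = preferences_from_history(history)
catalogue = {"ep-101": "news", "ep-102": "documentary", "ep-103": "cooking", "ep-104": "sports"}
print(rank_by_preference(catalogue, prefs))  # ['ep-102', 'ep-104', 'ep-101', 'ep-103']
```

Recording preferences as a description, rather than inside one application, is what would let a different device or service apply the same ranking.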
MPEG-7 also includes informative material about the extraction and use of some of the Description Tools, providing additional insight into the MPEG-7 Reference Software implementation as well as alternative approaches. MPEG-7 Profiles collects standard profiles and levels for MPEG-7, specified across the ISO/IEC 15938 parts. The current profiles concentrate on the Description Definition Language [ISO/IEC 15938-2], Visual [ISO/IEC 15938-3], Audio [ISO/IEC 15938-4], and Multimedia Description Schemes [ISO/IEC 15938-5], and are based on the namespace versioning defined in Schema Definition [ISO/IEC 15938-10]. One of the profiles included in the standard is the Simple Metadata Profile, which describes simple metadata tagging for single instances of an image, an audio clip, or a video clip. It can be used in areas such as music, images, and mobile applications. No levels are currently defined for this profile.
The MPEG-7 standard is still in its early days, but it clearly has a lot of potential as a basis for future applications. It promises a bright future for the presentation of multimedia, as well as of other media used to express data on the computer.