Multimedia content analysis and context recognition


Today, multimedia - video, audio, photos, text - plays an important part in people's everyday lives. Vast amounts of content are created, uploaded to the Internet, and made available online through a wide variety of services. The storage, analysis, enrichment and utilization of multimedia content are both an active research topic and a potential business opportunity.

The goal of the research was to provide algorithms, methods, specifications and example use cases that can be used in the design, development and implementation of systems that handle multimedia content in a scalable, user-friendly, automatic or semi-automatic, context-aware, and open way.


The results are illustrated by the four showcase demonstrations presented here. Each demonstration is described below with links to additional information.


VisualLabel


VisualLabel is a Big Media Analytics Platform that can be used to perform text, photo and video analysis, automatic and semi-automatic metadata generation, and content-based queries, and to collect user feedback on the accuracy and quality of the results. The service specification and the demo implementation support both on-demand analysis and the analysis of content previously uploaded to an external service (such as a user's Google, Facebook or Twitter account). Further information can be found from:

The Resource API

The Resource API is a framework that allows developers to declaratively define resources and the relationships between them. The framework is designed to maintain referential integrity for all defined relations and resources, regardless of the underlying data storage model.
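The idea of declaring resources and having relations enforced on every write can be illustrated with a minimal sketch. This is not the Resource API itself: the names `ResourceType` and `Registry`, the in-memory dictionary storage, and the `user`/`photo` example are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceType:
    """Declarative description of a resource and the resources it refers to."""
    name: str
    # Maps a field name to the name of the resource type it must point at.
    references: dict = field(default_factory=dict)


class Registry:
    """Hypothetical in-memory store that rejects writes breaking a declared relation."""

    def __init__(self, types):
        self.types = {t.name: t for t in types}
        self.rows = {t.name: {} for t in types}

    def create(self, type_name, key, **fields):
        # Every declared reference must point at an existing resource.
        for field_name, target in self.types[type_name].references.items():
            if fields.get(field_name) not in self.rows[target]:
                raise ValueError(f"{type_name}.{field_name} refers to a missing {target}")
        self.rows[type_name][key] = fields

    def delete(self, type_name, key):
        # Refuse to delete a resource that another resource still refers to.
        for t in self.types.values():
            for field_name, target in t.references.items():
                if target != type_name:
                    continue
                if any(row.get(field_name) == key for row in self.rows[t.name].values()):
                    raise ValueError(f"{type_name} {key!r} is still referenced by {t.name}")
        del self.rows[type_name][key]


# Resources and relations are declared up front; the checks follow from the declarations.
registry = Registry([
    ResourceType("user"),
    ResourceType("photo", references={"owner": "user"}),
])
registry.create("user", "u1", name="Alice")
registry.create("photo", "p1", owner="u1")
```

In this sketch, creating a photo whose owner does not exist, or deleting a user who still owns a photo, raises `ValueError`. Because the checks are driven by the declarations rather than by the storage layer, the same approach could in principle sit in front of different storage back ends.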

The Multimedia Context Engine

The Multimedia Context Engine (MCE) is a system for context-aware multimodal data fusion, modelling and prediction. It can take advantage of the built-in sensors (e.g. acceleration, audio, GPS) and internal system states (e.g. time, application states) of a mobile device in order to predict the most likely user behavior in various conditions. MCE is designed to scale to large and continuously growing data sets and to any number of parallel data streams.
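The models used by MCE are not described here, so the following is only a toy stand-in for the general idea: combine discretized sensor readings and system states into one context, count which behavior follows each context, and predict the most frequent one. The class `ContextPredictor`, the context fields, and the behavior labels are all hypothetical.

```python
from collections import Counter, defaultdict


class ContextPredictor:
    """Count-based predictor: returns the behavior seen most often in a context."""

    def __init__(self):
        # Maps a context tuple to a counter of the behaviors observed in it.
        self.counts = defaultdict(Counter)

    def observe(self, context, behavior):
        # A context is a tuple of discretized sensor and system-state values.
        self.counts[context][behavior] += 1

    def predict(self, context):
        seen = self.counts.get(context)
        if not seen:
            return None  # this context has never been observed
        return seen.most_common(1)[0][0]


predictor = ContextPredictor()
# Hypothetical observations: (motion level, time of day) -> application opened.
predictor.observe(("still", "evening"), "video_player")
predictor.observe(("still", "evening"), "video_player")
predictor.observe(("still", "evening"), "email")
predictor.observe(("moving", "morning"), "music_player")

print(predictor.predict(("still", "evening")))
```

A lookup table like this does not generalize to unseen contexts and grows with every new combination of values, which is one reason a real system would need proper fusion and modelling techniques.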

The Unsupervised Prominence Detector for Speech

The Unsupervised Prominence Detector for Speech is software for detecting prominent (stressed) words in spoken audio. The software is designed to be language-universal (it has been tested on English and Dutch) and customizable.


The results of the research will be utilized in future projects by the participating companies and academic institutions. Additionally, the platform and software components have been released as Open Source (Apache 2.0 License), and are available for public use and further development.