Welcome to the Computer Vision and Multimedia Lab website.
- Digital Humanities projects
- Visual Attention Mechanisms
- Perceptive Interfaces
- Eye Tracking
- Visual Systems for Browsing Large Collections of Images
- Visual Data Mining for e-learning Applications
- Artificial Vision for Mobile Mapping
- Image Analysis
- Proteomics and Bioinformatics
- 3D Vision
- Distributed Sensor Networks
- Hierarchical Architectures
Research

The Computer Vision & Multimedia Lab (CVML) group in Pavia, initially engaged in research on image processing and on parallel architectures for vision, is currently active in the following research areas: person recognition, deep reinforcement learning for robotics, proteomics, eye tracking for human-computer interaction and biometrics, and Digital Heritage and Digital Humanities, where 3D technologies are employed to promote the dissemination and accessibility of cultural heritage.
> a brief summary of current topics (poster)
Digital Humanities projects. Activities carried out for the analysis, promotion, dissemination and accessibility of cultural heritage include: 3D reconstructions, applications for eye- and gesture-based human-computer interaction, 3D printing, production of tactile images, production of CD-ROMs, analysis of writing on relics, image processing and 3D modeling of historical musical instruments, and more. For further details, please visit the dedicated page.
Visual Attention Mechanisms. Images are collected by a commercial camera at a rate of millions of pixels per second. Most of the collected data is useless, and the small amount of profitable data must be extracted from this cumbersome set. Conventional computers are unable to manage this selection problem. Several proposals have been put forward to specialize computer vision systems so that they effectively identify regions and events of interest in a manner analogous to human vision. The attention mechanism in humans operates by directing the fovea (the retinal region with the greatest acuity) only to the areas most useful for the task at hand. Multi-resolution representations have been developed as one possible method for emulating the focus-of-attention strategy typical of biological systems. A new approach to object recognition has been implemented based on the multi-resolution paradigm.
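As an illustration of the multi-resolution paradigm, a pyramid and a coarse-to-fine focus-of-attention step can be sketched in a few lines of pure Python. This is a toy, not the lab's actual system; in particular, using brightness as the saliency measure is an assumption made only for clarity:

```python
def downsample(img):
    """Halve resolution by averaging 2x2 blocks (one pyramid level)."""
    h, w = len(img), len(img[0])
    return [[(img[2*r][2*c] + img[2*r][2*c+1] +
              img[2*r+1][2*c] + img[2*r+1][2*c+1]) / 4.0
             for c in range(w // 2)]
            for r in range(h // 2)]

def build_pyramid(img, levels):
    """Return [full-res, half-res, quarter-res, ...]."""
    pyramid = [img]
    for _ in range(levels - 1):
        img = downsample(img)
        pyramid.append(img)
    return pyramid

def focus_of_attention(pyramid):
    """Pick the most 'salient' (here simply: brightest) cell at the
    coarsest level, then map it back to full-resolution coordinates."""
    coarse = pyramid[-1]
    r, c = max(((r, c) for r in range(len(coarse))
                       for c in range(len(coarse[0]))),
               key=lambda rc: coarse[rc[0]][rc[1]])
    scale = 2 ** (len(pyramid) - 1)
    return r * scale, c * scale  # top-left corner of the attended region
```

Most of the image is examined only at coarse resolution; full-resolution processing is spent on the attended region alone, which is the point of the strategy.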
Perceptive Interfaces. These provide the computer with perceptive capabilities. Through perception, the machine becomes able to sense its environment, acquiring implicit and/or explicit information about users and what happens around them. Vision-based user interfaces, which receive data input through cameras, are an important part of this category. We consider interactions that can be classified into two groups according to the active or passive role played by the user in the communication process. In explicit communication, users are fully aware that their actions will be interpreted as direct commands (e.g. hand or head gesture recognition); in implicit communication, the user's behavior is observed indirectly to infer what he or she is doing (e.g. phoning, talking to someone) or his or her emotional state (derived, for example, from the analysis of facial expressions or eye activity). In general, our main goal is to develop and test interface systems in which computer vision technology is really useful, also (and above all) in everyday PC-based operations. One field we are concentrating on, however, is e-learning, where the use of natural forms of communication may make a big difference. As our research is aimed at developing vision-based user interfaces and at exploring whether, when and how vision technology can be useful for PC-related tasks, we pay special attention to usability: what we obtain must be truly useful and usable.
Eye Tracking. This means detecting the user's gaze direction. Interfaces operated through the eyes are of great help to people with severe disabilities, allowing them to use their gaze to identify, or even move, objects on the screen, as well as to write. Eye tracking can also improve ordinary keyboard- and mouse-based interaction: several explicit-communication eye-based interfaces have been developed to date, and they might become popular if the cost of eye trackers drops sufficiently. Beyond its role as an input means for interfaces, eye tracking is studied and applied in several other contexts, with applications in, for instance, psychology, psychophysics, neuroscience, usability, and advertising. In our research we consider eye tracking both for implementing explicit/implicit interfaces and as a helpful means for evaluating web sites, information presentation modes and visual interactions in general. For example, we have developed Eye-S, a system that allows input to be provided to the computer through a purely eye-based approach. Another project is e5Learning, an e-learning environment where eye tracking is used to observe user behavior, so as to adapt content presentation in real time. We are also studying the effectiveness of existing and new RSVP (Rapid Serial Visual Presentation) image visualization methods, which involve intense eye activity.
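A common explicit eye-based selection technique is dwell time: a command fires when the gaze rests on a target for long enough. The sketch below is purely illustrative (Eye-S itself is based on gaze gestures over screen areas, not on this code; the thresholds and the sample format are assumptions):

```python
# Illustrative parameters, not taken from any real eye tracker.
DWELL_TIME = 0.8   # seconds the gaze must rest on a spot to "click"
TOLERANCE = 30     # pixels of jitter tolerated around the fixation point

def detect_dwell(samples):
    """samples: list of (timestamp_seconds, x, y) gaze points in time
    order. Returns the (x, y) of the first dwell-based selection, or
    None if the gaze never rests long enough in one place."""
    start = 0
    for i in range(1, len(samples)):
        t0, x0, y0 = samples[start]
        t, x, y = samples[i]
        if abs(x - x0) > TOLERANCE or abs(y - y0) > TOLERANCE:
            start = i        # gaze moved away: restart the dwell timer
        elif t - t0 >= DWELL_TIME:
            return x0, y0    # held long enough: trigger a selection
    return None
```

The tolerance window matters because fixations are never perfectly still: micro-saccades would otherwise keep resetting the timer.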
Visual Systems for Browsing Large Collections of Images. Within the Information Presentation field, the problem of effectively browsing large image databases has so far received relatively limited attention. Although many systems have been developed for the fast identification of pictures with specific characteristics (classified manually, by associating textual metadata with them, or automatically, through mathematical measures of color, texture and shape), "ordinary" image presentation usually displays pictures arranged in a grid. This is a good solution for a limited number of items, but it may not scale well to much larger collections (in the order of thousands of images). Starting from the assumption that in many cases the user does not know precisely what to search for, we concentrate on methods that allow all the images of a collection to be presented in a short time. In other words, we want to provide the user with "global views" of the whole database, so that he or she can select the images he or she "likes" most.
Visual Data Mining for e-learning Applications. Various programs have emerged that provide statistical analysis of WWW access logs. These programs typically report figures such as the number of accesses per file, so they are not suitable for specific e-learning applications. This research concerns visual e-learning log mining as a novel application of visual data mining to the log data produced by commercial e-learning courses. It provides a set of graphics for observing hundreds of learners at a glance, in order to discriminate between sequences of learning activities that yield good results and sequences that are less effective. Moreover, using the proposed visualizations of learning track data, instructors can identify individuals who need special attention.
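The first step of such log mining is turning raw access-log rows into per-learner activity sequences, from which visualizations and simple alerts can be derived. A minimal sketch (the log fields and the flagging rule below are illustrative assumptions, not the actual course-log format):

```python
from collections import defaultdict

def learner_sequences(log):
    """Group raw log rows (learner_id, timestamp, activity) into the
    per-learner, time-ordered activity sequences on which the
    visualizations are built."""
    seqs = defaultdict(list)
    for learner, ts, activity in sorted(log, key=lambda r: (r[0], r[1])):
        seqs[learner].append(activity)
    return dict(seqs)

def needs_attention(seqs, min_activities=3):
    """Flag learners with too few logged activities (a hypothetical
    rule standing in for the real effectiveness criteria)."""
    return sorted(l for l, s in seqs.items() if len(s) < min_activities)
```

In the actual system the sequences feed graphical views rather than a threshold rule, but the grouping step is the same.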
Artificial Vision for Mobile Mapping. This research presents the technology of a vehicle-based mobile mapping system that maintains an updated transportation database for road and railway inventory. The mobile mapping system integrates digital cameras developed to collect data on the position and attributes of infrastructure and signs. This study discusses detecting and identifying road and railway signs in images using neural networks, color image processing, and parameters that describe geometric characteristics. With this combined method it is possible to both detect and identify signs.
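A minimal sketch of the first two stages, color-based candidate extraction followed by a geometric feature that a classifier (e.g. a neural network) would then verify, might look like this. The thresholds are illustrative assumptions, not the system's actual parameters:

```python
def red_mask(pixels, ratio=1.5):
    """Mark pixels whose red channel dominates green and blue: a crude
    first stage for locating red-bordered road signs. Real systems tune
    such thresholds and work in more robust color spaces."""
    return [[1 if r > ratio * g and r > ratio * b else 0
             for (r, g, b) in row]
            for row in pixels]

def bounding_box(mask):
    """Bounding box of the masked region: a simple geometric parameter
    of the candidate, to be passed on to the identification stage."""
    coords = [(r, c) for r, row in enumerate(mask)
                     for c, v in enumerate(row) if v]
    if not coords:
        return None
    rs = [r for r, _ in coords]
    cs = [c for _, c in coords]
    return min(rs), min(cs), max(rs), max(cs)
```

The combination described in the text works the same way at a higher level: cheap color cues propose candidates, geometric checks prune them, and a trained classifier identifies the sign.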
Image Analysis. Low-cost acquisition devices are now widely available, so classical image analysis is a common task not only in research but also in industrial applications. Our interests in this area cover many aspects that can be grouped into two main categories: offline analysis of still images acquired in indoor (user-controlled) environments, and real-time interpretation of frame sequences in outdoor environments (e.g. moving robots or vehicles). Among other projects, one in collaboration with Centro Ricerche FIAT aims at fusing data produced by different sources (in particular vision cameras) to obtain reliable data interpretation. Such results are the basis for software modules to be embedded in ordinary road vehicles; examples include multiple sensors for pre-crash warning, actuation systems based on obstacle detection, classification and tracking, and driver assistance systems based on driver status monitoring, manoeuvre-area sensing and critical-situation detection. Another line of research, in collaboration with CRIMTA (Centro di Ricerca Interdipartimentale Multimediale sul Teatro Antico), concerns the segmentation of MPEG-coded digital video sequences into homogeneous, consecutive sets of frames, in order to obtain MPEG videos about ancient theatre together with content descriptions for shot and frame detection. This step is essential for the correct implementation of any subsequent processing, such as classification, compression, restoration, transmission, analysis of video content, and reproduction.
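A classic criterion for the shot-segmentation step is the intensity-histogram difference between consecutive frames: a large jump signals a cut. A toy version, with an illustrative threshold (the actual project works on MPEG-coded streams, not raw frames):

```python
def grey_histogram(frame, bins=8):
    """Intensity histogram of one frame (pixel values assumed 0..255)."""
    hist = [0] * bins
    for row in frame:
        for v in row:
            hist[min(v * bins // 256, bins - 1)] += 1
    return hist

def shot_boundaries(frames, threshold=0.5):
    """Indices where consecutive histograms differ by more than
    `threshold` (as a fraction of the pixel count): a standard
    cut-detection criterion. The threshold is an assumption."""
    hists = [grey_histogram(f) for f in frames]
    n_pixels = len(frames[0]) * len(frames[0][0])
    cuts = []
    for i in range(1, len(hists)):
        diff = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if diff / (2 * n_pixels) > threshold:   # diff is at most 2*n_pixels
            cuts.append(i)
    return cuts
```

Each detected boundary delimits a homogeneous frame set, which is exactly the unit that later classification, compression or restoration steps operate on.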
Proteomics and Bioinformatics. The main goal of this project is to study new approaches, strategies and capabilities of pattern recognition techniques for improving some aspects of bioinformatics and computational biology, in particular the modeling and comparison of 3D protein structures. Considering the state of the art, we plan to investigate three approaches that are not yet (or only partially) being pursued. 1) Use of the generalized 3D Hough transform for protein comparison. The goal is to search for structural similarities between proteins in a database (PDB). This can be applied at different levels of protein representation (atomic, secondary structure, or geometric 3D surfaces). Consider, for example, the secondary-structure representation of a protein (made of helices and strands). Every detected "evidence" (a helix or a strand) casts a weighted vote for a set of candidate instances of the model, each corresponding to a given helix or strand of the searched protein. By collecting all the contributions, the model instance that gathers sufficient votes determines the similarity. Note that the generalized Hough transform is a parallel algorithm (elements can vote in parallel, and proteins can be compared in parallel), so it could perform well on parallel architectures (such as the Cell BE processor). 2) Use of EGI (Extended Gaussian Images) or CEGI (Complex EGI) for solving protein surface docking (protein-protein or protein-ligand). The objective here is to find complementary regions. The EGI is the histogram of surface orientations represented on the unit sphere. EGIs are computationally simple to evaluate, so it is worth verifying whether necessary conditions for docking, applied to EGIs, are sufficiently selective. The Complex EGI is an enriched version in which orientation is combined with distance. The advantages and disadvantages of this approach will be investigated.
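The voting scheme of approach 1) can be illustrated with a toy one-dimensional analogue, in which each secondary-structure element of a query protein votes for an alignment against the model; the (type, length) representation and the length tolerance below are assumptions made only for the sketch:

```python
from collections import Counter

def hough_similarity(query, model, length_tol=2.0):
    """Toy Hough-style voting: each secondary-structure element of the
    query, given as a (type, length) pair such as ('H', 10.0), votes
    for every compatible element of the model. Votes accumulate per
    alignment offset, and the best-supported offset gives the score.
    In the real 3D case votes go to pose/instance parameters instead."""
    votes = Counter()
    for qi, (qt, ql) in enumerate(query):
        for mi, (mt, ml) in enumerate(model):
            if qt == mt and abs(ql - ml) <= length_tol:
                votes[mi - qi] += 1   # weighted votes in the real scheme
    return max(votes.values()) if votes else 0
```

Since every element votes independently, the accumulation is trivially parallel, which is why the text notes that architectures like the Cell BE are a good fit.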
3) Applying mathematical morphology (in the sense of Serra and Matheron, also known as "image algebra") to surface modeling. The objective is to develop new computational methods that can also be applied to the prediction and validation of protein-protein interactions. Simple operators like dilation and erosion can be applied to the modeling of protein molecular surfaces. One application is a comparative analysis aimed at identifying topological regions preserved across different species.
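The two basic operators named in approach 3), binary dilation and erosion, can be sketched in pure Python on a 2D grid with a 3x3 structuring element (molecular surfaces are of course 3D; this is a minimal illustration of the operators themselves):

```python
def dilate(img):
    """Binary dilation, 3x3 structuring element: a pixel turns on if
    any neighbour (or itself) is on, so shapes grow by one layer."""
    h, w = len(img), len(img[0])
    return [[1 if any(img[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if 0 <= r + dr < h and 0 <= c + dc < w)
             else 0
             for c in range(w)]
            for r in range(h)]

def erode(img):
    """Binary erosion: a pixel survives only if its whole (clipped)
    3x3 neighbourhood is on, so shapes shrink by one layer."""
    h, w = len(img), len(img[0])
    return [[1 if all(img[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                      if 0 <= r + dr < h and 0 <= c + dc < w)
             else 0
             for c in range(w)]
            for r in range(h)]
```

Compositions of these two operators (openings, closings) are what morphological surface analysis builds on, e.g. to smooth a molecular surface before comparing topological regions.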
3D Vision. Reconstruction, analysis, synthesis and manipulation of virtual objects, or of virtual representations of real objects, are usually addressed as "3D Vision" problems.
Our interests in this area cover many aspects that can be grouped into two categories: 3D analysis and 3D synthesis.
Analysis includes all processes that retrieve information about objects in a given environment, while synthesis refers to methods for building (rendering) a virtual scene.
A very interesting 3D Vision project started two years ago in collaboration between the Vision Lab and RAI, focusing on the application of such techniques within a Virtual Set environment. Its main goal is to address two common problems: 3D reconstruction and illumination of virtual objects. The former aims at building fast and accurate models of real objects present in the scene, starting from information acquired through non-invasive methods. The latter aims at defining accurate illumination models to achieve realism in virtual environments.
Another main research area in 3D Vision is GPGPU, that is, the development of general-purpose algorithms that run directly on the GPU: fast, dedicated hardware originally designed only for rendering, but also usable as an additional processing unit.
Last but not least, a research area is dedicated to 3D graphics, especially the development of dynamics and particle systems for real-time processing.
Virtual reconstruction of Pavia in the 16th century
CVMLab and Expo 2015
La Macchina Vasariana
Pavia città d'Arte
Distributed Sensor Networks. An emerging issue in research on sensor systems and data analysis is that of distributed sensor networks. Wireless technology has recently made it possible to connect light, cheap sensor boards together, opening future scenarios such as ubiquitous sensing or sentient objects to the most forward-looking observers. The terms Distributed Sensor Networks (DSN), Wireless Sensor Networks (WSN), or, increasingly, simply sensor networks, are today converging to identify the technology originally known as motes. From a lexical point of view, it is interesting that "sensor networks" is progressively coming to denote exclusively the technological area of WSN. A typical sensor node is a small battery-powered board including a microprocessor, memory, an RF transceiver and an antenna. These elements are pared down to minimize energy consumption and size, so as to make the sensor nodes ubiquitous and long-lasting in autonomy. Wireless connection facilitates deployment and makes the memory remotely accessible by creating a network layer between the sensors.
Hierarchical Architectures. Image processing is a computationally heavy task, and achieving reasonable throughput is unlikely on conventional architectures. A good platform for dealing with images is Papia2, a processor array able to reconfigure itself as a pyramid; the Papia2 array allows a one-to-one mapping between processors and image pixels. A low-level and a high-level simulator have been developed for the machine. The low-level simulator executes Papia2 instructions and is used to code and test basic routines. The high-level simulator is devoted to realizing more complex algorithms in a high-level language. The programming environment embeds a source-level debugger to trace running programs and a visualization module to monitor the machine status in the form of evolving images. With the 'imget' metaphor (active images capable of modifying their own contents), it is possible to offer high-level access to the Papia2 array: application development can thus ignore hardware details and deal only with image transformations. More recently, a new solution based on off-the-shelf hardware has been developed, pursuing the hierarchical implementation of attention mechanisms through multiresolution techniques.
Poster: Collaborative slide screening for the diagnosis of breast cancer metastases in lymph nodes
Poster: Multimedia applications for Time-of-Flight cameras, Proteomics, and applications of the Eye-Tracking methodology