University of Trento, Italy

Visual features for linguists

Tuesday, November 26, 2013, at 2 p.m. in room C221

Abstract:

Features automatically extracted from images constitute a new and rich source of semantic knowledge that can complement information extracted from text. The convergence between vision- and text-based information can be exploited in scenarios where the two modalities must be combined to solve a target task (e.g., generating verbal descriptions of images, or finding the right images to illustrate a story). However, the potential applications of integrated visual features go beyond mixed-media scenarios: because they are complementary to language, visual features might provide perceptually grounded semantic information that can be exploited in purely linguistic domains.

In this talk, I will first introduce basic techniques to encode image contents in terms of low-level features, such as the widely adopted SIFT descriptors. I will then show how these low-level descriptors are used to induce more abstract features, focusing on the well-established bag-of-visual-words method to represent images, but also briefly introducing more recent developments, including pyramid representations that capture spatial information, soft visual-word clustering via Fisher encoding, and attribute-based image representations.
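As a rough illustration of the bag-of-visual-words idea mentioned above, the sketch below clusters local descriptors into a small codebook with plain k-means and then represents each image as a normalised histogram of visual-word counts. It is only a minimal sketch under stated assumptions: the descriptors are random 128-dimensional vectors standing in for real SIFT output, and the function names (`kmeans`, `assign`, `bovw_histogram`) and the codebook size of 8 are hypothetical choices made for the example, not details taken from the talk.

```python
import numpy as np

def assign(descriptors, centroids):
    """Index of the nearest centroid (hard assignment) for each descriptor."""
    # Squared Euclidean distances, shape (n_descriptors, k).
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) codebook of visual words."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct descriptors drawn at random.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centroids = descriptors[start].copy()
    for _ in range(n_iter):
        labels = assign(descriptors, centroids)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def bovw_histogram(descriptors, centroids):
    """L1-normalised histogram of visual-word counts for one image."""
    labels = assign(descriptors, centroids)
    counts = np.bincount(labels, minlength=len(centroids)).astype(float)
    return counts / counts.sum()

# Assumption: synthetic stand-ins for SIFT descriptors
# (3 images, 50 descriptors each, 128 dimensions).
rng = np.random.default_rng(42)
images = [rng.random((50, 128)) for _ in range(3)]

# Learn the codebook from all descriptors pooled together, then encode each image.
codebook = kmeans(np.vstack(images), k=8)
histograms = np.array([bovw_histogram(img, codebook) for img in images])
```

Each row of `histograms` is one image's representation. The hard nearest-centroid assignment used here is the step that softer schemes such as Fisher encoding relax, and the histogram discards the spatial layout that pyramid representations try to retain.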

Finally, I will discuss how to use visual features to construct visually informed distributional semantic spaces.
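One simple way to picture such a space is to concatenate a word's text-based vector with a visual vector for the same word, after normalising each. The sketch below shows that fusion under clearly labelled assumptions: the abstract does not say which combination method the talk uses, the weighting parameter `alpha` and the function names are hypothetical, and the toy vectors are invented for illustration.

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def fuse(text_vec, visual_vec, alpha=0.5):
    """Weighted concatenation of normalised text and visual vectors.

    alpha is a hypothetical mixing weight: 1.0 keeps only the text part,
    0.0 keeps only the visual part.
    """
    return np.concatenate([
        alpha * l2_normalize(text_vec),
        (1.0 - alpha) * l2_normalize(visual_vec),
    ])

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Invented toy data: 4-d text co-occurrence vectors and 3-d visual-word histograms.
text = {"cat": np.array([2.0, 0.0, 1.0, 3.0]),
        "dog": np.array([1.0, 1.0, 0.0, 4.0])}
visual = {"cat": np.array([0.6, 0.3, 0.1]),
          "dog": np.array([0.5, 0.4, 0.1])}

multimodal = {w: fuse(text[w], visual[w]) for w in text}
similarity = cosine(multimodal["cat"], multimodal["dog"])
```

Normalising before concatenation keeps either modality from dominating simply because its raw values are larger, and `alpha` then makes the relative weighting explicit.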


