Now showing items 1-10 of 13

Fast, invariant representation for human action in the visual system

Isik, Leyla; Tacchetti, Andrea; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2016-01-06)

The ability to recognize the actions of others from visual input is essential to humans' daily lives. The neural computations underlying action recognition, however, are still poorly understood. We use magnetoencephalography ...

Representation Learning in Sensory Cortex: a theory

Anselmi, Fabio; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), 2014-11-14)

We review and apply a computational theory of the feedforward path of the ventral stream in visual cortex based on the hypothesis that its main function is the encoding of invariant representations of images. A key ...

View-tolerant face recognition and Hebbian learning imply mirror-symmetric neural tuning to head orientation

Leibo, Joel Z.; Liao, Qianli; Freiwald, Winrich; Anselmi, Fabio; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2016-06-03)

The primate brain contains a hierarchy of visual areas, dubbed the ventral stream, which rapidly computes object representations that are both specific for object identity and relatively robust against identity-preserving ...

Seeing What You’re Told: Sentence-Guided Activity Recognition In Video

Siddharth, Narayanaswamy; Barbu, Andrei; Siskind, Jeffrey Mark (Center for Brains, Minds and Machines (CBMM), arXiv, 2014-05-29)

We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, ...

Unsupervised learning of clutter-resistant visual representations from natural videos

Liao, Qianli; Leibo, Joel Z.; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2015-04-27)

Populations of neurons in inferotemporal cortex (IT) maintain an explicit code for object identity that also tolerates transformations of object appearance, e.g., position, scale, viewing angle [1, 2, 3]. Though the learning ...

Learning Real and Boolean Functions: When Is Deep Better Than Shallow

Mhaskar, Hrushikesh; Liao, Qianli; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2016-03-08)

We describe computational tasks - especially in vision - that correspond to compositional/hierarchical functions. While the universal approximation property holds both for hierarchical and shallow networks, we prove that ...

Complexity of Representation and Inference in Compositional Models with Part Sharing

Yuille, Alan L.; Mottaghi, Roozbeh (Center for Brains, Minds and Machines (CBMM), arXiv, 2015-05-05)

This paper performs a complexity analysis of a class of serial and parallel compositional models of multiple objects and shows that they enable efficient representation and rapid inference. Compositional models are generative ...

Robust Estimation of 3D Human Poses from a Single Image

Wang, Chunyu; Wang, Yizhou; Lin, Zhouchen; Yuille, Alan L.; Gao, Wen (Center for Brains, Minds and Machines (CBMM), arXiv, 2014-06-10)

Human pose estimation is a key step to action recognition. We propose a method of estimating 3D human poses from a single image, which works in conjunction with an existing 2D pose/joint detector. 3D pose estimation is ...

Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

Berzak, Yevgeni; Barbu, Andrei; Harari, Daniel; Katz, Boris; Ullman, Shimon (Center for Brains, Minds and Machines (CBMM), arXiv, 2016-06-10)

Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a ...

When Computer Vision Gazes at Cognition

Gao, Tao; Harari, Daniel; Tenenbaum, Joshua; Ullman, Shimon (Center for Brains, Minds and Machines (CBMM), arXiv, 2014-12-12)

Joint attention is a core, early-developing form of social interaction. It is based on our ability to discriminate the third party objects that other people are looking at. While it has been shown that people can accurately ...