
    • Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) 

      Mao, Junhua; Xu, Wei; Yang, Yi; Wang, Jiang; Huang, Zhiheng; e.a. (Center for Brains, Minds and Machines (CBMM), arXiv, 2015-05-07)
      In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. ...
    • Deep compositional robotic planners that follow natural language commands 

      Kuo, Yen-Ling; Katz, Boris; Barbu, Andrei (Center for Brains, Minds and Machines (CBMM), Computation and Systems Neuroscience (Cosyne), 2020-05-31)
      We demonstrate how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands in a continuous configuration space to move and manipulate objects. Our approach combines ...
    • Deep Convolutional Networks are Hierarchical Kernel Machines 

      Anselmi, Fabio; Rosasco, Lorenzo; Tan, Cheston; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2015-08-05)
      We extend i-theory to incorporate not only pooling but also rectifying nonlinearities in an extended HW module (eHW) designed for supervised learning. The two operations roughly correspond to invariance and selectivity, ...
    • Deep Nets: What have they ever done for Vision? 

      Yuille, Alan L.; Liu, Chenxi (Center for Brains, Minds and Machines (CBMM), 2018-05-10)
      This is an opinion paper about the strengths and weaknesses of Deep Nets. They are at the center of recent progress on Artificial Intelligence and are of growing importance in Cognitive Science and Neuroscience since they ...
    • Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning 

      Lotter, William; Kreiman, Gabriel; Cox, David (Center for Brains, Minds and Machines (CBMM), arXiv, 2017-03-01)
      While great strides have been made in using deep learning algorithms to solve supervised learning tasks, the problem of unsupervised learning — leveraging unlabeled examples to learn about the structure of a domain — remains ...
    • Deep Regression Forests for Age Estimation 

      Shen, Wei; Guo, Yilu; Wang, Yan; Zhao, Kai; Wang, Bo; e.a. (Center for Brains, Minds and Machines (CBMM), 2018-06-01)
      Age estimation from facial images is typically cast as a nonlinear regression problem. The main challenge of this problem is that the facial feature space w.r.t. ages is inhomogeneous, due to the large variation in facial ...
    • A Deep Representation for Invariance And Music Classification 

      Zhang, Chiyuan; Evangelopoulos, Georgios; Voinea, Stephen; Rosasco, Lorenzo; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2014-03-17)
      Representations in the auditory cortex might be based on mechanisms similar to the visual ventral stream: modules for building invariance to transformations and multiple layers for compositionality and selectivity. In this ...
    • Deep vs. shallow networks: An approximation theory perspective 

      Mhaskar, Hrushikesh; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2016-08-12)
      The paper briefly reviews several recent results on hierarchical architectures for learning from examples, that may formally explain the conditions under which Deep Convolutional Neural Networks perform much better in ...
    • DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial Occlusion 

      Zhang, Zhishuai; Xie, Cihang; Wang, Jianyu; Xie, Lingxi; Yuille, Alan L. (Center for Brains, Minds and Machines (CBMM), 2018-06-19)
      In this paper, we study the task of detecting semantic parts of an object, e.g., a wheel of a car, under partial occlusion. We propose that all models should be trained without seeing occlusions while being able to transfer ...
    • Detect What You Can: Detecting and Representing Objects using Holistic Models and Body Parts 

      Chen, Xianjie; Mottaghi, Roozbeh; Liu, Xiaobai; Fidler, Sanja; Urtasun, Raquel; e.a. (Center for Brains, Minds and Machines (CBMM), arXiv, 2014-06-10)
      Detecting objects becomes difficult when we need to deal with large shape deformation, occlusion and low resolution. We propose a novel approach to i) handle large deformations and partial occlusions in animals (as examples ...
    • Detecting Semantic Parts on Partially Occluded Objects 

      Wang, Jianyu; Xie, Cihang; Zhang, Zhishuai; Zhu, Jun; Xie, Lingxi; e.a. (Center for Brains, Minds and Machines (CBMM), 2017-09-04)
      In this paper, we address the task of detecting semantic parts on partially occluded objects. We consider a scenario where the model is trained using non-occluded images but tested on occluded images. The motivation is ...
    • Discriminate-and-Rectify Encoders: Learning from Image Transformation Sets 

      Tacchetti, Andrea; Voinea, Stephen; Evangelopoulos, Georgios (Center for Brains, Minds and Machines (CBMM), arXiv, 2017-03-13)
      The complexity of a learning task is increased by transformations in the input space that preserve class identity. Visual object recognition for example is affected by changes in viewpoint, scale, illumination or planar ...
    • Do Deep Neural Networks Suffer from Crowding? 

      Volokitin, Anna; Roig, Gemma; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), arXiv, 2017-06-26)
      Crowding is a visual effect suffered by humans, in which an object that can be recognized in isolation can no longer be recognized when other objects, called flankers, are placed close to it. In this work, we study the ...
    • Do Neural Networks for Segmentation Understand Insideness? 

      Villalobos, Kimberly; Štih, Vilim; Ahmadinejad, Amineh; Sundaram, Shobhita; Dozier, Jamell; e.a. (Center for Brains, Minds and Machines (CBMM), 2020-04-04)
      The insideness problem is an image segmentation modality that consists of determining which pixels are inside and outside a region. Deep Neural Networks (DNNs) excel in segmentation benchmarks, but it is unclear that they ...
    • Do You See What I Mean? Visual Resolution of Linguistic Ambiguities 

      Berzak, Yevgeni; Barbu, Andrei; Harari, Daniel; Katz, Boris; Ullman, Shimon (Center for Brains, Minds and Machines (CBMM), arXiv, 2016-06-10)
      Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a ...
    • Double descent in the condition number 

      Poggio, Tomaso; Kur, Gil; Banburski, Andrzej (Center for Brains, Minds and Machines (CBMM), 2019-12-04)
      In solving a system of n linear equations in d variables Ax=b, the condition number of the (n,d) matrix A measures how much errors in the data b affect the solution x. Bounds of this type are important in many inverse ...
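      The abstract above defines the condition number as a bound on how much relative errors in b are amplified in the solution x of Ax=b. That bound can be illustrated numerically with a small sketch (the matrix and perturbation below are illustrative choices, not data from the paper):

      ```python
      import numpy as np

      # A nearly singular matrix has a large condition number.
      A = np.array([[1.0, 1.0],
                    [1.0, 1.0001]])
      b = np.array([2.0, 2.0001])
      x = np.linalg.solve(A, b)          # exact solution: x = [1, 1]

      kappa = np.linalg.cond(A)          # 2-norm condition number of A

      # Perturb b slightly and compare relative errors in b and in x.
      db = np.array([0.0, 1e-6])
      x_pert = np.linalg.solve(A, b + db)
      rel_err_b = np.linalg.norm(db) / np.linalg.norm(b)
      rel_err_x = np.linalg.norm(x_pert - x) / np.linalg.norm(x)

      # The bound: rel_err_x <= kappa * rel_err_b (up to rounding error).
      print(f"kappa = {kappa:.1f}, amplification = {rel_err_x / rel_err_b:.1f}")
      ```

      Here a relative perturbation of about 1e-7 in b produces a relative change of about 1e-2 in x, an amplification close to the condition number of A.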
    • Dreaming with ARC 

      Banburski, Andrzej; Ghandi, Anshula; Alford, Simon; Dandekar, Sylee; Chin, Peter; e.a. (Center for Brains, Minds and Machines (CBMM), 2020-11-23)
      Current machine learning algorithms are highly specialized to whatever it is they are meant to do — e.g. playing chess, picking up objects, or object recognition. How can we extend this to a system that could solve a ...
    • The Effects of Image Distribution and Task on Adversarial Robustness 

      Kunhardt, Owen; Deza, Arturo; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), 2021-02-18)
      In this paper, we propose an adaptation to the area under the curve (AUC) metric to measure the adversarial robustness of a model over a particular ε-interval [ε_0, ε_1] (interval of adversarial perturbation strengths) ...
    • Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas 

      Kuo, Yen-Ling; Katz, Boris; Barbu, Andrei (Center for Brains, Minds and Machines (CBMM), The Ninth International Conference on Learning Representations (ICLR), 2020-10-25)
      We demonstrate a reinforcement learning agent which uses a compositional recurrent neural network that takes as input an LTL formula and determines satisfying actions. The input LTL formulas have never been seen before, ...
    • Exact Equivariance, Disentanglement and Invariance of Transformations 

      Liao, Qianli; Poggio, Tomaso (2017-12-31)
      Invariance, equivariance and disentanglement of transformations are important topics in the field of representation learning. Previous models like Variational Autoencoder [1] and Generative Adversarial Networks [2] attempted ...