Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
We present a self-supervised method to improve an agent's abilities in describing arbitrary objects while actively exploring a generic environment. This is a challenging problem, as current models ...