IDTransformer: Transformer for Intrinsic Image Decomposition

Maxime Gevers
ICCV Paris, 2023

Abstract

The aim of intrinsic image decomposition (IID) is to recover the reflectance and shading components of a given image. Since many different combinations of reflectance and shading can explain the same image, IID is an under-constrained problem. Previous approaches constrain the search space using hand-crafted priors. However, these priors rest on strong imaging assumptions and fall short when those assumptions do not hold. Deep learning-based methods learn the problem end-to-end from data, but these networks lack any explicit information about the image formation model.
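For reference, IID commonly assumes the Lambertian image formation model, under which the image factors pixel-wise into reflectance and shading; the scale ambiguity below is one illustration of why the problem is under-constrained:

    I(\mathbf{x}) = R(\mathbf{x}) \cdot S(\mathbf{x}), \qquad
    I(\mathbf{x}) = \big(\alpha\, R(\mathbf{x})\big) \cdot \big(S(\mathbf{x}) / \alpha\big) \quad \text{for any } \alpha > 0.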
In this paper, an IID transformer approach (IDTransformer) is proposed that learns photometric-invariant attention, derived from the image formation model and integrated into the transformer framework. Combining invariant features in both a global and a local setting allows the network not only to learn reflectance transitions, but also to group regions of similar reflectance, irrespective of their spatial arrangement. Illumination- and geometry-invariant attention is exploited to generate the reflectance map, while illumination-invariant but geometry-variant attention is used to compute the shading map.
Enabling explicit physics-based attention allows the network to be trained on a relatively small dataset. Ablation studies show that adding invariant attention improves performance. Experiments on the Intrinsic Images in the Wild dataset show results competitive with state-of-the-art methods.
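As a rough, hypothetical sketch of the idea (not taken from the paper): under a Lambertian model with a white light source, chromaticity (normalized RGB) is invariant to both shading and illumination intensity, so patches with matching chromaticity are likely to share reflectance. The function names and the additive attention bias below are illustrative assumptions, written in PyTorch:

    import torch

    def chromaticity(rgb, eps=1e-6):
        # Normalized RGB (chromaticity): under a Lambertian model with a
        # white light source, dividing out the intensity removes the
        # shading (geometry) and illumination-strength terms.
        return rgb / (rgb.sum(dim=-1, keepdim=True) + eps)

    def invariant_attention(feat, rgb_patches, use_invariant=True):
        # feat:        (B, N, D) patch embeddings
        # rgb_patches: (B, N, 3) mean linear-RGB value per patch
        d = feat.shape[-1]
        # Standard scaled dot-product self-attention logits.
        logits = feat @ feat.transpose(-2, -1) / d ** 0.5
        if use_invariant:
            # Hypothetical bias: raise attention between patches whose
            # chromaticities match, i.e. patches likely to share the same
            # reflectance regardless of shading or spatial position.
            chrom = chromaticity(rgb_patches)
            logits = logits - torch.cdist(chrom, chrom)  # small distance -> larger logit
        attn = logits.softmax(dim=-1)
        return attn @ feat

The shading branch would instead need a cue that stays sensitive to geometry; the abstract leaves those details to the main paper.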