S³DMT-Net: Improving soft sharing based multi-task CNN using task-specific distillation and cross-task interactions
Journal
ACM International Conference Proceeding Series
Date Issued
2021-11-20
Author(s)
Jha, Ankit
Banerjee, Biplab
Chaudhuri, Subhasis
Abstract
We deal with the problem of multi-task learning (MTL) in the context of performing multiple related visual dense prediction tasks from single image inputs. Soft-sharing-based deep MTL convolutional networks (CNNs) maintain a separate network for each task, with additional constraints imposed on the model parameters. Although such MTL models have shown convincing performance on tasks including semantic segmentation, depth estimation, and surface normal estimation from monocular images, they have two inherent bottlenecks: i) the constraints imposed on such models do not, in general, leverage the inter-task information that could otherwise boost joint training, and ii) the performance of the individual task-specific networks is not explicitly optimized. We hypothesize that MTL performance can be enhanced comprehensively if these issues are addressed in soft-sharing-based MTL models. To this end, we introduce a novel MTL architecture called S³DMT-Net, in which i) the task-specific networks are trained under the notion of self-distillation, which transfers features from the deeper layers to the shallower layers, thus enriching the capacity of the network, and ii) cross-task interactions are exploited, whereby the feature maps of the task-specific encoders are shared among the tasks. We validate the model on two scene types, indoor (NYUv2 and Mini Taskonomy) and urban (CityScapes and ISPRS), recording substantial improvements over the existing literature on all tasks.
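To make the two mechanisms the abstract names more concrete, below is a minimal PyTorch sketch; it is not the authors' implementation. TaskEncoder, self_distillation_loss, and cross_task_exchange are hypothetical names, and the MSE-based deep-to-shallow target and the averaging-based fusion are assumptions standing in for the paper's learned modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskEncoder(nn.Module):
    """Tiny per-task encoder (hypothetical) that exposes its intermediate
    feature maps so they can be distilled within the network and
    exchanged across tasks."""

    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())
        self.block3 = nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.block1(x)   # shallow features
        f2 = self.block2(f1)  # mid-level features
        f3 = self.block3(f2)  # deep features
        return [f1, f2, f3]


def self_distillation_loss(feats):
    """Deep-to-shallow self-distillation within one network: shallower
    feature maps are pushed toward the (detached) deepest map.
    The channel-averaged MSE target is an assumption for illustration."""
    target = feats[-1].detach().mean(dim=1)  # (B, H', W')
    loss = 0.0
    for f in feats[:-1]:
        # Match the spatial size of the deepest map, then compare
        # channel-averaged responses.
        f = F.adaptive_avg_pool2d(f, feats[-1].shape[-2:]).mean(dim=1)
        loss = loss + F.mse_loss(f, target)
    return loss


def cross_task_exchange(feats_a, feats_b):
    """Cross-task interaction: same-depth encoder features from two tasks
    are fused; simple averaging stands in for a learned exchange module."""
    return [0.5 * (fa + fb) for fa, fb in zip(feats_a, feats_b)]


if __name__ == "__main__":
    x = torch.randn(2, 3, 64, 64)  # a batch of RGB inputs
    seg_enc, depth_enc = TaskEncoder(), TaskEncoder()
    seg_feats, depth_feats = seg_enc(x), depth_enc(x)
    distill = self_distillation_loss(seg_feats) + self_distillation_loss(depth_feats)
    fused = cross_task_exchange(seg_feats, depth_feats)  # would feed task decoders
    print(distill.item(), [f.shape for f in fused])
```

In the paper itself the distillation and interaction modules are trained end-to-end together with the task losses; the sketch only conveys the data flow of the two ideas.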
Subjects