Face Processing in Humans and DCNNs: Comparing the Reliance on Holistic and Local Feature-Based Information
Abstract
It is intensively debated whether Deep Convolutional Neural Networks (DCNNs) constitute appropriate models for human vision. Here, we investigated whether DCNNs show a typical characteristic of human face perception, namely holistic processing. In Experiment 1, we compared unfamiliar face matching performance between a DCNN trained on face recognition and N = 32 human participants for three types of face images: normal faces (with intact holistic and local feature-based information), Mooney faces (with intact holistic and degraded local feature-based information), and scrambled faces (with intact local feature-based information and degraded holistic information). For both Mooney and scrambled faces, the DCNN showed significantly larger performance decrements than human participants. In Experiment 2, we trained three DCNN architectures on face recognition: one with an unrestricted receptive field size and two with receptive field sizes restricted to approximately 1/9 and 1/16 of the input image, respectively. Subsequently, we compared unfamiliar face matching performance between these DCNNs and N = 36 human participants, who viewed face images either in an unrestricted fashion or through a movable, spotlight-like viewing aperture covering approximately 1/9 or 1/16 of the face images. While restricting the visual input with apertures substantially impaired human face matching accuracy, restricting the receptive field size did not affect DCNN performance. These results suggest that (a) DCNNs are able to achieve high face matching accuracy without using holistic information, and (b) the reliance on holistic information in DCNNs depends on the specific optimization conditions under which the models are trained.
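To make the aperture sizes concrete: a window covering 1/9 of an image's area has sides of 1/3 of the image's sides, and a window covering 1/16 has sides of 1/4. The following is a minimal, hypothetical sketch of such a movable viewing aperture, assuming a square window (the actual spotlight shape, edge profile, and masking color used in the experiments are not specified here) and images represented as nested lists of pixel values. The function names `aperture_mask` and `apply_aperture` are illustrative only.

```python
import math


def aperture_mask(height, width, center_row, center_col, fraction):
    """Return a height x width mask (nested lists of 0/1) with 1s inside a
    square window covering about `fraction` of the image area.

    Assumption: a square aperture; its side is scaled by sqrt(fraction),
    so fraction=1/9 gives sides of 1/3 and fraction=1/16 gives sides of 1/4.
    The window is centered on (center_row, center_col) and shifted, if
    necessary, so that it stays entirely inside the image.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    side_h = round(height * math.sqrt(fraction))
    side_w = round(width * math.sqrt(fraction))
    # Top-left corner of the window, clamped to the image borders.
    top = max(0, min(center_row - side_h // 2, height - side_h))
    left = max(0, min(center_col - side_w // 2, width - side_w))
    mask = [[0] * width for _ in range(height)]
    for r in range(top, top + side_h):
        for c in range(left, left + side_w):
            mask[r][c] = 1
    return mask


def apply_aperture(image, mask, fill=0):
    """Keep pixels where the mask is 1 and replace all others with `fill`."""
    return [
        [px if m else fill for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


if __name__ == "__main__":
    # A 90 x 90 image with a 1/9 aperture yields a 30 x 30 visible window.
    mask = aperture_mask(90, 90, 45, 45, 1 / 9)
    visible = sum(map(sum, mask))
    print(visible, visible / (90 * 90))
```

Moving the aperture, as in the human condition, amounts to recomputing the mask for each new center position; the restricted-receptive-field DCNNs in Experiment 2 limit spatial integration architecturally rather than by masking the input, so this sketch only illustrates the human viewing condition.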