Abstract: Automatic deception detection is challenging because human behavior is too complex to establish standard behavioral signs that explicitly indicate that a person is lying. Furthermore, naturalistic datasets for supervised learning are difficult to collect, since both external annotation and self-annotation of deception may be unreliable. To address these problems, we collected the TRuLie dataset, which consists of synchronously recorded video (34 hours in total) and data from contact photoplethysmography (PPG) and a hardware eye tracker for ninety-three subjects who tried to feign innocence during interrogation after committing mock crimes. The resulting dataset contains multimodal fragments labeled as lie (n=3380) and truth (n=6444). We trained an end-to-end convolutional neural network (CNN) on this dataset to predict lie and truth from audio and video, and also built classifiers on combined features extracted from video, audio, PPG, and eye-tracker data together with the CNN predictions. The best classifier (LightGBM) achieved a mean balanced accuracy of 0.64 and an F1-score of 0.76 in 5-fold cross-validation.
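Because the two classes are imbalanced (3380 lie versus 6444 truth fragments), it may help to see how balanced accuracy and F1 behave for a trivial classifier. The sketch below is illustrative only and is not the authors' evaluation code: the `metrics` helper and the "always truth" baseline are hypothetical, and the abstract does not state which class is treated as positive for the F1-score or how it is averaged, so both choices are shown.

```python
# Fragment counts reported in the abstract.
N_LIE = 3380
N_TRUTH = 6444


def metrics(tp, fn, tn, fp):
    """Return (accuracy, balanced_accuracy, f1) for the positive class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    recall_neg = tn / (tn + fp) if (tn + fp) else 0.0
    # Balanced accuracy is the mean of the per-class recalls.
    balanced_accuracy = (recall_pos + recall_neg) / 2
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall_pos
    f1 = 2 * precision * recall_pos / denom if denom else 0.0
    return accuracy, balanced_accuracy, f1


# Assumed baseline: label every fragment "truth".
# Scored with "truth" as the positive class:
truth_positive = metrics(tp=N_TRUTH, fn=0, tn=0, fp=N_LIE)
# The same predictions scored with "lie" as the positive class:
lie_positive = metrics(tp=0, fn=N_LIE, tn=N_TRUTH, fp=0)

for name, (acc, bal_acc, f1) in [("truth positive", truth_positive),
                                 ("lie positive", lie_positive)]:
    print(f"{name}: accuracy={acc:.3f}, "
          f"balanced accuracy={bal_acc:.3f}, F1={f1:.3f}")
```

Under these assumptions the baseline reaches a plain accuracy of about 0.656 but a balanced accuracy of exactly 0.5, which is the chance level against which the reported 0.64 should be read. Its F1 is about 0.792 when truth is the positive class and 0 when lie is, so the reported F1-score of 0.76 can only be interpreted once the positive class and averaging scheme are known.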