Abstract: Recognizing interactions in multi-person videos, known as Video Interaction Recognition (VIR), is crucial for understanding video content. Often the human skeleton pose (skeleton, for short) ...