Title: Deep Learning for Vision and Language Reasoning


Understanding visual information together with natural language has become a key desideratum in recent research communities. Notable efforts have been made toward bridging the fields of computer vision and natural language processing, opening the door to methods ranging from visual question answering to video-grounded dialogue. However, it is widely accepted that developing truly intelligent AI systems requires bridging the gap between perception and cognition. The purpose of this tutorial is to present the history and recent approaches of various vision-and-language reasoning tasks, including visual/video question answering and visual/video dialogue. We will provide an intuitive explanation of these topics, from basic building blocks such as attention and Transformers to recent trends such as causal learning. The tutorial will also cover recent advances in vision-language pre-training methods based on Transformer architectures, which show state-of-the-art performance on various downstream tasks. The limitations of current approaches are also discussed.

Program Schedule

First Part (1h30m)
    ● 1. Visual Question Answering
        ○ 1.1 Attention-based Approaches
        ○ 1.2 Debiasing Approaches
        ○ 1.3 Causal Inference Approaches
    ● 2. Video Question Answering
        ○ 2.1 Single-modal Video Question Answering
        ○ 2.2 Multi-modal Video Question Answering

Second Part (1h30m)
    ● 1. Visual Dialogue
        ○ 1.1 Attention-based Approaches
        ○ 1.2 Co-reference Approaches
        ○ 1.3 Causal Inference Approach
    ● 2. Video-grounded Dialogue
        ○ 2.1 RNN seq-2-seq Approaches
        ○ 2.2 Transformer Approaches
    ● 3. Vision-language Pre-training


Junyeong Kim

Post-doc Researcher, Korea Advanced Institute of Science and Technology, South Korea

Junyeong Kim is a Post-doc researcher in the Artificial Intelligence and Machine Learning Lab. in the School of Electrical Engineering at Korea Advanced Institute of Science and Technology (KAIST). He received B.S., M.S., and Ph.D. degrees in Electrical Engineering from KAIST in 2015, 2017, and 2021, respectively. His research interests lie in video-language inference, including video question answering and video-grounded dialogue, and in vision-language reasoning. He focuses on developing AI agents that can 'observe' and 'converse' as a human does. He has written 5 papers for top-tier conferences including CVPR, AAAI, and ECCV. He received the Outstanding Ph.D. Thesis Award in 2021 from the School of Electrical Engineering, KAIST.
Address: International Center for Converging Technology, Room 202, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Korea
Tel: +82-2-3290-5920, Business License Number: 302-82-07147

Last update: 27, Jan. 2021