Vision and Video Dynamics Lab

Studying how the visual world
changes over time.

Hyeongmin Lee · SeoulTech, Dept. of Electronic Engineering

Field

Computer Vision

Image · Video

One agenda · four pillars

One question, four levels of abstraction.

Pillar 01

Video Processing

pixels

Pillar 02

Video Understanding

meaning

Pillar 03

3D / 4D Vision

space

Pillar 04

World Models

future

Pillar 01

Video Processing

Low-level video enhancement

Topics

Video Frame Interpolation
Video Super-Resolution & Enhancement
Video Stabilization
Video Compression
Motion Estimation & Compensation

Enhancement

Compression

All tasks reduce to precise modeling of inter-frame motion.

Pillar 01

Video Processing

Low-level video enhancement

Super-Resolution — drag the handle

after

before

‹›

Bicubic upscale

Real-ESRGAN

Real-ESRGAN · Wang et al. ICCVW 2021

Pillar 01

Video Processing

Low-level video enhancement

Frame Interpolation

RIFE: Real-Time Intermediate Flow Estimation · Huang et al. ECCV 2022

Pillar 01

Video Processing

Low-level video enhancement

Stabilization

GaVS: 3D-Grounded Video Stabilization · You et al. 2025

Pillar 02

Video Understanding

Temporal semantics & representation learning

Topics

Video Foundation Models
Video Question Answering
Video Retrieval
Action Recognition
Video–Language Alignment
Temporal Representation Learning

From pixels to meaning — what changes, and why.

Pillar 02 · VLM example

Visual reasoning

Obama scale prank

Question

Why is this image funny?

✓ Success

Obama is playfully pressing down on the scale with his foot to make the man weighing himself appear heavier. The aides nearby are laughing at the prank.

✗ Failure

A group of men in suits are standing in a hallway, and one of them is using a weighing scale.

Same question, two attempts. Modern VLMs can do situational reasoning — but the difference between getting the joke and missing it is exactly what we work on.

Pillar 02 · VLM example

Even simple things break.

VLM counting failure

✗ Failure

The model says 5 fingers — but the emoji shows 4 fingers + 1 thumb. Counting and strict definitions still trip up state-of-the-art VLMs.

Pillar 02 · Why video matters

Glasses going up, or coming down?

Single frame ambiguity

From a single frame

No model — and no human — can tell.

Direction of motion, intent, before-and-after — these only exist between frames.

Image understanding is not video understanding. We need temporal models that reason about change.

Pillar 03

3D / 4D Vision

3D scene reconstruction & rendering

Topics

3D Scene Reconstruction
Neural Radiance Fields (NeRF)
3D & 4D Gaussian Splatting
Novel View Synthesis
Dynamic Scene Reconstruction

From 2D observations to 3D — and into time, when scenes move.

Pillar 03

3D / 4D Vision

3D scene reconstruction & rendering

3D Gaussian Splatting

3D Gaussian Splatting for Real-Time Radiance Field Rendering · Kerbl et al. SIGGRAPH 2023

Pillar 03

3D / 4D Vision

3D scene reconstruction & rendering

4D Gaussian Splatting

Fully Explicit Dynamic Gaussian Splatting (Ex4DGS) · Lee et al. NeurIPS 2024

Pillar 04

World Models

Predicting — and interacting with — the visual world

Topics

World Foundation Models
Interactive Video Generation
Action-Conditioned Generation
Latent Action Models
Physics-Aware Generation
Foundation Models for Embodied AI

Don't just predict the world. Act in it, and watch it respond.

Pillar 04

World Models

Generative simulation of the visual world

World Simulation

Genie 3 · DeepMind 2025

Pillar 04

World Models

Generative simulation of the visual world

Robot World Model

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos · Gao et al. NVIDIA 2026

Why one lab, four pillars

Not four topics.
One question, four scales.

pixels → meaning → space → future

Where this leads

Where you can go after this.

Global tech

Korean industry

National research

… and graduate school, startups,
or anywhere visual AI is used.

Why this field

Three reasons it's a great field to be in.

01

Anywhere, anytime

All you need is an idea and a laptop. Research happens wherever you are.

02

Endless frontier

The field moves fast. New questions open up every week — there is always something to chase.

03

From idea to the world, fast

A paper today can be a product people use within months. Your work reaches real users — quickly.

ViViD Lab

Come talk to us.

Research, course questions, or just curious — drop by anytime.

hyeongmin.lee@seoultech.ac.kr

vivid.seoultech.ac.kr

QR code to lab page

Scan

01 / 00

Scroll · ↓ · Space