Vision and Video Dynamics Lab

Studying how the visual world
changes over time.

Hyeongmin Lee · SeoulTech, Dept. of Electronic Engineering

Field

Computer Vision

Image · Video

One agenda · four pillars

One question, four levels of abstraction.

Pillar 01 · Video Processing · pixels
Pillar 02 · Video Understanding · meaning
Pillar 03 · 3D / 4D Vision · space
Pillar 04 · World Models · future

Pillar 01

Video Processing

Low-level video enhancement

Topics

  • Video Frame Interpolation
  • Video Super-Resolution & Enhancement
  • Video Stabilization
  • Video Compression
  • Motion Estimation & Compensation

Enhancement

Low quality → Neural network trained on data → Sharp · Clean

Compression

Original (100 MB) → encode → Bitstream 01001… (2 MB) → decode → Recovered (100 MB)
Use cases: store · stream

All tasks reduce to precise modeling of inter-frame motion.
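One classic way to model inter-frame motion is block matching, the motion-estimation step behind many video codecs. The sketch below is a minimal illustration, not the lab's method: frames are plain lists of grayscale intensities, and the 2×2 block size, search radius, and sum-of-absolute-differences (SAD) cost are illustrative assumptions. `make_frame`, `sad`, and `block_match` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale frame: rows of pixel intensities


def make_frame(h: int, w: int, top: int, left: int, size: int = 2, value: int = 255) -> Frame:
    """Black frame with one bright square; used only to build a toy example."""
    frame = [[0] * w for _ in range(h)]
    for y in range(top, top + size):
        for x in range(left, left + size):
            frame[y][x] = value
    return frame


def sad(prev: Frame, curr: Frame, by: int, bx: int, dy: int, dx: int, bs: int) -> int:
    """Sum of absolute differences between the block at (by, bx) in `curr`
    and the block displaced by (dy, dx) in `prev`."""
    return sum(
        abs(curr[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
        for y in range(bs)
        for x in range(bs)
    )


def block_match(prev: Frame, curr: Frame, by: int, bx: int,
                bs: int = 2, radius: int = 2) -> Tuple[int, int]:
    """Exhaustive search for the (dy, dx) that best explains the block at (by, bx).

    The returned vector points from the block in `curr` back to its match in `prev`.
    """
    h, w = len(prev), len(prev[0])
    best: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # skip candidate blocks that would fall outside the previous frame
            if not (0 <= by + dy <= h - bs and 0 <= bx + dx <= w - bs):
                continue
            cost = sad(prev, curr, by, bx, dy, dx, bs)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


# A bright square moves one row down and two columns right between frames.
prev = make_frame(8, 8, top=2, left=2)
curr = make_frame(8, 8, top=3, left=4)
print(block_match(prev, curr, by=3, bx=4))  # (-1, -2)
```

Real encoders use faster search patterns and sub-pixel refinement; the exhaustive loop only keeps the idea visible.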

Super-Resolution
Bicubic upscale (before) vs. Real-ESRGAN (after)

Real-ESRGAN · Wang et al. ICCVW 2021

Frame Interpolation

RIFE: Real-Time Intermediate Flow Estimation · Huang et al. ECCV 2022
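Flow-based interpolation methods such as RIFE predict intermediate flows from the unseen middle frame back to both inputs, warp each input along its flow, and fuse the warped frames with a mask. The sketch below is a heavily simplified 1D illustration under stated assumptions: the motion is a known constant velocity instead of a flow predicted by a network, warping is nearest-neighbour, and the fusion mask is a fixed 0.5. It contrasts this with a naive cross-fade, which produces ghosting. All function names are hypothetical.

```python
from typing import List

Signal = List[float]  # one scanline of a frame


def backward_warp(frame: Signal, flow: float) -> Signal:
    """out[x] = frame[x + flow], nearest-neighbour, zero outside the frame."""
    n = len(frame)
    out = []
    for x in range(n):
        src = round(x + flow)
        out.append(frame[src] if 0 <= src < n else 0.0)
    return out


def flow_interpolate(f0: Signal, f1: Signal, velocity: float,
                     t: float = 0.5, mask: float = 0.5) -> Signal:
    """Synthesize the frame at time t, assuming constant velocity (pixels per frame)."""
    # Under linear motion, the flow from time t back to frame 0 is -t * v,
    # and the flow from time t forward to frame 1 is (1 - t) * v.
    w0 = backward_warp(f0, -t * velocity)
    w1 = backward_warp(f1, (1 - t) * velocity)
    return [mask * a + (1 - mask) * b for a, b in zip(w0, w1)]


def naive_blend(f0: Signal, f1: Signal, t: float = 0.5) -> Signal:
    """Cross-fade that ignores motion."""
    return [(1 - t) * a + t * b for a, b in zip(f0, f1)]


f0 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # object at x = 2
f1 = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # object at x = 6
print(flow_interpolate(f0, f1, velocity=4.0))  # one object at x = 4
print(naive_blend(f0, f1))                      # two half-bright ghosts at x = 2 and x = 6
```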

Stabilization

GaVS: 3D-Grounded Video Stabilization · You et al. 2025

Pillar 02

Video Understanding

Temporal semantics & representation learning

Topics

  • Video Foundation Models
  • Video Question Answering
  • Video Retrieval
  • Action Recognition
  • Video–Language Alignment
  • Temporal Representation Learning

Video frames → Vision–Language Model → "A man is cooking pasta." (meaning, in words)

From pixels to meaning — what changes, and why.
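Before a vision–language model can describe a clip, the clip usually has to be reduced to a handful of frames, since encoding every frame is typically too expensive. One common recipe, assumed here for illustration rather than taken from any particular model, splits the clip into equal segments and takes the center frame of each. `sample_frame_indices` is a hypothetical helper.

```python
from typing import List


def sample_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Evenly spaced frame indices: the center of each of `num_samples` equal segments.

    Assumes 0 < num_samples <= num_frames; otherwise indices may repeat.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment = num_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


# A 10-second clip at 30 fps, reduced to 8 frames for the model.
print(sample_frame_indices(300, 8))  # [18, 56, 93, 131, 168, 206, 243, 281]
```

The sampling rate is a trade-off: fewer frames are cheaper, but anything that happens between sampled frames is invisible to the model.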

Pillar 02 · VLM example

Visual reasoning

Obama scale prank
Question
Why is this image funny?
Success

Obama is playfully pressing down on the scale with his foot to make the man weighing himself appear heavier. The aides nearby are laughing at the prank.

Failure

A group of men in suits are standing in a hallway, and one of them is using a weighing scale.

Same question, two attempts. Modern VLMs can do situational reasoning, but not reliably: the gap between getting the joke and missing it is exactly what we work on.

Even simple things break.

VLM counting failure
Failure

The model says 5 fingers — but the emoji shows 4 fingers + 1 thumb. Counting and strict definitions still trip up state-of-the-art VLMs.

Pillar 02 · Why video matters

Glasses going up, or coming down?

Single frame ambiguity
From a single frame

No model — and no human — can tell.

Direction of motion, intent, before-and-after — these only exist between frames.

Image understanding is not video understanding. We need temporal models that reason about change.

Pillar 03

3D / 4D Vision

3D scene reconstruction & rendering

Topics

  • 3D Scene Reconstruction
  • Neural Radiance Fields (NeRF)
  • 3D & 4D Gaussian Splatting
  • Novel View Synthesis
  • Dynamic Scene Reconstruction

Photos (many angles) → learn 3D representation (NeRF / Gaussians) → render → New viewpoint

From 2D observations to 3D — and into time, when scenes move.

3D Gaussian Splatting

3D Gaussian Splatting for Real-Time Radiance Field Rendering · Kerbl et al. SIGGRAPH 2023
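In 3D Gaussian Splatting, each pixel's color comes from alpha-blending the projected Gaussians that cover it, sorted front to back: the color is the sum of c_i · α_i · T_i, where the transmittance T_i is the product of (1 − α_j) over every splat j in front of splat i. NeRF's discretized volume rendering has the same form. The sketch below shows only that per-pixel blending step; it assumes each Gaussian has already been projected and reduced to a (depth, color, alpha) triple, and the early-exit threshold `t_min` is an illustrative choice. `composite` is a hypothetical name.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def composite(splats: List[Tuple[float, RGB, float]], t_min: float = 1e-4) -> RGB:
    """Front-to-back alpha compositing of (depth, color, alpha) splats for one pixel."""
    r = g = b = 0.0
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for _, (cr, cg, cb), alpha in sorted(splats, key=lambda s: s[0]):
        weight = alpha * transmittance
        r += weight * cr
        g += weight * cg
        b += weight * cb
        transmittance *= 1.0 - alpha
        if transmittance < t_min:  # pixel is effectively opaque; skip farther splats
            break
    return (r, g, b)


# A half-transparent red splat in front of an opaque blue one (given out of order).
print(composite([(2.0, (0.0, 0.0, 1.0), 1.0), (1.0, (1.0, 0.0, 0.0), 0.5)]))  # (0.5, 0.0, 0.5)
```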

4D Gaussian Splatting

Fully Explicit Dynamic Gaussian Splatting (Ex4DGS) · Lee et al. NeurIPS 2024

Pillar 04

World Models

Predicting — and interacting with — the visual world

Topics

  • World Foundation Models
  • Interactive Video Generation
  • Action-Conditioned Generation
  • Latent Action Models
  • Physics-Aware Generation
  • Foundation Models for Embodied AI

Observation (current scene) + Interaction (action / control input) → Diffusion world model → Imagined futures → next action (closed loop)
The robot interacts in its imagination.

Don't just predict the world. Act in it, and watch it respond.
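The closed loop above can be sketched abstractly: a world model predicts the next observation from the current observation and an action, so an agent can try actions in imagination, execute only the first step of the best plan, observe, and plan again (a receding-horizon scheme). Everything here is a hypothetical toy: observations are single numbers, `toy_model` is a hand-written stand-in for a learned (for example, diffusion-based) video model, the exhaustive search is a swapped-in simple planner, and the distance-to-goal cost is an assumed objective.

```python
from itertools import product
from typing import Callable, List, Sequence

Observation = float
Action = float
# Hypothetical interface: a learned video world model would map (frames, action) -> next frames.
WorldModel = Callable[[Observation, Action], Observation]


def imagine(model: WorldModel, obs: Observation, actions: Sequence[Action]) -> List[Observation]:
    """Roll the world model forward through `actions`; the real environment is never touched."""
    trajectory = [obs]
    for action in actions:
        obs = model(obs, action)
        trajectory.append(obs)
    return trajectory


def plan_next_action(model: WorldModel, obs: Observation, candidates: Sequence[Action],
                     goal: Observation, horizon: int) -> Action:
    """Try every action sequence of length `horizon` in imagination and return the
    first action of the sequence whose imagined states stay closest to `goal`."""
    def imagined_cost(seq: Sequence[Action]) -> float:
        return sum(abs(goal - o) for o in imagine(model, obs, seq)[1:])

    best = min(product(candidates, repeat=horizon), key=imagined_cost)
    return best[0]


def toy_model(obs: Observation, action: Action) -> Observation:
    """Stand-in for a learned model: the action simply shifts the observation."""
    return obs + action


# Closed loop: imagine, act once, observe the result, then plan again.
obs = 0.0
for step in range(3):
    action = plan_next_action(toy_model, obs, candidates=[-1.0, 0.0, 1.0], goal=2.0, horizon=2)
    obs = toy_model(obs, action)  # in this toy, the real world behaves exactly like the model
    print(step, action, obs)
```

Exhaustive search grows as (number of candidates) to the power of the horizon, so practical systems typically replace it with sampling-based planners or learned policies; the point here is only the structure of the loop.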

Generative simulation of the visual world

World Simulation

Genie 3 · DeepMind 2025

Why one lab, four pillars

Not four topics.
One question, four scales.

pixels · meaning · space · future

Where this leads

Where you can go after this.

Global tech: Google · Meta · Microsoft · Adobe · Disney
Korean industry: Samsung · LG · Hyundai · Naver · Kakao · SKT · KT · LG U+
National research: ETRI · KIST · ADD

… and graduate school, startups,
or anywhere visual AI is used.

Why this field

Three reasons it's a great field to be in.

01
Anywhere, anytime

All you need is an idea and a laptop. Research happens wherever you are.

02
Endless frontier

The field moves fast. New questions open up every week — there is always something to chase.

03
From idea to the world, fast

A paper today can be a product people use within months. Your work reaches real users — quickly.

ViViD Lab

Come talk to us.

Research, course questions, or just curious — drop by anytime.

QR code to lab page

Scan
