Research

Conventional and learning-based video compression — codecs, rate–distortion trade-offs, and the boundary between hand-designed and neural pipelines.

Representing complex scenes with multiple media, layouts, and interactions — beyond a single video stream.

Adapting multimedia content to networks, devices, and users — quality of experience under constraints.

Transport, orchestration, and protocols for delivering multimedia at scale.

Pruning, quantization, low-rank methods, and other tools to make deep models small enough to deploy.

Learning on graphs, manifolds, and structured domains — where the geometry of the data shapes the architecture.

Joint models for vision, language, and other modalities — alignment, fusion, and grounded reasoning.