Johns Hopkins researchers, including several affiliated with the Johns Hopkins Data Science and AI Institute, will present their research at the IEEE/CVF Computer Vision and Pattern Recognition Conference, to be held Wednesday, June 11, through Sunday, June 15, in Nashville.

The Computer Vision and Pattern Recognition Conference, known as CVPR, is the premier annual event showcasing advances in computer vision.

In addition to presenting papers and posters, Johns Hopkins will host a booth at the conference. Attendees are encouraged to stop by booth 1317 to learn more about Johns Hopkins’ transformational investment in the power and promise of data science and AI.

Johns Hopkins-affiliated researchers will present the following:

Papers

A Compound 3D-Informed Design Toward Spatially-Intelligent Large Multimodal Models
Wufei Ma, Luoxin Ye, Nessa McWeeney, Celso M de Melo, Alan Yuille, and Jieneng Chen

Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency
Feng Wang, Timing Yang, Yaodong Yu, Sucheng Ren, Guoyizhe Wei, Angtian Wang, Wei Shao, Yuyin Zhou, Alan Yuille, and Cihang Xie

CamFreeDiff: Camera-Free Image to Panorama Generation With Diffusion Model
Xiaoding Yuan, Shitao Tang, Kejie Li, Alan Yuille, and Peng Wang

Distilling Multi-Modal Large Language Models for Autonomous Driving
Deepti Hegde, Rajeev Yasarla, Hong Cai, Shizhong Han, Apratim Bhattacharyya, Shweta Mahajan, Litian Liu, Risheek Garrepalli, Vishal M. Patel, and Fatih Porikli

Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Bardia Safaei, Faizan Siddiqui, Jiacong Xu, Vishal M. Patel, and Shao-Yuan Lo

Flowing From Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution
Qihao Liu, Xi Yin, Alan Yuille, Andrew Brown, and Mannat Singh

GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration
Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay N. Paranjape, and Vishal M. Patel

Lux Post Facto: Learning Portrait Performance Relighting With Conditional Video Diffusion and a Hybrid Dataset
Yiqun Mei, Mingming He, Li Ma, Julien Philip, Wenqi Xian, David M George, Xueming Yu, Gabriel Dedic, Ahmet Levent Taşel, Ning Yu, Vishal M. Patel, and Paul Debevec

MambaReg: Vision Mamba Also Needs Registers
Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, and Cihang Xie

MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
Reno Kriz, Kate Sanders, David Etter, Kenton Murray, Cameron Carpenter, Hannah Recknor, Jimena Guallar-Blasco, Alexander Martin, Eugene Yang, and Benjamin Van Durme

SINR: Sparsity Driven Compressed Implicit Neural Representations
Dhananjaya Jayasundara, Sudarshan Rajagopalan, Yasiru Ranasinghe, Trac D. Tran, and Vishal M. Patel

SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction
Yutao Tang, Yuxiang Guo, Deming Li, and Cheng Peng

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
Xingrui Wang, Wufei Ma, Tiezheng Zhang, Celso M de Melo, Jieneng Chen, and Alan Yuille

STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing From Text-to-Image Diffusion Models
Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer, Vishal M. Patel, and Karthik Nandakumar

The Power of Context: How Multimodality Improves Image Super-Resolution
Kangfu Mei, Hossein Talebi, Mojtaba Ardakani, Vishal M. Patel, Peyman Milanfar, and Mauricio Delbracio

Towards Zero-Shot Anomaly Detection and Reasoning With Multimodal Large Language Models
Jiacong Xu, Shao-Yuan Lo, Bardia Safaei, Vishal M. Patel, and Isht Dwivedi

Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun Reddy, Alexander Martin, Eugene Yang, Andrew Yates, Kate Sanders, Kenton Murray, Reno Kriz, Celso M de Melo, Benjamin Van Durme, and Rama Chellappa

Poster

MIRE: Matched Implicit Neural Representations
Dhananjaya Jayasundara, Heng Zhao, Demetrio Labate, and Vishal M. Patel

Demo

SimWorld: A World Simulator for Scaling Photorealistic Multi-Agent Interactions
Yan Zhuang, Jiawei Ren, Xiaokang Ye (equal contribution), Xuhong He, Zijun Gao, Ryan Wu, Mrinaal Dogra, Cassie Zhang, Ziqiao Ma, Tianmin Shu, Zhiting Hu, and Lianhui Qin