Faculty

Danda Paudel

Tenure-Track Faculty

Research topics: 3D Vision, Robotics, Space AI, Egocentric Vision

Bio

My research focuses on 3D Computer Vision, emphasising scene representation and understanding through spatial intelligence derived from 3D/4D visual data using vision-language models. I am also interested in applying computer vision to Robotics and Augmented Reality. Some research topics I am currently working on are:

Real-time mapping and understanding of complex dynamic scenes
Robust spatial reasoning from multi-modal and corrupted data
Transfer learning from 2D data and textual sources to 4D understanding
Vision-language models for task planning and Human-AI interaction
Geometrically consistent and physically plausible visual generative models

Google Scholar: link

Academic career:

Faculty at INSAIT, Sofia University (since 2023)
Lecturer, senior researcher, and post-doctoral fellow at ETH Zurich (since 2016)
Ph.D. in Computer Vision from University of Bourgogne, CNRS, France (2016)
Erasmus Mundus M.Sc. in Computer Vision, University of Bourgogne, France (2012)

Recognitions:

Area Chair at the 38th Annual AAAI Conference on Artificial Intelligence, 2024
A member of the European Laboratory for Learning and Intelligent Systems (ELLIS)
The Best Paper Award by IEEE Computer Society (CVPRW 2020)
A Most Interesting Publication by DeepAI (CVPR 2019)
Best of ICCV 2015 invitation by International Journal of Computer Vision (2016)
Accepted with Travel Support for ICCV Doctoral Consortium, ICCV 2015, Chile
Doctoral Research Scholarship by French National Research Agency (2012–15)
Best Erasmus Master’s Thesis by PAL Robotics on Vibot Day, 2012, Spain
Master Research Scholarship by Conseil régional de Bourgogne, France (2011–12)
Won three engineering competitions during undergraduate study in India
Nepal Aid Scholarship by the Government of India (2005)

Publications

SOVA: Image size agnostic and task driven vision encoder
2026 · International Conference on Machine Learning (ICML 2026)
V²-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Inferring Compositional 4D Scenes without Ever Seeing One
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
FireScope: Wildfire Risk Raster Prediction With a Chain-of-Thought Oracle
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
EgoSound: Benchmarking Sound Understanding in Egocentric Videos
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
ConceptPose: Training-Free Zero-Shot Object Pose Estimation using Concept Vectors
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026) Findings
Rethinking Expressivity and Degradation-Awareness in Attention for All-in-One Blind Image Restoration
2026 · International Conference on Learning Representations (ICLR 2026)
EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark
2026 · International Conference on Learning Representations (ICLR 2026)
Efficient Degradation-agnostic Image Restoration via Channel-Wise Functional Decomposition and Manifold Regularization
2026 · International Conference on Learning Representations (ICLR 2026)
AR-VLA: Autoregressive Action Expert for Vision–Language–Action Models
2026 · Robotics: Science and Systems (RSS 2026)
Unlocking Efficient Vehicle Dynamics Modeling via Analytic World Models
2026 · AAAI Conference on Artificial Intelligence (AAAI 2026)
Autonomous Vehicle Path Planning by Searching With Differentiable Simulation
2026 · AAAI Conference on Artificial Intelligence (AAAI 2026)
StateSpaceDiffuser: Bringing Long Context to Diffusion World Models
2025 · Conference on Neural Information Processing Systems (NeurIPS 2025)
LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation
2025 · Conference on Neural Information Processing Systems (NeurIPS 2025)
GaussianWorld: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting
2025 · Conference on Neural Information Processing Systems (Datasets and Benchmarks Track) (NeurIPS 2025)
Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection
2025 · Conference on Neural Information Processing Systems (NeurIPS 2025)
Split Matching for Inductive Zero-shot Semantic Segmentation
2025 · British Machine Vision Conference (BMVC 2025)
Occam’s LGS: An Efficient Approach for Language Gaussian Splatting
2025 · British Machine Vision Conference (BMVC 2025)
Generalist Robot Manipulation beyond Action Labeled Data
2025 · Conference on Robot Learning (CoRL 2025)
Autonomous Vehicle Controllers From End-to-End Differentiable Simulation
2025 · IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
XTrack: Multimodal Training Boosts RGB-X Video Object Trackers
2025 · International Conference on Computer Vision (ICCV 2025)
What You Have is What You Track: Adaptive and Robust Multimodal Tracking
2025 · International Conference on Computer Vision (ICCV 2025)
Understanding Museum Exhibits using Vision-Language Reasoning
2025 · International Conference on Computer Vision (ICCV 2025)
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
2025 · International Conference on Computer Vision (ICCV 2025)
Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
2025 · International Conference on Computer Vision (ICCV 2025)
ObjectRelator: Enabling Cross-View Object Relation Understanding in Ego-Centric and Exo-Centric Videos
2025 · International Conference on Computer Vision (ICCV 2025)
Low-Light Image Enhancement using Event-Based Illumination Estimation
2025 · International Conference on Computer Vision (ICCV 2025)
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description
2025 · International Conference on Computer Vision (ICCV 2025)
GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond
2025 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)
Exploration-Driven Generative Interactive Environments
2025 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)
Complexity Experts are Task-Discriminative Learners for Any Image Restoration
2025 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness
2025 · IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW 2025)
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
2025 · International Conference on Robotics and Automation (ICRA 2025)
Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction
2025 · International Conference on Learning Representations (ICLR 2025)
A Large-Scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining
2025 · International Conference on 3D Vision (3DV 2025)
Diffusion-Based Particle-DETR for BEV Perception
2025 · Winter Conference on Applications of Computer Vision (WACV 2025)
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community
2025 · AAAI Conference on Artificial Intelligence (AAAI 2025)
Learning Generative Interactive Environments By Trained Agent Exploration
2024 · Conference on Neural Information Processing Systems Workshop (NeurIPSW 2024)
Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes
2024 · Conference on Neural Information Processing Systems (NeurIPS 2024)
Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-supervised Learning
2024 · Asian Conference on Computer Vision (ACCV 2024)
Ternary-type Opacity and Hybrid Odometry for RGB-only NeRF-SLAM
2024 · IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)
Event-Free Moving Object Segmentation from Moving Ego Vehicle
2024 · IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)
Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
2024 · European Conference on Computer Vision (ECCV 2024)
Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
2024 · European Conference on Computer Vision (ECCV 2024)
Single-Model and Any-Modality for Video Object Tracking
2024 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
ExtDM: Dual Distribution Extrapolation Diffusion Model for Video Prediction
2024 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
Continuous Pose for Monocular Cameras in Neural Implicit Representation
2024 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
A Unified and Interpretable Emotion Representation and Expression Generation
2024 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions
2023 · International Symposium on Mixed and Augmented Reality (ISMAR 2023)
Token-Consistent Dropout For Calibrated Vision Transformers
2023 · International Conference on Image Processing (ICIP 2023)
Surface Normal Clustering for Implicit Representation of Manhattan Scenes
2023 · International Conference on Computer Vision (ICCV 2023)
Source-free Depth for Object Pop-out
2023 · International Conference on Computer Vision (ICCV 2023)
Improving Online Lane Graph Extraction by Object-Lane Clustering
2023 · International Conference on Computer Vision (ICCV 2023)
Deformable Neural Radiance Fields using RGB and Event Cameras
2023 · International Conference on Computer Vision (ICCV 2023)
