Faculty
Danda Paudel
Tenure-Track Faculty
Bio
My research focuses on 3D Computer Vision, emphasising scene representation and understanding through spatial intelligence derived from 3D/4D visual data using vision-language models. I am also interested in applying computer vision to Robotics and Augmented Reality. Some research topics I am currently working on are:
- Real-time mapping and understanding of complex dynamic scenes
- Robust spatial reasoning from multi-modal and corrupted data
- Transfer learning from 2D data and textual sources to 4D understanding
- Vision-language models for task planning and Human-AI interaction
- Geometrically consistent and physically plausible visual generative models
Google Scholar: link
Academic career:
- Faculty at INSAIT, Sofia University (since 2023)
- Lecturer, senior researcher, and post-doctoral fellow at ETH Zurich (since 2016)
- Ph.D. in Computer Vision from University of Bourgogne, CNRS, France (2016)
- Erasmus Mundus M.Sc. in Computer Vision, University of Bourgogne, France (2012)
Recognitions:
- Area Chair at the 38th Annual AAAI Conference on Artificial Intelligence, 2024
- A member of the European Laboratory for Learning and Intelligent Systems (ELLIS)
- The Best Paper Award by IEEE Computer Society (CVPRW 2020)
- A Most Interesting Publication by DeepAI (CVPR 2019)
- Best of ICCV 2015 invitation by International Journal of Computer Vision (2016)
- Accepted with Travel Support for ICCV Doctoral Consortium, ICCV 2015, Chile
- Doctoral Research Scholarship by French National Research Agency (2012–15)
- Best Erasmus Master’s Thesis by PAL Robotics on Vibot Day, 2012, Spain
- Master Research Scholarship by Conseil régional de Bourgogne, France (2011–12)
- Won three engineering competitions during undergraduate study in India
- Nepal Aid Scholarship by the Government of India (2005)
Publications
-
SOVA: Image size agnostic and task driven vision encoder
2026 · International Conference on Machine Learning (ICML 2026)
-
V^{2}-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
-
SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
-
Inferring Compositional 4D Scenes without Ever Seeing One
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
-
GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
-
FireScope: Wildfire Risk Raster Prediction With a Chain-of-Thought Oracle
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
-
EgoSound: Benchmarking Sound Understanding in Egocentric Videos
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
-
ConceptPose: Training-Free Zero-Shot Object Pose Estimation using Concept Vectors
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
-
Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
-
Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning?
2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026) Findings
-
Rethinking Expressivity and Degradation-Awareness in Attention for All-in-One Blind Image Restoration
2026 · International Conference on Learning Representations (ICLR 2026)
-
EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark
2026 · International Conference on Learning Representations (ICLR 2026)
-
Efficient Degradation-agnostic Image Restoration via Channel-Wise Functional Decomposition and Manifold Regularization
2026 · International Conference on Learning Representations (ICLR 2026)
-
AR-VLA: Autoregressive Action Expert for Vision–Language–Action Models
2026 · Robotics: Science and Systems (RSS 2026)
-
Unlocking Efficient Vehicle Dynamics Modeling via Analytic World Models
2026 · AAAI Conference on Artificial Intelligence (AAAI 2026)
-
Autonomous Vehicle Path Planning by Searching With Differentiable Simulation
2026 · AAAI Conference on Artificial Intelligence (AAAI 2026)
-
StateSpaceDiffuser: Bringing Long Context to Diffusion World Models
2025 · Conference on Neural Information Processing Systems (NeurIPS 2025)
-
LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation
2025 · Conference on Neural Information Processing Systems (NeurIPS 2025)
-
GaussianWorld: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting
2025 · Conference on Neural Information Processing Systems (Datasets and Benchmarks Track) (NeurIPS 2025)
-
Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection
2025 · Conference on Neural Information Processing Systems (NeurIPS 2025)
-
Split Matching for Inductive Zero-shot Semantic Segmentation
2025 · British Machine Vision Conference (BMVC 2025)
-
Occam’s LGS: An Efficient Approach for Language Gaussian Splatting
2025 · British Machine Vision Conference (BMVC 2025)
-
Generalist Robot Manipulation beyond Action Labeled Data
2025 · Conference on Robot Learning (CoRL 2025)
-
Autonomous Vehicle Controllers From End-to-End Differentiable Simulation
2025 · IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
-
XTrack: Multimodal Training Boosts RGB-X Video Object Trackers
2025 · International Conference on Computer Vision (ICCV 2025)
-
What You Have is What You Track: Adaptive and Robust Multimodal Tracking
2025 · International Conference on Computer Vision (ICCV 2025)
-
Understanding Museum Exhibits using Vision-Language Reasoning
2025 · International Conference on Computer Vision (ICCV 2025)
-
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
2025 · International Conference on Computer Vision (ICCV 2025)
-
Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
2025 · International Conference on Computer Vision (ICCV 2025)
-
ObjectRelator: Enabling Cross-View Object Relation Understanding in Ego-Centric and Exo-Centric Videos
2025 · International Conference on Computer Vision (ICCV 2025)
-
Low-Light Image Enhancement using Event-Based Illumination Estimation
2025 · International Conference on Computer Vision (ICCV 2025)
-
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description
2025 · International Conference on Computer Vision (ICCV 2025)
-
GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond
2025 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)
-
Exploration-Driven Generative Interactive Environments
2025 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)
-
Complexity Experts are Task-Discriminative Learners for Any Image Restoration
2025 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025)
-
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness
2025 · IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW 2025)
-
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
2025 · International Conference on Robotics and Automation (ICRA 2025)
-
Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction
2025 · International Conference on Learning Representations (ICLR 2025)
-
A Large-Scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining
2025 · International Conference on 3D Vision (3DV 2025)
-
Diffusion-Based Particle-DETR for BEV Perception
2025 · Winter Conference on Applications of Computer Vision (WACV 2025)
-
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community
2025 · AAAI Conference on Artificial Intelligence (AAAI 2025)
-
Learning Generative Interactive Environments By Trained Agent Exploration
2024 · Conference on Neural Information Processing Systems Workshop (NeurIPSW 2024)
-
Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes
2024 · Conference on Neural Information Processing Systems (NeurIPS 2024)
-
Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-supervised Learning
2024 · Asian Conference on Computer Vision (ACCV 2024)
-
Ternary-type Opacity and Hybrid Odometry for RGB-only NeRF-SLAM
2024 · IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)
-
Event-Free Moving Object Segmentation from Moving Ego Vehicle
2024 · IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)
-
Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
2024 · European Conference on Computer Vision (ECCV 2024)
-
Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
2024 · European Conference on Computer Vision (ECCV 2024)
-
Single-Model and Any-Modality for Video Object Tracking
2024 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
-
ExtDM: Dual Distribution Extrapolation Diffusion Model for Video Prediction
2024 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
-
Continuous Pose for Monocular Cameras in Neural Implicit Representation
2024 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
-
A Unified and Interpretable Emotion Representation and Expression Generation
2024 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
-
Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions
2023 · International Symposium on Mixed and Augmented Reality (ISMAR 2023)
-
Token-Consistent Dropout For Calibrated Vision Transformers
2023 · International Conference on Image Processing (ICIP 2023)
-
Surface Normal Clustering for Implicit Representation of Manhattan Scenes
2023 · International Conference on Computer Vision (ICCV 2023)
-
Source-free Depth for Object Pop-out
2023 · International Conference on Computer Vision (ICCV 2023)
-
Improving Online Lane Graph Extraction by Object-Lane Clustering
2023 · International Conference on Computer Vision (ICCV 2023)
-
Deformable Neural Radiance Fields using RGB and Event Cameras
2023 · International Conference on Computer Vision (ICCV 2023)