Bio
**Research scientist**
**E-mail:** [email protected]
Research interests: My research focuses on building next-generation spatial intelligence systems that can understand and act in complex environments through multi-modal reasoning. My goal is to enable transformative advances in human-made environments through augmented assistance, robotics, and intelligent automation. To achieve this, I investigate how large-scale 3D scenes can be represented and semantically understood, and how language can be used to interact with these spaces. I am also exploring how digital twins of human-made environments can be integrated to extend spatial understanding beyond static perception. Furthermore, I seek to understand how AI systems can react dynamically to changes in the state of the world by leveraging video and other temporal data streams. Finally, I am interested in developing multi-modal fusion approaches that combine 3D maps, video feeds, digital twins, and operational documents to achieve continuous, online spatio-temporal reasoning.
Background: Before joining INSAIT, I completed my PhD at the Computer Vision Lab at ETH Zurich. During my PhD, I worked on improving implicit neural 3D representations of indoor scenes, model-aware 3D eye gaze tracking from weak supervision, spatially multi-conditional image generation, and compact and efficient multi-task learning. Near the end of my PhD, I completed a research scientist internship at Meta Reality Labs in Zurich, where I worked on implicit neural representations of dynamic 3D scenes. Before my PhD, I was a full-time teaching assistant at the University of Belgrade, School of Electrical Engineering, where I also completed my BSc and MSc studies, specializing in signal processing and control theory. I am also a long-time member of the organizing committee of the PSIML summer school on AI in Serbia.
Links & contacts: LinkedIn: https://www.linkedin.com/in/nikola-popovic-172252113/ · Google Scholar: https://scholar.google.com/citations?user=aY2lypgAAAAJ&hl=en
Publications
- Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding
  2026 · IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
- GaussianWorld: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting
  2025 · Conference on Neural Information Processing Systems, Datasets and Benchmarks Track (NeurIPS 2025)
- SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
  2025 · International Conference on Computer Vision (ICCV 2025)
- Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction
  2025 · International Conference on Learning Representations (ICLR 2025)
- Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions
  2023 · International Symposium on Mixed and Augmented Reality (ISMAR 2023)
- Token-Consistent Dropout For Calibrated Vision Transformers
  2023 · International Conference on Image Processing (ICIP 2023)
- Surface Normal Clustering for Implicit Representation of Manhattan Scenes
  2023 · International Conference on Computer Vision (ICCV 2023)