Chenhui Gou
PhD Student @ Monash University · Research Intern @ ByteDance Seed Edge
Monash University
Melbourne, Australia
I am a PhD student at Monash University and a research intern with ByteDance Seed Edge. My work focuses on agentic and multimodal foundation models, especially self-evolving agents, unified multimodal understanding and generation, long-video understanding, and evaluation of multimodal models.
Building AI systems that reason, perceive, and improve.
AI Agents · LLMs · VLMs · Unified Multimodal Models
AI Agents · Self-Evolving AI · Unified Multimodal Models · Long Video Understanding · Multimodal Evaluation · Efficient MLLMs
news
| Date | News |
|---|---|
| Jul 02, 2026 | EdgeBench released: unveiling scaling laws of learning from real-world environments. |
| Mar 01, 2026 | Two papers accepted at CVPR 2026: VQ-VA World and An Empirical Study on How Video-LLMs Answer Video Questions. |
| Jan 01, 2025 | BAGEL (Emerging Properties in Unified Multimodal Pretraining) released. |
experience
2020-12 – 2021-06
Aibee Inc
Research Intern
Beijing, China
2021-06 – 2021-12
NIO Inc, Autonomous Driving
Research Intern
Beijing, China
2021-12 – 2022-08
Baidu, Vision Technology Department
Research Intern
Beijing, China
2022-08 – 2022-10
University of Technology Sydney
Research Project
Sydney, Australia
2022-07 – 2023-07
Australian National University
Research Project
Canberra, Australia
2022-12 – 2023-03
SenseTime Inc
Research Intern
Shenzhen, China
2023-06 – 2024-12
Vision-CAIR Group, KAUST
Research Intern
2024-12 – 2025-10
ByteDance Seed VLM-BAGEL Group
Research Intern
2025-10 – Now
ByteDance Seed Edge
Research Intern
selected publications
For a full list, please refer to Google Scholar or the publications page.
- Tech Report
- [CVPR] VQ-VA World: Towards High-Quality Visual Question-Visual Answering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
- [CVPR] DrVideo: Document Retrieval Based Long Video Understanding. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
- [CVPR] An Empirical Study on How Video-LLMs Answer Video Questions. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
- Preprint
- [NeurIPS] RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer. In Advances in Neural Information Processing Systems (NeurIPS), 2022. Spotlight Presentation.