[1] Zheng Q, Wang C, Wang D. Bypass network for semantics driven image paragraph captioning[J]. Computer Vision and Image Understanding, 2024, 249: 104154.
[2] Zheng Q, Liu D, Wang C, et al. ESceme: Vision-and-language navigation with episodic scene memory[J]. International Journal of Computer Vision, 2024: 1-21.
[3] Li K, Yu B, Zheng Q, et al. MuEP: A multimodal benchmark for embodied planning with foundation models[C]. IJCAI, 2024.
[4] Zhang H, Liu D, Zheng Q, et al. Modeling video as stochastic processes for fine-grained video representation learning[C]. CVPR, 2023.
[5] Zheng Q, Gong M, You X, et al. A unified B-spline framework for scale-invariant keypoint detection[J]. International Journal of Computer Vision, 2022, 130(3): 777-799.