For a comprehensive list of my publications, please visit my Google Scholar profile.

For additional information and access to the code for these publications, please visit my GitHub repositories.

I: Human Rights & Public Safety: Multimedia Analysis & Assessment

Publications in Conference Proceedings and Journals:

  • Learning spatial awareness to improve crowd counting [pdf] [tool]
    Zhi-Qi Cheng†, Jun-Xiu Li†, Qi Dai, Xiao Wu, Alexander G Hauptmann
    Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019 (Oral Presentation; Acceptance rate: 4.3%)
  • Rethinking spatial invariance of convolutional networks for object counting [pdf] [code] [washington post] [youtube]
    Zhi-Qi Cheng, Qi Dai, Hong Li, Jingkuan Song, Xiao Wu, Alexander G Hauptmann
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Contributed to the Pulitzer-winning Capitol Riot analysis by The Washington Post, 2022)
  • HDFormer: High-order Directed Transformer for 3D Human Pose Estimation [pdf] [code]
    Hanyuan Chen†, Jun-Yan He†, Wangmeng Xiang†, Zhi-Qi Cheng†, Wei Liu, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie
    Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), 2023
  • KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration [pdf] [code]
    Xu Bao†, Zhi-Qi Cheng†^, Jun-Yan He†, Chenyang Li, Wangmeng Xiang, Jingdong Sun, Hanbing Liu, Wei Liu, Bin Luo, Yifeng Geng, and others
    Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), 2023
  • PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation [pdf] [code]
    Hanbing Liu†, Jun-Yan He†^, Zhi-Qi Cheng†^, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, and others
    Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), 2023
  • Improving the learning of multi-column convolutional neural network for crowd counting [pdf]
    Zhi-Qi Cheng†, Jun-Xiu Li†, Qi Dai, Xiao Wu, Jun-Yan He, Alexander G Hauptmann
    Proceedings of the 27th ACM International Conference on Multimedia (ACM MM), 2019 (Oral Presentation)
  • CrossNet: Boosting crowd counting with localization [pdf]
    Ji Zhang, Zhi-Qi Cheng, Xiao Wu, Wei Li, Jian-Jun Qiao
    Proceedings of the 30th ACM International Conference on Multimedia (ACM MM), 2022
  • Stacked pooling for boosting scale invariance of crowd counting [pdf] [code]
    Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann
    Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
  • DB-LSTM: Densely-connected Bi-directional LSTM for human action recognition [pdf] [code]
    Jun-Yan He, Xiao Wu, Zhi-Qi Cheng, Zhaoquan Yuan, Yu-Gang Jiang
    Neurocomputing, 2021

† indicates equal contribution by authors, ^ indicates mentorship

Other Articles and Preprints:

  • Tracking with Human-Intent Reasoning [pdf] [website]
    Jiawen Zhu, Zhi-Qi Cheng^, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu^, Yifeng Geng, Xuansong Xie
    arXiv preprint arXiv:2312.17448, 2023
  • Hypergraph transformer for skeleton-based action recognition [pdf] [code]
    Yuxuan Zhou, Zhi-Qi Cheng^, Chao Li, Yifeng Geng, Xuansong Xie, Margret Keuper
    arXiv preprint arXiv:2211.09590, 2022
  • Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness [pdf] [code]
    Yuxuan Zhou, Zhi-Qi Cheng^, Jun-Yan He, Bin Luo, Yifeng Geng, Xuansong Xie, Margret Keuper
    arXiv preprint arXiv:2305.11468, 2023
  • Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation [pdf] [code]
    Hanbing Liu†, Wangmeng Xiang†^, Jun-Yan He†, Zhi-Qi Cheng†, Bin Luo, Yifeng Geng, Xuansong Xie
    arXiv preprint arXiv:2309.01365, 2023

† indicates equal contribution by authors, ^ indicates mentorship


II: Visual E-commerce: Multimodal Retrieval & Recommendation

Publications in Conference Proceedings and Journals:

  • Video2Shop: Exact matching clothes in videos to online shopping images [pdf] [code] [youtube]
    Zhi-Qi Cheng, Xiao Wu, Yang Liu, Xian-Sheng Hua
    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017 (Alibaba's Pailitao Project; 17M daily users; 30M users during 2017 Double 11 Festival)
  • Video eCommerce: Towards online video advertising [pdf] [youtube]
    Zhi-Qi Cheng, Yang Liu, Xiao Wu, Xian-Sheng Hua
    Proceedings of the 24th ACM International Conference on Multimedia (ACM MM), 2016 (Oral Presentation; Alibaba's Video eCommerce system; ACM SCF Best Student Paper)
  • Video eCommerce++: Toward large scale online video advertising [pdf] [youtube]
    Zhi-Qi Cheng, Xiao Wu, Yang Liu, Xian-Sheng Hua
    IEEE Transactions on Multimedia (IEEE TMM), 19(6), 2017 (Alibaba's Video eCommerce++ system; Recognized by China's Miaozi Program)
  • On the selection of anchors and targets for video hyperlinking [pdf]
    Zhi-Qi Cheng, Hao Zhang, Xiao Wu, Chong-Wah Ngo
    Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval (ICMR), 2017: 287-293 (Special Oral Presentation)
  • Personalized clothing recommendation combining user social circle & fashion style consistency [pdf]
    Guang-Lu Sun, Zhi-Qi Cheng, Xiao Wu, Qiang Peng
    Multimedia Tools and Applications, 77, 2018: 17731-17754
  • Multi-view image generation from a single-view [pdf]
    Bo Zhao, Xiao Wu, Zhi-Qi Cheng, Hao Liu, Zequn Jie, Jiashi Feng
    Proceedings of the 26th ACM International Conference on Multimedia (ACM MM), 2018 (Oral Presentation)
  • Generating person images with appearance-aware pose stylizer [pdf] [code]
    Siyu Huang, Haoyi Xiong, Zhi-Qi Cheng, Qingzhong Wang, Xingran Zhou, Bihan Wen, Jun Huan, Dejing Dou, and others
    Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), 2020

† indicates equal contribution by authors, ^ indicates mentorship

Publications in Refereed Workshops:

  • Vireo @ TRECVID 2017: Video-to-text, ad-hoc video search and video hyperlinking [pdf]
    Phuong Anh Nguyen, Qing Li, Zhi-Qi Cheng, Yi-Jie Lu, Hao Zhang, Xiao Wu, Chong-Wah Ngo
    2017 TREC Video Retrieval Evaluation (TRECVID 2017), 2017 (1st Place in TRECVID LNK 2017 & 2nd Place in TRECVID AVS 2017)
  • Minimizing risk in video hyperlinking [pdf]
    Chong-Wah Ngo, Zhi-Qi Cheng, Xiao Wu
    2017 TREC Video Retrieval Evaluation (TRECVID 2017), 2017 (Special Oral Presentation)

Patents:

  • Determining recommended object [google patents]
    Zhi-Qi Cheng, Yang Liu, Xian-Sheng Hua
    US Patent US10671851B2, 2020 (Worldwide applications: 2017 - CN, TW; 2018 - WO, EP, US)
  • A kind of data handling system, method and device [google patents]
    Zhi-Qi Cheng, Yang Liu, Xian-Sheng Hua
    CN Patent CN107463572B, 2020
  • A kind of information-pushing method, apparatus, and system [google patents]
    Zhi-Qi Cheng, Yang Liu, Xian-Sheng Hua
    CN Patent CN107517393B, 2020
  • Information pushing method, device and system [google patents]
    Zhi-Qi Cheng, Yang Liu, Xian-Sheng Hua
    HK Patent HK1248437A1, 2018 (Worldwide applications: 2018 - HK)

III: Mobility21: Streaming Perception, Detection & Tracking

Publications in Conference Proceedings and Journals:

  • DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs [pdf] [code]
    Jiawen Zhu†, Huayi Tang†, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Shihao Qiu, Shengming Li, Huchuan Lu
    Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2024
  • DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving [pdf] [code]
    Jun-Yan He†, Zhi-Qi Cheng†^, Chenyang Li†, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie, and others
    Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), 2023 (Recognized as the leading solution for Autonomous Driving Streaming Perception)
  • ProContext: Exploring progressive context transformer for tracking [pdf] [code]
    Jin-Peng Lan†, Zhi-Qi Cheng†, Jun-Yan He†, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie, and others
    Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023 (Oral Presentation)
  • LongShortNet: Exploring temporal and semantic features fusion in streaming perception [pdf] [code]
    Chenyang Li†, Zhi-Qi Cheng†, Jun-Yan He†, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie, and others
    Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
  • Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment [pdf]
    Ji Zhang, Xiao Wu^, Zhi-Qi Cheng^, Qi He, Wei Li
    Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), 2023
  • Real-time semantic segmentation with parallel multiple views feature augmentation [pdf]
    Jian-Jun Qiao, Zhi-Qi Cheng, Xiao Wu, Wei Li, Ji Zhang
    Proceedings of the 30th ACM International Conference on Multimedia (ACM MM), 2022
  • Debunking Free Fusion Myth: Online Multi-view Anomaly Detection with Disentangled Product-of-Experts Modeling [pdf]
    Hao Wang, Zhi-Qi Cheng, Jingdong Sun, Xin Yang, Xiao Wu, Hongyang Chen, Yan Yang
    Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), 2023
  • Robust Automatic Detection of Traffic Activity [pdf]
    Alexander Hauptmann, Lijun Yu, Wenhe Liu, Yijun Qian, Zhi-Qi Cheng, Liangke Gui
    Mobility21, Carnegie Mellon University, 2023

† indicates equal contribution by authors, ^ indicates mentorship


IV: Beyond LLM: Multimodal Knowledge-Driven Comprehension

Publications in Conference Proceedings and Journals:

  • ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [pdf] [code]
    Zhi-Qi Cheng, Qi Dai, Siyao Li, Jingdong Sun, Teruko Mitamura, Alexander G Hauptmann
    Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023
  • GSRFormer: Grounded situation recognition transformer with alternate semantic attention refinement [pdf] [code]
    Zhi-Qi Cheng, Qi Dai, Siyao Li, Teruko Mitamura, Alexander Hauptmann
    Proceedings of the 30th ACM International Conference on Multimedia (ACM MM), 2022 (Oral Presentation)
  • WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models [pdf] [website] [modelscope]
    Jun-Yan He, Zhi-Qi Cheng^, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, and others
    Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
  • Implicit temporal modeling with learnable alignment for video recognition [pdf] [code]
    Shuyuan Tu, Qi Dai, Zuxuan Wu, Zhi-Qi Cheng, Han Hu, Yu-Gang Jiang
    Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023 (Oral Presentation)
  • Learning to transfer: Generalizable attribute learning with multitask neural model search [pdf]
    Zhi-Qi Cheng, Xiao Wu, Siyu Huang, Jun-Xiu Li, Alexander G Hauptmann, Qiang Peng
    Proceedings of the 26th ACM International Conference on Multimedia (ACM MM), 2018 (Oral Presentation)
  • GNAS: A greedy neural architecture search method for multi-attribute learning [pdf]
    Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann
    Proceedings of the 26th ACM International Conference on Multimedia (ACM MM), 2018 (Oral Presentation)

† indicates equal contribution by authors, ^ indicates mentorship

Publications in Refereed Workshops:

  • WordArt Designer API: User-Driven Artistic Typography Synthesis using Large Language Models [pdf] [project] [modelscope]
    Jun-Yan He, Zhi-Qi Cheng^, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, and others
    NeurIPS Workshop on Machine Learning for Creativity and Design, 2023 (Best Demonstration Award)
  • Towards Calibrated Robust Fine-Tuning of Vision-Language Models [pdf]
    Changdae Oh†, Hyesu Lim†, Mijoo Kim, Jaegul Choo, Alexander Hauptmann, Zhi-Qi Cheng^, Kyungwoo Song^
    NeurIPS Workshop on Distribution Shifts, 2023
  • Perceiving physical equation by observing visual scenarios [pdf]
    Siyu Huang†, Zhi-Qi Cheng†, Xi Li, Xiao Wu, Zhongfei Zhang, Alexander Hauptmann
    NeurIPS Workshop on Modeling the Physical World, 2018

† indicates equal contribution by authors, ^ indicates mentorship

Other Articles and Preprints:

  • MotionEditor: Editing Video Motion via Content-Aware Diffusion [pdf] [website]
    Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang
    arXiv preprint arXiv:2311.18830, 2023
  • ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval [pdf] [code]
    Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen
    arXiv preprint arXiv:2312.12478, 2023

† indicates equal contribution by authors, ^ indicates mentorship