Zhi-Qi Cheng

For a comprehensive list of my publications, please visit my Google Scholar profile.

For additional information and access to the code related to these publications, please visit my GitHub repositories.

I: Human Rights & Public Safety: Multimedia Analysis & Assessment

Publications in Conference Proceedings and Journals:

Learning spatial awareness to improve crowd counting [pdf] [tool]
Zhi-Qi Cheng†, Jun-Xiu Li†, Qi Dai, Xiao Wu, Alexander G Hauptmann
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019 (Oral Presentation; Acceptance rate: 4.3%)
Rethinking spatial invariance of convolutional networks for object counting [pdf][code] [washington post] [youtube]
Zhi-Qi Cheng, Qi Dai, Hong Li, Jingkuan Song, Xiao Wu, Alexander G Hauptmann
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Contributed to the Washington Post's Pulitzer-winning Capitol Riot analysis, 2022)
HDFormer: High-order Directed Transformer for 3D Human Pose Estimation [pdf][code]
Hanyuan Chen†, Jun-Yan He†, Wangmeng Xiang†, Zhi-Qi Cheng†, Wei Liu, Hanbing Liu, Bin Luo, Yifeng Geng, Xuansong Xie
Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), 2023
KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration [pdf] [code]
Xu Bao†, Zhi-Qi Cheng†^, Jun-Yan He†, Chenyang Li, Wangmeng Xiang, Jingdong Sun, Hanbing Liu, Wei Liu, Bin Luo, Yifeng Geng, and others
Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), 2023
PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation [pdf][code]
Hanbing Liu†, Jun-Yan He†^, Zhi-Qi Cheng†^, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, and others
Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), 2023
Improving the learning of multi-column convolutional neural network for crowd counting [pdf]
Zhi-Qi Cheng†, Jun-Xiu Li†, Qi Dai, Xiao Wu, Jun-Yan He, Alexander G Hauptmann
Proceedings of the 27th ACM International Conference on Multimedia (ACM MM), 2019 (Oral Presentation)
Crossnet: Boosting crowd counting with localization [pdf]
Ji Zhang, Zhi-Qi Cheng, Xiao Wu, Wei Li, Jian-Jun Qiao
Proceedings of the 30th ACM International Conference on Multimedia (ACM MM), 2022
Stacked pooling for boosting scale invariance of crowd counting [pdf] [code]
Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
DB-LSTM: Densely-connected Bi-directional LSTM for human action recognition [pdf][code]
Jun-Yan He, Xiao Wu, Zhi-Qi Cheng, Zhaoquan Yuan, Yu-Gang Jiang
Neurocomputing, 2021

† indicates equal contribution; ^ indicates mentorship.

II: Visual E-commerce: Multimodal Retrieval & Recommendation

Publications in Conference Proceedings and Journals:

Video2shop: Exact matching clothes in videos to online shopping images [pdf][code] [youtube]
Zhi-Qi Cheng, Xiao Wu, Yang Liu, Xian-Sheng Hua
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017 (Alibaba's Pailitao Project; 17M daily users; 30M users during 2017 Double 11 Festival)
Video eCommerce: Towards online video advertising [pdf] [youtube]
Zhi-Qi Cheng, Yang Liu, Xiao Wu, Xian-Sheng Hua
Proceedings of the 24th ACM International Conference on Multimedia (ACM MM), 2016 (Oral Presentation; Alibaba's Video eCommerce system; ACM SCF Best Student Paper)
Video eCommerce++: Toward large scale online video advertising [pdf] [youtube]
Zhi-Qi Cheng, Xiao Wu, Yang Liu, Xian-Sheng Hua
IEEE Transactions on Multimedia (IEEE TMM), 19(6), 2017 (Alibaba's Video eCommerce++ system; Recognized by China's Miaozi Program)
On the selection of anchors and targets for video hyperlinking [pdf]
Zhi-Qi Cheng, Hao Zhang, Xiao Wu, Chong-Wah Ngo
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval (ICMR), 2017: 287-293 (Special Oral Presentation)
Personalized clothing recommendation combining user social circle & fashion style consistency [pdf]
Guang-Lu Sun, Zhi-Qi Cheng, Xiao Wu, Qiang Peng
Multimedia Tools and Applications, 77, 2018: 17731-17754
Multi-view image generation from a single-view [pdf]
Bo Zhao, Xiao Wu, Zhi-Qi Cheng, Hao Liu, Zequn Jie, Jiashi Feng
Proceedings of the 26th ACM International Conference on Multimedia (ACM MM), 2018 (Oral Presentation)
Generating person images with appearance-aware pose stylizer [pdf] [code]
Siyu Huang, Haoyi Xiong, Zhi-Qi Cheng, Qingzhong Wang, Xingran Zhou, Bihan Wen, Jun Huan, Dejing Dou, and others
Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI), 2020

† indicates equal contribution; ^ indicates mentorship.

Publications in Refereed Workshops:

Vireo@ TRECVID 2017: Video-to-text, ad-hoc video search and video hyperlinking [pdf]
Phuong Anh Nguyen, Qing Li, Zhi-Qi Cheng, Yi-Jie Lu, Hao Zhang, Xiao Wu, Chong-Wah Ngo
2017 TREC Video Retrieval Evaluation (TRECVID 2017), 2017 (1st Place in TRECVID LNK 2017 & 2nd Place in TRECVID AVS 2017)
Minimizing risk in video hyperlinking [pdf]
Chong-Wah Ngo, Zhi-Qi Cheng, Xiao Wu
2017 TREC Video Retrieval Evaluation (TRECVID 2017), 2017 (Special Oral Presentation)

Patents:

Determining recommended object [google patents]
Zhi-Qi Cheng, Yang Liu, Xian-Sheng Hua
US Patent US10671851B2, 2020 (Worldwide applications: 2017 - CN, TW; 2018 - WO, EP, US)
A kind of data handling system, method and device [google patents]
Zhi-Qi Cheng, Yang Liu, Xian-Sheng Hua
CN Patent CN107463572B, 2020
A kind of information-pushing method, apparatus, and system [google patents]
Zhi-Qi Cheng, Yang Liu, Xian-Sheng Hua
CN Patent CN107517393B, 2020
Information pushing method, device and system [google patents]
Zhi-Qi Cheng, Yang Liu, Xian-Sheng Hua
HK Patent HK1248437A1, 2018 (Worldwide applications: 2018 - HK)

III: Mobility21: Streaming Perception, Detection & Tracking

Publications in Conference Proceedings and Journals:

DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs [pdf] [code]
Jiawen Zhu†, Huayi Tang†, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Shihao Qiu, Shengming Li, Huchuan Lu
Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2024
DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving [pdf] [code]
Jun-Yan He†, Zhi-Qi Cheng†^, Chenyang Li†, Wangmeng Xiang, Binghui Chen, Bin Luo, Yifeng Geng, Xuansong Xie, and others
Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), 2023 (Recognized as the leading solution for Autonomous Driving Streaming Perception)
Procontext: Exploring progressive context transformer for tracking [pdf] [code]
Jin-Peng Lan†, Zhi-Qi Cheng†, Jun-Yan He†, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie, and others
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023 (Oral Presentation)
Longshortnet: Exploring temporal and semantic features fusion in streaming perception [pdf] [code]
Chenyang Li†, Zhi-Qi Cheng†, Jun-Yan He†, Pengyu Li, Bin Luo, Hanyuan Chen, Yifeng Geng, Jin-Peng Lan, Xuansong Xie, and others
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment [pdf]
Ji Zhang, Xiao Wu^, Zhi-Qi Cheng^, Qi He, Wei Li
Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), 2023
Real-time semantic segmentation with parallel multiple views feature augmentation [pdf]
Jian-Jun Qiao, Zhi-Qi Cheng, Xiao Wu, Wei Li, Ji Zhang
Proceedings of the 30th ACM International Conference on Multimedia (ACM MM), 2022
Debunking Free Fusion Myth: Online Multi-view Anomaly Detection with Disentangled Product-of-Experts Modeling [pdf]
Hao Wang, Zhi-Qi Cheng, Jingdong Sun, Xin Yang, Xiao Wu, Hongyang Chen, Yan Yang
Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), 2023
Robust Automatic Detection of Traffic Activity [pdf]
Alexander Hauptmann, Lijun Yu, Wenhe Liu, Yijun Qian, Zhi-Qi Cheng, Liangke Gui
Mobility21, Carnegie Mellon University, 2023

† indicates equal contribution; ^ indicates mentorship.

IV: Beyond LLM: Multimodal Knowledge-Driven Comprehension

Publications in Conference Proceedings and Journals:

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [pdf] [code]
Zhi-Qi Cheng, Qi Dai, Siyao Li, Jingdong Sun, Teruko Mitamura, Alexander G Hauptmann
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023
Gsrformer: Grounded situation recognition transformer with alternate semantic attention refinement [pdf] [code]
Zhi-Qi Cheng, Qi Dai, Siyao Li, Teruko Mitamura, Alexander Hauptmann
Proceedings of the 30th ACM International Conference on Multimedia (ACM MM), 2022 (Oral Presentation)
WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models [pdf] [website] [modelscope]
Jun-Yan He, Zhi-Qi Cheng^, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, and others
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Implicit temporal modeling with learnable alignment for video recognition [pdf] [code]
Shuyuan Tu, Qi Dai, Zuxuan Wu, Zhi-Qi Cheng, Han Hu, Yu-Gang Jiang
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023 (Oral Presentation)
Learning to transfer: Generalizable attribute learning with multitask neural model search [pdf]
Zhi-Qi Cheng, Xiao Wu, Siyu Huang, Jun-Xiu Li, Alexander G Hauptmann, Qiang Peng
Proceedings of the 26th ACM International Conference on Multimedia (ACM MM), 2018 (Oral Presentation)
Gnas: A greedy neural architecture search method for multi-attribute learning [pdf]
Siyu Huang, Xi Li, Zhi-Qi Cheng, Zhongfei Zhang, Alexander Hauptmann
Proceedings of the 26th ACM International Conference on Multimedia (ACM MM), 2018 (Oral Presentation)

† indicates equal contribution; ^ indicates mentorship.

Publications in Refereed Workshops:

WordArt Designer API: User-Driven Artistic Typography Synthesis using Large Language Models [pdf] [project] [modelscope]
Jun-Yan He, Zhi-Qi Cheng^, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, and others
NeurIPS Workshop on Machine Learning for Creativity and Design, 2023 (Best Demonstration Award)
Towards Calibrated Robust Fine-Tuning of Vision-Language Models [pdf]
Changdae Oh†, Hyesu Lim†, Mijoo Kim, Jaegul Choo, Alexander Hauptmann, Zhi-Qi Cheng^, Kyungwoo Song^
NeurIPS Workshop on Distribution Shifts, 2023
Perceiving physical equation by observing visual scenarios [pdf]
Siyu Huang†, Zhi-Qi Cheng†, Xi Li, Xiao Wu, Zhongfei Zhang, Alexander Hauptmann
NeurIPS Workshop on Modeling the Physical World, 2018

† indicates equal contribution; ^ indicates mentorship.
