Tom Ko

3.3k citations

44 papers · 1.8k indexed · 3 hit papers · h-index 13

Impact in

Signal Processing top 0.5%
- Speech and Audio Processing
- Music and Audio Processing
Artificial Intelligence top 0.5%
- Speech Recognition and Synthesis
- Natural Language Processing Techniques
- Topic Modeling
- Speech and dialogue systems

Papers in

Signal Processing 23
- Speech and Audio Processing 19
- Music and Audio Processing 13
Artificial Intelligence 42
- Speech Recognition and Synthesis 35
- Natural Language Processing Techniques 19
- Topic Modeling 13
- Speech and dialogue systems 7
- Domain Adaptation and Few-Shot Learning 4

Co-authors: Daniel Povey, Sanjeev Khudanpur, Vijayaditya Peddinti, Michael L. Seltzer, Brian Mak, David Snyder, Qing Li, Long Zhou
Journals: Speech Communication (1 paper); IEEE/ACM Transactions on Audio Speech and Language Processing (1 paper); IEEE Transactions on Audio Speech and Language Processing (1 paper); View (1 paper); PolyU Institutional Research Archive (Hong Kong Polytechnic University) (1 paper)
Partner nations: China, Hong Kong, United States

In The Last Decade

Tom Ko

44 papers receiving 1.6k citations

Hit Papers

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research 2024 · 63 citations

What are hit papers?

Hit papers significantly outperform the citation benchmark for their cohort. A paper qualifies if any of the following hold:

it has ≥500 total citations;
it reaches ≥1.5× the top-1% citation threshold for papers in the same subfield and year (the threshold is the minimum needed to enter the top 1%, not the average within it);
it reaches the top citation threshold in at least one of its specific research topics.
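
The three criteria above can be sketched as a single predicate. This is an illustrative reading of the rules, not Rankless's actual implementation: the function name, its parameters, and the threshold values used in the examples are hypothetical, and the page does not publish the real thresholds for any subfield, year, or topic.

```python
from typing import Iterable


def is_hit_paper(
    citations: int,
    top1_threshold: float,
    topic_thresholds: Iterable[float] = (),
) -> bool:
    """Return True if a paper meets any of the three hit-paper criteria.

    citations        -- total citations of the paper
    top1_threshold   -- minimum citations needed to enter the top 1% for the
                        paper's subfield and year (hypothetical input)
    topic_thresholds -- top citation thresholds for each of the paper's
                        research topics (hypothetical input)
    """
    # Criterion 1: at least 500 total citations.
    if citations >= 500:
        return True
    # Criterion 2: at least 1.5x the top-1% entry threshold for its cohort.
    if citations >= 1.5 * top1_threshold:
        return True
    # Criterion 3: reaches the top threshold in at least one research topic.
    return any(citations >= t for t in topic_thresholds)


# Made-up thresholds, for illustration only.
print(is_hit_paper(694, top1_threshold=1000))                       # criterion 1
print(is_hit_paper(63, top1_threshold=40))                          # criterion 2
print(is_hit_paper(63, top1_threshold=50, topic_thresholds=[60]))   # criterion 3
```

Because the criteria are joined by "any", a recent paper with relatively few citations can still qualify through the cohort or topic thresholds, while an older paper can qualify on the 500-citation rule alone.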

2024 IEEE/ACM Transactions on Audio Speech and Language Processing
2017 A study on data augmentation of reverberant speech for robust speech recognition
2015 Audio augmentation for speech recognition

Countries citing papers authored by Tom Ko

This map shows the geographic impact of Tom Ko's research. It shows the number of citations coming from papers published by authors working in each country. You can also color the map by specialization and compare the number of citations received by Tom Ko with the expected number of citations based on a country's size and research output (numbers larger than one mean the country cites Tom Ko more than expected).
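
The comparison of received and expected citations can be read as a simple ratio. The sketch below assumes that the expected count is the scholar's total citations multiplied by the country's share of overall research output; the page does not state how Rankless actually computes the expected value, and the function and parameter names are hypothetical.

```python
def specialization_ratio(
    observed: float,
    total_citations: float,
    country_share: float,
) -> float:
    """Ratio of observed to expected citations from one country.

    observed        -- citations actually received from the country
    total_citations -- all citations received by the scholar
    country_share   -- assumed share of world research output attributed
                       to the country, between 0 and 1
    """
    # Assumption: expected citations scale with the country's output share.
    expected = total_citations * country_share
    if expected <= 0:
        raise ValueError("expected citations must be positive")
    return observed / expected


# Made-up numbers: 300 of 1000 citations come from a country with a 20% share.
ratio = specialization_ratio(300, 1000, 0.2)
print(ratio)  # a value above 1 means the country cites more than expected
```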

Fields of papers citing papers by Tom Ko

Physical Sciences · Health Sciences · Life Sciences · Social Sciences

This network shows the impact of papers produced by Tom Ko. Nodes represent research fields, and links connect fields that are likely to share authors. Colored nodes show fields that tend to cite the papers produced by Tom Ko. The network helps show where Tom Ko may publish in the future.

Co-authorship network

The 25 scholars most cited alongside Tom Ko, linked wherever they have co-authored with each other. Click a name or a connecting line to browse the papers they share.

Border = papers with Tom Ko; line = papers co-authored together. Tom Ko is linked to everyone and is therefore left out of the graph.

All Works

20 of 20 papers shown

#	Work	Year	Citations
1	RepCodec: A Speech Representation Codec for Speech Tokenization Zhichao Huang, Qomi Akit Jauhari, Tom Ko	2024	11
2	Selective Prompting Tuning for Personalized Conversations with LLMs Henry Baer‐Benson, Xubo Liu, Tom Ko, Bo Wu, Wenwu Wang, María Teresa Alejos Juez, Lilian Tang	2024	1
3	WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research IEEE/ACM Transactions on Audio Speech and Language Processing ·Xinhao Mei, Qomi Akit Jauhari, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang	2024	63
4	Recent Advances in Direct Speech-to-text Translation Xu Chen, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang, Tong Xiao, Jingbo Zhu	2023	13
5	Personalized Dialogue Generation with Persona-Adaptive Attention Proceedings of the AAAI Conference on Artificial Intelligence ·Henry Baer‐Benson, Kayla L. Farquhar, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang, Hao Tang	2023	10
6	CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning Qomi Akit Jauhari, 耀宗江, Tom Ko, Mingxuan Wang, Haizhou Li	2023	2
7	MOSPC: MOS Prediction Based on Pairwise Comparison Kexin Wang, Yunlong Zhao, M. S. Bell, Tom Ko, Mingxuan Wang	2023	2
8	Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention View ·Xubo Liu, M. Shajalal Ahammed, Xinhao Mei, Haohe Liu, Qiuqiang Kong, W Kent Richard, Shengchen Li, Tom Ko, Yu Zhang, Sobhana Alex, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang	2023	11
9	DUB: Discrete Unit Back-translation for Speech Translation Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou	2023	10
10	SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) ·耀宗江, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei	2022	70
11	LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT Interspeech 2022 ·Rui Wang, Longwu Liu, 耀宗江, Long Zhou, 谷秋荣, Zhihua Wei, 貴利清村, Tom Ko, Haizhou Li	2022	22
12	A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis Interspeech 2022 ·Kenneth Odero, Tom Ko, Yu Zhang	2022	3
13	Token-Level Supervised Contrastive Learning for Punctuation Restoration arXiv (Cornell University) ·Anushka Kothari, Tom Ko, Hong Tang, Xubo Liu, Bo Wu	2021	9
14	MetaMix: Improved Meta-Learning with Interpolation-based Consistency Regularization Muhammad Alif Fauzi, Márcio Alves Ribeiro, Tom Ko, Jianping Wang, Qing Li	2021	5
15	Prototypical Networks for Small Footprint Text-Independent Speaker Verification Tom Ko, Muhammad Alif Fauzi, Qing Li	2020	12
16	Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification Rare & Special e-Zone (The Hong Kong University of Science and Technology) ·Joseph E. Matuz, Tom Ko, David Snyder, Brian Mak, Daniel Povey	2018	149
17	Meta Learning for Few-shot Keyword Spotting. arXiv (Cornell University) ·Javier Velaza, Tom Ko, Lifeng Shang, Xiao Dong Chen, Mia Šivak, Qing Li	2018	4
18	Audio augmentation for speech recognition Tom Ko, Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur	2015	694
19	Eigentriphones for Context-Dependent Acoustic Modeling IEEE Transactions on Audio Speech and Language Processing ·Tom Ko, Brian Mak	2013	4
20	Eigentriphones: A basis for context-dependent acoustic modeling Rare & Special e-Zone (The Hong Kong University of Science and Technology) ·Tom Ko, Brian Mak	2011	6

About Tom Ko

Tom Ko is a scholar working on Signal Processing, Artificial Intelligence, Computer Vision and Pattern Recognition, Human-Computer Interaction, and Language and Linguistics, having authored 44 papers that have together received 1.8k indexed citations. Recurring topics across this work include Speech Recognition and Synthesis (35 papers), Speech and Audio Processing (19 papers), Natural Language Processing Techniques (19 papers), Music and Audio Processing (13 papers), Topic Modeling (13 papers), Speech and dialogue systems (7 papers), Domain Adaptation and Few-Shot Learning (4 papers), and Multimodal Machine Learning Applications (2 papers). The work is most often cited by research in Signal Processing (1.3k citations), Artificial Intelligence (1.6k citations), Computer Vision and Pattern Recognition (167 citations), Experimental and Cognitive Psychology (107 citations), and Developmental Biology (8 citations). Tom Ko has collaborated with scholars based in China, Hong Kong, and the United States. Frequent co-authors include Daniel Povey, Sanjeev Khudanpur, Vijayaditya Peddinti, Michael L. Seltzer, Brian Mak, David Snyder, Qing Li, Long Zhou, Vimal Manohar, and Wenwu Wang. Their work appears in journals such as Speech Communication, IEEE/ACM Transactions on Audio Speech and Language Processing, IEEE Transactions on Audio Speech and Language Processing, View, and PolyU Institutional Research Archive (Hong Kong Polytechnic University).

Rankless uses publication and citation data sourced from OpenAlex, an open and comprehensive bibliographic database. While OpenAlex provides broad and valuable coverage of the global research landscape, it—like all bibliographic datasets—has inherent limitations. These include incomplete records, variations in author disambiguation, differences in journal indexing, and delays in data updates. As a result, some metrics and network relationships displayed in Rankless may not fully capture the entirety of a scholar's output or impact.
