Jan Hajič

8.2k citations

110 papers · 3.8k indexed · 3 hit papers · h-index 29

Impact in

Artificial Intelligence top 0.2%
- Natural Language Processing Techniques
- Topic Modeling
- Text Readability and Simplification
- Speech and dialogue systems
- Semantic Web and Ontologies
- Speech Recognition and Synthesis
- Advanced Text Analysis Techniques
Language and Linguistics top 2%

Papers in

Artificial Intelligence 98
- Natural Language Processing Techniques 93
- Topic Modeling 55
- Semantic Web and Ontologies 18
- Speech and dialogue systems 10
- Text Readability and Simplification 10
- Speech Recognition and Synthesis 8
Language and Linguistics 12
- Lexicography and Language Studies 10

Co-authors: Ryan McDonald, Milan Straka, Jun’ichi Tsujii, Jana Straková, Kiril Ribarov, Fernando Pereira, Daniel Zeman, Filip Ginter
Journals: Language Resources and Evaluation (23 papers), Artificial Intelligence in Medicine (1 paper), Transactions of the Association for Computational Linguistics (1 paper), International Journal of Lexicography (1 paper), Meta Journal des traducteurs (1 paper)
Partner nations: Czechia, United States, Sweden

In The Last Decade

Jan Hajič

95 papers receiving 3.3k citations

Hit Papers

Universal Dependencies v1: A Multilingual Treebank Collection 2016 · 575 citations

What are hit papers?

Hit papers significantly outperform the citation benchmark for their cohort. A paper qualifies if any of the following hold:

it has ≥500 total citations;
it reaches ≥1.5× the top-1% citation threshold for papers in the same subfield and year (the threshold is the minimum needed to enter the top 1%, not the average within it);
it reaches the top citation threshold in at least one of its specific research topics.
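
The three criteria above can be sketched as a single predicate. This is a minimal illustration, not Rankless's actual implementation: the `Paper` record, its field names, and the reading of the third criterion as "citations meet or exceed a per-topic threshold" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Paper:
    """Hypothetical record; the field names are illustrative only."""
    title: str
    citations: int
    # Minimum citation count needed to enter the top 1% for the paper's
    # subfield and year (None when the threshold is not known).
    top1pct_threshold: Optional[int] = None
    # Assumed shape: topic name -> top citation threshold for that topic.
    topic_thresholds: Dict[str, int] = field(default_factory=dict)


def is_hit_paper(p: Paper) -> bool:
    """Return True if any one of the three stated criteria holds."""
    # Criterion 1: at least 500 total citations.
    absolute = p.citations >= 500
    # Criterion 2: at least 1.5x the top-1% entry threshold for the cohort.
    relative = (
        p.top1pct_threshold is not None
        and p.citations >= 1.5 * p.top1pct_threshold
    )
    # Criterion 3 (assumed reading): meets the threshold in at least one topic.
    topical = any(p.citations >= t for t in p.topic_thresholds.values())
    return absolute or relative or topical


# The 2016 paper listed above has 575 citations, so the first criterion
# alone is enough; no cohort or topic thresholds are needed to decide.
ud_v1 = Paper("Universal Dependencies v1: A Multilingual Treebank Collection", 575)
print(is_hit_paper(ud_v1))
```

Because the criteria are joined by "any", a paper with fewer than 500 citations can still qualify through the cohort or topic thresholds, which this page does not report.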

2016 Language Resources and Evaluation
2014 International Conference on Computational Linguistics
2005 Non-projective dependency parsing using spanning tree algorithms

Countries citing papers authored by Jan Hajič

This map shows the geographic impact of Jan Hajič's research. It shows the number of citations coming from papers published by authors working in each country. You can also color the map by specialization and compare the number of citations received by Jan Hajič with the expected number of citations based on a country's size and research output (numbers larger than one mean the country cites Jan Hajič more than expected).

Fields of papers citing papers by Jan Hajič

Physical Sciences · Health Sciences · Life Sciences · Social Sciences

This network shows the impact of papers produced by Jan Hajič. Nodes represent research fields, and links connect fields that are likely to share authors. Colored nodes show fields that tend to cite the papers produced by Jan Hajič. The network helps show where Jan Hajič may publish in the future.

Co-authors

The 25 scholars most cited alongside Jan Hajič, linked wherever they have co-authored with each other.

Border = papers with Jan Hajič. Line = papers co-authored together. Jan Hajič links everyone and is therefore left out of the graph.

All Works

20 of 20 papers shown

#	Title	Venue	Authors	Year	Citations
1	Neural Architectures for Nested NER through Linearization		Jana Straková, Milan Straka, Jan Hajič	2019	175
2	Modifications of the Czech Morphological Dictionary for Consistent Corpus Annotation	Journal of Linguistics/Jazykovedný časopis	Jaroslava Hlaváčová, Marie Mikulová, Barbora Štěpánková, Jan Hajič	2019	0
3	Creating a Verb Synonym Lexicon Based on a Parallel Corpus	Language Resources and Evaluation	Zdeňka Urešová, Eva Fučíková, Eva Hajičová, Jan Hajič	2018	1
4	CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies		Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, Slav Petrov	2018	96
5	Diacritics Restoration Using Neural Networks	Language Resources and Evaluation	Jakub Náplava, Milan Straka, Pavel Straňák, Jan Hajič	2018	16
6	Universal Dependencies v1: A Multilingual Treebank Collection	Language Resources and Evaluation	Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, Daniel Zeman	2016	575
7	QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages	Language Resources and Evaluation	Arantxa Otegi, Nora Aranberri, António Branco, Jan Hajič, Martin Popel, Kiril Simov, Eneko Agirre, Petya Osenova, Rita Pereira, João Silva, Steven L. Neale	2016	7
8	UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing	Language Resources and Evaluation	Milan Straka, Jan Hajič, Jana Straková	2016	201
9	Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition		Jana Straková, Milan Straka, Jan Hajič	2014	69
10	Multilingual Test Sets for Machine Translation of Search Queries for Cross-Lingual Information Retrieval in the Medical Domain	Language Resources and Evaluation	Zdeňka Urešová, Jan Hajič, Pavel Pecina, Ondřej Dušek	2014	2
11	Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers	International Conference on Computational Linguistics	Jun’ichi Tsujii, Jan Hajič	2014	303
12	An Analysis of Annotation of Verb-Noun Idiomatic Combinations in a Parallel Dependency Corpus	North American Chapter of the Association for Computational Linguistics	Zdeňka Urešová, Jan Hajič, Eva Fučíková, Jana Šindlerová	2013	4
13	HamleDT: To Parse or Not to Parse?	Language Resources and Evaluation	Daniel Zeman, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský, Jan Hajič	2012	32
14	Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task		Jan Hajič	2009	4
15	Validating the Quality of Full Morphological Annotation	Language Resources and Evaluation	Johanka Spoustová, Pavel Pecina, Jan Hajič, Miroslav Spousta	2008	1
16	Issues in annotation of the Czech spontaneous speech corpus in the MALACH project	Language Resources and Evaluation	Josef Psutka, Pavel Ircing, Jan Hajič, Vlasta Radová, Bill Byrne, Samuel Gustman	2004	7
17	Annotation Lexicons: Using the Valency Lexicon for Tectogrammatical Annotation	The Prague Bulletin of Mathematical Linguistics	Jan Hajič, Václav Honetschläger	2003	2
18	Tagging inflective languages		Jan Hajič, Barbora Hladká	1998	2
19	Czech language processing, POS tagging	Language Resources and Evaluation	Jan Hajič, Barbora Hladká	1998	9
20	RUSLAN		Jan Hajič	1987	10

About Jan Hajič

Jan Hajič is a scholar working on Artificial Intelligence; Language and Linguistics; General Social Sciences; Linguistics and Language; and Information Systems, having authored 110 papers that have together received 3.8k indexed citations. Recurring topics across this work include Natural Language Processing Techniques (93 papers), Topic Modeling (55 papers), Semantic Web and Ontologies (18 papers), Lexicography and Language Studies (10 papers), Speech and dialogue systems (10 papers), Text Readability and Simplification (10 papers), Speech Recognition and Synthesis (8 papers), and Biomedical Text Mining and Ontologies (7 papers). The work is most often cited by research in Artificial Intelligence (3.6k citations), Language and Linguistics (239 citations), Computer Vision and Pattern Recognition (273 citations), Information Systems (247 citations), and Linguistics and Language (42 citations). Jan Hajič has collaborated with scholars based in Czechia, the United States, and Sweden. Frequent co-authors include Ryan McDonald, Milan Straka, Jun’ichi Tsujii, Jana Straková, Kiril Ribarov, Fernando Pereira, Daniel Zeman, Filip Ginter, Joakim Nivre, and Slav Petrov. The work appears in journals such as Language Resources and Evaluation, Artificial Intelligence in Medicine, Transactions of the Association for Computational Linguistics, International Journal of Lexicography, and Meta Journal des traducteurs.

Rankless uses publication and citation data sourced from OpenAlex, an open and comprehensive bibliographic database. While OpenAlex provides broad and valuable coverage of the global research landscape, it—like all bibliographic datasets—has inherent limitations. These include incomplete records, variations in author disambiguation, differences in journal indexing, and delays in data updates. As a result, some metrics and network relationships displayed in Rankless may not fully capture the entirety of a scholar's output or impact.
