#acl +All:read Default
= NKJP1M re-annotated using the Morfeusz SGJP tagset =

NKJP1M is a 1-million-word, manually annotated subcorpus of the National Corpus of Polish ([[http://clip.ipipan.waw.pl/NationalCorpusOfPolish|NKJP]]). It is the main resource used for training taggers of Polish. Unfortunately, NKJP was annotated according to a tagset that differs somewhat from the tagset of the morphological analyser [[http://morfeusz.sgjp.pl/|Morfeusz SGJP]].

Here we present NKJP1M-SGJP, a version of NKJP1M re-annotated in accordance with the tagset of Morfeusz SGJP. Taggers trained on it are therefore compatible with Morfeusz without any tagset conversion. We intend to maintain this version of the corpus, both by correcting errors and by keeping it compatible with Morfeusz.

The version of NKJP1M-SGJP compatible with the current Morfeusz is available at [[http://download.sgjp.pl/morfeusz/current/]]. (Older versions, starting from July 2020, are kept in the respective subdirectories of [[http://download.sgjp.pl/morfeusz/]].)

The corpus is available as a set of NKJP-TEI XML files (file nkjp1m-sgjp-tei-‹release_date›.tgz) and as a set of files in a simple column-based format used by the tagger [[http://zil.ipipan.waw.pl/Concraft|Concraft-PL]] (see Concraft’s page for a description of the format), which we find easier to use (file nkjp1m-sgjp-dag-‹release_date›.tgz).

The DAG version was prepared with tagger training in mind. For each text in the corpus there are two files, {{{ann_morphosyntax_disamb.dag}}} and {{{ann_morphosyntax_ambig.dag}}}.

The disamb files contain complete information and can be used for training. Correct interpretations are marked with {{{disamb}}} in column 12 and have a non-zero probability in column 8. Interpretations unknown to Morfeusz (added by annotators) are marked {{{manual}}} in column 9 (“interpretation related metadata”). The {{{nps}}} (no preceding space) marker of NKJP is present in column 11 (“segment related metadata”).

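
The column conventions of the disamb files described above can be illustrated with a short Python sketch. It is not part of the corpus distribution. It assumes that columns are tab-separated, that blank lines only act as separators, and that columns 3, 4 and 5 hold the word form, lemma and tag (check Concraft’s page for the authoritative format description). All function names are hypothetical and the sample rows are made up for illustration.

```python
N_COLUMNS = 12  # the description above refers to columns 8, 9, 11 and 12


def read_rows(lines):
    """Yield every non-blank line as a list of at least N_COLUMNS strings."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines only separate units of text
        fields = line.split("\t")  # assumption: columns are tab-separated
        # Pad rows whose trailing empty columns were cut off.
        fields += [""] * (N_COLUMNS - len(fields))
        yield fields


def col(fields, n):
    """Return column n, numbered from 1 as in the description above."""
    return fields[n - 1]


def probability(fields):
    """Column 8 as a float; an empty column is treated as 0.0 (assumption)."""
    text = col(fields, 8)
    return float(text) if text else 0.0


def is_gold(fields):
    """True for interpretations marked 'disamb' in column 12."""
    return col(fields, 12) == "disamb"


def is_manual(fields):
    """True if column 9 carries the 'manual' marker (loose substring check)."""
    return "manual" in col(fields, 9)


def has_nps(fields):
    """True if column 11 carries the 'nps' marker (loose substring check)."""
    return "nps" in col(fields, 11)


def make_line(start, end, orth, lemma, tag, prob,
              interp_meta="", seg_meta="", disamb=""):
    """Build one made-up 12-column row; columns 6, 7 and 10 are left empty."""
    return "\t".join([start, end, orth, lemma, tag, "", "",
                      prob, interp_meta, "", seg_meta, disamb])


# Invented sample rows, not taken from the corpus.
SAMPLE_LINES = [
    make_line("0", "1", "Mam", "mieć", "fin:sg:pri:imperf", "1.000", disamb="disamb"),
    make_line("0", "1", "Mam", "mama", "subst:pl:gen:f", "0.000"),
    make_line("1", "2", "abc", "abc", "subst:sg:nom:m3", "1.000",
              interp_meta="manual", disamb="disamb"),
    make_line("2", "3", ".", ".", "interp", "1.000", seg_meta="nps", disamb="disamb"),
    "",
]

if __name__ == "__main__":
    for row in read_rows(SAMPLE_LINES):
        if is_gold(row):
            print(col(row, 3), col(row, 5), "manual" if is_manual(row) else "")
```

To read a real file, pass an open file object to {{{read_rows}}}, for example one opened on {{{ann_morphosyntax_disamb.dag}}}; the file encoding (presumably UTF-8) should be verified against the actual data.
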
The ambig files can be used for fair testing: they do not contain {{{disamb}}} marks. Moreover, all manual interpretations and manual segmentation variants have been stripped from these files. Thus an ideal tagger, given an “ambig” file, should produce the same sequence of interpretations as in the corresponding “disamb” file.

NKJP1M-SGJP is available under the [[https://creativecommons.org/licenses/by/4.0/deed.pl|Creative Commons Attribution (CC-BY)]] license, since this is the license of NKJP1M.

Preparation and maintenance of this resource are possible thanks to [[http://clarin-pl.eu/|CLARIN-PL]].