
attachment:parse.py of TermoPL

Attachment 'parse.py'


"""Parse plain-text files with Stanza and write the results as CoNLL-U."""
import getopt
import glob
import os
import sys

import stanza
from stanza.resources.common import load_resources_json
from stanza.utils.conll import CoNLL

USAGE = "Usage: parse.py [-h] [-p] [-l LANGUAGE] FILE_PATTERN..."


def parse(nlp, input_file, output_file):
    """Run the pipeline on input_file and write the result to output_file."""
    with open(input_file, encoding="utf-8") as f:
        doc = nlp(f.read())
    CoNLL.write_doc2conll(doc, output_file)


def main(argv):
    language = "pl"
    use_pretokenized_text = False
    processors = "tokenize,pos,lemma,depparse"

    try:
        opts, args = getopt.getopt(argv, "hpl:", ["help", "pretokenized", "language="])
    except getopt.GetoptError as err:
        print(f"Error: {err}", file=sys.stderr)
        return 1

    for opt, arg in opts:
        if opt in ("-h", "--help"):
            print(USAGE)
            return 0
        elif opt in ("-p", "--pretokenized"):
            use_pretokenized_text = True
        elif opt in ("-l", "--language"):
            language = arg

    # Expand glob patterns; the list stays empty when no arguments are given.
    input_files = []
    for pattern in args:
        input_files.extend(glob.glob(pattern))
    if not input_files:
        print(USAGE, file=sys.stderr)
        return 1

    stanza.download(language)
    # Add NER only when the resources file lists a model for this language.
    resources = load_resources_json()
    if "ner" in resources.get(language, {}):
        processors += ",ner"

    nlp = stanza.Pipeline(lang=language, processors=processors,
                          tokenize_pretokenized=use_pretokenized_text)
    for f_in in input_files:
        if os.path.isfile(f_in):
            # Write the output next to the input, with a .conllu extension.
            dir_name, file_name = os.path.split(f_in)
            file_name = os.path.splitext(file_name)[0] + ".conllu"
            f_out = os.path.join(dir_name, file_name)
            parse(nlp, f_in, f_out)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
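
The output naming rule at the end of the script can be checked without installing Stanza. The following is a minimal sketch that isolates that step; the helper name `conllu_path` is hypothetical and does not appear in parse.py, but its body mirrors the script's `os.path.split` / `os.path.splitext` / `os.path.join` sequence.

```python
import os


def conllu_path(input_path):
    """Return the path parse.py would write for a given input file.

    Hypothetical helper for illustration: the output keeps the input's
    directory and base name, and only the last extension is replaced.
    """
    dir_name, file_name = os.path.split(input_path)
    file_name = os.path.splitext(file_name)[0] + ".conllu"
    return os.path.join(dir_name, file_name)


if __name__ == "__main__":
    for example in (os.path.join("corpus", "text1.txt"), "notes.txt", "report.v2.txt"):
        print(example, "->", conllu_path(example))
```

Note that only the final extension is replaced, so an input such as `report.v2.txt` keeps the `.v2` part in the output name, and two inputs in one directory that differ only in extension would map to the same output file.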

Attached Files

To refer to attachments on a page, use attachment:filename, as shown below in the list of files. Do NOT use the URL of the [get] link, since this is subject to change and can break easily.
  • [get | view] (2023-03-02 11:42:11, 1347.7 KB) [[attachment:Finnish.zip]]
  • [get | view] (2016-05-24 10:02:09, 1.3 KB) [[attachment:LICENSE.txt]]
  • [get | view] (2021-08-23 11:17:13, 767.6 KB) [[attachment:TermoPL-user-manual.pdf]]
  • [get | view] (2025-01-09 12:27:20, 492.9 KB) [[attachment:TermoPL.jar]]
  • [get | view] (2025-01-09 12:27:31, 8862.5 KB) [[attachment:TermoPL_Mac_OS_X.zip]]
  • [get | view] (2025-01-09 12:27:41, 1378.3 KB) [[attachment:TermoPL_Ubuntu.zip]]
  • [get | view] (2025-01-09 12:27:52, 9575.3 KB) [[attachment:TermoPL_Win64.zip]]
  • [get | view] (2023-03-02 11:41:42, 12456.4 KB) [[attachment:TermoUD-results.zip]]
  • [get | view] (2022-12-03 13:28:40, 9506.0 KB) [[attachment:TermoUD.mp4]]
  • [get | view] (2023-09-09 09:47:21, 92896.5 KB) [[attachment:TermoUD_Mac_OS_X.zip]]
  • [get | view] (2023-09-09 09:47:42, 89665.4 KB) [[attachment:TermoUD_Ubuntu.zip]]
  • [get | view] (2023-09-09 09:48:05, 98153.4 KB) [[attachment:TermoUD_Win64.zip]]
  • [get | view] (2016-06-09 10:26:50, 306.0 KB) [[attachment:article-LREC2016.pdf]]
  • [get | view] (2018-06-21 10:54:33, 2507642.7 KB) [[attachment:data.zip]]
  • [get | view] (2017-07-05 16:17:07, 8771.5 KB) [[attachment:informatyka_terminy.zip]]
  • [get | view] (2025-01-09 12:26:48, 39.3 KB) [[attachment:jars.zip]]
  • [get | view] (2025-01-09 12:31:08, 39.9 KB) [[attachment:languages.zip]]
  • [get | view] (2020-11-30 18:42:34, 1.1 KB) [[attachment:mar_myk_rych_lrec16.bib]]
  • [get | view] (2025-01-09 12:26:58, 1.3 KB) [[attachment:parse.py]]
  • [get | view] (2016-06-09 10:27:18, 646.2 KB) [[attachment:poster-LREC2016.pdf]]
  • [get | view] (2024-06-13 10:50:20, 753.8 KB) [[attachment:resources.zip]]
  • [get | view] (2017-07-05 16:15:59, 3302.7 KB) [[attachment:sem-ptj.pdf]]
  • [get | view] (2025-01-09 12:27:09, 151.3 KB) [[attachment:src.zip]]
  • [get | view] (2020-11-02 20:20:21, 62.0 KB) [[attachment:tagset.pdf]]
  • [get | view] (2018-06-20 19:10:57, 19576.8 KB) [[attachment:warsztaty-dane.zip]]