utils
Class CorpusManager

java.lang.Object
  extended by utils.CorpusManager

public class CorpusManager
extends java.lang.Object

Helper for corpora in TEI format

Author:
Mateusz Kopec

Constructor Summary
CorpusManager()
           
 
Method Summary
static corpusapi.tei.TEICorpus getCorpusFromConfigFile(java.lang.String configFilePath)
          Loads corpus, given corpus config file path
static void getSampleFromCorpus(corpusapi.tei.TEICorpus c, int textCount, java.lang.String targetPath)
          Samples corpus for a number of texts and saves them in a given directory
static AnnotationStats getSenseStatisticsForCorpus(corpusapi.Corpus corpus, corpusapi.tei.TEISenseInventory dict)
          Calculates statistics of the gold standard sense annotation in the corpus
static corpusapi.tei.TEICorpus getWypluwkaForDevelopment()
          Gets development part of wypluwka
static corpusapi.tei.TEICorpus getWypluwkaForFinalEvaluation()
          Gets final evaluation part of wypluwka
static void printCorpusStats(corpusapi.tei.TEICorpus corpus)
          Prints some statistics about the corpus
static void splitCorpus(corpusapi.tei.TEICorpus c, float proportion, java.lang.String targetPath1, java.lang.String targetPath2)
          Splits corpus into two
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CorpusManager

public CorpusManager()
Method Detail

getCorpusFromConfigFile

public static corpusapi.tei.TEICorpus getCorpusFromConfigFile(java.lang.String configFilePath)
                                                       throws java.lang.Exception
Loads corpus, given corpus config file path

Parameters:
configFilePath - path to the corpus config file
Returns:
corpus
Throws:
java.lang.Exception

getWypluwkaForFinalEvaluation

public static corpusapi.tei.TEICorpus getWypluwkaForFinalEvaluation()
                                                             throws java.lang.Exception
Gets final evaluation part of wypluwka

Returns:
corpus
Throws:
java.lang.Exception

getWypluwkaForDevelopment

public static corpusapi.tei.TEICorpus getWypluwkaForDevelopment()
                                                         throws java.lang.Exception
Gets development part of wypluwka

Returns:
corpus
Throws:
java.lang.Exception

getSampleFromCorpus

public static void getSampleFromCorpus(corpusapi.tei.TEICorpus c,
                                       int textCount,
                                       java.lang.String targetPath)
Samples corpus for a number of texts and saves them in a given directory

Parameters:
c - corpus
textCount - number of texts to choose
targetPath - path to save texts
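
Example (illustrative only): this page does not say how the texts are chosen. The sketch below assumes uniform random selection without replacement over a plain list of text identifiers. SampleSketch and sample are hypothetical names, not part of CorpusManager or corpusapi, and saving the chosen texts to targetPath is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in; the real method works on a corpusapi.tei.TEICorpus.
final class SampleSketch {

    // Picks textCount distinct items from texts, in random order.
    static <T> List<T> sample(List<T> texts, int textCount, Random rng) {
        if (textCount < 0 || textCount > texts.size()) {
            throw new IllegalArgumentException(
                    "textCount must be between 0 and " + texts.size());
        }
        // Shuffle a copy so the caller's list keeps its order.
        List<T> shuffled = new ArrayList<>(texts);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, textCount));
    }

    public static void main(String[] args) {
        List<String> textIds = Arrays.asList("text-a", "text-b", "text-c", "text-d");
        List<String> chosen = sample(textIds, 2, new Random());
        System.out.println("Chosen texts: " + chosen);
    }
}
```

Passing the Random in explicitly lets a caller fix the seed when a reproducible sample is needed.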

splitCorpus

public static void splitCorpus(corpusapi.tei.TEICorpus c,
                               float proportion,
                               java.lang.String targetPath1,
                               java.lang.String targetPath2)
Splits corpus into two

Parameters:
c - corpus
proportion - should be between 0 and 1
targetPath1 - path to save first part
targetPath2 - path to save second part
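
Example (illustrative only): the documentation gives only the range of proportion, not its exact meaning. The sketch below assumes that proportion is the fraction of texts that goes into the first part and that texts are taken in their existing order. The real method may shuffle or round differently. SplitSketch and split are hypothetical names, and writing the parts to targetPath1 and targetPath2 is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in; the real method works on a corpusapi.tei.TEICorpus.
final class SplitSketch {

    // Returns a two-element list: [first part, second part].
    static <T> List<List<T>> split(List<T> texts, float proportion) {
        // Written as a negated range check so that NaN is rejected too.
        if (!(proportion >= 0f && proportion <= 1f)) {
            throw new IllegalArgumentException("proportion must be between 0 and 1");
        }
        // Assumption: the size of the first part is rounded to the nearest integer.
        int firstSize = Math.round(texts.size() * proportion);
        List<T> first = new ArrayList<>(texts.subList(0, firstSize));
        List<T> second = new ArrayList<>(texts.subList(firstSize, texts.size()));
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        List<String> textIds = Arrays.asList("t1", "t2", "t3", "t4", "t5",
                                             "t6", "t7", "t8", "t9", "t10");
        List<List<String>> parts = split(textIds, 0.8f);
        System.out.println("First part:  " + parts.get(0));
        System.out.println("Second part: " + parts.get(1));
    }
}
```

Under these assumptions, every text lands in exactly one of the two parts, and a proportion of 0 or 1 leaves one part empty.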

getSenseStatisticsForCorpus

public static AnnotationStats getSenseStatisticsForCorpus(corpusapi.Corpus corpus,
                                                          corpusapi.tei.TEISenseInventory dict)
Calculates statistics of the gold standard sense annotation in the corpus

Parameters:
corpus - corpus to calculate the statistics for
dict - dictionary of senses
Returns:
annotation stats

printCorpusStats

public static void printCorpusStats(corpusapi.tei.TEICorpus corpus)
Prints some statistics about the corpus

Parameters:
corpus - corpus to print the statistics for