************************************************************************
ENGLISH
************************************************************************

*****************************************
*****************************************
****                                 ****
****   THE WSDDE 0.64 INSTRUCTIONS   ****
****                                 ****
*****************************************
*****************************************

Word Sense Disambiguation Development Environment (WSDDE)

The current development version of the environment facilitates the construction and evaluation of WSD methods in the supervised Machine Learning (ML) paradigm.



**************************
I. INSTALLATION 
**************************

The WSDDE requires only the Java Runtime Environment (JRE), version 1.6 or later, so it should work on any computer with a JRE installed. It has been tested on Windows XP, Windows Vista and Ubuntu 9.04. Unpacking the zip archive is enough to use most of the environment.


***************************
II. GETTING STARTED
***************************

The structure of directories after unzipping looks as follows:

/wsdde.jar
/config.ini
/readme.txt
/wsdde.pdf (an article from LTC'09)
/wsdde_lib (external libraries used by the WSDDE)
/doc/gpl.txt (the GPL license)
/source
/resources
	/corpora_raw (a small, hand-annotated corpus and pseudowords corpora)
	/corpora_enriched (corpora enriched with extra information (e.g. POS))
	/experiment_meta_descriptions (some experiment meta-descriptions)
	/experiment_descriptions (some experiment descriptions)
	/results
	/wsdmethods (some WSD methods ready to load and use)


The WSDDE v. 0.64 is best used from the command line. The command

> java -jar wsdde.jar

shows the available options, which are passed as program parameters:

> java -jar wsdde.jar [options]

In case of an out-of-memory error, raise the JVM heap limit with the -XmxMEMm switch (MEM = memory in megabytes), e.g.

> java -Xmx1000m -jar wsdde.jar [options]

All input files (corpora, experiment descriptions, etc.) must be UTF-8 encoded.
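
The encoding requirement can be checked before a run. The following is a minimal sketch, assuming the standard iconv utility is installed; the function names is_utf8 and check_inputs are illustrative and are not part of the WSDDE.

```shell
# Succeed only if the file decodes cleanly as UTF-8.
# (iconv exits with a non-zero status on an invalid byte sequence.)
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Report every argument that is not valid UTF-8.
# Returns non-zero if at least one such file was found.
check_inputs() {
    rc=0
    for f in "$@"; do
        if ! is_utf8 "$f"; then
            echo "not UTF-8: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, "check_inputs pws.xml desc.xml" prints the name of each offending file. A file in another encoding can usually be converted with iconv as well, once its actual source encoding is known.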

If you specify an output path, all the directories in it must already exist.
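
Because missing output directories must be created beforehand, a small helper can create them before the WSDDE is called. This is only a sketch; the function name ensure_parent_dir is hypothetical.

```shell
# Create the parent directory (and any missing ancestors) of an output file.
# The output file itself is not created.
ensure_parent_dir() {
    mkdir -p -- "$(dirname -- "$1")"
}
```

A possible use, where the run1 subdirectory is an assumed example:

> ensure_parent_dir resources/results/run1/result.xml
> java -jar wsdde.jar -coe -i desc.xml -o resources/results/run1/result.xml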

To achieve full functionality, the WSDDE uses external libraries. All the necessary libraries are in the wsdde_lib directory:

commons-cli.jar (1.2) -- command-line parsing

weka.jar (3.6.1) -- feature selection and machine learning algorithms

mysql.jar (5.1.10) -- JDBC connector for MySQL; required only if you use MySQL. The WSDDE user must have access to a MySQL server and provide the server address, port, DB name, DB user and password in config.ini.

poliqarp.jar (1.3.7) -- client library for poliqarpd; used only if you want to generate pseudoword corpora.

The numbers in brackets are the versions currently used and known to work. For MySQL and Poliqarp, only the client part is bundled with the WSDDE; the server settings must be given in the config.ini file. The newest version of Poliqarp is available at sourceforge.net. Corpora used by the Poliqarp server are available at nkjp.pl or korpus.pl.

The environment can also use the TaKIPI tagger (http://nlp.ipipan.waw.pl/TaKIPI/). To use it, install TaKIPI and put the path to its executable file in config.ini.


**********************
STRUCTURE OF XML FILES
**********************

Part of the XML structure (e.g. of an experiment description) depends on the implemented feature generators, so the XML schemas must be created dynamically. To obtain the current schema, run:

> java -jar wsdde.jar -xsd

****************************
CREATING YOUR OWN GENERATORS
****************************

To create a generator, extend the abstract class wsdde.generator.FeatureGenerator. The extending class should be in the wsdde.generator package, and the XML schemas (XSD) describing the generator's parameters should be placed in the same package. The built-in feature generators are good examples to follow.

In the current version, extra knowledge sources (e.g. WordNet, shallow parsing) are not supported by the architecture, although you can integrate them on your own. In the future, when the environment accepts the NKJP annotation standards, support for extra knowledge sources will become part of the architecture.

***************
USAGE SCENARIO
***************

The following steps walk through an example scenario for using the WSDDE.

SCENARIO 1

1. create a pseudoword corpus

(check the Poliqarp properties in config.ini and start the Poliqarp server)

> java -jar wsdde.jar -pws -wc parlament=500 dom=500 -o pws.xml

This generates a file named pws.xml with 1000 contexts of the pseudoword parlament-dom (~izba).

2. convert corpus to the enriched format

(check that the TaKIPI properties are configured in config.ini, or use the -simple option)

> java -jar wsdde.jar -enr -i pws.xml -takipi -o pws.wsdc 

3. split the corpus into training and test corpora

> java -jar wsdde.jar -spl -i pws.wsdc -ss 500 500 -p pws

4. create the experiment description from the meta-description

(edit the meta-description file, metadesc.xml in the command below; to obtain its syntax, run "java -jar wsdde.jar -xsd")

> java -jar wsdde.jar -gmd -i metadesc.xml -o desc.xml

5. conduct the experiment

> java -Xmx1000m -jar wsdde.jar -coe -i desc.xml -o result.xml
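
The five steps can also be chained in one shell script. The sketch below only repeats the commands listed above. The variable names WSDDE_JAVA and WSDDE_HEAP_MB and the function names are illustrative, not part of the WSDDE. It assumes that the Poliqarp server is running, that TaKIPI is configured, and that metadesc.xml has already been edited. Unlike the steps above, it passes the -Xmx switch to every step, not only to the last one.

```shell
# Java launcher and heap size in megabytes; both can be overridden.
WSDDE_JAVA="${WSDDE_JAVA:-java}"
WSDDE_HEAP_MB="${WSDDE_HEAP_MB:-1000}"

# Run the WSDDE with the given options.
wsdde() {
    "$WSDDE_JAVA" "-Xmx${WSDDE_HEAP_MB}m" -jar wsdde.jar "$@"
}

# Run steps 1-5 in order, stopping at the first step that fails.
run_scenario1() {
    wsdde -pws -wc parlament=500 dom=500 -o pws.xml &&
    wsdde -enr -i pws.xml -takipi -o pws.wsdc &&
    wsdde -spl -i pws.wsdc -ss 500 500 -p pws &&
    wsdde -gmd -i metadesc.xml -o desc.xml &&
    wsdde -coe -i desc.xml -o result.xml
}
```

Calling run_scenario1 from the directory that contains wsdde.jar executes the whole scenario. Chaining with && means that a failed step (for example, an unreachable Poliqarp server) prevents the later steps from running on missing input.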


