Spejd 1.3.6
Copyright (C) IPI PAN, 2007-2012. All rights reserved.
Available under the terms of the GNU General Public License;
see the COPYING file for details.


ABOUT

Spejd is a shallow parser that performs syntactic parsing and
morphological disambiguation simultaneously. It is developed at the
Institute of Computer Science, Polish Academy of Sciences, Warsaw.

Spejd homepage:
    official: http://zil.ipipan.waw.pl/Spejd/
    sourceforge (with main bugtracker): http://sourceforge.net/projects/spejd/

Author: Bartosz Zaborowski [bartosz.zaborowski@ipipan.waw.pl]

The current implementation is based on the ideas and work of:
    Aleksander Buczyński
    Aleksander Wawer
    Adam Przepiórkowski
    Bartosz Zaborowski
    Aleksander Zabłocki


REQUIREMENTS

* Windows version:
    Windows XP SP3 or newer
    40 MB of hard drive space

* Linux static binaries:
    GNU/Linux 2.6.9 or newer (64bit for the 64bit version)
    15 MB of hard drive space

* Source:
    POSIX operating system (or POSIX libraries for Windows/MinGW)
    Either GNU make or some other building environment (you have to
        set it up manually)
    C and C++ compiler
    Dependencies met (see below)

Dependencies (tested versions in parentheses):
    POSIX system or Windows with libraries implementing POSIX
        (at least posix-threads and iconv needed)
    zlib compression library (1.2.5)
    ICU unicode library (4.6.1, 4.8)
    boost library (at least regex and iostreams) >= 1.40 (1.46)
        (1.42.0 probably has a bug in gzip compressor, don't use it)
    rt library (for realtime clock)
    libxmlpp library (2.32.0)
    morfeusz morphological analyzer library (0.82/20110416)
        (you can get it from http://sgjp.pl/morfeusz/dopobrania.html)

Optional dependencies:
    google perftools (tcmalloc library) (1.6) - highly recommended
    pantera tagger
        (you can get it from http://code.google.com/p/pantera-tagger/)


INSTALLATION

* Windows binary:
    The Windows binary version comes with an installer. Run it and
    follow the instructions.

* Linux binary:
    Just unpack the package somewhere; it is ready for use.

* Source:
    Installation from source is similar to many other packages. In the
    simplest case the shell commands `./configure; make; make install'
    should configure, build, and install Spejd (the last command may
    require superuser privileges). For standard installation options
    see the INSTALL file.

    Spejd's configure script accepts the following nonstandard options:

    --enable-static-spejd
        Link the spejd executable statically. The result will probably
        not be portable if Spejd is compiled with pantera support
        (since it installs some other data, not only the library).
        Disabled by default.

    --with[out]-pantera
        Enable or disable Pantera tagger support. By default configure
        checks whether this package is present on the system.

    --with[out]-tcmalloc
        Enable or disable the custom allocation methods from google
        perftools. tcmalloc is optional but recommended: it can speed
        up Spejd by approx. 10% and gives more reliable memory
        management/limits. The default is to build with tcmalloc.

    --with-libicu-prefix=DIR
        Search for the ICU library in DIR. More precisely: icu-config
        is searched for in $PATH and also in DIR/bin, and is executed
        with the --prefix=DIR parameter. By default icu-config is
        searched for in $PATH only and looks for the library in its
        default locations. This option is probably useful only if you
        have installed ICU in a nonstandard location, like $HOME.

Compilation from source on Windows:

    MinGW:
        Starting from 1.3.3 the source package is MinGW-compatible.
        Once the dependencies are met you should be able to compile
        and install it as under Unix systems. In some cases you have
        to provide manually:
            --with-boost-prefix= argument (pointing to the place where
                the libboost_*.dll files are)
            PKG_CONFIG environment variable set to the full path to
                pkg-config.exe
        Since there is no icu-config script in the Windows ICU
        distribution, the path to the ICU library is currently
        hardcoded (/mingw/lib).

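        As a rough sketch only, the two manual MinGW settings mentioned
        above could be supplied as follows. Both paths are hypothetical
        placeholders, not values taken from the Spejd package; replace
        them with the locations on your own system. The command is
        printed first so it can be checked, and it is run only if a
        configure script exists in the current directory.

```shell
#!/bin/sh
# Hypothetical locations (assumptions): adjust to your MinGW installation.
BOOST_PREFIX="/mingw/bin"                 # directory holding libboost_*.dll
PKG_CONFIG="/mingw/bin/pkg-config.exe"    # full path to pkg-config.exe
export PKG_CONFIG

# Compose the configure command so it can be inspected before running.
# (Unquoted expansion below assumes the paths contain no spaces.)
CONFIGURE_CMD="./configure --with-boost-prefix=$BOOST_PREFIX"
echo "$CONFIGURE_CMD"

# Run configure, then build, only from inside the Spejd source directory.
if [ -x ./configure ]; then
    $CONFIGURE_CMD && make
fi
```

        After a successful build, `make install' completes the
        installation as in the Unix case.
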
    Other environments:
        Under Unix systems the configure script generates the config.h
        file. Under Windows the build environment must be set up
        manually and a special config.h file must be used. It is
        included in the package under the name config.h.win; remember
        to rename it to config.h.


BASIC USAGE

Spejd is a command line tool. It does not have a graphical interface.
The basic invocation is as follows:

    spejd [-c ]

For a more detailed description of usage see doc/getting_started.txt


INPUT AND OUTPUT FORMATS

Spejd can currently read plain text and the XCES and TEI P5 corpus XML
formats (as used in the IPIPAN Corpus and the National Corpus of Polish
respectively). There is a configuration option for setting the input
format; in most cases, however, the format is recognized automatically.

Spejd can write output in two formats: XCES and TEI. The output format
is set in the configuration file.

For a detailed description of the supported formats refer to the
doc/manual.pdf file.


DOCUMENTATION

The main documentation is included in the package in the doc/manual.pdf
file. Typical usage is documented in the doc/getting_started.txt file.
Additional documentation can be found in the comments of the example
files (examples directory). They describe the configuration, tagset
definition, dictionaries and rules syntax.


EXAMPLES

See the examples directory in the package for an example config.ini
(explained in its comments) and other files, including example input.


FOR DEVELOPERS

Feel free to play around with the sources, modify them and post patches
on Spejd's bugtracker at sourceforge.

Starting from version 1.1 the spejd engine is compiled as a shared
library, which can be used from other applications. See src/spejd.h for
more information (in the Windows binary package: devel/spejd.h).