A Multi-Orthography Parallel Corpus of Yiddish Nouns

This repository hosts the code and LaTeX write-up for the paper "A Multi-Orthography Parallel Corpus of Yiddish Nouns".

Transliteration models & experiments

To run the Sequitur models on the train and test data and view the results, simply run ./experimental_results.sh.

For verbose output, use the eval_verbose.sh script in each experiment folder under ./translit.
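To collect the verbose output of every experiment in one go, a loop over the experiment folders is enough. This is a minimal sketch, not a script shipped with this repository: the function name run_all_verbose is hypothetical, and it assumes that each folder under ./translit containing an executable eval_verbose.sh expects to be run from inside that folder (as the apply_to_unseen.sh example below does with cd).

```shell
# Hypothetical helper (not part of this repository): run eval_verbose.sh in
# every experiment folder under a root directory (default: translit).
# Assumes each script must be run from inside its own experiment folder.
run_all_verbose() {
  local root="${1:-translit}"
  local dir
  for dir in "$root"/*/; do
    # Skip folders without an executable eval_verbose.sh (this also skips the
    # literal, unexpanded pattern when the root has no subfolders).
    [ -x "${dir}eval_verbose.sh" ] || continue
    # Print a header so the output of each experiment can be told apart.
    echo "== ${dir%/} =="
    # Run in a subshell so the cd does not change the caller's directory.
    ( cd "$dir" && ./eval_verbose.sh )
  done
}
```

For example, run_all_verbose translit > verbose_results.txt gathers everything into one file. Running each script in a subshell keeps your working directory unchanged after the loop.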

To run a model on unseen data, use the apply_to_unseen.sh script in that model's experiment folder. To read from stdin, pipe one word per line to ./apply_to_unseen.sh -.

For instance:

# first, change into an experiment folder
cd translit/rom2yivo

# then transliterate a file containing one word per line
./apply_to_unseen.sh /path/to/my/romanized_wordlist

# or read words from stdin
echo "geburtstog" | ./apply_to_unseen.sh -
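Since ./apply_to_unseen.sh - expects one word per line, running text has to be split before it is piped in. A minimal sketch follows. The helper to_word_per_line is hypothetical (not part of this repository), and it assumes the models should see bare words: it splits on whitespace and drops common ASCII punctuation, but it does not change case or remove duplicates.

```shell
# Hypothetical helper (not part of this repository): convert free-form text on
# stdin into one word per line, the input format apply_to_unseen.sh - reads.
to_word_per_line() {
  # 1) turn every whitespace character into a newline, squeezing repeats;
  # 2) delete common ASCII punctuation (assumption: models expect bare words);
  # 3) drop any empty lines left over (e.g. from leading whitespace).
  tr -s '[:space:]' '\n' \
    | tr -d '.,;:!?"()' \
    | sed '/^$/d'
}
```

From inside an experiment folder such as translit/rom2yivo, a text file can then be transliterated with: to_word_per_line < my_text.txt | ./apply_to_unseen.sh -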

Contact

To get in touch with me, visit my website.