jonne/yiddish-lrec-2020

A Multi-Orthography Parallel Corpus of Yiddish Nouns

j0ma ebe01024de prettify output		2020-06-14 17:33:13 -04:00
data	add final corpus csv	2020-03-13 02:37:23 -04:00
src	clean README, rm unigram_lm.py since it's not used	2020-06-14 17:30:50 -04:00
translit	release	2019-12-02 01:47:49 -05:00
.gitignore	.	2020-02-21 12:04:40 -05:00
experimental_results.sh	prettify output	2020-06-14 17:33:13 -04:00
init.sh	release	2019-12-02 01:47:49 -05:00
LICENSE	Add MIT License	2019-12-02 20:11:22 +00:00
README.md	clean README, rm unigram_lm.py since it's not used	2020-06-14 17:30:50 -04:00
requirements.txt	release	2019-12-02 01:47:49 -05:00

README.md

A Multi-Orthography Parallel Corpus of Yiddish Nouns

This repository hosts the code and LaTeX writeup for the paper A Multi-Orthography Parallel Corpus of Yiddish Nouns.

Transliteration models & experiments

To run the Sequitur models on the training and test data and view the results, run ./experimental_results.sh.

For verbose output, there is an eval_verbose.sh in each experiment folder under ./translit.
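
To get verbose output for every experiment in one go, a loop over the experiment folders can help. The following is a minimal sketch, not part of the repository: the function name `run_all_verbose` is hypothetical, and it assumes each eval_verbose.sh is executable and expects to be run from inside its own folder.

```shell
# Hypothetical helper: run eval_verbose.sh in every experiment folder.
# Assumes each folder directly under the given directory may contain an
# executable eval_verbose.sh that is meant to be run from inside its folder.
run_all_verbose() {
    translit_dir="${1:-./translit}"
    for script in "$translit_dir"/*/eval_verbose.sh; do
        # Skip folders without the script (also covers an unmatched glob).
        [ -x "$script" ] || continue
        exp_dir=$(dirname "$script")
        echo "== $(basename "$exp_dir") =="
        # Subshell so the cd does not leak into the caller's shell.
        (cd "$exp_dir" && ./eval_verbose.sh)
    done
}

# Usage, from the repository root:
#   run_all_verbose ./translit
```

Folders without the script are skipped, so the loop should keep working if experiments are added or removed.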

To run a model on unseen data, use apply_to_unseen.sh in each experiment folder. To read from stdin, pipe one word per line to ./apply_to_unseen.sh -.

For instance:

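# first, change into an experiment folder
cd translit/rom2yivo

# then transliterate a word list from a file (one word per line)
./apply_to_unseen.sh /path/to/my/romanized_wordlist

# or transliterate a single word read from stdin
echo "geburtstog" | ./apply_to_unseen.sh -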
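
Since the script reads one word per line, running text has to be split into words first. Here is a minimal sketch of one way to do that with standard tools. The helper name `to_wordlist` is hypothetical and not part of the repository, and it only splits on whitespace: it does not strip punctuation or change case.

```shell
# Hypothetical helper: turn whitespace-separated text into one word per line.
to_wordlist() {
    # Squeeze each run of whitespace into a single newline, then drop
    # the empty line that leading whitespace would leave behind.
    tr -s '[:space:]' '\n' | sed '/^$/d'
}

# Usage, from inside an experiment folder such as translit/rom2yivo:
#   echo "geburtstog geburtstog" | to_wordlist | ./apply_to_unseen.sh -
```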

Contact

To get in touch with me, visit my website.