jonne sälevä


Originally from the Finnish Arctic, I’m currently a Ph.D. student in Computer Science at Brandeis University, working on Natural Language Processing and Machine Learning with Prof. Constantine Lignos.

Prior to Brandeis, I graduated from Harvard College, where I got an A.B. in Statistics with a fair bit of Computer Science, Applied Math and Yiddish thrown into the mix.

Feel free to reach out if you’re interested in collaborating or have questions about my work. Research-oriented jobs and postdoc offers are also welcome 😎.


academic and industry positions

research (academic/industry)

teaching (academia)

non-research (industry)


latest news

Organizing SIGTURK workshop at ACL 2024
I’m co-organizing the SIGTURK workshop at ACL 2024. The workshop aims to bring together researchers working on NLP, Linguistics and Turkic languages more generally. Please consider submitting a paper; the deadline is May 31st, 2024!
Paper published at LREC-COLING 2024

Our paper, ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata, was accepted to LREC-COLING 2024. We describe the corpus, which covers over 16.8 million entities in 400+ languages, and experiment with using it as

  • a training corpus for name translation models
  • a supplemental resource for low-resource NER tasks.
The resource is also freely available on GitHub. Further applications abound, especially in the era of large language models. Happy hacking!
Shared task paper at VarDial 2024
Our team took part in the DSL-ML - Multi-label classification of similar languages shared task at VarDial 2024 and placed 1st on all the languages we participated in! Big thanks to Chester Palen-Michel for collaborating on this with me!
Internship at Google DeepMind
This summer/fall, I’ll be working at Google DeepMind as a Student Researcher, focusing on out-of-distribution detection-style topics. Very excited to be working with LLMs in addition to my low-resource NLP work! Thanks GDM!
Best Paper Award at Insights 2023
Our paper, What changes when you randomly choose BPE merge operations? Not much. was accepted to the Fourth Workshop on Insights from Negative Results in NLP, held in conjunction with EACL 2023.
Extended abstract accepted at SIGTYP 2022
Our extended abstract, ParaNames: A Massively Multilingual Entity Name Corpus, was accepted to SIGTYP 2022, held in conjunction with NAACL 2022.
The abstract describes the work our lab has been doing on ParaNames, a multilingual entity name corpus that covers over 400 languages and 18 million entities.
Feel free to also take a look at the preprint and the GitHub repository.
Organizing a workshop at LREC 2022
I’ll be co-organizing the first Workshop on Dataset Creation for Lower-Resourced Languages, held at LREC 2022. Hope to see you there!
Internship at USC Information Sciences Institute
This summer, I’ll be joining the Information Sciences Institute at the University of Southern California as a Visiting Research Assistant.
Looking forward to spending the summer in sunny California!
One paper accepted at Findings of the ACL 2022
Our new position paper, Toward More Meaningful Resources for Lower-resourced Languages, was accepted to Findings of the ACL for ACL 2022.
Recommended reading for anyone working on lower-resourced languages, as well as anyone thinking of using Wikidata or WikiAnn out-of-the-box.
Two workshop papers accepted at EACL 2021
My paper on morphology and low-resource NMT, The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation, was accepted to the Student Research Workshop.
Another paper of mine, Mining Wikidata for Name Resources for African Languages, was also accepted to the AfricaNLP Workshop.
I had a lot of fun interacting with everyone at the virtual poster sessions. Gather.town is not that bad after all!
Deep Learning Summer School at MILA
I’ll be joining the Montreal Institute for Learning Algorithms virtually this year for the Deep Learning & Reinforcement Learning Summer School.
Hoping to get up to speed on the latest developments and learn from the best!
Paper accepted at LREC 2020
My paper, A Multi-Orthography Parallel Corpus of Yiddish Nouns, was accepted at LREC 2020. Sadly, due to COVID, there was no opportunity to present the work in Marseille.
You can still find the paper in the proceedings, though!

get in touch


© Jonne Sälevä, 2024