I’m a first-year PhD candidate in the Schwartz Lab of Huji-NLP at the Hebrew University.

My research focuses on leveraging compression-optimized tokenizers to boost the efficiency and interpretability of language models while also exploring multimodality to build more robust and scalable systems.

When I’m not delving into cutting-edge NLP challenges, I’m a passionate Tel Aviv fan, a devoted cat lover, a sea enthusiast, and arguably the most enthusiastic reader around.

Education

2023—Present

Hebrew University of Jerusalem

MSc in Computer Science

Advisor: Prof. Roy Schwartz

Thesis: Detokenization in LLMs

2019—2022

The Open University

BSc in Computer Science

2019—2022

Tel Aviv University

BA in Economics

2014

Haifa University

BA in General Studies

Publications

Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models

preprint

Guy Kaplan*, Michael Toker*, Yuval Reif, Yonatan Belinkov, Roy Schwartz

The study shows that only a few tokens per word capture its core meaning, leaving many redundant tokens that can impair image generation. Removing these redundant tokens improves generation quality, while patching leaked token representations with their isolated versions effectively mitigates semantic leakage.

From Tokens to Words: On the Inner Lexicon of LLMs

ICLR 2025

Guy Kaplan, Matanel Oren, Yuval Reif, Roy Schwartz

LLMs internally reconstruct full-word representations from sub-word tokens, enabling understanding of out-of-vocabulary words and reducing input length without fine-tuning.

Experience

2023-present

Chief Scientific Officer, BRIGHT

Developing a model-based system to assist forensic odontologists in identifying human remains

2022-2025

Data Science & SWE, Microsoft

Developed novel algorithms for user risk-score assessment on the Microsoft Defender platform

Summer 2021

SWE Intern, Yahoo!

Worked on improving personalized recommendations for the Yahoo! advertising platform

Portfolio

Social Media based Stock Prediction

Python, PyTorch, SHAP

Developed a model to predict stock prices based on social media sentiment analysis

KuaLaLM

Python, Azure Functions

LLM gateway (GW) for reducing expensive model calls