Deciphering Oracle Bone Language with Diffusion Models
Originating from China's Shang Dynasty approximately 3,000 years ago, the
Oracle Bone Script (OBS) is a cornerstone in the annals of linguistic history,
predating many established writing systems. Despite the discovery of thousands
of inscriptions, a vast expanse of OBS remains undeciphered, casting a veil of
mystery over this ancient language. Modern AI technologies open a new frontier
for OBS decipherment, but the task challenges traditional NLP methods, which
rely heavily on large textual corpora, a luxury not afforded by historical
languages. This paper introduces an approach based on image generation
techniques: Oracle Bone Script Decipher (OBSD). Using a conditional
diffusion-based strategy, OBSD
generates vital clues for decipherment, charting a new course for AI-assisted
analysis of ancient languages. Extensive experiments on an oracle bone script
dataset yield quantitative results that demonstrate the effectiveness of OBSD.
Code and decipherment results will be
made available at https://github.com/guanhaisu/OBSD.