About This Project
Most manga exists in a single language, and translating it manually into other languages is expensive and time-consuming. We are developing an AI that can translate Japanese manga into English. The first step towards this future is developing a benchmark dataset to train our models. We will purchase 100 Japanese-English manga volume pairs and hire human annotators to build this dataset.
What is the context of this research?
Comics are one of the most popular storytelling media in the world. Because comics are often published in a single language, readers who do not read that language must wait until the original is translated. Translation is time-consuming and expensive: manual translation requires $2,000 per volume. Because of this cost, only a fraction of published comics have been translated so far.
Automatic comic translation has been a difficult task because it requires both accurate image recognition and accurate machine translation. Both technologies have recently progressed rapidly thanks to the success of deep learning, which is why we now focus on the problem of automatic comic translation.
What is the significance of this project?
We are developing an automatic comic translation system using state-of-the-art machine translation and image processing techniques. Given an input image of a comic, our system automatically detects text areas, recognizes the characters, and translates the text into another language. Our key idea is to train translation models on comic volumes that have already been translated, so that the system learns how professional translators translate comics. Because the translation pipeline is fully automatic, it runs much faster than manual translation. With this lower cost, our system has the potential to let readers enjoy many more comic masterpieces that have not yet been translated.
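The three stages described above (detect text areas, recognize characters, translate) can be sketched as a simple pipeline. This is only an illustrative outline, not our actual system: the names `translate_page`, `TranslatedRegion`, and `Box`, and the idea of passing each stage in as a function, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical representation of a text area: (x, y, width, height).
Box = Tuple[int, int, int, int]


@dataclass
class TranslatedRegion:
    box: Box               # where the text was found on the page
    source_text: str       # recognized text in the source language
    translated_text: str   # text in the target language


def translate_page(
    page: object,
    detect: Callable[[object], List[Box]],
    recognize: Callable[[object, Box], str],
    translate: Callable[[str], str],
) -> List[TranslatedRegion]:
    """Run detection, recognition, and translation in order on one page."""
    results = []
    for box in detect(page):
        source = recognize(page, box)
        results.append(TranslatedRegion(box, source, translate(source)))
    return results
```

Passing the stages in as functions keeps the sketch independent of any particular detector, OCR engine, or translation model, so each one can be swapped out and evaluated separately.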
What are the goals of the project?
The first step towards a final system that translates comics with high accuracy is to develop a benchmark dataset of 100 Japanese-English manga volume pairs. Human annotators will manually extract the text from the manga volumes to build this benchmark. An evaluation benchmark is essential for accelerating development because we need it to validate the effectiveness of new methods. The dataset also helps us find the best parameters and settings for our AI model.
The proposed budget will be used to build the dataset for evaluating manga translation performance. We use pairs of Japanese-English manga volumes that have been officially translated, and evaluate performance by comparing the official English version with the output of our system. The dataset is built from 100 volumes across varied genres (e.g., battle, sports, romance) to confirm whether our system can translate any type of manga.
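To illustrate what comparing the system output with the official English version could look like, here is a minimal sketch. The project description does not name a metric, so the clipped unigram precision below is a simplified stand-in chosen for this example, and the function name is hypothetical.

```python
from collections import Counter


def clipped_unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate words that also appear in the reference.

    Each reference word can be matched at most as many times as it
    occurs in the reference (clipping), so repeating a correct word
    does not inflate the score.
    """
    candidate_tokens = candidate.lower().split()
    if not candidate_tokens:
        return 0.0
    reference_counts = Counter(reference.lower().split())
    candidate_counts = Counter(candidate_tokens)
    matched = sum(
        min(count, reference_counts[token])
        for token, count in candidate_counts.items()
    )
    return matched / len(candidate_tokens)
```

A real evaluation would likely use an established machine translation metric and average the scores over every text area in the benchmark; this sketch only shows the basic idea of scoring one output against one official translation.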
We first need to buy pairs of Japanese-English manga volumes. Japanese and English volumes cost around $4 and $6 each, respectively, so one pair costs about $10. The total cost of the manga is 100 pairs x $10 = $1,000.
We then ask human annotators to extract text data from the manga volumes. Extracting text from one volume pair costs about $40, so the total annotation cost is 100 pairs x $40 = $4,000.
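The budget arithmetic above can be checked with a few lines of code. The unit prices come from the description; the function name and the combined total are additions for this example (the total is simply the sum of the two stated line items).

```python
# Approximate figures taken from the budget description (US dollars).
VOLUME_PAIRS = 100
JAPANESE_VOLUME_USD = 4
ENGLISH_VOLUME_USD = 6
ANNOTATION_PER_PAIR_USD = 40


def benchmark_budget(
    pairs: int = VOLUME_PAIRS,
    japanese_usd: int = JAPANESE_VOLUME_USD,
    english_usd: int = ENGLISH_VOLUME_USD,
    annotation_usd: int = ANNOTATION_PER_PAIR_USD,
) -> dict:
    """Return the purchase cost, annotation cost, and their sum."""
    purchase = pairs * (japanese_usd + english_usd)
    annotation = pairs * annotation_usd
    return {
        "purchase": purchase,
        "annotation": annotation,
        "total": purchase + annotation,  # derived, not stated in the text
    }


if __name__ == "__main__":
    for item, usd in benchmark_budget().items():
        print(f"{item}: ${usd:,}")
```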
We expect to finish creating the benchmark by the end of May. We will first buy the manga and gather annotators while developing an annotation tool in parallel. Annotation will start at the beginning of May and should be finished by the end of May. Once the benchmark is built, we will keep using it to validate the effectiveness of our methods. We will report progress in lab notes as each step finishes, and share the results evaluated on the benchmark once it is complete.
Mar 11, 2019
Apr 20, 2019
Apr 25, 2019: Develop annotation tool
May 01, 2019
May 31, 2019: Finish creating benchmark
Meet the Team
We are a research team at The University of Tokyo focusing on manga translation. Our team consists of an expert in machine translation (Shonosuke Ishiwatari), an expert in computer vision (Ryota Hinami), and an expert engineer with NLP research experience (Kazuhiko Yasuda). We received the Innovative Technologies 2018 award in Japan for our comic translation idea and are going to exhibit a demo at SXSW 2019.
Ryota Hinami received BE and MS degrees in Information and Communication Engineering from the University of Tokyo in 2014 and 2016, respectively. He is currently a PhD candidate at The University of Tokyo and JSPS Research Fellow (DC2). His research interests include multimedia and computer vision.
Shonosuke Ishiwatari received BE and MS degrees in Information and Communication Engineering from The University of Tokyo in 2014 and 2016, respectively. He is currently a PhD candidate at The University of Tokyo and JSPS Research Fellow (DC2). He is interested in the research area of natural language processing, especially machine translation and description generation.
Kazuhiko Yasuda received a BE degree in Information and Communication Engineering from the University of Tokyo in 2017 and is currently a master's student at the Department of Information and Communication Engineering, Graduate School of Information Science and Technology, the University of Tokyo. Research interests include natural language processing, especially adversarial examples.
- Total Donations: $391
- Average Donation: $43.44