Can AI translate manga?

$391 raised of $5,000 goal (8% funded)
Campaign ended on 4/10/19

About This Project

Most manga exists in a single language. Translating manga into other languages by hand is expensive and time-consuming. We are developing an AI that can translate Japanese manga into English. The first step towards this future is developing a benchmark dataset to train our models. We will purchase 100 Japanese-English manga volume pairs and hire human annotators to build this dataset.


What is the context of this research?

Comics are one of the most popular storytelling media in the world. Because comics are often published in a single language, readers who do not read that language must wait until the original is translated. This translation step is time-consuming and expensive: manual translation requires $2,000 per volume. Because of this cost, only a fraction of published comics have been translated so far.

Automatic comic translation has been a difficult task because it requires both accurate image recognition and accurate machine translation. Recently, both technologies have progressed rapidly thanks to the success of deep learning. This is why we are now focusing on the problem of automatic comic translation.

What is the significance of this project?

We are developing an automatic comic translation system using state-of-the-art machine translation and image processing techniques. Given an input comic image, our system automatically detects text areas, recognizes the characters in them, and translates the text into another language. Our key idea is to train translation models on comic volumes that have already been translated, so that our system learns how professional translators translate comics. Because the translation pipeline is fully automatic, it runs much faster than manual translation. Thanks to this lower cost, our system has the potential to let readers enjoy many more comic masterpieces that have not been translated yet.
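The three stages described above (detect text areas, recognize characters, translate) can be sketched as a simple pipeline. This is only an illustrative outline, not the team's implementation: the names translate_page and TranslatedRegion, the box format, and the toy fake_* stand-ins are all assumptions made so that the example runs without any trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed box format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class TranslatedRegion:
    box: Box
    source_text: str
    translated_text: str


def translate_page(
    page,
    detect_text_areas: Callable[[object], List[Box]],
    recognize_characters: Callable[[object, Box], str],
    translate: Callable[[str], str],
) -> List[TranslatedRegion]:
    """Run detection, recognition, and translation in order on one page."""
    results = []
    for box in detect_text_areas(page):
        source_text = recognize_characters(page, box)
        if not source_text.strip():
            continue  # skip regions where no text was recognized
        results.append(
            TranslatedRegion(box, source_text, translate(source_text))
        )
    return results


# Hypothetical stand-ins for the real models, used only for this demo.
def fake_detector(page):
    return [(10, 10, 60, 40), (80, 10, 130, 40)]


def fake_recognizer(page, box):
    return {10: "こんにちは", 80: ""}[box[0]]


def fake_translator(text):
    return {"こんにちは": "Hello"}.get(text, text)


regions = translate_page(None, fake_detector, fake_recognizer, fake_translator)
for region in regions:
    print(region.box, region.source_text, "->", region.translated_text)
```

Passing each stage in as a function keeps the stages independent, so a detector, recognizer, or translation model could be swapped without touching the others.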

What are the goals of the project?

The first step towards building a final system that can translate comics with high accuracy is developing a benchmark dataset of 100 Japanese-English manga volume pairs. We will have human annotators extract the text from the manga volumes manually to develop this benchmark. An evaluation benchmark is essential for accelerating development because we need a way to validate the effectiveness of new methods. This dataset will also help us find the best parameters and settings for our AI model.

Budget

The proposed budget will be used to build a dataset for evaluating the performance of manga translation. We will use pairs of Japanese-English manga volumes that have been officially translated. Performance is evaluated by comparing the official English version with the output of our system. The dataset will be constructed from 100 volumes across varying categories (e.g., battle, sports, romance) to confirm whether our system can translate any type of manga.
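As a rough illustration of comparing official English text with system output per category, the sketch below scores each sentence pair with a token-overlap F1 and averages the scores by category. The project does not state which metric it uses, so token F1 is a stand-in chosen here for simplicity, and the function names and sample sentences are invented for the example.

```python
from collections import Counter


def token_f1(reference: str, hypothesis: str) -> float:
    """Token-overlap F1 between a reference and a system translation."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref or not hyp:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both sentences.
    overlap = sum((Counter(ref) & Counter(hyp)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_score_by_category(samples):
    """samples: iterable of (category, official_english, system_output)."""
    totals = {}
    for category, reference, hypothesis in samples:
        total, count = totals.get(category, (0.0, 0))
        totals[category] = (total + token_f1(reference, hypothesis), count + 1)
    return {c: total / count for c, (total, count) in totals.items()}


# Invented sample data, for illustration only.
samples = [
    ("sports", "pass me the ball", "pass me the ball"),
    ("sports", "we will win", "we lose"),
    ("romance", "i missed you", "you missed me"),
]
scores = mean_score_by_category(samples)
for category, score in scores.items():
    print(f"{category}: {score:.2f}")
```

Reporting scores per category rather than as one overall number makes it easier to see whether the system handles some genres worse than others.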

We first need to buy pairs of Japanese-English manga volumes. Japanese and English manga cost around four and six dollars per volume, respectively, so one pair costs about $10. The total cost to buy the manga is (100 volume pairs x $10) = $1,000.

We will then ask human annotators to extract text data from the manga volumes. Extracting text from one volume pair costs about $40. The total cost for annotation is (100 volume pairs x $40) = $4,000.
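The budget arithmetic can be checked in a few lines. The figures come from the text above; the variable names are only illustrative.

```python
VOLUME_PAIRS = 100
JAPANESE_PRICE = 4        # dollars per Japanese volume (approximate)
ENGLISH_PRICE = 6         # dollars per English volume (approximate)
ANNOTATION_PER_PAIR = 40  # dollars to extract text from one volume pair

purchase_cost = VOLUME_PAIRS * (JAPANESE_PRICE + ENGLISH_PRICE)
annotation_cost = VOLUME_PAIRS * ANNOTATION_PER_PAIR
total_budget = purchase_cost + annotation_cost

print(f"Manga purchase: ${purchase_cost:,}")
print(f"Annotation:     ${annotation_cost:,}")
print(f"Total:          ${total_budget:,}")
```

The total matches the $5,000 campaign goal.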

Endorsed by

This project is really exciting and very necessary. Development of a new annotation dataset with comic images is critical to progress in the field of machine translation and computer vision. I have been collaborating with Ryota, Shonosuke, and Kazuhiko. They are thoughtful and competent scientists able to complete this project.

Project Timeline

We expect to finish creating the benchmark by the end of May. We will first buy the manga and recruit annotators while developing an annotation tool in parallel. Annotation will start at the beginning of May, and we expect it to be finished by the end of May. Once the benchmark is built, we will keep using it to validate the effectiveness of our methods. We will report progress in lab notes as each step finishes and, once the benchmark is complete, share the results evaluated on it.

Mar 11, 2019

Project Launched

Apr 20, 2019

Buy manga

Apr 25, 2019

Develop annotation tool

May 01, 2019

Start annotation

May 31, 2019

Finish creating benchmark

Meet the Team

Ryota Hinami
PhD candidate
Affiliates: The University of Tokyo

Shonosuke Ishiwatari
PhD candidate
Affiliates: The University of Tokyo

Kazuhiko Yasuda
Affiliates: The University of Tokyo

Team Bio

We are a research team focusing on manga translation at The University of Tokyo. Our team consists of an expert in machine translation (Shonosuke Ishiwatari), an expert in computer vision (Ryota Hinami), and an expert engineer with NLP research experience (Kazuhiko Yasuda). We were awarded Innovative Technologies 2018 in Japan for our comic translation idea and are going to exhibit a demo at SXSW 2019.

Ryota Hinami

Ryota Hinami received BE and MS degrees in Information and Communication Engineering from The University of Tokyo in 2014 and 2016, respectively. He is currently a PhD candidate at The University of Tokyo and a JSPS Research Fellow (DC2). His research interests include multimedia and computer vision.

Shonosuke Ishiwatari

Shonosuke Ishiwatari received BE and MS degrees in Information and Communication Engineering from The University of Tokyo in 2014 and 2016, respectively. He is currently a PhD candidate at The University of Tokyo and a JSPS Research Fellow (DC2). His research interests are in natural language processing, especially machine translation and description generation.

Kazuhiko Yasuda

Kazuhiko Yasuda received a BE degree in Information and Communication Engineering from The University of Tokyo in 2017. He is currently a master's student at the Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo. His research interests include natural language processing, especially adversarial examples.

Lab Notes

Nothing posted yet.


Project Backers

  • 9 Backers
  • 8% Funded
  • $391 Total Donations
  • $43.44 Average Donation