Rationale

Concept

Pocyr is an innovative application designed to transliterate text into different alphabets. It serves multiple purposes, including entertainment, scientific research, and cost savings. Cost savings? Yes, indeed! Our research indicates that an alphabet better suited to a language can significantly reduce the average number of letters in its words.

Currently, the application supports the Polish language and is available for Android devices. Future updates will include additional language and alphabet pairs.

Justification

It’s well known that the written forms of natural languages are not always efficient. A simple illustration is that a compressed (archived) text file is usually smaller than the original. However, encoding a language differs from a standard compression algorithm in one crucial respect: the result must stay readable by humans. Simply replacing frequent sound patterns with new symbols, as a compressor would, quickly inflates the number of distinct symbols and renders the text unreadable.

The most logical approach is to assign one symbol to each sound, at least to the frequently used ones. Surprisingly, most languages do not follow this straightforward idea, though to varying degrees. Polish is a particularly challenging case: a single sound can be spelled with up to three letters, as with dzi in dziecko, and common sounds such as sz and cz always take two. This happens because the Latin alphabet is poorly suited to the sounds of Slavic languages. Czech addresses the problem with additional letters such as š and č. Our alternative solution is to write Polish in the Cyrillic alphabet.
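To make the one-symbol-per-sound idea concrete, here is a minimal Python sketch of greedy longest-match transliteration. The mapping table is a small, hypothetical subset chosen for illustration only; it is not Pocyr's actual table, handles only lowercase text, ignores Polish softening rules, and leaves letters such as ą, ę, ł and ó unchanged.

```python
# Hypothetical, deliberately incomplete Polish -> Cyrillic table, for
# illustration only; it is not Pocyr's real mapping.
MULTI = {
    "szcz": "щ",  # four Latin letters become one Cyrillic letter
    "sz": "ш",
    "cz": "ч",
    "ch": "х",
}
SINGLE = {
    "a": "а", "b": "б", "c": "ц", "d": "д", "e": "е", "g": "г",
    "i": "и", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "t": "т", "u": "у", "w": "в",
    "y": "ы", "z": "з", "ż": "ж",
}
TABLE = {**SINGLE, **MULTI}
MAX_KEY = max(len(key) for key in TABLE)


def transliterate(text: str) -> str:
    """Transliterate lowercase text by greedy longest-match lookup in TABLE.

    At each position the longest matching key wins, so "szcz" is preferred
    over "sz" followed by "cz". Characters with no entry are copied as-is.
    """
    out = []
    i = 0
    while i < len(text):
        for size in range(MAX_KEY, 0, -1):
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in TABLE:
                out.append(TABLE[chunk])
                i += size
                break
        else:
            # No key matched: keep the original character.
            out.append(text[i])
            i += 1
    return "".join(out)


for word in ["szczur", "czas", "chata"]:
    converted = transliterate(word)
    print(f"{word} ({len(word)}) -> {converted} ({len(converted)})")
```

For example, `transliterate("szczur")` returns "щур", shrinking six letters to three. Longest-match is used so that multi-letter spellings are consumed before their single-letter parts.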

Research

For our research, we used various resources and chose one of the most famous Polish books, “The Witcher,” as a demonstration.

The book contains 609,763 characters in total: 497,388 of them in words, and the rest punctuation marks, spaces, and digits. After transliteration into Cyrillic, the total dropped to 576,524 characters, 464,149 of them in words, so all 33,239 saved characters come from inside words. Since transliteration does not change the number of words, the average word length fell from 5.426 to about 5.06 characters.
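These figures can be cross-checked with a few lines of Python. The sketch only re-derives values from the counts quoted for the book; the word count itself is not given in the text and is inferred here from the 5.426 average, on the assumption that transliteration neither merges nor splits words.

```python
# Character counts quoted in the text for "The Witcher".
TOTAL_BEFORE = 609_763   # all characters, Latin script
WORDS_BEFORE = 497_388   # characters inside words, Latin script
TOTAL_AFTER = 576_524    # all characters after transliteration
WORDS_AFTER = 464_149    # characters inside words after transliteration
AVG_WORD_BEFORE = 5.426  # average word length quoted in the text

# Everything outside words (spaces, punctuation, digits) is untouched.
non_word_before = TOTAL_BEFORE - WORDS_BEFORE
non_word_after = TOTAL_AFTER - WORDS_AFTER

# Assumption: the word count implied by the original average also holds
# after transliteration.
word_count = round(WORDS_BEFORE / AVG_WORD_BEFORE)
avg_word_after = WORDS_AFTER / word_count

total_reduction = 1 - TOTAL_AFTER / TOTAL_BEFORE
word_reduction = 1 - WORDS_AFTER / WORDS_BEFORE

print(f"non-word characters: {non_word_before:,} -> {non_word_after:,}")
print(f"implied word count: {word_count:,}")
print(f"average word length: {AVG_WORD_BEFORE} -> {avg_word_after:.3f}")
print(f"total reduction: {total_reduction:.2%}, in words: {word_reduction:.2%}")
```

Because the non-word characters are identical before and after, the relative saving is larger inside words (about 6.7%) than across the whole text (about 5.5%).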

The most significant indicator for us is the 5.5% reduction in the total number of characters. The exact number of printed copies is unknown, but a conservative estimate based on sales of the related video game in Poland puts it above one million. Assuming paper use scales with the number of characters, printing those million copies in transliterated form would take as much paper as about 945,000 conventional copies, saving roughly 55,000 books' worth of paper. At about 235 pages per book (the value implied by these figures), that is 12,925,000 printed pages, or about 6,462,500 sheets of paper, since each sheet carries two pages.
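The paper estimate can be reproduced as follows. Two inputs are assumptions rather than data: paper use is taken to scale linearly with character count, and the 235 pages per book is inferred from the text's own figures (12,925,000 / 55,000), not stated in it.

```python
# Inputs: the text's character counts and its conservative sales estimate.
TOTAL_BEFORE = 609_763
TOTAL_AFTER = 576_524
COPIES = 1_000_000
# Assumption: ~235 pages per copy, implied by 12,925,000 pages / 55,000
# books; the source does not state a page count.
PAGES_PER_BOOK = 235
PAGES_PER_SHEET = 2  # each sheet of paper carries two printed pages

# Assumption: paper use scales linearly with the number of characters.
ratio = TOTAL_AFTER / TOTAL_BEFORE
equivalent_books = COPIES * ratio
# Round the saving to the nearest thousand books, as the text does.
saved_books = int(round(COPIES - equivalent_books, -3))
saved_pages = saved_books * PAGES_PER_BOOK
saved_sheets = saved_pages // PAGES_PER_SHEET

print(f"paper needed: {equivalent_books:,.0f} conventional books")
print(f"saved: {saved_books:,} books, {saved_pages:,} pages, {saved_sheets:,} sheets")
```

Before rounding, the saving is about 54,500 books; rounding to 55,000 matches the figure used in the estimate.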

Converting this amount of paper into trees and rounding generously, we estimate that the change could save around 80 trees. And that is for just one book by one author; a complete switch to transliterated printing could potentially save thousands of trees each year.