Kasutaja:Ehitaja/A Short Introduction to Estonian Wikiquote

Source: Vikitsitaadid (Estonian Wikiquote)

A Short Introduction to Estonian Wikiquote: some additions to the presentation given at the CEE wikiconference on 7 Nov 2021.

What has been going on in Estonian Wikiquote and why?

Over the last year and a half, the project has grown beyond all expectations in both quantity and quality. Most of the work has been carried out by two co-workers, while others have assisted with smaller contributions and technical support. In terms of size, 1,000 pages have become 8,000, and among the Wikiquote language versions, the Estonian version has risen from 35th to 7th place. At the same time, we have placed emphasis on introducing female authors and artists, our neighboring and other smaller cultures, and other lesser-known cultural areas.

Why are we doing this? We started with a quantitative effort simply because there was a shortage of basically everything and we wanted to see how much we could do to fill the giant hole. The other goals were added over time, to showcase authors and cultures that are less well known than they should be.

Both main collaborators of the Estonian Wikiquote have a strong theoretical background - one is a semiotician, the other a philosopher. In our practical work we have therefore apparently paid more attention than is customary in Wikiprojects to establishing a systematic basis that would allow us to act more thoughtfully and efficiently. In the process, we have also invented new techniques and made various observations that might be useful in other Wikiprojects as well.

Where did it all start

In the beginning, there was the sordid observation that one of our teachers, Marju Lepajõe, a prominent classical philologist, was missing from Wikiquote. While correcting the error, we noticed that the whole project was quite abandoned, with only a single non-Estonian-speaking Finn wandering among the ruins, like a hedgehog in Babylon. Initially we just improvised, gradually adding some content and cleaning up some of the trash heap that had accumulated over the years. In doing so, we came up with further ideas of what could be done. We looked at how things had been done in other Wikiquotes and on different Estonian quote sites but found that there was not much to imitate, so we established our own practices, building most systems from scratch.

The intensive work really started when one of our colleagues somewhat unexpectedly created a "quote of the day" feature on the front page, so there was a sudden need to fill it with something. We had discussed the idea before and concluded that we didn't have the time and energy to produce something new for each and every day. However, as necessity is the mother of whatever, we discovered that this new task was actually manageable. We didn't want to fill the front page with random trivialities, so we tried to come up with a system. Using authors born on the day in question seemed logical. Thus, we kept up the production until we ran into the problem of hypercanonization.

Against the canon

During the digitization of culture, the cultural canon tends to intensify: digitally, the works of a small number of male authors in major European cultures are gaining an even greater share and position than they had in the analogue culture, while everyone else is becoming increasingly marginalized. In terms of attention, such hypercanonization works in much the same way as the internet celebrity phenomenon, in which some subjects become very famous because they, well, are famous. Those who have already captured public attention gain more and more of it, but women, small cultures, certain genres, themes etc. are left out. Of course, there are some exceptions here - for example, computer games and animated films often tend to acquire a non-intuitive relevance in many Wikiprojects - but the principle generally applies. We have been trying to resist that effect.

We cannot completely avoid hypercanonization because it originates in the culture around us. However, we can change the cultural emphasis in the Wikiquote project, which is determined by our own choices. Put simply: we can't and shouldn't avoid making an article about Shakespeare. However, we do not have to put him on the front page, because people already know to look for him anyway. Instead, the daily quote feature has presented only female authors (since the end of 2020, at least). For this, we have created a dedicated tool: a calendar listing authors by their date of birth (Kasutaja:Ehitaja/Kalender). Unfortunately, there is no good ready-made template for this, so we've had to compile the lists manually. The main problem is that peripheral authors, including many female writers, even successful ones, are mostly missing from the entire Wikipedia system, including Wikidata. Also, the quality of the information on Wikipedia date pages is significantly lower than in the actual biographical articles, so even if we don't have to create an entry from scratch, every list entry gathered from the date pages needs to be checked. (There are additional factors enhancing the jolly confusion, such as the difference between the Julian and Gregorian calendars: the Gregorian calendar was adopted at different times in different areas, so dates are often given differently in different Wikiprojects, and occasionally just miscalculated.) We also want and need to use the published Estonian translations of world literature where possible - both because these are what our readers will be looking for, and because this reduces our own workload.
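The calendar confusion mentioned above can be checked mechanically. The following is a minimal Python sketch, not a description of the tooling actually used for the calendar page: the function name `julian_to_gregorian` is hypothetical, and the conversion goes through the Julian Day Number using the standard integer formula. It performs no input validation, so an impossible date will silently give a meaningless result.

```python
from datetime import date


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian-calendar (Old Style) date to the Gregorian calendar.

    Hypothetical helper for illustration; assumes a valid Julian date in
    the Common Era.
    """
    # Julian calendar date -> Julian Day Number (standard integer formula).
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

    # Python's date ordinals count days in the proleptic Gregorian calendar,
    # with 0001-01-01 as day 1; the offset to the Julian Day Number is fixed.
    return date.fromordinal(jdn - 1721425)


# A birthday recorded as 1 February 1918 (Old Style) falls on
# 14 February 1918 in the Gregorian calendar.
print(julian_to_gregorian(1918, 2, 1))
```

A check of this kind only tells us what the Gregorian date would be; deciding whether a given source quotes the Old Style or New Style date still has to be done by hand.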

Covering the canon in Wikiquote is necessary: it's the kind of content readers need and seek. However, it is not necessary to highlight the canon separately, as it is already familiar to most people. That's why we have tried to cover the periphery better than it is usually represented in our cultural context: female authors; cultures close to Estonia, such as Latvian, Lithuanian, Finnish, Swedish, Polish and Russian; historical authors forgotten by the canon-tenders; and, to a lesser extent, some peripheral genres and sources.

The fundamental principles

Fundamentally, our progress has been helped by two specific principles of the Estonian Wikipedia that are not as widely acknowledged in most Wikiprojects: inclusion and separatism. The first means that very few topics are left out altogether, as the problem is usually thought to lie in the quality of the text rather than in the choice of the topic; the second, that the keywords are specified as precisely as possible, resulting in many articles that have no equivalent in the language versions in which it is customary to write composite articles. As these practices are already familiar to Estonian readers of Wikipedia, we have adopted them in the Estonian Wikiquote project as well. For example, we may have separate pages for an activity and its performer, for related terms that are not synonymous, and for terms that are denoted by the same word in different fields yet compiled into one general article in English Wikipedia.

As an important difference from Wikipedia, though, we recognize that Wikiquote is - to a much greater extent, at least - an ordinary language project. This means that the basic concepts behind the quotes found in fiction and media, which are our main sources, must be delimited differently than they are in professional or scientific terminologies. E.g., there may be several pages on different legal terms in Wikipedia, which can be covered by just one common term in Wikiquote if most of the fiction and media we quote does not distinguish between those exact meanings. Thus, we also need to be more critical of dictionaries and lexicons, as those often define a term in a professional jargon only, without taking the other layers of language into account.

Growth and growing pains

From the beginning, we have knowingly set ourselves on the path of extensive growth. We have created a systematic category tree, we try to avoid having uncategorized articles, and we limit the expansion of very general categories. At the same time, we are consciously creating a large number of stubs so that we will have a well-developed network of fixed concepts, clarifying the needs for their systematization. At some point, we plan to stop the extensive growth and start supplementing and growing the existing articles instead.

We inherited fewer than a thousand rather poor articles and have improved them to the best of our ability. However, a fair amount of poorly cited material remains in the project. Extensive growth allows you to control the quality of references as you grow - and we are much more careful about that than most other versions of Wikiquote, almost always using quotations that point to a specific publication and page or give the full publication data of a website - but improving the existing references is a much larger and slower job, which we shall take up more thoroughly once the project has grown enough. As Estonian is a translation-based culture, we pay special attention to different translations and translators in our references.

In the meantime, we're marking all instances of problematic references we find with a hidden category. We have similar hidden categories for other common issues, such as stubs, lack of images and definitions. This will allow such problems to be addressed systematically in the future.

Illustrations as illuminations

We prefer to illustrate our articles with works of art - the world is full of great works of visual art, many of which are much more eloquent than a photograph merely documenting a phenomenon. At the same time, Wikiquote readers get an apt opportunity to educate themselves in art history. Whenever possible, we use works of art by female authors to give them greater visibility, because art history is tilted towards men in the same way as other cultural areas. We do not have to try to make Vincent van Gogh more famous than he already is, and who's to say that he is necessarily better or more important in every way than Giovanna Garzoni, Wegmann, or Valadon? We also keep a separate account of the works of women artists used in Wikiquote: Kasutaja:Pseudacorus/Naisautorite illustratsioonidega lehed. That list currently includes more than 650 female artists. In many cases we've had to start by finding their works somewhere else and then importing them to Commons.

Gender ratio and showcasing

In the summer of 2020, after maintaining the daily quote section for a few months, we noticed that women had hardly come up on the front page, and we decided to do something about it, i.e. to make sure that most of the daily quotes would come from women born on that day. By the beginning of November 2020, 77.56% of the articles about persons were about men and 22.44% about women. In 2021, all the daily quotes have been - and will be - by women. At present (beginning of November 2021), 62.2% of the person articles in the Estonian Wikiquote are about men and 37.8% about women, so there is still some room for development. (3 pie charts are available for pie lovers here: https://docs.google.com/spreadsheets/d/1cYhXR98gP_NnbFTCkN9KtIL_IniqTRNPkgG1SJaBwOs/edit?usp=sharing ) Should our current trend (ca +15 percentage points/yr) persist, the percentage of articles about women would reach 100% by 2026.
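For those who want to check the tongue-in-cheek projection, here is a minimal Python sketch of the arithmetic. It assumes plain linear growth in percentage points and that the two quoted shares of articles about women (22.44% and 37.8%) were measured about a year apart; the function name `years_until` and the decimal-year representation of "beginning of November 2021" are illustrative choices, not anything taken from the project's statistics pages.

```python
def years_until(target_pct: float, current_pct: float, points_per_year: float) -> float:
    """Years needed to reach target_pct under linear growth in percentage points."""
    if points_per_year <= 0:
        raise ValueError("growth rate must be positive")
    return max(0.0, (target_pct - current_pct) / points_per_year)


# Change between the two quoted measurements (assumed to be one year apart).
observed_rate = 37.8 - 22.44

# The text rounds the trend to about +15 points per year.
years_rounded = years_until(100.0, 37.8, 15.0)

# "Beginning of November 2021" expressed as a decimal year (10 full months elapsed).
start = 2021 + 10 / 12
reach = start + years_rounded

print(f"observed rate: {observed_rate:.2f} points/yr")
print(f"years needed at 15 points/yr: {years_rounded:.2f}")
print(f"projected decimal year: {reach:.2f}")
```

Under these assumptions the line crosses 100% just before the start of 2026, which matches the claim in the text; the extrapolation is of course a joke, since the share cannot keep growing linearly.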

Since our 1000th article (Greta Thunberg), we have tried to set prominent women on the round numbers: our 2000th content page was about Doris Lessing, 3000th - Virginia Woolf, 4000th - Toni Morrison, 5000th - Elinor Ostrom, 6000th - Marguerite Duras, 7000th - Wisława Szymborska, and 8000th - Emily Brontë. We are considering who might be suitable for the 10,000th. See also: https://et.wikiquote.org/wiki/Vikitsitaadid:Statistika

Non-triviality and systematicity

In all this, there are two aspects that probably need stressing. One of them makes our work harder but raises our morale, and the other makes it easier in the long run.

Reflecting on the ingrained practices in the other Wikiquote language versions, we decided to avoid trivialities. We don't normally include textbook quotes that merely state the facts in a way that should be retold in Wikipedia and add nothing by their way of expression. We also don't encourage the use of non-content-related illustrations with the kind of "inspirational quotes" that look like they've been composed by a Polonius 2.0 quotebot.

Our small workforce makes sustainability a necessary goal, so we must use our energy sparingly and efficiently. The best way to do that is to be systematic in every step of our work process. We digitize huge amounts of literature (focusing on the authors coming up next month) so that it can be easily browsed and searched by keywords. When composing the author pages, we browse the digital texts and extract quotes that can then be spread across content pages under the relevant keywords, keeping a smaller number on the author's or work's page. This enables us to constantly cover new keywords, which in turn shape the category system. Thus, every new book grows both our conceptual network in width and the individual pages in depth.
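The step of spreading extracted quotes across keyword pages can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the function `file_quotes`, the placeholder authors and the made-up sentences are all hypothetical, and in practice the selection is an editorial decision rather than a string match. The naive whole-word matching used here would also miss most inflected forms in Estonian.

```python
from collections import defaultdict


def file_quotes(quotes, keywords):
    """Group (author, text) pairs under every keyword their text mentions.

    Matching is a case-insensitive whole-word comparison; a hypothetical
    simplification that ignores inflection and multi-word keywords.
    """
    pages = defaultdict(list)
    for author, text in quotes:
        # Normalize the quote into a set of lowercase words without punctuation.
        words = {w.strip('.,;:!?"()').lower() for w in text.split()}
        for keyword in keywords:
            if keyword.lower() in words:
                pages[keyword].append((author, text))
    return dict(pages)


# Made-up placeholder data, not real quotations.
quotes = [
    ("Author A", "The sea is never silent."),
    ("Author B", "Silence is a kind of sea."),
]
pages = file_quotes(quotes, ["sea", "silence"])

for keyword, entries in pages.items():
    print(keyword, len(entries))
```

One quote can land on several keyword pages at once, which is how a single book ends up widening the concept network and deepening individual pages at the same time.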

In digitization, we have tried different techniques and found that the most efficient approach is a mix of scanning (for books that are not in very clear print or that we will need to study in depth for a long time) and phone photography enhanced by computer OCR (for books that give reasonably good optical results and don't necessarily need the consistent quality gained by slower scanning). Focusing on one step of the work process at a time yields more results per hour, leaving us more hours for well-earned rest. Win-win!


As a result, we have reached a pace of growth that, in most months, no other language version can surpass. Although men tend to predominate in the material we harvest from the public culture - they simply speak up more in every field and on every topic - we have gradually increased the proportion of women among the person-based articles. Every day, there is a new female author on the front page of Estonian Wikiquote, and the proportion of women among the authors of illustrations is also steadily growing.

As of today, November 7, 2021, the Estonian Wikiquote with its 8,000 articles is the 7th largest of all the language versions, but fewer than 1,000 articles separate us from the 6th position, and practice has shown that writing 1,000 articles per month needs just a bit of focus.

As said above, in the long run, we do not intend to continue indefinitely with the extensive enlargement. The plan is to expand the concept network so that a sufficiently detailed system will be developed, after which we can focus on depth and density of the content. At the moment, we have paid rather brief attention to the literary canon, but we plan to come back to it. In the course of our work, we have accumulated an extensive digital corpus of literature, which we can mine by keywords. Problematic quote pages are marked separately so that problems can be dealt with efficiently and in a focused way. However, for the most part, all this work is yet to come.