Where East Meets West: Technical Communication and Usability, October 29-31, 1998, HERMES SoftLab, Ljubljana, Slovenia


Bringing the Linux Documentation Project to Slovenia

Primož Peterlin, Roman Maurer, Marko Samastur and Borut Mrak

Abstract

Linux is presented in the broader context of free software. The idea of free software is described, and the need for free documentation to match free software is explained. The Linux Documentation Project is an example of free software, aiming to provide complete documentation for the Linux operating system. Among its hundreds of volunteers there is also a small group from Slovenia. In this paper we present our work on making the documentation more accessible to users in our community.

Introduction: Linux and Free Software

I believe we should start at the very beginning, with the title of our talk, as it may itself require a short explanation. What is Linux? Linux is a multi-tasking, multi-user, multi-platform, POSIX-compliant (i.e. Unix-like) operating system. But since we are not discussing its technical merits here, we can focus on another of its aspects: it is a free operating system. As the idea of free software seems unusual at best to people in general, and in particular to people working in the software industry, we feel the need to spend a few words on the idea itself.

What is Free Software?

What people usually notice first is that free software can be used without paying for it. But that's not the end of the story. Microsoft Corp. also doesn't charge (not directly, anyway) for using Internet Explorer -- the full version is available on their Web site, and everybody can download and use it. Still, Microsoft Internet Explorer is not free software, because what I get is only the compiled binary image of the program, not the source code. Thus, regardless of whether I am capable of doing it or not, I am deprived in advance of the possibility to correct any bugs I think I have spotted in the program, to add some new gadget, to modify it to run on some other platform, or to share such a modified version with my friends. What is more, I am not even allowed to share the unmodified binary image, since the end-user license only allows me to download it and use it on my computer. So we have come to the first conclusion: freedom doesn't equal zero price. Programs we don't have to pay for aren't necessarily free, and vice versa: if we buy free software on a CD-ROM, we usually have to pay a small fee for it.

Is free software, then, software which comes along with its source code? Again, the answer is not that simple. Availability of source code is indeed required before we can treat some software as free software. The requirement is important enough that many people prefer to use the term "Open-Source Software" instead of free software, thus emphasizing this requirement, and also making the idea of free software more palatable to the entrepreneurial world. Still, the mere availability of source code is not enough. An example familiar to most students of science and technology in this country is the Numerical Recipes library. The routines are available in source code (and actually also with a clearly written explanation of their operation), yet they also come with a very restrictive license. This license explicitly prohibits me from redistributing the source code of any routine from the library, even as part of my own program.

What rights, then, does free software include? In short, you are allowed to use, copy, redistribute, understand, modify and improve the program. In more detail, we can divide these rights into three levels:

  1. The right to read the source code, understand its operation, and adapt it to suit your own needs.
  2. The right to copy and redistribute the program.
  3. The right to improve the program, including the right to redistribute your improved version of the program.

Linux is perhaps enjoying more than its fair share of publicity among free software, probably because the operating system kernel, written by Linus Torvalds and collaborators, filled in the last gap to make a complete working free operating system possible; yet it's only the tip of the iceberg. Since 1984 the Free Software Foundation, centered around Richard M. Stallman, has been working on the GNU project, aiming to produce an enhanced free plug-in replacement for the UNIX operating system. The X Window System, a free multi-platform windowing system developed at MIT, also goes back a long way, as does the networking software developed at the University of California at Berkeley. These are only a few major strands of the rich heritage upon which Linux was built.

With Linux, so it seems, the free software concept has passed the test of the real world with flying colors. Linux has very successfully combined the idea of free software with the rising tide of the Internet in the early 1990s and is today the only non-Microsoft operating system expanding its user base. People worried about the lack of support for free software may be surprised to learn that InfoWorld gave its 1997 award for best technical support to the Linux user community. Such an unorthodox choice certainly says a lot about the enthusiasm and the positive atmosphere in the Linux community.

Free Software Needs Free Documentation

The criteria for free documents are in essence the same as those for free software. Or are they? Is it really essential that people have a general permission to modify, say, this article? I don't believe so. After all, it contains the authors' personal views and opinions.

Still, free software does need to be matched by free documentation. Imagine a situation in which a bright, creative programmer finds a way to enhance an existing free program (events like this happen daily, and are actually what fuels free software). Being of a well-mannered sort, he -- or she -- would also want to modify the manual so that it accurately reflects the behavior of the modified program. A manual that disallowed any modifications would not suit the needs of users of free software, as it would require the authors of even a slight change in the program's behavior to write a new manual from scratch.

So, while there is no question that the license must require preservation of the original author's copyright notice, the distribution terms, and the list of authors, it must also allow modification of the technical content of the manual. There is general agreement on this among the authors contributing to the Linux Documentation Project as well.

About the Linux Documentation Project

The Linux Documentation Project (LDP) was started in 1992 with the aim of producing good, reliable documentation for the Linux operating system. The documents cover installing, configuring and using Linux, and are written in a variety of formats: plain text that can be read anywhere, HTML documents that can be viewed with a browser, man pages that can be read either online or printed in a book, and typeset documents intended to be printed and read in books.

The LDP is centered around its web page, hosted by SunSite at the University of North Carolina, USA, which is in turn mirrored throughout the world.

There are four basic types of documentation produced by the LDP:

In addition to those, several documents are only available online on the LDP web page. These are:

In total, the yield of LDP so far is: 7 guides in various degrees of completeness from fragmentary to already published, 111 HOWTOs and 131 mini HOWTOs, a handful of FAQs and 34 issues of Linux Gazette.

Table 1: An overview of the translated documentation in the Linux Documentation Project.
language ISO639 code mini-HOWTOs HOWTOs guides
German de 8 25 1
Greek el 18
English en 104 102 7
Spanish es 23 37 4
French fr 80 80 1
Croatian hr 4 7
Indonesian id 2 10
Italian it 13 31 2
Japanese ja 58 61
Korean ko 35 28
Polish pl 28 39
Russian ru 10 1
Slovenian sl 10
Swedish sv 3 19
Turkish tr 23
Chinese zh 31 29

The documents produced by the LDP are primarily written in English (the exceptions are the "national" HOWTOs, dealing with localizing Linux for a particular locale, which are usually written in the language of their target population). In order to make Linux documentation available to people whose native language is not English, however, several translation projects are running in parallel to the main LDP. So far, the documents of the LDP have been translated into 15 other languages: Chinese, Croatian, French, German, Greek, Indonesian, Italian, Japanese, Korean, Polish, Russian, Slovenian, Spanish, Swedish and Turkish.

The already mentioned "national" HOWTOs currently exist for 10 languages and locales: Danish, Esperanto, Finnish, French, German, Italian, Polish, Portuguese, Slovenian, and Swedish. In addition, Linux manual pages have so far been translated into Czech, German, Italian, Japanese and Spanish.

Technical Issues of the LDP

Starting as a voluntary project, the LDP in the beginning gladly accepted any contribution, regardless of the format its authors chose. While the authors of the guides usually chose LaTeX, the authors of shorter texts showed more imagination, their preferred choices ranging from plain text and HTML (HyperText Markup Language) to LaTeX. If one adds the already existing documentation written with troff macro packages and in the GNU project's own hypertext format, texinfo, one can imagine that the situation was dangerously close to complete chaos. So it was agreed relatively early on that a uniform format was needed for the HOWTO documents, which comprise the bulk of the LDP. The solution sought had to make it possible to produce various formats from a single source, both those meant for online reading, like HTML and GNU info, and those intended to be printed, like LaTeX.
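The single-source requirement can be illustrated with a small sketch. It is not how Linuxdoc-SGML or SGMLtools work internally; the Section class and the three renderer functions are hypothetical names invented for this illustration, and the renderers do no escaping of special characters. The point is only that one structured source feeds several output formats.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One section of a document: a title plus its paragraphs (hypothetical model)."""
    title: str
    paragraphs: list = field(default_factory=list)


def to_html(sections):
    """Render the sections for online reading."""
    parts = []
    for s in sections:
        parts.append(f"<h2>{s.title}</h2>")
        parts.extend(f"<p>{p}</p>" for p in s.paragraphs)
    return "\n".join(parts)


def to_latex(sections):
    """Render the same sections for printing."""
    parts = []
    for s in sections:
        parts.append(f"\\section{{{s.title}}}")
        parts.extend(s.paragraphs)
    return "\n\n".join(parts)


def to_text(sections):
    """Render the same sections as plain text with underlined titles."""
    parts = []
    for s in sections:
        parts.append(s.title + "\n" + "=" * len(s.title))
        parts.extend(s.paragraphs)
    return "\n\n".join(parts)


# One source, three outputs.
doc = [Section("Installation", ["Insert the boot floppy and restart."])]
for render in (to_html, to_latex, to_text):
    print(render(doc))
    print("---")
```

Because the source records only what each piece of text is (a title, a paragraph), adding a further output format means writing one more renderer, without touching the documents themselves.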

Considering the requirements, SGML (Standard Generalized Markup Language; ISO 8879:1986) was chosen as the standard template format for the documents. Relying on an established standard, it offers maximal independence of the written material from the tools used in its preparation. SGML is not a document format itself. It is a meta-language that allows defining customized markup languages, each described by a Document Type Definition (DTD). HTML is without doubt the best known DTD today. HTML was actually conceived at CERN with a similar idea in mind, providing the writer with semantic mark-up tags and offering an opportunity to organize a distributed technical documentation system. The format unfortunately strayed away from this goal over the years. Instead of strengthening its rather frail hypertext structure and relatively weak semantic markup, the vendors pushed it more and more towards visual markup, and bloated it with various gadgets that have little use in technical documentation, but which do require exceedingly complex browsers. All this made HTML an unsuitable choice.

The DTD sought had to be a better match for the needs of technical documentation. It had to be simple enough -- about as simple as HTML 2.0 -- in order not to turn aspiring authors away from learning it, and the whole package had to be reasonably easy to implement. So Matt Welsh, the author of the first version of the Linuxdoc-SGML package, based it on the QWERTZ DTD by Tom Gordon and on James Clark's sgmls parser (which was in turn based on Charles Goldfarb's arcsgml parser). Other people worked on the back-ends: Magnus Alvestad and Helmut Geyer provided the HTML support, and Christian Schwarz added the texinfo interface. Further development of the Linuxdoc-SGML package was conducted by Greg Hankins and later by Cees de Groot, who succeeded him as maintainer. In November 1996 the package was renamed SGMLtools, in order to emphasize that it is actually a general system for writing technical documentation and not something specific to Linux.

The current production version of SGMLtools (1.0.9) expects input conforming to the Linuxdoc DTD, a descendant of the QWERTZ DTD. It allows one to produce LaTeX, PostScript, PDF (Portable Document Format), HTML, RTF (Rich Text Format), GNU info, LyX (a GUI editor for writing documents in LaTeX and Linuxdoc SGML) and plain text (via groff) from a single source. On October 18, 1998, however, SGMLtools 2.0 was released, supporting the much richer DocBook DTD (v3.0), developed by the Davenport Group, a consortium of specialists in technical documentation. DocBook, now maintained by the Organization for the Advancement of Structured Information Standards (OASIS), is supported by most SGML vendors. In the one-year transition period, all the documentation will have to be converted into the new format. The new SGMLtools package of course contains tools to facilitate the transition. Still, since the transition requires replacing old-style visual tags with higher-level semantic ones, the first, automatic pass will have to be followed by a second, incremental pass, in which the authors will add the semantic features that DocBook supports.
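The two passes can be made concrete with a rough sketch. This is not the conversion tool shipped with SGMLtools; the FIRST_PASS table and the first_pass function are hypothetical, and the sketch handles only inline tags written with explicit end tags. It maps the Linuxdoc tags tt and em onto the DocBook elements literal and emphasis.

```python
import re

# Hypothetical first-pass table: Linuxdoc tag name -> DocBook element name.
# Only two inline tags are covered; a real converter needs far more.
FIRST_PASS = {"tt": "literal", "em": "emphasis"}

# Matches simple start and end tags without attributes, e.g. <tt> or </tt>.
TAG = re.compile(r"<(/?)([A-Za-z]+)>")


def first_pass(text):
    """Mechanically rename known Linuxdoc tags; leave all other tags untouched."""
    def swap(match):
        slash, name = match.group(1), match.group(2)
        return f"<{slash}{FIRST_PASS.get(name.lower(), name)}>"
    return TAG.sub(swap, text)


source = "Edit <tt>/etc/fstab</tt> with <em>care</em>."
print(first_pass(source))
```

The mechanical pass can only say that /etc/fstab was set in a typewriter font. Deciding that it is really a file name, and should be tagged with the DocBook filename element rather than literal, is the kind of judgment left to the author in the second pass.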

The SGMLtools 2.0 package employs James Clark's SP SGML parser, a descendant of the sgmls parser by the same author, and JADE (James' DSSSL Engine), a free implementation of DSSSL (Document Style Semantics and Specification Language; ISO/IEC 10179:1996), again written by James Clark. On the presentation side, the DocBook DTD by the Davenport Group is matched by a set of modular DocBook style-sheets by Norm Walsh. The new solution offers two improvements over the old one. First, a vastly richer DTD not only makes it possible to write longer documents such as the LDP guides, and thus to unify the Linux documentation pool, but also means that the LDP has entered the SGML mainstream. A valid DocBook document can be processed on virtually any SGML text processing system. Second, DSSSL, being a high-level Scheme-like language, offers a much easier and much more powerful way to control the look of SGML documents than was possible with simple mapping files. The increased complexity of the DTD, however, also has its drawbacks. One of them is that it has become increasingly difficult to write documents without a specialized editor which enables the writer to edit mark-up at the logical, or semantic, level. While the psgml mode in the Emacs editor does offer significant help for composing SGML text, it still does not match the capabilities offered by commercial products like ArborText Adept. The team developing SGMLtools is thus discussing plans for a GUI SGML editor.

LDP and Slovenia

Our first contribution to the LDP was the Slovenian-HOWTO (19 pages, 3887 words, 30778 characters), written in 1996, which covers localization of the Linux system and its adaptation for use with the Slovenian language. However, it was only in the first half of 1998 that this first solitary attempt was followed by the forming of a Slovenian translation team. The members of the team cooperate by discussing terminology and by cross-proofreading and editing each other's work, all of which increases the quality of the translations.

To date, the team has produced a dozen documents. Prevailing in number are the translations of HOWTO documents:

Along with these, the Linux INFO-SHEET, a short promotional document providing basic information on Linux, including a brief introduction, a list of features, hardware requirements for running it, and a list of relevant resources, was also translated.

In the hope of reducing the number of ever-recurring questions on local Linux forums, two lists of frequently asked questions (FAQ) were also translated:

Our work has so far received a very good response from users. This is important, as good volunteer work is the best way to attract new volunteers to join the project.

While the translation is still carried out manually, our translation team is considering the possibilities of employing modern methods and aids offered by digital technology.

Towards a Computer-Assisted Translation

The growing collection of translated material represents a wealth of accumulated knowledge that can be utilized to ease further translation. What we have in mind is computer-assisted translation (CAT) using a translation memory system. Translation memory systems build a knowledge base from a set of translated parallel units of text, which can be made readily available to the translator on demand. The usefulness of such an approach increases as the knowledge base grows, which also explains why the method became popular only during the last decade, with the increased availability of high-power computers and high-volume storage devices. Some of the known commercial tools in the CAT market are the IBM Translation Manager, EUROLANG Optimizer, TRADOS Fine Translation Tools, STAR Transit, ATRIL Déjà Vu and ZERESTRANS Translation Memory Technology.

The material accumulated so far in the translation of the LDP is particularly well suited to the method. First, it is technical writing, with a relatively limited vocabulary and numerous recurring terms, patterns and syntagms. Second, the text is already marked up using SGML tags, which facilitates segmentation into parallel-running units of text (usually paragraphs). Research projects like MULTEXT-East have already dealt with this topic. Third, having English as the source language in the translation process simplifies the situation, since one does not have to deal with numerous word forms due to declensions and conjugations.
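How SGML mark-up eases segmentation can be sketched as follows. This is a minimal illustration and not part of any existing LDP tool: the function names segments and align are hypothetical, the sketch assumes that paragraphs in both documents are introduced by a p tag, and it assumes that the translation keeps the paragraph structure of the original. The sample sentences are illustrative only.

```python
import re


def segments(sgml):
    """Split marked-up text into paragraph units at <p> tags."""
    pieces = re.split(r"<p>", sgml, flags=re.IGNORECASE)
    # Normalize whitespace inside each unit and drop empty pieces.
    return [" ".join(piece.split()) for piece in pieces if piece.strip()]


def align(source, target):
    """Pair the n-th source paragraph with the n-th target paragraph."""
    src, tgt = segments(source), segments(target)
    if len(src) != len(tgt):
        raise ValueError("paragraph counts differ; manual alignment needed")
    return list(zip(src, tgt))


english = "<p>Linux is an operating system.\n<p>It is free software."
slovenian = "<p>Linux je operacijski sistem.\n<p>Je prosto programje."

for en, sl in align(english, slovenian):
    print(en, "|", sl)
```

Positional pairing fails as soon as a translator merges or splits paragraphs, which is why the sketch refuses to guess when the counts differ.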

We therefore consider it feasible to implement, in about one year's time, a translation memory system operating on the following principles:

When using it, a translator could request all instances of some term or phrase or, using some fuzzy matching, even of a whole sentence, and, if the term already exists in the base, get in return the contextual translations of all its instances. Since the described translation tool is meant for interactive use, integration into popular editors like Emacs or LyX should also be considered, as well as its use as a Web tool.
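The lookup described above can be sketched in a few lines. This is only one possible reading of the idea and not the design of the planned system: the TranslationMemory class, its method names and the 0.75 threshold are assumptions made for this illustration, the sample units are illustrative, and the fuzzy matching relies on the similarity ratio from the difflib module of the Python standard library.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """A toy in-memory store of aligned (source, target) text units."""

    def __init__(self):
        self.units = []

    def add(self, source, target):
        self.units.append((source, target))

    def find_term(self, term):
        """Return every unit whose source text contains the term."""
        needle = term.lower()
        return [(s, t) for s, t in self.units if needle in s.lower()]

    def find_similar(self, sentence, threshold=0.75):
        """Return (score, source, target) for units resembling the sentence.

        The threshold is an arbitrary assumption; results are sorted
        with the best match first.
        """
        scored = []
        for s, t in self.units:
            score = SequenceMatcher(None, sentence.lower(), s.lower()).ratio()
            if score >= threshold:
                scored.append((score, s, t))
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored


tm = TranslationMemory()
tm.add("Mount the file system.", "Priklopite datotečni sistem.")
tm.add("Reboot the system.", "Znova zaženite sistem.")

print(tm.find_term("system"))
print(tm.find_similar("Mount the file systems."))
```

A linear scan like this is adequate for a collection of a dozen documents. A larger base would call for an index, and an editor or Web front end would sit on top of the same two queries.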

Conclusion

With an estimated user base of close to 10 million, and being the only non-Microsoft operating system that has expanded its user base in both absolute and relative terms during recent years, Linux is a phenomenon that cannot go unnoticed. The principles on which its booming development is founded are in many ways challenging the traditional perception of software development.

Documentation has always been an inherently weak point of volunteer projects. As good documentation is crucial for the success of a project, the Linux Documentation Project was conceived to remedy the situation and provide complete documentation for the Linux operating system. The technical solutions chosen put an emphasis on the independence of the written material from the tools used in its preparation, the preference for international open standards over internal "industry" standards, and the possibility of producing documentation in a variety of formats from a single source. Hence, the solution employs SGML and DSSSL.

Slovenian speakers form the smallest language group among the dozen or so nations participating in the project. We consider our participation to be important both for bringing Linux closer to users in our local community and for exercising our language's ability to answer the challenges posed by the increased informational dynamics of the post-industrial era.

References

[1] DocBook Documentation, http://www.oreilly.com/davenport/
[2] T. Erjavec, N. Ide, D. Tufis (1997): Encoding and Parallel Alignment of Linguistic Corpora in Six Central and Eastern European Languages. Presented at the Joint International Conference of the ACH-ALLC '97, June 1997.
[3] Greg Hankins and Michael K. Johnson (1997): Introduction to the Linux Documentation Project; in: R. Kiesling (Ed.), Linux, The Complete Reference, Linux Systems Labs, 1998.
[4] Robert Kiesling, Ed. (1998): Linux, The Complete Reference, 6th Edition, Linux Systems Labs. ISBN 1-57176-199-3
[5] Linux User Community win the 1997 Product of the Year Award for Best Technical Support, InfoWorld. http://www.infoworld.com/cgi-bin/displayTC.pl?/97poy.supp.htm
[6] Organization for the Advancement of Structured Information Standards, http://www.oasis-open.org/
[7] LUGOS: Prevodi HOWTO-jev, http://www.lugos.si/delo/slo/HOWTO-sl/
[8] Eric S. Raymond (1997): The Cathedral and the Bazaar, http://www.tuxedo.org/~esr/writings/cathedral-bazaar/
[9] SGMLtools Homepage, http://www.sgmltools.org/
[10] Richard Stallman (1994): Why Software Should Not Have Owners, http://www.fsf.org/philosophy/why-free.html
[11] Richard Stallman (1997): Free Software and Free Manuals, http://www.fsf.org/philosophy/free-doc.html
[12] The Linux Documentation Project Homepage, http://sunsite.unc.edu/LDP/
[13] Špela Vintar (1998): Programi s pomnilnikom prevodov s stališča morebitnega uporabnika. In: Jezikovne tehnologije za slovenski jezik, Mednarodna multi-konferenca Informacijska družba - IS'98 / International Multi-conference Information Society - IS'98, Ljubljana, Slovenia, 8 October 1998. Ljubljana: Institut Jožef Stefan, 1998, pp. 87-91.

Created 1998-11-23 by P. Peterlin
Last update $Date: 98/11/25 14:43:54 $ by $Author: peterlin $