The Digital Humanities Research Group of Eszterházy Károly University was founded in September 2018. The main objective of the research is to elaborate a standardized methodology for publishing digital scholarly facsimile (image-based) editions of Hungarian manuscripts and early printed Hungarian books, as well as to create and publish such editions (The Foundation Charter of Tihany Abbey, Funeral Sermon and Prayer, Old Hungarian Lament of Mary, Pompéry Codex). In the case of the Corpus of Early Hungarian Printed Books and the Mikes Letters, the aim is to produce the digital scholarly edition that underlies the later image-based edition. We want to apply the current international standards in our digital scholarly editions: the resulting digital objects are XML files that conform to the recommendations of the Text Encoding Initiative. From the XML, arbitrary output formats can be generated (via XSLT): besides downloadable e-book formats (e.g. PDF, EPUB), web-based output (HTML) can also be produced (a minimal sketch of such a transformation follows the list below). The research is expected to produce the following results:
- clarifying the questions raised by the digitization of Hungarian manuscripts and early printed books, as well as by their TEI-XML encoding
- developing a standardized methodology for TEI encoding and publishing of those sources on the Internet
- publishing the developed methodology in the form of a manual
- publishing the resulting digital scholarly editions on the Internet, linking some of the texts to high-quality images of the sources (digital scholarly facsimile).
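The TEI-to-HTML step mentioned above can be illustrated with a minimal sketch. The stylesheet below is not the project's actual transformation, only an assumed, simplified example of how an HTML view might be generated from a TEI-XML file; it can be run with a standard XSLT processor such as xsltproc or Saxon.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal, illustrative XSLT 1.0 stylesheet: turns the body paragraphs of a
     TEI-XML file into a simple HTML page. Not the project's actual stylesheet. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Root template: build the HTML skeleton and process the TEI body. -->
  <xsl:template match="/">
    <html>
      <head>
        <title>
          <xsl:value-of select="//tei:titleStmt/tei:title"/>
        </title>
      </head>
      <body>
        <xsl:apply-templates select="//tei:text/tei:body"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each TEI paragraph becomes an HTML paragraph. -->
  <xsl:template match="tei:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Further stylesheets of the same kind could target the other output formats, e.g. the XHTML content documents of an EPUB or, via XSL-FO, a PDF.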
We intend to implement text-image linking at the level of individual words, which would allow us to develop word-based search engines. Thus it would be possible not only to read the transcription synoptically alongside the facsimile, but also to search the source text and display the results directly on the image.
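To illustrate what word-level text-image linking could look like in TEI, the sketch below links two words of a transcription (the opening words of the Funeral Sermon and Prayer) to rectangular zones on a page image; the image file name, the coordinates, and the identifiers are invented for the example, not taken from an actual edition.

```xml
<!-- Illustrative fragment only: word-level linking of transcription and image.
     File name, coordinates and xml:id values are invented. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Word-level facsimile linking (sketch)</title></titleStmt>
      <publicationStmt><p>Example only.</p></publicationStmt>
      <sourceDesc><p>Example only.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <facsimile>
    <surface xml:id="page1">
      <graphic url="page1.jpg"/>
      <!-- One zone per word; ulx/uly/lrx/lry give the bounding box in pixels. -->
      <zone xml:id="z1" ulx="120" uly="310" lrx="190" lry="345"/>
      <zone xml:id="z2" ulx="200" uly="310" lrx="285" lry="345"/>
    </surface>
  </facsimile>
  <text>
    <body>
      <p>
        <!-- @facs ties each word to its zone, so a hit in a word-based search
             can be highlighted on the page image. -->
        <w facs="#z1">Latiatuc</w> <w facs="#z2">feleym</w>
      </p>
    </body>
  </text>
</TEI>
```

With such an encoding, a search engine that finds a `<w>` element can follow its `facs` pointer to the corresponding zone and highlight the matching area of the image in the synoptic view.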
The central issue of the research is how manuscripts and early printed books can be encoded and published in a standard, platform-independent way. Standardization and platform independence are problems to be solved at several levels. For medieval and early modern manuscripts and printed sources, for example, even the production of the digital text may be hampered by the presence of special graphemes that have no Unicode encoding. The research attempts to solve this issue by developing a unified methodology for producing the transcription layers (paleographic, diplomatic): in part it relies on international standards (e.g. the Medieval Unicode Font Initiative), and in part it has to develop its own methods. The next level of standardization and platform independence is the markup language used for the texts and their metadata. Although TEI-XML is available here as an international recommendation, its application is not automatic: adapting the TEI encoding to our own purposes is also a central problem of the research. The next level is the problem of standard, platform-independent publishing: how to generate the resulting digital editions so that they can be used in the same way (in the same form and with the same functionality) regardless of the display device. Here our priority is also to ensure responsiveness (i.e. usability on mobile devices), as we would like to make the editions usable in public education (especially the Old Hungarian sources and the Mikes Letters).
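As an illustration of how the grapheme problem and the transcription layers could be handled within TEI, the sketch below declares a special letterform in the header and references it from a paleographic reading kept alongside a normalized one; the glyph, its mapping, and the word forms are invented for the example.

```xml
<!-- Illustrative fragment only: declaring a special glyph and keeping two
     transcription layers (paleographic and normalized) side by side. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Glyph declaration and layered reading (sketch)</title></titleStmt>
      <publicationStmt><p>Example only.</p></publicationStmt>
      <sourceDesc><p>Example only.</p></sourceDesc>
    </fileDesc>
    <encodingDesc>
      <charDecl>
        <!-- A letterform with no Unicode code point of its own; in practice the
             mapping would point to a MUFI Private Use Area character. -->
        <glyph xml:id="g-special">
          <glyphName>EXAMPLE SPECIAL LETTERFORM</glyphName>
          <mapping type="standard">m</mapping>
        </glyph>
      </charDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <p>
        <choice>
          <!-- Paleographic reading, using the declared glyph. -->
          <orig>zu<g ref="#g-special"/>tuchel</orig>
          <!-- Normalized (diplomatic) reading. -->
          <reg>zumtuchel</reg>
        </choice>
      </p>
    </body>
  </text>
</TEI>
```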
As a result of our research, a recommendation (standard) can be created that is suitable for the unified digitization of Hungarian manuscripts and early printed books and for the production and publication of their digital scholarly editions or digital scholarly facsimiles. Since our project includes several types of sources (multilingual texts, codices), it may become possible to publish these digitally or to publish their image-based facsimile versions. The Old Hungarian (manuscript) texts are available in a wide range of digitized forms on the World Wide Web. However, most of these editions remain at the reproductive level, i.e. they publish images of the source in a standard format, sometimes as a PDF containing the images. Text digitization, which can be called the representative level, is a more advanced solution than image digitization; its advantage is that it can be searched more easily by computer. A dual-layer PDF can be regarded as a combination of the reproductive and representative levels: the text, mostly produced by optical character recognition (OCR), lies behind the source image and is therefore searchable. Usability is determined by the quality of the OCR, which is usually poor for old or handwritten texts, and by the time spent on post-processing, especially manual correction. Our research would make it possible to go further and produce editions in which a word-based search engine can display the results on the image as well, in a platform-independent way. Beyond the fact that our cultural heritage can be archived at the highest possible level and in the most up-to-date manner, it is also important that our digital editions can be used in education. Our research would be a novelty in Hungary, because this kind of encoding of pre-Enlightenment sources has not yet been done in a standard, platform-independent form. Image-based editions have not yet been created in Hungary at all; there are good examples abroad, but due to the special problems involved (e.g. the specific local features of the sources), our results would be significant internationally as well.