- Title
Automatic Encoding and Language Detection in the GSDL -- Part II.
- Authors
Pinkas, Otakar
- Abstract
Processing of the older MS Word format in the GSDL depends on the correct encoding of the temporary HTML file. The "windows-scripting" option fails, but the wvware.exe program succeeds. The current .docx format requires the user to change a setting in the Word configuration: the temporary HTML file should be encoded in UTF-8 instead of Windows-1250, which is the preset in the Czech environment. The automatic conversion from ISO-8859-2 to Windows-1250 for HTML pages is wrong, whereas the conversion from ISO-8859-1 to Windows-1252 is valid. Automatic language detection is sometimes incorrect because the model of a similar language predominates, and it needs further investigation.
- Subjects
DIGITAL library software; HTML (Document markup language); DOCUMENT markup languages
- Publication
Journal of Systems Integration (1804-2724), 2015, Vol 6, Issue 4, p45
- ISSN
1804-2724
- Publication type
Article
- DOI
10.20470/jsi.v6i4.238
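
The abstract's claim about the two encoding conversions can be checked with a short sketch. It is not taken from the article or from the GSDL code: the helper names `differing_bytes` and `relabel` are hypothetical, the Czech sample sentence is an illustration, and the comparison is limited to bytes 0xA0-0xFF on the assumption that the 0x80-0x9F range (control codes in the ISO-8859 family) does not occur in ordinary HTML text.

```python
def differing_bytes(enc_a: str, enc_b: str) -> list[int]:
    """Return byte values in 0xA0-0xFF that decode differently under the two encodings."""
    diffs = []
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        # errors="replace" keeps the loop safe if a byte is undefined in either codec.
        if raw.decode(enc_a, errors="replace") != raw.decode(enc_b, errors="replace"):
            diffs.append(value)
    return diffs


def relabel(text: str, actual: str, assumed: str) -> str:
    """Encode text in `actual`, then decode the same bytes as if they were `assumed`."""
    return text.encode(actual).decode(assumed, errors="replace")


if __name__ == "__main__":
    sample = "Příliš žluťoučký kůň"  # illustrative Czech text, not from the article

    # ISO-8859-2 bytes read as Windows-1250: some letters come out wrong.
    print(relabel(sample, "iso-8859-2", "cp1250"))
    print([hex(b) for b in differing_bytes("iso-8859-2", "cp1250")])

    # ISO-8859-1 bytes read as Windows-1252: no differences in the printable range.
    print(differing_bytes("iso-8859-1", "cp1252"))
```

Under these assumptions, the printable range of ISO-8859-1 maps onto Windows-1252 unchanged, while several Czech letters such as š, ž and ť sit at different byte positions in ISO-8859-2 and Windows-1250, so simply relabeling one as the other corrupts them.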