- Title
TEI Analytics: converting documents into a TEI format for cross-collection text analysis.
- Authors
Pytlik Zillig, Brian L.
- Abstract
For the purposes of large-scale analysis of XML/SGML files, converting humanities texts into a common form of markup represents a technical challenge. The MONK (Metadata Offer New Knowledge) Project has developed both a common format, TEI Analytics (a TEI subset designed to facilitate interoperability of text archives) and a command-line tool, Abbot, that performs the conversion. Abbot relies upon a new technique, schema harvesting, developed by the author to convert text documents into TEI-A. This article has two aims: first, to describe the TEI-A format itself and, second, to outline the methods used to convert files. More generally, it is hoped that the techniques described will lead to greater interoperability of text documents for text analysis in a wider context.
- Subjects
TEXT Encoding Initiative (Document type definition); FILE conversion (Computer science); XML (Extensible Markup Language); SGML (Document markup language); DOCUMENT markup languages; INTERNETWORKING; METADATA; HUMANITIES
- Publication
Literary & Linguistic Computing, 2009, Vol 24, Issue 2, p187
- ISSN
0268-1145
- Publication type
Article
- DOI
10.1093/llc/fqp005