data compression, database, semi-structured text
The structure of this paper is as follows. We begin by identifying some characteristics of semi-structured text that have special relevance to data compression. We then give a brief account of a particular large textual database, and describe a compression scheme that exploits its structure. In addition to providing compression, the system gives some insight into the structure of the database. Finally, we show how the hierarchical grammar can be generalized, first manually and then automatically, to yield further improvements in compression performance.
Original Publication Citation
Nevill-Manning, C. G., Witten, I. H., Olsen, D. R.: "Compressing semi-structured text using hierarchical phrase identification", Proceedings of the Data Compression Conference, IEEE Press, Los Alamitos, CA (1996).
BYU ScholarsArchive Citation
Olsen, Dan R. Jr.; Nevill-Manning, Craig G.; and Witten, Ian H., "Compressing Semi-structured Text Using Hierarchical Phrase Identification" (1996). Faculty Publications. 1294.
Physical and Mathematical Sciences
© 1996 Institute of Electrical and Electronics Engineers
Copyright Use Information