Keywords
data compression, database, semi-structured text
Abstract
The structure of this paper is as follows. We begin by identifying some characteristics of semi-structured text that have special relevance to data compression. We then give a brief account of a particular large textual database, and describe a compression scheme that exploits its structure. In addition to providing compression, the system gives some insight into the structure of the database. Finally, we show how the hierarchical grammar can be generalized, first manually and then automatically, to yield further improvements in compression performance.
Original Publication Citation
Nevill-Manning, C. G., Witten, I. H., Olsen, D. R.: "Compressing semi-structured text using hierarchical phrase identification", Proceedings of the Data Compression Conference, IEEE Press, Los Alamitos, CA (1996).
BYU ScholarsArchive Citation
Olsen, Dan R. Jr.; Nevill-Manning, Craig G.; and Witten, Ian H., "Compressing Semi-structured Text Using Hierarchical Phrase Identification" (1996). Faculty Publications. 1294.
https://scholarsarchive.byu.edu/facpub/1294
Document Type
Presentation
Publication Date
1996-04-03
Permanent URL
http://hdl.lib.byu.edu/1877/2348
Publisher
IEEE
Language
English
College
Physical and Mathematical Sciences
Department
Computer Science
Copyright Status
© 1996 Institute of Electrical and Electronics Engineers
Copyright Use Information
http://lib.byu.edu/about/copyright/