Keywords

data compression, database, semi-structured text

Abstract

The structure of this paper is as follows. We begin by identifying some characteristics of semi-structured text that have special relevance to data compression. We then give a brief account of a particular large textual database, and describe a compression scheme that exploits its structure. In addition to providing compression, the system gives some insight into the structure of the database. Finally, we show how the hierarchical grammar can be generalized, first manually and then automatically, to yield further improvements in compression performance.

Original Publication Citation

Nevill-Manning, C. G., Witten, I. H., Olsen, D. R.: "Compressing semi-structured text using hierarchical phrase identification", Proceedings of the Data Compression Conference, IEEE Press, Los Alamitos, CA (1996).

Document Type

Presentation

Publication Date

1996-04-03

Permanent URL

http://hdl.lib.byu.edu/1877/2348

Publisher

IEEE

Language

English

College

Physical and Mathematical Sciences

Department

Computer Science
