Digital media has a strong presence in society today. Providers of digital media may obtain a content rating for a given media item by submitting that item to a content rating authority, which then issues a rating that indicates the age groups for which the item is appropriate. Content rating authorities serve publishers in many countries for different forms of media such as television, music, video games, and mobile applications. Content ratings allow consumers to quickly determine whether a given media item is suitable for their age or preferences. Literature, on the other hand, has no comparable content rating authority. A new, human-driven rating authority for literature would be impeded by the fact that literary content is published far more rapidly than other forms of digital media; humans working for such an authority simply would not be able to issue accurate content ratings for items of literature at their current rate of production. Thus, to provide fast, automated content ratings for items of literature (i.e., books), we propose a computer-driven rating system which, given the text of a book, predicts that book's content rating within each of seven categories: 1) crude humor/language; 2) drug, alcohol, and tobacco use; 3) kissing; 4) profanity; 5) nudity; 6) sex and intimacy; and 7) violence and horror. Our computer-driven system circumvents the major hindrance to any human-driven rating system mentioned above, namely the infeasible amount of time required. Our work demonstrates that the mature content of literature can be accurately predicted through the use of natural language processing and machine learning techniques.



College and Department

Physical and Mathematical Sciences; Computer Science



Date Submitted


Document Type





Keywords

machine learning, neural net, document classification, book content rating