Electrical and Computer Engineering


Philip B. Lundrigan

Steve Richardson

Karl F. Warnick


censorship, Chinese, homophone, keyword filtering


As the scope of Chinese-language censorship expands, individuals will seek ways to bypass it. One of the most prevalent censorship techniques is automated keyword filtering. This research focuses on building a command-line tool that can bypass automated keyword filters for both traditional and simplified Chinese characters using a two-part approach. The first part detects sensitive words in user-supplied text by using phrase-matching techniques to identify character strings that have been censored in the past. The second part generates possible obfuscated homonym alternatives. To determine what is deemed "sensitive," the tool relies on a list of banned and potentially banned phrases compiled from previous research. The alternate characters used to generate the obfuscated text are drawn from a standardized list of the most commonly used Chinese characters. Further research is needed to automate updates to the list of sensitive phrases and to detect phrases that are similar, but not identical, to those that have been censored in the past.
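
The two-part approach described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the function names `find_sensitive` and `alternatives`, the exact-substring matching strategy, and the tiny phrase list and homophone table are all assumptions made for illustration, and the sample phrase is a neutral placeholder rather than an entry from any real list of censored terms.

```python
from itertools import product

# Hypothetical toy data for illustration only; the real tool draws on a
# compiled list of banned phrases and a standardized list of common characters.
SENSITIVE_PHRASES = {"天气"}
HOMOPHONES = {"天": ["添"], "气": ["汽", "器"]}


def find_sensitive(text, phrases):
    """Part 1: return (start_index, phrase) for each exact occurrence."""
    hits = []
    for phrase in phrases:
        start = text.find(phrase)
        while start != -1:
            hits.append((start, phrase))
            start = text.find(phrase, start + 1)
    return sorted(hits)


def alternatives(phrase, homophones):
    """Part 2: list rewrites in which at least one character is swapped."""
    # Each position may keep its character or take any listed same-sounding one.
    options = [[ch] + homophones.get(ch, []) for ch in phrase]
    candidates = ("".join(combo) for combo in product(*options))
    return [c for c in candidates if c != phrase]


if __name__ == "__main__":
    text = "今天天气很好"
    for start, phrase in find_sensitive(text, SENSITIVE_PHRASES):
        print(start, phrase, alternatives(phrase, HOMOPHONES))
```

Exact substring matching only catches phrases that appear verbatim in the list, which is consistent with the limitation noted in the abstract: phrases that are similar, but not identical, to listed ones would need a fuzzier matching step.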