Journal of Undergraduate Research


mergers, acquisitions, document vector space, SEC


Marriott School of Management




This project applies textual analysis to the SEC filings associated with mergers and acquisitions. We index, clean, and match documents in the SEC EDGAR database to the CRSP and Compustat databases in order to construct multiple instruments for the “specificity” and uniqueness of a filing. We then scrape each document and use the extracted data to build these instruments. One specificity instrument uses parts of the Stanford Natural Language Processing library and the NLTK and GenSim libraries in Python to convert each filing into a vector in a vector space. This allows us to compute the cosine distance between any document and the “mean” document vector in that space (à la Hoberg and Hanley, 2010 RFS). We then compute the daily difference between the returns of firms in the third and first terciles of our instruments over 20 years of data (holding each firm in the portfolio for 36 months). We find that the returns from our instrument deliver a strongly significant abnormal return relative to the Fama-French 3-factor model.
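
The cosine-distance step can be illustrated with a minimal sketch. It is not the pipeline used in the project: it assumes raw whitespace-token counts in place of the Stanford NLP, NLTK, and GenSim preprocessing, and the function names and toy filings are hypothetical.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Map a document to raw term counts over a fixed, ordered vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: treat an empty document as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def distances_from_mean(texts):
    """Cosine distance of each document from the mean document vector."""
    vocabulary = sorted({tok for t in texts for tok in t.lower().split()})
    vectors = [term_vector(t, vocabulary) for t in texts]
    n = len(vectors)
    # The "mean" document is the element-wise average of all vectors.
    mean = [sum(column) / n for column in zip(*vectors)]
    return [cosine_distance(v, mean) for v in vectors]


# Hypothetical toy filings; real filings would be cleaned EDGAR text.
filings = [
    "merger agreement cash",
    "merger agreement cash",
    "patent litigation spinoff",
]
distances = distances_from_mean(filings)
```

In this toy corpus the third filing shares no terms with the other two, so it lies farthest from the mean vector; under the sketch's assumptions, a larger distance corresponds to a more unusual filing.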