XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
SUMMARY

This is the complete database of firm-year vertical integration scores used in the primary reference paper listed below. The data is computed using the same triple matrix product used to compute pairwise vertical relatedness scores. Because vertical integration is a property of a single firm rather than a firm pair, the observations in this database are taken from the diagonal entries of the resulting triple product. The result is one measure of vertical integration for each firm in each year.

The vertical integration score (vertinteg in the database) indicates the potential of the given firm's products to be vertically related to the other products sold by the same firm. Intuitively, if this score is high, we conclude that the firm is vertically integrated.

****** FYI on most recent 2022 update: In this update, in addition to extending the database forward to fiscal years ending in 2021, we improved the linking to Compustat gvkeys, resulting in 1% more observations in each year relative to older versions. We also used better parsing technology to improve the quality of the Item 1 text extracted from some 10-Ks (we thank Christopher Ball at metaHeuristica.com). We tested these improvements using standard tests from HP2016, referenced below, and find a modest improvement in signal power, indicating that this version is improved relative to prior versions. ******

NOTE: Please read the technical descriptions below and the data and methods section of the primary reference paper below before using the data.
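
As a rough illustration of the diagonal extraction described in the summary above, the sketch below computes a triple product of the form B * V * B^T and reads off its diagonal. The matrices B (firms by product categories) and V (category-to-category vertical relatedness) are toy placeholders invented for this example. They are assumptions and not the matrices used in the primary reference paper, which should be consulted for the actual construction.

```python
# Toy sketch only: B and V below are hypothetical placeholders, not the
# matrices used in the primary reference paper.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def diagonal_of_triple_product(B, V):
    """Return the diagonal of B * V * B^T without building the full product.

    Entry i equals sum over j, k of B[i][j] * V[j][k] * B[i][k].
    """
    return [sum(row[j] * V[j][k] * row[k]
                for j in range(len(row))
                for k in range(len(row)))
            for row in B]


# Assumed toy inputs: 2 firms, 3 product categories.
# Firm 0 sells only category 0; firm 1 sells categories 0 and 1.
B = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0]]

# V[j][k]: hypothetical degree to which category j supplies category k.
V = [[0.0, 0.8, 0.0],
     [0.0, 0.0, 0.3],
     [0.0, 0.0, 0.0]]

# Full triple product: off-diagonal entries are pairwise (firm-pair) scores.
full = matmul(matmul(B, V), transpose(B))

# Diagonal entries: one score per firm, analogous to vertinteg.
firm_scores = diagonal_of_triple_product(B, V)
```

In this toy setup the single-category firm gets a score of zero, while the firm spanning an upstream and a downstream category gets a positive score, which matches the intuition that a high score indicates vertical integration.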
**************************************************************************************************************
**************************************************************************************************************
********************************************** Citations *****************************************************
********************************************** Citations *****************************************************
********************************************** Citations *****************************************************
**************************************************************************************************************
**************************************************************************************************************

This data is the result of a large research project initiated in early 2012 by Laurent Fresard, Gerard Hoberg and Gordon Phillips. The intent of the project is to better understand the role of vertical relatedness across product markets and to examine the link between vertical relatedness and innovation. The data is the result of innovations described in the following paper. As such, this article should be cited when using this data for the purpose of academic research.

Primary reference:
Innovation Activities and Integration through Vertical Acquisitions
Laurent Fresard, Gerard Hoberg, and Gordon Phillips, Review of Financial Studies (accepted 2019).

***********

Auxiliary reference 1: This article is a precursor to VTNIC. In particular, it develops the original horizontal TNIC industries.

Text-Based Network Industries and Endogenous Product Differentiation
Gerard Hoberg and Gordon Phillips, Journal of Political Economy (October 2016), 124 (5) 1423-1465.

***********

Auxiliary reference 2: This article is also a precursor to VTNIC. It uses horizontal relatedness data (TNIC) to study mergers and acquisitions. This article is especially relevant because the primary VTNIC paper above also studies M&A.
The two papers are complementary, as the theories and results for vertically related product markets are distinct from those for horizontally related markets. See the papers for details.

Product Market Synergies and Competition in Mergers and Acquisitions: A Text-Based Analysis
Gerard Hoberg and Gordon Phillips, Review of Financial Studies (October 2010) 23 (10), 3773-3811.

**********************************************************************************************************************
**********************************************************************************************************************
********************************************** Technical Details *****************************************************
********************************************** Technical Details *****************************************************
********************************************** Technical Details *****************************************************
**********************************************************************************************************************
**********************************************************************************************************************

****** Please read the data and methods section of the above primary reference for complete details regarding how the variables are constructed. ********

Technical Note 1) The database is a firm-year panel and thus contains the fields: year, gvkey, vertinteg. The vertinteg variable indicates the degree of vertical integration for the given firm (gvkey) in the given year.

Technical Note 2) Each file contains a gvkey variable in addition to the score variable. Note that the data is already merged to COMPUSTAT, so you do not have to repeat this step. The data contained here is not lagged. Consider, for example, a COMPUSTAT firm with a fiscal year ending on Sept 30th, 1997 (i.e., the COMPUSTAT variable datadate is 19970930).
The corresponding observation for this firm in the VTNIC database would have the year set to 1997. This observation would be based on the product description in the 10-K report associated with this 9/30/1997 fiscal year end. More generally, the year field in the VTNIC database is always set to the first four digits of the datadate variable (the year part), so the database uses the calendar year convention for convenience. Because this data is merged by fiscal year end, the firm-year observations in this file can conveniently be viewed as time-synchronous, based on the year identified as the first four digits of the Compustat datadate variable.
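
The year convention above can be sketched as follows. This is a minimal illustration, assuming datadate is available as an 8-digit YYYYMMDD value. The function name vtnic_year, the row layout, and all sample values (including the vertinteg numbers) are hypothetical and made up for this example.

```python
def vtnic_year(datadate):
    """Return the year part (first four digits) of a YYYYMMDD datadate."""
    text = str(datadate)
    if len(text) != 8 or not text.isdigit():
        raise ValueError("expected an 8-digit YYYYMMDD datadate, got %r" % (datadate,))
    return int(text[:4])


# Hypothetical Compustat rows (made-up identifiers and dates).
compustat_rows = [
    {"gvkey": 1001, "datadate": 19970930},
    {"gvkey": 1002, "datadate": 19971231},
]

# Hypothetical rows from this database (made-up scores).
vtnic_rows = [
    {"gvkey": 1001, "year": 1997, "vertinteg": 0.012},
    {"gvkey": 1002, "year": 1997, "vertinteg": 0.004},
]

# Index the scores by (gvkey, year) for lookup.
scores = {(row["gvkey"], row["year"]): row["vertinteg"] for row in vtnic_rows}

# Attach the same-year (not lagged) score to each Compustat row.
merged = []
for row in compustat_rows:
    year = vtnic_year(row["datadate"])
    merged.append({
        "gvkey": row["gvkey"],
        "datadate": row["datadate"],
        "year": year,
        "vertinteg": scores.get((row["gvkey"], year)),  # None if no match
    })
```

Because the data is not lagged, any lag structure needed for a regression has to be applied by the user after this match.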