XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
PRELIMINARIES:

Please turn on "word wrap" when reading this document. This file describes the content of the dynamic offshoring databases. Please review and reference the following papers when using this data:

Offshore Activities and Financial vs Operational Hedging, 2017, Journal of Financial Economics 125 (2), 217-244.

The Offshoring Return Premium, 2019, Management Science, 65 (6), 2445-2945.

* We thank metaHeuristica for making it feasible to generate this data, and to do so very efficiently. For inquiries regarding how to create similar data structures, please contact Christopher Ball (

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
DATA DESCRIPTION:

****** Please read the data section of the above papers for complete details regarding how the variables are constructed. ********

We note only some basic details about the data here (see paper for full detail). First, the data is indexed by Compustat gvkey, nation, and year, so it can be merged into existing databases easily. The year refers to the calendar year in which the given firm's fiscal year ends. The data is not lagged. If researchers wish to have lagged data, they must lag the data on their own. Also note that this data has been extended since the paper was written and now covers the years from 1997 to 2017.

In order to be included in this database, a firm must have a machine readable 10-K. 10-Ks were processed using the metaHeuristica software program, and we thank Christopher Ball of metaHeuristica for providing us with access.

*** There are four offshoring variables included in the database (please read the data section of the above paper for more details) ***

(1) Offshore Output (variable=OUTPUT): number of mentions of the firm selling goods to the given nation.
(2) Offshore Input (variable=INPUT): number of mentions of the firm purchasing inputs from the given nation.
(3) Offshore External Input (variable=EXIN): number of mentions of the firm purchasing inputs from the given nation when the firm does not also mention owning assets in the given nation.
(4) Offshore Internal Input (variable=ININ): number of mentions of the firm purchasing inputs from the given nation when the firm does also mention owning assets in the given nation.

* Note that INPUT does not always equal EXIN plus ININ. There is also a third category we refer to as "Indeterminate INPUT" that tabulates some input words that are not explicitly identified as either external input or internal input. This third category can be computed as INPUT minus EXIN minus ININ. Please read the data section of the above paper for more details.

********** IMPORTANT NOTE ON DATA COVERAGE **********

The main database file contains one record per firm x year x country x activity. "Activity" is one of the four offshoring activities noted above (OUTPUT, INPUT, EXIN, ININ), and this information is denoted by the column "type". The column "number" indicates how many mentions of the given activity the text analytic engine picked up for the given firm in the given year.

Note however that some firms do not have any observed offshore activities of any kind. Because the main database is a database of "actual hits", such non-offshoring firms will not be in the main database even though we do have machine readable 10-Ks for these firms. Hence, we want to make researchers aware of the second file we include in the zipped file, which is called "OffshoringDatabase_Vx_Coverage.txt". This file is a simple list of gvkey-year pairs listing the firm-years that are covered in the metaHeuristica database.
We also include a log document length variable (based on # paragraphs) in this file, which can serve as a control variable.

Regarding data coverage, the key point is that any firm-year in this coverage file was "touched" and scanned by the text analytics engine. So if a firm-year appears in the coverage file, but NOT in the main database, the conclusion is that the firm has zero hits indicating offshoring activities. These are true zeroes ("non-offshoring firms"). This distinction matters because for any firm-year that is NOT in the file "OffshoringDatabase_Vx_Coverage.txt", the offshoring variables should be set to "missing" and not to "zero". In that case, we do not have a machine readable 10-K for the given firm in that year, so the data is missing rather than zero.

To build the panel that is right for each research purpose, therefore, researchers should consider using both files in the zipped archive in tandem, which makes it possible to distinguish accurately between missing and zero.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
BASIC VS ENHANCED DATABASES:

There are two versions of the database available on the offshoring data library. Both are constructed by searching for country names and/or region names in the 10-K text and then looking at the surrounding text to determine which specific offshore activity (selling output, buying input, etc.) the mention of the given country or region relates to.

The basic version of the database ONLY covers text searches for individual country names such as "Japan" or "Canada". The enhanced version covers everything the basic version covers, and it additionally includes regions such as "Asia" or "Latin America". We provide the two separate versions as either might be relevant given the nature of the research question posed. Both the basic and enhanced databases are available from the Hoberg and Moon data library.
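
Returning to the note on data coverage above, the following is a minimal Python (pandas) sketch of one way to combine the two files so that zeroes and missing values are kept distinct. It is an illustration only and is not part of the database itself. The column names gvkey, year, nation, type and number follow the description above, but the exact headers and the delimiter of the distributed files are assumptions, so small toy tables stand in for the two files here. The names build_firm_year_panel and INDET_INPUT are hypothetical, and summing mentions across nations is just one possible aggregation.

```python
import pandas as pd

# Toy stand-in for the main database (one row per firm x year x country x activity).
# Column names are assumptions based on the description in this document.
main = pd.DataFrame({
    "gvkey":  [1001, 1001, 1001, 1001, 1002],
    "year":   [2010, 2010, 2010, 2010, 2010],
    "nation": ["CHINA", "CHINA", "CHINA", "MEXICO", "JAPAN"],
    "type":   ["INPUT", "EXIN", "ININ", "OUTPUT", "OUTPUT"],
    "number": [5, 2, 1, 3, 4],
})

# Toy stand-in for the coverage file: every firm-year that was scanned.
# Firm 1003 was scanned but has no hits, so it is a true zero.
coverage = pd.DataFrame({
    "gvkey": [1001, 1002, 1003],
    "year":  [2010, 2010, 2010],
})

ACTIVITIES = ["OUTPUT", "INPUT", "EXIN", "ININ"]


def build_firm_year_panel(main, coverage):
    """Return one row per covered firm-year with mention counts summed over nations."""
    # Sum mentions over nations, then spread the four activity types into columns.
    totals = (
        main.groupby(["gvkey", "year", "type"])["number"]
        .sum()
        .unstack("type", fill_value=0)
        .reindex(columns=ACTIVITIES, fill_value=0)
    )
    # Every covered firm-year gets a row; covered firm-years with no hits become zeroes.
    covered = pd.MultiIndex.from_frame(coverage[["gvkey", "year"]].drop_duplicates())
    panel = totals.reindex(covered, fill_value=0)
    panel.columns.name = None
    # "Indeterminate INPUT" as described above: INPUT minus EXIN minus ININ.
    panel["INDET_INPUT"] = panel["INPUT"] - panel["EXIN"] - panel["ININ"]
    return panel.reset_index()


panel = build_firm_year_panel(main, coverage)

# A researcher's own sample (for example, Compustat firm-years). Firm 1004 is
# not in the coverage file, so a left merge leaves its offshoring variables missing.
sample = pd.DataFrame({"gvkey": [1001, 1002, 1003, 1004], "year": [2010] * 4})
merged = sample.merge(panel, on=["gvkey", "year"], how="left")
print(merged)
```

In this sketch, firm 1003 ends up with zeroes in every offshoring column because it appears in the coverage table, while firm 1004 ends up with missing values because it does not. A firm x year x country panel would need the same treatment with an explicit list of nations, which is left out here.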
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
NOTE ON LEGACY VERSION OF THE DATA:

The original data used in the paper was generated using an older version of metaHeuristica that bundled 10-K exhibits along with the regular 10-K text. The new version of metaHeuristica (which was used to create the extended data provided on this website) only includes the 10-K text and does not include the exhibits. To obtain a copy of the legacy data, please follow this link (we note that including or not including the exhibits does not matter regarding the inferences made in the original paper):

* Note that the legacy version of the data will not be updated going forward.