The Hoberg and Phillips firm-scope database, along with its industry mappings, is built from the text that public firms use to
describe their businesses and product offerings in their annual 10-K
disclosures. Scope is identified at the individual firm level using doc2vec and k-means spatial models, which reduce the
dimensionality of the product descriptions and group their content into buckets based on common product discussions. The keywords
that best define each industry are then used to map firms to the industries they serve. Scope is computed as the
count of industries a given firm serves in a given year.
The scope database has two parts, each available as a separate download. The first is a firm-year
panel database containing information about each firm’s scope. The second is a higher-dimensional
mapping database that indicates the specific doc2vec industries each firm likely operates in within each year. This second part is a firm-industry-year panel database
with textual scores indicating the strength of the link between each firm and its mapped industries.
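The relationship between the two parts can be illustrated with a minimal sketch that collapses firm-industry-year rows into a firm-year scope count. This is only an illustration under stated assumptions: the field names (`firm_id`, `year`, `industry_id`, `score`), the sample values, and the optional `min_score` cutoff are hypothetical and are not taken from the actual files. The readme documents the real layout, and the published scope variable may not equal this simple count.

```python
from collections import defaultdict

# Hypothetical firm-industry-year rows: (firm_id, year, industry_id, score).
# These values are invented for illustration; they are not real data.
rows = [
    ("1001", 2019, 12, 0.81),
    ("1001", 2019, 47, 0.35),
    ("1001", 2020, 12, 0.79),
    ("2002", 2019, 47, 0.66),
    ("2002", 2019, 47, 0.66),  # duplicate row, counted only once
]


def firm_scope(rows, min_score=0.0):
    """Count distinct industries per (firm, year) among rows with score >= min_score.

    min_score is an assumed, optional cutoff; it is not part of the source data.
    """
    industries = defaultdict(set)
    for firm_id, year, industry_id, score in rows:
        if score >= min_score:
            industries[(firm_id, year)].add(industry_id)
    return {key: len(ids) for key, ids in industries.items()}


scope = firm_scope(rows)
for (firm_id, year), count in sorted(scope.items()):
    print(firm_id, year, count)
```

Using a set per firm-year keeps the count robust to repeated rows. Raising `min_score` would keep only the more strongly linked industries, which is one possible way to use the textual scores.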
Welcome to the Hoberg-Phillips Firm-Scope Data Library
<< NEW: Data coverage now 1989 to 2021! >>
Data provided by Gerard Hoberg (University of Southern California) and Gordon Phillips (Dartmouth College)
* Please cite the following paper when using this data. Details regarding the creation and use of this data are documented in the paper.
Firm Scope Data (firm scope for each firm in each year):
** This is the primary database that addresses the overwhelming majority of research needs.
* The extent of firm scope is identified at the individual firm level using doc2vec and k-means spatial modeling.
** Please review all details in the readme file before using the database.
Firm-Industry Assignments in each Year (with industry labels):
* The mappings from each firm to the industries it likely serves (along with industry labels) are identified at the individual firm level using doc2vec and k-means spatial modeling.
** Please review all details in the readme file before using the database.