A frightening tale of imperfect data, a contrived metric, and undead scholarship.

Read on, if you dare, and you'll find out what zombies haunt American sociology! 🧟‍♀️🧟‍♂️🎓

I couldn't resist the Halloween theme for a quick writeup of my recent foray into bibliometric research. I wanted to figure out whether I could identify "undead" sociological scholarship—work that, after being dead for a while, found an unexpected second lease on life. I understand "dead" in the scientometric sense of not being cited.

How to find such undead scholarship? With a bibliometric approach, it's possible to cast a wide net. I decided to first focus on work that appeared in two major journals, the American Journal of Sociology and the American Sociological Review. Both journals have a long publication history (starting in 1895 and 1936, respectively), so they've had plenty of time to breed a few zombies.

Here's how I created my data set:

  1. I collected the digital object identifier, or DOI (e.g., 10.1007/s12108-015-9254-0), of every article that appeared in the two journals from their first issues through 1990, figuring that work published later hasn't been around long enough to attain zombie status. This is a simple matter of making a few API calls to Crossref. Crossref returns not only DOIs but also a range of other useful information, including a citation count.
  2. I then queried Lens for additional data on all AJS and ASR articles. What I was most interested in was the list of citing articles for each DOI I queried. Without knowing which articles are or aren't being cited, how will I find the zombies?
  3. Because Lens does not supply full bibliographic information for each citing article but only a Lens ID (e.g., 016-637-274-198-210), I needed to submit another batch of queries. This time I sent the Lens IDs of the citing articles and asked to get back their publication dates. That way I can determine not just whether articles are being cited, but also when.
  4. I can then merge the list of AJS and ASR articles with the list of citing articles and their publication years. For each article, I can then find out not just how often it was cited (a popular vanity metric), but also how long the gaps between citations are.
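
For readers who want to try step 1 themselves, here is a minimal sketch of paging through the Crossref REST API. It is not the code used for this post. The ISSNs, the `until-pub-date` cutoff, and the helper names (`build_url`, `get_json`, `collect_dois`) are assumptions made for illustration, and the Lens queries in steps 2 and 3 are left out because they require an API token.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.crossref.org/journals/{issn}/works"

# Print ISSNs for the two journals (assumed here; verify before use).
ISSNS = {"AJS": "0002-9602", "ASR": "0003-1224"}


def build_url(issn, cursor="*", until="1990", rows=1000):
    """Build one page request for a journal's works published up to `until`."""
    params = {
        "filter": f"until-pub-date:{until}",
        "select": "DOI,is-referenced-by-count",
        "rows": rows,
        "cursor": cursor,  # "*" asks Crossref to start a cursor-paged result set
    }
    return API.format(issn=issn) + "?" + urllib.parse.urlencode(params)


def get_json(url):
    """Fetch a URL and decode its JSON body (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def collect_dois(issn, fetch=get_json):
    """Page through the results and return {doi: crossref_citation_count}."""
    counts, cursor = {}, "*"
    while True:
        message = fetch(build_url(issn, cursor))["message"]
        items = message.get("items", [])
        if not items:  # an empty page means the cursor is exhausted
            break
        for item in items:
            counts[item["DOI"]] = item.get("is-referenced-by-count", 0)
        cursor = message.get("next-cursor")
        if cursor is None:
            break
    return counts
```

Calling `collect_dois(ISSNS["AJS"])` would then return a dictionary of DOIs and Crossref citation counts. The `fetch` argument is only there so the paging logic can be tried out without hitting the network.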

How do I get from citation gaps to finding zombies? This is where it gets extra scary. I concocted a metric that I call the zombie score. It is a function of the maximum gap between citations, the standard deviation of all gaps, the total "lifetime" of the article (from publication date to date of last citation), and the total number of citations. Why, you ask? Do you really want to know? Really? I told you the answer is scary. Alright, I'll tell you, but don't say I didn't warn you. I simply developed the score through trial and error, trying slightly different calculations until the score seemed to be surfacing the kinds of articles I was hoping to identify.

The horror.

Alright, you asked for it...

(Want to hunt zombies as well? You can find the data in this repository.)

In [60]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib
In [61]:
plt.rcParams['figure.figsize'] = [18, 12]
In [2]:
import pandas as pd
import seaborn as sb
In [147]:
# Note: pd.read_msgpack was removed in pandas 1.0, so loading these files needs an older pandas.
amsoc_lens = pd.read_msgpack('amsoc_lens.msgpack')
amsoc_citations = pd.read_msgpack('amsoc_citations_lens.msgpack')

After loading the two data sets (AJS and ASR articles, citing articles), I merged them so that I could calculate citation gaps for each article. This requires a little intermediary work to bridge the two, which is what the following inelegant code does.

In [148]:
citation_pairs = []

for i, row in amsoc_lens[['scholarly_citations']].dropna(how='all').iterrows():
    citation_pairs.extend([(i, cit) for cit in row['scholarly_citations']])

citations_long = pd.DataFrame(citation_pairs, columns=['doi', 'citing_article_lens_id'])
In [149]:
citations_long = citations_long.merge(amsoc_lens[['year', 
                                                  'date_published', 
                                                  'citations_crossref', 
                                                  'citations_lens']], 
                                      left_on='doi', 
                                      right_index=True)
In [150]:
citations_long = citations_long.merge(amsoc_citations, 
                                      left_on='citing_article_lens_id',
                                      right_on='lens_id', how='left').drop(columns='lens_id').set_index('doi')
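
A less inelegant route to the same long table is `Series.explode` (available since pandas 0.25). The sketch below runs on a toy stand-in for `amsoc_lens`; the DOIs and Lens IDs are made up, and it only assumes what the code above implies, namely that the frame is indexed by DOI and that `scholarly_citations` holds lists.

```python
import pandas as pd

# Toy stand-in for amsoc_lens: indexed by DOI, with a list-valued column.
articles = pd.DataFrame(
    {
        "scholarly_citations": [["lens-a", "lens-b"], None, ["lens-c"]],
        "year": [1900, 1910, 1920],
    },
    index=pd.Index(["doi-1", "doi-2", "doi-3"], name="doi"),
)

# One row per (cited DOI, citing Lens ID) pair; never-cited articles drop out.
pairs = (
    articles["scholarly_citations"]
    .dropna()
    .explode()
    .rename("citing_article_lens_id")
    .reset_index()
)

# Attach article-level columns, as in the merge on the DOI index above.
long = pairs.merge(articles[["year"]], left_on="doi", right_index=True)
print(long)
```

The result has one row per citation, just like `citations_long` before the citing articles' years are merged in.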

Now that the data sets are merged, the following calculates gaps between citations.

In [151]:
citations_long = citations_long.reset_index().sort_values(['doi', 'citing_article_year']).set_index('doi')
citations_long['prev_cit_year'] = citations_long.groupby('doi')['citing_article_year'].shift()
citations_long['prev_cit_year'] = citations_long['prev_cit_year'].fillna(citations_long['year'])
citations_long['cit_gap'] = citations_long['citing_article_year'] - citations_long['prev_cit_year']
In [152]:
max_cit_gaps = citations_long.groupby('doi')[['cit_gap']].max()\
    .rename(columns={'cit_gap': 'max_cit_gap'})
std_cit_gaps = citations_long.groupby('doi')[['cit_gap']].std(ddof=0)\
    .rename(columns={'cit_gap': 'std_cit_gap'})
last_cit = citations_long.groupby('doi')[['citing_article_year']].max()\
    .rename(columns={'citing_article_year': 'last_cit'})
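
To make the gap logic concrete, here is the same sequence of steps on a single hypothetical article, published in 1900 and cited in 1902, 1905, 2010, and 2015. The data are invented. The column names mirror the ones used above, and the named aggregation is just a compact way of writing the three group-bys.

```python
import pandas as pd

# Hypothetical article: published 1900, citations deliberately out of order.
toy = pd.DataFrame({
    "doi": ["doi-x"] * 4,
    "year": [1900] * 4,
    "citing_article_year": [2010, 1902, 2015, 1905],
})

# Sort citations chronologically, then look back one citation within each DOI.
toy = toy.sort_values(["doi", "citing_article_year"])
prev = toy.groupby("doi")["citing_article_year"].shift()

# The first citation has no predecessor, so its gap is measured from publication.
toy["prev_cit_year"] = prev.fillna(toy["year"])
toy["cit_gap"] = toy["citing_article_year"] - toy["prev_cit_year"]

summary = toy.groupby("doi").agg(
    max_cit_gap=("cit_gap", "max"),
    std_cit_gap=("cit_gap", lambda s: s.std(ddof=0)),  # population std, as above
    last_cit=("citing_article_year", "max"),
)

print(toy["cit_gap"].tolist())  # [2.0, 3.0, 105.0, 5.0]
print(summary)
```

The 105-year silence between 1905 and 2010 is exactly the kind of gap the zombie hunt is after.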

Finally, I merge the citation gap metrics back into the article-level data.

In [153]:
amsoc_lens = amsoc_lens.merge(max_cit_gaps, on='doi')\
    .merge(std_cit_gaps, on='doi')\
    .merge(last_cit, on='doi')

I calculate the difference between the number of citations reported by Lens and Crossref. This is a crude indicator of data quality. I'm relying on the Lens data, but if Lens reports fewer citations than Crossref, it's likely the Lens data have substantial problems. Citation gaps would then appear too big, resulting in spurious zombie spottings. Can't have that.

In [154]:
amsoc_lens['lens_crossref_diff'] = amsoc_lens['citations_lens'] - amsoc_lens['citations_crossref']

I will restrict the analysis to articles with at least five citations and no obvious data problems.

In [155]:
# .copy() so the column assignments below work on an independent frame, not a slice
amsoc_lens = amsoc_lens[(amsoc_lens['citations_lens'] > 4) &
                        (amsoc_lens['lens_crossref_diff'] >= 0)].copy()

Now we're ready to calculate the zombie score. 😱

In [156]:
amsoc_lens['lifetime'] = amsoc_lens['last_cit'] - amsoc_lens['year']
In [157]:
# np comes from %pylab inline above; pd.np is no longer available in recent pandas
amsoc_lens['zombie_score'] = ((amsoc_lens['max_cit_gap'] * amsoc_lens['std_cit_gap']) /
                              (amsoc_lens['lifetime'] * np.log10(amsoc_lens['citations_lens'])))
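
To get a feel for what the score rewards, here is a standalone sketch of the same formula as a plain Python function, applied to two hypothetical citation histories. The function name and the example years are invented. One simplifying assumption: it uses the number of citing years passed in as the citation count, whereas the data frame above uses the `citations_lens` column.

```python
import math
from statistics import pstdev


def zombie_score(pub_year, citing_years):
    """max gap * std of gaps / (lifetime * log10(number of citations)).

    Assumes at least two citations and a last citation after pub_year;
    otherwise the denominator is zero.
    """
    years = sorted(citing_years)
    # Gap before each citation; the first gap is measured from publication.
    gaps = [b - a for a, b in zip([pub_year] + years[:-1], years)]
    lifetime = years[-1] - pub_year
    return (max(gaps) * pstdev(gaps)) / (lifetime * math.log10(len(years)))


# Two hypothetical articles from 1900 with five citations each.
sleeper = zombie_score(1900, [1902, 1905, 2010, 2015, 2016])  # long silence
steady = zombie_score(1900, [1920, 1945, 1970, 1995, 2016])   # evenly spaced

print(sleeper > steady)  # True
```

Both articles have the same lifetime and citation count, so the difference comes entirely from the one long, uneven gap in the first history.
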
In [158]:
sb.lineplot(x=amsoc_lens['year'], 
            y=amsoc_lens['zombie_score'])
Out[158]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fcaa30f3710>

Not surprisingly, zombies were mostly born a long time ago. In fact, the most zombie-ish articles are from a time before ASR even existed, meaning that ASR zombies are probably a rare breed.

The jaggedness of the line in the early decades indicates that there are a lot of data problems with older articles. But let's just ignore that for now, shall we?

What does it mean for an article to have a low zombie score? It means it's an evergreen article, the kind that's been continually cited. Unsurprisingly, that characteristic is highly correlated with a high absolute citation count. Before we look at the zombies, let's look at their sprightly counterparts.

In [159]:
amsoc_lens.sort_values('zombie_score')\
    .head(10)[['family_names', 'title', 'journal', 'year', 'last_cit', 'citations_lens']]
Out[159]:
doi | family_names | title | journal | year | last_cit | citations_lens
10.1086/225469 | (Granovetter,) | The Strength of Weak Ties | American Journal of Sociology | 1973 | 2019.0 | 26767.0
10.1086/228311 | (Granovetter,) | Economic Action and Social Structure: The Prob... | American Journal of Sociology | 1985 | 2019.0 | 20114.0
10.2307/2091739 | (Berger, Luckmann) | The Social Construction of Reality: A Treatise... | American Sociological Review | 1967 | 2019.0 | 10656.0
10.2307/2087860 | (Faris, Parsons) | The Social System. | American Sociological Review | 1953 | 2019.0 | 6533.0
10.2307/2092623 | (Gouldner,) | The Norm of Reciprocity: A Preliminary Statement | American Sociological Review | 1960 | 2019.0 | 6929.0
10.1086/226424 | (Hannan, Freeman) | The Population Ecology of Organizations | American Journal of Sociology | 1977 | 2019.0 | 5485.0
10.2307/2094589 | (Cohen, Felson) | Social Change and Crime Rate Trends: A Routine... | American Sociological Review | 1979 | 2019.0 | 5288.0
10.1086/226550 | (Rowan,) | Institutionalized Organizations: Formal Struct... | American Journal of Sociology | 1977 | 2019.0 | 15890.0
10.2307/2095521 | (Swidler,) | Culture in Action: Symbols and Strategies | American Sociological Review | 1986 | 2019.0 | 4706.0
10.1086/226464 | (Zald,) | Resource Mobilization and Social Movements: A ... | American Journal of Sociology | 1977 | 2019.0 | 3807.0

Most of these highly-cited "evergreen" articles are from the 1970s and 1980s. (Remember that I didn't collect any data past 1990.) Interestingly, rival sociologists Parsons and Gouldner are two exceptions, breaking into these high ranks with considerably older articles.

But enough of these pedestrian pieces of scholarship. Let's finally look at the zombies.

Doomsday drumroll please...

In [160]:
amsoc_lens.sort_values('zombie_score', ascending=False)\
    .head(20)[['family_names', 'title', 'journal', 'year', 'citations_lens', 'last_cit']]
Out[160]:
doi | family_names | title | journal | year | citations_lens | last_cit
10.1086/210723 | (Ratzel,) | Studies in Political Areas. II. Intellectual, ... | American Journal of Sociology | 1898 | 5.0 | 2019.0
10.1086/211171 | (Karapetoff,) | On Life-Satisfaction | American Journal of Sociology | 1903 | 5.0 | 2014.0
10.1086/210858 | (Ellwood,) | Prolegomena to Social Psychology. II. The Fund... | American Journal of Sociology | 1899 | 5.0 | 2014.0
10.1086/210868 | (Ellwood,) | Prolegomena to Social Psychology. III. The Nat... | American Journal of Sociology | 1899 | 5.0 | 2014.0
10.1086/210901 | (Mead,) | The Psychology of Socialism.Gustave Le Bon | American Journal of Sociology | 1899 | 5.0 | 2018.0
10.1086/211909 | (Vincent,) | The Rivalry of Social Groups | American Journal of Sociology | 1911 | 5.0 | 2014.0
10.1086/210761 | (Durkheim,) | Minor Editorials | American Journal of Sociology | 1898 | 5.0 | 2016.0
10.1086/211634 | (Willcox,) | Discussion of the Paper by Alfred H. Stone, "I... | American Journal of Sociology | 1908 | 5.0 | 2017.0
10.1086/210999 | (Auten,) | Some Phases of the Sweating System in the Garm... | American Journal of Sociology | 1901 | 5.0 | 2013.0
10.1086/212341 | (Dealey,) | The Eugenic-Euthenic Relation in Child Welfare | American Journal of Sociology | 1914 | 5.0 | 2017.0
10.1086/213672 | (Chapin,) | The Statistical Definition of a Societal Variable | American Journal of Sociology | 1924 | 5.0 | 2015.0
10.1086/210935 | (Small,) | The Scope of Sociology. III. The Problems of S... | American Journal of Sociology | 1900 | 5.0 | 2019.0
10.1086/211155 | (Small,) | What Is a Sociologist? | American Journal of Sociology | 1903 | 6.0 | 2015.0
10.1086/211930 | (Riley,) | Sociology and Social Surveys | American Journal of Sociology | 1911 | 5.0 | 2017.0
10.1086/210624 | (Willcox,) | Methods of Determining the Economic Productivi... | American Journal of Sociology | 1896 | 6.0 | 2015.0
10.1086/210950 | (Lowell, Henderson, M'Gonnigle, Barbour, Sanbo... | Public Outdoor Relief | American Journal of Sociology | 1900 | 5.0 | 2015.0
10.1086/211991 | (Breckinridge,) | Half a Man. The Status of the Negro in New Yor... | American Journal of Sociology | 1911 | 5.0 | 2015.0
10.1086/211692 | (MacLean,) | Life in the Pennsylvania Coal Fields with Part... | American Journal of Sociology | 1908 | 5.0 | 2014.0
10.1086/212339 | (Hayes,) | Effects of Geographic Conditions Upon Social R... | American Journal of Sociology | 1914 | 5.0 | 2016.0
10.1086/210880 | (MacLean,) | Factory Legislation for Women in Canada | American Journal of Sociology | 1899 | 5.0 | 2009.0

There we have them, the 20 most undead articles of American sociology. They were, with one exception, all published before World War I, and they all appeared in AJS. They have only been cited a handful of times, but they've all been cited in the past decade.

I plan to look more into who necrobumped these articles and why, but a glance over the list already suggests a few possible reasons. G.H. Mead and Albion Small were major figures in the (first) Chicago School, and it seems like their work in AJS has been rediscovered by scholars taking an interest in the Chicago School's intellectual history. Annie Marion MacLean is on the list (twice) for a similar reason, except that she wasn't a major figure in the Chicago School, but one of the neglected women toiling away at ethnographic studies. Nellie Mason Auten is another such example.

The note by Durkheim, published under the heading "Minor Editorials," is the only thing the eminent French sociologist ever published in AJS, as far as I can see.

But many of the other names are more puzzling. Can we expect an Ellwood revival, or a Hayes revival, or a Chapin revival? Unlikely. Ratzel coined the phrase Lebensraum beloved by the Nazis; that's a zombie to slay. I'm glad I learned about Vladimir Karapetoff and his mathematical ideas about human satisfaction though, and I'll be sure to inspect more zombies up close.

What about zombies born after ASR came into being?

In [163]:
amsoc_lens[amsoc_lens['year'] >= 1936].sort_values('zombie_score', ascending=False)\
    .head(20)[['family_names', 'title', 'journal', 'year', 'citations_lens', 'last_cit']]
Out[163]:
doi | family_names | title | journal | year | citations_lens | last_cit
10.1086/218312 | (Horney,) | What Is a Neurosis? | American Journal of Sociology | 1939 | 5.0 | 2018.0
10.2307/2084504 | (Merton, Gini) | Prime Linee di Patologia Economica. | American Sociological Review | 1936 | 5.0 | 2016.0
10.2307/2086361 | (Eggan, Osias) | The Filipino Way of Life: The Pluralized Philo... | American Sociological Review | 1941 | 5.0 | 2015.0
10.2307/2084959 | (Merton, Haller, Billington, Bready, Halbwachs) | The Rise of Puritanism; Or, the Way to the New... | American Sociological Review | 1939 | 5.0 | 2014.0
10.2307/2085078 | (Ginzberg,) | The Occupational Adjustment of 1000 Selectees | American Sociological Review | 1943 | 5.0 | 2016.0
10.2307/2085001 | (Rumney, Maier) | The Science of Society: An Introduction to Soc... | American Sociological Review | 1939 | 5.0 | 2019.0
10.2307/2086240 | (Guillaume,) | La Psychologie Animale. | American Sociological Review | 1941 | 5.0 | 2015.0
10.1086/219456 | (Knight,) | Human Nature and World Democracy | American Journal of Sociology | 1944 | 5.0 | 2014.0
10.1086/219573 | (Brown,) | Missions and Cultural Diffusion | American Journal of Sociology | 1944 | 5.0 | 2019.0
10.2307/2088312 | (Taylor,) | Geography in the Twentieth Century: A Study of... | American Sociological Review | 1951 | 5.0 | 2018.0
10.1086/217297 | (Miller,) | The Relation of Reading Characteristics to Soc... | American Journal of Sociology | 1936 | 5.0 | 2011.0
10.2307/2086968 | (Odum,) | The Way of the South: Toward the Regional Bala... | American Sociological Review | 1947 | 5.0 | 2012.0
10.2307/2085689 | (Miller,) | Effect of the War Declaration on the National ... | American Sociological Review | 1942 | 6.0 | 2010.0
10.1086/218920 | (Durant,) | Morale and Its Measurement | American Journal of Sociology | 1941 | 6.0 | 2016.0
10.2307/2084073 | (Beynon,) | The Southern White Laborer Migrates to Michigan | American Sociological Review | 1938 | 5.0 | 2013.0
10.1086/221204 | (Goodman,) | II. The Validation of Prediction | American Journal of Sociology | 1953 | 5.0 | 2010.0
10.2307/2085548 | (Zeleny,) | Measurement of Sociation | American Sociological Review | 1941 | 5.0 | 2014.0
10.1086/222363 | (Hughes,) | Brücke und Tür: Essays des Philosophen zur Ges... | American Journal of Sociology | 1958 | 5.0 | 2018.0
10.2307/2085850 | (von Hentig,) | The First Generation and a Half: Notes on the ... | American Sociological Review | 1945 | 5.0 | 2018.0
10.2307/2086021 | (Steiner,) | Population Trends in Japan | American Sociological Review | 1944 | 6.0 | 2017.0

We see a fairly even split between the two journals. Some of the authors here are clearly superstars who are being rediscovered, as before with the Chicago School scholars. Karen Horney! Robert Merton! Everett Hughes! But others are again more puzzling...

It's also noticeable that all zombies have 5 or 6 citations—right at the cutoff. That suggests that my metric needs adjusting—but the thought of that is just too scary for me right now...

Happy Halloween! 🎃