global politics, relationally

Stop the Madness: Reviewer Demands and Quantitative Models


The madness has to stop.  Things are getting out of hand for those who do statistical research and try to publish their results.  Reviewers of quantitative political science articles have gone off the deep end with their requests for model revisions.  A recent Journal of Peace Research article had a table with 37 different models for robustness checks.  It is not unheard of to see other articles with hundreds of different models in an appendix or footnotes.  New variables, robustness checks, new methods: the requests never seem to end and only escalate.

What is the source of the problem?  For one, reviewers may worry that the scholar is fudging the modeling to make the results meet expectations, or is using inappropriate methods.  Modeling to meet expectations is a horrible practice and efforts of this sort always are discovered.  We know of many debates where this has happened, but the benefit of the doubt should rest with the scholar in the first place.  Inappropriate methods are another key concern, but new methods are not always better or even useful.  Regardless, the findings should be central rather than the model.

The reasonableness of the request must be judged against the payoff for the change.  Is it worth it to request dozens of new specifications based on your own inclinations?  Are you making comments and suggestions because you want to be able to say something about a model at the review stage?  Is it really necessary?

Will someone think of the authors?  Demands for new specifications and variables often require vastly expanding a model, learning new methods, or restructuring an entire dataset.  Since careers often hang in the balance, these requests are usually granted, but at what price?

At the review stage, we need to be better at utility calculations.  Will our request make the article better?  Do we have some sort of special knowledge about the model or question that would suggest adding a new variable will greatly enhance the outcome?  Having gone through this process many times, I rarely ask for new models or specifications in reviews because I know the burden this demands of the author.  Basically, the question is simple, is the article good enough or not?  No amount of specifications will change the outcome of this question.

The real issue is civility.  Are we being civil to our colleagues or just presenting them with a list of tasks in order to get over our personal bar?  Looking at recent publications and reviewing many articles, I believe we have gone too far in our demands during the article revision stage.  Are we asking for changes because they will make the article better or because we are creating unreasonable publication standards?

PS – Reflections (9/9/2014)

Seems this post struck a nerve for some.  That was the intention since I gather many who work in the quantitative IR field feel this way while the statistics wing of politics argues for more robustness and standards.  My main point was to think about the nature of the article and the burden on the author when making such requests during the reviewing stage.  For me, the issue really struck home after a coauthor spent weeks modifying a model to satisfy a reviewer with no substantive changes made despite mountains of work.

My JPR example was perhaps ill chosen.  I picked the article because it started a conversation on Facebook that made me aware of the collective angst regarding these issues.

I did want to add that at no point did I ever blame the editors of journals.  They are only working with what they are given by the reviewers and have to uphold standards.  To me, this issue is more about the nature of demands by reviewers than some structural problem.  As always, be civil, unless it's really funny not to be…


Author: Brandon Valeriano

Brandon Valeriano is the Donald Bren Chair of Armed Politics at the Marine Corps University.

  • A recent reviewer suggested I have a statistician look at my results. Okay, sure. What now? Am I supposed to add a footnote stating “These results have been reviewed by a bona fide statistician, who said ‘Yep, looks good,’ so you may now rest easy”?

  • Amelia Hoover Green

    I’m torn. On the one hand, I actually don’t believe that massaging the models to fit expectations is “always” found out — so I like to see manuscripts with multiple different specifications, many robustness checks, etc. On the other hand, I suspect that modeling (particularly when we’re talking about work on violence) is a secondary problem. The real issue is always going to be measurement. I can’t tell you how many manuscripts I’ve sent back because I wasn’t convinced by the authors’ hand-waving about how the particular convenience dataset they’re using is “probably” representative.

    • Polonius

      Oh my, yes. Also, some proxies are so far from the concept they are supposed to represent, they couldn’t see it with a pair of binoculars. Hand-waving abounds.

  • Pingback: Referees, Robustness & Valid Inference in Poli-Sci | Will Opines

  • Mike

    I couldn’t disagree more with this post. Model results that are fudged or otherwise highly fragile are a huge problem in the social sciences. (The claim that such practices “always are discovered” is bizarre. Is that a typo?) The expectation that authors provide a range of reasonable models is the best way to guard against this. Results are much more believable if you can show they don’t depend on specific and typically arbitrary assumptions.

    The post’s main response is that this is too difficult. First, some variations are and some aren’t. Granted, if a check requires substantial data collection and is not all that compelling, you shouldn’t be expected to do it. In my experience, editors often acknowledge this. But if it’s a matter of adding some available variable or tweaking the specification, then what’s the problem? Second, the goal of the discipline should be publishing credible and informative research, not racking up publications with a minimum degree of effort from authors.

    There are several other problems. The statement that “the findings should be central rather than the model” is incoherent since the finding derives from a model. Moreover, the whole point of looking at model variations is to confirm that findings don’t depend too strongly on the model choice. Even worse is this passage: “Basically, the question is simple, is the article good enough or not? No amount of specifications will change the outcome of this question.” Huh? Of course it can. If the main finding of interest doesn’t hold up to some obvious control or disappears for every reasonable specification but the one the author chose, shouldn’t that change our opinion about the article?

    • Brandon Valeriano

      Of course everyone should run obvious controls and specification tests. What I am talking about is extreme versions of this: hundreds of new models, adding in your own personal variable, suggesting new methods that make the results harder to interpret.

      • Mike

        Well, this is in dramatic conflict with your post. What it sounds like you’re saying now is that looking at a range of models is good, but the alterations need to be compelling. This is right, but it requires an attitude of being open to alternatives and judging when they are plausible. It also means accepting that some situations should require more checks than others. It does not mean that we should stop suggesting robustness checks because they’re annoying to do.

    • Rather than burden the author with producing replication tables and the reader with paging through them, journals should just encourage (or require) that replication materials—including data and do files—be submitted with the publication draft. If you doubt the results, run the models yourself and write a note to the editor if something weird happens. Attaching your raw data and do file sends a much stronger signal that results are legitimate than “here are ten more tables I put together,” and I’d believe an author who attached his do file more readily than one who could have made up twenty-six more sets of model results.

      • Mike

        That’s a false dichotomy. Authors should have to do both. Moreover, providing the data doesn’t help with reviewing the paper, unless you unreasonably expect reviewers to be running models. Doing lots of robustness checks does not require that much time, nor does it need to take up much article space since they can be put in an online appendix.

        The bigger point here is that we should accept that the models we specify are highly uncertain. Moreover, results can strongly depend on which model we choose. It’s valuable to show the reader when this is not the case.
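Mike's core argument — that a finding should survive obvious controls, and that a coefficient which only appears in one specification deserves suspicion — can be illustrated with a minimal numerical sketch. This is not from the post; it is an invented example (pure NumPy, hypothetical variable names) of the simplest robustness check: fit a model with and without a plausible confounder and compare the coefficient of interest.

```python
import numpy as np

# Simulated data in which the apparent effect of x on y is partly
# driven by an omitted confounder z. The data-generating process and
# variable names are invented purely for illustration.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)              # confounder
x = 0.8 * z + rng.normal(size=n)    # x is correlated with z
y = 0.5 * x + 1.0 * z + rng.normal(size=n)  # true effect of x is 0.5

def ols_coef(y, *regressors):
    """Return the OLS coefficient on the first regressor,
    with an intercept included."""
    X = np.column_stack([np.ones_like(y)] + list(regressors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

b_base = ols_coef(y, x)        # baseline specification: y ~ x (omits z)
b_check = ols_coef(y, x, z)    # robustness check: y ~ x + z

print(f"x coefficient, baseline:     {b_base:.2f}")
print(f"x coefficient, with control: {b_check:.2f}")
```

With the confounder omitted, the coefficient on x absorbs part of z's effect and is biased upward; adding the control pulls it back toward the true value. A reviewer asking for this one extra column is the reasonable end of the spectrum both sides of the thread seem to agree on.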