A just-published study — Inside Higher Ed story here — confirms, with data, something that I think is pretty obvious to many academics already: women do more service, on average, than their male counterparts. According to the study, that relationship holds true across faculty ranks, even as the overall amount of service increases as one ascends the Assistant-Associate-Professor ladder. And it is driven, the authors conclude, by “internal” service rather than “external” service, that is, by service on committees and the like within one’s department or university, rather than by service for community groups or professional associations. Even though the results of the study are a serious indictment of academic life, it is always nice to have data to validate one’s intuitions and link one’s local experiences to broader patterns.
I am less concerned with the fine-grained distinctions among possible causes for this gender gap than others might be (the authors of the study test hypotheses about proportionality and the gender of academic unit leadership, as well as issues related to adopting an administrative career track). I want instead to take the descriptive findings as a point of departure and propose a potential solution that intervenes directly in the moment when a member of the professoriate is asked, by anyone, to engage in internal service. (I am setting aside external service, largely because the study doesn’t find a significant gap between male and female academics in the amount of external service — although it does suggest that different kinds of external service are chosen by men and by women, which is intriguing on its own.) This doesn’t just address gender differentials, but any categorical differentials in service, because what I am proposing is that we generate impersonal metrics for internal service that allow everyone to ascertain whether they and their colleagues are doing an appropriate amount of such service.
But before I do that, we need to be clear on what an impersonal metric actually is, and what it does in practice.
This is especially important because metrics for the other two points of the academic trifecta — research and teaching — have been, shall we say, controversial. Whether we’re talking about more or less sophisticated citation counts, journal impact factors, student evaluations of teaching, the U.K.’s Research Excellence Framework, etc., impersonal metrics have a bad rap in certain parts of academia, where they are regarded as tools of neoliberalization and the imposition of managerial discipline on the free exercise of thinking and writing and teaching. And I certainly don’t disagree that there are terrible uses of such metrics, and terrible metrics; as far as I’m concerned, student evaluations of teaching are the most pernicious of these, as they reduce teaching to a matter of customer satisfaction (“did you have a good time in this class?”) and confuse an immediate student reaction with an augmentation of the student’s capacity for critical thought. But citation counts are hardly better, given the well-documented gender gap in citation practices across the board. Every metric can be criticized for its biases; none is immune.
But that’s not the point of a metric, properly understood. And I would argue that most of the problems people associate with impersonal metrics in research and teaching come from a misunderstanding of what impersonal metrics are actually for — a misunderstanding that characterizes, in the first instance, those outside of the day-to-day flow of the academic life who want to use impersonal standards as a way of “optimizing the academic enterprise” or some other such piece of managerial nonsense. By “those outside of the day-to-day flow” I include those elements of the contemporary university that don’t teach or do any research, but just “administrate.” Not that all academic administrators inevitably become such managers, but it seems like a lot of them do, at least in part because they have lost touch with the actual point of the whole exercise: to generate knowledge and to cultivate thinkers. Removed one or several steps from that “life of the mind,” they can find the temptation to rely on impersonal metrics as managerial tools overwhelming, especially when “external stakeholders” start demanding evidence of institutional effectiveness…
The problem here is that the managerial use of performance metrics presumes something that we know isn’t true, and equates the impersonality of a metric with what we might call classical objectivity. Classical objectivity — the kind of thing that has been roundly critiqued in the philosophy of knowledge for centuries, but still exercises a hold on the broader public imagination — maintains that good knowledge is a “view from nowhere,” a pure and perspective-less account that simply tells us about things “as they are” and thus provides the only and unquestionably solid foundation for subsequent dealings. Basically no philosopher since Immanuel Kant’s time has actually accepted this account of knowledge, and even those who maintain that the purpose of (particularly scientific) knowledge is to produce pictures of the world and the things in it that represent those things as accurately as possible would and do deny that any measurement is classically objective in this way. Knowledge, at least the kind of factual knowledge that academics generate and write down in articles and books, is never an unmediated presentation of how things are in themselves, but is always — always! — dependent on the conceptual definitions and operational procedures employed to turn observations into facts and facts into findings. Ways of measuring, classifying, and characterizing things are thus better thought of as conceptual equipment than as utterly transparent reflections.
None of this means that there aren’t better and worse tools for different purposes, or that every piece of conceptual equipment is just as good as any other. As I have said elsewhere, pluralism isn’t relativism, and epistemic diversity isn’t dangerous — well, it’s dangerous to classical objectivity, but not to anyone who doesn’t maintain that particular discredited point of view. All it means is that, precisely because there is a plurality of ways of slicing into the world for a number of different purposes, we have to be clear about what our purposes are. While there often are consensus answers we can arrive at by agreeing to use terms in the same ways — we can come to a consensus about how many people attended an event by agreeing on the procedures for counting people, for example — that’s a far cry from saying that we can come to a final answer that somehow gives us unmediated access to really real reality, if only because we might change definitions and procedures in the future and hence arrive at a different (but, importantly, not a contradictory) answer. Whether this alcoholic beverage is scotch whisky or not depends on our definition of “scotch whisky”, and on our agreeing to a procedure for applying that definition, so the answer we get is both contingent on our consensus and independent of any of us individually: given a definition and a procedure, this either is or is not scotch whisky. If we redefine “scotch whisky” at some point in the future, the object might not be scotch whisky any longer, although once it was. [Fun fact: even though Prohibition in the United States wasn’t repealed until December 1933, low-alcohol beer could be legally bought and sold in the U.S. from 7 April 1933 on because of the Cullen-Harrison Act, which redefined “intoxicating beverage” and thus altered the correct description of various liquids made from fermented grains. Issues of epistemic pluralism are not just of academic interest.]
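To make that contingent-but-impersonal character concrete, here is a minimal sketch in Python; the definitions are toys I invented for illustration, not the actual legal criteria for scotch whisky. Given an agreed definition and procedure, the verdict is fixed independently of any of us; change the definition, and the verdict can change without contradicting the earlier one.

```python
from dataclasses import dataclass

@dataclass
class Beverage:
    distilled_in_scotland: bool
    years_in_oak: float

# Toy operational definition #1 of "scotch whisky" (not the real legal one).
def is_scotch_v1(b: Beverage) -> bool:
    return b.distilled_in_scotland and b.years_in_oak >= 3

# A later, stricter redefinition (purely hypothetical).
def is_scotch_v2(b: Beverage) -> bool:
    return b.distilled_in_scotland and b.years_in_oak >= 10

dram = Beverage(distilled_in_scotland=True, years_in_oak=5)

print(is_scotch_v1(dram))  # True: under definition 1, this counts as scotch whisky
print(is_scotch_v2(dram))  # False: under the redefinition, the same liquid no longer does
```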
What this all has to do with impersonal metrics is that, contrary to the misguided managerial practices of academic administrators, any particular impersonal standard is not a classically objective instrument for measuring things in an incontrovertible way. It is instead a descriptive instrument that, like all such instruments, comes from a place, and as such it instantiates a particular way of operationally defining the attribute it is measuring. If one forgets this, it is easy to treat the metric as though it were simply a statement of how things are, because any impersonal standard by definition generates impersonally valid results. If we defined “scholarly productivity” as the number of times someone’s published works included the letter “q” — which would be an impersonal standard inasmuch as it doesn’t depend on anyone’s idiosyncratic expression of a personal preference — the application of that metric would yield impersonally valid results ranking various scholars in terms of their productivity according to that definition of productivity. (Compare what seems to be each of my children’s operational definitions of “good music”: “whatever we say it is at the moment, whether or not we thought this was ‘good music’ last week.” That’s a “personal” standard, and as such can only generate personally valid results when applied to anything. We do ourselves no epistemic favors calling this kind of thing “subjective”, however, since the opposite kind of standard isn’t classically objective anyway.)
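Here is a minimal sketch, again in Python, of how even that deliberately absurd standard produces impersonally valid results once the definition and the counting procedure are fixed; the scholars and titles are invented.

```python
# "Scholarly productivity" operationally defined as the number of occurrences
# of the letter "q" in a scholar's published works. Silly, but impersonal:
# given this definition and procedure, anyone who runs the count gets the
# same ranking -- valid only relative to this definition of productivity.

def q_productivity(published_works: list[str]) -> int:
    return sum(work.lower().count("q") for work in published_works)

corpus = {
    "Scholar A": ["Quantitative inquiry into quasi-experimental techniques"],
    "Scholar B": ["A theory of practice", "Notes on method"],
}

ranking = sorted(corpus, key=lambda name: q_productivity(corpus[name]), reverse=True)
print(ranking)  # ['Scholar A', 'Scholar B']
```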
So: any impersonal standard generates impersonally valid results when applied, just as long as the standard involves definitions and procedures on which we are agreed.1 But we should not confuse that impersonality with classical objectivity. When academic administrators (and others even further outside of academia: government officials, private industry, etc.) impose performance metrics on academics, the distinction between impersonality and classical objectivity is blurred. The difference between “according to standard X, Y is a productive scholar/teacher” and “Y is a productive scholar/teacher” gets elided precisely because the standard in question is a consensus definition among people other than those to whose performance it is being applied. So the results produced by applying the metric to academics are impersonally valid for the people doing the applying, but not for the academics themselves. The academics can then either a) flat-out reject the metric, which is difficult to do given the general precariousness of academic employment even for people fortunate enough to have tenure; b) attempt to renegotiate the metric, which can look to one’s colleagues like “selling out” and is therefore sometimes difficult to pull off; or c) accept the metric as parametric whether one agrees with it personally or not, and alter one’s activities so that, in terms of that metric, one is successful. This third option doesn’t even require one to apply the metric to oneself, necessarily; the acceptance of an externally-imposed metric shows up in the advice we give our graduate students, the way we evaluate our colleagues for hiring and for tenure and promotion if and as appropriate, and the way we present ourselves when applying for jobs and grants and the like. So for a variety of reasons, that third option wins out in practice quite often…but that then reinforces the consensus, among those who imposed the standard, that the standard is a good one, and the “objectivity illusion” generated by the application of an impersonal standard gets stronger.
All of which is to say that I completely get the hesitancy on the part of many academics to talk and act in terms of impersonal standards for performance. In any field of endeavor, externally-imposed standards are likely to meet with resistance, because no one wants to be told how to do their job, least of all by people who don’t have any idea what that job entails (I’m looking at you, Betsy DeVos). But we need to distinguish between the problem, which is the external imposition of impersonal standards, and the notion of impersonal standards per se. “We should determine for ourselves what counts as acceptable academic performance” is not equivalent to “we should not have impersonal standards for what constitutes acceptable academic performance.” And it is also not equivalent to “we should not endeavor to deliberatively produce consensus on definitions and procedures, but should just let each of us apply her or his own standards,” particularly because while this “live and let live” approach may be the easiest way to keep departmental peace, it is simply irresponsible — especially to the poor junior faculty member trying to figure out the rules of the game, but in broader terms, to everyone involved. It’s irresponsible because it engenders arbitrariness and idiosyncrasy and the kind of permissive opacity that allows all sorts of rank prejudices and discriminatory traditions to reign unacknowledged and unchallenged. As such, “live and let live” is just another way of saying that we have no impersonal standards — no standards on which we have achieved at least a rough measure of consensus such that they can be applied in order to produce impersonally valid results.
So the choice, I would say, is between having explicitly-formulated impersonal standards and not having them, and under conditions where parties external to the academic life are only too happy to impose such standards, it’s up to us — we academics ourselves — to deliberatively generate our own performance standards. Sometimes it might be tactically useful to incorporate, with modification, something from one of the efforts to externally impose a standard, but the key point here is that the metrics by which we evaluate our own and one another’s performance should be our metrics. Impersonal standards instantiate and operationalize a group consensus in ways that generate results which cannot simply be reduced to “because person X said so.” And that matters because making the criteria explicit gives instructions to newcomers, clarifies the rules for everyone, and permits us to call one another to account if decisions are made on grounds other than those agreed-upon impersonal standards. That is what an impersonal performance metric is actually for: clarifying our expectations and helping us to shape our academic lives together. The alternative is losing that capacity altogether, and remaining cogs in a machine not of our own making.
When it comes to research and teaching, we have several examples of impersonal metrics floating around out there, and we can debate the relative merits of each. I’m not interested in wading into that morass at the moment, but I do want to point out that the right starting question for such debates has to be something like “what definition of ‘research’ or ‘teaching’ is being instantiated here?” Especially given the “performativity problem” that results when standards are practically incorporated into the flow of everyday activities, we have to be very careful that whatever metrics we design actually implement the standards we want to implement. We can’t do without metrics, but we need to make sure that the metrics are embedded in ongoing practice rather than standing outside of it — which also ensures that they can be renegotiated and refined as practically needed.
As I suggested at the outset, the issues with “internal service” in academic life can be addressed in the first instance by clarifying service expectations. Whatever else is going on to produce differential rates of such service among colleagues of different genders (and perhaps, although the study didn’t tackle this centrally, among other categories too: race, ethnicity, religion…), I would bet that we can confront the problem pretty effectively by developing impersonal metrics for internal service. I have in mind something like the “service matrix” that we implemented at SIS a few years ago, in which internal service activities are assigned to different categories and every full-time member of the faculty is expected to have a certain total number of service commitments drawn from different categories. We used the categories mainly to distinguish between types of internal service with different levels of time commitment; attending a student recruitment event is clearly a different kind of commitment than serving on the university’s Faculty Actions Committee, so those are in different categories, and the service matrix specifies equivalences between numbers of activities in different categories so that overall service time is roughly the same regardless of the particular combination of service activities a person engages in. Obviously the trick here is to design the metric properly, so that all the service that needs to get done gets done; we regard the “service matrix” as a work in progress, with things being added and taken off as circumstances change. And importantly, the “service matrix” is owned by the faculty: there was faculty discussion and a faculty vote on it, and changes can be proposed through the normal faculty governance process. So this is not an external imposition, but something that bubbles up from a necessarily imperfect but ongoing deliberative process.
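To give a sense of how such a matrix might be encoded, here is a sketch; the categories, weights, and expected total below are invented for illustration and are not SIS’s actual matrix. The idea is simply that each category carries an agreed-upon weight reflecting its time commitment, and a person’s service load is the weighted sum of their commitments.

```python
# Hypothetical category weights, expressing the matrix's equivalences:
# heavier commitments count for more toward the expected total.
CATEGORY_WEIGHTS = {
    "A": 3.0,  # e.g., university-level committees (Faculty Actions Committee)
    "B": 2.0,  # e.g., standing departmental or school committees
    "C": 1.5,  # e.g., ad hoc working groups and search committees
    "D": 1.0,  # e.g., one-off events such as student recruitment
}

# Hypothetical annual expectation, in weighted units.
EXPECTED_LOAD = 6.0

def service_load(commitments: dict[str, int]) -> float:
    """Weighted total of a person's internal-service commitments."""
    return sum(CATEGORY_WEIGHTS[category] * count for category, count in commitments.items())

def meets_expectation(commitments: dict[str, int]) -> bool:
    """Has this person taken on enough internal service, per the agreed matrix?"""
    return service_load(commitments) >= EXPECTED_LOAD
```

The weights and the expected total are where the faculty consensus lives, and revising them through the normal governance process is exactly the kind of renegotiation described above.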
Done right, such impersonal standards can help to address the overall problem by empowering people to say “no” when approached to do yet more service: “sounds interesting, but I already have one from column A and three from column D, so I don’t have any more service expectation that can legitimately be put on me at this time.” The point of the service matrix, like any impersonal performance standard, is to concretely and operationally define what it means to do enough: in this case, enough service. And while such a metric also allows the impersonal identification of people who exceed the expectations, I’d say that — contrary to the managerial use of metrics — it would be a mistake to consistently reward people who exceeded those expectations, because that would all too easily lead to a ratcheting-up of the informal expectations for everyone: “here’s what the metric is, but you’re expected to exceed it.” And then we’d be in Lake Wobegon, where all the children are above average…or in the surreal land of right-skewed distributions of student evaluations, where almost everybody is “good” and distinctions between hundredths of a point on whatever scale one is using are basically meaningless.
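Continuing the toy matrix sketched above (the numbers are repeated here so the snippet stands alone, and remain purely hypothetical), the “I can legitimately say no” check is just a threshold comparison:

```python
CATEGORY_WEIGHTS = {"A": 3.0, "B": 2.0, "C": 1.5, "D": 1.0}
EXPECTED_LOAD = 6.0

# One commitment from column A and three from column D, as in the example above.
my_commitments = {"A": 1, "D": 3}
load = sum(CATEGORY_WEIGHTS[c] * n for c, n in my_commitments.items())

if load >= EXPECTED_LOAD:
    print("Sounds interesting, but my service expectation is already met.")
```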
My point is that a good set of impersonal performance standards for academics should, in the first instance, help us all figure out what it means to do enough, to pass muster, to hit the mark. Doing more than that is problematic precisely because we don’t all agree on what constitutes academic success, nor should we. This isn’t a competitive sport, and it’s not a profit-making business supplying customers with homogeneous products. It’s a peculiar, distinctive arrangement set up to produce and sustain “thinking space” for scholars and students alike. But that doesn’t mean it has to be a place where all evaluation of our work is reduced to arbitrary and idiosyncratic chaos. Instead, the trick is to agree on a set of impersonal standards that can actually afford and facilitate the creative work of the faculty in researching, writing, thinking, teaching, and yes, also in doing internal service — and can do so in an equitable way.
1 Technically, it’s a little more complicated than this, since we can agree on the definition of a category but not believe that each of us will apply it in the same impersonal way, as in basically any sport where the enforcement of the rules can disadvantage one or another player. In football, for example, players would never agree among themselves about whether a particular interaction between them was a foul or not, because being found to have committed a foul comes with a penalty. Thus we have officials to make that determination and, as long as they do their jobs well, restore some measure of impersonality to the play of the game. Or we could disagree on the correct definition but simply offer our results as a kind of covert lobbying on behalf of our preferred definition. But those are, I wager, derived special cases, and as such they depend on the notion that impersonal standards produce impersonally valid results rather than constituting a critique of that notion.