Professor Norm Matloff
University of California, Davis
(Update, October 26, 2021.)
Hello! I developed this Web page from two major interests of mine, Affirmative Action (AA) and statistics. Currently there are several controversial instances of AA in school admissions contexts in progress:
I believe that the combination of my expertise in statistics and longtime keen interest in AA enables me to present a different perspective on AA than one normally sees in the heated discussions on the topic.
The purpose of this report, then, is to re-examine arguments made on this issue, from a statistical point of view. It will hopefully clarify the issues for those interested in AA, and may even be used as a tutorial in social science or law courses on the use of quantitative analysis.
The discussion will center on the Harvard case.
You can read details of my background in my bio. Briefly, I am a professor of computer science, but am also a statistician. I was formerly a professor of statistics, and I conduct research in statistical methodology. My book, Statistical Regression and Classification: From Linear Models to Machine Learning, was selected for the Eric Ziegel Award in 2017.
Since the lawsuit against Harvard regarding AA is being brought mainly by Chinese immigrants (see below), who also spearheaded the opposition to the California ballot measure, the NYC high school reform and the Lowell HS case, it is worth mentioning that I have been active in that community for many years, including in the defense of the engineer Wen Ho Lee, who was accused of spying for China. I am a speaker of Cantonese and Mandarin. Among other things, I have been an active participant in discussions on AA in WeChat, a Chinese social media platform.
To be clear, I am not neutral on AA myself, as I am a proponent of AA. I have for instance chaired my university's Affirmative Action Committee (which at the time dealt only with hiring and promotion of faculty and staff, not student admissions). My motivation for this is a deep concern for the well-being of underrepresented minorities and women, and the benefit to our society that comes from improving conditions for those groups. On top of that, in the college admissions case, I am also very alarmed about what I consider rampant gaming of the system.
Underlying the controversies over AA school admissions is a fundamental disconnect in perceived goals:
This difference in perceptions of goals plays a key role in interpreting the statistical data below, so the point is worth some elaboration here before we start the statistical presentation.
In discussing the universities' perspective, I make the analogy of choosing guests for a dinner party; you want to invite interesting, stimulating people, not necessarily guests with the highest SAT scores. So Harvard may, for example, want to favor applicants who overcame adversity, or who come from working-class backgrounds.
The two groups -- AA critics and the universities -- also have contrasting views of responsibility stemming from the use of public funds:
Another good analogy explaining the universities' point of view regarding public funds is that of employers, say those who receive government contracts. It is illegal to discriminate on race/gender, they might say, but otherwise the government should not be looking over the employer's shoulder and dictating her hiring policy. And again, if she would rather hire someone who overcame adversity even if another applicant has a degree from a fancy school, it is the employer's right to do so.
All this is anathema to the AA critics, who believe university admissions processes should be transparent, not subjective.
Note carefully that AA in college admissions is aimed at obtaining a diverse student body, not only in terms of race but also with respect to gender. The latter is just as important as the former, yet is seldom mentioned in the debates on AA. Keep both aspects in mind here.
One of the goals of AA is to develop role models, and AA thus transcends mere socioeconomic class. Former Pres. Obama was and is a great role model, in spite of having attended a wealthy private high school. Obama is a graduate of the Harvard Law School.
Note too that though this essay focuses on college admissions, AA is far broader than that, extending to programs such as Minority Business Contracts and Minority Business Loans. Many of these AA policies include Asians. (I will use the latter term for brevity below, but mean Asian-Americans rather than e.g. Asian foreign students.)
(The trial was held in 2018. Later, in 2021, Prof. Card was awarded the Nobel Prize in Economics, and some defenders of AA felt that that bolsters their case. But of course that is not true; in fact, even though I am a proponent of AA, I will note that Card and I served as expert witnesses in an unrelated discrimination case, which was ultimately settled.)
The following statistical concepts will pervade the discussion here.
When assessing the relation between X and Y, it is often crucial to account for a third variable Z, termed a covariate.
Consider for instance a lawsuit in which an employer is accused of gender discrimination, say with an allegation that women were more likely to be laid off than men. Here X is gender and Y is being laid off. The covariate might be Z = rating at the employee's last review. If women are more likely to be laid off even compared to men of the same rating, there is a good case for claiming discrimination. If on the other hand women tend to get lower ratings, things are not so clear.
Often several covariates are important. In the above example, let's rename the rating Z1, and consider a second covariate Z2, the profitability of the department in which the employee works. The firm may be laying off more heavily in departments that are less profitable. But if, say, more women than men are laid off in the same department, that would support the claim of discrimination.
The covariate issue will arise frequently in the material below.
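To make the layoff example concrete, here is a minimal sketch in Python. The head counts are invented purely for illustration and do not come from any real case. In this made-up firm, women are laid off at a higher overall rate than men, yet within each rating level the rates are identical, so the overall gap comes entirely from the covariate.

```python
# Hypothetical head counts, invented for illustration only (not from a real case).
# Key: (gender, rating at last review) -> (number of employees, number laid off)
counts = {
    ("women", "high"): (400, 40),
    ("women", "low"): (600, 240),
    ("men", "high"): (800, 80),
    ("men", "low"): (200, 80),
}

def layoff_rate(gender, rating=None):
    """Layoff rate for one gender, overall or within a single rating level."""
    cells = [
        (n, laid_off)
        for (g, r), (n, laid_off) in counts.items()
        if g == gender and (rating is None or r == rating)
    ]
    total = sum(n for n, _ in cells)
    laid_off_total = sum(k for _, k in cells)
    return laid_off_total / total

for gender in ("women", "men"):
    # X vs. Y alone: the unconditional comparison
    print(gender, "overall:", layoff_rate(gender))
    for rating in ("high", "low"):
        # X vs. Y with Z held fixed: the conditional comparison
        print(gender, "rating", rating + ":", layoff_rate(gender, rating))
```

With other invented counts the opposite could happen, i.e. a gap that persists within each rating level, which is the situation that would support a discrimination claim.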
A related issue is that of correlation. The larger the number of covariates involved, the smaller the correlation between X and Y tends to become. So the unconditional correlation may be large but the conditional one could be small. In many cases, the latter is the more relevant one.
The other point about correlation, of course, is the famous saying, "Correlation is not causation." X and Y might be related only because they are both related to Z.
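A small simulation may help here. It is only a sketch under assumed conditions: the data are synthetic, X and Y are each generated from Z plus independent noise, and the helper names `pearson` and `partial_corr` are my own. The conditional correlation is computed with the standard first-order partial correlation formula, which adjusts linearly for Z.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly adjusting both for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]  # X depends on Z only
y = [zi + rng.gauss(0, 1) for zi in z]  # Y depends on Z only, never on X

r_unconditional = pearson(x, y)
r_conditional = partial_corr(x, y, z)
print("unconditional:", round(r_unconditional, 2))
print("conditional on Z:", round(r_conditional, 2))
```

By construction X has no effect on Y, yet the unconditional correlation is substantial (its theoretical value in this setup is 0.5), while the correlation conditional on Z is close to zero.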
The Harvard data is so voluminous that almost any comparison would be "statistically significant." But that can be misleading, since with large samples one can detect nonzero but very small, unimportant differences.
On the other hand, a dataset may be too small for good statistical conclusions. As we will see, not all data related to Harvard admissions are large.
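The following sketch illustrates both points with a standard two-proportion z-test. All numbers are hypothetical, chosen only to make the point, and are not taken from the Harvard data: the same tiny difference in rates (5.0% vs. 5.2%) is tested once with a million cases per group and once with a thousand.

```python
import math

def two_proportion_z(p1, p2, n_per_group):
    """z statistic and two-sided p-value for two equal-sized groups,
    treating p1 and p2 as the observed proportions in each group."""
    pooled = (p1 + p2) / 2  # valid pooling because the groups are equal-sized
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Hypothetical rates and sample sizes, invented for illustration.
z_large, p_large = two_proportion_z(0.050, 0.052, 1_000_000)
z_small, p_small = two_proportion_z(0.050, 0.052, 1_000)

print("n = 1,000,000 per group: z =", round(z_large, 2), "p =", p_large)
print("n = 1,000 per group:     z =", round(z_small, 2), "p =", p_small)
```

The 0.2 percentage point difference is "highly significant" with the huge sample and undetectable with the small one, though its practical importance is the same in both cases.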
A statistical analysis is only as good as the quality of the data on which it's based. Two questions must be considered:
In the employment example, suppose the ratings Z are problematic, say relying on traits not well related to the employee's job tasks. Or worse, what if the raters are biased? This would then undermine the analysis.
This describes a situation in which the effect of X on Y will have different values, even different signs, at different values of Z. We will see examples here in which being Asian is a plus for women but a negative for men.
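As a minimal illustration, consider the following invented rates of some outcome Y for two values each of X and Z. These numbers are hypothetical and are not estimates from the Harvard data. The effect of X is negative in one Z group and positive in the other, and if the two groups are pooled with equal weight, the effects cancel.

```python
# Hypothetical outcome rates, invented for illustration (not Harvard data).
# Key: (level of Z, level of X) -> rate of the outcome Y
rate = {
    ("z=0", "x=0"): 0.10,
    ("z=0", "x=1"): 0.08,
    ("z=1", "x=0"): 0.10,
    ("z=1", "x=1"): 0.12,
}

def effect_of_x(z_level):
    """Difference in the Y rate between x=1 and x=0, holding Z fixed."""
    return rate[(z_level, "x=1")] - rate[(z_level, "x=0")]

effect_z0 = effect_of_x("z=0")
effect_z1 = effect_of_x("z=1")
# Assumes the two Z groups are the same size, so a simple average applies.
pooled_effect = (effect_z0 + effect_z1) / 2

print("effect of X when z=0:", round(effect_z0, 3))
print("effect of X when z=1:", round(effect_z1, 3))
print("pooled effect:", round(pooled_effect, 3))
```

An analysis that reports only the pooled effect would conclude that X does not matter, missing the fact that it matters in opposite directions in the two groups.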
A variable X is said to be a proxy for Y if X can be used as a substitute for Y. The substitution may not be perfect, but still may be useful.
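How imperfect a proxy is can be quantified from a simple two-by-two table. The sketch below uses invented counts for a binary proxy X and a binary target Y; the variable names are mine, chosen for illustration.

```python
import math

# Hypothetical counts, invented for illustration only.
both = 300     # X = 1 and Y = 1
x_only = 200   # X = 1 but Y = 0
y_only = 100   # X = 0 but Y = 1
neither = 400  # X = 0 and Y = 0
total = both + x_only + y_only + neither

# Of those the proxy flags, what share actually have Y?
share_of_flagged_with_y = both / (both + x_only)
# Of those who have Y, what share does the proxy flag?
share_of_y_flagged = both / (both + y_only)
# How often do the proxy and the target agree?
agreement = (both + neither) / total
# Phi coefficient: the Pearson correlation for two binary variables.
phi = (both * neither - x_only * y_only) / math.sqrt(
    (both + x_only) * (y_only + neither) * (both + y_only) * (x_only + neither)
)

print("P(Y=1 | X=1):", share_of_flagged_with_y)
print("P(X=1 | Y=1):", share_of_y_flagged)
print("agreement:", agreement)
print("phi:", round(phi, 3))
```

In this made-up table, X and Y are clearly correlated, yet substituting X for Y would misclassify a substantial minority of cases in each direction.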
These concepts will be treated in a nontechnical manner, and indeed they may seem "obvious." But recognizing them in subtle, complex settings may not be easy, as we will see.
With the above preparation, we can now delve into AA statistically.
Both expert witnesses, Card and Arcidiacono, agree that the Asian applicants to Harvard differ from other groups. In particular, the Asians tend to have very solid academic credentials, e.g. grades and SAT scores. Indeed, most Harvard applicants have strong academics, so much so that the Harvard admissions office uses the term Standard Strong. The meaning is, in essence, "Applicant has strong academics like everyone else, but not much special otherwise," so it is a negative.
Arcidiacono presents data showing that a disproportionate share of Asian applicants are rated Standard Strong, which he offers as evidence of discrimination. Yet what matters is whether two applicants, both rated Standard Strong and with similar covariate values, but one Asian and the other non-Asian, have the same probability of being admitted to Harvard.
An example of such a covariate would be geographical region, say Wyoming, which as a rural state tends to be favored by Ivy League schools for geographical diversity. The question then becomes, for example, among Standard Strong applicants from the same state, do Asians and non-Asians have the same chance for admission?
The critics of AA often cite a study by Princeton's Thomas Espenshade that found that Asian-American applicants needed total SAT scores 140 points higher than whites to be admitted to elite schools. The critics interpret this as a consequence of those schools' race-conscious admissions policies.
Actually, Espenshade himself has warned that this does not necessarily prove there is discrimination against Asians. He notes the problem of missing covariates; in his analysis, he did not have access to the applicants' teacher recommendation, for instance. If, say, in comparing Asian and non-Asian applicants with the same quality of recommendations, the Asians needed higher SAT scores to be admitted, a case might be made for evidence of discrimination, though of course other covariates must be considered as well. (See for instance the discussion of Caltech admissions below.)
In addition, there is the issue noted earlier of statistical significance vs. practical significance. That 140-point figure is actually not very large. If it is, say, 70 points for each of the Math and Verbal sections, this is a mild effect when viewed in the context of the much larger SAT variation, as seen in a later section.
Both Professors Card and Arcidiacono agree that the SAT actually plays a very small role in the admissions decision. However, it is important to understand how that comes about.
For instance, Arcidiacono found that the correlation (actually, logit coefficient) between being admitted and SAT scores, with the other covariates being fixed, was close to 0 (Document 413, p.19). This may seem quite surprising at first, but statistically it goes back to our earlier point that, as one adds more and more covariates, correlations get smaller and smaller.
As noted earlier, most applicants to Harvard have high scores. Since differences among high scores are not very important (also noted above), at that level the SAT ceases to be much of a factor in admission. As Card notes, "non-academic factors (taken together) explain more than three times as much of the variation in admissions decisions as the academic rating does. That should not be surprising, since exceptional non-academic qualities are less common in the applicant pool than exceptional academic qualities and are thus more likely to distinguish applicants from one another." Those "non-academic factors" are covariates here.
The situation is like that of the NBA. In the general population, there will be a substantial correlation between height and basketball talent. But in the NBA, where everyone is big, height has a much lower correlation with success.
Again, recall the term Standard Strong. Harvard applicants typically have strong academic records, so nonacademic aspects are what count most.
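The NBA effect, often called restriction of range, is easy to reproduce in a simulation. The sketch below uses synthetic data under assumptions of my own choosing: standardized "height" and "skill" with a population correlation of 0.6, and a "league" consisting of everyone more than two standard deviations above average height.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(1)  # fixed seed so the run is reproducible
n = 100_000
height = [rng.gauss(0, 1) for _ in range(n)]
# 0.6**2 + 0.8**2 = 1, so skill has unit variance and correlation 0.6 with height.
skill = [0.6 * h + 0.8 * rng.gauss(0, 1) for h in height]

r_everyone = pearson(height, skill)

# Keep only the very tall: more than 2 standard deviations above the mean.
tall = [(h, s) for h, s in zip(height, skill) if h > 2.0]
r_tall = pearson([h for h, _ in tall], [s for _, s in tall])

print("general population:", round(r_everyone, 2))
print("very tall only (" + str(len(tall)) + " people):", round(r_tall, 2))
```

Nothing about the relation between height and skill changes in the selected group; the correlation drops simply because there is little variation in height left. The same logic applies to SAT scores within a pool in which nearly everyone scores high.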
One way to view the problem of omitted covariates is that it overly aggregates the data. It's like the old joke that if you put one hand in freezing water and the other in boiling water, then on average you feel fine.
Often some important covariates are omitted because they simply aren't available. But sometimes they are available but aggregated out anyway.
An instance of the latter, noted by Card, is that Arcidiacono's analyses aggregate several years' worth of admissions data, which is quite problematic in that admissions criteria change every year. (My side note: My impression has been that universities do this to try to counter gaming of their system, so they can stay one step ahead of the applicants.) Aggregating such data is extremely dangerous. Here the covariate is Year. Arcidiacono actually does use Year in some of his analyses, but this can produce quite misleading results if not done properly.
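To see how pooling across years can mislead, consider this sketch with invented counts for two applicant groups, labeled A and B, over two years. The numbers are hypothetical and are not from the Harvard data. Group A has the higher admission rate within each year, yet the lower rate once the years are pooled, because most of A's applicants applied in the tougher year. This reversal is known as Simpson's Paradox.

```python
# Hypothetical applicant counts, invented for illustration (not Harvard data).
# Key: (year, group) -> (number of applicants, number admitted)
data = {
    ("year 1", "A"): (1000, 120),
    ("year 1", "B"): (3000, 300),
    ("year 2", "A"): (3000, 180),
    ("year 2", "B"): (1000, 50),
}

def admit_rate(group, year=None):
    """Admission rate for a group, pooled over years or within one year."""
    cells = [
        (n, admitted)
        for (y, g), (n, admitted) in data.items()
        if g == group and (year is None or y == year)
    ]
    return sum(k for _, k in cells) / sum(n for n, _ in cells)

for year in ("year 1", "year 2"):
    print(year, "A:", admit_rate("A", year), "B:", admit_rate("B", year))
print("pooled A:", admit_rate("A"), "B:", admit_rate("B"))
```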
Another example involves Gender as a covariate. Card finds that actually being Asian is a plus among women applicants. This of course may reflect the fact that AA covers not only race but also gender; more on this later.
While Asian groups include those with heritage in East, Southeast and South Asia, the vast majority of Asian activists opposing AA are specifically Chinese, even more specifically, recent Chinese immigrants in tech and the professions. See reports: here, here, here, here. Chinese-immigrant activists have made extensive use of the WeChat app to coordinate their efforts in the Harvard and UC cases, and in the New York City high schools case.
This also shows up in research by Profs. Karthick Ramakrishnan and Janelle Wong, which shows a stark difference between the Chinese and other Asian groups; Chinese support for AA has plummeted in recent years, while support among other Asians has risen slightly.
To be sure, many Chinese-Americans do support AA, and some non-Chinese Asian-Americans oppose it. However, the major impetus is Chinese.
From a statistical point of view, the specific ethnicity within the pan-Asian umbrella may be an important covariate. Hypothetically speaking, Harvard for instance may be favoring some Asian ethnicities while disadvantaging others. This was not brought out in the lawsuit, as the plaintiffs merely alleged discrimination against "Asian" applicants, but it is something that careful analyses may consider.
I've found that many Asian immigrants who oppose AA are assuming that the US system mirrors that of their native lands, with a planned multi-tier hierarchy of "eliteness." In the UC system, they observe that Berkeley is the most selective, UCLA is the next most selective and so on, and assume that this was planned.
This of course is incorrect, an example of the point that "Correlation is not causation" mentioned earlier. X and Y might be related only because they are both related to Z.
The Berkeley campus was established early in California's history. Later, a second campus was added, UCLA, for those residing in the southern part of the state. As the population grew, more campuses were added for residents of other areas in the state, just as hospitals were added for residents of various parts of the state.
Rankings of universities are largely determined by research reputation, and the older UC campuses have developed their research prowess over longer periods of time. So, the older campuses are the more prestigious, hence draw more applicants, hence have the most selective admissions bars. It is not the case that the state planned a hierarchy of eliteness.
Analysis by conservative writer Ron Unz compared the time trends of Asian enrollment in selective colleges vs. number of college-age Asians nationwide, from 1990 to 2011. He found that Asian enrollment at Caltech had tracked the number of Asians in the U.S., but that the Ivy League college Asian enrollments seem to have hit a plateau.
Unz interpreted that as clear evidence of discrimination against Asian applicants by the Ivies, due to race-conscious policies at those schools. However, Caltech also has an AA policy, so that explanation fails. A much better explanation is as follows.
The Asian applicants tend to major in STEM fields. For example, at Stanford, reportedly 46% of Computer Science majors are Asian, even though only 23% of the student body as a whole is Asian. This has major (pun intended) implications for the "plateau" effect described by Unz: The Ivies, as liberal arts schools, simply don't have as many slots open for STEM applicants as does Caltech, a STEM school. Asians, a STEM-heavy group, thus would find more difficulty getting admitted to the Ivies -- with no discrimination involved. Again, covariates matter, in this case Major and School Type.
At the same time, it may explain Card's finding that female Asians fare better than non-Asian women. There has been a nationwide concern that not enough women major in Engineering, especially Computer Science (CS). Female Asian students also tend to be STEM-oriented, so all of this may result in university admissions committees being favorably inclined to admit female applicants for, say, CS, including Asian females.
Note too that this also illustrates the principle of correlation not necessarily showing causation. If the CS explanation is correct, Card's findings don't show that being Asian per se helped those female applicants, but rather that their interest in STEM gave them the boost.
Most of the data analyses done by both sides for this lawsuit have no problems with sample sizes, which are quite ample for statistically valid conclusions.
However, an interesting sample size issue arises in one aspect. The plaintiffs state that Asian applicants get higher ratings from alumni interviewers than what is seen in the Personal Rating. But an important component of the latter is teacher letters of recommendation, which the alumni interviewers do not see. Statistically speaking, a teacher, who has seen the student for an entire year, can provide more accurate information than can an alumnus/a who chats with the student for 15-30 minutes.
There is also a problem in that an alumnus/a who was a student at Harvard 25 years ago is judging applicants by the standards of that era, when admissions standards were, though high, not nearly as draconian as the current ones. This is called bias in statistics, meaning a systematic error in the estimation process rather than not having enough data.
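The distinction between too little data and bias can be seen in a short simulation. This is a sketch with made-up quantities: a "true" value of 3.0 on some rating scale, random rater noise, and a fixed offset of 0.5 standing in for a rater applying an outdated standard. None of these numbers come from Harvard's process.

```python
import random

rng = random.Random(2)  # fixed seed so the run is reproducible
TRUE_QUALITY = 3.0  # hypothetical true value being rated
BIAS = 0.5          # hypothetical systematic offset (e.g. an outdated standard)
NOISE_SD = 1.0      # hypothetical random rater-to-rater noise

def average_rating(n_ratings, bias):
    """Average of n noisy ratings that all share the same systematic offset."""
    ratings = [TRUE_QUALITY + bias + rng.gauss(0, NOISE_SD) for _ in range(n_ratings)]
    return sum(ratings) / len(ratings)

sizes = (10, 1_000, 100_000)
error_unbiased = {n: average_rating(n, 0.0) - TRUE_QUALITY for n in sizes}
error_biased = {n: average_rating(n, BIAS) - TRUE_QUALITY for n in sizes}

for n in sizes:
    print(n, "ratings: error without bias =", round(error_unbiased[n], 3),
          "| error with bias =", round(error_biased[n], 3))
```

More observations shrink the random error, which is why a teacher's year-long view is more precise than a short interview. But no amount of data removes the systematic offset: the biased average settles near 0.5 above the truth rather than near zero.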
Much has been made by the press, and indeed by Prof. Arcidiacono, of the Personal Rating that is part of the Harvard admissions process. A key question then is, What does this rating actually measure? In other words, there is a data quality issue. Given the widespread concern, e.g. that Harvard was stereotyping Asian applicants as "quiet" or even "dull," it is worth discussing this in some detail here.
Arcidiacono found that the Asian applicants tended to have lower scores in Personal Rating. Unfortunately, this has been misinterpreted as some kind of rating of, say, charisma. In fact, it is far broader than a personality rating.
Recall my "dinner party" analogy. Just as a highly successful person who grew up in a small town in Wyoming may make an interesting dinner guest, Harvard values this in its student body as well, and so do the other highly selective colleges. Such a factor then becomes part of an applicant's Personal Rating. (Stanford's term, the Personal Context, better captures this notion.)
From the Card report:
As noted above, family background provides important context for each applicant's achievements. Exhibit 64 and Exhibit 65 (Appendix C) show that the parents of Asian-American and White applicants tend to have different types of occupations. 33% of fathers and 16% of mothers of Asian-American applicants work in the fields of 'Computer and Mathematical,' 'Life, Physical, Social Science,' or 'Architecture and Engineering,' while only 16% and 5% (respectively) of fathers and mothers of White applicants work in those fields. Such differences can reflect not just differences in a family's economic prosperity but also differences in applicants' life experiences. For example, if the son of a professional writer and the son of a police officer display talent in writing, Harvard might regard the latter's talent as more impressive than the former's. The same might be true of the daughter of professional scientists and the daughter of factory workers, both of whom exhibit talent in a scientific field.
Again, this would show up in the Personal Rating -- without any discrimination involved. (Of course, a skeptic may counter that Harvard intentionally chose such criteria as a method of reducing Asian admissions. This charge is not a statistical issue, thus beyond the scope of my document here.)
Many would object that the Asian applicants should not be penalized for having been raised by well-off parents who can nurture them into academic excellence, especially in STEM. But from Harvard's point of view -- again, the "dinner party" analogy -- it is not a matter of "penalizing" anyone, just as someone you might not choose for your dinner event is not "penalized" and you are not being "unfair" in excluding them. Or if you are an employer, you are not "penalizing" job applicants who due to privileged upbringing lack some quality you admire.
In other words, the fact that Asian applicants, on average, have lower scores on the Personal Rating category is largely a reflection of the comfortable middle- or upper-class upbringing many of them have enjoyed, rather than some "defect" in their personality.
Again, the Harvard Personal Rating is far broader than a personality rating. But to be sure, the Personal Rating does include some charisma-like aspects, and critics of Harvard's admissions process suspect that the stereotype of "the quiet Asian" may be part of the reason for the lower average score for Asian applicants. On the one hand, there is no denying that, on average, the Asian students do tend to be quieter, as educational researcher Jianhua Feng has explained. On the other hand, unconscious bias of this sort can never be ruled out; it is certainly possible that some Harvard admissions officers could be biased in rating some Asian applicants as "dull dinner partners." And for that matter, applicants' teacher recommendation letters, which are also part of the Personal Rating, may be biased in this respect as well.
Unfortunately, demonstrating this statistically would be extremely difficult. The mere fact that the Asian applicants to Harvard have a lower Personal Rating tells us little. Was this because rather few of them are from Wyoming or are children of police officers? Or was it because the Asian applicants are less charismatic? Or was it unconscious bias? Maybe a combination of all of the above? There simply isn't data to determine this.
A central argument of AA critics is that university admissions should be based on academic "merit," meaning SAT/AP test scores, grades, awards and so on. Putting aside the fact that selective schools like Harvard operate under a "dinner party" goal rather than "best athlete," here again we have a data quality issue -- are those test scores etc. really measurements of the applicant's merit, or do they reflect privilege, as claimed by some AA proponents? There are serious issues here regarding data quality and relevance.
Many older critics of AA don't realize that the world of the SAT has changed radically since the time they were in college. The rise of the SAT coaching schools is one major new aspect, reportedly worth 115 points. It may be even more in some individual cases but in any event the point is that this phenomenon reduces the statistical value of the SAT.
The Chinese community in particular makes use of the coaching schools. One popular -- and reportedly highly effective -- example is the IvyMax chain. (Its Chinese-American founder Steven Ma even offers application strategy services utilizing machine learning techniques. More on this later.)
Perhaps partly as a result of the coaching schools and so on, the SAT may be becoming less statistically stable than in the past. In June 2018, a test anomaly resulted in the Math portion being scored as much as 150 points lower than usual, causing an uproar.
There is more than just the effects of SAT coaching services themselves. Many of the Asian applicants went through Kumon, a math drill service, when they were in elementary school. As a mathematician, I recommend against this, but it certainly does develop in kids the ability to take intensive, timed tests like the SAT later on.
Indeed, there is often much more. Many Asian applicants enjoy enormous familial advantages, such as: Kumon in elementary school; Math Camp in junior and senior high school; SAT coaching; AI-aided strategy for building a résumé for college admission, as with IvyMax above; access to university research mentors and schools that groom students to do well in science research contests (see below); etc. Remarkably, even many working-class Asian parents will engage in some of this, borrowing money from relatives.
It is commendable that these parents are so highly committed to providing educational opportunities for their children. But in assessing academic merit, this clearly presents a data quality problem. Often the high test scores, awards and so on of such applicants come in part from familial advantage. (More on the awards below.) This greatly undermines the claim that these students should be admitted on the basis of "merit."
Some AA advocates call for the SAT and similar tests to be eliminated from admissions processes entirely, claiming that even without coaching schools and the like, the SAT advantages the wealthy. They often cite a vocabulary word from one of the old tests, regatta, as advantaging the wealthy. But any student who enjoys reading -- the public libraries are free -- will pick up such words naturally, without having attended a regatta. As a longtime educator in STEM, I do believe the SAT has value in assessing a student's readiness for college, and definitely believe it should play a role in admissions.
Much has been made of the fact that in the national high school science contests run by Intel and Siemens, the semifinalists, finalists and winners have been disproportionately Asian (Chinese and Indian). But again we have a data quality problem -- exactly what is this awards data measuring?
As I wrote in Bloomberg View, the implication that the top entrants (of whatever ethnicity) are the nation's best young science talents is highly misleading.
First, one must note that the top entrants tend to come from just a few schools/school districts, largely on Long Island but also in key spots around the nation, which have special programs to groom their students to do well in the contests from Day One. Note, for instance, that the Half Hollow Hills District on Long Island actually has a position titled Academic Research Director. Among other things, these schools link up their students to university researchers, in whose labs the kids work.
Second, typically the work done by the students does not come from their own ideas; they are simply carrying out experiments designed by their university mentors. From my Bloomberg op-ed:
During that work [done under the direction of the university researcher], the student will come up with ideas for refinements, but a focus on "their solutions" [to deep scientific problems] is exaggerated. Those "High School Student Finds Cure for Cancer" headlines are seriously misleading.
Martin Rocek, a university mentor for one Intel semifinalist, recounted for the New York Times how he interacted with the student. Rocek found a "not exceedingly technical" topic in math, gave the student tutorials and suggested the calculations to be done.
Professor Miriam Rafailovich, who runs an organized mentoring program for high school researchers at SUNY Stony Brook, told me in an email interview that the contestants "get massive coaching from the schools"...
As [a book] and Rafailovich point out, a big motivation for many contestants is to bolster their admission chances to selective colleges. That is a fine goal, but it also explains why many contestants have immigrant parents -- who often have a "Harvard or bust" viewpoint. Those kids are more likely to participate.
Knowing the emphasis colleges place on extracurricular activities, many anxious parents have their children engage in a maximal list of such activities. Again this is a data quality issue, and the hapless admissions officers must try to divine which activities show "dinner party" quality and which are simply done to build up a résumé for college applications.
As noted, coaching services for the SAT and so on are extremely popular in Asian-immigrant communities, even among working-class parents. But the services can go far beyond merely SAT prep.
Here are some details on how services such as the aforementioned Chinese business IvyMax and ThinkTank work, described in an article by Stephanie Ban concerning CEO Steven Ma. Ban offers insights into the benefits of such services, the special value placed on them by Chinese parents, and even the statistical data quality aspects I've discussed here:
Apart from Ma's prices, his so-called guarantee of admission is falling under criticism from admissions officers and internet moguls alike. Stanford's dean of admissions calls Ma's approach "gaming the system..."
Ma acknowledges that his program might be putting too much emphasis on getting into top-tier schools, but claims that the main perpetuator of this emphasis is the parents of the children that he works with. Ma boasts a high success rate to back up his assertions about the efficacy of his tutoring... Ma points out that immigrant parents in affluent areas generally view admission to a top-tier school as the primary indicator of success...
[Ma] already takes care of the measurable criteria in admissions and gives students more opportunities to showcase the immeasurable qualities. Ultimately, Ma is just acting as a knowledgeable (if overpriced) guidance counselor with a mind for statistics. I think admission to a top-tier school will always be a bit of a gamble, but as Ma shows, it helps to have the cards stacked in your favor.
A 2017 interview with Ma goes into more detail, noting that he can even game the nonacademic aspects of college applications:
Ma's company, founded in 2002, claims it can debunk the mystique around college admissions through a secret algorithm he designed. It considers academic components like GPA and SAT scores as well as non-academic ones like community service and after-school activities, he said...
The mathematical model, according to Ma, has proven very accurate and reliable in the past 10 years.
He claims it can correctly predict the chances of students getting into their top-choice school 93 percent of the time.
After crunching the numbers, Ma might learn that more extracurricular activities are needed to bulk up an application. To that end, ThinkTank Learning can arrange internships, say, with a radio station or press secretary of an elected official for someone thinking of majoring in journalism.
That last point is similar to the above point on the special "Siemens high schools" arranging research positions for their students at local universities.
See also many details in this article in The New Republic by Clio Chang.
To discuss the issue of proxies -- one variable substituting for another, even if imperfectly -- consider a variable Disadvantaged, combining effects of family income, parents' education and so on, and a variable Race. There is some correlation between the two, so each might potentially serve as a proxy for the other. This comes up with critics of AA in a couple of ways:
The universities deny the second, and say that the first won't bring the rich diversity they feel crucial to their mission.
It is important to note that neither of the above race-related points applies to gender. Giving an admissions plus to low-income students would do nothing to increase the number of women in the student body, e.g. in gender-imbalanced institutions such as Caltech and MIT.
Hopefully this document has made it clear that there are important statistical considerations that are often overlooked in discussions about AA. Simplistic arguments on both sides, e.g. with the rallying cries "We need diversity!" or "Admit on the basis of merit!" are not informative.