Government's College Rating System Lacks Details

After more than a year in development, the U.S. Department of Education released a draft framework for its much-anticipated Postsecondary Institution Rating System (PIRS) on Friday, December 19, 2014.

The draft framework is incomplete. In any rating or ranking system, the details matter, and in this case the most important details are missing. By now, the U.S. Department of Education should have finalized the list of data sources, the metrics that will be used to rate the colleges and, most importantly, how the various metrics will be combined. Instead, the draft framework is tentative even about the nearly one dozen variables that will be the basis for the metrics.

Three Main Groups of Metrics

The metrics will focus on access, affordability and outcomes, but not the quality of education.

The access metrics will be based on:

Percent Pell – the percentage of the student body that has received a Federal Pell Grant
EFC Gap – the average difference between each student’s Expected Family Contribution (EFC) and an unspecified baseline EFC
Family Income Quintiles
First-Generation College Status

As previously reported by the Chronicle of Higher Education on August 15, 2014, a key problem with the family income quintiles used by the U.S. Department of Education’s College Navigator is that colleges can currently choose which students are in each quintile. Some colleges use adjusted gross income (AGI) while others use their own method.

The affordability metrics will be based on:

Average Net Price
Net Price by Quintile

It is unclear, however, whether the net price figures will be averages for just the first year or for the full length of the education program. About half of all colleges practice front-loading of grants, where students receive a more generous mix of grants in the first year than in subsequent years. This yields a higher net price for upperclassmen.
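To illustrate why the measurement window matters, here is a minimal sketch in Python. The cost and grant figures are hypothetical, invented only to show the effect of front-loading, and net price is assumed to be cost of attendance minus grant aid.

```python
def net_price(cost_of_attendance, grants):
    """Net price: cost of attendance minus grant aid (assumed definition)."""
    return cost_of_attendance - grants

# Hypothetical four-year package in which grants shrink after the first year.
cost_by_year = [30000, 30000, 30000, 30000]
grants_by_year = [15000, 12000, 10000, 10000]

yearly_net = [net_price(c, g) for c, g in zip(cost_by_year, grants_by_year)]

first_year_net = yearly_net[0]                    # what a first-year-only metric reports
average_net = sum(yearly_net) / len(yearly_net)   # what a full-program metric reports

print(f"First-year net price: ${first_year_net:,.0f}")
print(f"Four-year average net price: ${average_net:,.0f}")
```

In this made-up case, a first-year-only figure understates the average annual net price by more than $3,000.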

The proposal omits consideration of average loan debt as an affordability metric. Colleges often talk about loans as making college more affordable, but don't consider whether students are graduating with unaffordable debt. Affordable debt can be measured using debt-to-income, debt-service-to-income and debt-service-to-discretionary-income ratios.
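The three ratios can be sketched as follows. This is an illustration under stated assumptions, not a method from the draft framework: the loan is assumed to be repaid in level monthly payments using the standard amortization formula, and discretionary income is assumed to mean income above a floor that the caller supplies (the parameter income_floor is hypothetical).

```python
def monthly_payment(principal, annual_rate, years):
    """Level monthly payment on a fully amortizing loan."""
    n = years * 12
    r = annual_rate / 12
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)

def debt_ratios(debt, annual_income, annual_rate, years, income_floor):
    """Return the three affordability ratios for one borrower.

    income_floor is an assumed stand-in for whatever baseline is
    subtracted from income to obtain discretionary income.
    """
    annual_service = 12 * monthly_payment(debt, annual_rate, years)
    discretionary = annual_income - income_floor
    return {
        "debt_to_income": debt / annual_income,
        "debt_service_to_income": annual_service / annual_income,
        "debt_service_to_discretionary_income": (
            annual_service / discretionary if discretionary > 0 else float("inf")
        ),
    }

# Hypothetical borrower: $30,000 of debt, $40,000 income, 10-year term.
example = debt_ratios(30000, 40000, annual_rate=0.05, years=10, income_floor=20000)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Whether a given ratio counts as affordable depends on a threshold, which the sketch deliberately leaves out.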

The outcomes metrics will be based on:

Completion Rates – the percentage of the incoming class who graduate within a particular time-frame
Transfer Rates – the percentage of the incoming class who transfer to another college
Labor Market Success – the percentage of a college's graduates whose income at an unspecified time after graduation exceeds a baseline income level, such as 200 percent of the poverty line or the Federal minimum wage
Graduate School Attendance – the percentage of the graduating students who subsequently enroll in graduate or professional school
Loan Performance Outcomes – one or more measures of the performance of a college's loan portfolio, such as the percentage of borrowers whose loans are negatively amortized, the percentage of initial loan balances that are repaid or a loan repayment rate based on adding deferment and forbearance rates to default rates

The discussion of transfer rates provides a good example of how the U.S. Department of Education’s proposal lacks detail concerning the metrics. The framework does not specify how it will handle transfer students, both students who transfer out of an institution and students who transfer into an institution. Many current data sources calculate outcomes based solely on first-time, full-time students, omitting students who transfer into a college and students who enroll part-time. Will transfer students count if they do not eventually graduate? The framework also does not specify whether it counts only students who transfer from 2-year institutions to 4-year institutions, or also lateral transfers (from 2-year to 2-year and from 4-year to 4-year) and reverse transfers (from 4-year to 2-year). There are more questions here than answers.

The loan performance outcomes metric does not use a direct measurement of the percentage of a college's borrowers who graduate with excessive debt, such as debt-to-income ratios, debt-service-to-income ratios or debt-service-to-discretionary-income ratios that exceed a particular threshold.

It is unclear why the U.S. Department of Education chose to benchmark labor market success against 200 percent of the poverty line. Eligibility for means-tested federal benefit programs is often based on comparing income with 100 percent, 125 percent, 130 percent, 150 percent or 185 percent of the poverty line. Although eligibility for the Federal Pell Grant is not currently based on a comparison of income with the poverty line, 84.4% of recipients have income below 200 percent of the poverty line, 93.0% have income below 250 percent of the poverty line and 97.4% have income below 300 percent of the poverty line, based on the 2011-12 National Postsecondary Student Aid Study (NPSAS).

The discussion of the impact of demographics is likewise minimal. The U.S. Department of Education proposes using a regression model based on family income, Expected Family Contribution (EFC), dependency status, parent educational attainment, age, gender, marital status, whether the student has children, veteran status, zip code, state of residence, transfer status and enrollment intensity, but not academic preparation (e.g., high school GPA and SAT/ACT scores), academic program or race. This is despite research that demonstrates a strong correlation between college completion and academic preparation. Without consideration of high school GPA and admissions test scores, it is not possible to distinguish between the selectivity of a college and the value added by the college. More than half of the better performance of elite colleges is due to their aggregating the most talented students. Accordingly, the ratings system may be meaningless in evaluating college performance, other than determining whether each college satisfies a set of minimum standards.

The current framework proposes to cluster predominantly 4-year institutions together and predominantly 2-year institutions together. This means that colleges that offer a mix of 4-year and 2-year degree programs will be compared with dissimilar institutions. The ratings need to be disaggregated by academic degree level, so that institutions that offer both Bachelor's degrees and Associate's degrees will have two ratings, one for each credential. The ratings also need to be disaggregated by or adjusted for other important institutional characteristics, such as selectivity, average endowment per student and state support per student, among other variables. Moreover, the outcome measures should perhaps be disaggregated by or normalized according to some of the access variables, such as Federal Pell Grant recipient status.

How Will Metrics Be Combined?

The method used to integrate the metrics is one of the most important unanswered questions. The proposal calls for a small number of unspecified performance categories, with three performance levels within each category: high, middle and low.

To the extent that the rating system is an attempt to influence consumer behavior – to shift enrollment to colleges that offer more “value” – it is unclear if having just three labels, as opposed to finer gradations, provides sufficient information to be useful to students and their families. There is a conflict between simplicity, accuracy and utility. In effect, the rating system may represent little more than a new set of minimum standards that colleges must satisfy for continued receipt of federal student aid funding. The limited information contained within the rating system may also make it difficult for colleges to determine what changes they need to make to improve their ratings.

Another risk is that the weights used to combine the metrics can be shaped to achieve a particular outcome. The choice of model will affect how the colleges are rated. One person’s definition of “value” may differ from another person’s definition. The design of the ratings system may be driven more by ideology than by policy considerations. The final ratings scheme may be based on subjective priorities as opposed to objective analysis.
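A small sketch shows how the choice of weights alone can decide which college comes out on top. Everything here is hypothetical: the two colleges, their scores on a 0-to-1 scale, and the use of a weighted average, which is only one of many ways the metrics might be combined.

```python
# Hypothetical normalized scores (0 = worst, 1 = best) for two invented colleges.
colleges = {
    "College A": {"access": 0.9, "affordability": 0.7, "outcomes": 0.4},
    "College B": {"access": 0.3, "affordability": 0.6, "outcomes": 0.9},
}

def composite(scores, weights):
    """Weighted average of the metric scores."""
    total_weight = sum(weights.values())
    return sum(scores[m] * w for m, w in weights.items()) / total_weight

def top_college(weights):
    """Name of the college with the highest composite score."""
    return max(colleges, key=lambda name: composite(colleges[name], weights))

# Two assumed weighting schemes reflecting different ideas of "value".
access_heavy = {"access": 3, "affordability": 1, "outcomes": 1}
outcome_heavy = {"access": 1, "affordability": 1, "outcomes": 3}

print("Access-weighted winner: ", top_college(access_heavy))
print("Outcome-weighted winner:", top_college(outcome_heavy))
```

The underlying data are identical in both runs; only the weights change, and the ordering of the two colleges reverses.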

There may also be significant unanticipated outcomes. For example, if the metrics seek to maximize graduation rates, as opposed to the number of college graduates, colleges may become more selective in their admissions policies. The easiest way to increase graduation rates is by filtering out high-risk students, such as low-income students, minority students, first-generation students and students who are single parents. That is a lot less expensive than removing the obstacles to student success. Helping high-risk students reach the finish line is more expensive than denying them access to a college education. There is a tension between access and completion, mediated by money.

Similarly, if the U.S. Department of Education were to try to improve graduation rates by setting minimum graduation rates for institutional eligibility for Title IV federal student aid funds, it would yield a decrease in the number of students graduating. (It would also lead to graduation inflation, as some schools would relax their graduation requirements.) Currently, 58% of students in Bachelor’s degree programs graduate within six years. If the U.S. Department of Education were to get rid of all colleges where fewer than half the students graduate in six years, 6.7 million students would lose eligibility for student financial aid totaling $24 billion a year. While this change would increase the graduation rate by 14 percentage points, to 72%, it would decrease the number of students graduating by 26%. It would also decrease the number of colleges by 55% and the number of college students by 41%.
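The mechanism behind these figures, a higher graduation rate alongside fewer graduates, can be reproduced with a toy example. The four colleges below are hypothetical and do not correspond to the national data cited above. The sketch also assumes that students at eliminated colleges do not enroll elsewhere.

```python
# Hypothetical colleges: (entering students, students who graduate within six years).
colleges = [
    (1000, 800),
    (2000, 1200),
    (3000, 1200),
    (4000, 1200),
]

def summarize(group):
    """Return (students, graduates, overall graduation rate) for a group of colleges."""
    students = sum(entering for entering, _ in group)
    graduates = sum(grads for _, grads in group)
    return students, graduates, graduates / students

MINIMUM_RATE = 0.5  # assumed eligibility cutoff

before = summarize(colleges)
after = summarize([c for c in colleges if c[1] / c[0] >= MINIMUM_RATE])

print(f"Before: {before[1]} graduates, rate {before[2]:.0%}")
print(f"After:  {after[1]} graduates, rate {after[2]:.0%}")
```

In this invented case the overall rate rises sharply because the denominator shrinks faster than the numerator, while the number of graduates falls by more than half.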

Consider a student who is a single parent. If the student’s babysitter is sick, the student has to stay home from school, missing class. The student’s academic performance deteriorates. If the student misses enough classes, he or she may be forced to drop out of college and may eventually default on his or her student loans. If faced with pressure to increase graduation rates, some colleges will change their admissions standards in ways that directly or indirectly prevent these students from obtaining a college education. Other colleges will respond by providing these students with access to reliable childcare resources, enabling them to graduate at the same rates as students who aren’t single parents.

Accuracy Matters

The proposed rating system will suffer from the GIGO problem. GIGO is an acronym for “garbage in, garbage out.” If one provides a computer program, such as a rating system, with inaccurate input data, it will produce inaccurate output results.

Available data is limited, suffers from accuracy problems and may be prone to manipulation. The main data sources include the Integrated Postsecondary Education Data System (IPEDS), the Common Data Set (CDS), and the National Student Loan Data System (NSLDS). IPEDS and CDS both contain unverified school-reported aggregate data. Several colleges have admitted to submitting inaccurate data to CDS, which is used by many college ranking systems. For example, some colleges based the debt-at-graduation figures reported to CDS on just need-based loans instead of all loans. Others understated average debt at graduation by dividing the total debt at graduation by the total number of students who graduated instead of just the students who graduated with debt. Even NSLDS, which contains individual student-level data, suffers from data quality issues.
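The effect of choosing the wrong denominator is easy to see with a small example. The debt figures below are hypothetical.

```python
# Hypothetical debt at graduation for six graduates; three did not borrow.
debts = [0, 0, 0, 20000, 30000, 40000]

total_debt = sum(debts)
borrowers = [d for d in debts if d > 0]

# Dividing by all graduates dilutes the figure with non-borrowers.
average_over_all_graduates = total_debt / len(debts)

# Dividing by borrowers only gives the average debt of those who graduated with debt.
average_over_borrowers = total_debt / len(borrowers)

print(f"Average over all graduates: ${average_over_all_graduates:,.0f}")
print(f"Average over borrowers:     ${average_over_borrowers:,.0f}")
```

With half the class borrowing, the first figure is exactly half the second, even though no individual's debt has changed.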

The data will also be prone to manipulation. For example, colleges will become more selective by admitting the least risky of the high-risk students. They might continue admitting Federal Pell Grant recipients, but limit these students to those who have the highest high school GPA or admissions test scores, or limit the students to the highest-income subset of Federal Pell Grant recipients.

Slow Progress

One could argue that the rating system has been in development since the White House introduced the FY2010 budget proposal on February 26, 2009. Page 378 of the U.S. Department of Education appendix to the FY2010 budget included a proposal to reallocate Perkins Loan funding according to a yet-to-be-determined formula:

“Loan volume would be allocated among degree-granting institutions using a method to be determined in consultation with Congress. The Administration intends for this new formula to encourage colleges to control costs and offer need-based aid to prevent excessive indebtedness. It may also reward schools that enroll and graduate students from low- and moderate-income families.”

A variation on this proposal was introduced in the U.S. House of Representatives on July 15, 2009, as part of the Student Aid and Fiscal Responsibility Act of 2009 (111th Congress, H.R. 3221), also known as SAFRA. The legislation proposed to base the allocation formula for Perkins Loans in part on incentives for colleges to maintain below-average tuition and to graduate Federal Pell Grant recipients. The legislation passed the House on September 17, 2009 by a vote of 253 to 171, but the proposals for reengineering the Perkins Loan were dropped in the Senate in what ultimately became the Health Care and Education Reconciliation Act of 2010 (P.L. 111-152).

The current rating system proposal might be little more than the latest reincarnation of SAFRA.

The U.S. Department of Education is soliciting comments from the public by email through February 17, 2015. These public comments will be informal, not through the regulatory process. It is unclear how the U.S. Department of Education will be able to finalize the details of the proposal and hold another public comment period on the final proposal in time to have the rating system implemented before the 2015-2016 academic year.

The Challenge of College Ratings and Rankings

Commercial organizations that attempt to evaluate college performance and create college rankings and ratings are subjected to significant criticism every year. So, why does the U.S. Department of Education believe that it can do better?

The U.S. Department of Education has a weak track record so far. For example, the College Scorecard reports the median debt of college graduates and dropouts combined. This yields little value to prospective students, who want to understand the likely debt burden if they graduate and the risks associated with failure. Blending the two figures into a single metric is not informative. Even if the U.S. Department of Education were to report the median debt of college graduates and dropouts separately, a single figure for each group would still be largely uninformative. Colleges where fewer than half the students borrow would report a median debt of zero. More information is necessary, such as the percentage of students graduating with debt and the mean/median debt among students who graduate with debt. Even then, the debt figures may be prone to the Yule-Simpson effect, where trends can disappear or reverse when groups of data are combined.
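The median-of-zero problem can be demonstrated with Python's standard statistics module. The ten debt figures are hypothetical.

```python
from statistics import median

# Hypothetical graduating class of ten: six students did not borrow.
debts = [0] * 6 + [18000, 22000, 28000, 32000]

median_all = median(debts)                      # median across every graduate
borrowers = [d for d in debts if d > 0]
share_with_debt = len(borrowers) / len(debts)   # fraction who graduated with debt
median_borrowers = median(borrowers)            # median among borrowers only

print(f"Median debt, all graduates: ${median_all:,.0f}")
print(f"Share graduating with debt: {share_with_debt:.0%}")
print(f"Median debt among borrowers: ${median_borrowers:,.0f}")
```

The single overall median hides both the fact that four in ten students borrowed and how much they owe, which is why the share with debt and the median among borrowers need to be reported together.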

Similarly, the U.S. Department of Education’s attempts to develop a gainful employment metric were flawed by an arbitrary choice of thresholds for debt-service-to-income ratios. These thresholds were based on a small subset of old mortgage lender underwriting standards for overall and mortgage debt, as reported in a single paper, P. Scherschel, Student Indebtedness: Are Borrowers Pushing the Limits?, USA Group Foundation, 1998. (Most of the other papers were merely repeating the same set of flawed figures.) This derived threshold was not based on any kind of universal standard. For example, the U.S. Department of Education used a different threshold internally at the time. Moreover, the U.S. Department of Education’s gainful employment regulation effectively required all college graduates to be able to qualify for a mortgage as its criterion for gainful employment.

Conclusion

The U.S. Department of Education seems eager to move forward with a rating system despite acknowledging numerous limitations and flaws.

The first principle of a rating system should be to do no harm.

The U.S. Department of Education needs to have the courage to abandon the rating system if, as seems apparent, current data sources are inadequate to do a credible job.

To paraphrase an idea from Stephen Jay Gould’s book, The Mismeasure of Man, just because you can calculate a number doesn’t mean it measures anything real.