Paramedic Skills Tests Discriminated Against Women
Employers use skills tests to assess whether applicants are a good fit for a job. Prescreening employees can save employers time and help them to get the best person for a job. But the test must be aimed at specific job-related requirements and must not discriminate.
According to a US Department of Labor (DOL) testing and assessment guide, an employment test is considered “good” if the following can be said about it:
- The test measures what it claims to measure consistently or reliably
- The test measures what it claims to measure and not a protected characteristic
- The test is job-relevant
- By using the test, more effective employment decisions can be made about individuals
The DOL says that the degree to which a test has these qualities is indicated by two technical properties: reliability and validity. Regarding validity, a Forbes.com article notes that “An employer should be able to demonstrate that those who do well on the test do well in performing the job and those who score poorly on the test perform poorly on the job.” Regarding reliability, Forbes says that “A test should consistently measure traits; otherwise it will be of little value in predicting a candidate’s future job performance. Similarly with validation, test reliability should be proven prior to the test being implemented.”
Tests don’t have to be exactly the same for men and women. For instance, the FBI requires its Academy trainees to complete a physical fitness test (PFT) to get into the Academy and again to graduate. As part of the test, male trainees must do 30 untimed push-ups and female trainees must do 14 untimed push-ups. A male trainee who could do only 29 push-ups claimed that the disparate requirements were unfair. However, a court noted that the FBI was trying to normalize testing standards to account for innate physiological differences between men and women. Because studies showed that the pass rates for male and female trainees were extremely close, the PFT imposed the same burden on men and women. See Bauer v. Lynch.
On the other hand, a test that’s the same for men and women can result in sex discrimination if it has a disparate impact on one sex. The DOL charged a federal food service contractor with sex discrimination because the contractor’s use of a general strength test for entry-level warehouse laborer jobs resulted in the hiring of nearly 300 men but only six women.
Female Paramedics Claimed that Skills Test Resulted in Sex Discrimination
In a recent case involving the Chicago Fire Department, five women charged that Chicago’s physical skills test resulted in sex discrimination. The women were all experienced paramedics, but they all failed Chicago’s physical skills entrance exam.
Before 2000, Chicago hired paramedics without using a test. But in 2000 it hired a company called Human Performance Systems, Inc. (HPS) to create a skills test. HPS’s president, Deborah Gebhardt, had previously created a skills test, for Chicago’s entry-level firefighters, that had a disparate impact on women. The women in this case argued that Chicago’s hiring of Gebhardt to create another test, without accepting competing bids, showed that Chicago wanted to reduce the number of women it hired as paramedics.
Between 2000 and 2009, nearly 1,100 applicants took Gebhardt’s entrance examination; 98% of male applicants passed, while only 60% of female applicants passed. Because Chicago conceded that its physical-skills entrance test had an adverse impact on women, it then had to show that its test was job-related for the paramedic position and consistent with business necessity. Chicago relied on Gebhardt’s validity study to establish that its physical-skills test was job-related.
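To put the pass rates above in perspective, here is a minimal Python sketch that compares them using the "four-fifths" rule of thumb from the federal Uniform Guidelines on Employee Selection Procedures, under which a group's selection rate below 80% of the highest group's rate is generally regarded as evidence of adverse impact. The article does not say that this rule was applied in the case (Chicago conceded adverse impact), and the function names and the 0.8 threshold constant are illustrative assumptions.

```python
# Illustrative sketch only: the four-fifths rule is a rule of thumb,
# not a statement of how the court analyzed this case.

FOUR_FIFTHS_THRESHOLD = 0.8  # assumed threshold from the four-fifths rule


def impact_ratio(rate_a: float, rate_b: float) -> float:
    """Return the lower selection rate divided by the higher one."""
    if not (0 <= rate_a <= 1 and 0 <= rate_b <= 1):
        raise ValueError("rates must be between 0 and 1")
    higher = max(rate_a, rate_b)
    if higher == 0:
        raise ValueError("at least one rate must be greater than zero")
    return min(rate_a, rate_b) / higher


def suggests_adverse_impact(rate_a: float, rate_b: float) -> bool:
    """True when the impact ratio falls below the four-fifths threshold."""
    return impact_ratio(rate_a, rate_b) < FOUR_FIFTHS_THRESHOLD


# Pass rates reported in the article: 98% of men, 60% of women.
male_pass_rate = 0.98
female_pass_rate = 0.60

ratio = impact_ratio(male_pass_rate, female_pass_rate)
print(f"Impact ratio: {ratio:.3f}")
print("Below four-fifths threshold:",
      suggests_adverse_impact(male_pass_rate, female_pass_rate))
```

With the reported rates, the ratio is roughly 0.61, well below 0.8. The sketch works from the rounded percentages in the article, not from the underlying applicant counts, which the article does not give.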
Creation of the Test
For the paramedic test, Gebhardt tested volunteer Chicago paramedics on physical skills that she had determined were necessary for the paramedic job: a modified stair-climb, leg lifts, arm-strength tests, and other tests. Then she got volunteers’ supervisors and peers to assess the volunteers’ job skills. If the ratings and the test scores yielded comparable assessments, Gebhardt could conclude that her skills tests were valid.
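The comparison described above, checking whether skills-test scores track job-performance ratings, is commonly summarized with a correlation coefficient. The following Python sketch computes a Pearson correlation for that purpose. The article does not say which statistic Gebhardt used, so the choice of Pearson's r is an assumption, and the scores and ratings below are made-up numbers for illustration only, not data from the case.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical values, invented for this example.
skills_test_scores = [62, 75, 58, 90, 81, 70]
supervisor_ratings = [3.9, 4.1, 4.0, 4.2, 3.8, 4.1]

r = pearson_r(skills_test_scores, supervisor_ratings)
print(f"Correlation between test scores and ratings: {r:.2f}")
```

A coefficient near 1 would suggest that the test ranks people much as their supervisors and peers do; a coefficient near 0 would suggest that the test measures something other than rated job performance.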
However, the job-performance ratings and skills-testing scores yielded significantly different assessments. Although female paramedics’ performances were close to males’ performances in the job-performance ratings, the women performed much worse on the skills test. But rather than setting aside her skills tests and creating new ones, Gebhardt provided rationales for ignoring the job-performance ratings. Instead she compared the skills test scores with work-sample tests that she designed with input from the Chicago Fire Department: a lift and carry, a stair-chair push, and a stretcher lift.
Federal regulations establish technical standards for validity studies, such as using a representative sample population to create the test and making sure that the primary focus of a test is on specific skills or knowledge that are learned on the job.
In this case, the volunteer paramedics weren’t representative of individuals who are normally available in the Chicago job market. Gebhardt admitted that the volunteer paramedics performed better than public- and private-sector paramedics normally perform.
Reliability: The lift-and-carry part of the test had a reliability score of only 0.503: a 50-50 chance of reliability. Gebhardt made no effort to separate the lift and carry from the rest of the test. Because each skill was validated by correlating it against all three work samples, the unreliability of the lift and carry undermined all three skills that Chicago tested in its physical entrance exam.
Validity: The women questioned whether the work samples used by Gebhardt were a valid measure of job skills, and the Court found that the work samples were much harder than the actual on-the-job required skills. For instance, although real paramedics raise a stretcher and then move, paramedics in Gebhardt’s work sample had to lock their arms and raise the patient-laden stretcher up and down for 13 cycles. In addition, the work sample required paramedics to carry all the weight alone, while stretchers are usually wheeled or carried by at least two paramedics.
Because Chicago used the work-sample tests to validate the skills tests without ever validating the work samples, the Court could not conclude that the work samples reflected the “primary focus of” paramedic skills learned on the job.
The Court stated that there is nothing unfair about women characteristically obtaining lower physical-skills scores than men. But it reiterated that a test will be unfair unless differences in scores correlate with a difference in job performance.
The Court also suggested that patients could benefit from mixed-gender paramedic teams, saying:
Perhaps mixed-gender teams could offer patients a more diverse combination of physical and psychological care than single-gender teams. A female paramedic might fit into a space where a male paramedic does not; a female victim might be helped by having a female paramedic on the team. And in this case, the validated testing of job-related skills simply is not there: it is not enough to show a strong correlation between two tests that Chicago created concurrently. To validate the other test, at least one test must itself be a valid measure of job skills.
This case shows that employers must be careful to test real job skills and to avoid even the appearance of a conflict of interest in choosing who designs a test. Using a representative sample of people to create a well-designed test can help employers avoid conscious or unconscious biases that may creep into their decision-making. As studies have shown, the lack of women in some workplaces isn’t just a pipeline problem: companies sometimes worsen the sex discrimination problem themselves. Employers can instead build a workplace culture that embraces and benefits from their employees’ diversity.