
Validity

What is Validity?

Validity is the most important quality of any psychological test. It answers the simple question: “Does this test actually work?”

For an IQ test to be valid, it must demonstrate that the score it produces corresponds to real-world intelligence. It’s not enough to just generate a number; that number must predict something meaningful, like academic success, job performance, or problem-solving ability.

Types of Validity

To scientifically prove a test works, psychometricians look at three main types:

  1. Construct Validity: Does the test actually measure the theoretical construct of “intelligence”?
    • Evidence: A new test should correlate highly with established tests (like the Stanford-Binet or WAIS). If you score 130 on the WAIS and 70 on a new test, the new test lacks construct validity.
  2. Predictive Validity: Does the score predict future outcomes?
    • Evidence: Professional IQ tests have high predictive validity for grades, income, job complexity, and even health outcomes.
  3. Content Validity: Does the test cover a representative sample of cognitive skills?
    • Evidence: A test that only asks math questions lacks content validity because intelligence also includes verbal and spatial reasoning.

The Problem of “Face Validity”

There is a fourth type called Face Validity: Does the test look like an IQ test?

  • If an IQ test asked “What is your favorite color?”, it would have low Face Validity. Participants wouldn’t take it seriously. However, Face Validity is scientifically irrelevant — a test can look silly but be mathematically accurate.

Validity vs. Reliability

It is crucial to distinguish between Validity (Accuracy) and Reliability (Consistency).

  • Example: If you estimate intelligence by measuring head circumference with a tape measure, you will get the same number every time (High Reliability), but that number tells you nothing about intelligence (Low Validity).
  • Goal: A good IQ test must be both reliable (consistent) and valid (accurate).
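The distinction can be made concrete with simulated data: a measure can be perfectly repeatable while telling you nothing about the construct it claims to assess. A minimal sketch in Python (all numbers are simulated; the sample sizes and noise levels are illustrative assumptions, not real anthropometric data):

```python
import random

random.seed(0)
n = 500

# Simulate 500 people: an (unobserved) true intelligence score and a
# head circumference that has nothing to do with it.
intelligence = [random.gauss(100, 15) for _ in range(n)]
head_cm = [random.gauss(56, 2) for _ in range(n)]

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Reliability: measure the same heads a second time. A tape measure
# gives nearly identical numbers, so test-retest correlation is ~1.0.
retest = [c + random.gauss(0, 0.1) for c in head_cm]  # tiny measurement error
reliability = pearson_r(head_cm, retest)

# Validity: correlate the measure with the thing it claims to assess.
validity = pearson_r(head_cm, intelligence)

print(f"reliability (test-retest):  r = {reliability:.2f}")  # close to 1
print(f"validity (vs intelligence): r = {validity:.2f}")     # close to 0
```

The same `pearson_r` check applied to two different questions yields opposite answers, which is exactly the reliability/validity distinction.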

Online Tests vs. Professional Tests

This is the main difference between a “real” IQ test and an internet quiz.

  • Professional Tests (WAIS): Have spent decades gathering data to prove they actually measure cognitive ability (High Validity).
  • Online Quizzes: Might output a number, but that number usually has no demonstrated correlation with actual intelligence (Low/Zero Validity). They measure how good you are at that specific quiz, not how smart you are.

Construct Validity in Depth: How Researchers Establish It

Construct validity is the most theoretically rich form of validity, and establishing it requires a convergence of multiple lines of evidence:

Convergent Validity: Scores on the new test should correlate strongly with scores on other established measures of the same construct. For example, a new fluid intelligence test should correlate strongly (r > 0.70) with Raven’s Progressive Matrices and the WAIS Matrix Reasoning subtest.

Discriminant Validity: The test should not correlate highly with measures of unrelated constructs. An IQ test should correlate with academic achievement, but not with height, shoe size, or typing speed. If it does, the test may be measuring confounding variables.
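In practice, both convergent and discriminant checks reduce to inspecting correlations. A minimal simulation of the logic (the latent-ability model, sample size, and noise levels are illustrative assumptions, not parameters of any real test):

```python
import random

random.seed(1)
n = 500

# One latent general ability; two intelligence measures load on it,
# while shoe size is generated completely independently.
g = [random.gauss(0, 1) for _ in range(n)]
new_test = [a + random.gauss(0, 0.5) for a in g]
established = [a + random.gauss(0, 0.5) for a in g]  # stand-in for an existing IQ test
shoe_size = [random.gauss(42, 2) for _ in range(n)]

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

convergent = pearson_r(new_test, established)  # same construct: should be high
discriminant = pearson_r(new_test, shoe_size)  # unrelated construct: near zero

print(f"convergent:   r = {convergent:.2f}")
print(f"discriminant: r = {discriminant:.2f}")
```

With these assumptions the convergent correlation clears the r > 0.70 benchmark while the discriminant correlation hovers near zero, which is the pattern a valid test should show.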

Factor Analytic Evidence: The internal structure of the test should align with the theoretical model. A test claiming to measure the g-factor should show that all its subtests load strongly on a single general factor when factor analysis is applied.
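A toy version of this check: simulate four subtests driven by one shared factor, then recover the loadings via power iteration on the subtest correlation matrix. This is a first-principal-component approximation, not a full factor-analytic fit, and the loading values are invented for illustration:

```python
import random

random.seed(2)
n = 1000

# Four subtests that all draw on a single general factor g, plus
# independent subtest-specific noise (a one-factor model).
true_loadings = [0.80, 0.70, 0.75, 0.85]
g = [random.gauss(0, 1) for _ in range(n)]
subtests = [
    [lam * g[i] + (1 - lam**2) ** 0.5 * random.gauss(0, 1) for i in range(n)]
    for lam in true_loadings
]

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Correlation matrix of the four subtests.
k = len(subtests)
R = [[pearson_r(subtests[i], subtests[j]) for j in range(k)] for i in range(k)]

# Power iteration: the leading eigenvector of R points along the
# dominant shared dimension; scaled by sqrt(eigenvalue), it gives each
# subtest's loading on that first (general) factor.
v = [1.0] * k
for _ in range(100):
    w = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
    norm = sum(x * x for x in w) ** 0.5
    v = [x / norm for x in w]
eigval = sum(v[i] * sum(R[i][j] * v[j] for j in range(k)) for i in range(k))
loadings = [abs(x) * eigval ** 0.5 for x in v]

print("estimated first-factor loadings:", [round(l, 2) for l in loadings])
print("share of variance explained:", round(eigval / k, 2))
```

All four loadings come out strong and a single factor accounts for most of the shared variance, which is the pattern a g-loaded battery should produce.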

Developmental Sensitivity: A valid intelligence test for children should show scores that increase systematically with age, since cognitive abilities are known to develop through childhood.

Intervention Sensitivity: If an educational intervention is known to improve certain cognitive abilities, a valid test measuring those abilities should be sensitive enough to detect the improvement.

The WAIS-IV and WISC-V have been subjected to extensive construct validity research across multiple decades, populations, and languages. Their validity is among the most thoroughly documented of any psychological instrument.

Predictive Validity: The Practical Test of IQ

Predictive validity is where IQ testing proves its most practical value. A test with high predictive validity gives us information that is genuinely useful for real decisions — about education, hiring, and research.

The predictive validity of standardized IQ tests is exceptional by social science standards:

  • Academic achievement: IQ correlates approximately r = 0.50–0.60 with school grades. This is the strongest single predictor of academic success among all psychological variables.
  • Job performance: For complex jobs (medicine, law, engineering, research), IQ is the single best predictor of job performance and trainability, with correlations of r = 0.50–0.60.
  • Income: IQ correlates r = 0.30–0.40 with adult income, accounting for a substantial portion of variance in economic outcomes.
  • Health and longevity: Higher IQ is associated with better health behaviors, more effective navigation of healthcare systems, and lower all-cause mortality. A landmark study following Scottish schoolchildren across their lifespans found that childhood IQ was one of the strongest predictors of survival into old age.
  • Creativity and innovation: In highly complex domains (mathematics, science, technology), high IQ appears to be necessary (though not sufficient) for transformative contributions.

These correlations might seem modest, but in social science, where phenomena are influenced by dozens of interacting variables, an effect size of r = 0.50 is very large. No other single psychological variable consistently outperforms IQ as a predictor of cognitive performance outcomes.
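A quick way to interpret these figures is the coefficient of determination, r squared: the share of variance in the outcome that the predictor accounts for. Using illustrative midpoints of the ranges cited above:

```python
# r-squared: the share of outcome variance a predictor accounts for.
# The r values are midpoints of the cited ranges, chosen for illustration.
correlations = {"grades": 0.55, "job performance": 0.55, "income": 0.35}
shared_variance = {outcome: r * r for outcome, r in correlations.items()}

for outcome, r in correlations.items():
    print(f"IQ vs {outcome}: r = {r:.2f} -> explains {r * r:.0%} of variance")
```

Even the strongest of these leaves most variance unexplained, which is why r = 0.50 counts as very large only relative to other social-science predictors.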

Threats to Validity: When Tests Fail

Not all threats to validity come from poorly designed tests. Even a well-validated test can produce invalid scores in specific circumstances:

Cultural Bias: Tests validated on one population may have lower construct validity when administered to culturally different groups. The verbal comprehension subtests of the WAIS assume familiarity with concepts embedded in Western culture; these items may measure cultural exposure as much as cognitive ability in non-Western populations.

Testing Conditions: Severe test anxiety, illness, sleep deprivation, or a hostile testing environment can cause scores to underestimate true ability — reducing the predictive validity of an otherwise good instrument.

Coaching and Preparation: Unlike pure fluid intelligence tests (Raven’s), tests with crystallized components (vocabulary, general knowledge) are amenable to preparation. Scores achieved after intensive tutoring may not reflect the same construct as unprepared scores.

Ceiling and Floor Effects: At the extremes of the distribution, even valid tests lose discriminating power, reducing their validity for the very populations (profoundly gifted, profoundly intellectually disabled) where accurate measurement is most clinically important.
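A ceiling effect is easy to demonstrate by simulation: impose a maximum score and watch the measure's correlation with true ability drop in the upper range. The ceiling at 130 and the noise level here are illustrative assumptions, not properties of any real instrument:

```python
import random

random.seed(3)
n = 2000

# True ability and an observed score from a test whose maximum raw
# score corresponds to IQ 130 (an illustrative ceiling).
ability = [random.gauss(100, 15) for _ in range(n)]
observed = [min(a + random.gauss(0, 5), 130) for a in ability]

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Among high-ability examinees the test stops discriminating, because
# more and more of them pile up at the maximum score.
high = [(a, o) for a, o in zip(ability, observed) if a > 115]
r_all = pearson_r(ability, observed)
r_high = pearson_r([a for a, _ in high], [o for _, o in high])

print(f"validity over full range: r = {r_all:.2f}")
print(f"validity above IQ 115:    r = {r_high:.2f}")
```

Part of the drop is ordinary restriction of range, and the ceiling compounds it; the same logic applies in mirror image at the floor.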

Conclusion: The Foundation of Meaningful Measurement

Validity is the bridge between a number and a meaning. Without it, an IQ score is just a label — arbitrary, potentially misleading, and divorced from the real-world cognitive capacities it claims to represent. With it, a score becomes a window into an individual’s cognitive architecture, a predictor of future outcomes, and a tool for making better decisions about education, employment, and clinical intervention. The extensive validity research behind professional IQ tests is what separates them from the noise of unvalidated online quizzes — and what makes them genuinely valuable instruments in the hands of trained professionals.

Related Terms

  • Reliability
  • g-factor
  • Construct Validity
  • WAIS