A study examines how "stereotype threat" affects performance. Stereotype threat is an explanation favored by proponents of purely social accounts, to which gender differences are frequently attributed. From Wikipedia:
Stereotype threat is the fear of members of a social group that their behavior could confirm a negative stereotype about that group. This can lead to a self-fulfilling prophecy if that fear influences behavior in line with the prejudice. Especially in test situations, the anxiety can impair performance. Stereotype threat can affect, for example, members of ethnic minorities and women.
Although the effect of stereotype threat concerning women and mathematics has been subject to various systematic reviews, none of them have been performed on the sub-population of children and adolescents. In this meta-analysis we estimated the effects of stereotype threat on performance of girls on math, science and spatial skills (MSSS) tests. Moreover, we studied publication bias and four moderators: test difficulty, presence of boys, gender equality within countries, and the type of control group that was used in the studies. We selected study samples when the study included girls, samples had a mean age below 18 years, the design was (quasi-)experimental, the stereotype threat manipulation was administered between-subjects, and the dependent variable was a MSSS test related to a gender stereotype favoring boys. To analyze the 47 effect sizes, we used random effects and mixed effects models. The estimated mean effect size equaled −0.22 and significantly differed from 0. None of the moderator variables was significant; however, there were several signs for the presence of publication bias. We conclude that publication bias might seriously distort the literature on the effects of stereotype threat among schoolgirls. We propose a large replication study to provide a less biased effect size estimate.
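The effect sizes in the abstract are Hedges' g values, i.e. standardized mean differences with a small-sample bias correction. As a minimal sketch of how such an effect size is computed from two group summaries (all numbers below are made up for illustration):

```python
import math

def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference with Hedges' small-sample correction."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / sp          # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)   # correction factor for small samples
    return j * d

# Hypothetical scores: threat group vs. control group of girls
g = hedges_g(24.0, 26.0, 8.0, 8.0, 30, 30)
print(round(g, 3))  # → -0.247, i.e. the threat group scored lower
```

A negative g here means the stereotype-threat group performed worse than the control group, matching the sign convention of the meta-analysis.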
One of the researchers is also on Twitter.
So they found a small effect, and they assume it is more likely explained by publication bias. On that topic:
Publication bias is the statistically distorted representation of the state of the evidence in scientific journals, resulting from the preferential publication of studies with "positive", i.e. significant, results. It was identified in 1959 by the statistician Theodore Sterling. Positive findings are easier to publish than "negative", i.e. non-significant, results, and are moreover published more often in journals with a high impact factor.
In other words: studies that find differences are more likely to be published than studies that find nothing.
From the study:
To estimate the overall effect size, we used a random effects model. In accordance with our hypothesis as well as the former literature, we found a small average standardized mean difference, g = −0.22, z = −3.63, p < .001, CI95 = −0.34; −0.10, indicating that girls who have been exposed to a stereotype threat on average score lower on the MSSS tests compared to girls who have not been exposed to such a threat. Furthermore, we found a significant amount of heterogeneity using the restricted maximum likelihood estimator, τ² = 0.10, Q(46) = 117.19, p < .001, CI95 = 0.04; 0.19, which indicates there is variability among the underlying population effect sizes. This estimated heterogeneity accounts for a large share of the total variability, I² = 61.75%. The 95% credibility interval, an estimation of the boundaries in which 95% of the true effect sizes are expected to fall, lies between −0.85 and 0.41 (Viechtbauer, 2010).
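The pooled estimate above comes from a random effects model; the authors estimate the between-study variance τ² via REML. A minimal sketch of the same idea using the simpler DerSimonian–Laird estimator, with made-up effect sizes and variances:

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooled effect via the DerSimonian-Laird tau^2 estimator."""
    k = len(effects)
    w = [1 / v for v in variances]          # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sw
    # Cochran's Q statistic for heterogeneity
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, effects))
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)      # between-study variance estimate
    # Re-weight with tau^2 added to each within-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * gi for wi, gi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, tau2, q, se

# Hypothetical study results (Hedges' g and sampling variances)
effects = [-0.5, -0.3, 0.0, -0.2]
variances = [0.04, 0.05, 0.02, 0.03]
pooled, tau2, q, se = dersimonian_laird(effects, variances)
```

I² can then be computed from the same quantities as max(0, (q − (k − 1)) / q), the share of total variability due to between-study heterogeneity.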
The forest plot of the results looks as follows:
On this point, from the study:
We used several methods to test for the presence of publication bias. First, we ran several tests on the funnel plot (see Fig. 3) to assess funnel plot asymmetry. According to the estimations of the trim and fill method (Duval & Tweedie, 2000), the funnel plot would be symmetric if 11 effect sizes would have been imputed on the right side of the funnel plot. Actual imputation of those missing effect sizes (Duval & Tweedie, 2000) reduced the estimated effect size to g = −0.07, z = −1.10, p = .27, CI95 = −0.21; 0.06. Because this altered effect size did not differ significantly from zero whereas our original effect size estimation of g = −0.22 did, this pattern is a first indication that our results might be distorted by publication bias. Both Egger's test (Sterne & Egger, 2005; z = −3.25, p = .001) and Begg and Mazumdar's (1994) rank correlation test, Kendall's τ = −.27, p = .01, indicated funnel plot asymmetry. This finding indicates that imprecise study samples (i.e., study samples with a larger standard error) on average contribute to a more negative effect than precise study samples. The relation between imprecise samples and the effect sizes is illustrated in Fig. 4 using a cumulative meta-analysis sorted by the sampling variance of the samples (Borenstein, Hedges, Higgins, & Rothstein, 2009). This cumulative process first carries out a "meta-analysis" on the sample with the smallest sampling variance and proceeds adding the study with smallest remaining sampling variance and re-analyzing until all samples are included in the meta-analysis. The drifting trend of the estimated effect sizes visualizes the effect that small imprecise study samples have on the estimations of the mean effect. We created subsets to estimate the effects of large study samples (N ≥ 60) and small study samples (N < 60).
We found a stronger effect in the subset of smaller study samples, g = −0.34, z = −3.76, p < .001, CI95 = −0.52; −0.16, CrI95 = −0.96; 0.27, k = 24, and a small and nonsignificant effect for the subset of larger study samples, g = −0.13, z = −1.63, p = .10, CI95 = −0.29; 0.03, CrI95 = −0.75; 0.49, k = 23. Finally, Ioannidis and Trikalinos's exploratory test (Ioannidis & Trikalinos, 2007) showed that this meta-analysis contains more statistically significant effects than would be expected based on the cumulative power of all study samples, χ²(1) = 8.50, p = .004. The excess of statistically significant findings is another indicator of publication bias (Bakker et al., 2012; Francis, 2012). To check the alternative explanation that the excess of statistically significant findings is due to the practice of p-hacking we created a p-curve (Fig. 5) using the online app from Simonsohn et al. (2013). The p-curve depicts the theoretical distribution of p-values when there is no effect present (solid line), the theoretical distribution of p-values when an effect is present and the tests have 33% power (dotted line), and the observed distribution of the significant p-values in our meta-analysis (dashed line). The observed distribution was right-skewed, χ²(30) = 62.87, p < .001, which indicated that there is an effect present that is not simply the result of practices like p-hacking. Overall, most publication bias tests indicate that the estimated effect size is likely to be inflated.
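Egger's test, mentioned above, checks funnel-plot asymmetry by regressing the standardized effects on their precisions; an intercept far from zero signals small-study effects such as publication bias. A self-contained sketch with synthetic data in which the small, imprecise samples show the most negative effects, mimicking the pattern the authors report:

```python
import math

def egger_test(effects, ses):
    """Egger's regression test: regress g_i/se_i on 1/se_i.
    A nonzero intercept suggests funnel-plot asymmetry."""
    y = [g / s for g, s in zip(effects, ses)]   # standardized effects
    x = [1 / s for s in ses]                    # precisions
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Standard error of the intercept from the residual variance
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)
    se_int = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    return intercept, intercept / se_int        # intercept and its t-value

# Synthetic data: imprecise studies (large SE) report stronger negative effects
effects = [-0.6, -0.5, -0.45, -0.2, -0.15, -0.1]
ses = [0.30, 0.28, 0.25, 0.12, 0.10, 0.08]
intercept, t = egger_test(effects, ses)
# intercept comes out clearly negative, indicating asymmetry
```

This is only an illustration of the mechanics; the paper's actual analysis was done with established software (the cited Viechtbauer, 2010 reference is the R package metafor).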
So the meta-study found only a very small effect, along with several indications that it may not actually exist at all. If that is correct, considerable doubts about these theories would be well justified.
The study has also already found its way into the English Wikipedia. There, under "Criticism", it says:
The stereotype threat explanation of achievement gaps has attracted criticism. According to Paul R. Sackett, Chaitra M. Hardison, and Michael J. Cullen, both the media and scholarly literature have wrongly concluded that eliminating stereotype threat could completely eliminate differences in test performance between European Americans and African Americans. Sackett et al. have pointed out that, in Steele and Aronson's (1995) experiments where stereotype threat was removed, an achievement gap of approximately one standard deviation remained between the groups, which is very close in size to that routinely reported between African American and European Americans' average scores on large-scale standardized tests such as the SAT. In subsequent correspondence between Sackett et al. and Steele and Aronson, Sackett et al. wrote that "They [Steele and Aronson] agree that it is a misinterpretation of the Steele and Aronson (1995) results to conclude that eliminating stereotype threat eliminates the African American-White test-score gap."
Arthur R. Jensen criticised stereotype threat theory on the basis that it invokes an additional mechanism to explain effects which could be, according to him, explained by other, well-known, and well-established theories, such as test anxiety and especially the Yerkes–Dodson law. In Jensen's view, the effects which are attributed to stereotype threat may simply reflect "the interaction of ability level with test anxiety as a function of test complexity".
In 2009, Wei examined real-world testing over a broad population (rather than lab assessments with questionable external validity), and found the opposite of stereotype threat: randomly assigned gendered questions actually raised female students' scores by 0.05 standard deviations. The lack of stereotype threat replicates an earlier large experiment with Advanced Placement exams which found no stereotype threat.
Gijsbert Stoet and David C. Geary reviewed the evidence for the stereotype threat explanation of the achievement gap in mathematics between men and women. They concluded that the relevant stereotype threat research has many methodological problems, such as not having a control group, and that the stereotype threat literature on this topic misrepresents itself as "well established". They concluded that the evidence is in fact very weak.
Failures to replicate and publication bias
Meta-analysis of stereotype threat on girls showing asymmetry typical of publication bias. From Flore, P. C., & Wicherts, J. M. (2015)
Whether the effect occurs at all has also been questioned, with researchers failing to replicate the finding. Flore and Wicherts concluded the reported effect is small, but also that the field is inflated by publication bias. They argue that, correcting for this, the most likely true effect size is near zero (see meta-analytic plot, highlighting both the restriction of large effect to low-powered studies, and the plot asymmetry which occurs when publication bias is active).
Earlier meta-analyses reached similar conclusions. For instance, Ganley et al. (2013) examined stereotype threat on mathematics test performance. They report a series of 3 studies, with a total sample of 931 students. These included both childhood and adolescent subjects and three activation methods, ranging from implicit to explicit. While they found some evidence of gender differences in math, these occurred regardless of stereotype threat. Importantly, they found „no evidence that the mathematics performance of school-age girls was impacted by stereotype threat“. In addition, they report that evidence for stereotype threat in children appears to be subject to publication bias. The literature may reflect selective publication of false-positive effects in underpowered studies, where large, well-controlled studies find smaller or non-significant effects:
nonsignificant findings were almost always reported in an article along with some significant stereotype threat effects found either at another age (Ambady et al., 2001; Muzzatti & Agnoli, 2007), only with certain students (Keller, 2007), on certain items (Keller, 2007; Neuville & Croizet, 2007), or in certain contexts (Huguet & Regner, 2007, Study 2; Picho & Stephens, 2012; Tomasetto et al., 2011). Importantly, none of the three unpublished dissertations showed a stereotype threat effect. This observation suggests the possibility that publication bias is occurring. Publication bias refers to the fact that studies with null results are often not written up for publication or accepted for publication (Begg, 1994). This bias is a serious concern, especially if these results are being used to make recommendations for interventions.
In a study designed to see whether incentives could overcome stereotype threat in mathematics tests, Fryer, Levitt, and List (2008) could not replicate the stereotype threat, finding instead a modest facilitation effect of threat for males and females.
So there are several further studies to be found there that point in the same direction.