Perspectives You Won't Read in Perspectives: Thoughts on Gender, Power, & Eminence

[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 

 

This is a guest post by Katie Corker on behalf of a group of us.

 
 
Rejection hurts. No amount of Netflix binge watching, nor ice cream eating, nor crying to one's dog* really takes the sting out of feeling rejected. Yet, as scientific researchers, we have to deal with an almost constant stream of rejection - there's never enough grant money or journal space to go around. 
 
Which brings us to today's topic. All six of us were recently rejected** from the Perspectives on Psychological Science special issue featuring commentaries on scientific eminence. The new call for submissions was a follow-up to an earlier symposium entitled "Am I Famous Yet?", which featured commentaries on fame and merit in psychological research from seven eminent white men and Alice Eagly.*** The new call was issued in response to a chorus of nasty women and other dissidents who insisted that their viewpoints hadn't been represented by the scholars in the original special issue. The new call explicitly invited these "diverse perspectives" to speak up (in 1,500 words or less****).  
 
Each of the six of us independently rose to the challenge and submitted comments. None of us were particularly surprised to receive rejections - after all, getting rejected is just about the most ordinary thing that can happen to a practicing researcher. Word started to spread among the rejected, however, and we quickly discovered that many of the themes we had written about were shared across our pieces. That judgments of eminence were biased along predictable socio-demographic lines. That overemphasis on eminence creates perverse incentives. That a focus on communal goals and working in teams was woefully absent from judgments of eminence.***** 
 
Hm. It appeared to us that some perspectives were potentially being systemically excluded from Perspectives!****** Wouldn't it be a shame if the new call for submissions yielded yet more published pieces that simply reinforced the original papers? What would be the point of asking for more viewpoints at all? 
 
Luckily it's 2017. We don't have to publish in Perspectives for you to hear our voices. We hope you enjoy our preprints, and we look forward to discussing and improving******* this work.
 
The manuscripts:
 
 
-Katie Corker
on behalf of the authors (Katie Corker, Fernanda Ferreira, Åse Innes-Ker, Cindy Pickett, Lani Shiota, & Simine Vazire)
 
 
* Kirby has had enough of your shit.



** Technically, some of the six of us got R&Rs, but the revisions requested were so dramatic that we have a hard time imagining being able to make them without compromising the main themes of our original pieces. 
 
*** It shouldn't come as a surprise that Eagly's commentary concerned issues relating to gender and power. It was also the only co-authored piece (David Miller was also an author) in the bunch. Eagly and Miller's title was "Scientific eminence:  Where are all the women?" Hi Alice and David - we're over here!
 
**** It doesn't appear that the original special issue had such a tight word limit, but who are we to judge?
 
***** We were also disheartened to notice that, ironically, many of the themes we raised in our pieces surfaced in the treatment we received from the editor and reviewers. For instance, we were told to provide evidence for claims like "women in psychological science may face discrimination." One reviewer even claimed that white men are actually at a disadvantage when it comes to receiving awards in our field. We collectively wondered why we, the "diverse voices," were seemingly being held to a higher standard of evidence than the pieces in the original symposium. Color us surprised that as members of a group stereotyped as less competent, and as outsiders to the eminence club, we had to work impossibly hard to be seen as competent (see Biernat & Kobrynowicz, 1997, among many others).
 
****** On the other hand, it's entirely possible that there were 20 such submissions, and the six of us represent the weakest of the bunch. Hard to get over that imposter syndrome...
 
******* We've chosen to post the unedited submissions so that you can see them in their original form. Of course, the reviewers did raise some good points, and we anticipate revising these papers in the future to address some of the issues they raised.
      
 

looking under the hood

[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 

lipstick on a hippo
 

before modern regulations, used car dealers didn't have to be transparent.  they could take any lemon, pretend it was a solid car, and fleece their customers.  this is how used car dealers became the butt of many jokes.
 
scientists are in danger of meeting the same fate.*  the scientific market is unregulated, which means that scientists can wrap their shaky findings in nice packaging and fool many, including themselves.  in a paper that just came out in Collabra: Psychology,** i describe how lessons from the used car market can save us.  this blog post is the story of how i came up with this idea.
 
last summer, i read Smaldino and McElreath's great paper on "The natural selection of bad science."  i agreed with almost everything in there, but there was one thing about it that rattled me.  their argument rests on the assumption that journals do a bad job of selecting for rigorous science. they write "An incentive structure that rewards publication quantity will, in the absence of countervailing forces, select for methods that produce the greatest number of publishable results." (p. 13).  that's obviously true, but it's not necessarily bad.  what makes it bad is that "publishable result" is not the same thing as "solid study" - if only high quality studies were publishable, then this wouldn't be a problem. 
 
so this selection pressure that Smaldino and McElreath describe is only problematic to the extent that "publishable result" fails to track "good science."  i agree with them that, currently, journals*** don't do a great job of selecting for good science, and so we're stuck in a terrible incentive structure.  but it makes me sad that they seem to have given up on journals actually doing their job. i'm more optimistic that this can be fixed, and i spend a lot of time thinking about how we can fix it.****
 
a few weeks later, i read a well-known economics article by Akerlof, "The market for "lemons": Quality uncertainty and the market mechanism" (he later won a nobel prize for this work).  in this article, Akerlof employs the used car market to illustrate how a lack of transparency (which he calls "information asymmetry") destroys markets.  when a seller knows a lot more about a product than buyers do, there is little incentive for the seller to sell good products, because she can pass off shoddy products as good ones, and buyers can't tell the difference.  the buyer eventually figures out that he can't tell the difference between good and bad products ("quality uncertainty"), but that the average product is shoddy (because the cars fall apart soon after they're sold). therefore, buyers come to lose trust in the entire market, refuse to buy any products, and the market falls apart.
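(side note: the unraveling Akerlof describes is easy to sketch in a few lines of python.  the specifics below - uniform quality, buyers valuing a car at 1.5 times its quality - are illustrative assumptions, not numbers from his paper or from this post.)

import numpy as np

rng = np.random.default_rng(0)
quality = rng.uniform(0, 1, 10_000)   # true quality, known only to the sellers
price = 1.5 * quality.mean()          # buyers start out paying for the average car

for round_number in range(8):
    on_market = quality[quality <= price]   # sellers withdraw cars worth more than the price
    price = 1.5 * on_market.mean()          # buyers adjust to the (worse) average still on offer
    print(f"round {round_number + 1}: price = {price:.3f}, cars offered = {on_market.size}")

# the price ratchets down and the good cars never come back: the market unravels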
 
it dawned on me that journals are currently in much the same position as buyers of used cars.  the sellers are the authors, and the information asymmetry is all the stuff the author knows that the editor and reviewers do not: what the authors predicted ahead of time, how they collected their data, what the raw data look like, what modifications to the data or analyses were made along the way (e.g., data exclusions, transformations) and why, what analyses or studies were conducted but not reported, etc.  without all of this information, reviewers can only evaluate the final polished product, which is similar to a car buyer evaluating a used car based only on its outward appearance. 
 
because manuscripts are evaluated based on superficial characteristics, we find ourselves in the terrible situation described by Smaldino and McElreath: there is little incentive for authors to do rigorous work when their products are only being evaluated on the shine of their exterior.  you can put lipstick on a hippo, and the editor/reviewers won't know the difference. worst of all, you won't necessarily know the difference, because you're so motivated to believe that it's a beautiful hippo (i.e., real effect).
 
that's one difference between researchers and slimy used car dealers***** -- authors of scientific findings probably believe they are producing high-quality products even when they're not.  journals keep buying them, and until recent replication efforts, the findings were never really put to the test (at least not publicly).  
 
the replicability crisis is the realization that we have been buying into findings that may not be as solid as they looked. it's not that authors were fleecing us, it's that we were all pouring lots of time, money, and effort into products that, unbeknownst to us, were often not as pretty as they seemed.
 
the cycle we've been stuck in, and the one described by Smaldino and McElreath, is the same one Akerlof explained with the used car market.  happily, that means Akerlof's paper also points to the solution: transparency.  journals have the power to change the incentive structure.  all they need to do is reduce the information asymmetry between authors and reviewers by requiring more transparency on the part of authors.  give the reviewers and editors (and, ideally, all readers) the information they need to accurately evaluate the quality of the science.  if publication decisions are strongly linked to the quality of the science, this will provide incentives for authors to do more rigorous work.
 
we would laugh at a buyer who buys a used car without looking under the hood.  yet this is what journals (and readers) are often doing in science.  likewise, we would laugh at a car dealer who doesn't want us to look under the hood and instead says "trust me," but we tolerate the same behavior in scientists.
 
but, you might say, scientists are more trustworthy than used car dealers!  sure,****** but we are also supposed to be more committed to transparency.  indeed, transparency is a hallmark of science - it's basically what makes science different from other ways of knowing (e.g., authority, intuition, etc.).  in other words, it's what makes us better than used car dealers.  
 
 
* if you've listened to paula poundstone on NPR, you might think it's already too late
 
** full disclosure: i am a senior editor at Collabra: Psychology.  i tried to publish this paper elsewhere but they didn't want it.  i do worry about the conflict inherent in publishing in a journal i'm an editor at, and i have been trying to avoid it.  my rationale for making an exception here is that publishing in Collabra: Psychology is not yet so career-boosting that i feel i am getting a major kickback, but perhaps i am wrong.  also, if it helps, you can see the reviews and action letter from a previous rejection, which i submitted to Collabra for streamlined review. [pdf]
 
*** the other journals.
 
**** sometimes i lie awake in bed for hours thinking about this.  well, doing that or reading the shitgibbon's tweets and planning my next act of resistance.
 
***** sorry used car dealers.  i have little reason to smear you. well, except for the time you didn't want to let me test drive a car then called me a "bold lady" when i insisted that i could be trusted with a manual transmission. (i was 28 years old.) (and i have always driven a stick shift.) (because i am french.) 
 
****** not actually sure.  
 
underwater hippos
 

power grab: why i'm not that worried about false negatives

[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 



i've been bitching and moaning for a long time about the low statistical power of psych studies.  i've been wrong.

our studies would be underpowered if we actually followed the rules of Null Hypothesis Significance Testing (but kept our sample sizes as small as they are).  but the way we actually do research, our effective statistical power is very high, much higher than our small sample sizes should allow.

let's start at the beginning.

background (skip this if you know NHST)

Null Hypothesis Significance Testing (over)simplified:

                               null is true         null is false
p > .05 (fail to reject)       correct rejection    false negative (miss)
p < .05 (reject the null)      false positive       hit (power)

in this table, power is the probability of ending up in the bottom right cell if we are in the right column (i.e., the probability of rejecting the null hypothesis if the null is false).  in Null Hypothesis Significance Testing (NHST), we don't know which column we're in, we only know which row we end up in.  if we get a result with p < .05, we are in the bottom row (and we can publish!*  yay!).  if we end up with a result with p > .05, we end up in the top row (null result, hard to publish, boo).  within each column, the probability of ending up in each of the two cells (top row, bottom row) adds up to 100%.  so, when we are in the left column (i.e., when the null is actually true, unbeknownst to us), the probability of getting a false positive (typically assumed to be 5%, if we use p < .05 as our threshold for statistical significance) plus the probability of a correct rejection (95%) add up to 100%.  and, when we are in the right column (i.e., when the null is false, also unbeknownst to - but hoped for by - us), the probability of a false negative (ideally at or below 20%) plus the probability of a hit (i.e., statistical power; 80%) add up to 100%.

side note: even if the false positive rate actually is 5% when the null is true, it does not follow that only 5% of significant findings are false positives.  5% is the proportion of findings in the left column that are in the bottom left cell.  what we really want to know is the proportion of results in the bottom row that are in the bottom left cell (i.e., the proportion of false positives among all significant results).  this is called the Positive Predictive Value (PPV) and would likely correspond closely to the rate of false positives in the published literature (since the published literature consists almost entirely of significant key findings).  but we don't know what it is, and it could be much higher than 5%, even if the false positive rate in the left column really was 5%.**
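(to make the arithmetic concrete, here is a tiny python sketch of PPV.  the prior and power values are made-up assumptions for illustration; the post doesn't commit to any particular numbers.)

# PPV = P(null is false | p < .05).  the prior and power below are
# illustrative assumptions, not estimates from this post.

def ppv(prior_true, power, alpha=0.05):
    # proportion of significant results that are true positives
    true_positives = power * prior_true
    false_positives = alpha * (1 - prior_true)
    return true_positives / (true_positives + false_positives)

# e.g., if 30% of tested hypotheses are true and power is 50%:
print(round(ppv(prior_true=0.30, power=0.50), 2))  # ~0.81, i.e., ~19% of
                                                   # significant results are
                                                   # false positives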

back to the main point.

we have small sample sizes in social/personality psychology.  small sample sizes often lead to low power, at least with the effect sizes (and between-subjects designs) we're typically dealing with in social and personality psychology.  therefore, like many others, i have been beating the drum for larger sample sizes.
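(for a rough illustration, here is a quick python simulation of power for a between-subjects design.  the sample size and effect size are made-up assumptions, not claims about what's typical.)

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_group, d, n_sims = 30, 0.4, 10_000   # illustrative assumptions

hits = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(d, 1.0, n_per_group)   # true effect of d standard deviations
    hits += stats.ttest_ind(treatment, control).pvalue < .05

print(f"estimated power: {hits / n_sims:.2f}")    # roughly 1 in 3 with these numbers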

not background

our samples are too small, but despite our small samples, we have been operating with very high effective power.  because we've been taking shortcuts.

the guidelines about power (and about false positives and false negatives) only apply when we follow the rules of NHST. we do not follow the rules of NHST.  following the rules of NHST (and thus being able to interpret p-values the way we would like to interpret them, the way we teach undergrads to interpret them) would require making a prediction and pre-registering a key test of that prediction, and only interpreting the p-value associated with that key test (and treating everything else as preliminary, exploratory findings that need to be followed up on). 

since we violate the rules of NHST quite often, by HARKing (Hypothesizing After Results are Known), p-hacking, and not pre-registering, we do not actually have a false positive error rate of 5% when the null is true.  that's not new - that's the crux of the replicability crisis.  but there's another side of that coin.  

the point of p-hacking is to get into the bottom row of the NHST table - we cherry-pick analyses so that we end up with significant results (or we interpret all significant results as robust, even when we should not because we didn't predict them).  in other words, we maximize our chances of ending up in the bottom row.  this means that, when we're in the left column (i.e., when the null is true), we inflate our chances of getting a false positive to something quite a bit higher than 5%.  

but it also means that, when we're in the right column (i.e., when the null hypothesis is false), we increase our chances of a hit well beyond what our sample size should buy us.  that is, we increase our power. but it's a bald-faced power grab.  we didn't earn that power.

that sounds like a good thing, and it has its perks for sure.  for one thing, we end up with far fewer false negatives. indeed, it's one of the main reasons i'm not worried about false negatives.  even if we start with 50% power (i.e., if we have 50% chance of a hit when the null is false, if we follow the rules of NHST), and then we bend the rules a bit (give ourselves some wiggle room to adjust our analyses based on what we see in the data), we could easily be operating with 80% effective power (i haven't done the simulations but i'm sure one of you will***).  
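(here is one toy version of that simulation in python.  the shortcut simulated - testing three correlated DVs and keeping whichever p-value clears .05 - is just one of many possible forms of p-hacking, and all the numbers are illustrative assumptions.)

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, d, n_sims = 32, 0.5, 5_000                  # ~50% power for a single honest test
cov = np.full((3, 3), 0.5) + 0.5 * np.eye(3)   # three DVs correlated at r = .5

def any_significant(true_effect):
    # "success" = at least one of the three tests comes out p < .05
    control = rng.multivariate_normal(np.zeros(3), cov, n)
    treatment = rng.multivariate_normal(np.full(3, true_effect), cov, n)
    pvals = [stats.ttest_ind(treatment[:, j], control[:, j]).pvalue for j in range(3)]
    return min(pvals) < .05

effective_power = np.mean([any_significant(d) for _ in range(n_sims)])
false_positive_rate = np.mean([any_significant(0.0) for _ in range(n_sims)])
print(f"effective power: {effective_power:.2f}")          # well above .50
print(f"false positive rate: {false_positive_rate:.2f}")  # well above .05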

what's the downside?  well, all the false positives.  p-hacking is safe as long as our predictions are correct (i.e., as long as the null is false, and we're in the right column).  then we're just increasing our power.  but if we already know that our predictions are correct, we don't need science.  if we aren't putting our theories to a strong test - giving ourselves a serious chance of ending up with a true null effect - then why bother collecting data?  why not just decide truth based on the strength of our theory and reasoning?

to be a science, we have to take seriously the possibility that the null is true - that we're wrong.  and when we do that, pushing things that would otherwise end up in the top row into the bottom row becomes much riskier.  if we can make many null effects look like significant results, our PPV (and rate of false positives in the published literature) gets all out of whack. a significant p-value no longer means much.

nevertheless, all of us who have been saying that our studies are underpowered were wrong.  or at least we were imprecise.  our studies would be underpowered if we were not p-hacking, if we pre-registered,**** and if we only interpreted p-values for planned analyses. but if we're allowed to do what we've always done, our power is actually quite high.  and so is our false positive rate.

also

other reasons i'm not that worried about false negatives:

  • they typically don't become established fact, as false positives are wont to do, because null results are hard to publish as key findings.  if they aren't published, they are unlikely to deter others from pursuing the same question.
  • when they are published as side results, they are less likely to become established fact because, well, they're not the key results.
  • if they do make it into the literature as established fact, a contradictory (i.e., significant) result would probably be relatively easy to publish because it would be a) counter-intuitive, and b) significant (unlike results contradicting false positives, which may be seen as counter-intuitive but would still be subject to the bias against null results).

in short, while i agree with Fiedler, Kutzner, & Krueger (2012)***** that "The truncation of research on a valid hypothesis is more damaging [...] than the replication of research on a wrong hypothesis", i don't think many lines of research get irreversibly truncated by false negatives.  first, because the lab that was testing the valid hypothesis is likely motivated to find a significant result, and has many tools at its disposal to get there (e.g., p-hacking), even if the original p-value is not significant.  second, because even if that lab concludes there is no effect, that conclusion is unlikely to spread widely.

so, next time someone tells you your study is underpowered, be flattered.  they're assuming you don't want to p-hack, or take shortcuts, that you want to earn your power the hard way. no help from the russians.*******

* good luck with that.

** it's not.

*** or, you know, use your github to write up a paper on it with Rmarkdown which you'll put in your jupyter notebook before you make a shinyapp with the figshare connected to the databrary. 

**** another reason to love pre-registration:  if we all engaged in thorough pre-registration, we could stop all the yapping and get to the bottom of this replicability thing. rigorous pre-registration will force us to face the truth about the existence and magnitude of our effects, whatever it may be.  can we reliably get our effects with our typical sample sizes if we remove the p-hacking shortcut?  let's stop arguing****** and find out!

***** Fiedler et al. also discuss "theoretical false negatives", which i won't get into here.  this post is concerned only with statistical false negatives.  in my view, what Fiedler et al. call "theoretical false negatives" are so different from statistical false negatives that they deserve an entirely different label.

****** ok, let's not completely stop arguing - what will the psychMAP moderators do, take up knitting?

******* too soon?


psychMAP moderators, when they're not moderating
 

 

 

 

now is the time to double down on self-examination

[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 


it can be tempting, when contemplating the onslaught that science is likely to face from the next administration and congress, to scrub away any sign of self-criticism or weakness that could be used against us.  as a "softer" science, psychology has reason to be especially nervous.*

but hiding our flaws is exactly the wrong response.  if we do that, we will be contributing to our own demise. the best weapon anti-science people can use against us is to point to evidence that we are no different from other ways of knowing.  that we have no authority when it comes to empirical/scientific questions.  our authority comes from the fact that we are open to scrutiny, to criticism, to being wrong.  the failed replications, and the fact that we are publishing and discussing them openly, are the best evidence we have that we are a real science.  that we are different from propaganda, appeals to authority, or intuition.  we are falsifiable.  the proof is that we have, on occasion, falsified ourselves. 
 
we should wear our battle with replicability as a badge of honor.  it's why the public should trust us to get things right, in the long run.  it's why it means something when we have confidence in a scientific discovery.  we should be proud of the fact that we don't take that word - discovery - lightly.  we scrutinize, criticize, attempt to falsify. that's why we will survive attacks on science.
 
yes, our failures will be used against us.  we will lose some battles.  but if we let those attacks scare us away from self-criticism and self-correction, we will have lost the war.
 
when we find a flaw in our process or in our literature, we need to responsibly communicate what it means and what it doesn't mean.  the failed replications i've seen in the last few years have been careful to do that.  those producing these critiques and "failures" are not trying to tear us - or anyone - down.  they are demonstrating just how tough we are.
 
the next few years are going to get even more challenging, but we need to resist the temptation to give in to a bunker mentality, to shield ourselves from criticism.  we need to be even more transparent, even more honest, than we have been until now.  we cannot fight ignorance with secrecy, we must face it head on with openness and faith in the scientific enterprise.

* if you came here for the jokes, you're out of luck.  this is a Very Serious post. mostly because my hotel room does not have a mini bar.**

** canada is, apparently, not perfect.
 
 
      
 

a little bit louder now

[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the positions or policies of any institution with which I am affiliated.] 


i can't even begin to imagine how many times in her life hillary clinton has had to bite her tongue.  through everything, through trump, i never saw her even begin to lose her cool.  
 
on nov. 8th, i realized that i had assumed that all of that self-control, all of that turning the other cheek, meant that she had earned a win.  i made the same gamble on hillary's behalf that women and minorities make every day: that if we take the blows without flinching, if we don't play the woman/minority card, if we keep demonstrating our competence, people will have no choice but to recognize it.  
 
one lesson from this election is that this strategy doesn't always work.  it assumes a reality that does not yet exist.  we will not get extra credit for our patience and forbearance in the face of sexism, racism, homophobia, or xenophobia.  no one will pat us on the back for biting our tongues.  it is time to start speaking up.
 
i hope that when people watched the debates, and admired clinton's strength, they made the parallel to the less extreme situations that women and minorities face every day.  we don't often face someone as vile as trump, but if we want to be successful we have to be prepared to absorb smaller slights on a regular basis, to keep our cool when faced with ignorant, unfair, or offensive comments.  
 
why don't we call them out?  because it's extremely difficult.  because there are too many things to call out.  because there is backlash.  because we often don't know for sure how much sexism is to blame for any particular event.  because often the person being sexist is someone we like and respect, and we don't want to make them uncomfortable. because we often don't want to derail the conversation, or take away from the larger goal of the group. because there is gaslighting. because it's fucking exhausting to speak up.  
 
it is very hard to decide whether and when to speak up.*  i don't know the right answer, but hillary's loss has convinced me that the answer is: more.  so when it feels like the right thing to do, i will try very hard to say something.  saying words out loud is not my strong suit, so we'll see how it goes.  i'll probably fail, a lot.**  
 
i'll also try to speak up more when people are good. when people stick their necks out for women, for minorities, for people who have less voice, less visibility, more to lose. i think one reason i underestimated how much of an uphill battle hillary was facing is because i know a lot of good people. people who believed in me stubbornly, persistently, fiercely. people who treated me as an authority on my own experience, and on other topics, well before i thought i had anything worth saying, or trusted my own voice. i cannot thank those people enough, but i can try to pay it forward. 
 
speaking up scares the shit out of me. but if you do it with me, it'll seem a little less scary.*** i'm not talking about being a jerk, i'm talking about being more honest.  flinching, if you feel like flinching.  giving other people the chance to hear your experience.  telling people how things look and sound to you does not make you a nasty woman. and staying silent will not guarantee any reward.
 
* for example, i deliberated for a long time about whether or not to publish this blog post.  i'm still not sure about it.  presumably if you are reading this i decided to post it.  yikes.

** i have already failed, twice, since writing this sentence.

*** reading Lindy West's book, Shrill, also makes it seem a little less scary.  go read it.
 
      
 
 
   