nothing beats something

[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 

[image: donkey, captioned "hello"]

i'm reading a book called Nothing is True and Everything is Possible, about Russia, and the name keeps haunting me (the content of the book is good, too).  sometimes i worry that this goes for science, too.  it's not just that when nothing is true, everything is possible.  but when everything is possible, nothing is true.  

sometimes studying human behavior is so wild and messy that it feels like anything goes.  we create ad hoc scales more often than we probably should, and we invent manipulations out of thin air, rarely pausing to validate them.  if the best we can do is a college student sample, that sin will be forgiven.  if we can't get a behavioral measure of a behavior, a self-report will often do.  we do what's possible because, well, what's the alternative?

i'm here to consider the exceedingly unpopular view that the alternative - to do nothing - is sometimes preferable to doing what's possible.

science is hard.  social science can be especially hard, because we usually need to study humans, and humans are a bitch to study.*  there's this idea out there that it's better to collect noisy, messy data than no data at all.  i've often repeated this claim myself.  indeed, i taught it when i used David Funder's excellent personality textbook.  Funder's third law** is that "something beats nothing, two times out of three."  Funder was smart to add the caveat, because i think sometimes nothing beats something.

slowly over the last few years, i've come to the conclusion that sometimes crappy data is actually worse than no data.  i think that we sometimes fall into the trap of thinking that, because a phenomenon is really, really, really hard to study, very flawed studies are the best we can hope for and so are ok.  it's almost like we are committed to the belief that everything must be possible for a reasonably industrious researcher to study, so if the only option is a bad study, then a bad study must be good enough.

let me be clear: i have definitely made this argument myself, possibly as recently as last month.  i'm no stranger to the feeling that just getting adequate data feels almost impossible, and so inadequate data must do.  for example, i was pretty proud of myself when my lab decided to not just code actual behavior from EAR recordings of >300 participants, and not just double-code the recordings, but TRIPLE code the recordings.  when, after two years of coding by over 50 coders (which came on the heels of several years of data collection), we checked the reliabilities on those triple codings and found them wanting, it was very, very tempting to say "well, it's the best we can do."  luckily i had learned about the spearman-brown prophecy formula, and so had no excuse - adding more raters was definitely going to help, it's a mathematical fact.  so we got three more coders per recording (which took about two more years and 50 more coders).  and let me tell you, it sucked.  we are sick of coding.***  we wish we could have spent those two years doing new projects.  but we got the reliabilities up, and that will let us get much better estimates.  science is not all creativity and innovation.  it is a lot of boring, hard work.  if you don't sometimes hate science, you might be doing it wrong.
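
(for the curious: here is a minimal sketch of the spearman-brown prophecy formula, the "mathematical fact" i'm leaning on above.  it predicts the reliability of the average of n raters from the reliability of a single rater.  the single-coder reliability below is a made-up number for illustration, not our actual EAR codings.)

```python
# spearman-brown prophecy formula: predicted reliability of the average of
# n parallel raters, given the reliability of a single rater.
def spearman_brown(single_rater_reliability, n_raters):
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)

# hypothetical single-coder reliability of .40 (illustrative only):
for n in (1, 3, 6):
    print(n, "raters ->", round(spearman_brown(0.40, n), 2))
# 1 raters -> 0.4, 3 raters -> 0.67, 6 raters -> 0.8
```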

even better examples of people who kept going when most of us would have said "good enough": the two researchers i highlighted in my blog post, "super power."  their efforts really make me rethink my definition of impossible.

but sometimes getting good data really is impossible, at least for any single lab.  and i would like to float the idea that when this is the case, there may be something noble in walking away, and choosing to do nothing rather than something.  sometimes, the nicest thing you can do for your research question is to leave it unanswered.  (my colleague wiebke bleidorn pointed out that this is called "addition by subtraction.")  if you insist on extracting information from really crappy data, you put yourself at really high risk**** of  reading patterns into noise.  just because you need your data to tell you something, to move the needle, doesn't mean it can.  if our field allowed people to get credit for publishing little bits of (really hard-to-collect) data without making any claims whatsoever, this might be a viable approach, but i don't think that currently exists (though there are low-recognition ways to do this, of course).

the wishful thinking that we can always extract some knowledge from even small bits of messy data can lead to serious, widespread problems.  it's easy to point fingers at other fields, but this problem is alive and well in psychology.  i'm very familiar with the argument that this sample size or that method must be good enough because it's all we can realistically do, given limited resources, hiring or promotion expectations, etc.  it used to be one of the most common responses i'd hear to calls for larger samples or better (i.e., harder) methods.  people seem to have backed off of making the argument out loud, but i think it's still very common in people's minds - this attitude that it's inappropriate to say a method or design isn't good enough because it was hard/expensive/time-consuming to do.  (indeed, i think this is the most common internal response most of us have when we get criticized for studying only WEIRD samples.)

here's what i wish we all did (including me):  1) ask yourself what the ideal study would be to test your research question.  2)  ask yourself if you're willing and able to do something pretty close to it.  3)  if not, ask yourself why not.  really push yourself.  don't let yourself off the hook just because it would be a lot harder than what you're used to doing.  if the research question is important, why isn't it worth the really hard work?  4)  if you're still not willing or able to do something close to the ideal study on your own, either a) move on to a different (or narrower) research question, or b) join forces with other labs.  

i know this is idealistic.  i know we have to keep publishing otherwise our food pellets stop coming.  so let's pick questions we can actually hope to answer with the resources we have.  (this post is really one big, public NOTE TO SELF.  i have been extremely guilty of biting off way more than i can chew,***** and want to learn to scale back my expectations of what kinds of research questions i can rigorously test on my own.)  or join collaborative projects that pool resources to tackle the really hard, important questions, and find a way to deal with the issue of spreading credit around.  if we stop trying to do the impossible, i think we'll find that more things are true.

* also, sometimes to interact with.  or be near.  

** Funder calls his lessons "laws" somewhat tongue-in-cheek.  the whole book is a treat, but if you're new to personality psych, i especially recommend chapters 1 through 7.  felicity's professor at the university of new york used it, almost certainly because of its engaging yet accurate prose.  this is a very serious footnote, i am not kidding even a little bit.  read it.

*** it is kind of like if your lab had to eat asparagus for six meals a week for several years, and then you realized you had several more years of compulsory asparagus-eating.  (except our pee don't stink.)

**** 100% risk

***** true story: my undergrad thesis examined how sex-role identity interacts with gender to predict attitudes towards women... among high school students... in two different countries.... one of which is Samoa, a remote, non-English-speaking island in the middle of the South Pacific.  

[image: donkey, captioned "did someone say asparagus?"]
 

 

bitter carrots*

 [DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 

 

[image: camel, captioned "carrots"]

if you had told me five years ago that even one in twenty social/personality psych papers would provide links to their data and code, or to a pre-registration, i would have thought that would be huge progress.**   i've long been a fan of the nudges that encourage these kinds of practices (e.g., badges), and until recently i thought going as far as to require this kind of transparency (even with room for legitimate exceptions) was probably unrealistic - our field didn't seem ready for that.   i was sympathetic to the carrots-not-sticks approach.
 
but there's a problem with carrots-not-sticks.   we're asking researchers to eat the carrots, but some of the carrots are pretty bitter.   sometimes, when researchers are transparent, that brings information to light that undermines their claims, and readers don't buy the claims.   that's a necessary side effect of transparency.   and it means we can't in good faith tell researchers that transparency is always in their best interest and will be its own reward.   we can't lure people with carrots, and pretend all of the carrots are delicious and fun to eat.   sometimes carrots are hard to swallow.
 
i think it's time to admit that the main argument for transparency isn't self-interest - it's that transparency is just better for science.*** 
 
imagine the following scenarios:
 
scenario 1: you get a paper to review, and the authors have shared their data and code.   you look at the data and realize there is a coding error, or something else that makes the results uninterpretable (i.e., suggests the study needs to be re-run to fix the error).   you point this out in your review, and the editor agrees and rejects the manuscript.

scenario 2: you get a paper to review, and the authors have shared a link to their pre-registration.   by comparing the manuscript and the pre-registration you realize that the analysis that the authors present as their planned analysis, and interpret the p-value for, is not actually the one they had specified a priori as their key planned analysis.   knowing this, you can tell that the claims in the paper are not supported by the evidence.   the editor agrees and rejects the manuscript.
 
scenarios 1 and 2 seem pretty straightforward.   but now consider scenarios 3 and 4:
 
scenario 3: you get a paper to review and the authors do not provide their data and code, but there is no sign of anything wrong.
 
scenario 4: you get a paper to review and the authors did not preregister their study, but they claim that the key result they present was their planned analysis, and interpret the p-value as if it was the only test they ran. 
 
what should you do in scenarios 3 and 4?
one option, and i think the way most of us have been operating, is to assume that the data have no anomalies, and the key analysis was indeed the one planned test that was run.   but is this fair to the authors in scenarios 1 and 2?   in scenarios 3 and 4, we're giving the authors the benefit of the doubt because they didn't let us verify their claims.   in scenarios 1 and 2 we're punishing them because they did let us verify their claims, and we learned that their claims were not justified.
 
but what else could we do in scenarios 3 and 4?   assume that their studies had the same flaws as the studies in scenarios 1 and 2?   that doesn't seem fair to the authors in scenarios 3 and 4.
 
when some authors choose to be transparent, we have no choice but to use the extra information they give us to assess the rigor of their study and the credibility of their claims.   but that also puts us in an impossible position with respect to the manuscripts in which authors are not transparent.   we can't assume these non-transparent studies have flaws, and we can't assume they don't.  
 
it seems to me the only fair thing to do is to make transparency the default.****   whenever possible, authors should be expected to share the data and code necessary to reproduce their results unless that's legally or ethically problematic.   and if authors claim that their key analysis was planned (i.e., if they're saying they're doing hypothesis testing and/or interpreting a p-value), we should ask that they document this plan (i.e., pre-register), or present their work as exploratory and their conclusions as less sure.   it's just not fair to let some authors say "trust me" when other authors are willing to say "check for yourself."
 
i know that's not the world we live in, and as long as transparency is not the default, we have to treat papers like those in scenarios 3 and 4 somewhere in the gray area between flawless and deeply flawed.   but my heart really goes out to the authors in scenarios like 1 and 2.   it would be completely rational for those authors to feel like they are paying too high a price for transparency.   (i've tried to make the case that there is a silver lining - their transparency makes the review process more fruitful for them, because it allows reviewers and editors to pinpoint specific ways they could improve their work which wouldn't have been possible without their openness.   but i'm sure that's not much consolation if they see papers like those in scenarios 3 and 4 getting published over theirs.)
 
my sense is that many people are getting on board with the credibility revolution, but only so long as all of the incentives are carrots, and not sticks.   as long as we can choose which carrots we want to go for, and not be punished if we don't eat our veggies, everyone is happy.   but that won't work in the long run.   it was perhaps a necessary step on the way to more widespread changes, but i think we need to start seriously considering making carrot-eating the default (also known as using sticks).   i can't think of how to make the current opt-in system we have fair.   if you can, i'd love to hear it.
 
* for more on problematic carrot-eating, see this by james heathers.
** in honor of the finding that two spaces after periods is the morally superior formatting, i am compensating for years of being bullied into one space by using three spaces.   (yes, i'm aware of the shakiness of the evidence but i never let that get in the way of a good footnote.)   #iwantmyspacesback
*** i don't mean that everything should always be transparent, and i don't know anyone who believes that.   i mean that things that can legally and ethically be made transparent usually should be.  
****  this seems like a good time to remind readers that i do not set policy for any journals, and i am conscious of the difference between my personal fantasies and the realities of editorial responsibility (as tempting as it is to use my vast editorial powers to force everyone to put five spaces between sentences).*****   
***** editorial abuses of power are a topic for another blog post. #bringonthelawsuits
 
 
 
[image: polar bear with a carrot, captioned "mmmm carrots"]
      
 

An Oath for Scientists

[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 

[image: bear]

i've been thinking a lot about what it means to be a scientist.  being a scientist comes with certain obligations, and ignoring those obligations can give science a bad name.  it seems to me we could do more to make scientists aware of this responsibility when they decide whether or not to join the profession.

our most important obligation as scientists, to my mind, is preserving science's credibility. that doesn't mean we can't make mistakes, but above all else, we should be committed to opening ourselves up to scrutiny and correcting our errors.

to make these values a bit more concrete, i tried to adapt the hippocratic oath to scientists. you can tell how solemn this oath is by the use of capitalization.

the values i tried to capture were inspired by Merton's norms, but in the spirit of Merton's norm of universalism, i refrained from naming the oath after him (or anyone). it is very far from comprehensive, and i know it's cheesy, but i ask you, dear reader: if you can't engage in a little facile sentimentality on new year's day, when can you? 

 

An Oath for Scientists 

I swear that I will, according to my ability and judgment, protect the credibility of science by carrying out this oath and this indenture.

To make the grounds for my scientific claims transparent and available for others to scrutinize, to welcome that scrutiny and accept that others will be skeptical of my claims, to help others verify the soundness of my claims; to describe my methods in sufficient detail for others to repeat them, to not obstruct others' attempts to replicate my work; to report all evidence I know of for or against my claim, to not suppress evidence against my conclusions, to correct my past claims if I learn that they were wrong, to support the dissemination of evidence that disconfirms or contradicts my past claims.

I will hold myself and all other scientists to this oath, and I will not exempt any scientist because of her status or reputation. I will judge scientific claims based on the evidence, not the scientist making the claim. Neither will I hold any scientific claim or finding as sacred. Similarly, I will recognize as valuable the work of scientists who aim to correct errors in the scientific record.  

In whatever claims I present in my role as scientist, I will not knowingly overstate or exaggerate the evidence, I will not make claims out of interest for advancing my own station, and I will disclose any personal interest that may be perceived as biasing my judgment.  I will protect the credibility of my profession by making careful, transparent, calibrated claims.

Now if I carry out this oath, and break it not, may I gain for ever reputation among all scientists for my work; but if I transgress it and forswear myself, may the opposite befall me.

 

[image: polar bear]

 

Guest Post by Shira Gabriel: Don't Go Chasing Waterfalls

 [DISCLAIMER: The opinions expressed in my posts, and guest posts, are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 

 

Guest post by Shira Gabriel

Don’t go chasing waterfalls, please stick to the rivers and the lakes that you're used to.

I haven’t always been the most enthusiastic respondent to all the changes in the field around scientific methods.  Even changes that I know are for the better, like attending to power and thus running fewer studies with more participants, I have gone along with grudgingly.  It is the same attitude I have towards eating more vegetables and less sugar.  I miss cake and I am tired of carrots, but I know that it is best in the long run. I miss running dozens of studies in a semester, but I know that it is best in the long run. 

It is not like I never knew about power, but I never focused on it, like many other people in the field.  I had vague ideas of how big my cell sizes should be (ideas that were totally wrong, I have learned since) and I would run studies using those vague ideas.  If I got support for my hypotheses-- great!  But when I didn’t, I would spend time with the data trying to figure out what went "wrong" -- I would look to see if there was something I could learn.  I would look to see if there was some other pattern in the data that could tell me why I didn’t find what I predicted and perhaps clue me into some other interesting phenomena. 

You know, like Pavlov trying to figure out why his saliva samples were messed up and discovering classical conditioning.  That was me, just waiting for the moment when I would discover my own version of classical conditioning1.

I am going to be honest here: I love doing that.  I love looking at data like a vast cave filled with goodies where one never knows what can be found.  I love looking for patterns in findings and then thinking hard about what those mean and whether, even though I had been WRONG at first, there was something else going on – something new and exciting. I love the hope of discovering something that makes sense in a brand new way. It is like detective work and exploration and mystery, all rolled in one.  I’m like Nancy Drew in a lab coat2.

Before anyone calls the data police on me, the next step was never publication.  I didn’t commit that particular sin. Instead if I found something I didn’t predict, I would design another study to test whether that new finding was real or not.  That was step two.

But this movement made me look back at my lab and the work we have done for the past 17 years3 and realize that although this has worked for me a couple of times, I have also chased a lot -- A LOT -- of waterfalls that have turned into nothing. In other words, I have thrown a lot of good data after bad.4

And looking back, the ones that did work -- that turned into productive research areas with publishable findings -- were the ones that had stronger effects that replicated over different DVs right from the start.

When I chased something that wasn't as strong, I wasted huge amounts of my time and resources and, worse yet, the precious time of my grad students.  That happened more than is comfortable for me to admit.

So, I think a big benefit for me of our new culture and rules is that I spend less time chasing waterfalls.  My lab spends more time on each study (since we need bigger Ns) but we don't follow up unexpected findings unless we are really confident in them.  If just one DV or one interaction looks interesting, we let it go for what it likely is -- a statistical fluke.

And we don't just do that because it is what we are now supposed to do, we do it because empirically speaking I SHOULD have been doing that for 17 years.  I spent too much time chasing after patterns that turned out to be nothing5.
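
To make the "statistical fluke" point concrete, here is a rough simulation sketch (every number in it is made up for illustration): with half a dozen DVs and no true effects anywhere, roughly a quarter of studies will still show at least one "significant" result just by chance.

```python
# rough illustration: with several DVs and no true effects at all, how often
# does at least one DV come out p < .05? (all numbers here are hypothetical.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_cell, n_dvs, n_studies = 25, 6, 5000

false_alarms = 0
for _ in range(n_studies):
    sig_somewhere = False
    for _ in range(n_dvs):
        control = rng.normal(size=n_per_cell)    # null effect
        treatment = rng.normal(size=n_per_cell)  # null effect
        if stats.ttest_ind(control, treatment).pvalue < 0.05:
            sig_somewhere = True
    false_alarms += sig_somewhere

print(false_alarms / n_studies)  # roughly 1 - 0.95**6, i.e., about 0.26
```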

So I think my lab works better and smarter now because of this change.

As long as I am being so honest, I should admit that I miss chasing waterfalls.  Just last week, one of my current PhD students6 and I were looking at a study that is a part of a really solid research program of hers that thoughtfully and carefully moves our science forward.  And in her latest dataset, I felt the mist of a far-off possible waterfall in an unexpected interaction.  Could this be the big one? It was tempting, but we aren’t going to chase the waterfall.  As much as it seems fun and exciting, our science isn’t built on the drama and danger of waterfalls. To paraphrase the wise and wonderful TLC, I am sticking to the rivers and lakes that I am used to.  That is how science advances.

Footnotes

  1. Still waiting, in case you were wondering.
  2. I don’t wear a lab coat. More like Nancy Drew in yoga pants and a stained sweatshirt, but same difference.
  3. I am really old.
  4. Or is it bad after good? I can never remember which way it is supposed to go.
  5. How can you tell if an unexpected finding is a waterfall or classical conditioning? You can’t. But here are the four things I now look for: are the effects consistent across similar DVs; can we look back and find similar things in old datasets; do we have a sound theoretical explanation for the surprising findings; and, finally, can that theoretical explanation lead to other hypotheses that we can look at in the data.  Only if a good chunk of that works out will we move on to collect more data. And yah, “good chunk” is not quantifiable. Sometimes Nancy Drew has to follow her instincts.
  6. Elaine Paravati. She rocks.
      
 

results blind vs. results bling*

 

[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 

[image: octopus, captioned "show-off"]

in many areas of science, our results sections are kind of like instagram posts.  beautiful, clear, but not necessarily accurate. researchers can cherry-pick the best angle, filter out the splotches, and make an ordinary hot dog look scrumptious (or make a lemon look like a spiffy car).**  but what's even more fascinating to me is that our reactions to other people's results are often like our reactions to other people's instagram posts: "wow! that's aMAZing! how did she get that!"

i've fallen prey to this myself.  i used to teach bem's chapter on "writing the empirical journal article," which tells researchers to think of their dataset as a jewel, and "to cut and polish it, to select the facets to highlight, and to craft the best setting for it."  i taught this to graduate students, and then i would literally turn around, read a published paper, and think "what a beautiful jewel!"***

as with instagram, it's impossible to mentally adjust our reaction for the filtering the result could have gone through.  it's hard to imagine what the rough cut might've looked like.  it's hard to keep in mind that there could've been other studies, other measures, other conditions, other ways to clean the data or to run the analyses.  and we never know  - maybe this is one of those #nofilter shots.

in short, we can't avoid being blinded by shiny results.  

what can we do?

there are a few stopgaps.  for example, as an author, i can disclose as much as possible about the process of data collection and analysis, and the results (e.g., the 21 word solution).  as a reader, i'll often pause when i get to the end of the method section and ask myself - is this study well-suited to the researchers' goals?  would i think it should be published if i had to evaluate it just based on the method?    

another partial solution is pre-registration, the #nofilter of scientific research. by pre-registering, a researcher is committing to showing you the raw results, without any room for touching-up (for the planned analyses - the exploratory analyses can be looked at from any and all angles).  with a good pre-registration, readers can be pretty confident they're getting a realistic picture of the results, except for one problem. the editors and reviewers make their evaluations after seeing the results, so they can still consciously or unconsciously filter their evaluation through biases like wanting only counterintuitive, or significant, findings.  so pre-registration solves our problem only as long as editors and reviewers see the value of honestly-reported work, even if it's not as eye-catching as the filtered stuff. as long as editors and reviewers are human,**** this will likely be a problem.

the best solution to this problem, however, is to evaluate research before anyone knows the results.  this is the idea behind registered reports, now offered by Collabra: Psychology, the official journal of the Society for the Improvement of Psychological Science.  an author submits their paper before collecting data, with the introduction, proposed method, and proposed analyses, and makes a case for why this study is worth doing and will produce informative results.  the editor and reviewers evaluate the rationale, the design and procedures, the planned analyses and the conclusions the authors propose to draw from the various possible results.  the reviewers and editor give feedback that can still be incorporated into the proposed method.  then, if and when the editor is satisfied that the study is worth running and the results will be informative or useful regardless of the outcome, the authors get an "in principle acceptance" - a guarantee that their paper will be published so long as they stick to the plan, and the data pass some basic quality checks.  the final draft goes through another quick round of review to verify these conditions are met, and the paper is published regardless of the outcome of the study.

registered reports have many appealing characteristics.  for the author, they can get feedback before the study is conducted, and they can get a guarantee that their results will get published even if their prediction turns out to be incorrect, freeing them to be genuinely open to disconfirmation of their predictions.  it's nice to be able to have less at stake when running a study - it makes for more objectivity, and greater emotional stability.*****  

for science, the advantage is that registered reports do not suffer from publication bias - if all results are published, the published literature will present an unbiased set of results, which means science can be cumulative, as it's meant to be.  meta-scientists can analyze a set of published results and get an accurate picture of the distribution of effects, test for moderators, etc.  the only downside i can think of is that journals will be less able to select studies on the basis of projected citation impact - the 'in principle acceptance' means they have to publish even the findings that may not help their bottom line.  call me callous but i'm not going to shed too many tears over that.
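
here's a back-of-the-envelope simulation of that point (all numbers are made up for illustration): when many modest-sized studies of a small true effect are run but only the "significant" ones get published, the average published effect is badly inflated; publish everything, as registered reports do, and the average stays roughly honest.

```python
# toy illustration of publication bias (all numbers are hypothetical):
# many underpowered studies of a small true effect. publishing only the
# significant ones inflates the average published effect; publishing
# everything (as with registered reports) keeps it roughly unbiased.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d, n_per_group, n_studies = 0.2, 30, 2000

all_effects, significant_effects = [], []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)
    d_hat = treatment.mean() - control.mean()  # rough cohen's d, since sd = 1
    all_effects.append(d_hat)
    if stats.ttest_ind(treatment, control).pvalue < 0.05:
        significant_effects.append(d_hat)  # the significance filter

print("all studies:      ", round(float(np.mean(all_effects)), 2))          # ~0.20
print("significant only: ", round(float(np.mean(significant_effects)), 2))  # much larger
```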

not everything can be pre-registered, or done as a registered report.  for one thing, there's lots of very valuable existing data out there, and we shouldn't leave it hanging out to dry.  for another, we should often explore our data beyond testing the hypothesis that the study was originally designed to test.  many fascinating hypotheses could be generated from such fishing expeditions, and so long as we don't fool ourselves into thinking that we're testing those hypotheses when we're in fact just generating them, this is an important part of the scientific process.

 the fact that we keep falling for results bling, instead of evaluating research results-blind, just means we're human.  when you know the results are impressive, you're biased to think the method was rigorous.  the problem is that we too easily forget that there are many ways to come by impressive-looking results.  and even if we remember that filtering was possible, it's not like we can just magically know what the results look like unfiltered. it's like trying to imagine what your friends' unfiltered instagram pictures look like. 

are we ready to see what science looks like without the filters?  will we still get excited when we see it au naturel?  let's hope so - to love science is to accept it for what it really looks like, warts and all.

 

* this title was inspired by a typo.  
** although i'm using agentic verbs like "filtering" or "prettying up," i don't believe most of these distortions happen intentionally.  much of the fun in exploring a dataset is trying to find the most beautiful result we can.  it's hard to remember everything we tried, or what we might have thought was the best analysis before we knew how the results looked.  most of the touching up i refer to in this post comes from researchers engaging in flexible data analysis without even realizing they're doing so.  of course this is an assumption on my part, but the pretty large discrepancies between the results of pre-registered studies and similar but not-pre-registered studies suggest that flexibility in data analysis leads to exaggerated results.
***  words you'll never actually hear me say.
**** canine editing: not as effective as it sounds. 
***** personality change intervention prediction: if registered reports become the norm, scientists' neuroticism will drop by half a standard deviation. (side effect: positive affect may also take a hit)

 


[image: octopus, captioned "octopus vulgaris"]