the proof of the pudding is not, it turns out, in the eating

[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 
 
Bears eating pumpkins
happy halloween
 
here's an argument i've heard against registered reports and results-blind reviewing: "judging studies based on their methods is like judging a baking contest based on the recipes."  the implication being that this would be ridiculous.
 
i've been thinking a lot about this analogy and i love it.  not because i agree with it, but because i think it gets at the crux of the disagreement about the value of negative (null) results.  it's about whether we think the value of a study comes from its results or its methods.
 
the baking contest analogy rests on the assumption that the goal of science is to produce the best-tasting results.  according to this logic, the more we can produce delicious, mouth-watering results, the better we're doing.  accumulating knowledge is like putting together a display case of exquisite desserts.  and being able to produce a delicious result is itself evidence that your methods are good.  we know a good study when we see a juicy result.  after all, you wouldn't be able to produce a delicious cake if your recipe was crap.
 
this analogy probably sounds reasonable in part because of how we talk about negative results - as failures.  in baking that's probably apt - i don't want to eat your failed donut.*  but in science, the negative result might be the accurate one - you can't judge the truthiness of the result from the taste it leaves in your mouth.  we may not like negative results, but we can't just toss them in the bin.
 
here's what i think is a better analogy for science.  producing scientific knowledge is like putting together a cookbook that allows other people to follow recipes to reliably produce specific outcomes.  if someone wants their recipe included in the cookbook, we don't necessarily need for the recipe to produce something yummy, we want it to produce something predictable.  maybe it's bland, maybe it's sour, maybe it tastes like cilantro.  the point is that we know what it will produce most of the time (within some specified range of uncertainty), regardless of who the cook is.  in other words, what we really want to know is "what happens when we follow this recipe?"  and we want to know this for a wide range of recipes, not just the ones that produce delicious results, because the world is full of strange mixtures and combinations that are bound to occur, and we often want to know what the outcome is when those ingredients mix.  
 
in the case of psychology, the recipe might be something like "measure intelligence and happiness in a bunch of college students in the US and correlate the two variables" and the outcome might be "tiny relationship." that's not as delicious as a large relationship, but it's important to know anyway, because it's a fact that helps us understand the world and now constrains future theories.
 
we can definitely reserve the right to say that some recipes are not interesting enough to want to know what happens when we follow them (e.g., maybe it's not worth knowing what the correlation is between shoe size and liking mackerel**), but we can decide that based on the recipe, without knowing the result.  every once in a while, there will be a recipe that looks boring but produces a result that is actually interesting (e.g., maybe inhaling dog hair cures cancer), and we should certainly have a mechanism for those kinds of studies to make it into the literature, too.  but at least a good chunk of the time, it's the recipe that makes the study interesting, not the result.
 
what's the problem with choosing what to publish based on the tastiness of the outcome?  first, if we reward the tastiness of the result, we are incentivizing people to take shortcuts to produce tasty results even if that means deviating from the recipe.  that might work at a potluck (why no, i didn't use any baking powder in these eclairs!).  but in science, if you take shortcuts to get an impressive result, and you don't report that (because it would no longer be impressive if you admitted that you had to use a little baking powder to get your choux pastry to rise), that corrupts the scientific record.  
 
second, it gives too little weight to the quality of the methods.  even putting aside questionable research practices like p-hacking and selective reporting, there's the problem of accurate interpretation.  in science, unlike in baking,*** you can make scrumptious claims with crappy ingredients.  if we don't pay close attention to the soundness of the methods, we risk letting in all kinds of overblown or misinterpreted findings.  if we look closely at the recipe, we may be able to tell that it's not what it claims to be. for example, if you say you're going to make a carrot cake but you're using candy corn instead of carrots, it may taste delicious but it won't be a carrot cake.  (in this analogy, the carrot is the construct you're claiming to be studying and the candy corn is the really crappy operationalization of that construct.  keep up!)  there are many ways to produce replicable findings that don't actually support the theoretical claims they're used to bolster. 
 
third, publications are pieces of the scientific record, not prizes.  and if we only record the successes, there is no hope of a cumulative record. often the same people who are against giving negative results a chance also say that we shouldn't worry about the replicability of individual studies because no one thinks that an individual study is definitive anyway.  they appeal to our ability to accumulate evidence across a set of studies.  but that's only a safeguard against flawed individual studies if the larger set of studies is unbiased - if it includes the negative results as well as the positive ones.  science can't accumulate if we only record the successes (this problem is compounded if the successes can be exaggerated or p-hacked, but it's still a problem even in the absence of those distortions).
 
the response i usually get at this point is that many negative results are just failures of execution - poorly designed studies, unskilled experimenters, etc.  i have two answers to this: 1) even if this is true, we need a way to identify the rare valid negative result, so that we have a chance to know when an effect really is zero.  if the negative result itself is a reason to ignore the study, we'll never be able to correct false positives.  2)  the same argument can be made about positive results.  i know how to make any null association significant: measure both variables with the same method.  or in the case of an experiment, throw in a confound or a demand characteristic.  how do i identify these problems? not by saying it must be spurious because it's easy to mess up a study in the direction of producing a significant result, but by scrutinizing the method.  the same should be done for studies with negative results that we suspect of being shoddy.
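 
to make that "same method" point concrete, here's a minimal simulation sketch.  nothing in it comes from a real study - the traits, the shared response style, and the sample size are all invented for illustration - but it shows how two truly unrelated variables can produce a "significant" correlation when they're measured the same way.

```python
# a toy illustration (invented numbers): two uncorrelated traits measured with
# the same method produce a "significant" correlation via shared method variance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
trait_a = rng.normal(size=n)
trait_b = rng.normal(size=n)          # truly independent of trait_a
response_style = rng.normal(size=n)   # e.g., acquiescence shared by both self-reports

measure_a = trait_a + response_style
measure_b = trait_b + response_style

r, p = stats.pearsonr(measure_a, measure_b)
print(f"r = {r:.2f}, p = {p:.5f}")    # expect r near .5 and a tiny p
```

the point is the one above: a tasty-looking positive result can be cooked up with a bad recipe just as easily as a sound recipe can produce a null, so it's the recipe that needs scrutinizing either way.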
 
we must have a way of evaluating quality independent of the outcome of the study.  that means requiring much more transparency about how the study was run and analyzed.  in the absence of transparency, it's easy to use results as a crutch.  (i do it, too.  for example, i sometimes use p-values to gauge how robust the results are, because i often can't see everything i need to see (e.g., the a priori design and analysis plan) to tell whether the research was conducted rigorously.)  but we must demand more information - open materials, pre-registration, open data, evidence of the validity of measures and manipulations, positive controls, etc. - so that we can separate the evaluation of the quality of the study from the outcome of the study.
 
some argue that committing to publishing some studies regardless of the outcome will lead to a bunch of boring papers because people will only test questions they can predict the answer to.  i think the opposite is true.  when we aren't evaluated based on our results, we can take more risks.  it's when we need to produce significant results that we're incentivized to only test hypotheses we're pretty sure we can confirm.  indeed, as i learned from one of Anne Scheel's tweets, Gigerenzer (1998) reported that Wallach and Wallach (1994, 1998) showed that most "theories" tested in JPSP and JESP papers are almost tautological.  
 
in some fields, p-hacking is called "testing to a foregone conclusion."  let's stop publishing only papers with foregone conclusions.  contrary to what often gets repeated, the way to  encourage creativity, risk-taking, and curiosity is by publishing rigorous studies testing interesting questions we don't yet know the answers to.  we can have our cake and eat it, too – we just can't guarantee it'll be tasty.****
 
* i would probably eat your failed donut.
** spurious correlation, dutchness
*** if there's a way to hack your way to delicious food without using good methods, please let me know.
**** you thought i'd given up on the tortured analogy, didn't you?
 
Goats eating pumpkin
this pumpkin has been hacked.
 

Guest post by Alexa Tullett: a not-so-hypothetical HIBAR

[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 

 The following is a guest post by Alexa Tullett.

Review Submission Background

I was invited to review “When Both The Original Study and Its Failed Replication Are Correct: Feeling Observed Eliminates the Facial-Feedback Effect.” A day before the review was due I went to complete it and realized that the link to the pre-registration wasn’t working. I emailed the journal and asked if I could get access to the pre-registration before I completed my review. Three days after that the handling editor, Dr. Kitayama, forwarded me the updated pre-registration link. Three days after that (I had not yet submitted my review) I got an email saying that the handling editor was moving forward to make a decision without my review. I emailed the journal and asked if it would be ok if I submitted my review by the end of the day, and the peer review coordinator, Charlie Retzlaff, said that would be fine. I did so, emailing my review to Charlie because the paper was no longer in my reviewer center. Three days after that I emailed Dr. Kitayama and asked if he had received my review. He replied and said he had already made his decision, but would forward my review to the authors. When forwarding my review to the authors, Dr. Kitayama asked that they attend to all issues raised.   

Review of “When Both The Original Study and Its Failed Replication Are Correct: Feeling Observed Eliminates the Facial-Feedback Effect.”

In this manuscript the authors provide a theoretical explanation for a failed replication of the facial-feedback effect. They suggest that the reason the original study found the effect while the replications did not was that in the original study participants were not being observed, whereas in all of the replication studies participants were told that they were being monitored by video camera. They conduct a test of this hypothesis, and conclude that the presence of a camera moderated the facial-feedback effect such that it was observed when the camera was absent but not when the camera was present. I think that conducting an empirical test of a hidden moderator explanation for a widely publicized RRR (Registered Replication Report) is a terrific idea. As it stands, however, I found the evidence provided in the present study to be fairly weak and largely inconclusive. The combination of low statistical power to detect the key interaction, a high p value (p = .051), and some apparent flexibility in data analysis options weakens the evidentiary value of the present study. I elaborate on these points below.

Major Points

  1. The authors calculate that they would need 485 participants to achieve power of 80% to detect an interaction, which is the key prediction (i.e., a moderation effect). However, they decide to use a sample of 200, which becomes 166 after exclusions (note, the authors’ pre-registered plan was to replace excluded participants, but they did not do so because they finished collecting data on the last day of the academic year). In justifying their sample size, the authors write “we opted to align the number of participants in our study with the replication study, namely, based on the power to detect the simple effects… [remainder of quote redacted*].” This seems to reflect a misunderstanding about statistical power; having higher statistical power in the present study does nothing to undermine comparisons to the replication studies; it simply allows a more precise estimate of effect size (whether simple or interactive) in the present study. This can only be an advantage. Moreover, it is the comparisons within the current sample that are critical to testing the authors’ research question, not comparisons between the current sample and the previous replications. The decision to go with lower statistical power in the present study comes at the cost of underpowering the authors’ main analysis (i.e., the interaction between condition and expression; see the simulation sketch below). Indeed, the interaction is not significant (p = .051), and thus the results do not technically support the authors’ hypothesis within an NHST framework.
  2. I very much appreciated that the authors pre-registered their methodology and data analysis plan and made this information publicly available on OSF. I still have some concerns, however, about remaining flexibility regarding data analysis. Twenty participants were excluded because they reported during the debriefing that they did not hold the pen as instructed. How was this decision made? Given that this represents 10% of the sample, it would be helpful to know whether there are multiple reasonable decisions about who should be excluded and how much the results vary depending on these decisions. Also, the authors made some exclusions that were not pre-registered: 2 participants who did not agree to use the pen as instructed, and another 4 who suspected the cover story of video recording. Although these exclusions seem reasonable, it would also seem reasonable to me to include these people. This potential flexibility regarding exclusion criteria has implications for interpreting the p value (.051) for their main analysis. Although this number may seem so close to the p = .05 cutoff that it should be counted as significant, if this “marginal” p-value is dependent on making one out of several reasonable decisions regarding exclusions then the results may be weaker than they appear.
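
The following simulation sketch, referenced in point 1, illustrates the power issue. It is not the authors’ analysis: the effect size (d = 0.4), the between-subjects 2x2 structure, and the t-test on the interaction contrast are assumptions chosen purely for illustration of how one could check power for a “knock-out” interaction (an effect present without the camera, absent with it) at various total sample sizes.

```python
# rough simulation of power to detect a 2x2 "knock-out" interaction:
# an assumed, purely illustrative facial-feedback effect of d = 0.4 exists only
# in the no-camera condition and is absent in the camera condition.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def interaction_power(n_total, d=0.4, n_sims=5000, alpha=0.05):
    n = n_total // 4  # participants per cell in a between-subjects 2x2 design
    hits = 0
    for _ in range(n_sims):
        smile_cam = rng.normal(0.0, 1.0, n)    # camera: no effect
        pout_cam = rng.normal(0.0, 1.0, n)
        smile_nocam = rng.normal(d, 1.0, n)    # no camera: effect of size d
        pout_nocam = rng.normal(0.0, 1.0, n)
        # interaction = (smile - pout | no camera) - (smile - pout | camera)
        diff = (smile_nocam.mean() - pout_nocam.mean()) - (smile_cam.mean() - pout_cam.mean())
        se = np.sqrt(sum(x.var(ddof=1) / n for x in
                         (smile_cam, pout_cam, smile_nocam, pout_nocam)))
        t = diff / se
        p = 2 * stats.t.sf(abs(t), df=4 * (n - 1))
        hits += p < alpha
    return hits / n_sims

for n_total in (166, 200, 485):
    print(n_total, round(interaction_power(n_total), 2))
```

The exact power values depend entirely on the assumed effect size, but the comparison across sample sizes makes the general point: it is the interaction test, not the simple effects, that should determine the required N.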

*Because the remainder of this quote differed from the published paper I redacted it from my review to preserve the confidentiality of the review process.

 

nothing beats something

[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 

Donkey1
hello

i'm reading a book called Nothing is True and Everything is Possible, about Russia, and the name keeps haunting me (the content of the book is good, too).  sometimes i worry that this goes for science, too.  it's not just that when nothing is true, everything is possible.  but when everything is possible, nothing is true.  

sometimes studying human behavior is so wild and messy that it feels like anything goes.  we create ad hoc scales more often than we probably should, and we invent manipulations out of thin air, rarely pausing to validate them.  if the best we can do is a college student sample, that sin will be forgiven.  if we can't get a behavioral measure of a behavior, a self-report will often do.  we do what's possible because, well, what's the alternative?

i'm here to consider the exceedingly unpopular view that the alternative - to do nothing - is sometimes preferable to doing what's possible.

science is hard.  social science can be especially hard, because we usually need to study humans, and humans are a bitch to study.*  there's this idea out there that it's better to collect noisy, messy data than no data at all.  i've often repeated this claim myself.  indeed, i taught it when i used David Funder's excellent personality textbook.  Funder's third law** is that "something beats nothing, two times out of three."  Funder was smart to add the caveat, because i think sometimes nothing beats something.

slowly over the last few years, i've come to the conclusion that sometimes crappy data is actually worse than no data.  i think that we sometimes fall into the trap of thinking that, because a phenomenon is really, really, really hard to study, very flawed studies are the best we can hope for and so are ok.  it's almost like we are committed to the belief that everything must be possible for a reasonably industrious researcher to study, so if the only option is a bad study, then a bad study must be good enough.

let me be clear: i have definitely made this argument myself, possibly as recently as last month.  i'm no stranger to the feeling that just getting adequate data feels almost impossible, and so inadequate data must do.  for example, i was pretty proud of myself when my lab decided to not just code actual behavior from EAR recordings of >300 participants, and not just double-code the recordings, but TRIPLE code the recordings.  when, after two years of coding by over 50 coders (which came on the heels of several years of data collection), we checked the reliabilities on those triple codings and found them wanting, it was very, very tempting to say "well, it's the best we can do."  luckily i had learned about the spearman-brown prophecy formula, and so had no excuse - adding more raters was definitely going to help, it's a mathematical fact.  so we got three more coders per recording (which took about two more years and 50 more coders).  and let me tell you, it sucked.  we are sick of coding.***  we wish we could have spent those two years doing new projects.  but we got the reliabilities up, and that will let us get much better estimates.  science is not all creativity and innovation.  it is a lot of boring, hard work.  if you don't sometimes hate science, you might be doing it wrong.
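 
in case the "mathematical fact" sounds like hand-waving, here's a tiny sketch of the spearman-brown prophecy formula.  the reliability value is invented for illustration - it's not our actual EAR coding reliability.

```python
# spearman-brown prophecy formula: predicted reliability of a composite when
# the number of raters (or items) is multiplied by k.
# the .55 starting value is invented for illustration.
def spearman_brown(reliability, k):
    return k * reliability / (1 + (k - 1) * reliability)

print(round(spearman_brown(0.55, 2), 2))  # doubling the raters: .55 -> ~0.71
print(round(spearman_brown(0.55, 3), 2))  # tripling the raters: .55 -> ~0.79
```

as long as the new coders are about as good as the old ones, adding raters has to push the expected reliability up - which is why "it's the best we can do" wasn't actually true.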

even better examples of people who kept going when most of us would have said "good enough:" the two researchers i highlighted in my blog post, "super power."  their efforts really make me rethink my definition of impossible.  

but sometimes getting good data really is impossible, at least for any single lab.  and i would like to float the idea that when this is the case, there may be something noble in walking away, and choosing to do nothing rather than something.  sometimes, the nicest thing you can do for your research question is to leave it unanswered.  (my colleague wiebke bleidorn pointed out that this is called "addition by subtraction.")  if you insist on extracting information from really crappy data, you put yourself at really high risk**** of  reading patterns into noise.  just because you need your data to tell you something, to move the needle, doesn't mean it can.  if our field allowed people to get credit for publishing little bits of (really hard-to-collect) data without making any claims whatsoever, this might be a viable approach, but i don't think that currently exists (though there are low-recognition ways to do this, of course).

the wishful thinking that we can always extract some knowledge from even small bits of messy data can lead to serious, widespread problems.  it's easy to point fingers at other fields, but this problem is alive and well in psychology.  i'm very familiar with the argument that this sample size or that method must be good enough because it's all we can realistically do, given limited resources, hiring or promotion expectations, etc.  it used to be one of the most common responses i'd hear to calls for larger samples or better (i.e., harder) methods.  people seem to have backed off of making the argument out loud, but i think it's still very common in people's minds - this attitude that it's inappropriate to say a method or design isn't good enough because it was hard/expensive/time-consuming to do.  (indeed, i think this is the most common internal response most of us have when we get criticized for studying only WEIRD samples.)

here's what i wish we all did (including me):  1) ask yourself what the ideal study would be to test your research question.  2)  ask yourself if you're willing and able to do something pretty close to it.  3)  if not, ask yourself why not.  really push yourself.  don't let yourself off the hook just because it would be a lot harder than what you're used to doing.  if the research question is important, why isn't it worth the really hard work?  4)  if you're still not willing or able to do something close to the ideal study on your own, either a) move on to a different (or narrower) research question, or b) join forces with other labs.  

i know this is idealistic.  i know we have to keep publishing otherwise our food pellets stop coming.  so let's pick questions we can actually hope to answer with the resources we have.  (this post is really one big, public NOTE TO SELF.  i have been extremely guilty of biting off way more than i can chew,***** and want to learn to scale back my expectations of what kinds of research questions i can rigorously test on my own.)  or join collaborative projects that pool resources to tackle the really hard, important questions, and find a way to deal with the issue of spreading credit around.  if we stop trying to do the impossible, i think we'll find that more things are true.

* also, sometimes to interact with.  or be near.  

** Funder calls his lessons "laws" somewhat tongue-in-cheek.  the whole book is a treat, but if you're new to personality psych, i especially recommend chapters 1 through 7.  felicity's professor at the university of new york used it, almost certainly because of its engaging yet accurate prose.  this is a very serious footnote, i am not kidding even a little bit.  read it.

*** it is kind of like if your lab had to eat asparagus for six meals a week for several years, and then you realized you had several more years of compulsory asparagus-eating.  (except our pee don't stink.)

**** 100% risk

***** true story: my undergrad thesis examined how sex-role identity interacts with gender to predict attitudes towards women... among high school students... in two different countries.... one of which is Samoa, a remote, non-English-speaking island in the middle of the South Pacific.  

Donkey2
did someone say asparagus?
 

 

bitter carrots*

 [DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 

 

Camel1
carrots

if you had told me five years ago that even one in twenty social/personality psych papers would provide links to their data and code, or to a pre-registration, i would have thought that would be huge progress.**   i've long been a fan of the nudges that encourage these kinds of practices (e.g., badges), and until recently i thought going as far as to require this kind of transparency (even with room for legitimate exceptions) was probably unrealistic - our field didn't seem ready for that.   i was sympathetic to the carrots-not-sticks approach.
 
but there's a problem with carrots-not-sticks.   we're asking researchers to eat the carrots, but some of the carrots are pretty bitter.   sometimes, when researchers are transparent, that brings information to light that undermines their claims, and readers don't buy the claims.   that's a necessary side effect of transparency.   and it means we can't in good faith tell researchers that transparency is always in their best interest and will be its own reward.   we can't lure people with carrots, and pretend all of the carrots are delicious and fun to eat.   sometimes carrots are hard to swallow.
 
i think it's time to admit that the main argument for transparency isn't self-interest - it's that transparency is just better for science.*** 
 
imagine the following scenarios:
 
scenario 1: you get a paper to review, and the authors have shared their data and code.   you look at the data and realize there is a coding error, or something else that makes the results uninterpretable (i.e., suggests the study needs to be re-run to fix the error).   you point this out in your review, the editor agrees and rejects the manuscript.

scenario 2: you get a paper to review, and the authors have shared a link to their pre-registration.   by comparing the manuscript and the pre-registration you realize that the analysis that the authors present as their planned analysis, and interpret the p-value for, is not actually the one they had specified a priori as their key planned analysis.   knowing this, you can tell that the claims in the paper are not supported by the evidence.   the editor agrees and rejects the manuscript.
 
scenarios 1 and 2 seem pretty straightforward.   but now consider scenarios 3 and 4:
 
scenario 3: you get a paper to review and the authors do not provide their data and code, but there is no sign of anything wrong.
 
scenario 4: you get a paper to review and the authors did not preregister their study, but they claim that the key result they present was their planned analysis, and interpret the p-value as if it was the only test they ran. 
 
what should you do in scenarios 3 and 4?
one option, and i think the way most of us have been operating, is to assume that the data have no anomalies, and the key analysis was indeed the one planned test that was run.   but is this fair to the authors in scenarios 1 and 2?   in scenarios 3 and 4, we're giving the authors the benefit of the doubt because they didn't let us verify their claims.   in scenarios 1 and 2 we're punishing them because they did let us verify their claims, and we learned that their claims were not justified.
 
but what else could we do in scenarios 3 and 4?   assume that their studies had the same flaws as the studies in scenarios 1 and 2?   that doesn't seem fair to the authors in scenarios 3 and 4.
 
when some authors choose to be transparent, we have no choice but to use the extra information they give us to assess the rigor of their study and the credibility of their claims.   but that also puts us in an impossible position with respect to the manuscripts in which authors are not transparent.   we can't assume these non-transparent studies have flaws, and we can't assume they don't.  
 
it seems to me the only fair thing to do is to make transparency the default.****   whenever possible, authors should be expected to share the data and code necessary to reproduce their results unless that's legally or ethically problematic.   and if authors claim that their key analysis was planned (i.e., if they're saying they're doing hypothesis testing and/or interpreting a p-value), we should ask that they document this plan (i.e., pre-register), or present their work as exploratory and their conclusions as less sure.   it's just not fair to let some authors say "trust me" when other authors are willing to say "check for yourself."
 
i know that's not the world we live in, and as long as transparency is not the default, we have to treat papers like those in scenarios 3 and 4 somewhere in the gray area between flawless and deeply flawed.   but my heart really goes out to the authors in scenarios like 1 and 2.   it would be completely rational for those authors to feel like they are paying too high a price for transparency.   (i've tried to make the case that there is a silver lining - their transparency makes the review process more fruitful for them, because it allows reviewers and editors to pinpoint specific ways they could improve their work which wouldn't have been possible without their openness.   but i'm sure that's not much consolation if they see papers like those in scenarios 3 and 4 getting published over theirs.)
 
my sense is that many people are getting on board with the credibility revolution, but only so long as all of the incentives are carrots, and not sticks.   as long as we can choose which carrots we want to go for, and not be punished if we don't eat our veggies, everyone is happy.   but that won't work in the long run.   it was perhaps a necessary step on the way to more widespread changes, but i think we need to start seriously considering making carrot-eating the default (also known as using sticks).   i can't think of how to make the current opt-in system we have fair.   if you can, i'd love to hear it.
 
* for more on problematic carrot-eating, see this by james heathers.
** in honor of the finding that two spaces after periods is the morally superior formatting, i am compensating for years of being bullied into one space by using three spaces.   (yes, i'm aware of the shakiness of the evidence but i never let that get in the way of a good footnote.)   #iwantmyspacesback
*** i don't mean that everything should always be transparent, and i don't know anyone who believes that.   i mean that things that can legally and ethically be made transparent usually should be.  
****  this seems like a good time to remind readers that i do not set policy for any journals, and i am conscious of the difference between my personal fantasies and the realities of editorial responsibility (as tempting as it is to use my vast editorial powers to force everyone to put five spaces between sentences).*****   
***** editorial abuses of power is a topic for another blog post. #bringonthelawsuits
 
 
 
Polar bear carrot
mmmm carrots
      
 

An Oath for Scientists

[DISCLAIMER: The opinions expressed in my posts are personal opinions, and they do not reflect the editorial policy of Social Psychological and Personality Science or its sponsoring associations, which are responsible for setting editorial policy for the journal.] 

Bear oath1

i've been thinking a lot about what it means to be a scientist.  being a scientist comes with certain obligations, and ignoring those obligations can give science a bad name.  it seems to me we could do more to make scientists aware of this responsibility when they decide whether or not to join the profession.

our most important obligation as scientists, to my mind, is preserving science's credibility. that doesn't mean we can't make mistakes, but above all else, we should be committed to opening ourselves up to scrutiny and correcting our errors.

to make these values a bit more concrete, i tried to adapt the hippocratic oath to scientists. you can tell how solemn this oath is by the use of capitalization.

the values i tried to capture were inspired by Merton's norms, but in the spirit of Merton's norm of universalism, i refrained from naming the oath after him (or anyone). it is very far from comprehensive, and i know it's cheesy, but i ask you, dear reader: if you can't engage in a little facile sentimentality on new year's day, when can you? 

 

An Oath for Scientists 

I swear that I will, according to my ability and judgment, protect the credibility of science by carrying out this oath and this indenture.

To make the grounds for my scientific claims transparent and available for others to scrutinize, to welcome that scrutiny and accept that others will be skeptical of my claims, to help others verify the soundness of my claims; to describe my methods in sufficient detail for others to repeat them, to not obstruct others' attempts to replicate my work; to report all evidence I know of for or against my claim, to not suppress evidence against my conclusions, to correct my past claims if I learn that they were wrong, to support the dissemination of evidence that disconfirms or contradicts my past claims.

I will hold myself and all other scientists to this oath, and I will not exempt any scientist because of her status or reputation. I will judge scientific claims based on the evidence, not the scientist making the claim. Neither will I hold any scientific claim or finding as sacred. Similarly, I will recognize as valuable the work of scientists who aim to correct errors in the scientific record.  

In whatever claims I present in my role as scientist, I will not knowingly overstate or exaggerate the evidence, I will not make claims out of interest for advancing my own station, and I will disclose any personal interest that may be perceived as biasing my judgment.  I will protect the credibility of my profession by making careful, transparent, calibrated claims.

Now if I carry out this oath, and break it not, may I gain for ever reputation among all scientists for my work; but if I transgress it and forswear myself, may the opposite befall me.

 

Polar bear oath1