Much is made of the replication crisis in various fields. Here is one experience of my own.
In 2010 I was the lead author of a research paper which was presented at a conference in Melbourne. We were investigating the idea that if people said ‘fuck’ while talking, speech recognition would improve, and better recognition is the Holy Grail of those working in the area. It was easy to see intuitive reasons why that might happen.
We had available to us an online game my co-author, Manny, had developed as advertising for the movie Despicable Me. It was the first (only?) speech-enabled internet game. The user would tell Minions to do things and they would do them. While developing it, Manny figured that it would be used mostly by boys of an age to find it vastly amusing to say ‘Fucking play table-tennis’ as an alternative to ‘Play table-tennis’, and so on. As he tested the system he had the idea that maybe he was being recognised more often when he used the word ‘fucking’.
So we devised a test where people were given a set of randomly generated orders to give the Minions. Each set included orders without swearwords, orders with ‘fucking’, and orders with other swearwords in its place. None of the participants knew why they were doing the exercise. For lack of resources I asked people I knew to do this for me. The group was small but, I thought, a reliable one as a consequence.
Our results were statistically significant, supporting the hypothesis that ‘fuck’ used as an intensifier did improve recognition. You can see What’s the Magic Word? here.
About a year later we decided to do the experiment again. I didn’t have any more friends left (I know, it’s sad) but we had Amazon Mechanical Turk (AMT, the Turk) available as a resource. This time we did not achieve statistical significance. There were various possible reasons for that. Of course it could have been that our first experiment did not produce accurate data. But we had to acknowledge two important differences in the nature of the data collected which might also have been at issue. Firstly, we were using lowly paid Turks (though we paid them much better than the going rates) rather than people I knew. Secondly, our Turks were US-based, whereas in our first experiment our users had been mostly Australians with a few English thrown in.
Ideally we would have used Australians again. However, at the time (I don’t know about now) this idea of crowd-sourced labour had not yet come to Australia. Furthermore, the Minions game was taken offline and we could no longer use it, which made a close, if not exact, repetition of the experiment impossible.
We do think from time to time of exploring the ideas further, but the one thing we can state for sure is that neither we nor anybody else can repeat our original experiment.
For anybody looking at the paper, please note that it had a second agenda, which is why it is oddly written by the normal standards of academic work. Having proofread various papers in the same area which struck me as excruciatingly boring, I questioned the idea that one had to write in that way in order to be published. I wanted to write something that would be both interesting and intelligible to a person walking in off the street to listen to the paper being presented. In fact you have to pay a lot of money to go to academic conferences, so people off the street are excluded. I believe that is wrong, and I believe that many, if not all, academic papers could and should be written in ways more accessible to the population at large.