Double oops
Interestingly, my change “solved” the perceived bias in the shuffling code, but it actually introduced real bias in an otherwise correct implementation of the Knuth-Fisher-Yates shuffle algorithm.
This was pointed out to me yesterday (11 March 2017) in the comments on the answer I copy-pasted by Steve Marx. His comment included a link to a post by Jeff Attwood on his Coding Horror blog, aptly titled The Danger of Naïveté.
Sooooo…. In this case the perils of copy-paste programming didn’t materialize, and succumbing to the “your shuffle is biased” peer-pressure actually led to introducing a change that introduced an actual bias that was only perceived bias before.
The morale of this story still stands: don’t copy-paste code you don’t understand. I am now adding to it: don’t change your code just because someone tells you there is something wrong with your code. Let them show it to be wrong.
Couple of weeks ago, you were looking for a quick way to shuffle the names of your teammates so you could make the order in which everybody got a turn in the standup a bit more unpredictable and could also make appointing the “designated driver” for the standup a bit of a lottery.
Rush
You were in a bit of a rush. So you did a quick search and one of the first Google results looked to provide exactly what you needed: How to randomize (shuffle) a JavaScript array?
/** * Randomize array element order in-place. * Using Durstenfeld shuffle algorithm. */ function shuffleArray(array) { for (var i = array.length - 1; i > 0; i--) { var j = Math.floor(Math.random() * (i + 1)); var temp = array[i]; array[i] = array[j]; array[j] = temp; } return array; }
That should do the trick.
A little voice inside your head spoke up: “You know what you always say: don’t copy-paste what you don’t understand.”
You told it to shut up.
Admittedly, you don’t really understand what’s going on in the Math.floor() statement, but what could be wrong with this tiny bit of code. Besides, it’s coming from StackOverflow. Not exactly the worst site in the world. And from an answer with 301 upvotes! What can be wrong with something like that?
So you copy paste the function provided in the answer in your page, call it from your own shuffleAndShow()
function that you already coded to show an unsorted list of names.
Job done.
You and your teammates now enjoy a little unpredictability in the order in which you speak during your standup and in determining who is going to be the “designated driver” for the next standup.
Yay!
Favoritism
Couple of weeks later, some of your teammates are starting to complain. The shuffle page seems to have a distinct preference for the lower part of the alphabet.
Doesn’t ruffle the feathers with regard to the speaking order, but being the designated driver (chairing the meeting, presenting the board, opening issues, etc so everyone in the call is on the same page) is seen as a bit of a duty that most would rather avoid.
Hmm.
You decide to look into it. There is really only one line in that piece of code that could be the culprit.
var j = Math.floor(Math.random() * (i + 1));
What could be wrong with it?
Google leads you to Generating random whole numbers in JavaScript in a specific range?.
The answer by Ionuț G. Stan has a really comprehensive answer. Reading makes you understand what is going on and why the function you copy-pasted is favoring the lower end of the array passed into it.
To get a really random number between a min and a max, you need
/** * Returns a random number between min (inclusive) and max (exclusive) */ function getRandomArbitrary(min, max) { return Math.random() * (max - min) + min; }
And to get a random integer value between min and max, you need:
/** * Returns a random integer between min (inclusive) and max (inclusive) * Using Math.round() will give you a non-uniform distribution! */ function getRandomInt(min, max) { return Math.floor(Math.random() * (max - min + 1)) + min; }
Hmm. Ok. So the code you copied did get the use of floor()
correct.
But the i + 1
it uses to multiply the result of the math.random()
call doesn’t seem to make any sense in the light of Ionut’s explanation.
It should be:
var j = Math.floor(Math.random() * (max - min + 1)) + min;
Where max the index of the last item in the array passed to the shuffle
function and min is the index of the first item in any array.
The index of the last item in any Javascript array is array.length() - 1
.
The index of the first item in any Javascript array is 0
.
Which leads to:
var j = Math.floor(Math.random() * ((array.length() - 1) - 0 + 1)) + 0; // ......max........... min min
And:
var j = Math.floor(Math.random() * array.length());
Morale of the story
This of course was a very insignificant piece of code and not much (if any) harm was done. But it does illustrate an important point.
It’s ironic because, it happened to me and I am known to speak out against copy-paste programming. That is, I don’t mind copy pasting, but at least understand what you are copying. The first time I succumb to the allure of copy-paste programming without understanding what the code is doing, I get bitten in the behind. Ouch.
On the positive side: serves me right and reinforces my own message to me.
Don’t copy-paste anything you don’t understand.
What have you copy-pasted that has come back to haunt you?
Aangezien de kans groot is dat je zelf achteraan het alfabetisch gesorteerde lijstje staat, geloofde zeker niemand dat het niet met opzet was…
Wel grappig dat ik direct zag dat met het teruglopen van de iterator alleen met een lager element gewisseld werd, maar mijn eerste reactie was: oh ja, daar was een reden voor. Ga je net zo hard de fout in.
The code you found on Stack Overflow is actually correct, and your proposed fix will introduce bias. See https://blog.codinghorror.com/the-danger-of-naivete/ for a nice writeup of this mistake.
Thanks Steve. Edited the post to point this out more pointedly 😉 then buried in the comments.
Hi Marjan, what exactly did you edit in the post? The original version (copy pasted) is still correct and your modification is still wrong (introduces bias).
Googled: copy and paste harms software
This webpage came up.
I was copy and pasting information into an Access DB. We tracked down the only variable to be copy and pasting. After stopping copy and pasting, the Access DB worked again.
This partially makes sense since you can paste as text to leave out bold, italics, etc.
Otherwise, what additional code is in a copy and paste that could harm software?
Thanks,
“copy-paste programming” is not about the technicalities of copy-pasting, but about the idea that when you create software by copy-pasting some code that you found on the internet or in a book without actually understanding what that code is doing and how it is achieving it, you can create all sorts of problems, subtle and not so subtle. That’s what this post is about – me copy-pasting something, apparently creating an error because I copy-pasted a flawed solution, then “correcting” it (without still really understanding what was going on) and actually introducing a bug/bias doing so. If you don’t understand some code, don’t copy-paste it is the message of this post.
Please remove this article. It is misleading and wastes people’s time. It took me quite a bit of Googling to understand that your article is incorrect and that the function on Stack Overflow is actually the correct one.
In the comments you say that you have corrected the article, which is doubly misleading, because now people might think that the information in the article is accurate, but it is still not. The whole line of reasoning that takes up most of the space is incorrect.
The only thing you seemed to have added is a kind of disclaimer in the beginning, and it actually took me a couple of times searching through the article to find it, because it looks like your standard inconsequential introduction that you tend to automatically skip.
Thanks, you just proved my point.