
Friday, 5 May 2017

L'uomo che disegnava bersagli intorno ai fori di pallottola

Bad Stats - Bad Science by Ben Goldacre
We cannot read statistics unless they are first “translated” into natural language for us.
Humbly accepting this truth is the mark of any sensible person.
Take the “cholesterol miracles”…
… Let’s say the risk of having a heart attack in your fifties is 50 per cent higher if you have high cholesterol. That sounds pretty bad. Let’s say the extra risk of having a heart attack if you have high cholesterol is only 2 per cent. That sounds OK to me. But they’re the same (hypothetical) figures…
We cannot read statistics because the human brain does not grasp the real meaning of probabilities and risk factors. That is simply how we are built; there is no point insisting otherwise.
Newspapers play on this, carefully avoiding presenting the same results as natural frequencies, that is, as absolute numbers, something our brains grasp much better…
… Out of a hundred men in their fifties with normal cholesterol, four will be expected to have a heart attack; whereas out of a hundred men with high cholesterol, six will be expected to have a heart attack. That’s two extra heart attacks per hundred. Those are called ‘natural frequencies’. Natural frequencies are readily understandable, because instead of using probabilities, or percentages, or anything even slightly technical or difficult, they use concrete numbers, just like the ones you use every day to check if you’ve lost a kid on a coach trip, or got the right change in a shop. Lots of people have argued that we evolved to reason and do maths with concrete numbers like these, and not with probabilities, so we find them more intuitive…
Risk. Between absolute and relative risk lies an ocean, and newspapers play on it…
… you could have a 50 per cent increase in risk (the ‘relative risk increase’); or a 2 per cent increase in risk (the ‘absolute risk increase’); or, let me ram it home, the easy one, the informative one, an extra two heart attacks for every hundred men, the natural frequency…
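As a quick check, the three framings of the same (hypothetical) figures from the quote can be computed in a few lines of Python:

```python
# Hypothetical cholesterol figures from the quote: 4 heart attacks per 100
# men with normal cholesterol, 6 per 100 with high cholesterol.
baseline = 4 / 100
elevated = 6 / 100

relative_increase = (elevated - baseline) / baseline  # 0.5: "50 per cent higher"
absolute_increase = elevated - baseline               # 0.02: "2 per cent extra risk"
natural_frequency = (elevated - baseline) * 100       # 2 extra heart attacks per 100 men

print(f"Relative: {relative_increase:.0%}, absolute: {absolute_increase:.0%}, "
      f"natural frequency: {natural_frequency:.0f} per 100")
```

Same data, three very different headlines.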
Red meat causes cancer. Obviously. But to what extent? The professor on TV is expected to impress the audience without lying. So what does he do? Here is a typical exchange with the anchorman…
… Try this, on bowel cancer, from the Today programme on Radio 4: ‘A bigger risk meaning what, Professor Bingham?’ ‘A third higher risk.’ ‘That sounds an awful lot, a third higher risk; what are we talking about in terms of numbers here?’ ‘A difference … of around about twenty people per year.’ ‘So it’s still a small number?’ ‘Umm … per 10,000…’…
Painkillers and heart attacks
… The reports were based on a study that had observed participants over four years, and the results suggested, using natural frequencies, that you would expect one extra heart attack for every 1,005 people taking ibuprofen…
Here is how the media reported the link, again playing the trick of quoting the relative risk rather than the absolute one…
… ‘British research revealed that patients taking ibuprofen to treat arthritis face a 24 per cent increased risk of suffering a heart attack.’ Feel the fear. Almost everyone reported the relative risk increases…
And researchers are no slouches at dramatising either. At times they chase the limelight more eagerly than a showgirl.
***
H.G. Wells foresaw that statistics would be a cornerstone of the civilisation to come. Right. But he also foresaw that we would get used to interpreting them correctly. Wrong, wrong, wrong…
… Over a hundred years ago, H.G. Wells said that statistical thinking would one day be as important as the ability to read and write in a modern technological society. I disagree; probabilistic reasoning is difficult for everyone, but everyone understands normal numbers…
***
An example: did you know that the cannabis in circulation today is far more potent than it used to be? The story…
… The Independent was in favour of legalising cannabis for many years, but in March 2007 it decided to change its stance. One option would have been simply to explain this as a change of heart, or a reconsideration of the moral issues. Instead it was decorated with science—as cowardly zealots have done from eugenics through to prohibition—and justified with a fictitious change in the facts… Twice in this story we are told that cannabis is twenty-five times stronger than it was a decade ago… The data from the Laboratory of the Government Chemist goes from 1975 to 1989. Cannabis resin pootles around between 6 per cent and 10 per cent THC, herbal between 4 per cent and 6 per cent. There is no clear trend. The Forensic Science Service data then takes over to produce the more modern figures, showing not much change in resin, and domestically produced indoor herbal cannabis doubling in potency from 6 per cent to around 12 or 14 per cent. (2003–05 data in table under references)…. The rising trend of cannabis potency is gradual, fairly unspectacular, and driven largely by the increased availability of domestic, intensively grown indoor herbal cannabis…. ‘Twenty-five times stronger’, remember. Repeatedly, and on the front page. If you were in the mood to quibble with the Independent’s moral and political reasoning, as well as its evident and shameless venality, you could argue that intensive indoor cultivation of a plant which grows perfectly well outdoors is the cannabis industry’s reaction to the product’s illegality itself… In the mid-1980s, during Ronald Reagan’s ‘war on drugs’ and Zammo’s ‘Just say no’ campaign on Grange Hill, American campaigners were claiming that cannabis was fourteen times stronger than in 1970. Which sets you thinking. If it was fourteen times stronger in 1986 than in 1970, and it’s twenty-five times stronger today than at the beginning of the 1990s, does that mean it’s now 350 times stronger than in 1970? 
That’s not even a crystal in a plant pot. It’s impossible…
It recalls another affair: the flood of cocaine about to hit our cities (March 2006). The article…
… ‘Use of the addictive drug by children doubles in a year,’ said the subheading. Was this true?…
The data came from government sources.
But the source itself seemed to play things down, speaking of “no increase” in its commentary. Luckily the proud investigative journalist had smelled a rat; he had discovered that cocaine users had in fact doubled!…
… If you read the press release for the government survey on which the story is based, it reports ‘almost no change in patterns of drug use, drinking or smoking since 2000’. But this was a government press release, and journalists are paid to investigate…
The source document
… You can download the full document online. It’s a survey of 9,000 children, aged eleven to fifteen, in 305 schools. The three-page summary said, again, that there was no change in prevalence of drug use. If you look at the full report you will find the raw data tables: when asked whether they had used cocaine in the past year, 1 per cent said yes in 2004, and 2 per cent said yes in 2005. So the newspapers were right: it doubled? No. Almost all the figures given were 1 per cent or 2 per cent…
So: in 2004, 1% of respondents said they had used cocaine; in 2005, 2%. Can we really call that a doubling?
Not to mention the “lost rounding”…
… The actual figures were 1.4 per cent for 2004, and 1.9 per cent for 2005, not 1 per cent and 2 per cent…
Let’s translate it all into terms of risk, relative risk in particular…
… What we now have is a relative risk increase of 35.7 per cent, or an absolute risk increase of 0.5 per cent. Using the real numbers, out of 9,000 kids we have about forty-five more saying ‘Yes’ to the question ‘Did you take cocaine in the past year?’ Presented with a small increase like this, you have to think: is it statistically significant?…
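A sketch of the arithmetic, using the actual figures reported (1.4% in 2004, 1.9% in 2005, roughly 9,000 children):

```python
# Survey figures: 1.4% of ~9,000 children said yes in 2004, 1.9% in 2005.
rate_2004, rate_2005, n = 0.014, 0.019, 9000

relative_increase = (rate_2005 - rate_2004) / rate_2004  # ~35.7 per cent
absolute_increase = rate_2005 - rate_2004                # 0.5 percentage points
extra_kids = absolute_increase * n                       # ~45 more saying "yes"

print(f"Relative: {relative_increase:.1%}, absolute: {absolute_increase:.1%}, "
      f"extra kids: {extra_kids:.0f}")
```

The “doubling” is a 35.7% relative increase, or about forty-five children.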
Despite all this, the increase might still appear statistically significant. So why did the people who compiled the statistics say there was no increase at all? Why?
Let’s start from the beginning: what is statistical significance?…
… It’s just a way of expressing the likelihood that the result you got was attributable merely to chance. Sometimes you might throw ‘heads’ five times in a row, with a completely normal coin, especially if you kept tossing it for long enough… The standard cut-off point for statistical significance is a p-value of 0.05, which is just another way of saying, ‘If I did this experiment a hundred times, I’d expect a spurious positive result on five occasions, just by chance.’…
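Two lines of Python make the convention concrete:

```python
# A fair coin lands heads five times in a row with probability 1/32,
# comfortably below the conventional 0.05 cut-off, yet it happens.
p_five_heads = 0.5 ** 5  # 0.03125

# The p = 0.05 convention: across 100 experiments with no real effect,
# about five will come out "significant" by chance alone.
expected_false_positives = 100 * 0.05
```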
But beware: statistical significance assumes the observed cases are independent, which is never entirely true in the real world. Students’ behaviour, for example, is shaped by many shared factors (fads, events, trends…). So much so that real-world replications of such surveys never land on the canonical 5%…
… To ‘data mine’, taking it out of its real-world context, and saying it is significant, is misleading. The statistical test for significance assumes that every data point is independent, but here the data is ‘clustered’, as statisticians say. They are not data points, they are real children, in 305 schools. They hang out together, they copy each other, they buy drugs from each other, there are crazes, epidemics, group interactions… The increase of forty-five kids taking cocaine could have been a massive epidemic of cocaine use in one school…
The result must be corrected. Statisticians call this “correcting for clustering” (a technique that allows for the dependence built into the data points)…
… As statisticians would say, you must ‘correct for clustering’. This is done with clever maths which makes everyone’s head hurt. All you need to know is that the reasons why you must ‘correct for clustering’ are transparent, obvious and easy, as we have just seen… When you correct for clustering, you greatly reduce the significance of the results…
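The “clever maths” is not spelled out in the text; one standard back-of-the-envelope version is the design effect. The intra-cluster correlation below is a made-up illustrative value, not a figure from the survey:

```python
# One textbook way to "correct for clustering" is the design effect:
# deff = 1 + (m - 1) * icc, where m is the average cluster size and icc
# the intra-cluster correlation (how alike children in one school are).
n = 9000           # children surveyed
schools = 305
m = n / schools    # ~29.5 children per school
icc = 0.05         # ASSUMED illustrative value, not taken from the survey

deff = 1 + (m - 1) * icc
effective_n = n / deff  # the survey "behaves" like a much smaller sample

print(f"Design effect {deff:.2f}: 9,000 children count as ~{effective_n:.0f}")
```

Even a modest correlation within schools more than halves the effective sample, which is why significance drops so sharply.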
What remains after this correction?
Very little, especially since in this case a further correction is needed.
When you test many relationships, you can in principle pick out the ones that suit you. The chance that some come out positive by pure luck increases, and with it the temptation to keep them and discard the rest. The scientific method would instead require you to form hypotheses from a model and then test them. Trawling the data to build hypotheses is not the correct way to proceed…
… Will our increase in cocaine use, already down from ‘doubled’ to ‘35.7 per cent’, even survive? No. Because there is a final problem with this data: there is so much of it to choose from. There are dozens of data points in the report: on solvents, cigarettes, ketamine, cannabis, and so on. It is standard practice in research that we only accept a finding as significant if it has a p-value of 0.05 or less. But as we said, a p-value of 0.05 means that for every hundred comparisons you do, five will be positive by chance alone. From this report you could have done dozens of comparisons, and some of them would indeed have shown increases in usage—but by chance alone, and the cocaine figure could be one of those…
An analogy: if I roll the dice over and over, I can then cherry-pick runs of sixes to “prove” the rolls were not random…
… If you roll a pair of dice often enough, you will get a double six three times in a row on many occasions…
The survey in question contains a myriad of comparisons between the most disparate variables. It is, in other words, a study that leads researchers into temptation. In such cases you apply the “Bonferroni correction”, a methodological adjustment commonly used in situations of this kind…
… This is why statisticians do a ‘correction for multiple comparisons’, a correction for ‘rolling the dice’ lots of times. This, like correcting for clustering, is particularly brutal on the data, and often reduces the significance of findings dramatically…
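A sketch of why many comparisons are dangerous and what Bonferroni’s correction does; the count of forty comparisons is illustrative, not the report’s exact number:

```python
# With k independent comparisons each tested at alpha = 0.05, the chance
# of at least one spurious "significant" result grows quickly.
alpha, k = 0.05, 40  # k = 40 is an illustrative count, not the report's

p_any_false_positive = 1 - (1 - alpha) ** k  # ~0.87: almost guaranteed

# Bonferroni's correction simply divides the threshold by k,
# which is why it is "particularly brutal on the data".
bonferroni_alpha = alpha / k  # 0.00125
```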
After this final correction, nothing remains of the “doubling” trumpeted by the diligent investigative journalist.
The nerds who compiled the survey not only read the move from 1% to 2% correctly; they also knew all about the correction for clustering and the Bonferroni correction. That is why they concluded that “there was no increase in cocaine use”, for that reason, and not to keep an “inconvenient” truth from the public.
***
But the most glaring plague of statistics is badly selected samples…
… There are also some perfectly simple ways to generate ridiculous statistics, and two common favourites are to select an unusual sample group, and to ask them a stupid question. Let’s say 70 per cent of all women want Prince Charles to be told to stop interfering in public life. Oh, hang on—70 per cent of all women who visit my website want Prince Charles to be told to stop interfering in public life…
Example: doctors’ willingness to perform abortions
… Telegraph in the last days of 2007. ‘Doctors Say No to Abortions in their Surgeries’ was the headline. ‘Family doctors are threatening a revolt against government plans to allow them to perform abortions in their surgeries… ‘Four out of five GPs do not want to carry out terminations even though the idea is being tested in NHS pilot schemes, a survey has revealed.’…
The source of the story…
… It was an online vote on a doctors’ chat site that produced this major news story. Here is the question, and the options given:   ‘GPs should carry out abortions in their surgeries’ Strongly agree, agree, don’t know, disagree, strongly disagree…
First: doubts about the wording of the question
… Is that ‘should’ as in ‘should’? As in ‘ought to’?… Are they just saying no because they’re grumbling about more work and low morale? More than that, what exactly does ‘abortion’ mean here?…
***
Another exemplary case. Do you know how many murders are committed by people with psychiatric problems?…
… In 2006, after a major government report, the media reported that one murder a week is committed by someone with psychiatric problems. Psychiatrists should do better, the newspapers told us, and prevent more of these murders. All of us would agree…
Couldn’t they be stopped beforehand? Couldn’t the most dangerous individuals be detained?
Anyone who proposes such solutions has not grasped the concepts of base rate and false positive…
… the blood test for HIV has a very high ‘sensitivity’, at 0.999. That means that if you do have the virus, there is a 99.9 per cent chance that the blood test will be positive. They would also say the test has a high ‘specificity’ of 0.9999—so, if you are not infected, there is a 99.99 per cent chance that the test will be negative. What a smashing blood test.* But if you look at it from the perspective of the person being tested, the maths gets slightly counterintuitive. Because weirdly, the meaning, the predictive value, of an individual’s positive or negative test is changed in different situations, depending on the background rarity of the event that the test is trying to detect. The rarer the event in your population, the worse your test becomes, even though it is the same test. This is easier to understand with concrete figures. Let’s say the HIV infection rate among high-risk men in a particular area is 1.5 per cent. We use our excellent blood test on 10,000 of these men, and we can expect 151 positive blood results overall: 150 will be our truly HIV-positive men, who will get true positive blood tests; and one will be the one false positive we could expect from having 10,000 HIV-negative men being given a test that is wrong one time in 10,000. So, if you get a positive HIV blood test result, in these circumstances your chances of being truly HIV positive are 150 out of 151. It’s a highly predictive test. Let’s now use the same test where the background HIV infection rate in the population is about one in 10,000. If we test 10,000 people, we can expect two positive blood results overall. One from the person who really is HIV positive; and the one false positive that we could expect, again, from having 10,000 HIV-negative men being tested with a test that is wrong one time in 10,000. Suddenly, when the background rate of an event is rare, even our previously brilliant blood test becomes a bit rubbish. 
For the two men with a positive HIV blood test result, in this population where only one in 10,000 has HIV, it’s only 50:50 odds on whether they really are HIV positive…
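The arithmetic in the quote is Bayes’ rule; a small function reproduces both scenarios:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Chance that a positive test result is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# High-risk population, 1.5% infected: the test is highly predictive (~150/151).
print(positive_predictive_value(0.999, 0.9999, 0.015))
# Background rate 1 in 10,000: the very same test is now a coin toss (~0.5).
print(positive_predictive_value(0.999, 0.9999, 0.0001))
```

Same test, same accuracy; only the background rarity of the event changes.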
Psychiatric screening of dangerous individuals yields substantial false positives, combined with base rates that are quite low to begin with. Detaining people under such uncertainty would be absurd…
… Let’s think about violence. The best predictive tool for psychiatric violence has a ‘sensitivity’ of 0.75, and a ‘specificity’ of 0.75. It’s tougher to be accurate when predicting an event in humans, with human minds and changing human…
Just do the maths…
… Let’s say 5 per cent of patients seen by a community mental health team will be involved in a violent event in a year. Using the same maths as we did for the HIV tests, your ‘0.75’ predictive tool would be wrong eighty-six times out of a hundred. For serious violence, occurring at 1 per cent a year, with our best ‘0.75’ tool, you inaccurately finger your potential perpetrator ninety-seven times out of a hundred. Will you preventively detain ninety-seven people to prevent three violent events?…
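The same base-rate arithmetic, applied to the hypothetical “0.75” prediction tool from the quote:

```python
def share_wrongly_flagged(sensitivity, specificity, prevalence):
    """Of everyone the tool flags as dangerous, what share is a false alarm?"""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return false_pos / (true_pos + false_pos)

# Any violent event, 5% base rate: wrong about 86 times out of 100.
print(share_wrongly_flagged(0.75, 0.75, 0.05))
# Serious violence, 1% base rate: wrong about 97 times out of 100.
print(share_wrongly_flagged(0.75, 0.75, 0.01))
```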
Locking up ninety-seven people to prevent three violent events seems a bit much. Or does it?
***
The Clark case
… In 1999 solicitor Sally Clark was put on trial for murdering her two babies…
The evidence of her guilt…
… At her trial, Professor Sir Roy Meadow, an expert in parents who harm their children, was called to give expert evidence. Meadow famously quoted ‘one in seventy-three million’ as the chance of two children in the same family dying of Sudden Infant Death Syndrome (SIDS)….
Too improbable that two babies should die together of natural causes: she must have murdered them!
What is wrong with this reasoning?
First, there is an independence error: the two deaths are not independent events, so the probability of “both at once” cannot be obtained by simple multiplication…
… The figure of ‘one in seventy-three million’ itself is iffy, as everyone now accepts. It was calculated as 8,543 × 8,543, as if the chances of two SIDS episodes in this one family were independent of each other. This feels wrong from the outset, and anyone can see why: there might be environmental or genetic factors at play, both of which would be shared by the two babies…
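The multiplication behind the headline figure is easy to reproduce; the point is that it is only valid if the two deaths were independent, which they are not:

```python
# Meadow's figure treats the two deaths as independent and multiplies:
single_sids = 1 / 8543
double_sids_if_independent = single_sids ** 2  # about 1 in 73 million

print(f"1 in {1 / double_sids_if_independent:,.0f}")
# Shared genetic and environmental factors make a second SIDS death more
# likely than the first, so this simple squaring overstates the rarity.
```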
Then there is the “prosecutor’s fallacy”: the prosecution weighs only how improbable innocence is. And how improbable guilt is, what about that?…
… Many press reports at the time stated that one in seventy-three million was the likelihood that the deaths of Sally Clark’s two children were accidental: that is, the likelihood that she was innocent… Once this rare event has occurred, the jury needs to weigh up two competing explanations for the babies’ deaths: double SIDS or double murder. Under normal circumstances—before any babies have died—double SIDS is very unlikely, and so is double murder… If we really wanted to play statistics, we would need to know which is relatively more rare, double SIDS or double murder. People have tried to calculate the relative risks of these two events, and one paper says it comes out at around 2:1 in favour of double SIDS… the rarity of double SIDS is irrelevant, because double murder is rare too…
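As a hedged sketch, if we assume (purely for illustration) that double SIDS and double murder are the only two explanations on the table, and that the cited 2:1 ratio is all we know, the probability of innocence comes out at about two-thirds, nowhere near one in seventy-three million:

```python
# ASSUMPTION for illustration: double SIDS and double murder are the only
# candidate explanations, and the ~2:1 likelihood ratio in favour of
# double SIDS (from the paper cited in the quote) is all we weigh.
likelihood_ratio = 2.0  # double SIDS : double murder

p_innocent = likelihood_ratio / (likelihood_ratio + 1)  # ~0.67
print(f"P(double SIDS rather than double murder) ~ {p_innocent:.2f}")
```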
***
In hindsight, no event can be called surprising. Richard Feynman on the matter…
… You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing… Richard Feynman…
Now the case of the killer nurse: too many deaths during her shifts…
… A nurse called Lucia de Berk has been in prison for six years in Holland, convicted of seven counts of murder and three of attempted murder. An unusually large number of people died when she was on shift, and that, essentially, along with some very weak circumstantial evidence, is the substance of the case against her… The judgement was largely based on a figure of ‘one in 342 million against’….
Careful: never trust “predictions” made after the fact. Predictions are made beforehand; an outcome is startling only if it was specifically predicted in advance…
… It’s only weird and startling when something very, very specific and unlikely happens if you have specifically predicted it beforehand…
The man who drew targets around bullet holes…
… Imagine I am standing near a large wooden barn with an enormous machine gun. I place a blindfold over my eyes and—laughing maniacally—I fire off many thousands and thousands of bullets into the side of the barn. I then drop the gun, walk over to the wall, examine it closely for some time, all over, pacing up and down. I find one spot where there are three bullet holes close to each other, then draw a target around them, announcing proudly that I am an excellent marksman…
Hypotheses first, then evidence. That is how science operates…
… a cardinal rule of any research involving statistics: you cannot find your hypothesis in your results…
The risks of investigating backwards
… To collect more data, the investigators went back to the wards to see if they could find more suspicious deaths. But all the people who were asked to remember ‘suspicious incidents’ knew that they were being asked because Lucia might be a serial killer. There was a high risk that ‘an incident was suspicious’ became synonymous with ‘Lucia was present’…
Here we must be clear: some phenomena cannot be tested directly, so forming hypotheses after the fact can be of interest; it is all we have. Just think of the anthropic principle. Still, we must remain well aware of the difference between this way of proceeding and the most rigorous scientific method.

14 Bad Stats - Bad Science by Ben Goldacre

14 Bad StatsRead more at location 3574
Note: 14@@@@@@@@@@@@@@@@ Edit
The biggest statisticRead more at location 3579
Note: t Edit
Let’s say the risk of having a heart attack in your fifties is 50 per cent higher if you have high cholesterol. That sounds pretty bad. Let’s say the extra risk of having a heart attack if you have high cholesterol is only 2 per cent. That sounds OK to me. But they’re the same (hypothetical) figures.Read more at location 3582
Note: I MIRACOLI DEL COLESTEROLO Edit
Out of a hundred men in their fifties with normal cholesterol, four will be expected to have a heart attack; whereas out of a hundred men with high cholesterol, six will be expected to have a heart attack. That’s two extra heart attacks per hundred. Those are called ‘natural frequencies’. Natural frequencies are readily understandable, because instead of using probabilities, or percentages, or anything even slightly technical or difficult, they use concrete numbers, just like the ones you use every day to check if you’ve lost a kid on a coach trip, or got the right change in a shop. Lots of people have argued that we evolved to reason and do maths with concrete numbers like these, and not with probabilities, so we find them more intuitive.Read more at location 3584
Note: NOI CAPIAMO LE FREQ NATURALI. PROB E % CI CONF. E I GIORNALI CI GIOCANO Edit
you could have a 50 per cent increase in risk (the ‘relative risk increase’); or a 2 per cent increase in risk (the ‘absolute risk increase’); or, let me ram it home, the easy one, the informative one, an extra two heart attacks for every hundred men, the natural frequency.Read more at location 3591
Note: ASSOLUTI E RELATIVI Edit
red meat causes bowel cancer,Read more at location 3594
Try this, on bowel cancer, from the Today programme on Radio 4: ‘A bigger risk meaning what, Professor Bingham?’ ‘A third higher risk.’ ‘That sounds an awful lot, a third higher risk; what are we talking about in terms of numbers here?’ ‘A difference … of around about twenty people per year.’ ‘So it’s still a small number?’ ‘Umm … per 10,000…’Read more at location 3595
Note: CARNE E TUMORI. IL PROF IN TV CHE DEVE SPARARE ALTO Edit
painkillers and heart attacks,Read more at location 3604
The reports were based on a study that had observed participants over four years, and the results suggested, using natural frequencies, that you would expect one extra heart attack for every 1,005 people taking ibuprofen.Read more at location 3605
Note: ANTIDEPRESSIVI E INFARTO Edit
‘British research revealed that patients taking ibuprofen to treat arthritis face a 24 per cent increased risk of suffering a heart attack.’ Feel the fear. Almost everyone reported the relative risk increases:Read more at location 3607
Note: ECCO LA NOTIZIA RIPORTATA Edit
academics can themselves be as guilty as the rest when it comes to overdramatising their researchRead more at location 3614
Note: I RICERCATORI NN SONO DA MENO NEL DRAMMATIZZARE Edit
Over a hundred years ago, H.G. Wells said that statistical thinking would one day be as important as the ability to read and write in a modern technological society. I disagree; probabilistic reasoning is difficult for everyone, but everyone understands normal numbers.Read more at location 3621
Note: ALLE STAT NN CI ABITUEREMO MAI Edit
Choosing your figuresRead more at location 3624
Note: t Edit
The Independent was in favour of legalising cannabis for many years, but in March 2007 it decided to change its stance. One option would have been simply to explain this as a change of heart, or a reconsideration of the moral issues. Instead it was decorated with science—as cowardly zealots have done from eugenics through to prohibition—and justified with a fictitious change in the facts.Read more at location 3627
Note: LA CANNABIS POTENZIATA NEGLI ULTIMI ANNI Edit
Twice in this story we are told that cannabis is twenty-five times stronger than it was a decade ago.Read more at location 3634
Note: c Edit
The data from the Laboratory of the Government Chemist goes from 1975 to 1989. Cannabis resin pootles around between 6 per cent and 10 per cent THC, herbal between 4 per cent and 6 per cent. There is no clear trend. The Forensic Science Service data then takes over to produce the more modern figures, showing not much change in resin, and domestically produced indoor herbal cannabis doubling in potency from 6 per cent to around 12 or 14 per cent. (2003–05 data in table under references).Read more at location 3641
Note: c Edit
The rising trend of cannabis potency is gradual, fairly unspectacular, and driven largely by the increased availability of domestic, intensively grown indoor herbal cannabis.Read more at location 3647
Note: c Edit
‘Twenty-five times stronger’, remember. Repeatedly, and on the front page. If you were in the mood to quibble with the Independent’s moral and political reasoning, as well as its evident and shameless venality, you could argue that intensive indoor cultivation of a plant which grows perfectly well outdoors is the cannabis industry’s reaction to the product’s illegality itself.Read more at location 3653
Note: c Edit
In the mid-1980s, during Ronald Reagan’s ‘war on drugs’ and Zammo’s ‘Just say no’ campaign on Grange Hill, American campaigners were claiming that cannabis was fourteen times stronger than in 1970. Which sets you thinking. If it was fourteen times stronger in 1986 than in 1970, and it’s twenty-five times stronger today than at the beginning of the 1990s, does that mean it’s now 350 times stronger than in 1970? That’s not even a crystal in a plant pot. It’s impossible.Read more at location 3664
Note: c Edit
Cocaine floods the playgroundRead more at location 3670
Note: t Edit
The Times in March 2006Read more at location 3671
‘Cocaine Floods the Playground’.Read more at location 3671
‘Use of the addictive drug by children doubles in a year,’ said the subheading. Was this true?Read more at location 3672
Note: L ARTICOLO Edit
If you read the press release for the government survey on which the story is based, it reports ‘almost no change in patterns of drug use, drinking or smoking since 2000’. But this was a government press release, and journalists are paid to investigate:Read more at location 3673
Note: UFFICIALITÀ E GIORNALISMO INVEST Edit
You can download the full document online. It’s a survey of 9,000 children, aged eleven to fifteen, in 305 schools. The three-page summary said, again, that there was no change in prevalence of drug use. If you look at the full report you will find the raw data tables: when asked whether they had used cocaine in the past year, 1 per cent said yes in 2004, and 2 per cent said yes in 2005. So the newspapers were right: it doubled? No. Almost all the figures given were 1 per cent or 2 per cent.Read more at location 3676
Note: LA FONTE DOCUMENTALE DELL ARTICOLO Edit
Note: 1% E 2% NON È UN RADDOPPIO Edit
The actual figures were 1.4 per cent for 2004, and 1.9 per cent for 2005, not 1 per cent and 2 per cent.Read more at location 3681
Note: L ARROTONDAMENTO PERDUTO Edit
What we now have is a relative risk increase of 35.7 per cent, or an absolute risk increase of 0.5 per cent. Using the real numbers, out of 9,000 kids we have about forty-five more saying ‘Yes’ to the question ‘Did you take cocaine in the past year?’ Presented with a small increase like this, you have to think: is it statistically significant?Read more at location 3683
Note: TRADUCIAMO IN RISCHIO Edit
What does ‘statistically significant’Read more at location 3686
Note: t Edit
It’s just a way of expressing the likelihood that the result you got was attributable merely to chance. Sometimes you might throw ‘heads’ five times in a row, with a completely normal coin, especially if you kept tossing it for long enough.Read more at location 3686
Note: SIGNIFICANZA STATIST Edit
The standard cut-off point for statistical significance is a p-value of 0.05, which is just another way of saying, ‘If I did this experiment a hundred times, I’d expect a spurious positive result on five occasions, just by chance.’Read more at location 3689
Note: c Edit
To ‘data mine’, taking it out of its real-world context, and saying it is significant, is misleading. The statistical test for significance assumes that every data point is independent, but here the data is ‘clustered’, as statisticians say. They are not data points, they are real children, in 305 schools. They hang out together, they copy each other, they buy drugs from each other, there are crazes, epidemics, group interactions.Read more at location 3695
Note: MA ATTENZIONE: SS ASSUME CHE I CASI OSSERVATI SIANO INDIPENDENTI E NN CONNESSI COME LO SONO NEL MONDO REALE. È TALMENTE VERO CHE LE REPLICHE NN RISPETTANO CERTO IL 5% CANONICO Edit
The increase of forty-five kids taking cocaine could have been a massive epidemic of cocaine use in one school,Read more at location 3698
Note: ESEMPIO DI COME L ASSUNTO NN TIENE Edit
As statisticians would say, you must ‘correct for clustering’. This is done with clever maths which makes everyone’s head hurt. All you need to know is that the reasons why you must ‘correct for clustering’ are transparent, obvious and easy, as we have just seenRead more at location 3704
Note: CORREGGERE CON IL CLUSTERING Edit
When you correct for clustering, you greatly reduce the significance of the results.Read more at location 3707
Note: c Edit
Will our increase in cocaine use, already down from ‘doubled’ to ‘35.7 per cent’, even survive? No. Because there is a final problem with this data: there is so much of it to choose from. There are dozens of data points in the report: on solvents, cigarettes, ketamine, cannabis, and so on. It is standard practice in research that we only accept a finding as significant if it has a p-value of 0.05 or less. But as we said, a p-value of 0.05 means that for every hundred comparisons you do, five will be positive by chance alone. From this report you could have done dozens of comparisons, and some of them would indeed have shown increases in usage—but by chance alone, and the cocaine figure could be one of those.
Note: Another problem: when you test many relationships, the probability that some come out positive by pure chance increases, and with it the temptation to keep those and discard the rest. The scientific method requires instead that you formulate your hypotheses first and verify them afterwards.
If you roll a pair of dice often enough, you will get a double six three times in a row on many occasions.
Note: If you roll many times you will get strange runs, attributable to chance alone.
This is why statisticians do a ‘correction for multiple comparisons’, a correction for ‘rolling the dice’ lots of times. This, like correcting for clustering, is particularly brutal on the data, and often reduces the significance of findings dramatically.
Note: The correction for multiple comparisons: a correction of methodological ethics more than of fact.
Data dredging
Bonferroni’s correction for multiple comparisons.
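The dice intuition is easy to simulate. A sketch using only the standard library: generate p-values for comparisons where nothing real is going on (under the null hypothesis they are uniform on [0, 1]), count the naive ‘significant’ results, then apply Bonferroni's threshold of alpha divided by the number of comparisons:

```python
import random

random.seed(42)
n_comparisons = 10_000
alpha = 0.05

# Under a true null hypothesis, p-values are uniformly distributed,
# so each comparison has a 5% chance of looking significant by luck.
p_values = [random.random() for _ in range(n_comparisons)]

naive_hits = sum(p < alpha for p in p_values)                       # roughly 5% of 10,000
bonferroni_hits = sum(p < alpha / n_comparisons for p in p_values)  # close to zero
```

The naive count comes out near five hundred spurious ‘findings’, while the Bonferroni-corrected count is near zero, which is exactly the brutality Goldacre describes.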
OK, back to an easy one
There are also some perfectly simple ways to generate ridiculous statistics, and two common favourites are to select an unusual sample group, and to ask them a stupid question. Let’s say 70 per cent of all women want Prince Charles to be told to stop interfering in public life. Oh, hang on—70 per cent of all women who visit my website want Prince Charles to be told to stop interfering in public life.
Note: Badly selected samples: a plague.
selection bias:
Telegraph in the last days of 2007. ‘Doctors Say No to Abortions in their Surgeries’ was the headline. ‘Family doctors are threatening a revolt against government plans to allow them to perform abortions in their surgeries,
Note: Example: GPs' willingness to perform abortions.
‘Four out of five GPs do not want to carry out terminations even though the idea is being tested in NHS pilot schemes, a survey has revealed.’
It was an online vote on a doctors’ chat site that produced this major news story. Here is the question, and the options given: ‘GPs should carry out abortions in their surgeries’ Strongly agree, agree, don’t know, disagree, strongly disagree.
Note: The documentary source.
Is that ‘should’ as in ‘should’? As in ‘ought to’?
Note: Doubts about the wording.
Are they just saying no because they’re grumbling about more work and low morale? More than that, what exactly does ‘abortion’ mean here?
Beating you up
In 2006, after a major government report, the media reported that one murder a week is committed by someone with psychiatric problems. Psychiatrists should do better, the newspapers told us, and prevent more of these murders. All of us would agree,
Note: Murders by the mentally ill: couldn't they be stopped beforehand?
the blood test for HIV has a very high ‘sensitivity’, at 0.999. That means that if you do have the virus, there is a 99.9 per cent chance that the blood test will be positive. They would also say the test has a high ‘specificity’ of 0.9999—so, if you are not infected, there is a 99.99 per cent chance that the test will be negative. What a smashing blood test.* But if you look at it from the perspective of the person being tested, the maths gets slightly counterintuitive. Because weirdly, the meaning, the predictive value, of an individual’s positive or negative test is changed in different situations, depending on the background rarity of the event that the test is trying to detect. The rarer the event in your population, the worse your test becomes, even though it is the same test. This is easier to understand with concrete figures. Let’s say the HIV infection rate among high-risk men in a particular area is 1.5 per cent. We use our excellent blood test on 10,000 of these men, and we can expect 151 positive blood results overall: 150 will be our truly HIV-positive men, who will get true positive blood tests; and one will be the one false positive we could expect from having 10,000 HIV-negative men being given a test that is wrong one time in 10,000. So, if you get a positive HIV blood test result, in these circumstances your chances of being truly HIV positive are 150 out of 151. It’s a highly predictive test. Let’s now use the same test where the background HIV infection rate in the population is about one in 10,000. If we test 10,000 people, we can expect two positive blood results overall. One from the person who really is HIV positive; and the one false positive that we could expect, again, from having 10,000 HIV-negative men being tested with a test that is wrong one time in 10,000. Suddenly, when the background rate of an event is rare, even our previously brilliant blood test becomes a bit rubbish. 
For the two men with a positive HIV blood test result, in this population where only one in 10,000 has HIV, it’s only 50:50 odds on whether they really are HIV positive.
Note: The trap of base rates and false positives.
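Goldacre's worked example is Bayes' theorem applied to the positive predictive value. A small sketch using his figures (sensitivity 0.999, specificity 0.9999); the function name is mine:

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(truly infected | positive test result)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# High-risk population: 1.5 per cent infected -> a positive test is
# almost always right (about 150 out of 151).
high_risk = positive_predictive_value(0.015, 0.999, 0.9999)

# Background rate of one in 10,000 -> the same test is now a coin flip.
low_risk = positive_predictive_value(0.0001, 0.999, 0.9999)
```

Same test, same sensitivity and specificity; only the base rate of the event changed.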
Let’s think about violence. The best predictive tool for psychiatric violence has a ‘sensitivity’ of 0.75, and a ‘specificity’ of 0.75. It’s tougher to be accurate when predicting an event in humans, with human minds and changing human lives.
Note: Psychiatrists' tests have considerable false positives; given the base rates, acting on potential murderers is impossible.
Let’s say 5 per cent of patients seen by a community mental health team will be involved in a violent event in a year. Using the same maths as we did for the HIV tests, your ‘0.75’ predictive tool would be wrong eighty-six times out of a hundred. For serious violence, occurring at 1 per cent a year, with our best ‘0.75’ tool, you inaccurately finger your potential perpetrator ninety-seven times out of a hundred. Will you preventively detain ninety-seven people to prevent three violent events?
Note: Some calculations.
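The same Bayes arithmetic reproduces Goldacre's eighty-six and ninety-seven. A sketch (the function is mine; the prevalence, sensitivity and specificity figures are his):

```python
def wrongly_flagged_fraction(prevalence, sensitivity, specificity):
    """Of the people the tool flags as dangerous, what fraction are not?"""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return false_pos / (true_pos + false_pos)

# Any violent event, at 5 per cent a year: ~86 wrong per 100 flagged.
any_violence = wrongly_flagged_fraction(0.05, 0.75, 0.75)

# Serious violence, at 1 per cent a year: ~97 wrong per 100 flagged.
serious_violence = wrongly_flagged_fraction(0.01, 0.75, 0.75)
```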
Locking you up
In 1999 solicitor Sally Clark was put on trial for murdering her two babies.
Note: The Clark case.
At her trial, Professor Sir Roy Meadow, an expert in parents who harm their children, was called to give expert evidence. Meadow famously quoted ‘one in seventy-three million’ as the chance of two children in the same family dying of Sudden Infant Death Syndrome (SIDS).
Note: The proof of guilt.
The ecological fallacy
The figure of ‘one in seventy-three million’ itself is iffy, as everyone now accepts. It was calculated as 8,543 × 8,543, as if the chances of two SIDS episodes in this one family were independent of each other. This feels wrong from the outset, and anyone can see why: there might be environmental or genetic factors at play, both of which would be shared by the two babies.
Note: Certain events are not independent, and ‘both at once’ cannot be rendered by simply multiplying.
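Meadow's number comes from squaring 1/8,543, which is only legitimate if the two deaths are independent. A sketch of why dependence matters; the 1-in-100 conditional risk below is purely hypothetical, chosen to show the size of the effect, not a real estimate:

```python
p_first = 1 / 8_543

# Meadow's calculation: treat the second death as independent of the first.
independent_joint = p_first * p_first   # 1 in 72,982,849, the '73 million'

# If shared genes or environment mean a family with one SIDS death faces,
# say, a 1-in-100 risk for a second child (hypothetical figure), the joint
# probability comes out roughly 85 times larger than Meadow's.
p_second_given_first = 1 / 100
dependent_joint = p_first * p_second_given_first   # 1 in 854,300
```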
The prosecutor’s fallacy
Many press reports at the time stated that one in seventy-three million was the likelihood that the deaths of Sally Clark’s two children were accidental: that is, the likelihood that she was innocent.
Note: A probability of innocence?
Once this rare event has occurred, the jury needs to weigh up two competing explanations for the babies’ deaths: double SIDS or double murder. Under normal circumstances—before any babies have died—double SIDS is very unlikely, and so is double murder.
Note: The prosecutor's fallacy.
If we really wanted to play statistics, we would need to know which is relatively more rare, double SIDS or double murder. People have tried to calculate the relative risks of these two events, and one paper says it comes out at around 2:1 in favour of double SIDS.
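In other words, once two babies have died, the only relevant number is the ratio between the two rare explanations. A trivial sketch using the 2:1 figure from the paper Goldacre cites:

```python
# Given two infant deaths, compare the two competing rare explanations.
# One paper puts the odds at roughly 2:1 in favour of double SIDS.
odds_sids_vs_murder = 2.0

p_double_sids = odds_sids_vs_murder / (1 + odds_sids_vs_murder)   # ~0.67
p_double_murder = 1 - p_double_sids                               # ~0.33
```

A world away from the ‘one in seventy-three million chance of innocence’ the press reported.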
the rarity of double SIDS is irrelevant, because double murder is rare too.
Losing the lottery
You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing… Richard Feynman
Note: Feynman: the coincidence that is only surprising after the fact.
A nurse called Lucia de Berk has been in prison for six years in Holland, convicted of seven counts of murder and three of attempted murder. An unusually large number of people died when she was on shift, and that, essentially, along with some very weak circumstantial evidence, is the substance of the case against her.
Note: The murdering nurse.
The judgement was largely based on a figure of ‘one in 342 million against’.
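The sharpshooter effect is easy to reproduce. A toy simulation with invented numbers (not the real Dutch data): deaths strike completely at random, yet singling out the unluckiest nurse after looking at the data always produces someone who seems suspicious:

```python
import random

random.seed(7)

n_nurses = 30             # hypothetical hospital staff
shifts_each = 1_000
p_death_per_shift = 0.02  # identical risk for every nurse: pure chance

deaths_on_shift = [
    sum(random.random() < p_death_per_shift for _ in range(shifts_each))
    for _ in range(n_nurses)
]

expected = shifts_each * p_death_per_shift   # 20 deaths per nurse on average
unluckiest = max(deaths_on_shift)            # chosen AFTER looking at the data
```

The maximum over thirty nurses typically sits well above the average, with no murderer anywhere in the simulation: the target was drawn around the bullet holes.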
It’s only weird and startling when something very, very specific and unlikely happens if you have specifically predicted it beforehand.
Note: Predictions are made beforehand: a rare event is truly rare only if it could not have been foreseen.
Imagine I am standing near a large wooden barn with an enormous machine gun. I place a blindfold over my eyes and—laughing maniacally—I fire off many thousands and thousands of bullets into the side of the barn. I then drop the gun, walk over to the wall, examine it closely for some time, all over, pacing up and down. I find one spot where there are three bullet holes close to each other, then draw a target around them, announcing proudly that I am an excellent marksman.
Note: The man who drew targets around the bullet holes.
a cardinal rule of any research involving statistics: you cannot find your hypothesis in your results.
Note: First the hypothesis, then the result.
a rather complex, philosophical, mathematical form of circularity:
To collect more data, the investigators went back to the wards to see if they could find more suspicious deaths. But all the people who were asked to remember ‘suspicious incidents’ knew that they were being asked because Lucia might be a serial killer. There was a high risk that ‘an incident was suspicious’ became synonymous with ‘Lucia was present’.
Note: The risks of retrospective investigation.