Showing posts sorted by relevance for the query koretz.

Wednesday, 2 July 2008

Tests and the Rolling Stones

Although I advocate introducing tests and rankings in our schools, I am aware of the limits of this tool. Reading Koretz is illuminating in this respect.

High-stakes testing is a serious matter: funding is distributed and career steps are decided on the basis of it. So we need to be aware both of why such tests are necessary and of their limits.

I summarize a dozen of those limits here, just to be clear.

First of all, a good test is hard and expensive to build, and when belts have to be tightened that matters too. The risk is settling for something that looks only slightly inferior but is in fact completely useless, not least because the threshold between top quality and junk sits very close to the top.

A test is a survey, and building a sound sample is anything but straightforward, just as it is hard to identify reliable proxies.

Sometimes all that effort turns out to be in vain.

Otherwise it would be hard to understand why, according to PISA, US students outperform Norwegian ones, while according to TIMSS the opposite is true. Both tests are very rigorous; too bad they are always presented without stressing the large, unavoidable standard deviation. One would find that ranking Norwegians and Americans on mathematical skills is statistically meaningless. Money wasted?
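
As a rough illustration of why a small gap between two national averages may mean nothing, here is a minimal sketch. It treats the uncertainty mentioned above as a standard error on each country's mean, which is an assumption; the scores, the standard errors and the function name `difference_is_significant` are invented for the example and are not real PISA or TIMSS figures.

```python
from math import sqrt
from statistics import NormalDist


def difference_is_significant(mean_a, se_a, mean_b, se_b, alpha=0.05):
    """Two-sided z-test on the difference of two independent sample means."""
    z = (mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha, p_value


# Hypothetical country averages with hypothetical standard errors.
close_call, p_close = difference_is_significant(498, 4.0, 492, 4.5)
clear_gap, p_clear = difference_is_significant(560, 4.0, 492, 4.5)

print("6-point gap significant? ", close_call)
print("68-point gap significant?", clear_gap)
```

With these made-up numbers the 6-point gap cannot be told apart from noise, while the 68-point gap can. That is the pattern described here: some rankings are meaningless, others are clear.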

None of this changes the fact that Japanese students learn mathematics better than Americans and Norwegians. The tests say so, and this time they say it clearly.

On top of that, many of a student's virtues escape tests.

I may know algebra without knowing when to apply that knowledge. A test is unlikely to flag gaps of this kind.

Another drawback: a teacher may excel as a motivator. If that teacher's career depended solely on tests, it would be in serious danger.

Tests give teachers an incentive to do better, everyone says so. True, but they also give them an incentive to cheat.

To cheat outright during the exam, first of all.

Let's go back and reread the first chapter of Freakonomics, where the detective economist tracks down dishonest teachers by studying the random pattern of the errors. It is great fun, and instructive too.

And let's bear one thing in mind: the number of cheating teachers in the Chicago school district is in line with the national average, but the professional quality of those hunting them down there far exceeds that average.

Honest teachers cheat instead by providing targeted preparation, which in many cases is possible. This distorts the outcome, because the test is calibrated to measure, by sampling, a broader preparation.

To judge a teacher one has to look at the "improvements" relative to the entrance test. Experience with high-stakes tests often tells us that the improvements are spectacular. Unfortunately they are also very unreliable, since they are due to targeted preparation.

The conditions under which a test is administered matter too. Cases of inconsistent results abound, and for the most part they are due precisely to this variable.

Neutralizing this variable is extremely expensive. Often it is quicker to give up on the test.

Then there is improper use. Practical experience teaches that tests built for certain purposes are later used for others that look similar to the layman. Those who handle them want to save money, without realizing the distortions this causes.

The School Charts of the various American school systems are a case that Koretz describes in detail.

A pupil's preparation depends on the quality of the school. But it also depends on the surrounding context (family, friends...). To rank schools one has to net this out. A thankless task! Ask anyone who estimates so-called SES (socioeconomic status). Koretz devotes a chapter to the acronym.

The outcome of a test must be reported on an adequate scale. Often, when everything has been done well and the round looks clear, one stumbles disastrously at the last hurdle.

***


Today an unjust egalitarianism prevails in schools and among teachers. Tests will greatly increase inequalities and will retain elements of injustice. Is the game worth the candle? For me it is, but judging by the trade-union culture that permeates the institution that suffers most from the long shadow of 1968, I get the shivers.

For me it is, especially if tests are not the only indicator used to judge schools (here are some alternative variables: university performance of the school's former pupils, objective indicators on facilities, direct examinations of teachers, charter takeovers of low-scoring schools, autonomy and competition among schools through vouchers where college premiums are high, tests calibrated on SES, freely set fees for high-scoring schools...).

So what attitude should we take toward tests? Personally I go along with the "Rolling Stones" principle. Many will never find in tests what they seek and dream of; that does not mean they could not still find what they badly need.

"... No, you can't always get what you want... but if you try sometime... you find
You get what you need..."


... so at least I have an excuse to listen to the song again.


Addendum: Israel has doubts too: http://gisrael.blogspot.com/2010/12/la-scuola-fa-schifo-e-se-fosse-ottima.html

Saturday, 25 February 2017

The best school in town

“Excuse me, Dr. Koretz, could you please tell me the best school in town to enroll my child in?”
This is the question Daniel Koretz hears every day.
Since he makes his living evaluating schools through school tests (his book “Measuring Up” is a Bible), this is no surprise.
But his answer almost always disappoints.
He usually suggests weighing
… the strength of the school’s music or athletic programs, some special curricular emphasis, school size, social heterogeneity, and so on…
Then he advises visiting the schools in person to judge whether they look like promising places.
Observe and describe, then. Hard work.
The parent who asked Koretz takes leave of him quickly and coldly, and the dissatisfaction is palpable: they want something less complicated from a test designer. Something less ambiguous. For example, the school that does best on tests…
… They wanted something simpler: the names of the schools with the highest test scores…
There is a standard answer to give these pests…
… “If all you want is high average test scores, tell your realtor that you want to buy into the highest-income neighborhood you can manage. That will buy you the highest average score you can afford.”…
Follow the money: the more you pay, the better the test scores. Go to the neighborhoods with the highest average income and there you will find the schools that do best on tests.
The irritation stems from a misunderstanding: some believe that knowing the outcome of a test tells us the essentials about a student or a school.
Another misplaced belief is that designing and administering a test is a simple thing: no sooner said than done.
President Bush's words when presenting the “No Child Left Behind” program betray this belief…
… “A reading comprehension test is a reading comprehension test. And a math test in the fourth grade—there’s not many ways you can foul up a test … It’s pretty easy to ‘norm’ the results.”…
Wrong: nothing is easier than “fouling up” a test and making it useless, in the lucky event that the test is not already flawed in itself.
Tests look simple but are extremely hard to prepare and administer. Doing it on a mass scale is practically impossible.
By now people talk about school tests even at the bar
… For many years, Parade magazine has featured a regular column by Marilyn vos Savant, who is declared by the magazine to have the highest IQ in the country. Rather than simply saying that Ms. vos Savant is one damned smart person, if indeed she is, the editors use the everyday vocabulary of “IQ”…
But how many bar regulars know what IQ is and how it is tested? Few, probably; the concept is far from obvious.
Another myth: believing that tests are extremely powerful indicators
… it is just another way of saying that she is smart. But it does seem to give the assertion more weight, a patina of scientific credibility…
It would be far more appropriate to say that so-and-so is a smart person (as our grandparents did) than to refer to their IQ.
***
What makes things so damnably complicated?
First of all, the fact that tests are very many, practically infinite.
No single test gives us a complete picture of the work done by a school. And not even all tests put together manage the feat.
First because they consider only a subset of educational goals. Then because they are not a direct measurement of anything, but mere estimates that rely on sampling.
A school test is like a survey: one looks at a few things to get an idea of the whole.
***
One problem with tests is their frequent invalidity: it shows up when two tests that are in theory equivalent give different results. An example:
… For example, for more than three decades the federal government has funded a large-scale assessment of students nationwide called the National Assessment of Educational Progress, often simply labeled NAEP (pronounced “nape”), which is widely considered the best single barometer of the achievement of the nation’s youth. There are actually two NAEP assessments, one (the main NAEP) designed for detailed reporting in any given year, and a second designed to provide the most consistent estimates of long-term trends. Both show that mathematics achievement has been improving in both grade four and grade eight—particularly in the fourth grade, where the increase has been among the most rapid nationwide changes in performance, up or down, ever recorded. But the upward trend in the main NAEP has been markedly faster than the improvement in the long-term-trend NAEP. Why? Because the tests measure mathematics somewhat differently,…
Invalsi, PISA, TIMSS… the rankings on this and that always change.
They also change over time. When a test has substantial consequences (careers, salaries…), the improvements are, as it happens, hyperbolic. The Texas example…
… The experience in Texas during George Bush’s tenure as governor provides a good illustration. At that time, the state used the Texas Assessment of Academic Skills (TAAS) to evaluate schools, and high-school students were required to pass this test in order to receive a diploma. Texas students showed dramatically more progress on the TAAS than they did on the National Assessment of Educational Progress…
But these are far from reassuring improvements, generally the result of the practice of “teaching to the test”.
***
Then there is a reliability problem: students who take the same test twice and get different results.
The SAT can be taken several times, for example. But that is not always possible, especially when the number of students is large.
Many tests designed to be equivalent have different content (obviously, one cannot administer the very same test twice), but content is never neutral.
Part of the fluctuation is due to the pupil's form on the day. Perhaps the person is nervous or slept badly.
It makes no sense to give great weight to small differences.
***
Then there are problems of scale: how should results be reported?
We are used to grades: an arbitrary scale that makes comparisons impossible…
… We know that to obtain a grade of “A” can require much more in one class than in another…
But it is not easy to overcome these limits: different scales give different representations of performance, and this limits comparisons anyway.
***
Then there is the problem of flawed (or biased) tests: tests that do not work as they should.
An example of a test biased against immigrants
… For example, a mathematics test that requires reading complex text and writing long answers may be biased against immigrant students who are competent in mathematics but have not yet achieved fluency in English…
This raises problems: a perfectly neutral test still comes out unfavorable to the poor. What to do? It is an awkward matter…
… For instance, if poor students in a given city attend inferior schools, a completely unbiased test is likely to give them lower scores because the inferior teaching they received impeded their learning…
And what about tests biased against women? Here we enter philosophical territory. The fact is that a test discriminates: we give it precisely in order to discriminate!
***
Then there is a problem of targeting: a test must be aimed at its purpose, which is usually narrower than people think.
For example, do I want to evaluate the school or the students? Different tests are needed depending on the goal…
… For example, the assessment designs that are best for providing descriptive information about the performance of groups (such as schools, districts, states, or even entire nations) are not suitable for systems in which the performance of individual students must be compared. Adding large, complex, demanding tasks to an assessment may extend the range of skills you can assess, but at the cost of making information about individual students less trustworthy….
***
Let us recap the five key problems: validity, reliability, scaling, bias and targeting.
These are problems that call for complicated and fragile solutions. Unfortunately, there is always someone who tends to equate the complicated with the negligible.
***
But then there are at least a couple of even more important problems; let's look at them.
What is a test? Essentially a survey.
To solve a given problem, for example, we bring 1,000 different skills into play, but only some of them can feasibly be measured. Among these, one must select a sample representative of the whole. If we get the sample wrong, the test can be thrown away.
The logic of tests is the same as that of polls…
… ON SEPTEMBER 10, 2004, a Zogby International poll of 1,018 likely voters showed George W. Bush with a 4-percentage-point lead over John Kerry in the presidential election campaign. These results were a reasonably good prediction: Bush’s margin when he won two months later was about 2.5 percent…
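To get a feel for how much a sample of 1,018 respondents can say, here is a minimal sketch of the textbook margin of error for a single proportion. It assumes simple random sampling and a worst-case share of 50%; the real poll's design may well differ, and the function name `margin_of_error` is just illustrative.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for a share p."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)


moe = margin_of_error(1018)
print(f"margin of error: {moe * 100:.1f} percentage points")
```

Under these assumptions the margin on each candidate's share is roughly three percentage points, and the uncertainty on the lead between two candidates is larger still. A test that samples a handful of items from a large domain of skills faces the same kind of error.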
Sometimes polls like this fail miserably: a historic example is the Dewey vs. Truman race. But also, more recently, Trump and Brexit.
Yet we cannot do without them, and they usually get it right. One thing is certain: the quality of a poll depends on the sample chosen. But also on how the questions are put. An example…
… Original question: “What is the average number of days each week you have butter?” Revised question: “The next question is just about butter. Not including margarine, what is the average number of days each week you have butter?”…
The above is a case of two equivalent questions that drew very different answers.
Willingness to answer honestly also matters. Some questions invite “dishonesty”: if I ask someone how much they earn, they may not feel like telling me.
Ever-present, too, is “social desirability bias”, that is, the wish to please the interviewer by saying the “right thing”. In polls nobody is racist or sexist, and everybody volunteers…
… For example, a study published in 1950 documented substantial overreporting of several different types of socially desirable behavior. Thirty-four percent of respondents reported that they had contributed to a specific local charity when they had not, and 13 to 28 percent of respondents claimed to have voted in various elections in which they had not…
School tests are surveys and therefore have all the flaws of surveys…
… Educational achievement tests are in many ways analogous to this Zogby poll in that they are a proxy for a better and more comprehensive measure that we cannot obtain… The full range of skills or knowledge about which the test provides an estimate—analogous to the votes of the entire population of voters in the Zogby survey—is generally called the domain by those in the trade…
***
But what exactly do we measure in a school test? How representative is the chosen sample?
Here begins the dispute that divides people. There are the critics
… there are some aspects of the goals of education that achievement tests are unable to measure…
And there are the enthusiasts…
… Tests measure what is important, their argument goes, and those who focus on other “goals” are softies…
The critics have plenty of arrows in their quiver: one cannot fail to acknowledge limits to our ability to quantify the education that has passed into the learner.
The one saying so is not the anti-meritocratic union man but a father of psychometrics, E. F. Lindquist, in an article that already contained everything more than half a century ago: “Preliminary Considerations in Objective Test Construction”.
Lindquist anticipated today's controversies by stating that educational goals are varied and only some can be standardized.
Examples of goals that cannot be standardized: the desire to learn. Or: the ability to apply appropriately what one has learned.
Experience tells us that tests measure variables of great importance. But others, no less important, are inevitably neglected.
An example of a prudent attitude
… ITBS manual advises school administrators explicitly to treat test scores as specialized information that is a supplement to, not a replacement for, other information about students’ performance….
Then there is another gap…
… Second, Lindquist argued that even many of the goals of schooling that are amenable to standardized testing can be assessed only in a less direct fashion than we would like
The purpose of education is too remote and generic for us to tell whether we are measuring the right variables.
For example, why do we teach algebra? One hypothesis…
… to teach students how to reason algebraically so that they can apply this reasoning to the vast array of circumstances outside of school to which it is relevant. This sort of very general goal, however, is remote from decisions about the algebra content to be taught in a given middle school this Thursday morning… curriculum designers and teachers must make a large number of specific decisions about what algebra to teach. For example, do students learn to factor quadratic equations? Many considerations shape these decisions, not just a subject’s possible utility in a wide range of work-related and other contexts years later…
But it is a vague hypothesis: one risks measuring skills that the person will never call upon or activate.
One can learn many things, but what if one is then unable to tell when and how to use what one has learned? An amusing anecdote
… Many years ago, I had Sunday brunch in Manhattan with three New Yorkers. All were highly educated, and all had taken at least one or two semesters of mathematics beyond high school. In my experience, New York natives make their way about town in part by drawing on a prodigious knowledge of the location of various landmarks, such as the original Barnes and Noble store on Fifth Avenue. That Sunday morning, I found to my surprise that none of the three New Yorkers could figure out the location of the restaurant where we were to have brunch. It was on one of the main avenues, and they knew the address, but they could not figure out the cross street. I suggested that the problem might turn out to be a very simple one. I asked if they knew where the addresses on the avenues in that part of Manhattan reached zero and, if so, whether they reached zero at the same street. They quickly agreed that they did and gave me the name of the cross street. I then asked if the addresses increased at the same rate on these avenues, and if so, at what rate. That is, how many numbers did the addresses increase with each cross street? They were quite certain that the rate was the same, but it took a little more work to figure out what it was. Using a few landmarks they knew (including the original Barnes and Noble store), they figured out the rate for a couple of avenues. The rates were the same. At that point, they had the answer, although they had not yet realized it…
To find their way, the three would have had to solve a simple first-degree equation. They did not realize it, even though at university they routinely solved far harder mathematical problems…
… All three were competent in dealing with algebra much more complex than this, but they had not developed the habit of thinking of real-world problems in terms of the mathematics they had learned in the classroom…
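The equation hidden in the anecdote can be sketched in a few lines. This is only an illustration: the zero street, the landmark and the restaurant address below are invented, not real Manhattan data, and the function names are hypothetical.

```python
def rate_from_landmark(address, street, zero_street):
    """Address numbers gained per cross street, inferred from one known landmark."""
    return address / (street - zero_street)


def cross_street(address, zero_street, rate):
    """Solve address = rate * (street - zero_street) for street."""
    return zero_street + address / rate


ZERO_STREET = 8  # hypothetical street where avenue addresses reach zero
rate = rate_from_landmark(address=200, street=18, zero_street=ZERO_STREET)
restaurant_street = cross_street(address=1000, zero_street=ZERO_STREET, rate=rate)

print(f"rate: {rate} numbers per street, restaurant near street {restaurant_street}")
```

Knowing where the numbering starts and how fast it grows is all it takes; the hard part, as the quote says, is noticing that the problem is algebra at all.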
In an ideal world we would evaluate people by observing them directly at work on the problems they will be called on to tackle later, but school tests are very far from the ideal world of assessment, so one muddles through somehow…
… a test author usually has to focus on the proximate goals of educators, even if these are only proxies for the ultimate social goals of education…
Lindquist recommended testing specific knowledge
… Lindquist wanted as much as practical to isolate specific knowledge… tests to include tasks that focus narrowly on these specifics… attempting to create test items that present complex, “authentic” tasks more similar to those students might encounter out of school…
The trend has gone in the opposite direction.
***
What can we conclude from these considerations?
That tests are a useful but incomplete tool.
That it is reckless to attach consequences as important as salary or career to test results (high-stakes tests).
That judgments should take tests into account, but not tests alone (one component among others). A bit like the best universities do
… they conduct a “holistic” review of applicants, considering not only SAT or ACT scores but also grades, personal statements, persistence in extracurricular activities, and so on…

Thursday, 7 October 2010

Sticking to entrance tests

The tests administered to school students can be invalid and/or unreliable.

Careful not to confuse the two; I give just two examples to make myself clear.

When we weighed Marghe we had a scale that showed wrong (invalid) figures, so we had no idea how much the little one really weighed. But that did not matter to us: we knew the scale was reliable, so the various weights recorded over time were comparable. That was enough.

Who knows what IQ measures. It is certainly reliable. In other words: IQ measures something precise; we do not know whether it is a person's intelligence, but it is something well identified.

Moral: an unreliable test is also invalid, but not vice versa.

An invalid test does not measure the right object. An unreliable test makes measurement errors.
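
The bathroom-scale example can be simulated with a minimal sketch. It assumes, purely for illustration, that the "invalid but reliable" scale adds a constant offset with almost no noise, while an "unreliable" scale has no offset but a lot of noise; the weights, offsets and the function name `weigh` are all invented.

```python
import random
from statistics import mean, stdev


def weigh(true_weight, offset, noise_sd, rng, times=200):
    """Simulate repeated readings from a scale with a fixed offset and random noise."""
    return [true_weight + offset + rng.gauss(0, noise_sd) for _ in range(times)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
TRUE_WEIGHT = 6.0       # kg, hypothetical

shifted = weigh(TRUE_WEIGHT, offset=1.5, noise_sd=0.01, rng=rng)  # wrong level, consistent
erratic = weigh(TRUE_WEIGHT, offset=0.0, noise_sd=1.0, rng=rng)   # right level, inconsistent

for name, readings in (("shifted", shifted), ("erratic", erratic)):
    bias = mean(readings) - TRUE_WEIGHT
    spread = stdev(readings)
    print(f"{name}: bias={bias:+.2f} kg, spread={spread:.2f} kg")
```

The shifted scale is always wrong by about the same amount, so differences between two weighings still track real growth. A single reading from the erratic scale, by contrast, says very little.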

But tests can be subject to other distortions (biases). If the candidate gets nervous they perform worse, for example. Test-targeted preparation is fraudulent in certain contexts. Not to mention the “little help” that may come from an “interested administrator”.

Devising a “scale” that faithfully represents the results obtained is also next to impossible.

Daniel Koretz is the best man to dissect the four difficulties.

I have come to think that tests should be used at the beginning rather than at the end of the school “treatment”. But that should be the use of every exam. The final verdict is delivered only by the environment that will host the “educated”.

Daniel Koretz, Measuring Up, Harvard University Press

Tuesday, 24 June 2008

Fees in public schools

Measuring the quality of a school with a quantitative indicator is an arduous task.

People say: one must identify the difference in the pupil's preparation between entering that school and leaving it.

But "preparation" also includes elements that cannot be measured.

One can readily agree without enlisting among the barricade-manning anti-testers.

Much better, in such cases, to take toward the test what Koretz calls "the Rolling Stones Principle":

"... No, you can't always get what you want... but if you try sometime... you find
You get what you need..."

Confidently we press on in search of what we need, and at once other barriers stand in the way.

First of all, the slightly fraudulent attitude of some schools that goes by the name of test-score inflation. It consists in organizing teaching solely around the tests, neglecting other aspects of preparation.

Second, the progression curve: improvements do not proceed linearly, and a school that starts from high levels will not find it easy to improve much.

Third, the context (family, friends) keeps influencing the pupil's performance during the school years as well.

For us optimists the obstacles can be overcome; it is just a matter of taking a good run-up. In the first case we could resort to the random use of several measuring instruments; in the other cases it is enough to estimate suitable adjustments ("tare").

In any case, even flawed in this way, the measuring instruments could have an alternative use: authorizing schools that are excellent in absolute terms to set a fee for the pupils who attend them. After all, an absolute measure spares us the burden of the adjustments.

Nor is it a given that allowing a "fee" in the public sector too would threaten equal opportunity by introducing economic discrimination: if context matters, those attending will come from well-off families anyway. If it matters less, the principal will work the "fee" pedal taking great care not to drive away a clientele that gives him this opportunity to raise additional funding.

Wednesday, 28 June 2017

Why are there so few women economists?

What is the Right Number of Women? Hints and Puzzles from Cognitive Ability Research, by Garett Jones
***
Introductory part
There is no consensus as to the causes of women’s slow advancement in academic economics. Even after adjusting for factors representing family background or productivity a considerable portion of the gender promotion gap remains unexplained.
THE PROBLEM
Here I focus on the possibility that the low representation of women in economics is partially driven by genetic differences in tastes and abilities between the sexes,
THESIS
Particularly in a field like academia, where essentially all employees are above the mean in abilities, variances are likely to be important.
VARIANCE IS CRUCIAL
Some useful surveys include Munger (2007), Allen and Gorski (2002), Zup and Forger (2002), Pinker (2002), and especially Hyde (2005) and Cahill (2006); the most prominent rebuttal of the views expressed by those authors is Spelke (2005).
LITERATURE

the combination of analogies from other mammals, early childhood studies, well-documented impacts of sex hormones on brain structure, and the repeated finding of higher means and variances in relevant mental abilities (especially mathematical abilities) in males point toward the very real possibility that men and women differ genetically
THE FACTORS RELEVANT TO THE CONTROVERSY
evolution as a reason for soft priors
Adaptationism— the concept that gene-carriers quickly adapt to their surrounding circumstances— is at the heart of the modern theory of evolution, and it is difficult to imagine that male and female humans have faced identical circumstances across the millennia. Most obviously, men and women have faced systematically different challenges, framed by the nature of the reproductive cycle.
DIFFERENT ADAPTATION FOR MEN AND WOMEN
Indeed, when an economist like Brad DeLong (2005) cleanly lays out the terrible dilemma facing women in academia, he inadvertently lays out an evolutionary dilemma as well… The process of climbing to the top of the professoriate is structured as a tournament, in which the big prizes go to those willing to work the hardest and the smartest from their mid-twenties to their late thirties. Given our society (and our biology), a man can enter this tournament without foreclosing many life possibilities [since he can more easily intertemporally substitute fatherhood]…. But given our society (and our biology), a woman cannot
THE FEMALE ACADEMIC'S DILEMMA
men face a greater expected payoff to taking big risks in the early parts of their life, and, empirically, men are more likely to engage in risky behavior than women. For men and their genes, there is almost always another day. For women, the trade-off is much crueler.
RISK IS MALE
Men and women differ by 1 to 2 percent of their genomes, Dr. [David] Page said, which is the same as the difference between a man and a male chimpanzee or between a woman and a female chimpanzee….‘ We all recite the mantra that we are 99 percent identical and take political comfort in it,’ Dr. Page said. ’But the reality is that the genetic difference between males and females absolutely dwarfs all other differences in the human genome.’ (Wade 2003)
DIFFERENCES IN THE GENOME
A final genetic note: The fact that men have only one X-chromosome is a fact too large to omit. A woman has two X chromosomes, so if a particular gene is non-functioning on one X chromosome, then she is very likely to have a functioning copy on her second X-chromosome. A man, by contrast, is in no such luck.
CHROMOSOMES
brain anatomy and evidence of sexual differentiation
The findings of Allen and Gorski (2002, 291) appear to sum up the consensus on hormones: “With respect to mammals, high levels of sex hormones— whether secreted by the testes or administered by a scientist— result in masculine brain development.”
HORMONES
Halpern (2000, 180)… “There are many studies in which low testosterone for males and high testosterone for females are associated with better performance on several different spatial tests” (171). Kimura (1999, 122) concludes that “the ‘optimal’ level of T[estosterone] for spatial ability in humans is that of the normal male with lower levels.” Finally, when older men and older women have received hormone replacement therapy, or when people receive hormone therapy as part of a sex change operation, the “expected cognitive changes occurred” (Kimura 1999, 122).
HORMONES AND PERFORMANCE
Economists use these spatial abilities in geometric and topological reasoning, so these differences may help explain why ĝ, the fraction of economists who are female, is below 50%.
ECONOMISTS AND SPATIAL ABILITY
The best-documented sexual dimorphism in mammals is in the pre-optic area of the hypothalamus, located just in front of the brain stem. This is about twice as big in human males as in human females— a difference visible to the naked eye— and is involved with reproductive behavior.
DIMORPHISMS
The hippocampus, a site related to memory and spatial organization, also differs between the sexes (Cahill 2006); it is larger in human females when adjusted for brain size— a relatively recent finding. The finding is unsurprising since women typically do better on tests of memory retrieval and spatial memory.
HIPPOCAMPUS
So while women typically perform worse on spatial rotation tasks, such as what the letter “F” looks like when rotated in three dimensions, they do better at spatial memory tasks, such as where she put the car keys.
ROTATION AND SPATIAL MEMORY
men’s brains weigh about 15 percent more than women’s.
BRAIN WEIGHT
modern MRI scans indicate that within a given sex there is a positive correlation between brain size and IQ score (correlations of 0.3 to 0.4 are common), there is less evidence that men and women differ on average overall intelligence.
BRAIN SIZE AND IQ
In the neuroscience literature, it’s commonly observed that women’s brains are “more balanced” or “better connected” between left and right hemispheres.
WOMEN WITH BETTER-CONNECTED HEMISPHERES
the impacts of fetal hormones on brain development are clear enough that there is little debate in the literature over whether some structural differences between men’s and women’s brains are genetically driven.
HORMONES AND THE BRAIN
MRI scans show that male and female brains consistently use different structures to solve the same kinds of problems: ‘Every time you do a functional MRI on any test, different parts of the brain light up in men and women,’ says Florence Haseltine,
PROBLEM SOLVING
test scores as an indicator of mental ability
A common observation is that men have greater variability than women. Halpern (2000, 86)… It was consistently found that males were more variable than females in general knowledge, mechanical reasoning, quantitative ability, spatial visualization, and spelling… The high math variances are most relevant: On the SAT-Math, Feingold found that male variances were 20-25% larger for males in the four decades before his study, while on SAT-Verbal scores, male variances were about 5% higher…
VARIANCE IN TEST SCORES
I turn to the ability that is likely most relevant to the economics profession as it currently exists: Mathematical abilities… Jonung and Ståhlberg state in their abstract, “[W]e find economics to be more akin to mathematics than to the other social sciences.”
ECONOMICS AND MATHEMATICS
The usual stereotype drawn from the psychological literature is that men are better at math and visuospatial skills than women, especially at the upper end of the distribution. The crucial caveats to this generalization are that women are consistently better (on average) at arithmetic and computation than men,
DIFFERENCES IN MATHEMATICS
The fact that women are better at computation is especially intriguing in light of recent changes in the accounting profession: In a field that was formerly male-dominated, more than half of all Bachelor’s degrees in accounting are now conferred on women (Koretz 1997, Briggs 2007).
ACCOUNTING
according to Kimura (1999): She notes that boys do better on math aptitude tests (with the exception of girls’ superior computation ability), while girls do better on math achievement tests.
MATHEMATICS: APTITUDE AND ACHIEVEMENT
By way of explanation, Kimura notes (78): Since both aspects of math are taught by the same person, teacher-related factors are unlikely to be the explanation. Nor do other ‘socialization’ explanations such as gender bias in problem content, math anxiety, parental expectation, and so on, adequately account for the differences.
EXPLANATION: SOCIALIZATION RULED OUT
psychologists indeed have addressed the possibility that their tests are biased: They’ve gone out of their way to write word problems that favor females (e.g., “Martha is making square cookies,” Kimura, 1999, 77) but males still perform better
PERHAPS THE FRAMING?
One source of evidence on the question of male-female differences is neurological disorders. Many such disorders are more common among men than among women; one that deserves particular attention is autism. Simon Baron-Cohen and his coauthors (2004, 2005) have theorized that autism is largely an “extreme male mind,”
THE CLUE FROM MENTAL DISORDERS: AUTISM
Another source of data is meta-studies by psychologists. In a survey of meta-studies entitled “The Gender Similarities Hypothesis,” Hyde (2005) collected dozens of meta-studies of gender differences in cognitive abilities and personality traits. Among her findings is that on tests of mental rotation, spatial visualization, and spatial perception, males consistently perform better than females, with a median estimate of 0.44 standard deviations above females. Female advantages on tests of verbal fluency, language, and spelling are of the same order of magnitude. Males are overwhelmingly more aggressive than females (about 0.5 standard deviations, regardless of measure), and females are more agreeable and (importantly, in my view) more conscientious by about 0.2 standard deviations. The female advantage in conscientiousness is likely of first-order importance, particularly in academia, where tenure-track professors need to be self-starters.
META-STUDIES ON TRAITS
If the men of today actually do have an advantage in spatial ability— an advantage, based on Hyde (2005), that raises their mean 0.5 standard deviation higher than the female mean— and if we temporarily assume that men and women have the same standard deviations on this ability, then, at two standard deviations above the female mean, the ratio of men to women is 2.4:1; at three standard deviations it’s 4:1, and at four standard deviations it’s 6.5:1. Adding in a 5% gender difference in standard deviations (as Deary 2003 found for IQ) raises these ratios to 2.5:1, 5:1 and 11:1, respectively.
QUANTIFYING THE DIFFERENCE
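The ratios in this passage can be checked with a few lines of arithmetic. The sketch below assumes both distributions are normal and measures scores in female standard deviations; the function names are my own, not the paper's. The quoted figures appear to match the ratio of the two densities at the cutoff. The ratio of the shares scoring above the cutoff, computed alongside for comparison, comes out somewhat larger.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def normal_tail(x, mean=0.0, sd=1.0):
    """Share of a normal population scoring above x."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def density_ratio(cutoff, male_mean=0.5, male_sd=1.0):
    """Men per woman exactly at the cutoff (female mean 0, female sd 1)."""
    return normal_pdf(cutoff, male_mean, male_sd) / normal_pdf(cutoff)

def tail_ratio(cutoff, male_mean=0.5, male_sd=1.0):
    """Men per woman among everyone scoring above the cutoff."""
    return normal_tail(cutoff, male_mean, male_sd) / normal_tail(cutoff)

for cutoff in (2, 3, 4):
    print(
        cutoff,
        round(density_ratio(cutoff), 1),                # equal spreads
        round(density_ratio(cutoff, male_sd=1.05), 1),  # male sd 5% larger
        round(tail_ratio(cutoff), 1),                   # shares above the cutoff
    )
```

Under these assumptions the first two columns round to the figures in the quotation (2.4, 4.0, 6.5 and 2.5, 5.0, 11.0), which suggests the author compared densities rather than tail areas.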
conclusion
With current scientific understanding, the male-female differences on mathematical skills appear likely to persist, even under plausible social interventions like gender-neutral teaching methods.
FUTILE POLICIES
Economics could change itself so that it draws on the skills at which women, on average, excel. A more literary and historical economics, one more driven by verbal fluency and conscientious archival work, would be an economics that created greater opportunities for women.
THE ONLY WAY: THE DISCIPLINE MUST CHANGE. MORE HISTORY, LESS ANALYSIS

Saturday, 25 February 2017

Overall summary: Measuring Up by Daniel M Koretz

Measuring Up by Daniel M Koretz
Last annotated on February 24, 2017
Chapter 1 If Only It Were So Simple
Note: 1. THE FIVE PROBLEMS OF A TEST
help her identify good schools.
Note: THE REQUEST
She assumed that because of what I do for a living, I ought to know this.
the strength of the school’s music or athletic programs, some special curricular emphasis, school size, social heterogeneity, and so on.
Note: THINGS TO CONSIDER BEFORE TEST SCORES
visit a few schools that looked promising.
observations and descriptive information
She was not pleased. She clearly wanted an answer that was uncomplicated
Note: THE URGE TO SIMPLIFY
less ambiguity and complexity.
They wanted something simpler: the names of the schools with the highest test scores,
“If all you want is high average test scores, tell your realtor that you want to buy into the highest-income neighborhood you can manage. That will buy you the highest average score you can afford.”
Note: THE ANSWER TO THE SIMPLIFIERS
misunderstandings
that scores on a single test tell us all we need to know about student …
… to know about school
A third common misconception is that testing is simple and straightforward.
No Child Left Behind,
“A reading comprehension test is a reading comprehension test. And a math test in the fourth grade—there’s not many ways you can foul up a test … It’s pretty easy to ‘norm’ the results.”
Note: BUSH SIMPLIFIES
this claim was entirely wrong: it is all too easy to foul up the design of a test,
testing seems so misleadingly simple
Testing has become a routine part of our vocabulary
For many years, Parade magazine has featured a regular column by Marilyn vos Savant, who is declared by the magazine to have the highest IQ in the country. Rather than simply saying that Ms. vos Savant is one damned smart person, if indeed she is, the editors use the everyday vocabulary of “IQ”
Note: A TYPICAL MISUNDERSTANDING
very few readers have any idea what an IQ test contains
another issue: the rhetorical power of testing.
it is just another way of saying that she is smart. But it does seem to give the assertion more weight, a patina of scientific credibility.
So what are some of the complications that make testing and the interpretation of scores so much less straightforward
At first, they may seem discouragingly numerous.
test scores usually do not provide a direct and complete measure of educational achievement.
they are incomplete measures,
these tests can measure only a subset of the goals of education.
Note: FIRST REASON: INCOMPLETENESS
tests are generally very small samples of behavior that we use to make estimates of students’
Note: SECOND REASON
an achievement test is in many ways like a political poll,
opinions of a small number of voters are used
different tests often provide somewhat inconsistent results.
Note: CONSEQUENCE: THE VALIDITY PROBLEM
For example, for more than three decades the federal government has funded a large-scale assessment of students nationwide called the National Assessment of Educational Progress, often simply labeled NAEP (pronounced “nape”), which is widely considered the best single barometer of the achievement of the nation’s youth. There are actually two NAEP assessments, one (the main NAEP) designed for detailed reporting in any given year, and a second designed to provide the most consistent estimates of long-term trends. Both show that mathematics achievement has been improving in both grade four and grade eight—particularly in the fourth grade, where the increase has been among the most rapid nationwide changes in performance, up or down, ever recorded. But the upward trend in the main NAEP has been markedly faster than the improvement in the long-term-trend NAEP. Why? Because the tests measure mathematics somewhat differently,
Note: EXAMPLE
When scores have serious consequences, scores on the test that matters often go up far faster than scores on other tests.
Note: HIGH STAKES AND TRUSTWORTHINESS
The experience in Texas during George Bush’s tenure as governor provides a good illustration. At that time, the state used the Texas Assessment of Academic Skills (TAAS) to evaluate schools, and high-school students were required to pass this test in order to receive a diploma. Texas students showed dramatically more progress on the TAAS than they did on the National Assessment of Educational Progress.
Note: THE TEXAS EXAMPLE
Even a single test can provide varying results.
Note: THE RELIABILITY PROBLEM
Students who take more than one form of a test typically obtain different scores.
SAT college-admissions test more than once,
These arise partly because the test forms, while designed to be equivalent, have different content,
Fluctuations also occur because students have good and bad days:
too nervous to sleep well
it makes no sense to place much faith in small differences
Then there is the problem of figuring out how to report performance on a test.
Note: CALCULATION
Most of us grew up in a school system with some simple but arbitrary rules for grading tests,
We know that to obtain a grade of “A” can require much more in one class than in another.
Psychometricians therefore have had to create scales for reporting performance on tests.
various scales …
Note: REPRESENTATION
… provide differing views of performance.
Further, sometimes a test does not function as it should. A test may be biased,
For example, a mathematics test that requires reading complex text and writing long answers may be biased against immigrant students who are competent in mathematics but have not yet achieved fluency in English.
Note: BIAS AGAINST IMMIGRANTS
bias must be distinguished from simple differences in performance
For instance, if poor students in a given city attend inferior schools, a completely unbiased test is likely to give them lower scores because the inferior teaching they received impeded their learning.
Note: AN EMBARRASSMENT FOR UNBIASED TESTS
For example, the assessment designs that are best for providing descriptive information about the performance of groups (such as schools, districts, states, or even entire nations) are not suitable for systems in which the performance of individual students must be compared. Adding large, complex, demanding tasks to an assessment may extend the range of skills you can assess, but at the cost of making information about individual students less trustworthy.
Note: THERE IS NO OPTIMAL TEST. EXAMPLE: THE DESIGN VARIES WITH THE PURPOSE
principles of testing are beyond the reach of most people.
validity, reliability, bias, scaling, and standard setting,
Note: KEY CONCEPTS
Many people simply dismiss these complexities,
proclivity to associate the arcane with the unimportant
Chapter 2 What Is a Test?
Note: 2. A TEST IS USEFUL IF THE SKILLS IT MEASURES REPRESENT WELL THOSE INVOLVED IN CARRYING OUT A TASK
ON SEPTEMBER 10, 2004, a Zogby International poll of 1,018 likely voters showed George W. Bush with a 4-percentage-point lead over John Kerry in the presidential election campaign. These results were a reasonably good prediction: Bush’s margin when he won two months later was about 2.5 percent.
Note: TRUST IN POLLS
Occasionally, the polls are substantially wrong
classic example is Truman versus Dewey in 1948,
The basic principles underlying polling,
provide a handy way to explain the workings of achievement tests.
why should we care about these 1,018 people? Because together they represent the 121 million
ability to make this prediction …
… It depends on the design of the sample,
If Zogby had sampled only individuals in Utah …
… the sample would not have been a good representation
errors of sample design,
Accuracy also depends on the way in which survey questions are worded;
changes in the wording of questions can have substantial effects on respondents’ answers.
Original question: “What is the average number of days each week you have butter?” Revised question: “The next question is just about butter. Not including margarine, what is the average number of days each week you have butter?”
Note: EXAMPLE
Finally, accuracy depends on the ability or willingness of respondents
when students are asked about parental income, for example. They may refuse
“social desirability bias”: a tendency for some respondents to provide socially acceptable
For example, a study published in 1950 documented substantial overreporting of several different types of socially desirable behavior. Thirty-four percent of respondents reported that they had contributed to a specific local charity when they had not, and 13 to 28 percent of respondents claimed to have voted in various elections in which they had not.
Note: EXAMPLE OF SOCIAL DESIRABILITY BIAS
Educational achievement tests are in many ways analogous to this Zogby poll in that they are a proxy for a better and more comprehensive measure that we cannot obtain.
Note: A TEST IS A POLL
The full range of skills or knowledge about which the test provides an estimate—analogous to the votes of the entire population of voters in the Zogby survey—is generally called the domain by those in the trade.
Note: DOMAIN
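The polling analogy in this chapter can be made concrete with the usual textbook margin-of-error formula. This is a minimal sketch, assuming a simple random sample and a 95% confidence level (z of about 1.96). Real polls such as Zogby's use weighting and likely-voter screens, so their actual uncertainty will differ from this idealized figure. The function name is my own.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for one estimated share,
    assuming a simple random sample (an idealization)."""
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)

moe = margin_of_error(1018)
print(f"{moe * 100:.1f} percentage points")  # about 3 points for 1,018 respondents

# Quadrupling the sample only halves the margin of error.
print(f"{margin_of_error(4 * 1018) * 100:.1f} percentage points")
```

The figure applies to a single candidate's share, not to the lead between two candidates, which is less precise. By the same logic, a test that samples only a few items from a large domain is likely to give a noisier estimate than one that samples many.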
Chapter 3 What We Measure: Just How Good Is the Sample?
Note: 3.
there are some aspects of the goals of education that achievement tests are unable to measure.”
Note: CRITICS
the label “anti-testing”
Tests measure what is important, their argument goes, and those who focus on other “goals” are softies.
Note: THE MERITOCRATS' MOTTO
These critics are not entirely wrong.
recognize this limitation of testing,
obscure paper published more than half a century ago by E. F. Lindquist
Note: THE BASIS FOR THE CRITICS
“Preliminary Considerations in Objective Test Construction.”
he was remarkably prescient in anticipating controversies that engulfed the world of educational
Lindquist
goals of education are diverse,
only some of these goals are amenable to standardized
some other types of skills are far more difficult to test.
interest in learning
Note: SKILLS THAT CANNOT BE MEASURED: 1
ability to apply knowledge
Note: 2
The evidence shows unambiguously that standardized tests can measure a great deal that is of value,
some of what it omits is very important.
ITBS manual advises school administrators explicitly to treat test scores as specialized information that is a supplement to, not a replacement for, other information about students’ performance.
Note: EXAMPLE OF A PRUDENT ATTITUDE
Second, Lindquist argued that even many of the goals of schooling that are amenable to standardized testing can be assessed only in a less direct fashion than we would like.
Note: SECOND GAP
focus of daily attention for teachers and students are just proxies
ultimate goals are too general and too remote
For example, why do we teach students algebra?
to teach students how to reason algebraically so that they can apply this reasoning to the vast array of circumstances outside of school to which it is relevant. This sort of very general goal, however, is remote from decisions about the algebra content to be taught in a given middle school this Thursday morning.
Note: WHY ALGEBRA?
… curriculum designers and teachers must make a large number of specific decisions about what algebra to teach. For example, do students learn to factor quadratic equations? Many considerations shape these decisions, not just a subject’s possible utility in a wide range of work-related and other contexts years later.
An anecdote
difference between learning content specified in a curriculum and later application of that knowledge.
Many years ago, I had Sunday brunch in Manhattan with three New Yorkers. All were highly educated, and all had taken at least one or two semesters of mathematics beyond high school. In my experience, New York natives make their way about town in part by drawing on a prodigious knowledge of the location of various landmarks, such as the original Barnes and Noble store on Fifth Avenue. That Sunday morning, I found to my surprise that none of the three New Yorkers could figure out the location of the restaurant where we were to have brunch. It was on one of the main avenues, and they knew the address, but they could not figure out the cross street. I suggested that the problem might turn out to be a very simple one. I asked if they knew where the addresses on the avenues in that part of Manhattan reached zero and, if so, whether they reached zero at the same street. They quickly agreed that they did and gave me the name of the cross street. I then asked if the addresses increased at the same rate on these avenues, and if so, at what rate. That is, how many numbers did the addresses increase with each cross street? They were quite certain that the rate was the same, but it took a little more work to figure out what it was. Using a few landmarks they knew (including the original Barnes and Noble store), they figured out the rate for a couple of avenues. The rates were the same. At that point, they had the answer, although they had not yet realized it.
Note: EXAMPLE OF THE THREE STUDENTS LOST IN NEW YORK
The problem was a simple linear equation
All three were competent in dealing with algebra much more complex than this, but they had not developed the habit of thinking of real-world problems in terms of the mathematics they had learned in the classroom.
Note: THE SITUATION OF THE THREE STUDENTS
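The reasoning in the anecdote amounts to solving a linear equation. Here is a minimal sketch with entirely hypothetical numbers, since the book gives no addresses, street numbers, or rates. If addresses start at zero at some cross street and grow by a fixed amount per block, two known landmarks fix the rate, and any address then maps to a cross street.

```python
def rate_from_landmarks(addr_a, street_a, addr_b, street_b):
    """Address numbers gained per cross street, from two known landmarks."""
    return (addr_b - addr_a) / (street_b - street_a)

def cross_street(address, zero_street, rate):
    """Invert address = rate * (street - zero_street)."""
    return zero_street + address / rate

# Hypothetical landmarks: number 100 at 10th Street, number 300 at 20th Street.
rate = rate_from_landmarks(addr_a=100, street_a=10, addr_b=300, street_b=20)

# Hypothetical zero point: addresses reach zero at 5th Street.
print(cross_street(address=500, zero_street=5, rate=rate))  # 30.0
```

The algebra is trivial. The point of the anecdote is that the three diners did not think to set it up.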
in the ideal world we would assess achievement by measuring the ultimate goals
“The only perfectly …
… would be one based on direct observation
But this sort of measurement is clearly impractical,
a test author usually has to focus on the proximate goals of educators, even if these are only proxies for the ultimate social goals of education.
Note: THE SAME FLAW AGAIN
we have to put all test-takers in the same environment
Note: KNOWLEDGE AND SKILLS
Lindquist wanted as much as practical to isolate specific knowledge
tests to include tasks that focus narrowly on these specifics.
attempting to create test items that present complex, “authentic” tasks more similar to those students might encounter out of school.
they conduct a “holistic” review of applicants, considering not only SAT or ACT scores but also grades, personal statements, persistence in extracurricular activities, and so on.
Note: CONSIDER TESTS, BUT OTHER THINGS TOO
in much of the testing that now dominates K–12 education, Lindquist’s advice that test scores must be seen as incomplete measures is widely ignored.
Chapter 10 Inflated Test Scores
Note: 10.
performance is getting better, and rapidly.
this good news is often more apparent than real