a massive hoard: 250 billion photos, with 350 million more uploaded each day, according to a Facebook announcement in 2013.
“deep learning”: For an overview, see “The Code That Runs Our Lives,” video uploaded to YouTube by the Agenda with Steve Paikin, March 3, 2016, https://goo.gl/nTHx8c; for the graduate seminar, see “Geoffrey Hinton: ‘Introduction to Deep Learning & Deep Belief Nets,’” video uploaded to YouTube by Institute for Pure and Applied Mathematics (IPAM), August 24, 2015, https://goo.gl/CdLXnO; for the spooky part, see Raffi Khatchadourian, “The Doomsday Invention,” New Yorker, November 23, 2015.
Stylometry: “Stylometry Methods and Practices: Home,” Temple University.
Statistical profiles of texts: A statistical profile of the book (The Cuckoo’s Calling, published under the pseudonym Robert Galbraith) was compiled using a few easily countable linguistic properties (e.g., the one hundred most frequently used words, the most frequent two-word combinations). These statistics were then compared with those for books written by some other female British mystery writers and a Harry Potter book. The match between the Galbraith and Potter books was much closer than for any other pair, suggesting they were written by the same person. Rowling outed herself as the author soon thereafter. For the tale as told by Patrick Juola, one of the sleuths, see Ben Zimmer, “The Science That Uncovered J. K. Rowling’s Literary Hocus-Pocus,” Wall Street Journal, July 16, 2013.
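For readers who want to see what such a profile might look like in practice, here is a minimal sketch. It is not the method the sleuths actually used: the function names (profile, similarity), the tokenizing rule, and the choice of cosine similarity as the comparison measure are all assumptions made for illustration. It counts the most frequent words and two-word combinations in a text and scores how closely two such profiles match.

```python
from collections import Counter
import math
import re


def profile(text, top_n=100):
    """Relative frequencies of the top_n words and top_n two-word combinations.

    Hypothetical helper: a crude stand-in for a stylometric profile.
    """
    words = re.findall(r"[a-z']+", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    result = {}
    for items in (words, bigrams):
        counts = Counter(items)
        total = sum(counts.values()) or 1
        for item, count in counts.most_common(top_n):
            result[item] = count / total
    return result


def similarity(p, q):
    """Cosine similarity between two profiles (assumed comparison measure)."""
    dot = sum(value * q.get(key, 0.0) for key, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)
```

With real books one would build a profile for the disputed text and for each candidate author's work, then look for the candidate whose score stands out from the rest. Short toy strings will not yield meaningful scores.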
Similar methods showed: Markowitz & Hancock (2014).
The most poignant application: About Murdoch, see Dwight Garner, “Review: ‘Living on Paper,’ Seven Decades of Letters from Iris Murdoch,” New York Times, January 5, 2016. Byatt quote from Garrard et al. (2005).
Truman Capote famously: The David Susskind Show, January 18, 1959. From Battaglio (2011).
People have been tabulating: For a newspaper article about an early Bible concordance compiled by computer, a Univac, see E. C. Keissling, “Faith and Univac,” Milwaukee Journal, July 14, 1957, Google News, https://goo.gl/Gh2rSK.
They begin in utero: Jusczyk (2000).
Later, reading becomes: Romberg & Saffran (2010).
As the text circulated: The people of the Internet created many versions of this text. Mine is from Snopes.com (“Can You Raed Tihs?,” Snopes, http://goo.gl/mgKJNe). See the history here: “Aoccdrnig to Rscheearch . . . ,” Know Your Meme, http://goo.gl/QntGJs. For an account written by an actual Cambridge reading researcher, see http://goo.gl/q4PUcj.
Some people think: For one such example, see Lidor Wyssocky, “The Magic Button,” Creativity Post, June 29, 2016, http://goo.gl/aDiXr4.
The range of possibilities: Adams (1980) is a primary source for this section.
property is known as redundancy: See Gleick (2011) or “Intro to Information Theory: Claude Shannon, Entropy, Redundancy, Data Compression, and Bits,” Cracking the Nutshell, http://goo.gl/bKeqXV. “Redundancy” is one of a set of related concepts, including information, mutual information, uncertainty, and others.
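A rough way to put a number on redundancy is sketched below. It is a simplification: it looks only at how often each letter occurs by itself, ignoring the dependencies between neighboring letters that account for much of the redundancy in English, so it understates the real figure. The function names and the 26-letter alphabet are assumptions made for illustration.

```python
from collections import Counter
import math


def letter_entropy(text):
    """Shannon entropy, in bits per letter, of the letters a-z in text."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    if not letters:
        raise ValueError("text contains no letters a-z")
    counts = Counter(letters)
    total = len(letters)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(text, alphabet_size=26):
    """Single-letter redundancy: 1 minus entropy over the maximum possible.

    0.0 means every letter is equally likely; 1.0 means one letter only.
    """
    return 1.0 - letter_entropy(text) / math.log2(alphabet_size)
```

A text that uses all twenty-six letters equally often scores close to zero on this measure, and a text made of a single repeated letter scores one. Ordinary English prose falls in between.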
Captcha security systems: Captchas keep getting harder in part because bots are getting better. More importantly, they can be defeated by cheap human labor paid to decode them: Motoyama et al. (2010).
English is redundant: At one time the Apple iTunes store censored the Gilbert and Sullivan song “T*t Willow,” exactly like that. I clipped an image of it on March 9, 2010, available on seidenbergreading.net. It was not censored on British iTunes.
Peter Norvig: “English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU,” Norvig.com, http://norvig.com/mayzner.html.
all that spelling data: Just as organisms are merely vehicles for passing along their selfish genes, books are merely the vehicles for passing along orthographic statistics!
pigeon reading: Blough (1982).
A study in Science: Grainger et al. (2012).
“baboons can read!”: Goldberg (2012).
tournament Scrabble players: Fatsis (2001).
Scrabble skills do not carry: Tuffiash, Roring, & Ericsson (2007); Hargreaves et al. (2012).
“The Knowledge”: For a famous study of those taxi drivers, see Maguire et al. (2000).
Unlike Scrabblists: Maguire (2006).
A utility for generating Cmabrigde: “Scramble a Word,” 4umi, http://goo.gl/K8N6bI.
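The scrambling itself takes only a few lines. The sketch below is not the 4umi utility's code, which I have not seen; it is an assumed reconstruction of the rule as the circulated text states it: keep the first and last letter of each word and shuffle the letters in between. The function names and the seed parameter are hypothetical.

```python
import random
import re


def scramble_word(word, rng):
    """Shuffle the interior letters of a word, keeping the first and last."""
    if len(word) <= 3:
        return word  # nothing to shuffle in words of three letters or fewer
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, seed=None):
    """Scramble every run of letters in text; punctuation and spaces stay put."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)
```

Calling scramble_text("According to research at Cambridge University") returns a different scrambling on each run unless a seed is supplied. A shuffle can occasionally leave a short word unchanged.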
The only broad generalization: The amount of information needed to identify a given word isn’t fixed; more is needed in noisy contexts (ones in which the letters are obscured or hard to see) than in clear ones.
how well a book will sell: Ashok et al. (2013) took a stab at it.
president uses personal pronouns: Obama’s critics repeatedly cast aspersions on his supposedly frequent use of first-person pronouns (e.g., http://goo.gl/Qufkqf). They hadn’t actually counted, but Mark Liberman at Language Log did: http://goo.gl/804Qa8.
Sudoku: Seidenberg & MacDonald (1999). The mechanisms that I am describing informally have been rendered in more rigorous computational and quantitative terms. See, e.g., Piantadosi, Tily, & Gibson (2012), Chater, Tenenbaum, & Yuille (2006), and Flusberg & McClelland (2014).
Working back and forth: David Rumelhart described reading as an interactive process in a classic 1977 article. McClelland & Rumelhart (1981) then took the next step, implementing a computational model of interactive processes in word and letter recognition. I have informally redescribed a bit of this hugely influential work.
The answer could be wrong: The representation of words by combinations of semantic and phonological cues in writing systems is based on this principle.
Wheel of Fortune: “Wheel of Fortune: Amazing One-Letter Solve!,” video uploaded to YouTube by Wheel of Fortune, March 26, 2012, https://goo.gl/bjU1Hz.