May 17, 2011, 11 a.m.

How our bits shape us: James Gleick’s “The Information”

1. Circumscribing information, or words as things

God is an intelligible sphere, whose center is everywhere and circumference is nowhere.

— Alain de Lille, 12th century, glossing the Corpus Hermeticum

One of the more poignant sections of James Gleick’s The Information: A History, A Theory, A Flood concerns the history of attempts to identify, count, and define every word in a natural language. In other words, treating words as things — as information.

An early effort, Robert Cawdrey’s Table Alphabeticall of 1604 (only one copy of which has survived), defined 2,500 English words. The current edition of the OED lists about 60,000 words that were in use in 1600. Today, the highest-profile efforts to precisely count the number of English words are hoaxes, but suffice it to say, we probably passed a million English words a while ago. The Oxford English Dictionary’s first editor, James Murray, observed that “the circle of the English language has a well-defined centre but no discernible circumference.”

This was when lexicographers were overwhelmingly concerned only with high-legitimacy printed sources, and the task of drawing such a circumference was still thought to be possible. All that was needed was a sufficient number of index cards, metal files, wooden boxes, and amateur assistants: the first Wikipedians. As Gleick writes, “the task seemed formidable but finite.”

The addition of electronic and digital technology to print explodes that sense of finitude, but not just in the ways you might immediately think. Yes, a great deal more language is recorded, so emoticons can go shoulder-to-shoulder with Shakespeare. But the number of English speakers has jumped from five million in 1600 to a billion four centuries later. Every sphere and subsphere is connected. There is no circumference.

Like the printing press, the telegraph, and the telephone before it, the Internet is transforming the language simply by transmitting information differently. What makes cyberspace different from all previous information technologies is its intermixing of scales from the largest to the smallest without prejudice, broadcasting to the millions, narrowcasting to groups, instant messaging one to one.

2. Universal information, or computing power

The invention of writing probably allowed us to think of information as a discrete object, not limited to what was contained in human minds. The invention of computing allowed us to see information everywhere — in electrical waves, strands of DNA, and the firing of a brain’s neurons irrespective of a mind’s content.

Retrospectively, computing even helps us think about very old modes of communication — alphabetic writing, African drum language, the bonfire signals used in Aeschylus’s Agamemnon or Tolkien’s The Lord of the Rings — as information networks, however imperfect.

Along the way, people used a host of metaphors to try to understand the developments they were witnessing and helping to create. “Net-work” is one of these. Another, particularly persistent in the nineteenth century with the rise of the telephone and telegraph, was conceiving of the new form of telecommunications as a global nervous system. In the twentieth century, this stopped being a metaphor, when neurologists and cognitive scientists began treating nerves as electronic message-carriers. What was important wasn’t the fact that both telegraph switches and nerve signals were electrical, but that both transmitted information.

Gleick’s book tells both of these histories, of technical achievements and conceptual breakthroughs, from premodernity to the present. It also shows how every new achievement, every breakthrough, brought new opportunities and new anxieties.

“The advent of the printing press certainly created anxieties that have the identical flavor to those we feel now,” Gleick told me. “I think they must surely be worse at times of transition when new technologies come along like the printing press or the telegraph or our electronic world.

“I love making these connections and discovering how difficult the transitions were for people in earlier times,” Gleick says. “But not because I think that the message is ‘oh, nothing for us to worry about; we’ve been through this before.’ On the contrary, we are going through something truly extraordinary. And in many ways, it is worse than ever. Or if not worse than ever, more intense than ever. Maybe it’s more interesting than ever.”

3. Carrying the news

For instance, in the nineteenth century, commentators argued that the telegraph, rapidly expanding in both reach and popularity, would soon do away with newspapers:

Anticipated at every point by the lightning wings of the Telegraph, [newspapers] can only deal in local ‘items’ or abstract speculations. Their power to create sensations, even in election campaigns, will be greatly lessened — as the infallible Telegraph will contradict their falsehoods as fast as they can publish them.

Instead, as Gleick notes, “newspapers could not wait to put the technology to work. Editors found that any dispatch seemed more urgent and thrilling with the label ‘Communicated by Electric Telegraph.’” Several startup newspapers even took to naming themselves “the Telegraph”: it sounded fast and modern, and suggested that what newspapermen did, too, was “writing at a distance.”

Thanks to the telegraph, the Crimean War was the first major conflict experienced in nearly real time by an audience scattered across the globe. But first, fast reports, especially those bearing sensational stories, often had to be corrected later. News style was changing, too. Because telegraph operators charged by the word, reporters’ writing became terse, abrupt, factual, economical. Telegraph style became a signal of the writers’ modernity, to be enshrined in style guides like Strunk & White’s.

The telephone, too, changed how news was transmitted, from business to business and household to household. Eliminating running messengers arguably contributed as much to the possibility of skyscrapers as the invention of the elevator. One of Gleick’s best passages describes a brief telecommunication disaster that turned into a profound social transformation:

The first telephone operators were teenage boys, cheaply hired from the ranks of telegraph messengers, but exchanges everywhere discovered that boys were wild, given to clowning and practical jokes, and more likely to be found wrestling on the floor than sitting on stools to perform the exacting, repetitive work of a switchboard operator. A new source of cheap labor was available, and by 1881 virtually every telephone operator was a woman. In Cincinnati, for example, W.H. Eckert reported hiring sixty-six “young ladies” who were “very much superior” to boys: “They are steadier, do not drink beer, and are always on hand.” He hardly needed to add that the company could pay a woman as little or less than a teenage boy.

The same logic led to the hiring of young women as typists, secretaries, and “computers,” performing manual calculations that would later migrate to the digital mainframe and then the desktop.

4. The infinite library

When it was proclaimed that the Library contained all books, the first impression was one of extravagant happiness. All men felt themselves to be the masters of an intact and secret treasure. There was no personal or world problem whose eloquent solution did not exist in some hexagon. The universe was justified.

— Jorge Luis Borges, “The Library of Babel” (as quoted in The Information)

These histories of news, computation, and telecommunication bring us back to our present, where the fates of all three are now intertwined. Information travels faster than ever, our algorithmic processes are more powerful than ever, and there is simply more “news” (old and new) to sift through than ever before.

The Information is very much the work of a writer conscious of the unprecedented access to information on his desktop. Our historical forebears “complained about information overload even though by our standards, they were information-starved,” Gleick says. The last chapter of his book, “New News Every Day,” about our current plenitude, takes its title from Robert Burton’s The Anatomy of Melancholy, published in 1621:

I hear new news every day, and those ordinary rumours of war, plagues, fires, inundations, thefts, murders, massacres, meteors, comets, spectrums, prodigies, apparitions, of towns taken, cities besieged, in France, Germany, Turkey, Persia, Poland, &c. daily musters and preparations, and suchlike, which these tempestuous times afford, battles fought, so many men slain, monomachies, shipwrecks, piracies, and sea-fights, peace, leagues, stratagems, and fresh alarms. A vast confusion of vows, wishes, actions, edicts, petitions, lawsuits, pleas, laws, proclamations, complaints, grievances, are daily brought to our ears.

New books every day, pamphlets, currantoes, stories, whole catalogues of volumes of all sorts, new paradoxes, opinions, schisms, heresies, controversies in philosophy, religion, &c.

Our forebears in the age of print invented (and re-invented and elaborated) copyright. They had the dictionary. We have Google.

With the help of Google, Wikipedians, and others, we’ve realized many of those early utopian dreams of gathering and preserving the world’s information. Yet our “omniscience becomes a kind of curse,” says Gleick.

In a chapter late in the book, two scientists versed in both information theory and quantum physics (Charles Bennett and Rolf Landauer) set out to firmly establish the absolute energy costs of computation. With a perfect machine, only information erasure necessarily dissipates heat. That’s the asymptotic limit of computing’s future: perfect storage and perfect operations over everything, for free. Only deleting bits of data comes at a cost.

As Gleick notes, “forgetting takes work.” At the quantum level, information isn’t opt-in; it’s only opt-out.

This distinction isn’t only theoretical. An active member of the Authors Guild, Gleick helped negotiate the recently rejected Google Books settlement on behalf of authors. On his blog, Gleick wrote:

Many people, including some I greatly respect, are gleeful about the demise of the arduously worked out settlement of the lawsuits brought by the Authors Guild and book publishers against Google. Not me.

It certainly wasn’t perfect. It involved some messy compromises, as settlements tend to do. It couldn’t satisfy everyone… I fear that many people underestimate the difficulties that lie ahead.

I asked Gleick about the popular sentiment that the settlement was good for publishers but bad for authors. “I think that’s absolutely nuts,” he replied. “I just think there was a lot of garbage being promulgated. And some authors fell for it, but I hope most authors didn’t…. [The settlement] was good for authors.”

According to Gleick, it’s unlikely that the settlement can be revised to anyone’s satisfaction. The judge’s clear position that the database of in-copyright books should be opt-in, not opt-out, simply can’t be reconciled with the impracticality of Google verifying ownership of the millions of books it has already scanned. His hope is that the settlement can provide a framework for a national digital library through which authors can still be compensated for their work.

“We wrestled with it for three years,” he noted. “And we came up with this complicated thing. And of course, the problem was, it was Google [who scanned the books and created the database] — as opposed to the Library of Congress. But you have to give Google credit; they spent the hundreds of millions of dollars, and nobody else has been willing to do it. I agree with everybody; it would be better if the Library of Congress did it. And then you could set up the same kind of framework [for payments and author compensation].”

This is just a fraction of the problems we’re grappling with, for which The Information offers both perspective and paradoxes.

“We are all patrons of the Library of Babel now,” Gleick writes, “and we are the librarians, too. We veer from elation to dismay and back.”

James Gleick is an author, journalist, and New York Times Science alumnus who lives in Key West, Florida. In addition to The Information, his earlier books Chaos, on the mathematics of randomness, and Genius, his biography of physicist Richard Feynman, have been re-released as enhanced e-books by Open Road Integrated Media.

