Information technology, gadgets, social media, libraries, design, marketing, higher ed, data visualization, educational technology, mobility, innovation, strategy, trends and futures. . . 

Posts suspended for a bit while I settle into a new job. . . 

Entries in Data (3)


Big Data: Hospital

(See also the Data tag.)

This is a good description of how one major NYC healthcare institution would use "big data" to understand patients' health status. 

In The Hospital Of The Future, Big Data Is One Of Your Doctors

From our genomes to Jawbones, the amount of data about health is exploding. Bringing on top Silicon Valley talent, one NYC hospital is preparing for a future where it can analyze and predict its patients' health needs--and maybe change our understanding of disease.

The office of Jeff Hammerbacher at Mount Sinai's Icahn School of Medicine sits in the middle of one of the most stark economic divides in the nation. To Hammerbacher’s south are New York City’s posh Upper East Side townhouses. To the north, the barrios of East Harlem.

What's below is most interesting: Minerva, a humming supercomputer installed last year that's named after the Roman goddess of wisdom and medicine.

It’s rare to find a supercomputer in a hospital, even a major research center and medical school like Mount Sinai. But it’s also rare to find people like Hammerbacher, a sort of human supercomputer who is best known for launching Facebook’s data science team and, later, co-founding Cloudera, a top Silicon Valley “big data” software company where he is chief scientist today. After moving to New York this year to dive into a new role as a researcher at Sinai’s medical school, he is setting up a second powerful computing cluster based on Cloudera’s software (it’s called Demeter) and building tools to better store, process, mine, and build data models. “They generate a pretty good amount of data,” he says of the hospital’s existing electronic medical record system and its data warehouse that stored 300 million new “events” last year. “But I would say they are only scratching the surface.”

Combined, the circumstances make for one of the most interesting experiments happening in hospitals right now--one that gives a peek into the future of health care in a world where the amount of data about our own health, from our genomes to our Jawbone tracking devices, is exploding.

“What we’re trying to build is a learning health care system,” says Joel Dudley, director of biomedical informatics for the medical school. “We first need to collect the data on a large population of people and connect that to outcomes.”

(Image caption: Could there actually be three types of Type 2 diabetes? A look at the health data of 30,000 volunteers hints that we know less than we realize. Credit: Li Li, Mount Sinai Icahn School of Medicine, and Ayasdi)

To imagine what the hospital of the future could look like at Mount Sinai, picture how companies like Netflix and Amazon and even Facebook work today. These companies gather data about their users, and then run that data through predictive models and recommendation systems they’ve developed--usually taking into account a person’s past history, maybe his or her history in other places on the web, and the history of “similar” users--to make a best guess about the future--to suggest what a person wants to buy or see, or what advertisement might entice them.

Through real-time data mining on a large scale--on massive computers like Minerva--hospitals could eventually operate in similar ways, both to improve health outcomes for individual patients who enter Mount Sinai’s doors as well as to make new discoveries about how to diagnose, treat, and prevent diseases at a broader, public health scale. “It’s almost like the Hadron Collider approach,” Dudley says. “Let’s throw in everything we think we know about biology and let’s just look at the raw measurements of how these things are moving within a large population. Eventually the data will tell us how biology is wired up.”
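The "similar users" recommendation idea described above can be sketched in a few lines of Python. This is strictly a toy illustration (invented users, items, and ratings -- not anything Mount Sinai or the companies named actually run): score items a target user hasn't seen by the weighted ratings of that user's nearest neighbors.

```python
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    common = set(u) & set(v)
    num = sum(u[k] * v[k] for k in common)
    du = sum(x * x for x in u.values()) ** 0.5
    dv = sum(x * x for x in v.values()) ** 0.5
    return num / (du * dv) if du and dv else 0.0

def recommend(target, history, k=2):
    """Rank items the target hasn't seen, weighted by the k most similar users."""
    neighbors = sorted(
        ((cosine(history[target], h), u) for u, h in history.items() if u != target),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, u in neighbors:
        if sim <= 0:
            continue  # skip users with no overlapping items
        for item, rating in history[u].items():
            if item not in history[target]:
                scores[item] += sim * rating
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: user -> {item: rating}. All names are made up.
history = {
    "ann": {"aspirin_info": 5, "diet_tips": 4},
    "bob": {"aspirin_info": 5, "diet_tips": 5, "glucose_monitor": 4},
    "eve": {"yoga": 5, "running": 4},
}
print(recommend("ann", history))  # bob is ann's nearest neighbor -> ['glucose_monitor']
```

The same "find similar records, then extrapolate" pattern is what a hospital would scale up across patients, labs, and genomes -- only the similarity function and the data get vastly more complex.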

Full article at link. 


Mining Web for Drug Adverse Effects

The New York Times is among media reporting on a study by White et al in JAMIA (Journal of the American Medical Informatics Association) -- 

J Am Med Inform Assoc doi:10.1136/amiajnl-2012-001482 

Brief communication 

Web-scale pharmacovigilance: listening to signals from the crowd 

Ryen W White (1), Nicholas P Tatonetti (2), Nigam H Shah (3), Russ B Altman (4), Eric Horvitz (1)

Author affiliations:
(1) Microsoft Research, Redmond, Washington, USA
(2) Department of Biomedical Informatics, Columbia University, New York, New York, USA
(3) Department of Medicine, Stanford University, Stanford, California, USA
(4) Departments of Bioengineering and Genetics, Stanford University, Stanford, California, USA

Correspondence to Dr Ryen W White, Microsoft Research, Redmond, WA 98052, USA; 

Received 9 November 2012
Revised 8 January 2013
Accepted 13 January 2013
Published Online First 6 March 2013 

Abstract (emphasis added) 

Adverse drug events cause substantial morbidity and mortality and are often discovered after a drug comes to market. We hypothesized that Internet users may provide early clues about adverse drug events via their online information-seeking. We conducted a large-scale study of Web search log data gathered during 2010. We pay particular attention to the specific drug pairing of paroxetine and pravastatin, whose interaction was reported to cause hyperglycemia after the time period of the online logs used in the analysis. We also examine sets of drug pairs known to be associated with hyperglycemia and those not associated with hyperglycemia. We find that anonymized signals on drug interactions can be mined from search logs. Compared to analyses of other sources such as electronic health records (EHR), logs are inexpensive to collect and mine. The results demonstrate that logs of the search activities of populations of computer users can contribute to drug safety surveillance.
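The core intuition -- among users who searched for both drugs, how often does the symptom also show up, compared with everyone else? -- can be sketched as follows. This is a loose toy model, not the authors' actual method (the paper uses careful temporal ordering and baseline controls on real, anonymized logs); the data and function names here are invented.

```python
def search_signal(logs, drug_pair, symptom):
    """Compare the symptom search rate among users who searched for both
    drugs ("exposed") against the rate among all other users ("background").

    logs: {user_id: set of query terms observed during the study window}
    Returns (exposed_rate, background_rate).
    """
    a, b = drug_pair
    exposed = [q for q in logs.values() if a in q and b in q]
    background = [q for q in logs.values() if not (a in q and b in q)]

    def rate(group):
        return sum(symptom in q for q in group) / len(group) if group else 0.0

    return rate(exposed), rate(background)

# Invented toy logs -- five "users" and their query terms.
logs = {
    1: {"paroxetine", "pravastatin", "hyperglycemia"},
    2: {"paroxetine", "pravastatin"},
    3: {"paroxetine"},
    4: {"pravastatin", "headache"},
    5: {"aspirin"},
}
exposed, background = search_signal(logs, ("paroxetine", "pravastatin"), "hyperglycemia")
print(exposed, background)  # 0.5 0.0
```

A large gap between the exposed and background rates is the kind of disproportionality signal that, at Web scale and with proper statistical controls, can flag a drug-pair interaction for follow-up.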

Full JAMIA article available via your local library

The New York Times story -- 

Unreported Side Effects of Drugs Are Found Using Internet Search Data, Study Finds


Published: March 6, 2013 

Using data drawn from queries entered into Google, Microsoft and Yahoo search engines, scientists at Microsoft, Stanford and Columbia University have for the first time been able to detect evidence of unreported prescription drug side effects before they were found by the Food and Drug Administration’s warning system.

Using automated software tools to examine queries by six million Internet users taken from Web search logs in 2010, the researchers looked for searches relating to an antidepressant, paroxetine, and a cholesterol lowering drug, pravastatin. They were able to find evidence that the combination of the two drugs caused high blood sugar.

The study, which was reported in the Journal of the American Medical Informatics Association on Wednesday, is based on data-mining techniques similar to those employed by services like Google Flu Trends, which has been used to give early warning of the prevalence of the sickness to the public.

The F.D.A. asks physicians to report side effects through a system known as the Adverse Event Reporting System. But its scope is limited by the fact that data is generated only when a physician notices something and reports it.

Full article continues at link




Cheap Big Data

(Bits, The New York Times's technology blog -- "The Business of Technology" -- is good to follow.) 

"Big data," according to the redoubtable Wikipedia, is

. . . a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, analysis, and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions." 

. . . Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks. 

This article in The New York Times "Bits" technology column/blog lays out the prospect of big-data capability moving beyond the exclusive domain of well-resourced enterprises (companies, universities).

Big Data Done Cheap 


Some new products impress for what they say about the future. Win or lose, they show where the world is going with near certainty. 

In this case, the product is Big Data computing at near consumer prices.

Violin Memory is an eight-year-old company that makes large-scale data storage systems for computer centers. Its boxes fetch information uncommonly fast. Now, the company is going downmarket, with data storage for individual computer servers. These data cards create powerful machines that can do sophisticated work, at less than one-tenth the current costs of storage. 

If the product works, ordinary servers costing a few thousand dollars might be deployed for sophisticated data analysis, genetic research, logistics management, or other activities that are currently done on multimillion-dollar racks of computers. It could make possible much cheaper real-time computing projects at companies and schools, bringing in more customers and experimentation.

Article continues at link. 

This is just the hardware -- also needed are accessible data sets themselves and the software tools to extract, transform, load, manage, and analyze them. . .
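For a sense of scale, that extract/transform/load/analyze loop can be sketched with nothing beyond the Python standard library. The data below is made up, and the 126 mg/dL cutoff is simply the standard fasting-glucose threshold, used here only for illustration:

```python
import csv
import io
import statistics

# Extract: read raw CSV (here from a string; in practice a file, database, or API).
raw = "patient_id,glucose_mg_dl\np1,95\np2,180\np3,110\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: coerce types and flag out-of-range readings.
for row in rows:
    row["glucose_mg_dl"] = int(row["glucose_mg_dl"])
    row["elevated"] = row["glucose_mg_dl"] >= 126  # illustrative fasting threshold

# Load: a real pipeline would write to a data warehouse; a list stands in here.
warehouse = rows

# Analyze: summary statistics over the loaded data.
mean_glucose = statistics.mean(r["glucose_mg_dl"] for r in warehouse)
n_elevated = sum(r["elevated"] for r in warehouse)
print(round(mean_glucose, 1), n_elevated)
```

Cheap hardware makes steps like these feasible at millions of rows; the scarce ingredients remain the data sets and the tooling around them.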