NASIG 2013: Knowledge and Dignity in the Era of Big Data

CC BY 2.0 2013-06-10
“Big Data” by JD Hancock

Speaker: Siva Vaidhyanathan

Don’t try to write a book about fast moving subjects.

He was trying to capture the nature of our relationship to Google. It provides us with services that are easy to use, fairly dependable, and well designed. However, that level of success can breed hubris. He was interested in how this drives the company to its audacious goals.

It strikes him that what Google claims to be doing is what librarians have been doing for hundreds of years already. He found himself turning to the core practices of librarians as a guideline for assessing Google.

Why is Google interested in so much stuff? What is the payoff to organizing the world’s information and making it accessible?

Big data is not a phrase that they use much, but the notion is there. More and faster equals better. Google is in the prediction/advertising business. The Google Books project is their attempt to reverse engineer the sentence. Knowing how sentences work, they can simulate how to interpret and create sentences, which would be a simulation of artificial intelligence.

The NSA’s deals that give them a backdoor to our data services create data insecurity, because if they can get in, so can the bad guys. Google keeps data about us (and has to turn it over when asked) because it benefits their business model, unlike libraries, which avoid keeping patron records in order to protect patrons’ privacy.

Big data means more than a lot of data. It means that we have so many instruments to gather data: cheap/ubiquitous cameras and microphones, GPS devices that we carry with us, credit card records, and more. All of these ways of creating data feed into huge servers that can store it, alongside powerful algorithms that can analyze it. Despite all of this, there is no policy surrounding it, nor are there conversations about the best ways to manage it in light of the impact on personal privacy. There is no incentive to curb big data activities.

Scientists are generally trained to understand that correlation is not causation. We seem to be happy enough to draw pictures with correlation and move on to the next one. With big data, it is far too easy to stop at correlation. This is a potentially dangerous way of understanding human phenomena. We are autonomous people.

The panopticon was supposed to keep prisoners from misbehaving because they assumed they were always being watched. Foucault described the modern state in the 1970s as the panopticon. However, at this point, it doesn’t quite match. We have a cryptopticon, because we aren’t allowed to know when we are being watched. It wants us to be on our worst behavior. How can we inject transparency and objectivism into this cryptopticon?

Those who can manipulate the system will, but those who don’t know how or that it is happening will be negatively impacted. If bad credit can get you on the no-fly list, what else may be happening to people who make poor choices in one aspect of their lives that they don’t know will impact other aspects? There is no longer anonymity in our stupidity. Everything we do, or nearly so, is online. Mistakes of teenagers will have an impact on their adult lives in ways we’ve never experienced before. Our inability to forget renders us incapable of looking at things in context.

Mo Data, Mo Problems

NASIG 2013: Libraries and Mobile Technologies in the Age of the Visible College

“This morning’s audience, seen from the lectern.” by Bryan Alexander

Speaker: Bryan Alexander

NITLE does a lot of research for undergraduate liberal arts schools. One of the things that he does is publish a monthly newsletter covering trends in higher education, which may be worth paying some attention to (Future Trends). He is not a librarian, but he is a library fanboy.

What is mobile computing doing to the world, and what will it do in the future?

Things have changed rapidly in recent years. We’ve gone from needing telephone rooms at hotels to having phones in every pocket. The icon for computing has gone from desktop to laptop to anything/nothing — computing is all around us in many forms now. The PC is still a useful tool, but there are now so many other devices to do so many other things.

Smartphones are everywhere now, in many forms. We use them for content delivery and capture, and to interact with others through social tools. Over half of Americans now have a smartphone, with fewer than 10% remaining who have no cell phone, according to Pew. The mobile phone is now the primary communication device for the world. Think about this when you are developing publishing platforms.

The success of the Kindle laid the groundwork for the iPad. Netbooks/laptops now range in size and function.

Clickers are used extensively in the classroom, with great success. They can be used for feedback as well as prompting discussion. Classrooms are slowly shifting to using phones instead of separate devices.

Smartpens capture written content digitally as you write it, and you can record audio at the same time. One professor annotates notes on scripts while his students perform, and then provides them with the audio.

Marker-based augmented reality fumbled for a while in the US, but is starting to pick up in popularity. Now that more people have smartphones, QR codes are more prevalent.

The mouse and keyboard have been around since the 1960s, and they are being dramatically impacted by recent changes in technology: touch screens (e.g., iPad), handhelds (e.g., Wii), and nothing at all (e.g., Kinect).

If the federal government is using it, it is no longer bleeding edge. Ebooks have been around for a long time, in all sorts of formats. Some of the advantages of ebooks include ease of correcting errors, flexible presentation (e.g., font size), and a faster publication cycle. Some disadvantages include DRM, cost, and distribution by libraries.

Gaming has had a huge impact in the past few years. The median age of gamers is 35 or so. The industry size is comparable to music, and has impacts on hardware, software, interfaces, and other industries. There is a large and growing diversity of platforms, topics, genres, niches, and players.

Mobile devices let us make more microcontent (photo, video clip, text file), which leads to the problem of archiving all this stuff. These devices allow us to cover the world with a secondary layer of information. We love connecting with people, and rather than separating us, technology has allowed us to do that even more (except when we focus on our devices more than the people in front of us).

We’re now in a world of information on demand, although it’s not universal. Coverage is spreading, and the gaps are getting smaller.

When it comes to technology, Americans are either utopian or dystopian in our reactions. We’re not living in a middle ground very often. There are some things we don’t understand about our devices, such as multitasking and how that impacts our brain. There is also a generational divide, with our children being more immersed in technology than we are, and having different norms about using devices in social and professional settings.

The ARIS engine allows academics to build games with learning outcomes.

Augmented reality takes data and pins it down to the real world. It’s the inverse of virtual reality. Libraries are going to be the AR engine of the future. Some examples of AR include museum tours, GPS navigators, and location services (Yelp, Foursquare). Beyond that, there are applications that provide data overlaying images of what you point your phone at, such as real estate information and annotations. Google Goggles tries to provide information about objects based on images taken by a mobile device. You could have a virtual art gallery physically tied to a spot, but only displayed when viewed with an app on your phone.

Imagine what the world will be like transformed by the technology he’s been talking about.

I. Phantom Learning: Schools are rare and less needed. The number of people physically enrolled in schools has gone down. Learning on demand is now the thing. Institutions exist to supplement content (adjuncts), and libraries are the media production sites. Students are used to online classes, and un-augmented locations are weird.

II. Open World: Open content is the norm and is very web-centric. Global conversations increase, with more access and more creativity. Print publishers are nearly gone, authorship is mysterious, tons of malware, and privacy is fictitious. The internet has always been open and has never been about money. Identities have always been fictional.

III. Silo World: Most information is experienced in vertical stacks. Open content is almost like public access TV. Intellectual property intensifies, and campuses reorganize around the silos. Students identify with brands and think of “open” as radical and old-fashioned.

ER&L 2013: E-Resources, E-Realities

“Tools” by Josep Ma. Rosell

Speakers: Jennifer Bazeley (Miami University) & Nancy Beals (Wayne State University)

Despite all the research on what we need and want, no one is building commercial products that meet all our needs and address the impediments of cost and dwindling staff.

Beals says that the ERM is not used for workflow, so they needed other tools, with a priority on project management and Excel proficiency. They use an internal listserv, UKSG Transfer, Trello (project management software), and a blog, to keep track of changes in eresources.

Other tools for professional productivity and collaboration: iPads with Remember the Milk or Evernote, Google spreadsheets (project portfolio management organization-wide), and LibGuides.

Bazeley stepped into the role of organizing eresources information in 2009, with no existing tool or hub, which gave her room to experiment. For documentation, they use PBWiki (good for version tracking, particularly to correct errors) with an embedded departmental Google calendar. For communication, they use LibGuides for internal documents, where you can embed RSS, Google Docs, Yahoo Pipes aggregating RSS feeds, Google forms for eresource access issues, links to Google spreadsheets with usage data, etc. For login information, they use KeePass Password Safe. Rather than claiming in the ILS, they’ve moved to using the claim checker tool from the subscription agent.

Tools covered:

  • Google Calendar
  • Google Docs (includes forms & spreadsheets)
  • PBWiki
  • LibGuides
  • Yahoo Pipes
  • WordPress
  • KeePass Password Safe
  • PDF Creator
  • EBSCOnet

Others listed:

  • Blogger (blog software)
  • Mendeley (ref manager)
  • Vimeo (videos)
  • Jing (screenshot/screencast)
  • GIMP (image editor)
  • MediaWiki (Wiki software)
  • LastPass (password manager)
  • OpenOffice (software suite)
  • PDF Creator (PDF manipulation)
  • Slideshare (presentation manager)
  • Filezilla (ftp software)
  • Zoho Creator (database software)
  • Dropbox (cloud storage)
  • Github (software management)
  • Subscription agent software (SwetsWise, EBSCOnet)
  • Microsoft Excel / Access
  • Course Management Software (Moodle, Sakai, Blackboard)
  • Open Source ERMS: ERMes (University of Wisconsin-La Crosse) & CORAL (University of Notre Dame)

Moving Up to the Cloud, a panel lecture hosted by the VCU Libraries

“Sky symphony” by Kevin Dooley

“Educational Utility Computing: Perspectives on .edu and the Cloud”
Mark Ryland, Chief Solutions Architect at Amazon Web Services

AWS has been a part of revolutionizing start-up industries (e.g., Instagram, Pinterest) because they don’t have the cost of building server infrastructure in-house. Cloud computing in the AWS sense is utility computing: pay for what you use, easy to scale up and down, and local control of how your products work. In the traditional world, you have to pay for the capacity to meet your peak demand, but in the cloud computing world, you can scale up and down based on what is needed at that moment.
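To make the peak-versus-usage contrast concrete, here is a minimal sketch in Python. The demand figures, the price, and the function names are all made up for illustration and do not come from the talk or from AWS pricing. It also assumes the same price per server-hour in both models, which may not hold in practice.

```python
def fixed_capacity_cost(hourly_demand, price_per_unit_hour):
    """Cost when capacity is sized for the peak hour and paid for every hour."""
    peak = max(hourly_demand)
    return peak * len(hourly_demand) * price_per_unit_hour


def utility_cost(hourly_demand, price_per_unit_hour):
    """Cost when each hour is billed only for the units actually used."""
    return sum(hourly_demand) * price_per_unit_hour


# Hypothetical demand: servers needed in each of six hours, with one spike.
demand = [2, 2, 3, 10, 3, 2]
price = 1.0  # hypothetical price per server-hour (assumed equal in both models)

print(fixed_capacity_cost(demand, price))  # 10 servers * 6 hours = 60.0
print(utility_cost(demand, price))         # 22 server-hours used = 22.0
```

The spikier the demand, the wider the gap between the two numbers. With perfectly flat demand the two models cost the same under these assumptions.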

Economies and efficiencies of scale come in many ways. Some are obvious: storage, computing, and networking equipment supply chain; internet connectivity and electric power; and data center siting, redundancy, etc. Less obvious: security and compliance best practices; datacenter internal innovations in networking, power, etc.

AWS and .EDU: EdX, Coursera, Texas Digital Library, Berkeley AMP Lab, Harvard Medical, University of Phoenix, and an increasing number of university/school public-facing websites.

He expects that we are heading toward cloud computing utilities that function much like the electric grid: just plug in and use it.

“Libraries in Transition”
Marshall Breeding, library systems expert

We’ve already seen the shift of print to electronic in academic journals, and we’re heading that way with books. Our users are changing in the way they expect interactions with libraries to be, and the library as space is evolving to meet that, along with library systems.

Web-based computing is better than client/server computing. We expect social computing to be integrated into the core infrastructure of a service, rather than add-ons and afterthoughts. Systems need to be flexible for all kinds of devices, not just particular types of desktops. Metadata needs to evolve from record-by-record creation to bulk management wherever possible. MARC is going to die, and die soon.

How are we going to help our researchers manage data? We need the infrastructure to help us with that as well. Semantic web — what systems will support it?

Cooperation and consolidation of library consortia; state-wide implementations of SaaS library systems. Our current legacy ILS are holding libraries back from being able to move forward and provide the services our users want and need.

A true cloud computing system comes with web-based interfaces, externally hosted, subscription OR utility pricing, highly abstracted computing model, provisioned on demand, scaled according to variable needs, elastic.

“Moving Up to the Cloud”
Mark Triest, President of Ex Libris North America

Currently, libraries are working with several different systems (ILS, ERMS, DRs, etc.), duplicating data and workflows, and not always very accurately or efficiently, but it was the only solution for handling different kinds of data and needs. Ex Libris started in 2007 to change this, beginning with conversations with librarians. Their solution is a single system with unified data and workflows.

They are working to lower the total cost of ownership by reducing IT needs, minimizing administration time, and adding new services to increase productivity. Right now there are 120+ institutions world-wide that are in the process of going live or have gone live with Alma.

Automated workflows allow staff to focus on the exceptions and reduce the steps involved.

Descriptive analytics are built into the system, with plans for predictive analytics to be incorporated in the future.

Future: collaborative collection development tools, like joint licensing and consortial ebook programs; infrastructure for ad-hoc collaboration

“Cloud Computing and Academic Libraries: Promise and Risk”
John Ulmschneider, Dean of Libraries at VCU

When they first looked at Alma, they had two motivations and two concerns. They were not planning or thinking about it until they were approached to join the early adopters. All academic libraries today are seeking to discover and exploit new efficiencies. The growth of cloud-resident systems and data requires academic libraries to reinvigorate their focus on core mission. Cloud-resident systems are creating massive change throughout our institutions. Managing and exploiting pervasive change is a serious challenge. Also, we need to deal with security and durability of data.

Cloud solutions shift resources from supporting infrastructure to supporting innovation.

Efficiencies are not just nice things, they are absolutely necessary for academic libraries. We are obligated to upend long-held practice, if in doing so we gain assets for practice essential to our mission. We must focus recovered assets on the core library mission.

Agility is the new stability.

Libraries must push technology forward in areas that advance their core mission. Infuse technology evolution for libraries with the values and needs of libraries. Libraries must invest assets as developers, development partners, and early adopters. Insist on discovery and management tools that are agnostic regarding data sources.

Managing the change process is daunting, but we’re already well down the road. It’s not entirely new, but it does involve a change in culture to create a pervasive institutional agility for all staff.