NASIG 2013: Knowledge and Dignity in the Era of Big Data

CC BY 2.0 2013-06-10
“Big Data” by JD Hancock

Speaker: Siva Vaidhyanathan

Don’t try to write a book about fast-moving subjects.

He was trying to capture the nature of our relationship to Google. It provides us with services that are easy to use, fairly dependable, and well designed. However, that level of success can breed hubris. He was interested in how this drives the company toward its audacious goals.

It strikes him that what Google claims to be doing is what librarians have been doing for hundreds of years already. He found himself turning to the core practices of librarians as a guideline for assessing Google.

Why is Google interested in so much stuff? What is the payoff to organizing the world’s information and making it accessible?

Big data is not a phrase that they use much, but the notion is there: more and faster equals better. Google is in the prediction/advertising business. The Google Books project is their attempt to reverse-engineer the sentence. Once they know how sentences work, they can simulate interpreting and creating sentences, which would be a simulation of artificial intelligence.

The NSA’s deals that give it a backdoor into our data services create data insecurity, because if the NSA can get in, so can the bad guys. Google keeps data about us (and has to turn it over when asked) because doing so benefits its business model, unlike libraries, which don’t keep patron records in order to protect patrons’ privacy.

Big data means more than a lot of data. It means that we have so many instruments for gathering data: cheap, ubiquitous cameras and microphones, the GPS devices we carry with us, credit card records, and more. All of these data sources feed into huge servers that can store the data and run powerful algorithms to analyze it. Despite all of this, there is no policy governing it, nor any conversation about how best to manage it in light of its impact on personal privacy. There is no incentive to curb big data activities.

Scientists are generally trained to understand that correlation is not causation. Yet we seem happy enough to draw pictures from correlations and move on to the next one. With big data, it is far too easy to stop at correlation. This is a potentially dangerous way of understanding human phenomena. We are autonomous people.
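The risk of stopping at correlation can be illustrated with a small simulation. This is a hedged sketch of a general statistical point, not something presented in the talk: it generates one random "target" series and many unrelated random "feature" series, then reports the strongest correlation it finds. The sample size, number of features, and seed are arbitrary assumptions chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def strongest_chance_correlation(n_samples=20, n_features=1000, seed=1):
    """Correlate many independent random features with a random target.

    Every series is pure noise, so any correlation found is coincidence."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        r = pearson(feature, target)
        if abs(r) > abs(best):
            best = r
    return best

r = strongest_chance_correlation()
print(f"strongest correlation among unrelated variables: {r:.2f}")
```

With many variables and few observations, some pair of completely unrelated series will look strongly correlated, which is exactly why a correlation on its own says nothing about cause.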

The panopticon was supposed to keep prisoners from misbehaving because they assumed they were always being watched. In the 1970s, Foucault described the modern state as a panopticon. However, that model no longer quite fits. We have a cryptopticon, because we aren’t allowed to know when we are being watched. It wants us to be on our worst behavior. How can we inject transparency and objectivism into this cryptopticon?

Those who can manipulate the system will, but those who don’t know how or that it is happening will be negatively impacted. If bad credit can get you on the no-fly list, what else may be happening to people who make poor choices in one aspect of their lives that they don’t know will impact other aspects? There is no longer anonymity in our stupidity. Everything we do, or nearly so, is online. Mistakes of teenagers will have an impact on their adult lives in ways we’ve never experienced before. Our inability to forget renders us incapable of looking at things in context.

Mo Data, Mo Problems

ER&L 2010: Opening Keynote – Librarians in the Wild: Thinking About Security, Privacy, and Digital Information

Speaker: Lance Hayden, Assistant Instructor, School of Information – University of Texas

He spent six years with the CIA and then attended the UT iSchool, after which he worked with Cisco Systems on computer security issues. The team he works with does “ethical hacking”: companies hire them to break into their systems and find the holes that need to be filled so that the real bad guys can’t get in.

Many of us are not scared enough. We do things online that we wouldn’t do in the real world. We should be more aware of our digital surroundings and security.

In computer security, “the wild” refers to things that happen in the real world (as opposed to the lab). In cyberspace, the wild and civilization are not separate – they are co-located. Civilization is confidentiality, integrity, and availability. We think that our online communities are entirely civilized, but we are too trusting.

The point is, if you’re not careful about keeping your virtual houses secure, then you’re leaving yourself open to anyone coming in through the windows or the basement door you never lock.

Large herds attract big predators. As more people are connected to a network or virtual house, the motivation to attack it goes up. Part of why Macs seem more secure than Windows machines is that attacking Windows offers a higher ROI, because it has far more users. Hacking has gone from kids leaving graffiti to organized crime exploiting users.

Structures decay quickly. The online houses we have built are made of software that lives on real-world machines. Every day, people find vulnerabilities they can exploit. Sometimes they tell the manufacturers/vendors, sometimes they don’t. We keep adding more things to the infrastructure, which increases the possibility of exposing more. The software and systems that we use are not monolithic entities; they are constructed from millions of lines of code. Trying to find the mistake in a line of code is like trying to find a misplaced semicolon in War and Peace. It’s more complex than “XYZ program has a problem.”

Protective spells can backfire. Your protective programs and security systems need to be kept up to date, or they can backfire. Make sure that your magic is tight. Online shopping isn’t any less safe than shopping in person, because the vulnerabilities have more to do with what the vendor keeps in its system (which can be hacked) than with the connection. Your physical vendor has the same information, often on computer systems that can be hacked.

Knowledge is the best survival trait (or, ignorance can get you eaten). Passwords have been the bane of security professionals since the invention of the computer. When every single person in an institution has a password that is a variation on a template, it’s easy to hack. [side note: The Help Desk manager at MPOW recommends using a personalized template and just increasing the number at the end every time they have the required password change. D’oh!] The nature of passwords is that you can’t pick one that is completely secure. What you’re trying to do is to have a password secure enough to dissuade all but the most persistent attackers. Hayden suggests using phrases, replacing some characters with numbers, and making the password longer, because length increases the number of possible combinations an attacker has to try.
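The point about length, and about the weakness of template-plus-counter passwords, can be made concrete by counting guesses. This is a rough sketch under stated assumptions, not Hayden’s method: it treats every character as chosen at random, which overstates the strength of real phrases, and the three example cases (including the hypothetical "template plus two-digit counter") are illustrative values rather than figures from the talk.

```python
import math

def keyspace(alphabet_size: int, length: int) -> int:
    """Upper bound on guesses for a password of `length` characters,
    assuming each character is drawn at random from `alphabet_size` symbols."""
    return alphabet_size ** length

def entropy_bits(guesses: int) -> float:
    """Express a number of equally likely guesses as bits of entropy."""
    return math.log2(guesses)

# Illustrative cases (hypothetical values, not figures from the talk):
cases = {
    # 8 random characters from a-z, A-Z, 0-9
    "8 random alphanumerics": keyspace(62, 8),
    # 20 characters from a-z plus 0-9; an upper bound, since real
    # phrases are far less random than this assumes
    "20-char lowercase+digit phrase": keyspace(36, 20),
    # A known template plus a two-digit counter: an attacker who
    # knows the template only has to try the 100 possible counters
    "template + 2-digit counter": 100,
}

for label, guesses in cases.items():
    print(f"{label}: {guesses:.3e} guesses, about {entropy_bits(guesses):.1f} bits")
```

Each added character multiplies the search space by the alphabet size, which is why a longer phrase outpaces a short, complex password, and why a guessable template collapses the search to a handful of tries no matter how many characters it has.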

Zuckerberg says that people don’t care about privacy anymore, so don’t blame Facebook, but to a certain extent, Facebook is responsible for changing those norms. Do companies like Google have any responsibility to protect your information? Hayden’s students think that because Google gives them things for free, they don’t care about the privacy of their information and in fact expect that Google will use it for whatever they want.

IL2009: Collaboration in the Clouds

Presenter: Tom Ipri

How will cloud computing impact the library as a space? Will we be able to provide the infrastructure to support collaborative computing within our buildings or resource networks?

Virtual computing labs allow students to access their software, settings, and files from any computer on campus. However, there are concerns about reliability, privacy, and the security of data. If you are sending your students to services outside of the university, what impacts are there on the policies of the university?

Who needs libraries when everything is in the cloud? The library can fully become both a warehouse and a gathering place.

productivity tools from MS and Google

I’ve been enjoying all the benefits of having Outlook as my primary work email and calendar tool for the past year. After three years of dealing with the disaster that is Novell GroupWise, it’s lovely to finally have a tool that does what I need it to do.

However, for my personal stuff, I am a big fan of Gmail and other Google tools, so I was a little sad to give up my Google Calendar, among other things. All that has changed in the past week now that I’ve discovered the GCal/Outlook syncing program. Right now, I have it only going from Outlook to GCal, but that may change in the future.

Another awesome tool that I’ve implemented this week is the Google Calendar gadget available from Google Labs. This puts my upcoming appointments in the sidebar of Gmail, and pings little reminders if I have Gmail open. Outlook takes care of this at work, and I’m loving having this functionality in my non-work hours without having to maintain two separate calendars.

The other Google Labs gadget that has made a surprising impact on my productivity and organization is the Remember The Milk task management tool. I’ve been using RTM to keep track of my review assignments, but I hadn’t found a need for it for other to-do items, since I use the Tasks feature of Outlook for work stuff. However, when the emails I sent to my personal address about things I needed to remember to do began to pile up and clutter my inbox, I decided it was time to implement a real to-do list. With the Gmail integration, it’s now all in one place, just like my work stuff in Outlook.

gmail coolness

I have recently switched my personal emailing entirely over to my Gmail account. In the past year, I’ve been using it for Where’s George hit notifications, Geocaching.com messages, and new BookCrossing messages and journal entry notifications. I continued to use my SpamCop webmail account for other personal emailing. However, when it came time to renew my account ($30/yr), I decided that it was time to move on. I’ve found that changing my email address every few years keeps the spam down. Even with the excellent spam filters, I was getting 10-15 spam messages a day sent to my SpamCop account, some of which were not filtered to the Held Mail folder. In the past 15 days that I’ve been using Gmail exclusively for all of my non-work emailing, I’ve been very happy with it. It’s managing threads of conversations much better than any email system I’ve used in the past. And, since it’s a relatively new account, I have gotten maybe fifteen spam messages in the past year. Not bad.

This past year, I received permission to set up a book exchange bookshelf in the group study area in the library. It’s not exactly an OBCZ, but it functions as such. I set up a separate account on BookCrossing and started registering books left there using that account rather than my regular one. I had been using my work email for that account, but I felt a bit uncomfortable about it. Also, I suspected that sometimes private messages and journal entry notifications were not getting through the campus email filters. I thought about setting up a Gmail account for that, but the idea of having to check yet another email account did not appeal to me. Then I realized I could just have everything forwarded from the library BookCrossing email account to my regular Gmail account. Brilliant! In no time I had the second account set up and forwarding messages. Thank you, Google!

nasig part three

The Friday vision session was given by Marshall Keys. He spoke about the chaotic transitions brought on by technology. He said that the “future of libraries depends on their ability to meet the emerging needs of users” and that we need to first understand what those needs are. None of us know what tools we will be using in libraries in the future, but we should keep aware of trends and try to anticipate them.

Keys talked about the “blog mentality” of the younger generation of library users:

  • What I think is important
  • What I think is important to other people
  • Something is important because I think it is important (“Whatever” corollary: If I don’t think it is important… whatever.)
  • Privacy is unimportant
  • Community is important

The last two aspects of the “blog mentality” are particularly relevant to library technology. Emerging users want community, personalization, and portable technology, and they are willing to have it all at the expense of their privacy. For example, they want to know what their peers are interested in, and they can get that kind of information from places like Amazon, Netflix, and Friendster, but not from the library catalog.

Another point on technology that Keys made about our emerging users is that the phone is their primary information appliance, and as the sales of ringtones indicate, these users are willing to pay for the ability to customize their tools. One not-so-emerging proponent of a phone as a primary information appliance is the Shifted Librarian herself, Jenny Levine, and her treasured Treo. She and Marshall Keys would make for an interesting pair.

Side note: I am writing this in the SeaTac airport while waiting for my shuttle back to Ellensburg. At a nearby table is a ten-year-old girl with her little sister and their father. Just now, he was having trouble with something on his cell phone, and she took it and showed him how to do what he wanted to do. I suppose I shouldn’t be surprised that someone so young would know more about how to use the phone than the person who owns it, but I am anyway.

The point that Keys was trying to make was that if emerging users consider their phones to be primary sources of information, then we need to be developing reference tools that acknowledge that reality. There are text message services that answer questions quickly for a nominal fee, and if our users are more inclined to pay for that service rather than come to us through traditional methods, then we need to consider ways to implement similar services. We also need to face the reality that a majority of library functions can be outsourced off-shore, including technical services and reference services. If we aren’t preparing for this eventuality, then it will be even more difficult once it happens.

Keys stated that, “a wealth of information creates a poverty of attention.” If we aren’t prepared to provide accurate information quickly to our users in the formats they prefer, then we will become irrelevant.

gmail atom feed

Cool! I just noticed that I can grab an Atom feed for my Gmail inbox. After doing a quick search on this, it seems that I’m about four months behind on this news.

I probably won’t use this, since I hit my feed aggregator only once or twice a day. I much prefer the Gmail Notifier sitting in my system tray.
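For anyone curious what a feed aggregator actually does with an inbox feed like this, here is a minimal sketch that pulls the entry titles and senders out of an Atom document. The sample feed is hypothetical and written against the Atom 1.0 namespace; Gmail’s real feed (which requires signing in) may use a different Atom version and extra elements, so treat the namespace and structure as assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: Atom 1.0. Gmail's actual feed format may differ.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Inbox</title>
  <entry>
    <title>New BookCrossing journal entry</title>
    <author><name>BookCrossing</name></author>
  </entry>
  <entry>
    <title>Geocaching.com message</title>
    <author><name>Geocaching.com</name></author>
  </entry>
</feed>"""

def unread_entries(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, author name) pairs for every <entry> in an Atom feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        author = entry.findtext("atom:author/atom:name", default="", namespaces=ATOM_NS)
        results.append((title, author))
    return results

for title, author in unread_entries(SAMPLE_FEED):
    print(f"{author}: {title}")
```

An aggregator polls a feed like this on a schedule, which is why checking it only once or twice a day makes it a poor substitute for a notifier that watches the inbox continuously.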

gmail invites

I have more Gmail invitations to give away. If you would like a Gmail account, send me an email. You can also post a comment on this entry, but comments get closed after a while.

Update 1/25/05: I seem to have a never-ending supply of Gmail invites now, and all of my friends who want one have them, so from now on, they will be put in isnoop.net’s Gmail invite spooler.

openurl, firefox, and google scholar

Peter Binkley of the University of Alberta Libraries has developed a Firefox extension that adds an OpenURL button to Google Scholar search results.[web4lib]

“The purpose is to enable users at an institution that has an OpenURL link-resolver to use that resolver to locate the full text of articles found in Google Scholar, instead of relying on the links to publishers’ websites provided by Google. This is important because it solves the ‘appropriate copy problem’: the link to a publisher’s site is useless if you don’t have a subscription that lets you into that site, and your library may provide access to the same article in an aggregator’s package or elsewhere.”

From all appearances, this is a fantastic tool that embraces Google while still providing more of the useful service that librarians offer. If you have an OpenURL link resolver that you are able to tweak, such as SFX, go for it! (Next step: educate your users about Firefox…)
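The reason an OpenURL button solves the appropriate copy problem is that the link carries the citation itself, not a publisher address, so the library’s own resolver decides where the full text lives. Below is a minimal sketch of building an OpenURL 0.1-style link. The resolver address, the `sid` value, and the citation metadata are all hypothetical, and this is not the code of the Firefox extension described above.

```python
from urllib.parse import urlencode

# Hypothetical resolver address: substitute your own institution's link resolver.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

def build_openurl(base, citation, sid="example:scholar"):
    """Build an OpenURL 0.1-style link: the citation travels as key/value
    pairs, so the library's resolver (not the search engine) picks the copy."""
    params = {"sid": sid}  # identifies the referring source (hypothetical value)
    # Drop empty fields so the resolver only sees metadata we actually have.
    params.update({key: value for key, value in citation.items() if value})
    return f"{base}?{urlencode(params)}"

# Hypothetical citation metadata for illustration only.
citation = {
    "genre": "article",
    "atitle": "An Example Article",
    "title": "Journal of Examples",
    "issn": "1234-5678",
    "volume": "12",
    "issue": "",
    "spage": "34",
    "date": "2004",
    "aulast": "Smith",
}

url = build_openurl(RESOLVER_BASE, citation)
print(url)
```

Because the resolver receives the ISSN, volume, and start page rather than a fixed destination, it can route the user to whichever copy the library actually licenses, whether that is the publisher’s site or an aggregator package.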

Update: One of the library coding gods, Art Rhyno, has developed a bookmarklet that prepends your library’s proxy server URL string to the links in the Google Scholar results. That’s another workaround if you don’t have an OpenURL link resolver. If your library subscribes to the content, you’ll be passed through, authenticated, to the full text. If not, you can obtain access to the content some other way.
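The rewriting a proxy bookmarklet performs is simple string work, sketched below in Python. A real bookmarklet would be JavaScript running in the browser, and this is not Art Rhyno’s code. The prefix follows the `login?url=` style used by EZproxy, but the host name and the sample links are hypothetical.

```python
# Hypothetical proxy prefix in the "login?url=" style used by EZproxy;
# your library's prefix will differ.
PROXY_PREFIX = "https://proxy.example.edu/login?url="

def proxify(url, prefix=PROXY_PREFIX):
    """Prepend the proxy prefix, leaving already-proxied links alone."""
    if url.startswith(prefix):
        return url
    return prefix + url

def proxify_all(links, prefix=PROXY_PREFIX):
    """Rewrite every link in a list of search results."""
    return [proxify(link, prefix) for link in links]

# Hypothetical result links for illustration only.
results = [
    "https://www.example-publisher.com/article/123",
    PROXY_PREFIX + "https://www.example-aggregator.com/doc/456",
]
for link in proxify_all(results):
    print(link)
```

Skipping links that already carry the prefix keeps the rewrite safe to run twice on the same page. It also shows the limitation noted below: this only helps if the proxy authenticates users from the rewritten URL alone.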

One snag I see in all of this is that, depending on how your proxy server is set up, this may not work. Some libraries *cough*UofKY*cough* use a proxy server that requires users to modify their web browser settings before authenticating them. I’m not sure whether this would confuse users who haven’t made that modification.

speaking of gmail invites…

Do you have a bunch of Gmail invites to give away?

If you have Gmail invites to give away, but all of your friends and interested blog readers already have them, I ran across a website that will help you give them away to those who want a Gmail account. I gave away three of my remaining four, keeping one just in case.
