NASIG 2013: Knowledge and Dignity in the Era of Big Data

CC BY 2.0 2013-06-10
“Big Data” by JD Hancock

Speaker: Siva Vaidhyanathan

Don’t try to write a book about fast moving subjects.

He was trying to capture the nature of our relationship to Google. It provides us with services that are easy to use, fairly dependable, and well designed. However, that level of success can breed hubris. He was interested in how this drives the company to its audacious goals.

It strikes him that what Google claims to be doing is what librarians have been doing for hundreds of years already. He found himself turning to the core practices of librarians as a guideline for assessing Google.

Why is Google interested in so much stuff? What is the payoff to organizing the world’s information and making it accessible?

Big data is not a phrase that they use much, but the notion is there. More and faster equals better. Google is in the prediction/advertising business. The Google Books project is their attempt to reverse engineer the sentence. Knowing how sentences work, they can simulate how to interpret and create sentences, which would be a simulation of artificial intelligence.

The NSA’s deals that give them a backdoor to our data services create data insecurity, because if they can get in, so can the bad guys. Google keeps data about us (and has to turn it over when asked) because it benefits their business model, unlike libraries, which don’t keep patron records, in order to protect patrons’ privacy.

Big data means more than a lot of data. It means that we have so many instruments to gather data: cheap/ubiquitous cameras and microphones, GPS devices that we carry with us, credit card records, and more. All of these ways of creating data feed into huge servers that can store it, with powerful algorithms that can analyze it. Despite all of this, there is no policy surrounding it, nor are there conversations about the best ways to manage it in light of the impact on personal privacy. There is no incentive to curb big data activities.

Scientists are generally trained to understand that correlation is not causation. We seem to be happy enough to draw pictures with correlation and move on to the next one. With big data, it is far too easy to stop at correlation. This is a potentially dangerous way of understanding human phenomena. We are autonomous people.

The panopticon was supposed to keep prisoners from misbehaving because they assumed they were always being watched. Foucault described the modern state in the 1970s as the panopticon. However, at this point, it doesn’t quite match. We have a cryptopticon, because we aren’t allowed to know when we are being watched. It wants us to be on our worst behavior. How can we inject transparency and objectivism into this cryptopticon?

Those who can manipulate the system will, but those who don’t know how or that it is happening will be negatively impacted. If bad credit can get you on the no-fly list, what else may be happening to people who make poor choices in one aspect of their lives that they don’t know will impact other aspects? There is no longer anonymity in our stupidity. Everything we do, or nearly so, is online. Mistakes of teenagers will have an impact on their adult lives in ways we’ve never experienced before. Our inability to forget renders us incapable of looking at things in context.

Mo Data, Mo Problems

ER&L 2013: E-Resources, E-Realities

“Tools” by Josep Ma. Rosell

Speakers: Jennifer Bazeley (Miami University) & Nancy Beals (Wayne State University)

Despite all the research on what we need/want, no one is building commercial products that meet all our needs and address the impediments of cost and dwindling staff.

Beals says that the ERM is not used for workflow, so they needed other tools, with a priority on project management and Excel proficiency. They use an internal listserv, UKSG Transfer, Trello (project management software), and a blog, to keep track of changes in eresources.

Other tools for professional productivity and collaboration: iPads with Remember the Milk or Evernote, Google spreadsheets (project portfolio management organization-wide), and LibGuides.

Bazeley stepped into the role of organizing eresources information in 2009, with no existing tool or hub, which gave her room to experiment. For documentation, they use PBWiki (good for version tracking, particularly to correct errors) with an embedded departmental Google calendar. For communication, they use LibGuides for internal documents, where you can embed RSS, Google Docs, Yahoo Pipes aggregating RSS feeds, Google forms for eresource access issues, links to Google spreadsheets with usage data, etc. For login information, they use KeePass Password Safe. Rather than claiming in the ILS, they’ve moved to using the claim checker tool from the subscription agent.

Tools covered:

  • Google Calendar
  • Google Docs (includes forms & spreadsheets)
  • PBWiki
  • LibGuides
  • Yahoo Pipes
  • WordPress
  • KeePass Password Safe
  • PDF Creator
  • EBSCOnet

Others listed:

  • Blogger (blog software)
  • Mendeley (ref manager)
  • Vimeo (videos)
  • Jing (screenshot/screencast)
  • GIMP (image editor)
  • MediaWiki (Wiki software)
  • LastPass (password manager)
  • OpenOffice (software suite)
  • PDF Creator (PDF manipulation)
  • Slideshare (presentation manager)
  • Filezilla (ftp software)
  • Zoho Creator (database software)
  • Dropbox (cloud storage)
  • Github (software management)
  • Subscription agent software (SwetsWise, EBSCOnet)
  • Microsoft Excel / Access
  • Course Management Software (Moodle, Sakai, Blackboard)
  • Open Source ERMS: ERMes (University of Wisconsin-La Crosse) & CORAL (University of Notre Dame)

food blogging & making things so labor intensive I don’t do them

derby pie

I started a food blog on Tumblr last January. Here’s the about statement:

I started this project because after a year of taking photos of myself every day, I wanted to document something else. Over the summer and fall, I had developed a routine of trying new recipes on the weekends and some weeknights. This blog is where I share photos of the results, talk about what went right or wrong, and link to the recipes.

And sometime in May/June, I stopped. I got busy. I remembered to take some pictures, but they sat on my desktop waiting to be blogged for so long that I felt guilty and overwhelmed, so I eventually deleted them.

It wasn’t like it would take all that much time to write up something. And add a link. And format it the same as the previous posts. But it seemed like a big deal at the time.

Also, I stopped cooking/baking as much in the summer.

I have this tendency to make things that should be simple and routine into complex, detailed processes that become burdensome. Is this just some freak aspect of my desire for control and order, or is it simply human nature?

my presentation for Internet Librarian 2012

Apologies for the delay. It took longer than I expected to have the file and a stable internet connection at the same time. You’ll find the notes on the SlideShare page.

IL 2010: Adding Value to Your Community

speaker: Patricia Martin

[I took notes on paper because my netbook power cord was in my checked bag that SFO briefly lost on the way here. This is an edited transfer to electronic.]

She told a story about how a tree in her yard sprang up and quickly produced fruit, due in part to the fertilization that came from some bats living in her garage. The point is that libraries are sitting on hidden assets (i.e. bat shit), but we haven’t packaged them in a way our community will recognize and value (i.e. bat guano fertilizer).

She thinks that the current conditions indicate we are on the cusp of a renaissance generation that will lead to an explosion of creativity. Every advanced civilization gets to a point where there is so much progress made that traditions become less relevant and are shed. We need to keep libraries, or at least their role in society/education, relevant or they will be lost.

Martin says that the indicators of a renaissance are death (recession), a facilitating medium (internet), and an age of enlightenment (aided by the internet). We are seeing massive creativity online, from blog content larger than the volumes in the Library of Congress to Facebook to the increase in epublications over their print counterparts.

Capitalism relies on conformity, but conformity won’t give us the creativity we need. Brands/companies who are succeeding are those who provide a sense of belonging/community for their users, who empower creativity among them, and who manage the human interface.

The old way has the brand at the center, but the new way is to have the user at the center. This sounds easy, until you have to live it. When users are at the center, they want to build a community/tribe together, which creates stickier brands.

Jonathan Harris wants us to move forward towards creating a vibrant culture online that’s not about celebrity tweets. He is studying the things that people yearn for and creating a human interface to explore it. It is projected that 80% of data generated will come from social networks – how will we make sense of it all? Why would the RenGen (renaissance generation) still use libraries if the traditional book is our brand? We need a new story about the future where libraries are present, in whatever form they become.

A president of a cloud computing company is quoted by Martin as saying that in the future, screens will be everywhere. The return on transaction (faster) will replace the return on investment. He saw the cloud storage demand grow 500 times in 2009, and expects that rate will only continue into the future as we generate more and more data.

Story is the new killer app – the ultimate human interface. The new story of the future will be built around precognition.

Libraries can create value by leaving the desk and going into the community to provide neutral information to meet the needs of the community. We add value by putting users at the center, letting them collaborate on the rules, and curating the human interface.

WordCamp Richmond: Blogging for Business

moderator: Kate Hall
panelists: Dr. Arnold Kim, John Petersik, and Jason Guard

All started blogging because they had a passion for the topic, and were subsequently surprised by the popularity of their blogs. Both Kim & Petersik now blog fulltime, but Guard doesn’t expect to make a significant income from his blog. Kim noted that there are many other blogs like his now, so what sets his apart is the community that has developed around it.

Many bloggers have commented that since they started tweeting, their blog writing has decreased. Hall is disappointed in herself for this, but also enjoys the interactivity with readers. Kim notes that if your job is to be a blogger, then anything else that takes time away from your blog should be approached with caution; however, Twitter can be a great tool for building a personal brand. For Petersik, it’s just another forum for connecting with their audience, much like Facebook.

How do you deal with the public sucker punches? People have opinions and sometimes they can be expressed strongly. It helps to have a comments policy to keep the conversation civil and not distracted by trolls. Guard tries to be provocative and push buttons, so he expects the sucker punches. Generally he lets the trolls fly their troll flags. Hall commented that some people are out there just to be haters.

WordCamp Richmond: Exploiting Your Niche – Making Money with Affiliate Marketing

presenter: Robert Sterling

Affiliate marketing is a practice of rewarding an affiliate for directing customers to the brand/seller that then results in a sale.

“If you’re good at something, never do it for free.” If you have a blog that’s interesting and people are coming to you, you’re doing something wrong if you’re not making money off of it.

Shawn Casey came up with a list of hot niches for affiliate marketing, but that’s not how you find what will work for you. Successful niches tend to be what you already have a passion for and where it intersects with affiliate markets. Enthusiasm provokes a positive response. Enthusiasm sells. People who are phoning it in don’t come across the same and won’t develop a loyal following.

Direct traffic, don’t distract from it. Minimize the number of IAB format ads – people don’t see them anymore. Maximize your message in the hot spots – remember the Google heat map. Use forceful anchor text like “click here” to direct users to the affiliate merchant’s site. Clicks on images should move the user towards a sale.

Every third or fourth blog post should be revenue-generating. If you do it with every post, people will assume it’s a splog. Instapundit is a good example of how to do a link post that directs users to relevant content from affiliate merchants. Affiliate datafeeds can be pulled in using several WP plugins. If your IAB format ads aren’t performing from day one, they never will.

Plugins (premium): PopShops works with a number of vendors. phpBay/phpZon works with eBay and Amazon, respectively. They’re not big revenue sources, but okay for side money.

Use magazine themes that let you prioritize revenue-generating content. Always have a left-sidebar and search box, because people are more comfortable with that navigation.

Plugins (free): W3 Total Cache (complicated, buggy, but results in fast sites, which Google loves), Regenerate Thumbnails, Ad-minister, WordPress Mobile, and others mentioned in previous sessions. Note: if you change themes, make sure you go back and check old posts. You want them to look good for the people who find them via search engines.

Forum marketing can be effective. Be a genuine participant, make yourself useful, and link back to your site only occasionally. Make sure you optimize your profile and use the FeedBurner headline animator.

Mashups are where you can find underserved niches (e.g. garden tools used as interior decorations). Use Google’s keyword tools to see if there is a demand and who may be your competition. Check for potential affiliates on several networks (ClickBank, ShareASale, Pepperjam, Commission Junction, and other niche-appropriate networks). Look for low conversion rates, and if the commission rate is less than 20%, don’t bother.
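
To make that kind of screening concrete, here is a back-of-the-envelope sketch (mine, not the presenter's). The program names and all of the numbers are invented; the only figure taken from the talk is the 20% commission floor. It estimates earnings per click as conversion rate × average order value × commission rate, which is one plausible way to compare programs, not a rule from the session.

```python
from dataclasses import dataclass

MIN_COMMISSION = 0.20  # the talk's rule of thumb: below 20%, don't bother


@dataclass
class Program:
    name: str               # hypothetical program name
    commission_rate: float  # share of each sale paid to the affiliate
    conversion_rate: float  # fraction of clicks that end in a sale
    avg_order: float        # average sale amount in dollars


def earnings_per_click(p: Program) -> float:
    """Expected affiliate earnings for one click sent to the merchant."""
    return p.conversion_rate * p.avg_order * p.commission_rate


def worth_a_look(programs):
    """Drop programs under the commission floor, then sort by earnings per click."""
    kept = [p for p in programs if p.commission_rate >= MIN_COMMISSION]
    return sorted(kept, key=earnings_per_click, reverse=True)


# Invented numbers, for illustration only.
candidates = [
    Program("garden-decor", 0.25, 0.02, 80.0),
    Program("big-box", 0.04, 0.05, 60.0),
    Program("ebooks", 0.50, 0.01, 30.0),
]

for p in worth_a_look(candidates):
    print(f"{p.name}: ${earnings_per_click(p):.2f} per click")
```

The point of the arithmetic is that a high commission rate alone doesn't settle it; the order size and how often clicks convert matter just as much.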

Pay for performance (PPP) advertising is likely to replace traditional retail sales. Don’t get comfortable – it’s easy for people to copy what works well for you, and likewise you can steal from your competition.

Questions:

What’s a good percentage to shoot for? 50% is great, but not many do that. Above 25% is a good payout. Unless the payout is higher, avoid the high conversion rate affiliate programs. Look for steady affiliate marketing campaigns from companies that look like they’re going to be sticking around.

What about Google or Technorati ads? The payouts have gone down. People don’t see them, and they (Google) aren’t transparent enough.

How do you do this not anonymously and maintain integrity in the eyes of your readers? One way to do it is a comparison post. Look at two comparable products, list their features against each other.

NASIG 2010: Publishing 2.0: How the Internet Changes Publications in Society

Presenter: Kent Anderson, JBJS, Inc

Medicine 0.1: in dealing with the influenza outbreak of 1837, a physician administered leeches to the chest, James’s powder, and mucilaginous drinks, and it worked (much like “take two aspirin and call me in the morning”). All of this was written up in a medical journal as a way to share information with peers. Journals have been the primary means of communicating scholarship, but what the journal is has become more abstract with the addition of non-text content and metadata. Add in indexes and other portals to access the information, and readers have changed the way they access and share information in journals. “Non-linear” access of information is increasing exponentially.

Even as technology made publishing easier and more widespread, it was still producers delivering content to consumers. But, with the advent of Web 2.0 tools, consumers now have tools that in many cases are more nimble and accessible than the communication tools that producers are using.

Web 1.0 was a destination. Documents simply moved to a new home, and “going online” was a process separate from anything else you did. However, as broadband access increases, the web becomes more pervasive and less a destination. The web becomes a platform that brings people, not documents, online to share information, consume information, and use it like any other tool.

Heterarchy: a system of organization replete with overlap, multiplicity, mixed ascendancy, and/or divergent but coexistent patterns of relation

Apomediation: mediation by agents not interposed between users and resources, who stand by to guide a consumer to high quality information without a role in the acquisition of the resources (e.g. Amazon product reviewers)

NEJM uses terms by users to add related searches to article search results. They also bump popular articles from searches up in the results as more people click on them. These tools improved their search results and reputation, all by using the people power of experts. In addition, they created a series of “results in” publications that highlight the popular articles.
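
I don't know how NEJM actually implements this, but the general idea of bumping results upward as clicks accumulate can be sketched like so. The scoring formula, the weight, and the sample data are all my own assumptions, not anything shown in the session.

```python
import math


def rerank(results, clicks, weight=0.5):
    """Re-order search results, nudging frequently clicked articles upward.

    results: list of (article_id, relevance_score) from the search engine
    clicks:  dict mapping article_id to how many times users clicked it
    weight:  how much popularity counts relative to relevance (assumed value)
    """
    def score(item):
        article_id, relevance = item
        # log1p dampens the boost so one runaway hit can't bury everything else
        return relevance + weight * math.log1p(clicks.get(article_id, 0))

    return sorted(results, key=score, reverse=True)


# Made-up data: article "b" is slightly less relevant but much more popular.
results = [("a", 2.0), ("b", 1.8), ("c", 1.0)]
clicks = {"a": 1, "b": 40, "c": 0}

print([article_id for article_id, _ in rerank(results, clicks)])
```

With no click data at all, the function simply returns the engine's original relevance order.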

It took a little over a year to get to a million Twitter authors, and about 600 years to get to the same number of book authors. And, these are literate, savvy users. Twitter & Facebook account for 1.45 million views of the New York Times (and this is a number from several years ago). Imagine what they can do for your scholarly publication. Oh, and NYT has a social media editor now.

Blogs are growing four times as fast as traditional media. The top ten media sites include blogs, and the traditional media sources use blogs now as well. Blogs can be diverse or narrow, their coverage varies (and does not have to be immediate), they are verifiably accurate, and they are interactive. Blogs level the media playing field, in part by watching the watchdogs. Blogs tend to investigate more than the mainstream media.

It took AOL five times as long to get to twenty million users as it did the iPhone. Consumers are increasingly adding “toys” to their collection of ways to get to digital/online content. When the NEJM went on the Kindle, more than just physicians subscribed. Getting content into easy to access places and on the “toys” that consumers use will increase your reach.

Print digests are struggling because they teeter on the brink of the daily divide. Why wait for the news to get stale, collected, and delivered a week/month/quarter/year later? People are transforming. Our audiences don’t think of information as analogue, delayed, isolated, tethered, etc. It has to evolve to something digital, immediate, integrated, and mobile.

From the Q&A session:

The article container will be here for a long time. Academics use the HTML version of the article, but the PDF (static) version is their security blanket and archival copy.

Where does the library fit as a source of funds when the focus is more on the end users? Publishers are looking for other sources of income as library budgets are decreasing (e.g. Kindle, product differentiation, etc.). They are looking to other purchasing centers at institutions.

How do publishers establish the cost of these 2.0 products? It’s essentially what the market will bear, with some adjustments. Sustainability is a grim perspective. Flourishing is much more positive, and not necessarily any less realistic. Equity is not a concept that comes into pricing.

The people who bring the tremendous flow of information under control (i.e. offer filters) will be successful. One of our tasks is to make filters to help our users manage the flow of information.

ER&L 2010: Adventures at the Article Level

Speaker: Jamene Brooks-Kieffer

Article level, for those familiar with link resolvers, means the best link type to give to users. The article is the object of pursuit, and the library and the user collaborate on identifying it, locating it, and acquiring it.

In 1980, the only good article-level identification was the Medline ID. Users would need to go through a qualified Medline search to track down relevant articles, and the library would need the article level identifier to make a fast request from another library. Today, the user can search Medline on their own; use the OpenURL linking to get to the full text, print, or ILL request; and obtain the article from the source or ILL. Unlike in 1980, the user no longer needs to find the journal first to get to the article. Also, the librarian’s role is more in providing relevant metadata maintenance to give the user the tools to locate the articles themselves.

In thirty years, the library has moved from being a partner with the user in pursuit of the article to being the magician behind the curtain. Our magic is made possible by the technology we know but that our users do not know.

Unique identifiers such as DOIs solve the problem of making sure that you are retrieving the correct article. CrossRef can link to specific instances of items, but not necessarily the one the user has access to. The link resolver will use the DOI to find other instances of the article available to users of the library. Easy user authentication at the point of need is the final key to implementing article-level services.
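
As a rough illustration of that hand-off (my sketch, not something shown in the session): a link resolver typically accepts an OpenURL, and a DOI can be passed along in the `rft_id` field. The resolver address and the DOI below are placeholders; a library would substitute its own resolver base URL.

```python
from urllib.parse import urlencode

# Hypothetical resolver base URL; each library has its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def doi_to_openurl(doi: str, base: str = RESOLVER_BASE) -> str:
    """Build an OpenURL 1.0 (KEV) link that asks the resolver about a DOI."""
    params = {
        "url_ver": "Z39.88-2004",                       # OpenURL 1.0 version tag
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # referent is a journal item
        "rft_id": f"info:doi/{doi}",                    # the DOI as an identifier
    }
    # urlencode percent-encodes the colons and slashes in the values
    return f"{base}?{urlencode(params)}"


# Placeholder DOI, for illustration only.
print(doi_to_openurl("10.1000/example.doi"))
```

What the resolver does with the request (full text, print holdings, or an ILL form) depends entirely on how the library has configured and maintained it.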

One of the library’s biggest roles is facilitating access. It’s not as simple as setting up a link resolver – it must be maintained or the system will break down. Also, document delivery service provides an opportunity to generate goodwill between libraries and users. The next step is supporting the user’s preferred interface, through tools like LibX, Papers, Google Scholar link resolver integration, and mobile devices. The latter is the most difficult because much of the content comes from outside service providers and institutional support for developing applications or web interfaces is limited.

We also need to consider how we deliver the articles users need. We need to evolve our acquisitions process. We need to be ready for article-level usage data, so we need to stop thinking about it as a single-institutional data problem. Aggregated data will help spot trends. Perhaps we could look at the ebook pay-as-you-use model for article-level acquisitions as well?

PIRUS & PIRUS 2 are projects to develop COUNTER-compliant article usage data for all article-hosting entities (both traditional publishers and institutional repositories). Projects like MESUR will inform these kinds of ventures.

Libraries need to be working on recommendation services. Amazon and Netflix are not flukes. Demand, adopt, and promote recommendation tools like bX or LibraryThing for Libraries.

Users are going beyond locating and acquiring the article to storing, discussing, and synthesizing the information. The library could facilitate that. We need something that lets the user connect with others, store articles, and review recommendations that the system provides. We have the technology (magic) to make it available right now: data storage, cloud applications, targeted recommendations, social networks, and pay-per-download.

How do we get there? Cover the basics of identify>locate>acquire. Demand tools that offer services beyond that, or sponsor the creation of desired tools and services. We also need to stay informed of relevant standards and recommendations.

Publishers will need to be a part of this conversation as well, of course. They need to develop models that allow us to retain access to purchased articles. If we are buying on the article level, what incentive is there to have a journal in the first place?

For tenure and promotion purposes, we need to start looking more at the impact factor of the article, not so much the journal-level impact. PLOS provides individual article metrics.