The Future of Digital Identity and RA21

Federated identity services have been around for close to twenty years. The research and academic space served as a development and proving ground for standards like SAML and technologies like Shibboleth and CAS, and the commercial space ran with those ideas with standards like OAuth and OpenID Connect (OIDC). The success of companies like Google, Facebook, Twitter, and more helped move this whole idea of federated, digital identity deep into the fabric of the Internet. Along with the benefit of limiting the number of accounts and passwords that a user has to track comes a risk: an individual’s privacy can be invaded by tracking their actions across a variety of services.

With commercial services such as Google and Facebook, the user is generally the product, and all information is considered ‘fair game’.  When information about the user is the product, then privacy is essentially a lost cause. When the purpose is purely to support authorization to a service or set of materials, however, privacy has a far better chance of being preserved.

In the scholarly research and education community, however, the story is somewhat more positive. For federated identity to work in this environment, there must be a fabric of trust, based on agreed upon policies and procedures between the institution offering identity services, and the organization consuming that information. Service providers may have bilateral contracts with identity provider institutions, but one of the reasons federations exist is that bilateral contracts do not scale. To join a federation, service providers generally must agree to a set of practices that includes things like having publicly available privacy policies and an understanding of appropriate use of user information.

For example, from the US-based InCommon federation’s proposed Baseline Expectations:

Baseline Expectations of Service Providers

1.   Controls are in place to reasonably secure information and maintain user privacy
2.   Information received from IdPs is not shared with third parties without permission and is stored only when necessary for SP’s purpose
3.   Generally-accepted security practices are applied to the SP
4.   Federation metadata is accurate, complete, and includes site technical, admin, and security contacts, MDUI information, and privacy policy URL
5.  Unless governed by an applicable contract, attributes required to obtain service are appropriate and made known publicly

In addition to defined operating practices, the federation community has a variety of other ways to help encourage trust within and across their federations. For example, there are internationally vetted entity categories that may decorate the metadata of both identity providers and service providers. Research & Scholarship (R&S), as one example, is aimed at tagging Service Providers “that are operated for the purpose of supporting research and scholarship interaction, collaboration or management, at least in part”. Identity providers can restrict attribute release to only those service providers that have qualified for R&S. There are several other categories, and which entities are tagged with each one is publicly available via the Metadata Explorer Tool supported by GÉANT, the consortium of national research and education networks and federations in Europe (though the tool covers all known federations around the world).
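
As a minimal sketch of how an identity provider might apply such a policy, the Python below releases a small attribute bundle only to service providers whose metadata carries the R&S entity category. The function name, the simplified metadata shape, the sample entity IDs, and the choice of attributes in the bundle are all illustrative assumptions rather than any federation's actual configuration; only the category URI is the identifier published by REFEDS.

```python
# Identifier published by REFEDS for the Research & Scholarship category.
R_AND_S = "http://refeds.org/category/research-and-scholarship"

# Hypothetical bundle this example IdP is willing to release to R&S services.
RS_BUNDLE = ("eduPersonPrincipalName", "displayName", "mail")


def release_attributes(sp_metadata, user):
    """Return the attributes to send to a service provider.

    sp_metadata is an assumed, simplified view of federation metadata:
    {"entity_id": str, "entity_categories": [str, ...]}.
    """
    categories = sp_metadata.get("entity_categories", [])
    if R_AND_S not in categories:
        # Service has not qualified for R&S: release nothing by default.
        return {}
    return {name: user[name] for name in RS_BUNDLE if name in user}


# Hypothetical directory record and service providers.
user = {
    "eduPersonPrincipalName": "jdoe@example.edu",
    "displayName": "J. Doe",
    "mail": "jdoe@example.edu",
    "eduPersonScopedAffiliation": "staff@example.edu",
}
rs_service = {
    "entity_id": "https://collab.example.org/sp",
    "entity_categories": [R_AND_S],
}
other_service = {
    "entity_id": "https://shop.example.com/sp",
    "entity_categories": [],
}

print(release_attributes(rs_service, user))
print(release_attributes(other_service, user))
```

In this sketch the untagged service receives an empty set of attributes, which mirrors the idea that the category tag, not a bilateral contract, is what unlocks the release.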

All this points to a healthy and growing trust fabric in R&E identity federations. But there are still significant challenges, and given the scale involved, those problems are hard to tackle. According to the Metadata Explorer Tool, there are currently 4584 identity providers across the 61 known R&E federations around the world, and 10551 service providers. The challenges range from establishing user consent (assuming consent is the correct mechanism to support information sharing) given different legal and cultural requirements, to differentiating and supporting the requirements of the research collaboration community from more commercial interests, and, for RA21, to presenting a sensible list of identity providers (given the possibility of over 4500 identity providers in the world) to the user while still respecting users’ privacy, the service providers’ subscription contracts, and the identity providers’ obligations.

The user, whether they are a student, corporate researcher, or faculty member, is at the center of RA21. The ultimate goal of the project is to produce a set of best practices regarding identity discovery so that the user can access content they have rights to access, regardless of their location. The technology being tested in the various pilots provides different testing grounds for finding the most privacy-preserving approach, and for exploring different options for the mechanics of offering a simpler, targeted list of possible identity providers that might be relevant to the user.

When it comes to identity discovery, issues around consent are actually very limited. The user should be able to consent to sharing personal information such as their name or email address – those items are useful for personalization, but not fundamentally necessary to the authorization transaction. The user handles the authentication, but it is up to the institution to validate the assertion that the user is affiliated with that institution. The user cannot “own” that aspect of the data – that belongs to the institution.

Publishers, as the primary content providers in this picture, will have to do most of the heavy lifting when it comes to improving discovery. They will need to be able to send users to a discovery service, and be able to handle authorization decisions based on whatever attributes are appropriate to their service rather than just by checking IP address. As many people have noted, IP addresses work extremely well when a user is on campus, but as soon as the user has shifted to a local coffee shop, their home, or an airport, then they have to jump through multiple clicks for authentication OR stop and get a VPN or proxy set up on their system.
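
To make the contrast concrete, here is a minimal sketch of the two styles of authorization check a publisher platform might run. Everything named here is a hypothetical illustration: the campus network range (taken from the reserved documentation ranges), the licensed scope `example.edu`, the set of accepted affiliation values, and both function names. The scoped-affiliation format `value@scope` follows the eduPerson convention.

```python
import ipaddress

# Hypothetical licensed campus network (a reserved documentation range).
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

# Hypothetical subscription data: scopes licensed for access, and the
# affiliation values this publisher accepts.
LICENSED_SCOPES = {"example.edu"}
ACCEPTED_AFFILIATIONS = {"member", "student", "faculty", "staff"}


def authorized_by_ip(client_ip):
    """Traditional check: is the request coming from a licensed network?"""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in CAMPUS_NETWORKS)


def authorized_by_affiliation(scoped_affiliations):
    """Attribute-based check on values such as 'student@example.edu'."""
    for value in scoped_affiliations:
        affiliation, _, scope = value.partition("@")
        if scope in LICENSED_SCOPES and affiliation in ACCEPTED_AFFILIATIONS:
            return True
    return False


# On campus the IP check passes; from a coffee shop it does not.
print(authorized_by_ip("192.0.2.10"))      # True
print(authorized_by_ip("198.51.100.7"))    # False

# The affiliation asserted by the institution is valid from any location.
print(authorized_by_affiliation(["student@example.edu"]))  # True
```

The second check never looks at where the request came from, which is why it keeps working when the user leaves campus.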

Federated identity services in the R&E context allow the user more freedom to access licensed content and services from anywhere in the world. RA21 focuses on improving the identity discovery experience for the user, and looks forward to consent improvement efforts such as the Consent-informed Attribute Release project, federation-wide security response efforts such as SIRTFI, and many other efforts around the world that bite off their own piece of this elephant to improve federated identity.

2017 wrap up

As one year ends and another begins, it’s useful to take a moment to consider what has been accomplished to date with RA21, and what we expect in the coming year.

RA21 formally started in mid-2016, as several STM members started to consider the growing number of issues with IP authorization and the limitations of identity federation as the most logical next step for authorizing digital access to material. Before the end of the year, the STM-led effort combined with a very similar effort coming from several of the P-D-R companies and associated publishers. This combined group paved the way for inviting libraries, vendors, and federation operators to work towards a common goal of improving identity discovery – a key initial step to making federated identity viable for all parties.

In 2017, the project brought on Julia Wallace as Project Director and Heather Flanagan as Academic Pilot Coordinator. They joined Jenny Walker, Corporate Pilot Coordinator, as the core staff to help facilitate the communities that work together to evolve RA21. The first task was primarily outreach in order to validate the use cases, mission, and goals of the project; there were sixteen separate events over the course of 2017 designed to engage as many interested parties as possible. Simultaneously, the technologies that would form the foundation of assessing the practicality of any best practice in this space were discussed, and teams were created to discuss and implement the deployment of the technology platforms. The corporate pilot focused more on the issues around user experience, exploring SAML for corporates, and granular usage statistics, while the academic pilots (the P3W and WAYF Cloud pilots) focused more on best practice for identity discovery.

By the middle of 2017, the User Experience (UX) work stream, originally coordinated by the corporate pilot, was identified as an effort that needed to span all of the pilots. The corporate pilot had done extremely useful work in testing possibilities with researchers and librarians, and findings from that effort were presented at the workshop in September. As the corporate pilot closed this phase of their work, the ongoing efforts in the UX space were handed off to the academic pilots to take the lead on the next steps – further refining the user experience and testing with more users against the P3W and WAYF Cloud platforms. Corporate pilot participants will continue to be involved in the UX side of things, of course, and will be part of the in-depth review of the outputs of the UX work stream.

At the end of 2017, both academic pilots have open sourced their code base and deployed prototypes. Plans are in place to review the security and privacy implications of both pilots, thanks to the new Security and Privacy work stream. Outreach will continue, and testing of the UX in both pilots is planned for late Q1/early Q2. Early findings are intended to be captured in a set of position papers which will later be rolled up into the final, NISO standardized guidance in this space.

2018 will be an important year for RA21 and we look forward to providing regular updates via this blog. For messages on other news and forthcoming events, please sign up for our news email via the RA21 Contact Form.

RA21 Workshop – September 1, 2017

On 1 September, SURF hosted an RA21 general session and pilot workshop at their offices in Utrecht, NL. Approximately 25 people attended in person, with another 12 attending remotely. When compared to the event in London, this event had a more even distribution of librarians, vendors, publishers, and federation operators attending.

The day opened with a general introduction (slides), which generated some useful discussion on topics such as governance for the WAYF Cloud service, questions about handling guest access, and education on authentication versus authorization. The second session covered the UX work (slides), which has advanced since the July workshop. The corporate pilot has started to collect responses from the UX survey, and those preliminary responses were reviewed and discussed during the presentation.

UX survey results (summary)

The UX discussion then generated further questions around the viability of using an email address for all use cases, thoughts on geolocation, the possibility of short-term wins to change the language (if not the underlying WAYF technology) on existing discovery pages, and more.

The afternoon was a working session that focused on where the academic pilots differed, and how those differences might impact the UX design. The P3W pilot and the WAYF Cloud pilot share a number of characteristics. Given RA21’s ultimate goal of exploring the possibilities for improving identity discovery to create informed best practice, having the pilots share ideas about what is (or is not) working is critical. Where there is already convergence between the pilots, we can start considering what that means for best practice. Where the pilots differ, we can focus on comparing the differences to the RA21 use cases and explore the possibilities in detail.

Whiteboard session – P3W and WAYF Cloud comparison

The people who stayed for the pilot workshop session were extremely pleased with the quality of discussion and progression of ideas. Our next workshop will be on October 19, with a goal of continuing this kind of active engagement and exploration of possibilities in the UX, security, and privacy spaces.

RA21 Workshop – July 7, 2017

Earlier this month, RA21 held a series of events at the lovely JISC office in London. The day started with an introduction to RA21; this session saw nearly 50 people in the room, along with another 50 people viewing the event online. The introduction included a summary of the problems around identity discovery that RA21 is hoping to solve, along with an explanation of how each of the pilots is approaching the final goal of a set of best practices around identity discovery.

RA21 July Workshop attendees

The introduction resulted in some lively Q&A that dug into a bit more detail on each of the pilots, how they expect to handle GDPR-related concerns, and how the user experience (UX) would help the user. That last question was an excellent lead-in to the next part of the day, exploring the work done so far on UX. The corporate pilot has done quite a bit of work around the UX, pulling together a survey that walks individuals through the possible design and explains the possibilities in each step. That survey will be adapted for the academic use case later this summer.

After the UX session concluded, the meeting shifted from a presentation to generally interested parties to an actual workshop for pilot participants. The two academic pilots, the WAYF Cloud and the Privacy Preserving Persistent WAYF (P3W), split into breakout rooms to answer detailed questions and discuss the expected work packages out of each stream. Topics such as UX development, testing, library education, security, and privacy were covered. Development will happen through the summer, and hands-on testing will begin this fall. See the pilot pages for more details about the goals and objectives for each of the pilots.

While the pilot workshop was not recorded, the introductory session video and slides are available on the RA21 Events page. Another RA21 day, split into a morning of introductory material and an afternoon of workshop sessions, is scheduled for 1 September 2017 in Utrecht, NL, at the SURFnet offices. Check the events page for information on how to register for the September event, plus information on future events!

Federated Identity and Privacy

Over the next few blog posts, we’ll be diving into some of the issues brought up in the RA21 FAQ. This week, let’s talk about the use of federated identity and privacy.

Almost everyone online has been to a site that requires the user to register, and offers the possibility of using one of their social media accounts to handle that registration. Immediately, questions come to mind: if I click on this link, what is my social media provider learning about me and the sites I visit? What information is my social media provider returning to this third party? How can I enjoy the easier user experience of just clicking on this link rather than following a full registration process while still protecting the privacy of my data?

When shifting from a social media service to an academic or business environment, however, these questions change. Contracts between an identity provider and a service provider are often involved that clearly define what user information can be shared, and how it may be used. As the number of identity providers increases, however, these kinds of bilateral agreements do not scale for either the identity provider or the service provider. Identity federations are a way to support such agreements at scale, and as members of such federations both service providers and identity providers agree to abide by specific operating procedures (e.g., the InCommon Participant Operating Practices). Those operating procedures may include the ability for all parties to publish their privacy statements directly in the federation metadata feeds, and to be very specific in how they use information sent from the identity provider to the service provider. In fact, identity providers can make policy-based decisions as to whether to allow an authentication transaction to go through based on whether or not there is a public privacy statement available for a service provider.

In the most basic of federation actions, a user goes to a service provider’s website and clicks on a link to authenticate for access. That link first takes them to a discovery service that allows the user to select the correct identity provider. At that point, the transaction shifts entirely to the identity provider which, assuming a successful authentication, only returns information that the authentication was successful. No other data is released in this first, basic flow. By default, the service provider never finds out the user’s name, contact information, role within the institution, and so on, from the identity provider. In a more advanced flow, more information of relevance to both parties may be shared – perhaps the resources being accessed should only be available to students. Or, perhaps access to a particular class of online resources should be restricted to a particular department within an institution. Such access controls can be achieved without access to personal information about the user – only the institutional affiliation is required. Federated authentication and associated authorization decisions are customizable based on automatable criteria. While the decision on what information to release is largely in the hands of the identity provider, technology is being developed that would allow a user to explicitly consent to releasing additional information (such as their name or email address to support personalization of a service).
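
The tiers described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular identity provider software is configured: the function `build_assertion`, the `needs_affiliation` and `optional_personal` keys, and the sample user record are all hypothetical names introduced for this example.

```python
def build_assertion(user, sp_requirements, consented=frozenset()):
    """Assemble what an identity provider sends after a successful login.

    Tier 1: only the fact that authentication succeeded.
    Tier 2: institutional affiliation, if the service needs it to authorize.
    Tier 3: personal attributes, only where the user has explicitly consented.
    """
    assertion = {"authenticated": True}

    if sp_requirements.get("needs_affiliation"):
        assertion["affiliation"] = user["affiliation"]

    for name in sp_requirements.get("optional_personal", ()):
        if name in consented and name in user:
            assertion[name] = user[name]

    return assertion


# Hypothetical user record held by the institution.
user = {
    "affiliation": "student@example.edu",
    "name": "J. Doe",
    "email": "jdoe@example.edu",
}

# Basic flow: nothing beyond the authentication result.
basic = build_assertion(user, {})

# Students-only resource: affiliation is released, personal data is not.
students_only = build_assertion(user, {"needs_affiliation": True})

# Personalization: the service asks for name and email, the user agrees
# to share only their name.
personalized = build_assertion(
    user,
    {"needs_affiliation": True, "optional_personal": ["name", "email"]},
    consented={"name"},
)

print(basic)
print(students_only)
print(personalized)
```

Note that affiliation is released on the institution's decision, while name and email pass through only on the user's consent, matching the split in responsibility the paragraph describes.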

The technology and policy exist to both enable and manage the sharing of information about a user; legal restrictions exist as well that further impact what information may be shared and under what circumstances. Many regions in the world have privacy regulations that impact the digital world. From the General Data Protection Regulation in the European Union, to the Personal Data (Privacy) Ordinance in Hong Kong (Cap. 486 of the Laws of Hong Kong), and the various state and federal consumer protection laws in the United States, many governments consider the privacy of their constituents to be something necessary to protect. Even if resource providers want to collect and use user data, they have to consider the regulations in all regions in which they operate.

RA21 seeks to improve the federated identity experience through a better identity provider discovery process. Supporting privacy and supporting usability are the guiding principles for the effort. At the end of the day, the project will have a list of best practices in this space that must support these principles.

RA21 Workshop Wrap-up

The RA21 Workshop, a half-day event held in Washington, DC in parallel with the STM Annual US Conference Society Day, was quite a success. Thirty-five people attended, representing publishers, librarians, vendors, and identity federation operators interested in moving the RA21 project forward. Slides from the event are available here (RA21Workshop-April2017-final).

The workshop started with an introduction to the project, and quickly moved to discussing the pilots themselves. Ralph Youngen (American Chemical Society) described the corporate pilot and its explorations regarding an identity federation for the Pharmaceutical Documentation Ring (P-D-R) companies. While there is a significant overlap in use cases between the academic and corporate pilots, the P-D-R companies do not quite fit in an academic model; creating a new federation may make sense for their use case.

The majority of the day was spent discussing the three pilots that focus on the academic use case. Chris Shillum (Elsevier) did a short presentation on the Privacy Preserving Persistent WAYF (P3W) pilot which spurred three breakout groups. The groups in turn discussed potential barriers to adoption, level of effort required, existing work that should be leveraged, and how to move forward. The question of existing work became critical later in the day as the newest pilot, the Client-based WAYF as presented by Leif Johansson (SUNET), proved to neatly align with the P3W pilot expectations. The Client-based WAYF, using services developed, tested, and in production as part of the Swedish Academic Identity Federation (SWAMID), offers users a choice to use a particular federated identity ‘forever’ (or until such time as they want to change their home organization). There are plans to add a telemetry service that would work in the background to determine which Identity Providers (IdPs) will work with specific Service Providers (SPs), thus limiting the list presented to the user to only IdPs known to work with that particular SP.

Elias Balafoutis (Atypon) presented on the Shared WAYF pilot, a solution that would allow publishers to share information on whether an anonymized user has already authenticated, thus limiting the number of authentication requests a user has to answer in order to access material in the scholarly publishing space.  This is an opt-in service for both users and publishers, specifically geared towards providing as seamless a user experience as IP authorization.

By the end of the afternoon, three academic pilots had turned into two. Both pilots now have project management, developer resources, and running code to work from. One area that was identified as a critical project gap, however, was the lack of a clear articulation of the value of the project to academic libraries. The goal of the RA21 project is to develop a set of best practices around the user experience with identity provider discovery when a service uses federated authentication. For these best practices to be supported by all stakeholders in the scholarly communications space, librarians are a key stakeholder group that must be engaged. As a result of these discussions, the RA21 steering group will be evolved into an advisory board that brings all key stakeholder groups (librarians, publishers, vendors, and identity federation operators) to the table.

There is quite a bit to do now on the project: librarians need to be identified to join the advisory group; the pilots need to meet within their project teams and start testing their ideas; and we need to start documenting the best practices that are coming out of this work. The timeline, subject to change as we work through the pilots, is to have the platforms in testing mode in Q3 2017, to draft the best practices in Q4, and to meet with each stakeholder group in Q4 2017 and through 2018 to get their feedback. We expect to finalize the best practices in 2018. Expect additional workshops later this year and in 2018!

RA21 Workshop – April 25, 2017

The RA21 project is picking up speed! On April 25, there will be a workshop in Washington, DC, focused on the academic pilot efforts. Each pilot will have an hour to discuss the goals and outputs expected from that effort, and to discuss the particular challenges that are the target of that pilot’s efforts. The corporate pilot will also have some time to discuss the related work going on with that group.

The P3W pilot will be exploring the best practices that come from using a modified Account Chooser that stores the user’s preference of Identity Provider (IdP) locally in their browser; it will also be looking to improve the user experience of IdP discovery. The first time visiting a participating site, users will be presented with a streamlined UI to select their preferred IdP and sign in; thereafter, this preference will be stored in their browser, allowing the user to be seamlessly signed into other participating sites on future visits. This pilot is looking for participants to help develop the UX, build the software tools, and create the test environment. The pilot is also looking for campuses and libraries that would be willing to switch over to using the pilot in place of existing IP authorization for the participating publisher sites.

The Shared WAYF pilot seeks to validate the use of a cloud service to facilitate communication between publisher platforms for the purpose of discovery of the Identity Provider most appropriate for each user. This pilot is looking for both publishers willing to integrate their platforms to the wayf-cloud service, and publishers with content on the Literatum platform from Atypon. The pilot is also looking for campuses and libraries that would be willing to switch over to using the pilot in place of existing IP authorization for the participating sites and titles.

The Client-based WAYF pilot is exploring a solution that would use an existing service to build a database of Service Providers (SPs) and Identity Providers (IdPs) that are automatically queried to determine if the SP and IdP can work together. When a user starts the authentication process for the first time, they will be presented with a list of IdPs that are known to work with that SP, as well as information on what to do if their IdP is not on the list. This pilot is looking for publishers that are willing to work with the project team on the best way to use the existing samlbits discovery service, and for development resources to determine the best way to populate the metadata registry with hints from the Service Providers regarding which IdPs are likely to work in an authorization scenario.
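
The filtering idea can be sketched as follows. This is a hypothetical illustration only: it does not reflect how the samlbits service is actually built, and the registry contents, entity IDs, display names, and the `discovery_choices` function are all invented for the example.

```python
# Hypothetical registry of SP/IdP pairs that automated checks have found
# to work together, keyed by the service provider's entity ID.
KNOWN_WORKING = {
    "https://journals.publisher.example/sp": {
        "https://idp.university-a.example/idp",
        "https://idp.university-b.example/idp",
    },
}

# Hypothetical display names for every IdP in the federation metadata.
IDP_NAMES = {
    "https://idp.university-a.example/idp": "University A",
    "https://idp.university-b.example/idp": "University B",
    "https://idp.college-c.example/idp": "College C",
}

FALLBACK_HINT = "Institution not listed? Contact your library for access options."


def discovery_choices(sp_entity_id):
    """Return the IdP names to show for this SP, plus a fallback hint."""
    working = KNOWN_WORKING.get(sp_entity_id, set())
    names = sorted(IDP_NAMES[idp] for idp in working if idp in IDP_NAMES)
    return {"choices": names, "fallback_hint": FALLBACK_HINT}


page = discovery_choices("https://journals.publisher.example/sp")
print(page["choices"])        # ['University A', 'University B']
print(page["fallback_hint"])
```

Here College C exists in the federation but is left off the list because nothing in the registry says it works with this service provider; the fallback hint covers the user whose institution is missing.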

Approximately 30 people are expected to attend; if you are planning to attend and have not yet registered for this free event, please contact the RA21 project coordinators! A QR code is required to get to the meeting space.