Good afternoon, everyone. My name is Ragu Balakrishnan. I'm a professor of
Electrical and Computer Engineering, and I'm also the Michael and Katherine Birck
Head of ECE. It's my distinct pleasure to welcome you to this afternoon's College
of Engineering Distinguished Lecture by Dr. Robert Kahn. We are in for a very
special treat this afternoon, and what I'd like to do is first introduce our
Dean of Engineering, Dr. Mung Chiang, who in turn will tell you why it is such a
special treat for us. While Dr. Chiang is walking up, I'd like to briefly
introduce him. Dr. Mung Chiang is the John A. Edwardson Dean of the College of
Engineering. His research received the 2013 Alan T. Waterman Award, his online
courses and textbooks have reached over a quarter million students, and he has
co-founded several startup companies and a nonprofit consortium. Please join me
in welcoming Dean Chiang.
Well, good afternoon, everyone, here in physical presence or in virtual
presence on Facebook, which is watching you as well. My name is Mung Chiang. On
behalf of the Purdue College of Engineering, it is such a special honor to
welcome the distinguished lecturer to the grand finale of the inaugural season
of the Purdue Engineering Distinguished Lecture Series: Dr. Robert Kahn, a living
legend and a national treasure. I could spend the next hour going through his
bio and wouldn't be able to finish, so I'll be brief. Dr. Kahn is widely known
as one of the fathers of the Internet. In particular, in 1966 Dr. Kahn moved
from MIT to BBN to start what was known as the ARPANET, which led to the
Internet. In 1972 Dr. Kahn moved from BBN to DARPA and led the largest effort by
the United States government up to that point to support computer and
networking research and development. In 1974 Dr. Kahn, together with Vint Cerf,
wrote the paper that gave us TCP/IP, the glue that led to the success of the
Internet. Since then, Dr. Kahn has continued to innovate, including in the
digital object architecture, the MEMS Exchange, and much more beyond the
Internet. What he did for the world and for humanity with the invention of the
Internet led to awards including the IEEE Medal of Honor, the ACM Turing Award,
the Japan Prize, the Queen Elizabeth Prize for Engineering, the Marconi Prize,
the Draper Prize, and the United States National Medal of Technology. I don't
have my notes in front of me; I just happen to remember all these awards by
heart, and many more that I do not remember by heart just yet. Let me conclude
this brief introduction with one more distinct honor that Dr. Kahn received in
2005: the United States Presidential Medal of Freedom, the highest honor that
can be bestowed on a civilian of this country. So let's welcome the living
legend, father of the Internet, Dr. Robert Kahn. Thank you.
Okay, it's sufficiently bright up here that I can't really see you folks all
that well, but I'll just take it for granted that you're not walking out on
this lecture. What I'd like to say is that I've been focused on infrastructure
development for most of the time since taking a leave of absence from MIT, where
I was on the faculty. Although I've been involved in network development all
the way, as Mung mentioned, I've been involved in many other things along the
way, including leading the research programs at DARPA for a number of years,
when we were probably the largest supporter of computer science and IT R&D in
the world. The problem with working on infrastructure is that you really can't
see it, and so unless you have a pretty good idea of what it is, the ideas can
kind of roll over you: they sound good, but you don't know what to do with them.
In the 1970s I remember giving a number of talks about
the Internet, when it was more than just an idea and we were actually building
out part of it with the research community. This was the era when you were
seeing the start of workstations; the PC had not yet been invented or made
available, but people could get powerful workstations and local area nets. I
remember giving lectures to groups that were not involved, because the people
who were actively involved sort of knew what it was all about, and the reaction
I often got at the end of those remarks was, "That was a very interesting
lecture." I'm hoping I don't get that at the end of this one, although I hope
you do find it interesting. But they would say at the end, "Tell me again why I
would want an IP address?" So you understood that they didn't really get it at
the nuts-and-bolts level. When we started
out to build the Internet, and in fact when we started out to do computer
networking at all, the goal was to get the bits from one computer to another.
The Internet simply put that in a multiple-network environment, where instead
of just a landline net, namely the ARPANET, we had a few other nets that I was
involved in developing: a packet radio net, which is kind of like the
forerunner of today's cellular systems, and a satellite net on Intelsat IV-A, a
link to the European research community. The goal we had was just to get the
bits from one computer to another, with the idea that the users, when they got
those bits there, would navigate by fingers on a keyboard and eyes on a screen.
Here we are some 45 years later, and we are still, for the most part, in the
paradigm of navigating with fingers on keyboards and eyeballs on screens. We
tried to change that back in the 1980s, and I
collaborated with Vint Cerf on this as well, when we came up with the idea of
mobile programs that could run through the Internet and carry out some of these
tasks. But that came to the forefront at almost exactly the same time that the
first viruses, worms, and Trojan horses were being introduced into the Internet
environment. So most of the organizations that I thought would be most
interested in the notion that you could remove yourself from having to navigate
everything (educate a program as a factotum, let it loose, and it could advise
you about what you needed to know or carry out your tasks) found that
unacceptable, because they were uncomfortable with the idea of somebody else's
programs just showing up on their machines. I think its time will come, but
that's sort of where we have been until recently. Now, you
might say, why wasn't the World Wide Web the solution? For many people it is a
very effective solution; I use it quite a bit myself, as most people do. But
when you look at the fundamental issues of information management, they often
involve proprietary information and personal information, and they often
involve security at different levels that has to be invoked. It's a very
difficult kind of situation, especially when you're trying to find old
information. I'll give you an example. We were involved in developing one of
the most widely used programming languages today, known as Python; along with
Java and the C family, those are probably the three most widely used
languages. We went to try and clear the rights to Python when we were
developing it at CNRI, and the person who was involved was Guido van Rossum; he
was doing the work. We had to go back and find out what had happened, because
he had been at CWI in the Netherlands working on a programming language called
ABC for children. When he got hired there, what were the rules that applied to
him? What agreements did he have? We had to find out all this old information,
which you couldn't do by just navigating with fingers on a keyboard. Today we
are working in a variety of different
contexts with different groups that are selectively using this kind of
architectural notion to deal with information, whether it's for managing the
supply chain for movies in Hollywood, with the cable TV industry, for options
trading around the globe, or for construction information. Most importantly, I
think, first out of the box were the libraries (Purdue was one of them, way
back when), along with the publishers who were making their information
available. So part of the architecture I'm going to describe today is not just
hypothetical; it's an architecture that in places is very widely used, but
often for pieces of it, not for the whole thing. If you look at any technical
journal from, say, ACM or IEEE, or some of the medical journals, you'll see
references to things like digital object identifiers on every article, and
they've been doing that for probably two decades. But they have been very
reluctant to make those articles actually available, because they're afraid
that the crown jewels of the publishing industry could be affected. I think
that more and more, as we get to understand this, the benefits will come out.
We've been having discussions about where else it could be used, and I think
it's a perfectly good way to think about managing information in
organizations, whether they be a university or a
business going forward. If somebody had told you 100 or more years ago that the
electrical power infrastructure was available, most people would have said,
well, what good is it? They didn't see the applications, and they couldn't see
the electrical infrastructure, and so it took a while for it to be built up,
whether it was from public safety interests, for electrical lighting outdoors
in place of gas lamps, or whatever. You didn't have electrical heating in your
houses; you didn't have lamps until light bulbs and the electrical
infrastructure were fully in place in the homes. People didn't see it
initially, so they may not have had a good way of really understanding it. If
somebody said, look, I have a billion-volt wire we could put in your house,
it's really the latest thing in infrastructure, most people wouldn't know what
to do with a billion-volt wire and would probably be scared of it because it
sounds dangerous, even if they could see the transmission lines and the like.
So I think the applications are often important for people to understand.
When we talk about managing information, I have to tell you that the tack I'm
taking here is not to solve a particular application, any more than the
Internet or the ARPANET or even LANs were intended to solve a specific
application;
it was an infrastructural capability we were fairly sure people could take
advantage of. In the original Internet, as you know, it would have been a very
different development if every time you wanted to interact with a remote
computer you had to ask: where is it located, what network is it on, what
protocols does it use, what gateway do I connect to, how do I route the
traffic? We would not have an Internet like we do today, where you can simply
identify something and have the bits show up in the right place. But when it
comes to information, it's a very different story, and if you want to manage
information over very long periods, you need ways to do that effectively. So
that's what this is all about, and I hope I can explain it to you in a way that
makes you comfortable that it's not about the technology, any more than the
Internet was about the technology itself. Let me see if I can get you to the
next slide. Okay, so one of the issues that we
had early on was that Congress was passing laws about the Internet and nobody
really knew what it was. So there was an FNC definition that defined it as a
global information service, and they've gone back and forth: is it a
telecommunications utility, or is it a global information service? They're
regulated differently. So what's it about? Those protocols were never about the
technology; they were all about how, whatever the technology was, you make it
possible for things to work together: the computers, the networks, whatever
we're talking about. The net effect is that over the life of those protocols,
which are still being used today some 45 years since we first started the
work, the technology has scaled up by something on the order of a factor of 10
million. If this goes on, and we have an Internet in a decade or two where,
especially as the Internet of Things grows, those protocols are still in use,
we will have a scaling up by a factor of a billion or a trillion. Nothing in the
history of the engineering world that I'm aware of has ever scaled up by that
much. Take a look at airplanes: they've gone from on the order of 100 miles an
hour to 600, or a little bit above supersonic. You're talking about factors of
ten to a hundred, not a trillion. The reason for that is that this architecture
was never about the technology; it was all about enabling whatever the
technology is to work together. And so if you think
about the digital object architecture, it's really, in my view, a logical
extension of the Internet. It's based on the same architectural ideas: namely,
it's an open architecture that defines interfaces and protocols, and it's
independent of the underlying technology. You don't have to ask whether we're
using databases or quantum storage systems, or what their interfaces are, any
more than we worry about tracks and sectors on disks today. The important thing
about the Internet, and about infrastructure in general, is that the most
effective infrastructure developments are those that are conceptually simple,
both in users' understanding of them and in the ability of applications to
make use of them. That's the case here: this architecture is about as minimal
as you can get for managing information, which means there's a lot of room for
people to adapt it to their own needs and requirements. It is particularly
useful for getting interoperability between different systems. And this is
probably the most important comment I can make: it's a nonproprietary
architecture. For many years people have said that, because we were involved in
developing it, it was proprietary to my organization, although the funding for
it actually originally came from DARPA, and it was an outgrowth of the work on
mobile programs that Vint and I had done. But I want to tell you
also that we heard the same thing about the Internet itself in the mid-1980s. I
was one of the early members of a new board that the National Academy had set
up in Washington, called the Computer Science and Technology Board, I think, at
the time. They were looking for things to work on, and I proposed at the time:
why don't they think about the impact of the Internet, which we were referring
to as the national information infrastructure, and how its impacts on society
will evolve? Take a look at that so you can get a handle on it. And the answer
was, no, we can't work on that, because that's CNRI's (that's my organization,
which I still run), that's their proprietary technology. And I said, no, it
isn't; this was developed with federal government support; it's a public
thing; it's in the public interest. Two years later the federal government
actually gave the Academy some money to look into that very same problem, and
they decided, oh, I guess it wasn't proprietary after all. So you have to
distinguish between an architecture, which sort of lays out how things can
work, and the actual implementation of it. Now, somebody might have
intellectual property in an architecture, like a building designer, but this is
one for which there is no intellectual property in the architecture. Nobody is
claiming it, certainly not us, and nobody else that I'm aware of is really in a
position to. But implementations of pieces of it could be proprietary: if a
company built a TCP/IP implementation, that could be theirs, and they may want
to charge for it, but anybody could still build those protocols and carry on
with them. Now, you know, managing digital
information means different things to different people. I recently gave a
lecture at a World Conference on Humanities, and they were more interested in
the linguistic side of things, so this is an example of some of the things that
came up there. We have languages in the world, and they're used to create
literature, and different languages produce literature in different forms. In
the computer world the same thing is true: we have programming languages that
produce computer programs in those languages. They're not English, French, or
Chinese, but they can be Java, Python, or C++, and these programs, and any
other information in digital form, can be structured and managed as digital
objects. The early networks that we developed were based on the notion of
packets, which had addresses so that you could route them through a network,
but once they got delivered they became ephemeral. You couldn't say, I would
like to gain access to the packet that was sent on such-and-such a network 43
years ago, and expect to get it; nobody's keeping track of that, and there's no
reason to. But when you're managing digital information of some import, there
are many cases where you want to manage that information in perpetuity. If it's
business information, you might want to keep it for a very long time. If it's
governmental information, some of it you might really want to keep in
perpetuity, and if it's laws and regulations as they applied to various things
at various points in the past, you probably want to keep all of that as well.
So in this world of the digital object architecture, digital objects are the
lingua franca; everything that I talk about is about these objects. So let me
say what a digital object is. To first order, it's basically a sequence of bits
or a set of those sequences. It could be a digitized version of a movie, in
which case you have an audio part and the video parts and the sequencing,
subtitles, and synchronization, but it could be a chip design that's got
various aspects to it; literally anything that you can represent in digital
form. And, this is important, it has associated with it a unique persistent
identifier. That identifier is part of the object in some sense, but it's also
something that can be resolved all on its own. So let me see
if I can put this in context for you. Let's say we're in the world of the
Internet of Things and we have a hundred billion things, and I say here's an
identifier for the temperature readings from a particular thermostat, maybe in
this auditorium. You know the identifier, but how do you know that it's this
thing in this particular auditorium, out of a hundred billion things? You're
not going to try all hundred billion. So you need a way of routing to the data
here: you need a way to reduce this identifier to information about the thing
it's identifying. We call that state information, and so the ability to
resolve these identifiers is really crucial. Then there's the whole issue of
how you build a system like that that's meaningful, especially if individual
organizations want to create their own identifiers and control their own
information, and virtually all the organizations we've talked to want to do
that. They have the ability to do it locally, but now we might have a hundred
thousand or a million of those; which one of those do you ask for the state
information? I'll tell you a little bit about how this works. This piece of the
architecture is the most well developed; it's in widespread use and has been
for, I would say, more than 25 years. Purdue was one of the early users of it
in one form, and there are uses of it in another form today.
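The question just raised, which of a hundred thousand or a million locally run services to ask, is what the two-level system mentioned later in the talk addresses. Here is a minimal Python sketch of the idea, under stated assumptions. Identifiers are split into an organization prefix and a local suffix, loosely in the style of Handle identifiers. A top-level registry maps each prefix to that organization's own resolution service, and the local service returns the state information. The class names, the `20.5000.1/thermostat-42` identifier, and the record fields are hypothetical. They do not reflect the real Handle System protocol or API.

```python
from dataclasses import dataclass, field


@dataclass
class LocalResolver:
    """Hypothetical resolution service run by one organization."""
    records: dict[str, dict] = field(default_factory=dict)

    def register(self, identifier: str, state: dict) -> None:
        # Each organization creates and controls its own identifier records.
        self.records[identifier] = state

    def resolve(self, identifier: str) -> dict:
        # Return the state information recorded for this identifier.
        return self.records[identifier]


@dataclass
class GlobalRegistry:
    """Hypothetical top level: knows only which service owns each prefix."""
    services: dict[str, LocalResolver] = field(default_factory=dict)

    def add_prefix(self, prefix: str, service: LocalResolver) -> None:
        self.services[prefix] = service

    def resolve(self, identifier: str) -> dict:
        # Level 1: split "prefix/suffix" and find the responsible service.
        prefix, sep, _suffix = identifier.partition("/")
        if not sep or prefix not in self.services:
            raise KeyError(f"no service registered for {identifier!r}")
        # Level 2: ask that service for the identifier's state information.
        return self.services[prefix].resolve(identifier)


# Usage with made-up identifiers and record fields.
campus = LocalResolver()
campus.register("20.5000.1/thermostat-42",
                {"what": "temperature readings", "where": "repo.example.org"})
registry = GlobalRegistry()
registry.add_prefix("20.5000.1", campus)
state = registry.resolve("20.5000.1/thermostat-42")
print(state["where"])  # repo.example.org
```

In this sketch the top level holds one entry per organization rather than one per object. It therefore stays small even if the local services together hold an enormous number of records, and each organization keeps control of its own records.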
The library and publishing community was first out of the box, because they
were the ones that saw the need for persistent identifiers in the digital
journals they produce. If everything was identified with a URL, let's say, and
you moved it from one machine to another over time, sooner or later those URLs
were not going to work anymore, and they didn't want to have to change all of
those citation indices at the back. So that's the way it is today, and that's
the way things actually work. Now, a digital object will typically incorporate
a work (that's how people thought about it earlier), a work being an
incorporeal creation in the world of copyright, which you have to actually
reduce to a particular form of expression. But it could be something in which
a party has rights or interests, like a contract, or in which there is value.
And
at the end of this particular talk there is a set of references, if we make
these slides available, including the paper on representing value as digital
objects, which I think is one of the first papers that actually talks about
minting a cryptographic string to represent value. You've probably seen that
more recently in the form of all the cryptocurrency stuff and blockchains. I've
given a number of talks on blockchains, which in my view are simply a
particular way of structuring a digital object. So I think this is a context
that applies very broadly. There's a very motivated group of folks looking at
blockchains as something they're particularly interested in, and I think this
is a more general way to think about that problem. So basically any kind of
information that's in digital form can be structured and represented as a
digital object. Now, if
you think about that in some longer-term form, if a piece of information shows
up on your machine, it would be very nice to have some context about it: what
is it, what's its provenance, where did it come from? Having an ability to do
that is really important. We've been playing a role with the Research Data
Alliance, trying to help them understand how to deal with very large research
datasets. If somebody were to give you a terabyte of research data, you're not
going to know what to do with it unless it's more finely structured and you can
go through and see what type of information the next 20 bits, or the next 500
bits, or the next megabyte is.
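To make the point about typing concrete, here is a hypothetical Python sketch. Instead of one opaque blob, the data is a sequence of segments, each tagged with a type identifier. Each type identifier resolves to a description of what the bytes mean and how to decode them. Here a plain dictionary stands in for a type registry. The type identifiers, the registry layout, and the decoders are invented for illustration. They are not an RDA, DOIP, or any other standardized type format.

```python
import struct

# Hypothetical type registry: each type identifier resolves to a
# description of what the bytes mean and how to interpret them.
TYPE_REGISTRY = {
    "20.5000.1/celsius-float32": {
        "meaning": "temperature reading in degrees Celsius",
        "decode": lambda raw: struct.unpack(">f", raw)[0],
    },
    "20.5000.1/utf8-label": {
        "meaning": "human-readable sensor label",
        "decode": lambda raw: raw.decode("utf-8"),
    },
}

# A tiny stand-in for a large dataset: a sequence of (type id, bytes)
# segments instead of one undifferentiated blob.
dataset = [
    ("20.5000.1/utf8-label", "auditorium thermostat".encode("utf-8")),
    ("20.5000.1/celsius-float32", struct.pack(">f", 21.5)),
]


def describe(segments):
    """Resolve each segment's type identifier and decode its bytes."""
    out = []
    for type_id, raw in segments:
        entry = TYPE_REGISTRY[type_id]  # KeyError if the type is unknown
        out.append((entry["meaning"], entry["decode"](raw)))
    return out


for meaning, value in describe(dataset):
    print(f"{meaning}: {value}")
```

Without the registry entry, the four bytes of the second segment would be uninterpretable. With it, a reader who has never seen this dataset can tell what each piece is.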
So these types are important, and those types can be represented as
identifiers as well, and reduced to important information about what the type
itself means. So these identifiers really are kind of the linchpin, just like IP
addresses are the linchpin of today's Internet. In general they can be used to
identify anything that you would like them to identify, but it's all about
those things as represented in digital form. So if I say here's an identifier
for an individual, I really mean that the identifier will resolve to digital
information about the individual that they wanted you to know, like their
public key, or maybe their contact information for the day, or anything else
they chose to make available. It could be about a system, where you want to
verify that you're talking to the right one. It could be about content in
different forms. So all of this is possible, but these identifiers
are the linchpin, and the resolution system is really necessary once you have
this identifier. Our argument has been: don't put semantic information in these
identifiers, because if you only understand Chinese, you're not going to
understand semantics that were put there in English, and you need a resolution
system in general anyway. So put all the semantic information in the resolution
system, or in something that will serve as its equivalent, like a registry or
metadata. We assume, therefore, that every object not only has an identifier,
but that its identifier record has a public key, and that there's a
public/private key pair. So you can validate the systems, the users, and the
content through a PKI exchange: whatever party is trying to do the validation
gives you a nonsense string, call it a nonce, you encrypt it with your private
key, and then they can validate it, since they presumably know who you are as
an identified user and can be sure they have the right public key. Now, this
doesn't solve the problem of vetting the users, because all it's saying is that
this is the person who has the key pair: the public key corresponds to the
private key that that individual had. So in some cases people will verify
identity using security cards issued by trustworthy organizations and the
like. The fact that this produces a PKI really enables a lot of very
interesting things, because people have struggled with how to create a PKI, but
this architecture comes with one fully built in, conceptually. Now, I mentioned
before that
this work came out of some work that we did on mobile programs. We produced a
report called "The World of Knowbots"; this is something that I did with my
colleague Vint Cerf. Because that technology was viewed as potentially
dangerous (people didn't know whose programs would be showing up in a world of
viruses), we extracted the mobility part and produced the equivalent of the
digital object architecture, which could have mobility reintroduced at any
point in time, because a mobile program can be a digital object. But we're
assuming right now that we're not dealing with mobility, just with things in
physical or structural locations within the Internet environment. So what does
this digital object architecture do? Well, first of all, it provides a
conceptual framework for managing information of all kinds, and most people
today don't have a framework for that. I was asked a
question: well, how does this relate to databases? People know what databases
are today, but if you took the information from one database and put it into
another database, you'd lose all the context about that information, like
access control to it, and perhaps its provenance. It would be very nice if the
objects themselves had the ability to self-identify, so that when an object
moved from one place to another, that information all went along with the
object. Whether you use a database to store it or not is immaterial, because
that's just the low-level technology. In this architecture you can put it in
the cloud, you can put it in multiple clouds, you can do anything you like
behind the scenes, but the whole idea is that
using the identifiers, you should be able to get the object, or some part of
the object. The protocols for doing that enable you to deal with the
information that's embedded within these digital objects, and that's something
that I think is going to stand us in good stead going forward: we don't have to
move very large files when you only want to know a small datum, like a
cholesterol reading or a blood pressure reading, from a much larger record.
With that kind of protocol you also get interoperability dealt with right off
the bat. If the main interface to whatever is making these objects available
(I'll get to it later; call it a repository) is based on identifiers, then
every repository, regardless of its storage mechanism, is automatically
interoperable, just like TCP/IP allows for interoperability between computers
of different sorts. This protocol, which is called the Digital Object Interface
Protocol, DOIP (it sounds like VoIP, but it isn't), is all identifier-based,
automatically allows for interoperability, and will persist over the long
term. Now, there are three
components in the architecture, one of which I just mentioned: the repository,
because you're not going to access a digital object if you can't get it from
somewhere. Repositories store the objects and enable access to them based on
identifiers, and on security if necessary. If it's public, then you don't care;
if it's not, then you want to be sure it's only being given to people who have
the right cryptographic validations of themselves. You may not remember an
identifier. If somebody sent it to you by email, fine, you might click on it;
if somebody cited it in a publication, you might click on it. But suppose
you're looking for the laws in the state of Indiana in, okay, we'll go into the
future, 2025 or 2035 (I know 2015 is in the past), and you're looking for a
particular law on a particular topic. Then you need to be able to find out what
the identifier is, so that you can avoid all of that searching. That's what
registries enable, because registries store metadata about the objects, so you
ought to be able to search them. This architecture doesn't define what the
search strategy is, so if somebody comes up with a really good PhD thesis on
how to locate people by photographs, or music by sounds, or whatever, then you
can incorporate that at the front end of the search. But it's the access to the
metadata that would then give you back a list of identifiers for that, or for
things that are closely related to it.
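The registry step can be sketched in the same hedged way. In this minimal Python illustration, a dictionary stands in for a registry of metadata records keyed by identifier. A search over that metadata returns identifiers, not the objects themselves. The identifiers and fields are made up. Exact matching on metadata fields is only a placeholder: as noted above, the architecture does not define the search strategy.

```python
# Hypothetical registry: identifier -> metadata record. The registry holds
# descriptions only; the objects themselves live in repositories.
REGISTRY = {
    "20.5000.7/in-law-2025-001": {
        "title": "Indiana statute on water rights",
        "jurisdiction": "Indiana", "year": 2025, "topic": "water",
    },
    "20.5000.7/in-law-2025-002": {
        "title": "Indiana statute on broadband access",
        "jurisdiction": "Indiana", "year": 2025, "topic": "broadband",
    },
    "20.5000.7/oh-law-2025-001": {
        "title": "Ohio statute on water rights",
        "jurisdiction": "Ohio", "year": 2025, "topic": "water",
    },
}


def search(registry, **criteria):
    """Return identifiers whose metadata matches every given field.

    Exact field matching is a placeholder search strategy; the
    architecture itself does not prescribe one.
    """
    return sorted(
        identifier
        for identifier, meta in registry.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    )


hits = search(REGISTRY, jurisdiction="Indiana", year=2025, topic="water")
print(hits)  # ['20.5000.7/in-law-2025-001']
```

The output is a list of identifiers. In the architecture described here, the client would next take each identifier to the resolution system to learn where the object is and under what terms it can be used.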
So now you've got the identifiers. What's the next step? Presumably you'd go to
the repository to access the information, unless you don't need to; if you
just want to get a public key, maybe you don't need to go to a repository. The
resolution system is the key intermediary. By going to the resolution system,
you say: here's the identifier; what's the state information about this
object? It might say: here are ten places on the Internet you can go, and you
can use normal routing. Or it might say: it's in one particular location,
here's how you authenticate, and here are the terms and conditions for its use.
For most things on the Internet today, you have no clue what you can do with
them when you get them, so sometimes people just do what they think is
reasonable, but in this form you have a way of stating explicitly what you can
and can't do with it. Well, you can think about this resolution
system in a variety of ways. If you put the resolution system in one location,
then you've got to have the information for every object in that location. If
you had a hundred billion repositories and every one of them had a million
entries in it, you'd have somewhere between 10 to the 15th and 10 to the 18th
records, a huge number of records in one location. When we talked to people
about it, they wanted to manage their own. So let's say you had a million
organizations that were in a position to create their own identifiers and
manage their own identifier records, like the catalog for those records, if you
will. As I mentioned, Purdue was one of the early ones that made use of this,
and they still do, but right now it's through the publishers' mechanism, the
DOIs. So which of those million would you ask? We ended up with a two-level
system that I'll describe a little later, but it's in widespread use, and it's
pretty important. There's another effort that's ongoing, for which we've been
working with both NIST and the National Institutes of Health, and that's to
define what it means to be a type; we're doing it through the folks at RDA. If
you've got the ability to define types for your data, what does a type look
like? If you don't have a standard way of saying what a type should look like,
nobody else is going to understand it, unless you have a separate language
above that to try and describe it. So we're trying to come up with a
meta-structure, a meta-description of what a type looks like, but not to define
any particular types. The medical community will have its experts who know how
to do that; in the engineering disciplines it will be different from chemistry
to mechanical to electrical, and they can define their own type structures, and
there may be different ways of doing it. But you will have a way of resolving
it, and then you can potentially see it in different languages, if somebody is
willing to take the time to do that. So within a digital object, every element
of that object, and there could be many elements, is represented as a
type-value pair, and the whole object itself is typed, so you know what type of
entity it is. The types themselves are represented as digital objects, and
that's how, for element X, you can click on the type and find out what type of
element that is. So conceptually that's what it looks like, and here's a little
sketch
I'll just show it; I can't see how it's clicking, but you should have on the
screen, in the upper right-hand corner, repositories; below it, the resolution
service; to the left of that, resource discovery, which is really the metadata
records; and you have a client in the upper left-hand corner. So the client
will go try and discover an identifier, and that'll come back. It'll go to the
identifier resolution service to resolve it, and that'll come back. It might then go
to a repository to get the object, and that'll come back, and then it's
got the data that it wants after some of those interactions. So, as I mentioned,
this work started out with the work that Vint and I did in the late 80s on mobile
programs, but it got elaborated in the early 90s with DARPA funding in
something called the CSTR project, in which we worked with a number of
DARPA-designated universities to actually digitize their computer science
technical reports, the stuff that was in the grey literature. And that was a very
interesting interaction, because we had a lot of discussions about identifiers,
where one university would say, "I want to put semantics into the identifiers,
because I want people to know it comes from my university, and I'll never sell
my publications to another university, particularly my PhD theses or anything
like that." And yet later on they realized the value
of that, and the publishers knew it right from day one, because if you go to a
major publisher, they might take a whole bunch of their collections and want to
sell that collection to another publisher for whatever reason, or they
merged with somebody, and they don't want the trail of semantics going along with
it. So they want to be able to have an identifier that's sort of neutral
with regard to any of the semantics. In 1994 or so, we set up a group of companies
in the United States, at least fifty, might have been 70 or
80, all across the board, and it was an attempt to get them to understand what
the Internet was about, so we might do the same thing for the digital object stuff
once they felt comfortable having a solution that isn't owned by
any one of them. Well, they understood that industry didn't own the
Internet at that point in time, but what was it, and what were other people
thinking? So we had people from semiconductors, computer software,
applications, computers, networking, router builders, newspaper people, financial
people, and we brought them all together, and there was a report that got put out
in 1997. It was called something like "Managing Access to
Digital Information," an approach that was based on digital objects and
stated operations on those objects. Now, if you think about object-oriented
programming, the whole point of object-oriented programming when it
first came out (I followed that very closely) was to insulate the programmers
from all the details of the internal structures of the program, so they didn't have to
worry about setting up arrays and pointers and the like; there were built-in
methods that allowed you to access it. But when you're talking about
intellectual property, or other important things that people care about,
those organizations really wanted to be able to license those interfaces. So if
somebody wanted to do something with that material, they wanted the ability to
have that as a licensing capability. So that's where stated operations came in,
where you could actually indicate what kinds of operations are possible with
the object, for that particular individual or the public at large.
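To make "stated operations" concrete, here is a small Python sketch, assuming each object simply lists which operations each party (or the public) may invoke. The operation names, the `PUBLIC` marker and the `LicensedObject` class are invented for illustration; this is not the interface defined in the 1997 report.

```python
from dataclasses import dataclass, field

PUBLIC = "*"  # hypothetical marker meaning "any party"


@dataclass
class LicensedObject:
    identifier: str
    abstract: str
    body: str
    # Stated operations: party identifier -> set of operations that party may invoke.
    stated_ops: dict = field(default_factory=dict)

    def allowed(self, party, op):
        """True if the operation is granted to this party or to the public at large."""
        return (op in self.stated_ops.get(party, set())
                or op in self.stated_ops.get(PUBLIC, set()))

    def invoke(self, party, op):
        """Run an operation on the object, refusing anything not stated for this party."""
        if not self.allowed(party, op):
            raise PermissionError(f"{party} may not {op} {self.identifier}")
        if op == "read_abstract":
            return self.abstract
        if op == "read_full":
            return self.abstract + " " + self.body
        raise ValueError(f"unknown operation: {op}")


paper = LicensedObject(
    identifier="example/paper.7",
    abstract="Abstract text.",
    body="Full body text.",
    stated_ops={
        PUBLIC: {"read_abstract"},               # anyone may read the abstract
        "example/user.licensee": {"read_full"},  # only the licensee may read everything
    },
)
```

The object's interface is the unit being licensed: the same `invoke` entry point answers differently depending on who is asking.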
When this was first presented, when the world was starting to think
about identity, it actually got the Digital ID World award. I'll show you a picture of
that in a second. So that's what the report looks like. I scanned it in in
landscape form rather than portrait mode, so it probably only shows part of it... no, yeah. I
thought there might be a laser pointer here, but you can see there's a list of
companies there, I think starting with the A's, and it just gets up to the B's, but
altogether there were somewhere between 50 and 80 companies that signed
on to it. There's a reference to that at the end; I commend it to you. And that's
what the Digital ID World award looks like; it's cited at the bottom: digital
object architecture, balancing innovation with reality. Now, the way you
actually interface with the digital object is through this protocol, and the
protocol itself is really pretty simple. It's based on: you give it an identifier,
maybe your own identifier too, and then they can validate whether you should be
able to get it, and they know exactly what it is, and you can penetrate
into the objects and interface with the information itself. None of the other
systems do that. I mean, historically everything about networking was based on
the technology: wires on the ARPANET, machines for IP addresses, files in the
case of URLs on the web. And you know, you don't want to have... (what happened
there? Did I do that? Okay.) So you don't want to have to be asking those kinds of
questions in the future. Imagine somebody coming back and thinking, okay, I
want to get a copy of the law in Indiana from 2025, and it was on a machine called
this back then; let's say that was 100 years before.
Well, that machine isn't going to be around. It doesn't help to tell it what machine it
was on back then; you just want to get it right now. It doesn't help to say what wire
it was connected to, or what machine on the ARPANET; the ARPANET is long
gone, and who knows what networking strategies we'll be using then. You really
want to be able to identify the information and go along with it.
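The client interaction from the slide described earlier (discover an identifier, resolve it, then fetch from whichever repository holds the object) and the point about naming information rather than machines can be illustrated together. This Python sketch assumes plain dictionaries for the discovery, resolution and repository services; all service names, identifiers, and the `fetch` and `move` helpers are hypothetical.

```python
# Hypothetical stand-ins for the three services on the slide.
DISCOVERY = {"indiana law 2025": "example/law.2025"}  # metadata search -> identifier
RESOLUTION = {"example/law.2025": "repo-b"}         # identifier -> repository holding it today
REPOSITORIES = {
    "repo-a": {},
    "repo-b": {"example/law.2025": "Text of the statute"},
}
# Which client identifiers each object will serve (the "validate whether you
# should be able to get it" step).
AUTHORIZED = {"example/law.2025": {"example/user.alice"}}


def discover(query):
    return DISCOVERY[query]


def resolve(identifier):
    return RESOLUTION[identifier]


def retrieve(repo, identifier, client_id):
    if client_id not in AUTHORIZED.get(identifier, set()):
        raise PermissionError(f"{client_id} may not retrieve {identifier}")
    return REPOSITORIES[repo][identifier]


def fetch(query, client_id):
    """Discover -> resolve -> retrieve. The client never names a machine or a path."""
    identifier = discover(query)
    repo = resolve(identifier)
    return retrieve(repo, identifier, client_id)


def move(identifier, src, dst):
    """Relocate an object; only the resolution record changes, never the identifier."""
    REPOSITORIES[dst][identifier] = REPOSITORIES[src].pop(identifier)
    RESOLUTION[identifier] = dst
```

After `move("example/law.2025", "repo-b", "repo-a")`, the same `fetch` call still works, because the only thing that knew about `repo-b` was the resolution record.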
And I think it's really the right way to be thinking about this. We have an
effort that we took to try and describe this, and it
became a standard through the ITU, but mainly at a descriptive level, so it's not a
specification for implementation. But we're about to make that available
standalone from the things that now harness it. But if you look at how it
works, the red part on that slide is supposed to show sort of the front-end
logical processing. It takes these identifiers in,
but all the digital objects are out the back end, so you don't care, from a user
perspective, whether it's on a thumb drive, on a disk drive, on a RAID array,
in a cloud service, or who knows what in the future. And in fact it could be on
multiple cloud services, which we also demonstrated, so that in the
future you can take those objects, port them into any other system, and move them
from cloud service to cloud service, which I think the clouds will eventually
have to support. But they may not want to do it right now, because they may feel
like they're losing customers. But I think it's the right way to do it, and the
minute you have this kind of interface to these repositories, and even
registries, then you automatically get the kind of interoperability that you
get with the Internet when people use TCP/IP protocols. So it's kind of like
the logical equivalent of that. We have a piece of software we put out on
the net because people asked us. They had to download repository code, they had to
download registry code, and put them together and make them work, and they
said, "You know, look, repositories need registries, so we know what's in the
repository (it's like a local index), and guess what, registries need repositories
to store the metadata records. So it's sort of the same set of software; can't
you bundle them together?" Which we did. We put out a piece of software called
Cordra; it's on the cordra.org site. We're about to release a second version
of it with the updated version of the DOIP protocol. There's no charge for it, but
it does base itself on the use of handles, so you need to be able to create
handles and manage handles. Well, I believe Purdue can do that itself,
whether they do it as DOIs or plain-vanilla handles, and virtually
anybody else can, because that's not a profit-making operation that we or
anybody else tends to run. There's an experimental mode where you can try it out, and
there's a regular mode where you can just deal with it, you know, persistently.
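The front-end/back-end split described above, plus the observation that a repository needs a registry-like local index, can be sketched as follows. This is a hypothetical Python illustration, not Cordra's API: the `Storage` interface, the two back ends and `Repository.migrate` are assumptions chosen to show that callers only ever use identifiers while the storage underneath can change.

```python
import hashlib
import pathlib
import tempfile
from abc import ABC, abstractmethod


class Storage(ABC):
    """Back-end interface: any place bytes can live (disk, RAID, a cloud bucket...)."""

    @abstractmethod
    def put(self, key, data): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def delete(self, key): ...


class MemoryStorage(Storage):
    def __init__(self):
        self._data = {}

    def put(self, key, data):
        self._data[key] = data

    def get(self, key):
        return self._data[key]

    def delete(self, key):
        del self._data[key]


class DirectoryStorage(Storage):
    """Stores each object as a file named by a hash of its identifier."""

    def __init__(self, root):
        self.root = pathlib.Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        return self.root / hashlib.sha256(key.encode()).hexdigest()

    def put(self, key, data):
        self._path(key).write_bytes(data)

    def get(self, key):
        return self._path(key).read_bytes()

    def delete(self, key):
        self._path(key).unlink()


class Repository:
    """Front end: callers use identifiers only; the back end is swappable."""

    def __init__(self, storage):
        self.storage = storage
        self.index = {}  # local registry role: identifier -> metadata record

    def create(self, identifier, data, metadata):
        self.storage.put(identifier, data)
        self.index[identifier] = metadata

    def retrieve(self, identifier):
        return self.storage.get(identifier)

    def search(self, keyword):
        return sorted(i for i, m in self.index.items() if keyword in m.get("keywords", []))

    def migrate(self, new_storage):
        """Copy every object to a new back end, switch over, then clean up the old one."""
        for identifier in self.index:
            new_storage.put(identifier, self.storage.get(identifier))
        old, self.storage = self.storage, new_storage
        for identifier in self.index:
            old.delete(identifier)


if __name__ == "__main__":
    repo = Repository(MemoryStorage())
    repo.create("example/obj.1", b"payload", {"keywords": ["law", "indiana"]})
    with tempfile.TemporaryDirectory() as tmp:
        repo.migrate(DirectoryStorage(tmp))
        print(repo.search("law"), repo.retrieve("example/obj.1"))
```

Bundling the index into the repository is the "repositories need registries" point in miniature: search and retrieval share one set of identifiers.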
This reference out of ITU is called X.1255. It came out of a working group on
identity management, so it's couched in terms of discovery of
identity management information. But that's like, you know, something about
email, which can be used for anything, being couched in terms of chemistry needs: "This
is an email protocol for chemistry users," when in fact it's the same protocol for
physics users; it's a protocol for housewives, or whatever it is
that is motivating it. This is a very general framework description, and it's
all based on the digital object architecture, and it was adopted as a
global standard in 2013. Now, metadata is another one of those terms that people
struggle with. If you ask most people what metadata is, they'll probably
say it's data about data, or something like that, but in fact I think of it as
assertions. They can be about identity, like what's the resource called;
provenance, who created it and where was it created; access, what are the access
constraints; protocols; you can have descriptions of the data, various
technical parts of it, what stage in the lifecycle; and there are issues about
structure and representation. Those are just examples.
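Treating metadata as assertions, rather than as a single blob of "data about data," suggests a structure like the following Python sketch. The categories come from the examples just listed; the `Assertion` fields (including who made the claim) and the `MetadataRegistry` methods are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str      # identifier of the resource the assertion is about
    category: str     # e.g. "identity", "provenance", "access", "lifecycle", "structure"
    attribute: str
    value: str
    asserted_by: str  # who makes the claim: metadata is an assertion, not ground truth


class MetadataRegistry:
    def __init__(self):
        self._assertions = []

    def add(self, assertion):
        self._assertions.append(assertion)

    def about(self, subject, category=None):
        """All assertions about one resource, optionally limited to one category."""
        return [a for a in self._assertions
                if a.subject == subject and (category is None or a.category == category)]

    def find(self, attribute, value):
        """Identifiers of resources for which someone asserted attribute == value."""
        return sorted({a.subject for a in self._assertions
                       if a.attribute == attribute and a.value == value})


registry = MetadataRegistry()
registry.add(Assertion("example/obj.1", "identity", "title", "Bridge survey", "example/org.a"))
registry.add(Assertion("example/obj.1", "provenance", "creator", "example/org.a", "example/org.a"))
registry.add(Assertion("example/obj.1", "access", "constraint", "members-only", "example/org.a"))
registry.add(Assertion("example/obj.2", "provenance", "creator", "example/org.a", "example/org.b"))
```

Recording who asserted each statement lets two parties make conflicting claims about the same resource without either overwriting the other.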
That's what metadata is really all about, and a metadata registry will keep that
kind of information, so you can search
all kinds of information, whether you're looking for keywords or
images or whatever, that leads to that. Now let me just say a few words about... I've
said a lot about things, and there are things; let me talk about blocks and
blockchains briefly. You know, blockchains sound like they're new, but the
notion of a block is not really a new item.
Anybody who's dealt with communications knows about block coding.
Anybody who's dealt with deep space communication knows that if you're going to
send information and wait for an acknowledgement and retransmission, you
know, it could be to Mars, where I think the round trip is what, ten to fifteen minutes,
and so there's a lot of latency involved
in doing that. And so what people tend to do is chain blocks; burst-trapping codes,
things like that, have really dealt with that sort of situation. So the ability to
link these blocks together is not new, and of course in the programming field,
linked lists have been around for as long as I can remember, and there are various
ways in which you can hook them up one to the other. But blocks were not usually
managed separately from the applications, though they could be. And so the work on the
blockchain stuff purports to be new, and what's really new about it is sort of
the awareness that people have of the fact that cryptocurrencies can have
value, and that they can exist, and that there are ways to authenticate
them or evaluate them or the like. They don't require, in their view, a
centralized authority, although somebody's got to be able to say how the
cryptographic stuff works, how do you change it, what are the rules and
requirements when you need to take new actions regarding the whole
system. But it's independent of the normal regulatory authorities, which
many people find attractive, whereas many other people are afraid of
it for exactly that reason. And I think it remains to be seen how the regulators
in general will deal with this as it becomes more and more prevalent around
the world. I think there's going to need to be visibility at the level of the
regulators and the communities, and they probably will mandate that in time. But
you don't have to get to a system which causes everything to be replicated and
stored and linked together essentially in perpetuity. One of their big problems
is how to fork a blockchain. Last month I gave a keynote at a
blockchain summit in Australia, and they had some of the best
coders from around the world showing up there, and I said, "Well, what are some of
the problems you're focused on?" One answered, "Well,
techniques for how to fork a blockchain." And so I said, "Oh, really?"
So he said to me, "How do you fork blockchains in the digital object
architecture?" And I said, "Well, we don't have that problem, because we never have
to deal with that particular issue, because it isn't required." And so that
ended up in another long discussion, and he said, "Oh, you're blowing my mind, because
this is not what we're trying to do; we're trying to do it a very different
way." It's a choice that you make how to structure a digital object, and how
you link things together, whether you need to do that or not; that's again a
choice. So I think the idea of using this kind of information and chaining things
together has really come up in lots of different contexts. But just
to link it together: for this information about a block,
what you need to know is that you need to get its identifier to deal with it; it's in
the province of metadata, and it can be self-contained.
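In the digital object view, a block is just an object with an identifier and metadata, and chaining is one way of structuring a larger object out of many such blocks. The Python sketch below assumes SHA-256 over a canonical JSON encoding as the fingerprint and uses hypothetical `example/block.N` identifiers; it shows a generic hash-linked chain, not any particular blockchain's format.

```python
import hashlib
import json


def fingerprint(block):
    """SHA-256 of a canonical JSON encoding of the block (sorted keys)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def make_block(identifier, payload, prev=None):
    """A block is a small digital object that names its predecessor by identifier and fingerprint."""
    return {
        "id": identifier,
        "payload": payload,
        "prev_id": prev["id"] if prev else None,
        "prev_fp": fingerprint(prev) if prev else None,
    }


def verify_chain(blocks):
    """Check that every block points at its real predecessor and that it is unmodified."""
    for prev, cur in zip(blocks, blocks[1:]):
        if cur["prev_id"] != prev["id"] or cur["prev_fp"] != fingerprint(prev):
            return False
    return True


genesis = make_block("example/block.0", "first")
second = make_block("example/block.1", "second", genesis)
third = make_block("example/block.2", "third", second)
chain = [genesis, second, third]
```

Changing the payload of any block other than the newest makes `verify_chain` return `False`, because its successor's stored fingerprint no longer matches.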
I think the amount of metadata can in fact be enormous. But I want to go into
some more general observations and then take any questions, so I'm almost done
here. So I think the context for the blockchain technology has been around
for many years, and I believe every block is an example of a digital object. It
needs to be identified, needs to be understood, needs to be persistently
accessible. A blockchain is a particular way, as I said, of
structuring a digital object that comprises many others. Digital objects
are stored in repositories, and those can be replicated and mirrored, and there are
various ways to cross-check whether these multiple repository entries are
appropriate, and so trust in the system is really something that's inherent to
it. So you can have an object that never changes, and there's a very simple way to
validate that kind of an object without having to have all of this other
material. For example, you can create an identifier for an immutable object that
simply involves putting an appropriate fingerprint of the object, maybe some
length considerations, in the identifier itself. So once you get the
object from the identifier, you can validate whether it meets the
appropriate checksums without having to know about the party that provided it
either. This is all based on trusting the encryption part of the scheme. If it's a
mutable object, obviously it's going to keep changing, so you can't put it in the
identifier if the identifier never changes, but you can get that information
out of the record from a resolution system. I won't go into the details of
how you could do that, but the basic issue here is: do you trust the resolution
system or do you not trust it? And I think this is something that can be
trusted, because ultimately it's up to the party that created the
information to maintain that information, and they could presumably change other
things about that information, which they have no reason to want to do. So if it's
banks, they want to keep the proper information; if it's publishers, they don't
want to change the papers that they published; if it's laws, they want to keep them,
and you trust the parties that created them to maintain them appropriately.
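The immutable-object case described above (a fingerprint and a length carried in the identifier itself) takes only a few lines. This Python sketch assumes SHA-256 as the fingerprint and a hypothetical `prefix/sha256-<hex>-<length>` suffix layout; the talk does not specify either.

```python
import hashlib

PREFIX = "example"  # hypothetical prefix


def immutable_identifier(data: bytes) -> str:
    """Build an identifier whose suffix carries the SHA-256 fingerprint and byte length."""
    return f"{PREFIX}/sha256-{hashlib.sha256(data).hexdigest()}-{len(data)}"


def verify(identifier: str, data: bytes) -> bool:
    """Check bytes fetched from any source, trusted or not, against the identifier alone."""
    _prefix, suffix = identifier.split("/", 1)
    algorithm, digest, length = suffix.split("-")
    if algorithm != "sha256":
        raise ValueError(f"unsupported fingerprint algorithm: {algorithm}")
    return len(data) == int(length) and hashlib.sha256(data).hexdigest() == digest


if __name__ == "__main__":
    ident = immutable_identifier(b"hello")
    print(ident)
    print(verify(ident, b"hello"), verify(ident, b"hellO"))
```

Mutable objects cannot work this way, since the fingerprint would change with every edit; as the talk says, that information has to come from the resolution record instead.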
That's what I tend to talk a lot about when it comes to blockchain stuff. So let me say a few
things about this two-stage resolution thing. The way we create identifiers is
by giving an organization that wants to create them a number, so typically a
prefix; that's a dotted prefix derived from a credential. So, you know, in
the past we would say, okay, well, Purdue can have, I don't know, 1015, and you
create 1015/whatever you like. So you can have any identifier system
you now use, and that identifier system can still be used: it could be Social
Security numbers, driver's licenses, license plates; it could be
cryptographic strings; it could be whatever. It could even have semantics if
you wanted, although we don't recommend that. And then that system allows an
individual organization to create their local records, and so what you need to do
is to get to their local records to find out what's going on, and it's under their
control and management. This is inherently a very distributed system,
and those local services themselves can be mirrored for reliability and security
as desired. So there's a picture showing it conceptually. This is a really
simple system: you go into the system and you get back this handle record, and that
you interpret to figure out what to do next. I'm going to step you through
this very quickly, just to give you a feeling for it. It's like trying to describe
a router to somebody: you can say, simply, you know, it takes packets in, puts packets
out, participates in some routing protocol, but people spend a lot of
time figuring out how to actually implement it. I could describe an
operating system pretty simply, but the details can be pretty complex. So I'm
not going to go through every step here, but there's the little system. There's
a global registry that contains the prefixes, so there they are, and here are
various services that are available. These are basically, in today's world, run
by different parties around the globe. The Global Handle Registry is run by a
foundation that was set up in Geneva to make this attractive to organizations
and companies around the globe that did not want to rely totally on a
U.S.-developed or U.S.-managed capability. So those services
themselves are run in different places, and you can go to any one of them if
you want to resolve an identifier, and each one of those can be implemented in
different ways: they can have some basic services and they can
have replications, and every one of them can be implemented differently. So here's
one with one, two, three, four, five... no, it's got n servers, I guess.
Here's another one that's got a single server; here's one that's
got two. They can be supercomputers, workstations, whatever, but they're all
distributed around the place. And so the client goes into the global registry, and the
global registry will say, "Okay, here's the information you need: you need to go to
that guy," and that guy will get you there; you switch to there, and back
comes the information. You get some kind of an appropriate record, and you're done.
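The two-stage lookup walked through above (global registry for the prefix, then the local service that owns it, possibly one of several mirrors) can be sketched in Python with dictionaries. The prefix `1015` is the talk's own example; the service name, the license-plate-style suffix, the record contents and the mirror-fallback logic are hypothetical, and the real Handle System protocol is considerably richer.

```python
# Level 1: the global registry only knows which local service owns each prefix.
GLOBAL_REGISTRY = {"1015": "example-local-service"}

# Level 2: each local service may run several mirrors; None simulates a mirror that is down.
LOCAL_SERVICES = {
    "example-local-service": [
        None,
        {"1015/IN-ABC123": [(1, "URL", "https://repo.example.edu/objects/IN-ABC123")]},
    ],
}


def split_handle(handle):
    """Split 'prefix/suffix' at the first slash; the suffix may itself contain slashes."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix identifier: {handle!r}")
    return prefix, suffix


def resolve(handle):
    """Stage 1: ask the global registry; stage 2: ask a mirror of the local service."""
    prefix, _suffix = split_handle(handle)
    service = GLOBAL_REGISTRY[prefix]
    for mirror in LOCAL_SERVICES[service]:
        if mirror is None:  # unreachable mirror: try the next one
            continue
        if handle in mirror:
            return mirror[handle]
    raise KeyError(handle)
```

The global registry never sees `1015/IN-ABC123` itself; it only knows where prefix `1015` is served, which is what keeps the local records under the owning organization's control.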
Internally, it's been elaborated over some 20-odd years. It's pretty
interesting. The software is all available publicly; you can download it.
The only thing you need to do to make use of it, except on an experimental
basis, is get registered in this global registry. So that was a long discussion
among people from government, private industry, and academia, and we've been
running that for 20-some-odd years, and we set up the DONA Foundation in
Geneva and handed over that responsibility to the foundation, so
that's now run out of Geneva. What it does is provide coordination, some
software, and other strategic services for the development and evolution of
the digital object architecture, and it works with different groups on its
application, and it has as a mission to promote interoperability between
different kinds of information systems. So it could be a weather system and a
health system and a transportation system, a banking system and an insurance
system and so forth. They may have defined them any way they like, but this provides
a uniform way of interfacing between them. This X.1255 is something that a
lot of them use, because it is a standard that is now
adopted globally, but at a very high level. That standard supports the core DO
architecture standards, and the foundation kind of manages their
evolution going forward, and they provide overall administration of this Handle
System, which is a particular implementation of the identifier
resolution service described in the architecture. So provision of the GHR
services comes from an administration that is distributed, with multiple
administrators around the globe. So it's like, you know, an organization
like the FAA that's managing air traffic, but they're not running the
airplanes, and the equivalent of running airplanes is done by these
administrators. There are currently about eight of them; we hope to
get to twelve very shortly. It's got a very distinguished board, from around the
globe, that's administering this, and what they do is give
credentials to the administrators, and then the administrators issue prefixes based on
their credentials. Just to show you what physically happens: okay, here is a
collection of global records, and they're identical; every one of the
administrators keeps a copy of them. But these are only the very high-level ones.
If you give lower-level descriptions, they won't show up in the global
registry; you have to go ask the local parties. But let's say here's CNRI, one
of the administrators, and there's some party that gets a
prefix from us: we put it in the GHR records, and then we propagate it to all of the
other administrators around the globe. The security in this system is
particularly interesting. Here's another client; we take that in and
propagate it. Another client, we do the same. So here's another one: this is the DOI
Foundation, the organization set up by the
publishers with the DOI, so they can do the same for the different
registration agencies: put the information in, propagate that. Here's another one:
this is GWDG, dealing with big data and research data in Europe; I think it was
originally sponsored by the Max Planck Society, and so they
have different organizations they work with, actually around the globe; they'll do the
same thing. And so there's another one, and so forth. DONA itself, the foundation,
puts in certain information pertaining to security, and this whole system
basically has been operating now reliably for almost three years, and it
really solves a lot of the problems of building a big distributed database,
where what happens in the middle is all the interconnections between the parties.
You know, we've thought about using blockchain for that, but decided we
didn't need to, because this was as effective and, you know, much more
efficient. There's a lot of fostering of community interests; I just mentioned a
few things we work with: IoT, big data, authentication, interoperability. But the
foundation is right at the center of the coordination; it doesn't do the work.
The work is done by other parties. And I think, you know, we're going to see the
Internet really dealing with increased complexity elsewhere. This is one attempt
to deal with a fundamental problem of information management.
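The replicated global registry described above (one identical copy of the prefix records at every administrator, with each new allocation propagated to all of them) can be modeled in a few lines. This Python sketch is an assumption-laden illustration: the record fields, the `allocate_prefix` method and the digest comparison used to cross-check copies are invented here and are not a description of how the real GHR administrators synchronize. The administrator names are the ones mentioned in the talk, used only as labels.

```python
import hashlib
import json


class Administrator:
    """One administrator's full copy of the global prefix records."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def digest(self):
        """Fingerprint of this copy, used to cross-check it against the others."""
        return hashlib.sha256(json.dumps(self.records, sort_keys=True).encode()).hexdigest()


class GlobalRegistry:
    def __init__(self, administrators):
        self.administrators = administrators

    def allocate_prefix(self, issuer, prefix, service):
        """An administrator issues a prefix; the record is propagated to every copy."""
        if issuer not in self.administrators:
            raise ValueError(f"{issuer.name} is not a credentialed administrator")
        if prefix in issuer.records:
            raise ValueError(f"prefix {prefix} already allocated")
        record = {"service": service, "issued_by": issuer.name}
        for admin in self.administrators:
            admin.records[prefix] = dict(record)  # each copy is independent

    def consistent(self):
        """True when every administrator holds identical records."""
        return len({admin.digest() for admin in self.administrators}) == 1


admins = [Administrator("CNRI"), Administrator("DOI Foundation"), Administrator("GWDG")]
ghr = GlobalRegistry(admins)
ghr.allocate_prefix(admins[0], "1015", "example-local-service")
```

Because every copy is complete, any administrator can answer a prefix query, and a copy that drifts from the others shows up immediately as a digest mismatch.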
I think trust in the system is important. I think we'll eventually see this mobile
program technology show up again, but the need to protect these rights and values
and interests, coupled with the sheer volume of information, is really
something that requires this new paradigm. So I think this digital object
architecture is really important and can do the job, and there are a lot of other
things in progress; I'm not going to go through every one of them. But we know
that things are going to grow in many different dimensions. We're going to have
growth in terms of the number of objects, the actual amount of information, the
need to rely on it, the need to have it persist. That's going to stress almost every
part of the Internet as we now know it, and so if we don't have a good
architecture for dealing with it, I think we're going to have trouble going forward.
But I think this can also benefit every organization that is willing to make the
investment in managing its own information, because it will stand them in
good stead going forward. So that's the last slide I had in this slide packet. If
you take a look at it, you will see there are a bunch of articles in the back that
I commend to you, that have to do with things that we have done or been
involved with, to try and explain some of this technology, and I think you'll find
them interesting reading in their own right. So I think I will stop there. Thank you
very much for your attention. I hope you found this interesting,
and that nobody will ask me why we needed [inaudible]. So do you want to take
some questions? ...so that people streaming it will also hear the questions. You mentioned
digital object identifiers, which sort of work very well if things are
permanent, but my question is: can we afford to remember everything as we get
more and more data being generated by more and more devices? Can we compress
this data? Can we afford to forget some of this data? Well, there are two parts to
it. One, you could be talking about the identifiers, but the information itself,
that's a policy matter, not a technology matter. You know, whatever the
policy is, you'll probably find a technology approach to managing it,
within limits, of course. I think that the fact that there is so much
information is challenging to some people who want to keep it around
forever, and other people want to forget it. If you talk to the lawyers,
they'll probably say get rid of that stuff after a while, because you never know what
the downside might be of keeping it around. I come from a family where we
never threw anything away, so I'm inclined to want to keep everything, but
that's not because the infrastructure requires it; it's just because it might
be an interesting artifact about your life. I mean, there were some groups that
were trying to develop, you know, long histories of people; they wanted to be
able to create, you know, the life log of people, and you only wanted it
for the people who people would care about in the future. Well, who are
they? How do you know about them when they're two years old? So you keep a life
log of everybody, and then you can decide which ones you want to curate and
which not. Probably every family will have some interest in keeping their own
family archives for as long as they can afford it.
I didn't think this was a shy crowd. So how do you envision the interaction
between DOIs and the Domain Name System? Is there any interaction at all? Well,
we use domain names all the time, I mean, because some people today have wanted to
keep all their information on websites, and so they give URLs, but they put them in the
handle record, so if they move it from place to place they don't have to change
things. Of course, if you're moving it from place to place and not
changing the domain name, then you don't need to change anything in the
records, because the domain name does that for you. And to get back to the
question that was just asked a moment before, I mean, I was the one that put in
place the transition to the Domain Name System, and we did that so people
wouldn't have to remember all the IDs; they could remember a simpler way of
doing it. I don't expect people to remember IDs at all. I mean, this is a
bigger problem than remembering IP addresses, but that's where registries
come in. And much of what the normal... I'd say the average user of this
kind of capability will want to do is take a particular identifier that's
shown up (it's in a journal that they got, it's in a paper that they read, it's in,
you know, something that they actually have tangibly in hand) and want
to go follow it through to get the actual information, or whatever it is
about it that they care about. So that would be the typical operation.
People who would be more interested in delving deeper will be the research
community, looking for things from the distant past. Or you're, you know,
a builder or developer, and you're trying to put an addition onto a building
that was built 50 years ago, and you want to get the plans for the building, you want to
know what laws applied then or now, and you want to put that all together, and
you'd rather not make it a research project if you can avoid it. And I think
this technology, if it's managed properly, will avoid that. Everybody's going to have
it. I mean, how many buildings do you have here on campus where, you know, people
decide, well, we need an extension to the building, and you need to go back and
find out what was in the original building and the like? And what people
are now thinking about (we've actually built some systems like this) is that they
want to know everything that's in the building, and when I say everything, I
mean not only the steel in the building and the pipes and the walls, but the
carpet on the floors, and what paint was used, and where did you get the
HVAC system, what about the handles on the doors; I mean everything in the
building. And you can easily imagine that can be managed as a digital object thing:
if it's created when the building is created, you can find out everything
about the building, including the plans of the building and the approvals and
the codes and all of that that apply. So this is broadly applicable, and that's
just one example.
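The answer about DOIs and the Domain Name System describes two layers of indirection: the handle record holds a URL, and the host name inside that URL is still resolved by DNS. A hypothetical Python sketch of that split, with a made-up handle, made-up hosts and documentation-range IP addresses, shows which layer has to change for each kind of move.

```python
from urllib.parse import urlparse

# Hypothetical DNS zone (addresses from the RFC 5737 documentation ranges).
DNS = {"www.example.org": "192.0.2.10"}

# Hypothetical handle records: lists of (index, type, data) values.
HANDLES = {"example/report.1": [(1, "URL", "https://www.example.org/reports/1")]}


def resolve(handle):
    """Handle -> URL via the handle record, then host -> address via DNS."""
    url = next(data for _index, kind, data in HANDLES[handle] if kind == "URL")
    return url, DNS[urlparse(url).hostname]


def set_url(handle, new_url):
    """Update the URL value in the record; the handle that people cite stays the same."""
    HANDLES[handle] = [(i, kind, new_url if kind == "URL" else data)
                       for i, kind, data in HANDLES[handle]]
```

Moving the site to a new server under the same domain name only means updating `DNS["www.example.org"]`; moving it to a new domain means calling `set_url`, and in neither case does anyone holding `example/report.1` need to change anything.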
So, do you have connections with the Samvera community, with the Fedora open
repository, where, you know, it would enable preservation of digital objects?
And the other thing is, like, DNA computing and quantum computing: given
that you want to store all of these digital objects, do you have an endeavor
along the lines of, you know, having these innovative means of storage? So, I
mean, you could ask about the Internet in general, for applications of all kinds:
have you figured that out? We tried to keep this infrastructure at the
minimum level, so that when people have their applications they could build on
top of it. So the short answer is: we could, other people could, for any
particular example you give, but because we didn't want to tackle every possible
example, including ones we couldn't think of, obviously we haven't tried to do
that. And you mentioned Fedora in particular; let me tell you about the
history of how that came about. When we built the very first of the repositories
I just described, we built it in C (back then that was the language we used), and
what we wanted to do was fund somebody else to build an equivalent, to
demonstrate that the repository access protocol which we were using at the time
would enable another repository, done completely differently, to interoperate.
And so we funded Cornell to build that version in Java, and they called that
repository Fedora. So it really came out of our work. It was
originally compatible; we haven't followed it, because they took their own
path, but there was a very close synergy there when Carl Lagoze and Sandy
Payette were there; that was the background for that. Well, thank you. I
think that's all the time we have for questions, because we have an event
coming up next. The event is at 3 o'clock: we're going to have a panel, and it's
on the Internet, present and future: policy and technology issues. It's in the WALC
building, the Active Learning Center, room
3154. So that event starts at 3 o'clock, and we have about 10 minutes for transition.
So thank you all very much; please join me