Overview
HTTP REST API
The shared records servlet will expose a set of HTTP REST APIs designed to closely mirror those provided by Amazon's S3 web services. Modeling our APIs on Amazon's should make it straightforward to switch between calling our own services and calling Amazon S3 directly. Additionally, developers who are familiar with Amazon's S3 APIs should have an easy time designing clients to work against the shared records service.
Interoperation Between the Shared Records Client, Server, and S3
The shared records client will have two operation modes: local and remote.
In local mode the client will store records locally on the client's computer (LocalDataStore). Records and other data will be stored locally in the same directory structure that is used on servers (either a shared records server or S3), so that synchronizing a client's local data store with a remote server is an easy operation.
In remote mode the client will run against our own shared records server (SharedRecordsRemoteDataStore). The server itself can also store data locally (on the server) or remotely (either against another shared records server or S3). A client running against a shared records server should not need to know the details of where the records are stored, as all calls to manipulate records will go to the shared records server. Again, when the server is storing data locally it will mirror the file and directory structure that will be used in the local clients and in S3, making integration seamless between the various layers.
In the future the client may also be configured to run directly against S3 (S3RemoteDataStore).
The rest of this page describes the various operations, the HTTP REST APIs that will be used to perform them in the shared records server, and whether these calls could be made directly to S3, bypassing the shared records server altogether.
Storage Model
All storage of records and metadata will be relative to a shared records root directory (referred to from here on as [Root]). This could be a regular directory on a Windows or Linux file system, or it could be a bucket in S3.
Records will be stored directly in the [Root]/ directory, as recordUID.data (where recordUID is the unique record ID generated from the hash of the bytes of the record). Additionally, in our own data stores we will have to store some extra information to capture what S3 allows to be carried in its headers (the word metadata is not used for this because it is a separate concept in shared records). This information includes content type, content disposition, and any S3 metadata tags (including the x-amz-meta-record-content-type tag discussed below). See the following two sections for more details.
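As a minimal sketch of the record layout described above, the following Python derives a record UID and its storage path. The hash algorithm and the hex encoding are assumptions (this page does not fix them; MD5 is used only because it is proposed under Record Deposit), and the function names record_uid, record_path, and store_record are hypothetical.

```python
import hashlib
from pathlib import Path


def record_uid(data: bytes, algorithm: str = "md5") -> str:
    """Derive a record UID from the record bytes (assumption: hex digest)."""
    return hashlib.new(algorithm, data).hexdigest()


def record_path(root: Path, uid: str) -> Path:
    """Records live directly under [Root] as recordUID.data."""
    return root / f"{uid}.data"


def store_record(root: Path, data: bytes) -> Path:
    """Write the record bytes to [Root]/recordUID.data and return the path."""
    root.mkdir(parents=True, exist_ok=True)
    path = record_path(root, record_uid(data))
    path.write_bytes(data)
    return path
```

Because the path depends only on the record bytes, a local store and a remote store that use the same hash would place the same record under the same name.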
Record metadata will be stored in individual files for each entry. The location and naming convention of metadata files is: [Root]/recordUID/log/title_integerCounter_rollingChecksum. "RecordUID" is the id the metadata is associated with, "title" is the title of the metadata entry, "integerCounter" is a running count of the total number of entries associated with this record and "rollingChecksum" is a checksum based on the contents of this entry and previous entries. Only the body of the metadata entry that is passed in is stored in the file. The other attributes (including modification time, sequence number, content type, title, previous metadata entry, and tags) will be maintained separately. Locally, this is done by storing them in a separate file at [Root]/recordUID/log/title_integerCounter_rollingChecksum_headers. In S3 it is accomplished by converting the attributes to Amazon custom headers and letting S3 handle them.
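The naming convention above can be sketched as a pair of helpers that build and parse entry names. This is only an illustration: the function names are hypothetical, the checksum is treated as an opaque string, and parsing assumes that neither the counter nor the checksum contains an underscore (titles may).

```python
HEADERS_SUFFIX = "_headers"


def entry_key(record_uid: str, title: str, counter: int, checksum: str) -> str:
    """Key of a metadata entry body, relative to [Root]."""
    return f"{record_uid}/log/{title}_{counter}_{checksum}"


def headers_key(record_uid: str, title: str, counter: int, checksum: str) -> str:
    """Key of the companion file holding the entry's other attributes."""
    return entry_key(record_uid, title, counter, checksum) + HEADERS_SUFFIX


def parse_entry_name(name: str) -> tuple:
    """Split 'title_integerCounter_rollingChecksum' into its three parts.

    Splitting from the right keeps underscores inside the title intact.
    """
    if name.endswith(HEADERS_SUFFIX):
        name = name[: -len(HEADERS_SUFFIX)]
    title, counter, checksum = name.rsplit("_", 2)
    return title, int(counter), checksum
```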
Within each metadata entry's headers, there will be a reference to the previous metadata entry associated with the record, as well as to the previous entry with the same title (assuming that these exist). This allows the entire series of metadata to be reconstructed by walking back from the last entry, and allows a revision history of all entries sharing a title to be reconstructed without touching all of the record's metadata.
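To illustrate how these back-references allow reconstruction, here is a small sketch that walks a chain of entries from the last one. It assumes the headers have already been loaded into a dictionary keyed by entry name. The "previousEntry" key follows the JSON tag in the table below; "previousSameTitle" is a hypothetical name for the same-title link, which this page does not name.

```python
from typing import Dict, List, Optional


def walk_back(headers_by_name: Dict[str, dict], last_name: str,
              link: str = "previousEntry") -> List[str]:
    """Follow `link` references from the last entry back to the first.

    Returns entry names in oldest-to-newest order.
    """
    chain: List[str] = []
    seen = set()
    current: Optional[str] = last_name
    while current is not None:
        if current in seen:
            # Guard against a corrupted log that links back onto itself.
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        chain.append(current)
        current = headers_by_name[current].get(link)
    chain.reverse()
    return chain
```

Calling it with the same-title link returns only the revision history for that title, without visiting entries that carry other titles.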
In addition to the individual metadata entry files, there will also be one additional metadata file associated with each record. This file will contain a reference to the first and last metadata entry associated with the record, as well as the most recent entry for each unique metadata title. This file is stored in XML format as [Root]/recordUID_log.xml. (Replicating this information in a .json file was removed because it creates concurrency issues.)
Supported Metadata Attributes and Corresponding S3 Headers
The following table lists the supported metadata attributes and their corresponding XML, JSON, and S3 representations. (These values are flexible and should be reviewed.)
| Name / Description | XML Tag | JSON Tag | S3 Tag/Header |
| Associated Record UID | RecordUID | sharedRecords.recordUID | x-amz-meta-record-uid |
| Title | Title | title | x-amz-meta-title |
| User ID | UserID | modifier | x-amz-meta-user-id |
| Content Type | ContentType | contentType | Content-Type |
| Previous Metadata Entry | PreviousEntryFileName | previousEntry | x-amz-meta-prev-entry |
| Tags | Tags->Tag | tags (JSON Array) | x-amz-meta-tags |
| Timestamp (Server Defined) | TimeStamp | serverTimeStamp | x-amz-meta-server-time-stamp |
| Timestamp (User Defined) | UserTimeStamp | modified | x-amz-meta-user-time-stamp |
| Sequence Number | SequenceNumber | sharedRecords.sequenceNumber | x-amz-meta-seq-num |
| Signature | Signature | signature | x-amz-meta-signature |
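The table can be read as a lookup from one representation to another. The sketch below converts a metadata entry keyed by the JSON tags into S3 headers. Two details are assumptions rather than part of the design: the dotted JSON tags are treated as flat dictionary keys, and the tags array is flattened into a single comma-separated header value.

```python
# JSON tag -> S3 tag/header, copied from the table above.
JSON_TO_S3_HEADER = {
    "sharedRecords.recordUID": "x-amz-meta-record-uid",
    "title": "x-amz-meta-title",
    "modifier": "x-amz-meta-user-id",
    "contentType": "Content-Type",
    "previousEntry": "x-amz-meta-prev-entry",
    "tags": "x-amz-meta-tags",
    "serverTimeStamp": "x-amz-meta-server-time-stamp",
    "modified": "x-amz-meta-user-time-stamp",
    "sharedRecords.sequenceNumber": "x-amz-meta-seq-num",
    "signature": "x-amz-meta-signature",
}


def to_s3_headers(entry: dict) -> dict:
    """Translate metadata attributes (JSON tags) into S3 request headers."""
    headers = {}
    for key, value in entry.items():
        header = JSON_TO_S3_HEADER.get(key)
        if header is None:
            raise KeyError(f"unsupported metadata attribute: {key}")
        if isinstance(value, (list, tuple)):
            # Assumption: list values (tags) are joined with commas.
            value = ",".join(str(v) for v in value)
        headers[header] = str(value)
    return headers
```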
Individual Operations
Record Deposit
Current APIs: CreateRecordServlet and CreateAnonymousRecordServlet
The following URL is used for depositing a record to the shared records server. The record will be initially created anonymously.
PUT http://www.sharedrecords.org/recordUID.data
Request Headers (this list is supported by Amazon. We should see which of these we will support. See this page for more details):
- Cache-Control: Can be used to specify caching behavior along the request/reply chain. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.
- Content-Type: A standard MIME type describing the format of the contents. If none is provided, the default is binary/octet-stream. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17.
- For encrypted records the content type should always be binary/octet-stream. For unencrypted records we will store this type and honor it when the record is returned to the client.
- Content-Length: The size of the object, in bytes. This is required. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13.
- Content-MD5: An MD-5 hash of the object's value. This is used to verify integrity of the object in transport. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.15.
- We should switch our implementation to use MD5. Then this value should simply be the record ID that is passed in.
- Content-Disposition: Specifies presentational information for the object. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.5.1.
- Content-Encoding: Specifies what content codings have been applied to the object and thus what decoding mechanisms must be applied in order to obtain the media-type referenced by the Content-Type header field. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11.
- Expires: Gives the date/time after which the response is considered stale. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21.
- x-amz-meta-: Any header starting with this prefix is considered user metadata. It will be stored with the object and returned when you retrieve the object. The total size of the HTTP request, not including the body, must be less than 4 KB.
- The shared records server will use "x-amz-meta-record-content-type" to specify the original content type of an encrypted record. The client will use this information to process the record after it has been decrypted.
Body:
- Data payload - The raw bytes of the (potentially encrypted) record.
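A client-side deposit request built from the headers above might look like the following sketch. It only constructs the request and does not send it. The function name and parameters are hypothetical, and the record UID is passed in by the caller so that no particular hash is implied. Content-MD5 is encoded as the base64 of the MD5 digest, which is the standard form of that header.

```python
import base64
import hashlib
import urllib.request
from typing import Optional


def build_deposit_request(base_url: str, record_uid: str, data: bytes,
                          content_type: str = "binary/octet-stream",
                          original_content_type: Optional[str] = None
                          ) -> urllib.request.Request:
    """Build (but do not send) a PUT for [base_url]/recordUID.data."""
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(data)),
        # Standard Content-MD5 form: base64 of the raw 16-byte digest.
        "Content-MD5": base64.b64encode(hashlib.md5(data).digest()).decode("ascii"),
    }
    if original_content_type is not None:
        # Original type of an encrypted record, for use after decryption.
        headers["x-amz-meta-record-content-type"] = original_content_type
    url = f"{base_url.rstrip('/')}/{record_uid}.data"
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

Passing the result to urllib.request.urlopen would send it; authentication is left out because it is still an open question on this page.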
Initially, all record deposits should come through the shared records server. This is so that we can perform server-side validation of the record ID, which will be the hash of the record bytes. If validation does not succeed then the server should return an error and the record should not be stored. (NOTE: I think we can accomplish this in S3 as well if we switch to MD5, and force the client to pass the record ID in the Content-MD5 header. Obviously we can't control custom clients running directly against S3.)
When this call is made to the shared records server, the following steps should be taken:
- Check client authentication (NOTE: how is this step done?)
- Validate record ID against the hash of the data received
- Store the record
- If the data store is local to the server the record will be deposited as RecordUID.data in the top level data-store on the local file system. We will also need to have a local storage scheme to maintain the header information and the x-amz-meta-record-content-type tag.
- If the data store is remote (e.g. S3), the request will be re-sent to the remote store. (NOTE: Can we just use forwarding here?)
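The server-side steps above can be sketched as follows for a local data store. Several details are assumptions: authentication is reduced to a boolean because the mechanism is undecided, the record ID is taken to be the hex MD5 of the body (as proposed earlier in this section), the status codes are illustrative, and the store is a plain dictionary standing in for the file system.

```python
import hashlib

# Headers a local store needs to keep so it can answer like S3 would.
KEPT_HEADERS = ("content-type", "content-disposition")


def handle_deposit(record_uid: str, body: bytes, headers: dict,
                   store: dict, authenticated: bool = True) -> int:
    """Validate and store a record; returns an HTTP-style status code."""
    # Step 1: check client authentication (mechanism not yet defined).
    if not authenticated:
        return 403
    # Step 2: validate the record ID against the hash of the data received.
    if hashlib.md5(body).hexdigest() != record_uid.lower():
        return 400
    # Step 3: store the record plus the header information to preserve.
    kept = {
        name: value for name, value in headers.items()
        if name.lower() in KEPT_HEADERS or name.lower().startswith("x-amz-meta-")
    }
    store[f"{record_uid}.data"] = (body, kept)
    return 200
```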
Record retrieval
Current API: GetEncryptedRecordBytesServlet
The following URL is used for retrieving a record.
GET http://www.sharedrecords.org/recordUID.data
Request Headers: (this list is supported by Amazon. We should see which of these we will support.)
- If-Modified-Since: Return the object only if it has been modified since the specified time, otherwise return a 304 (not modified).
- If-Unmodified-Since: Return the object only if it has not been modified since the specified time, otherwise return a 412 (precondition failed).
- If-Match: Return the object only if its entity tag (ETag) is the same as the one specified, otherwise return a 412 (precondition failed).
- If-None-Match: Return the object only if its entity tag (ETag) is different from the one specified, otherwise return a 304 (not modified).
- Range: Return only the bytes of the object in the specified range.
Response Headers: (this list is supported by Amazon. We should see which of these we will support.)
- Content-Type: This is set to the same value you specified in the corresponding header when the data was PUT. The default content type is binary/octet-stream.
- Content-Disposition: This is set to the same value you specified in the corresponding header when the data was PUT. If no Content-Disposition was specified at the time of PUT then this header is not returned.
- Content-Range: This indicates the range of bytes returned in the event that you requested a subset of the object by setting the Range request header.
- x-amz-meta-: If you supplied user metadata when you PUT the object, that metadata is returned in one or more response headers prefixed with x-amz-meta- and with the suffix name that you provided on storage. This metadata is simply returned verbatim; it is not interpreted by S3.
- x-amz-missing-meta: This is set to the number of metadata entries not returned in x-amz-meta headers. This can happen if you create metadata using an API like SOAP that supports more flexible metadata than the REST API. For example, using SOAP, you can create metadata whose values are not legal HTTP headers.
- This doesn't seem to apply to shared records but is left here for completeness.
Record retrieval should be the same operation, whether the record is retrieved from the shared records server or from S3. The only difference in the two calls is the server URL and potentially the authentication mechanism. We should be able to process GET operations on a shared records server using an S3 store by simply forwarding the request to S3. In the authentication scheme we should indicate in the forward somehow that the URL to S3 is only valid for a fixed amount of time.
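If the shared records server serves GETs from a local store rather than forwarding them, it would need to evaluate the conditional request headers listed above itself. The following is a simplified sketch of that evaluation, assuming exact ETag comparison (no wildcard or weak-tag handling) and a timezone-aware last-modified time. The function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(etag: str, last_modified: datetime, headers: dict) -> int:
    """Return 200, 304, or 412 for a GET with conditional headers.

    `last_modified` must be timezone-aware so it can be compared with
    the parsed HTTP dates.
    """
    if "If-Match" in headers and headers["If-Match"] != etag:
        return 412  # precondition failed
    if "If-Unmodified-Since" in headers:
        if last_modified > parsedate_to_datetime(headers["If-Unmodified-Since"]):
            return 412  # precondition failed
    if "If-None-Match" in headers and headers["If-None-Match"] == etag:
        return 304  # not modified
    if "If-Modified-Since" in headers:
        if last_modified <= parsedate_to_datetime(headers["If-Modified-Since"]):
            return 304  # not modified
    return 200
```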
Verify Record
Current API: VerifyRecordExistsServlet
The following URL is used to verify the existence of a record.
HEAD http://www.sharedrecords.org/recordUID.data
The request and response headers for this call are exactly the same as for the GET record API. The only difference is that the record body is not returned in the response, just the headers. The response will be 404 Not Found if the record doesn't exist.
Meta Data Addition
Current APIs: AddJsonMetaDataEntryServlet and AddMetaDataEntryServlet
The following URL is used to add metadata to a record.
POST http://www.sharedrecords.org/recordUID_log
The request must contain the following parameters:
- max-sequence-number: the maximum sequence number our client knew about at the time of the request. This will allow the server to throw an error if the information is out of date.
- format: "binary", "json", or "xml". The format that the entries are being passed in as. If no format is specified we assume it is a single binary entry.
The body of the request should be the meta data entries (an array of metadata entries to add in the specified format, or a single entry's body if binary format is specified).
This call must be made directly to a shared records server (as S3 doesn't allow POSTs). When the call arrives at our server the following operations will be performed:
- Verify record existence
- Verify that the passed-in max sequence number matches the sequence number of the latest metadata entry associated with the record.
- Create the meta data entries as [Root]/recordUID/log/title_integerCounter_rollingChecksum
- The first entry created should reference the metadata entry associated with the max sequence number (the last entry)
- Each additional entry should reference the previously created entry.
- This step should also create the corresponding _headers file if it is a shared records data store, otherwise the information will be converted to x-amz-meta- tags and sent to Amazon, as described below.
- Finally the xml file associated with this record's metadata must be updated to include the entries added.
Notes:
- With an S3 data store this whole operation will be a GET on the xml file (to get the initial sequence number), then a PUT for each entry with appropriate headers, then a GET and a PUT on the xml file.
- If two operations begin at the same (or a close) time, all the entries will be created, but only one of the operations will succeed (get written to the xml file).
- If we aren't using an S3 data store we can perform a similar operation locally.
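The sequence-number check above is a form of optimistic concurrency control. The in-memory sketch below shows the idea under stated assumptions: the sequence number is taken to equal the entry counter, the rolling checksum is a truncated SHA-1 over the previous checksum plus the new body (the actual checksum is not specified on this page), and the class and exception names are hypothetical.

```python
import hashlib


class StaleSequenceError(Exception):
    """The client's max-sequence-number is behind the server's."""


class RecordLog:
    """In-memory stand-in for one record's [Root]/recordUID/log/ directory."""

    def __init__(self):
        self.entries = []   # list of (name, headers, body)
        self.checksum = ""  # rolling checksum so far

    @property
    def max_sequence_number(self) -> int:
        return len(self.entries)

    def add_entries(self, max_sequence_number: int, new_entries) -> list:
        """Append (title, body) pairs if the client's view is current."""
        if max_sequence_number != self.max_sequence_number:
            raise StaleSequenceError(
                f"client saw {max_sequence_number}, "
                f"server is at {self.max_sequence_number}")
        names = []
        for title, body in new_entries:
            counter = len(self.entries) + 1
            # Assumed checksum: SHA-1 over previous checksum + body.
            self.checksum = hashlib.sha1(
                self.checksum.encode("ascii") + body).hexdigest()[:8]
            name = f"{title}_{counter}_{self.checksum}"
            previous = self.entries[-1][0] if self.entries else None
            entry_headers = {"previousEntry": previous, "sequenceNumber": counter}
            self.entries.append((name, entry_headers, body))
            names.append(name)
        return names
```

A second client that still believes the log is at its old sequence number gets an error and must re-read the log before retrying.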
Meta Data Retrieval
Current APIs: GetJsonMetaDataEntriesServlet and GetMetaDataEntriesServlet
The following URL will be used to get the xml log for a record. This will be the first and last metadata entries, and the most recent metadata entry for each unique title.
GET http://www.sharedrecords.org/recordUID_log.xml
The supported request headers are the same as retrieving a record. This call will just return the xml file that is associated with the metadata of this record. In the case of an S3 data store, we can forward this request directly to Amazon.
The shared records server also supports getting the metadata in JSON format. This is accomplished using the following URL.
GET http://www.sharedrecords.org/recordUID_log.json
In this case, since a json file is not actually kept on the server, the shared records server must parse the xml document and convert the information to json format before returning it.
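The conversion could be sketched as below. The XML layout is entirely hypothetical, since this page does not define the schema of recordUID_log.xml: the element names Log, FirstEntry, LastEntry, and LatestByTitle/Entry are placeholders, as are the JSON keys.

```python
import json
import xml.etree.ElementTree as ET


def log_xml_to_json(xml_text: str) -> str:
    """Convert a (hypothetical) record log XML document to JSON text."""
    root = ET.fromstring(xml_text)
    summary = {
        "firstEntry": root.findtext("FirstEntry"),
        "lastEntry": root.findtext("LastEntry"),
        # Most recent entry name for each unique metadata title.
        "latestByTitle": {
            entry.get("title"): entry.text
            for entry in root.findall("LatestByTitle/Entry")
        },
    }
    return json.dumps(summary, sort_keys=True)
```

Because the JSON is generated on each request from the single XML file, there is only one file to keep consistent.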
The following URL should be used to return all of the metadata associated with a record (not just the most recent versions).
GET http://www.sharedrecords.org/?prefix=recordUID/log/
The following parameters are supported (this list is provided by Amazon. We should see which of these we will support.)
- prefix: this should be "recordUID/log/"
- marker: Indicates where to begin listing. The list will only include keys that occur lexicographically after marker. This is convenient for pagination: To get the next page of results use the last key of the current page as the marker.
- max-keys: The maximum number of keys you'd like to see in the response body. The server may return fewer than this many keys, but will not return more.
- delimiter: Causes keys that contain the same string between the prefix and the first occurrence of the delimiter to be rolled up into a single result element in the CommonPrefixes collection. These rolled-up keys are not returned elsewhere in the response.
Note: It is frustrating that S3 only provides alphabetical pagination.
Note 2: We should look into whether we want to provide any additional APIs. Getting all entries with a particular title (i.e. version history) is easy with our current scheme. We can also support "last N", although this requires some server-side logic.
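For a local data store, the prefix, marker, and max-keys parameters could be emulated roughly as in the sketch below. The function name is hypothetical, and the default of 1000 keys per page is an assumption about what the server would choose.

```python
def list_keys(all_keys, prefix: str, marker: str = "", max_keys: int = 1000):
    """Return (page, is_truncated) for keys under `prefix`, after `marker`.

    Keys are returned in lexicographic order, as with S3 listings.
    """
    matching = sorted(
        key for key in all_keys
        if key.startswith(prefix) and key > marker
    )
    page = matching[:max_keys]
    return page, len(matching) > max_keys
```

With entries notes_1_aa, notes_2_bb, and notes_10_cc under the same prefix, this ordering lists notes_10_cc first, because "0" sorts before "_". That is the practical consequence of the alphabetical pagination mentioned in the note above: unpadded counters do not come back in sequence order.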
Create Account
Current API: Combination of CreateAccountServlet and SaveAccountPublicKeyServlet
The following URL creates an account on the servlet (by depositing its public key).
PUT http://www.sharedrecords.org/hash_of_account_public_key.key
The supported headers are the same as for the PUT of a record. The body of the PUT should be the bytes of the account's public key.
Get Account Public Key
Current API: GetAccountPublicKeyServlet
The following URL gets an account's public key from the servlet.
GET http://www.sharedrecords.org/hash_of_account_public_key.key
Wrapped Key Deposit
Current API: ShareRecordServlet
The following URL deposits a record's wrapped key into a user's account.
PUT http://www.sharedrecords.org/hash_of_account_public_key/recordUID.wrappedkey
The supported headers are the same as for the PUT of a record. The body of the PUT should be the bytes of the wrapped key.
Record List for an Account
Current API: GetEncryptedRecordListServlet
The following URL is used to get a list of records that have been shared with an account.
GET https://www.sharedrecords.org/?prefix=hash_of_account_public_key/
The supported headers and parameters are the same as the GET of all metadata associated with a record.
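Taken together, the account operations imply a simple key layout. The sketch below builds those keys and recovers record UIDs from a listing of an account's prefix. The function names are hypothetical, and the listing is assumed to be a plain list of key strings.

```python
WRAPPED_KEY_SUFFIX = ".wrappedkey"


def account_key_path(account_hash: str) -> str:
    """Key holding an account's public key."""
    return f"{account_hash}.key"


def wrapped_key_path(account_hash: str, record_uid: str) -> str:
    """Key holding a record's wrapped key inside a user's account."""
    return f"{account_hash}/{record_uid}{WRAPPED_KEY_SUFFIX}"


def shared_record_uids(listed_keys, account_hash: str) -> list:
    """Extract record UIDs from keys listed under an account's prefix."""
    prefix = f"{account_hash}/"
    return [
        key[len(prefix):-len(WRAPPED_KEY_SUFFIX)]
        for key in listed_keys
        if key.startswith(prefix) and key.endswith(WRAPPED_KEY_SUFFIX)
    ]
```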
Candidates to Remove
- GetWrappedKeyBytesServlet
- SaveTokenServlet
- GetRecordSignerServlet
- GetSignatureServlet
- DeleteRecordServlet
- DeleteAccountServlet
TODOs
- Can we carry the extension in the barcode without breaking anything?
