Frequently Asked Questions

Open Science Chain is an NSF-funded project that is building a cyberinfrastructure solution based on distributed ledger technology (a consortium blockchain). It enables a broad set of researchers to efficiently verify and validate the authenticity of scientific datasets and to securely share metadata, including detailed provenance information.

For more information, see the About page.

Open Science Chain is funded by the National Science Foundation (award number 1840218) and is free to use for the academic and research communities.

Please use the following reference for citations:

Sivagnanam, S., Nandigam, V. and Lin, K., 2019, July. Introducing the Open Science Chain: Protecting Integrity and Provenance of Research Data. In Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning) (p. 18). ACM.

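Blockchain is a type of distributed ledger technology that offers a secure, cryptographically protected record of transactions, which are grouped into blocks. Each block includes a cryptographic hash of the previous block, so the ledger is effectively "append only": altering or deleting previously entered data would invalidate every block that follows it.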
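The append-only property can be illustrated with a toy hash chain. This is only a minimal sketch and not how Open Science Chain itself is implemented: the block layout and the function names (`block_hash`, `append_block`, `is_valid`) are assumptions made for illustration, and a real blockchain adds consensus, signatures, and replication across nodes.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, records, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "records": records, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, records):
    """Append a new block that points at the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append(
        {
            "index": index,
            "records": records,
            "prev_hash": prev_hash,
            "hash": block_hash(index, records, prev_hash),
        }
    )


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["records"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


if __name__ == "__main__":
    chain = []
    append_block(chain, ["dataset A registered"])
    append_block(chain, ["dataset B registered"])
    print(is_valid(chain))  # chain is intact

    chain[0]["records"] = ["dataset A altered"]  # tamper with an old block
    print(is_valid(chain))  # tampering is detected
```

Because each block's hash covers the previous block's hash, changing an early block without being detected would require recomputing every later block as well.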

A public blockchain is truly decentralized, permissionless, open to anyone, and secured by cryptoeconomics (e.g., Bitcoin).

A private blockchain is controlled by a single organization, and access is restricted to members of that organization.

A consortium blockchain is semi-private: consensus in the network is controlled by a limited set of nodes, all participants have known identities, and transactions do not require cryptoeconomics, which improves transaction performance.

A checksum is a string (a sequence of numbers and letters) that is generated by running an algorithm on a data file. Checksums generated using a cryptographic hash function (e.g., SHA-256) are primarily used to verify the integrity and authenticity of a data file: any change to the file's contents produces a different checksum.
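As a rough illustration, a SHA-256 checksum of a file can be computed with Python's standard `hashlib` module. The function name `sha256_of_file` and the chunk size are assumptions chosen for this sketch; reading in chunks keeps memory use flat even for very large files.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hexadecimal string.

    The file is read in fixed-size chunks so that large datasets do not
    need to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two copies of a file are byte-for-byte identical exactly when their checksums match (barring hash collisions, which are considered infeasible to produce for SHA-256), so comparing the two strings is enough to detect a modified file.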

The data itself is not stored in the blockchain. Only verification information about the data (its SHA-256 hash) is stored as a manifest in the blockchain, along with the metadata. The actual data remains off-chain.

Storing large amounts of data in the blockchain is inefficient, especially since some scientific datasets are multiple terabytes in size. Storing only the verification information and comprehensive metadata of a dataset enables researchers to share large or sensitive datasets that are kept off-chain yet remain verifiable against the information stored in the blockchain.
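The idea of keeping data off-chain while recording only its hash and metadata can be sketched as follows. The manifest layout, the field names, and the functions `build_manifest` and `verify_against_manifest` are hypothetical and are not the actual Open Science Chain format or API; the sketch only shows that a small record is enough to check a dataset held elsewhere.

```python
import hashlib


def build_manifest(data: bytes, metadata: dict) -> dict:
    """Build the small record that would be stored on-chain.

    Only the SHA-256 hash and the metadata are kept; the data itself
    is never included. (Hypothetical layout, for illustration only.)
    """
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "metadata": dict(metadata),
    }


def verify_against_manifest(data: bytes, manifest: dict) -> bool:
    """Check that off-chain data still matches the recorded hash."""
    return hashlib.sha256(data).hexdigest() == manifest["sha256"]


if __name__ == "__main__":
    dataset = b"temperature,2019-07-01,21.4\n"  # stand-in for a large file
    manifest = build_manifest(dataset, {"title": "Example dataset"})

    print(verify_against_manifest(dataset, manifest))         # unchanged copy
    print(verify_against_manifest(dataset + b"x", manifest))  # modified copy
```

The manifest stays the same size regardless of how large the dataset is, which is why this approach scales to multi-terabyte data.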
