Reproducible

Research Assurance Standard (RAS) v0.1.0

Status of this Document

This document specifies the Research Assurance Standard (RAS) v0.1.0 for scientific research and requests discussion and suggestions for improvements.

Introduction

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

This standard defines the requirements for creating and validating reproducibility reports for scientific research papers. It establishes both minimum requirements (MUST) that represent current best practices and aspirational goals (SHOULD) that represent exceptionally high standards for reproducibility assessment.

1. Paper Identification and Reference

1.1 Basic Identification

  1. 1.1.1. A reproducibility report MUST include the full title of the paper.
  2. 1.1.2. A reproducibility report MUST include the complete list of authors.
  3. 1.1.3. A reproducibility report MUST include the year of publication.
  4. 1.1.4. A reproducibility report MUST include the Digital Object Identifier (DOI), if available.

1.2 Citation Information

  1. 1.2.1. A reproducibility report SHOULD include a complete citation of the paper following a standard citation format (e.g., APA, IEEE, Chicago).

1.3 Paper Verification

  1. 1.3.1. A reproducibility report MUST include the SHA-256 hash of the document file used for the reproduction attempt.
  2. 1.3.2. A reproducibility report MUST include the source of the document (e.g., publisher website, preprint server, author correspondence).
  3. 1.3.3. A reproducibility report MUST include a verification chain linking the document to an authoritative source.
  4. 1.3.4. A reproducibility report SHOULD include a locality-sensitive hash that is robust to minor formatting differences, digital rights management (DRM) modifications, or other non-substantive variations.
  5. 1.3.5. A reproducibility report SHOULD include multiple verification methods to establish document authenticity.
  6. 1.3.6. A reproducibility report SHOULD include a timestamp of when the document was obtained.
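
As an illustration of 1.3.1, 1.3.2, and 1.3.6, the following is a minimal sketch of a verification record for the paper file. The function name `describe_document` and the field names are hypothetical, and the timestamp is only accurate if the record is created when the document is obtained.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def describe_document(path, source):
    """Return a verification record for the paper file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "file": Path(path).name,
        "sha256": digest.hexdigest(),
        "source": source,
        # Assumption: this function runs at the moment of acquisition.
        "obtained_at": datetime.now(timezone.utc).isoformat(),
    }
```

A cryptographic hash such as this changes completely on any byte-level difference, so it does not satisfy the locality-sensitive hash requested in 1.3.4.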

2. Content Extraction

  1. 2.1.1. A reproducibility report MUST document the method used to extract content from the paper (e.g., GROBID, pdfminer, manual transcription).
  2. 2.1.2. A reproducibility report MUST include an assessment of the extraction quality and any limitations encountered.
  3. 2.1.3. A reproducibility report SHOULD include a structured representation of extracted tables, figures, and equations with their original identifiers.

3. Key Results Identification

  1. 3.1.1. A reproducibility report MUST identify and extract the key results claimed in the paper.
  2. 3.1.2. A reproducibility report MUST explicitly link each key result to a direct quote from the paper text.
  3. 3.1.3. A reproducibility report MUST provide page numbers or other locators for each key result.
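
One possible way to hold the three elements required above (result, direct quote, locator) is a small record type with a check that each quote really occurs in the extracted text. This is a sketch only: `KeyResult`, its fields, and `unanchored_results` are hypothetical names, and the whitespace normalization is an assumption about how extraction alters line breaks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyResult:
    identifier: str  # hypothetical label chosen by the reviewer, e.g. "R1"
    claim: str       # the reviewer's summary of the result
    quote: str       # verbatim text copied from the paper
    locator: str     # page number, section, table, or figure reference


def unanchored_results(results, paper_text):
    """Return identifiers of results whose quote is absent from the text."""
    # Collapse runs of whitespace so line breaks from extraction do not matter.
    normalized = " ".join(paper_text.split())
    return [
        result.identifier
        for result in results
        if " ".join(result.quote.split()) not in normalized
    ]
```

An empty return value means every key result is linked to text that can be found in the extracted paper.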

4. Dependency Identification and Bill of Materials

4.1 Software Dependencies

  1. 4.1.1. A reproducibility report MUST identify all software dependencies explicitly mentioned in the paper.
  2. 4.1.2. A reproducibility report MUST document the specific versions of software dependencies used in the reproduction attempt.
  3. 4.1.3. A reproducibility report MUST include a complete software bill of materials (SBOM) listing all dependencies, including transitive dependencies.
  4. 4.1.4. A reproducibility report SHOULD identify software dependencies that were implicit (not explicitly mentioned in the paper but required for reproduction).
  5. 4.1.5. A reproducibility report SHOULD include dependency compatibility information, including known conflicts or version constraints.
  6. 4.1.6. A reproducibility report SHOULD provide mechanisms to resolve dependency conflicts through automated means.
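
For a reproduction that runs in Python, the installed distributions (direct and transitive) can be enumerated with the standard library. The sketch below is a starting point for 4.1.2 and 4.1.3 under that assumption; it covers only the active Python environment, not system libraries or other languages, and it does not emit a formal SBOM format. The function name `installed_packages` is hypothetical.

```python
from importlib import metadata


def installed_packages():
    """Map each installed distribution name to its version string."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        # Skip entries with damaged or incomplete metadata.
        if name and version:
            packages[name] = version
    return dict(sorted(packages.items(), key=lambda item: item[0].lower()))


if __name__ == "__main__":
    for name, version in installed_packages().items():
        print(f"{name}=={version}")
```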

4.2 Data Dependencies

  1. 4.2.1. A reproducibility report MUST identify all datasets explicitly mentioned in the paper.
  2. 4.2.2. A reproducibility report MUST identify all datasets that the paper's results depend on but that the paper does not explicitly mention.
  3. 4.2.3. A reproducibility report MUST provide persistent identifiers (e.g., DOIs) or locations for all datasets.
  4. 4.2.4. A reproducibility report MUST document the specific versions of datasets used to the extent that these are known, or provide a statement that this information is not known.
  5. 4.2.5. A reproducibility report SHOULD include data provenance information, including preprocessing steps applied to raw data.
  6. 4.2.6. A reproducibility report SHOULD include data integrity verification (e.g., checksums).
  7. 4.2.7. A reproducibility report SHOULD include dataset schema information, including variable types and constraints.

5. Data and Code Acquisition

5.1 Author Identification and Communication

  1. 5.1.1. A reproducibility report MUST identify the corresponding author(s) as specified in the paper.
  2. 5.1.2. A reproducibility report MUST document all contact information for corresponding authors provided in the paper.
  3. 5.1.3. A reproducibility report SHOULD include updated contact information for corresponding authors if the information in the paper is outdated.
  4. 5.1.4. A reproducibility report MUST document any communication with the authors of the original paper.
  5. 5.1.5. A reproducibility report MUST include author responses to queries, if received.

5.2 Data Acquisition

  1. 5.2.1. A reproducibility report MUST document the method used to acquire each dataset.
  2. 5.2.2. A reproducibility report MUST report success or failure in acquiring each dataset.
  3. 5.2.3. A reproducibility report MUST document any permissions, licenses, or access controls encountered when acquiring datasets.
  4. 5.2.4. A reproducibility report MUST document that each dataset was searched for in the following locations, in order, stopping when the dataset was found:
    1. Locations explicitly cited in the paper
    2. Supplementary materials associated with the paper
    3. Institutional or disciplinary repositories mentioned in the paper
    4. General-purpose repositories (e.g., Zenodo, Figshare, Dryad, OSF)
    5. Authors' personal or professional websites
    6. Direct correspondence with authors
  5. 5.2.5. A reproducibility report MUST include a log of all data acquisition attempts, including timestamps and HTTP status codes where applicable.
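
The ordered search in 5.2.4 and the log in 5.2.5 can be sketched together. In this illustration `search_in_order` and the log field names are hypothetical, and `probe` is a caller-supplied function that returns an HTTP status code for a URL (or `None` when no HTTP request is involved), so the sketch itself makes no network calls.

```python
from datetime import datetime, timezone


def search_in_order(dataset, locations, probe):
    """Try each (label, url) pair in order and stop at the first success.

    Returns the acquisition log: one entry per attempt, oldest first.
    """
    log = []
    for label, url in locations:
        status = probe(url)
        # Assumption: any 2xx status counts as "dataset found".
        found = status is not None and 200 <= status < 300
        log.append({
            "dataset": dataset,
            "location": label,
            "url": url,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "http_status": status,
            "found": found,
        })
        if found:
            break
    return log
```

Locations later in the list are never contacted once a dataset is found, which matches the "stopping when the dataset is found" wording. Steps such as direct correspondence would need a different success test than an HTTP status.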

5.3 Code Acquisition

  1. 5.3.1. A reproducibility report MUST document the method used to acquire the code associated with the paper.
  2. 5.3.2. A reproducibility report MUST report success or failure in acquiring the code.
  3. 5.3.3. A reproducibility report MUST document the repository, version control system, or other source from which code was obtained.
  4. 5.3.4. A reproducibility report MUST include commit hashes, tags, or other precise identifiers for the acquired code.
  5. 5.3.5. A reproducibility report MUST include a log of all code acquisition attempts.
  6. 5.3.6. A reproducibility report MUST document that the code was searched for in the following locations, in order, stopping when the code was found:
    1. Locations explicitly cited in the paper
    2. Supplementary materials associated with the paper
    3. Popular code repositories including but not limited to:
      1. GitHub
      2. GitLab
      3. Bitbucket
      4. SourceForge
      5. CodeOcean
      6. Zenodo
      7. OSF
      8. language-specific repositories (e.g., CRAN, PyPI, NPM)
    4. Authors' personal or professional websites
    5. Direct correspondence with corresponding authors
  7. 5.3.7. A reproducibility report SHOULD document attempts to contact the authors through the following escalation process if code is not otherwise available:
    1. Email to corresponding author's address provided in the paper
    2. If no response within 2 weeks, attempt contact through any alternative email addresses that can be found for the corresponding author
    3. If no response within 2 additional weeks, contact co-authors
    4. If no response, contact the institution of the corresponding author
    5. As a last resort, employ a process server to deliver a formal request
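
The two waiting periods in the escalation process above translate into simple date arithmetic. The sketch below computes the earliest date for each timed step from the date of the first email; the function and key names are hypothetical, and the last two steps are omitted because the standard gives no waiting period for them.

```python
from datetime import date, timedelta


def escalation_dates(first_email):
    """Return the earliest date for each timed escalation step."""
    wait = timedelta(weeks=2)
    return {
        "email_corresponding_author": first_email,
        "try_alternative_addresses": first_email + wait,
        "contact_coauthors": first_email + 2 * wait,
    }


if __name__ == "__main__":
    for step, when in escalation_dates(date.today()).items():
        print(f"{when.isoformat()}  {step}")
```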

6. Scientific Workflow and Computational Environment Specifications

6.1 Workflow Specification

  1. 6.1.1. A reproducibility report SHOULD include a formal specification of the scientific workflow described in the paper.
  2. 6.1.2. A reproducibility report MUST document each step in the computational workflow.
  3. 6.1.3. A reproducibility report MUST map each workflow step to the corresponding description in the paper.
  4. 6.1.4. A reproducibility report SHOULD express the workflow using a standardized workflow specification language (e.g., Nextflow, Snakemake, CWL).
  5. 6.1.5. A reproducibility report SHOULD include a visual representation of the workflow.

6.2 Computational Environment Specification

  1. 6.2.1. A reproducibility report MUST include a specification of the computational environment required to execute the workflow.
  2. 6.2.2. A reproducibility report MUST document the operating system type and version used in the reproduction attempt.
  3. 6.2.3. A reproducibility report MUST document hardware requirements or limitations encountered during reproduction.
  4. 6.2.4. A reproducibility report SHOULD include a container definition (e.g., Dockerfile, Singularity definition) that encapsulates the computational environment.
  5. 6.2.5. A reproducibility report SHOULD include environment variables and configuration settings.
  6. 6.2.6. A reproducibility report SHOULD include a machine-readable specification of the computational environment (e.g., YAML, JSON).

7. Workspace Construction and Analysis Execution

7.1 Workspace Construction

  1. 7.1.1. A reproducibility report MUST document the directory structure created for the reproduction attempt.
  2. 7.1.2. A reproducibility report MUST describe how acquired data and code were organized within the workspace.
  3. 7.1.3. A reproducibility report MUST include a manifest of all files in the workspace, their sources, and their purpose.
  4. 7.1.4. A reproducibility report SHOULD include an automated script or configuration that constructs the workspace from source materials.
  5. 7.1.5. A reproducibility report SHOULD document any file permissions or access controls implemented in the workspace.
  6. 7.1.6. A reproducibility report SHOULD include resource allocation settings for the workspace (e.g., memory limits, CPU allocations).
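
The manifest in 7.1.3 can be partly generated. The sketch below walks a workspace directory and records each file's path, size, and SHA-256 digest; source and purpose cannot be inferred from the file system, so they are taken from a reviewer-supplied `annotations` mapping. All names here are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(workspace, annotations=None):
    """List every file under `workspace` with its size, hash, and notes.

    `annotations` maps a relative POSIX path to a dict with optional
    "source" and "purpose" entries written by the reviewer.
    """
    annotations = annotations or {}
    root = Path(workspace)
    manifest = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root).as_posix()
        note = annotations.get(relative, {})
        manifest.append({
            "path": relative,
            "bytes": path.stat().st_size,
            # Reads the whole file; chunked reading suits very large data.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            "source": note.get("source", "unknown"),
            "purpose": note.get("purpose", "unknown"),
        })
    return manifest
```

Entries left as "unknown" flag files whose source or purpose still needs to be documented by hand.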

7.2 Analysis Execution

  1. 7.2.1. A reproducibility report MUST document the exact commands or procedures used to execute each step of the analysis.
  2. 7.2.2. A reproducibility report MUST include a detailed execution log showing the output of each analysis step.
  3. 7.2.3. A reproducibility report MUST report the execution time for each major component of the analysis.
  4. 7.2.4. A reproducibility report SHOULD include a complete trace of all operations performed during analysis execution.
  5. 7.2.5. A reproducibility report SHOULD document resource utilization during execution (e.g., memory usage, CPU usage, disk I/O).
  6. 7.2.6. A reproducibility report SHOULD include mechanisms for automated execution of the complete analysis pipeline.
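
Requirements 7.2.1 to 7.2.3 (exact command, output log, execution time) can be met by wrapping each step in a small runner. The following is a sketch, not a prescribed tool: `run_step` and the record fields are hypothetical, and it measures wall-clock time only.

```python
import subprocess
import sys
import time


def run_step(name, command):
    """Run one analysis step and return its execution record."""
    started = time.perf_counter()
    completed = subprocess.run(command, capture_output=True, text=True)
    return {
        "step": name,
        "command": command,
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "seconds": round(time.perf_counter() - started, 3),
    }


if __name__ == "__main__":
    # Placeholder step: a real analysis would invoke the paper's scripts.
    record = run_step("hello", [sys.executable, "-c", "print('ok')"])
    print(record["returncode"], record["seconds"], record["stdout"].strip())
```

Passing the command as a list avoids shell interpretation, so the logged command is exactly what was executed.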

8. Debugging and Refinement

8.1 Error Handling

  1. 8.1.1. A reproducibility report MUST document all errors encountered during the reproduction attempt.
  2. 8.1.2. A reproducibility report MUST categorize errors by type (e.g., syntax errors, runtime errors, logical errors).
  3. 8.1.3. A reproducibility report MUST describe resolution attempts for each error.
  4. 8.1.4. A reproducibility report SHOULD provide a decision tree or flowchart for error diagnosis and resolution.
  5. 8.1.5. A reproducibility report SHOULD document patterns of errors that could inform future reproduction attempts.
  6. 8.1.6. A reproducibility report SHOULD include a knowledge base of common errors and their solutions.

8.2 Code and Workflow Modifications

  1. 8.2.1. A reproducibility report MUST document any modifications made to the original code or workflow to enable reproducibility.
  2. 8.2.2. A reproducibility report MUST provide justification for each modification.
  3. 8.2.3. A reproducibility report MUST include before and after versions of modified components.
  4. 8.2.4. A reproducibility report SHOULD include a patch file or diff for each modification.
  5. 8.2.5. A reproducibility report SHOULD maintain a version-controlled history of all modifications.
  6. 8.2.6. A reproducibility report SHOULD assess the impact of each modification on the results.

9. Results Comparison

9.1 Output Verification

  1. 9.1.1. A reproducibility report MUST compare the results obtained through reproduction with the key results extracted from the original paper.
  2. 9.1.2. A reproducibility report MUST document the methodology used for comparison.
  3. 9.1.3. A reproducibility report MUST provide a quantitative assessment of the degree of consistency between original and reproduced results.
  4. 9.1.4. A reproducibility report SHOULD employ statistical tests appropriate to the domain to assess the significance of any differences between original and reproduced results.
  5. 9.1.5. A reproducibility report SHOULD account for sources of non-determinism in the computational process when assessing result consistency.
  6. 9.1.6. A reproducibility report SHOULD provide visualizations that highlight the comparison between original and reproduced results.
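
One simple quantitative measure for 9.1.3 is the relative deviation between each reported and reproduced value. The sketch below is an assumption about how such a comparison might look, not a method prescribed by this standard: the names, the 1% default tolerance, and the status labels are all hypothetical, and a domain-appropriate statistical test (9.1.4) may be more suitable.

```python
def compare_results(reported, reproduced, rel_tol=0.01):
    """Compare numeric results keyed by result identifier."""
    rows = []
    for key, original in reported.items():
        value = reproduced.get(key)
        if value is None:
            rows.append({"result": key, "status": "missing"})
            continue
        difference = abs(value - original)
        # Fall back to absolute deviation when the reported value is zero.
        deviation = difference / abs(original) if original else difference
        rows.append({
            "result": key,
            "reported": original,
            "reproduced": value,
            "relative_deviation": deviation,
            "status": "consistent" if deviation <= rel_tol else "different",
        })
    return rows
```

The tolerance should be chosen per result and justified in the report, since acceptable variation depends on the domain and on sources of non-determinism.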

9.2 Robustness Assessment

  1. 9.2.1. A reproducibility report MUST assess the sensitivity of results to changes in computational environment.
  2. 9.2.2. A reproducibility report MUST document any tests of robustness to parameter variations.
  3. 9.2.3. A reproducibility report MUST provide a qualitative assessment of the overall robustness of the results.
  4. 9.2.4. A reproducibility report MUST document any continuous testing procedures implemented for the reproduction.
  5. 9.2.5. A reproducibility report MUST include results from the most recent execution of the reproduction workflow.
  6. 9.2.6. A reproducibility report SHOULD include tests of the robustness of results to variations in input data.
  7. 9.2.7. A reproducibility report SHOULD assess the robustness of results to different implementation choices (e.g., different libraries or algorithms).
  8. 9.2.8. A reproducibility report SHOULD quantify the range of variation in results that still supports the paper's claims.
  9. 9.2.9. A reproducibility report SHOULD specify triggers for re-running the reproduction (e.g., changes to dependencies).
  10. 9.2.10. A reproducibility report SHOULD document changes in reproducibility over time.

10. Report Synthesis

10.1 Basic Report Structure

  1. 10.1.1. A reproducibility report MUST include an executive summary that clearly states whether the reproduction attempt was successful.
  2. 10.1.2. A reproducibility report MUST include a comprehensive description of all steps taken during the reproduction attempt.
  3. 10.1.3. A reproducibility report MUST include a section summarizing key findings and lessons learned.
  4. 10.1.4. A reproducibility report SHOULD be structured to allow both high-level understanding and detailed inspection of the reproduction process.
  5. 10.1.5. A reproducibility report SHOULD include a section on recommendations for improving the reproducibility of the paper.
  6. 10.1.6. A reproducibility report SHOULD include sections that map directly to the required elements of this standard.

10.2 Documentation and Metadata

  1. 10.2.1. A reproducibility report MUST include metadata about the report itself (e.g., authors, date, version).
  2. 10.2.2. A reproducibility report MUST include a license specifying terms of use for the report and associated materials.
  3. 10.2.3. A reproducibility report MUST include version information for the report itself.
  4. 10.2.4. A reproducibility report MUST document any updates or revisions to the report.
  5. 10.2.5. A reproducibility report MUST maintain a history of changes to the report and associated materials.
  6. 10.2.6. A reproducibility report SHOULD include a persistent identifier (e.g., DOI) for the report.
  7. 10.2.7. A reproducibility report SHOULD include machine-readable metadata following a standardized schema.
  8. 10.2.8. A reproducibility report SHOULD include a change log documenting any updates to the report.

10.3 Supplementary Materials

  1. 10.3.1. A reproducibility report MUST include or link to all code, data, and environment specifications used in the reproduction attempt.
  2. 10.3.2. A reproducibility report MUST provide a manifest of all supplementary materials.
  3. 10.3.3. A reproducibility report MUST specify the license or usage terms for each supplementary material.
  4. 10.3.4. A reproducibility report SHOULD package supplementary materials in a self-contained, portable format.
  5. 10.3.5. A reproducibility report SHOULD include persistent identifiers (e.g., DOIs) for all supplementary materials.
  6. 10.3.6. A reproducibility report SHOULD provide a mechanism for validating the integrity of supplementary materials.

11. Reproducibility Assessment

11.1 Reproducibility Verdict

  1. 11.1.1. A reproducibility report MUST include a clear verdict on the overall reproducibility of the paper's key results.
  2. 11.1.2. A reproducibility report MUST use a standardized classification scheme for the reproducibility verdict (e.g., "fully reproducible," "partially reproducible," "not reproducible").
  3. 11.1.3. A reproducibility report MUST provide a justification for the assigned reproducibility verdict.
  4. 11.1.4. A reproducibility report SHOULD use a numerical score or grading system to quantify the degree of reproducibility.
  5. 11.1.5. A reproducibility report SHOULD assess reproducibility separately for different components or claims in the paper.
  6. 11.1.6. A reproducibility report SHOULD provide a confidence level for the reproducibility verdict.

11.2 Factors Affecting Reproducibility

  1. 11.2.1. A reproducibility report MUST identify the key factors that facilitated or hindered reproducibility.
  2. 11.2.2. A reproducibility report MUST assess the quality of documentation provided in the original paper.
  3. 11.2.3. A reproducibility report MUST evaluate the availability and quality of research artifacts (code, data, etc.).
  4. 11.2.4. A reproducibility report SHOULD quantify the effort required to reproduce the results (e.g., person-hours, computational resources).
  5. 11.2.5. A reproducibility report SHOULD identify best practices demonstrated in the paper that other researchers could adopt.
  6. 11.2.6. A reproducibility report SHOULD suggest specific improvements that would enhance the reproducibility of the paper.

12. Security, Privacy, and Ethical Considerations

12.1 Security Considerations

  1. 12.1.1. A reproducibility report MUST document any security vulnerabilities discovered in the reviewed code.
  2. 12.1.2. A reproducibility report MUST document any security measures implemented during the reproduction process.
  3. 12.1.3. A reproducibility report MUST identify any code execution that required privileged access or posed potential risks.
  4. 12.1.4. A reproducibility report SHOULD implement appropriate security controls for the execution environment.
  5. 12.1.5. A reproducibility report SHOULD include a security assessment of dependencies used in the reproduction attempt.
  6. 12.1.6. A reproducibility report SHOULD provide recommendations for secure execution of the workflow.

12.2 Privacy Considerations

  1. 12.2.1. A reproducibility report MUST document how sensitive data was handled during the reproduction process.
  2. 12.2.2. A reproducibility report MUST identify any privacy risks associated with the data or code.
  3. 12.2.3. A reproducibility report MUST document any de-identification or anonymization procedures applied.
  4. 12.2.4. A reproducibility report SHOULD assess the effectiveness of privacy protection measures.
  5. 12.2.5. A reproducibility report SHOULD document compliance with relevant privacy regulations and institutional policies.
  6. 12.2.6. A reproducibility report SHOULD provide recommendations for enhancing privacy protection in future reproduction attempts.

12.3 Ethical Considerations

  1. 12.3.1. A reproducibility report MUST document any ethical issues identified during the reproduction process.
  2. 12.3.2. A reproducibility report MUST document compliance with relevant ethical guidelines and Institutional Review Board (IRB) requirements.
  3. 12.3.3. A reproducibility report MUST note any potential misuses of the research methods or findings.
  4. 12.3.4. A reproducibility report SHOULD include an assessment of the societal implications of the research.
  5. 12.3.5. A reproducibility report SHOULD document any conflicts of interest relevant to the reproduction attempt.
  6. 12.3.6. A reproducibility report SHOULD provide recommendations for addressing ethical concerns in future research.

13. Long-term Preservation

13.1 Preservation Strategy

  1. 13.1.1. A reproducibility report MUST document the strategy for long-term preservation of the report and associated materials.
  2. 13.1.2. A reproducibility report MUST use file formats suitable for long-term preservation.
  3. 13.1.3. A reproducibility report MUST include sufficient metadata to enable discovery and reuse.
  4. 13.1.4. A reproducibility report SHOULD be deposited in a trusted digital repository that provides persistent access.
  5. 13.1.5. A reproducibility report SHOULD include a preservation plan that addresses technological obsolescence.
  6. 13.1.6. A reproducibility report SHOULD employ digital signatures or other mechanisms to ensure long-term integrity.

13.2 Reproducibility Package

  1. 13.2.1. A reproducibility report MUST include or link to a comprehensive reproducibility package containing all necessary materials.
  2. 13.2.2. A reproducibility report MUST document the contents and structure of the reproducibility package.
  3. 13.2.3. A reproducibility report MUST include instructions for accessing and using the reproducibility package.
  4. 13.2.4. A reproducibility report SHOULD package materials in a self-contained, portable format (e.g., container, virtual machine).
  5. 13.2.5. A reproducibility report SHOULD include mechanisms for validating the integrity of the reproducibility package.
  6. 13.2.6. A reproducibility report SHOULD provide multiple access methods for the reproducibility package.