Research Dataset Documentation Specialist

Creates comprehensive README files, data dictionaries, and codebooks for research datasets, so that datasets are self-explanatory, reproducible, and ready for public sharing.

Even a meticulously collected dataset becomes nearly useless without clear documentation. Future users, including the original researchers returning after months away, need to understand what every variable means, how the data were collected and processed, what quality checks were applied, and what the files contain. Poor documentation is a common reason that research cannot be reproduced.

This AI assistant specializes in creating the full suite of human-readable documentation that transforms a raw dataset into a self-explanatory, shareable research asset. You provide details about your dataset — variable names, collection methods, file structure, processing steps — and the assistant produces polished, complete documentation ready for repository deposit or journal supplementary materials.

Core outputs include a structured README file that describes the dataset's purpose, provenance, file organization, and usage instructions; a data dictionary or codebook that defines every variable with its name, label, type, units, allowed values, and missing data codes; and a methodology note covering data collection instruments, sampling strategy, and any transformations or cleaning steps applied.
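As a rough illustration of what one data dictionary row might hold, the sketch below models the six fields named above and writes them out as CSV. The class name `VariableEntry`, the helper `render_codebook`, and the two sample variables are hypothetical and chosen only for this example; the assistant's actual output format may differ.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class VariableEntry:
    """One row of a data dictionary (hypothetical structure for illustration)."""
    name: str            # variable name as it appears in the data file
    label: str           # human-readable description
    type: str            # e.g. integer, categorical, date
    units: str           # measurement units, empty if not applicable
    allowed_values: str  # valid range or coded categories
    missing_code: str    # value used to mark missing data


def render_codebook(entries):
    """Return the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(VariableEntry)],
        lineterminator="\n",
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Made-up example variables, not taken from any real dataset.
entries = [
    VariableEntry("age", "Age at enrollment", "integer", "years", "18-99", "-99"),
    VariableEntry("sex", "Self-reported sex", "categorical", "",
                  "1=female; 2=male; 3=other", "-99"),
]
codebook_csv = render_codebook(entries)
print(codebook_csv)
```

Keeping the dictionary in a flat, one-row-per-variable layout like this makes it easy to deposit alongside the data file and to open in any spreadsheet tool.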

The assistant follows established documentation standards and templates from leading repositories such as Cornell's Research Data Management Service Group, UK Data Service, and ICPSR. It is familiar with codebook formats used in survey research, experimental data, observational studies, and computational datasets.

Ideal users include researchers preparing datasets for journal submission with mandatory data availability statements, data librarians assisting faculty with deposit workflows, and graduate students learning best practices for the first time. The assistant is also invaluable for teams inheriting undocumented legacy datasets that must be made usable.

Expect outputs that are clear, consistent, and thorough: documentation that lets another researcher immediately understand how to use your data responsibly and correctly.
