Data Documentation & Organization

Getting the basics down

You will be asked to provide us with three important pieces of information when you submit your data. These are all required for submission to this repository, but they are also best practice for other data repositories.

Descriptive Metadata about your data

When you go to submit your data, you know it (hopefully) better than anyone else does. You'll be asked to fill out a simple form with a few fields as follows (an asterisk indicates a required field):

  • Last Name, First Name (when you log into the form, this will already be filled out for you)*
  • If there are additional authors, list them by adding a new field
  • Title of your dataset*
  • Sub-title
  • Subjects/Keywords
  • Date data available (YYYYMMDD, this will generally be the day the dataset is submitted)*
  • Date(s) data collected (YYYYMMDD - YYYYMMDD for a range, or YYYYMMDD for a single date)
  • Description (this is a short description of the dataset)*
  • Geolocation (if you have a single lat/long coordinate, enter it; we can also take bounding-box coordinates, or a simple place name that we will attempt to geolocate)

If you miss a required field, the submission form will not let you submit your dataset.

How & Why to write a Readme file

We require a Readme file (in PDF format) with every dataset submission. It gives your data context beyond what is captured in the descriptive metadata.

Consider what an external party would need to know in order to re-create your data on their own:

  • Who? Who contributed to the project (authors, research assistants, etc.)?
  • What? What kind(s) of data and analysis were used? What equipment and/or software? Did you use a specific version of software?
  • When? When was the data collected? When was analysis performed? Any other pertinent dates?
  • Where? Does the project involve a particular geographic area?
  • Why? What is the impetus for the project? What questions are you trying to answer?

Imagine that you have to leave the project as-is for a couple of months, or even a year, and then come back to it. What are the most important aspects of the project you'd need help remembering? Some examples:

  • file handling (how files are named and how they are divided)
  • processing steps (how to get from point A to B)
  • field abbreviation/name glossary (now what does ABC3130 stand for again?)

What else would you add to the list?

Here is a Readme template for you to model your own after. When you are finished with your Readme file, save it as a PDF and upload it using our submission form.

*You will not be able to submit your dataset without a Readme file.

*Be aware that if you provide us with a mostly (or completely) empty Readme file, you will be asked to complete one with sufficient information before we publish your dataset.

Naming, organizing, and zipping your files

While there is no hard-and-fast "right" way to name and organize/format your files, there are certainly things you should avoid, such as:

  • Spaces, special characters, colons, and hyphens in your file/directory names
  • Proprietary file formats such as .doc/.docx; use .txt if possible. The same goes for structured data files: avoid .xls or .xlsx and instead use a .csv or a tab-delimited .txt file (see the sketch below)
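
If your data lives in a spreadsheet, you can export a CSV by hand or script the conversion. Here is a minimal sketch in Python using pandas (this assumes the pandas and openpyxl packages are installed; the file names are placeholders):

    import pandas as pd

    # Read the first sheet of the workbook and write it back out as plain CSV
    # ("survey_data.xlsx" and "survey_data.csv" are placeholder names)
    df = pd.read_excel("survey_data.xlsx", sheet_name=0)
    df.to_csv("survey_data.csv", index=False)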

File names should be:

  • unique
  • consistent
  • informative when they are quickly scanned
  • easily sortable

It is best to use an organization scheme that helps the files fall into a useful order, or that lets you easily sort them once they are saved.

One goal of file naming is to give enough information that either the creator or a new user can figure out where the file fits into the project.

Elements that may be included in your file names are date, project name, type of data, location, and version.
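
If you find yourself naming many files, a small helper like this sketch can keep those elements in a consistent order so names stay unique and sortable (the project, data type, location, and version values here are made-up placeholders):

    from datetime import date

    def make_filename(project, data_type, location, collected, version):
        # Underscores instead of spaces; a YYYYMMDD date sorts chronologically,
        # and a zero-padded version number keeps v02 ahead of v10
        return f"{project}_{data_type}_{location}_{collected:%Y%m%d}_v{version:02d}.csv"

    print(make_filename("snowpack", "temperature", "laramie", date(2021, 6, 14), 1))
    # -> snowpack_temperature_laramie_20210614_v01.csv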

When you are done organizing the files and directories for your dataset(s), no matter how many there are, we will need everything zipped into one single file for ingest into the UW Data Repository. ARCC can help you with that step if you need it. Also be aware that there is currently a 50GB upload limit when submitting via the submission form (there is no file size limit otherwise). If you need help submitting your dataset, please contact us and we'll work it out with you!
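
If you would like to script the zipping step, here is a minimal sketch in Python (the directory name "my_dataset" is a placeholder; any standard zip tool works just as well):

    import shutil

    # Bundle the entire dataset directory into a single my_dataset.zip
    # file in the current working directory
    shutil.make_archive("my_dataset", "zip", root_dir="my_dataset")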