CICLing Verifiability, reproducibility, and working description policy

Bottom line

In a restaurant you are offered not only a menu but also food. Same here. Your paper is the menu that describes the food you cooked. Code and data are the food.

What we should publish (= make public) in computer science are algorithms (code) and data (resources), not their descriptions (papers). In an ideal world, a paper would be only an attachment to the code/data. Not vice versa as we do now... or do we?

What to submit and why

Since 2011, CICLing has had a policy of giving preference to papers with verifiable and reproducible results:

   If the authors claim to have obtained a result, we encourage them to make all the input data necessary to verify and reproduce the result available to the community.

   If the authors claim to advance human knowledge by introducing an algorithm, we encourage them to make the algorithm itself -- and not only its (usually vague and incomplete) description -- available to the public.

   If the authors claim to have compiled a lexical resource, we encourage them to make the resource itself -- and not only its description -- available to the public.

Code: We encourage the authors to submit, together with the paper, a program (open source), as simple as possible, that follows the described algorithm and generates the results presented in the paper. There is no need for a sophisticated interface or performance improvements -- only easy-to-understand source code that generates the claimed results. Think of it as a proof of a theorem: the result reported in your paper is a theorem, and the source code generating this result is its proof. Our sole purpose is to reproduce your results exactly and to be sure that they are reproduced with exactly the method you describe. A minimalistic approach is best: just implement your algorithm in a way that is simple to read and understand, nothing else. Please comment your code extensively. Naturally, input data should be provided together with the code; when this is impossible, you can provide instructions on where the data can be obtained (e.g., WordNet, Google pentagrams, etc. need not be included with your code, but we do need instructions on how to obtain exactly the version you used; standard software and corpora are highly preferable to home-made ones when possible).
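One way to make "exactly the version you used" checkable is to ship a checksum for each external input file alongside the download instructions. The following is a minimal sketch, not a CICLing requirement: the function names `sha256_of` and `verify_inputs`, and the choice of SHA-256 digests, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_inputs(expected):
    """Compare input files against the digests recorded by the authors.

    `expected` (hypothetical format) maps a file path to the digest that
    was recorded when the experiments were run. Returns the list of paths
    that are missing or whose contents differ.
    """
    problems = []
    for path, recorded in expected.items():
        if not Path(path).is_file() or sha256_of(path) != recorded:
            problems.append(str(path))
    return problems
```

A reviewer who downloads the data separately could then call `verify_inputs` with the recorded digests and see at once whether the files match the ones used in the paper.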

Resource: We encourage the authors to submit, together with the paper, a program (open source), as simple as possible, that follows the described algorithm and generates the results presented in the paper. There is no need for a sophisticated interface or performance improvements -- only easy-to-understand source code that generates the claimed results. Think of it as a proof of a theorem: the result reported in your paper is a theorem, and the source code generating this result is its proof.

We will give a special best verifiability, reproducibility, and working description award to the authors of the software that best fulfills the above goals (that is, the simplest and clearest code that proves the claims of the paper and allows one to reproduce its results exactly).

Submission of such code is not a requirement. For example, the nature of the paper may not require any additional data or code, or your experimental setting may not allow it, even after you have made every reasonable effort to make it possible. However, if the reviewers judge that the paper does require and does allow submission of code and data in order to be verifiable and reproducible, then preference will be given to papers accompanied by the code. We do understand that you may not have had time to prepare the code. We will use common sense in applying this policy. If for any reason you cannot submit the code, go ahead and submit your paper normally.

What we ask for is not a demo or tool based on your paper, but a form of proof and working description of the algorithm in addition to the verbal description given in your paper. An approximation of the idea is the code submitted with Church & Umemura's paper, permanently hosted on CICLing servers and cited in the paper (see last line). As you can see, we don't mean anything complicated. You can also show demo programs or tools based on your method, either as part of your talk or at the demo session (and we will be happy to host on our servers such software that complements your paper), but this is not required. In contrast, we do believe that a publicly published scientific paper must be accompanied by a minimal working description of the algorithm, open-source and available to the community. (In fact, even the other way round: the code ought to be accompanied by a paper.)

We do not ask for the impossible: if you present a large system, especially one that is commercially distributed or is the property of your company, then we do not expect you to provide its source code. Our point is that when the software and data can be provided, they should be provided.

We do not yet have specific rules: we hope to elaborate the rules based on our experience, so please use common sense. See the problems this policy is meant to address, as well as the list of the software reviewing committee and the instructions for the reviewers.

CICLing will keep the right, though no obligation, to host your files on its servers. Upon acceptance of your paper, we will give you a permanent link to the hosted data; please indicate this link in the camera-ready version of your paper. Please accompany your code with a suitable license that allows its free distribution, free study (and reverse-engineering if needed) by the public, and free use of the knowledge obtained from such study. For future editions of CICLing we plan to elaborate a special CICLing license for academic code, documentation, and data; if you have any ideas or suggestions on such a document, please let us know. Any guidance is highly appreciated.

How to submit code / data

We recommend that you submit your code as a ZIP file attachment (in EasyChair, use the Attachment field on the paper submission page; if you didn't have time, try sending us the ZIP file by email later) containing in its root the following directories (you may choose another structure if it makes more sense):

Please keep things as simple as possible (though not simpler), and the installation and use instructions as clear as possible. Usually this implies detailed and specific instructions, as well as a completely automatic script that performs all necessary steps without user intervention. In the text of your paper, make sure that the software reviewers can easily locate the description of the input data and the obtained results; section titles such as Main Algorithm, Experimental Methodology, and Experimental Results could be helpful. If the reviewers fail to quickly and easily install your program, run it, and interpret the results, they will probably give up and lower your score.
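As an illustration of a script that performs all steps without user intervention, here is a minimal sketch of a driver that runs a fixed list of commands in order and stops at the first failure. The name `run_all` and the step format are assumptions made for this example; any equivalent shell script or makefile would serve the same purpose.

```python
import subprocess
import sys


def run_all(steps):
    """Run each (name, command) step in order; stop at the first failure.

    `steps` is a list of (name, argument list) pairs. Returns the names
    of the steps that completed successfully.
    """
    completed = []
    for name, command in steps:
        print(f"[run_all] {name}: {' '.join(command)}")
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"[run_all] step '{name}' failed with code {result.returncode}")
            break
        completed.append(name)
    return completed


# Hypothetical pipeline: the commands below are placeholders that only
# print a message; replace them with the real steps of your experiment.
EXAMPLE_STEPS = [
    ("prepare data", [sys.executable, "-c", "print('preparing data')"]),
    ("run experiment", [sys.executable, "-c", "print('running experiment')"]),
    ("print results", [sys.executable, "-c", "print('results table')"]),
]
```

Calling `run_all(EXAMPLE_STEPS)` runs the three placeholder steps in sequence. Stopping at the first failure keeps a broken intermediate step from silently producing misleading final numbers.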

The archive should be as self-contained as possible: it is a good idea to include as much as possible -- compilers, interpreters, utilities used, etc. If possible, including a complete distribution of Perl or WordNet is better than relying on us to somehow find the version you used. Remember that science is done for eternity: your results should be reproducible in twenty years, when it may be impossible to find the specific version of a utility or compiler you used. If the resulting file is too large for the system to accept, please submit your paper alone and contact us about the attachment.
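When a tool cannot be bundled, recording its exact version in the archive at least tells a future reader what to look for. The sketch below writes such a record as a JSON file; the function names, the file layout, and the `tools` mapping are assumptions for illustration, and the version strings in it must be filled in by the authors.

```python
import json
import platform
import sys


def build_manifest(tools=None):
    """Collect interpreter and platform details plus declared tool versions.

    `tools` is a hypothetical mapping from tool name to version string,
    filled in by hand for software that cannot be queried automatically.
    """
    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "tools": dict(tools or {}),
    }


def write_manifest(path, tools=None):
    """Write the manifest as indented JSON and return the dictionary."""
    manifest = build_manifest(tools)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)
    return manifest
```

For example, `write_manifest("MANIFEST.json", {"WordNet": "<version you used>"})` would store the interpreter details together with the declared tool version next to the code.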

No double-blind policy for software: For the time being, we encourage the software to be anonymous but we do not require this. We understand that in some cases it can be impossible or too labor-intensive. If you really cannot make your software anonymous, then leave it as is.

More information: see FAQ.