Presentation

This consensus tool for statistical and genetic evaluation analyses in forest tree breeding is proposed because two circumstances coincide: on the one hand, there is no consensus on the best-suited common tool over the long term; on the other hand, the TREEBREEDEX consortium of tree breeders exists and opens new perspectives.

A well-adapted consensus tool is expected to:

  • cover the specific needs of its users (tree breeders)
  • be backed by a community of users
  • guarantee that demand for courses and technical support is met
  • drive evolvability
  • allow common licenses, prices, etc. to be negotiated

Different solutions can be envisaged, such as using an existing all-in-one, ready-to-use commercial package (with a common license negotiated for the community) or adapting existing tools with development and support (with multiple licenses). In any case, the three cornerstones of the proposal are: Collaborative + Open + Free

Collaborative:

gather the needs of all potential users, bottom-up design, user-driven evolvability

Open:

leave the code open to improvements from future collaborators, the best guarantee of flexibility and long-term survival

Free:

supported by a project and made freely available to the community

Many solutions are possible, but one could dream of something that deals with:

  • Data warehousing module: agree upon database structure, label standards, descriptors, etc.
  • Raw data: input from field data and from previous databases
  • User interface module: web-based or locally based
  • Graphical and analytical outputs
  • Data analysis modules: basic stats, spatial and time series, genetic evaluation, BLUP, GxE, etc
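
To make the "genetic evaluation, BLUP" item more concrete, here is a minimal sketch in Python of the simplest possible case: predicting family effects in a one-way random model, y_ij = mu + u_i + e_ij. It assumes unrelated families and a known variance ratio (residual variance divided by family variance); in real evaluations the variances would be estimated and a pedigree relationship matrix would be used. The function name, argument names and example figures are hypothetical and only serve as illustration; this is not the tool proposed here.

```python
def blup_family_effects(records, variance_ratio):
    """Solve Henderson's mixed model equations for y_ij = mu + u_i + e_ij.

    records        : dict mapping family label -> list of phenotypic values
    variance_ratio : sigma_e^2 / sigma_u^2, assumed known (an assumption)

    Returns (mu_hat, blups), where blups maps each family to its predicted
    effect. Families are assumed unrelated (identity relationship matrix).
    """
    means = {}
    weights = {}
    for family, values in records.items():
        n = len(values)
        means[family] = sum(values) / n
        # Shrinkage factor: close to 1 for large families, small otherwise.
        weights[family] = n / (n + variance_ratio)

    # Generalised least squares estimate of the overall mean.
    total_weight = sum(weights.values())
    mu_hat = sum(weights[f] * means[f] for f in records) / total_weight

    # BLUP of each family effect: its deviation from the mean, shrunk.
    blups = {f: weights[f] * (means[f] - mu_hat) for f in records}
    return mu_hat, blups


if __name__ == "__main__":
    # Hypothetical height measurements for two half-sib families.
    trial = {"A": [10.0, 12.0], "B": [14.0, 16.0]}
    mu_hat, blups = blup_family_effects(trial, variance_ratio=2.0)
    print("overall mean:", mu_hat)
    for family, effect in sorted(blups.items()):
        print(family, effect)
```

The point of the sketch is that predicted effects are shrunk towards zero according to the amount of information per family, which is what distinguishes BLUP from a plain comparison of family means.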

A success story from this “collaborative + open + free” perspective, which the present proposal could join:

R: a language and environment for statistical computing and graphics

GNU project similar to S language

Wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques

Highly extensible

Open Source

A true computer language: functionality is added by defining new functions

Computationally intensive tasks written in FORTRAN or C++ can be linked and called at run time (e.g. BLUP)

Easily extended via packages: eight packages are supplied with R and hundreds more are available through the official R sites

Large and extremely active community

A genetic evaluation package could then be proposed within R.

Leopoldo Sanchez (INRA)

Friday, June 26, 2009

Bonjour,

At the risk of being redundant with what Eduardo has already posted, I have added the following text, which corresponds to what can be found on the TREEBREEDEX website.
The aim is to prompt reactions and comments.

Please take some time to read this proposal thoroughly; there are important messages at the end.

Proposal:

The following is a summary of a development project that was presented at the TREEBREEDEX training session « Data analysis, management and storage » on February 24-25, 2009, in Hann. Münden (Germany). It concerns the development of a consensus tool for statistical analyses in forest tree breeding sciences.

Context:

• Statistical analyses are a key step in the study of breeding resources.
• Breeding resources are becoming increasingly complex, adding heterogeneous sources of data.
• The choice of analytical tools and protocols is often subject to external constraints on the breeder, and is a major source of heterogeneity among breeders in how results are obtained.
• There is no consensus on the best-suited common statistical tools for genetic evaluations over the long term.
• TREEBREEDEX represents a unique opportunity as a “launching pad” for the development of common tools and protocols for tree breeding.

What is in the proposal?

The objective is to make available a ready-to-use analytical tool specifically designed for and by forest tree breeders. This tool should:
• take into account all needs and protocols of potential users, following a bottom-up design in which the final users, i.e. breeders and geneticists, drive its evolvability;
• be open to continuous improvements from future collaborators and users, as the best guarantee of flexibility and survival over the long term;
• be financially supported by a future research project and be made freely available to the forest tree community;
• ideally comprise data warehousing modules, user interface modules, data analysis modules and graphical modules.

For this, several options need to be evaluated beforehand. The tool can be:
(1) the result of improvements to already existing packages;
(2) developed anew.

Regarding option (1), several alternatives are possible, each of which would require a negotiation phase with the respective developers. Regarding option (2), one of the most promising routes appears to be a full-featured package within the R platform (an extremely successful GNU project providing a language and environment for statistical computing and graphics).

How to proceed?

This is a very ambitious project, which will certainly need external expertise from developers and a steering committee serving as a bridge between the developers and the community of TREEBREEDEX users. Before the tool itself is developed, several preliminary steps must be carried out by the steering committee. These steps are:
• to gather users' needs within the breeding community of TREEBREEDEX;
• to evaluate all possible options and make recommendations on the best alternative;
• to draft a development project describing the required developments, the expertise needed to carry them out and the required timetable;
• to look for suitable calls for proposals from funding agencies, or for projects under construction that could host the work;
• to identify potential developers within the community, or draft a developer profile for future position announcements;
• to monitor and guide the development of the tool;
• to design and launch the first training courses.

Your contribution…

As stated, this is a collaborative project. It will not be taken any further without substantial positive support from TREEBREEDEX members. Your contribution can be:
• as a detractor, providing clear reasons why we should not follow this way;
• as a potential user, eager to participate in the identification of needs;
• as a member of the steering committee (see next paragraph);
• as a developer, if you master programming and statistics.

Who could be a member of the steering committee? Any breeder, scientist or geneticist with some involvement in genetic evaluations and statistical analyses, but above all with high levels of enthusiasm and perseverance (if backed, this is expected to be a long process).

VERY IMPORTANT: Please identify your role, discuss it with colleagues at your home institution and get back to us as soon as possible with your support and/or contributions. If the project succeeds, it will be important for your institution to be involved.

Eduardo Notivol has created a blog that will serve as an open discussion forum (http://tbxdataanalysisplatform.blogspot.com/).

For further information or contacts

Leopoldo Sanchez
Luc Pâques









Friday, February 27, 2009

TOPICS

Just to break the ice, I want to start with some likely needs that should be addressed by this platform... (incomplete)

PLATFORM?

But what does “platform” mean in this context?
As usual, the question is better than the answer, and we are using the term “platform” because the thing is not defined yet.
A complete software package with different modules for the different needs to be addressed?
A set or sets of script libraries integrated into an existing statistical software (e.g. R)?
In both cases, and in any other, some features are highly desirable: not working as a “black box”, a friendly GUI (graphical user interface), open source, free, and compatible with standard data formats.