This package is in the public domain (GNU General Public License), is 2002 Ben Bolker and Toshinori Okuyama, and comes with NO WARRANTY. Please suggest improvements to me (Ben Bolker) at bolker@zoology.ufl.edu.
If you are feeling impatient and confident, turn to ``Quick Start'' (section 5).
The following installation instructions assume you are using a ``modern'' Microsoft Windows system (tested on 2000 and XP); it is possible to use R, and the turtle package, on other operating systems - please contact the authors for more information. The setup file is about 17M, and R takes up about 40M of disk space. If you are running an antivirus package that is configured to check the signatures of executable files before they run, make sure you turn it off or register the new files installed by R before proceeding. The latest version of R as of this writing is 1.4.1, but the package should work with any version after 1.3.1.
Once you have downloaded and installed R, start the R program. The setup program should have asked whether you want to add a shortcut to the desktop or the Start menu: if you didn't, you will have to search for a file called Rgui.exe, which probably lives somewhere like Program Files\R\rw1041\bin depending on what version of R you are using and where you decided to install it. R will open up a window for you with a command prompt (>), at which you can type R commands. (Don't panic.)
You can exit R by selecting File/Exit from the menus, or by typing q() at the command prompt. In general, if you want help on a particular command (e.g. turtleest) you can type a question mark followed by the command name (e.g. ?turtleest).
You will next need to install the turtle library and two other auxiliary libraries, over the WWW, from within R (you will need to maintain a connection to the internet for this piece, although it is also possible to do this step ``off-line''). Within R, at the command prompt, type the following commands:
bbcontrib <- "http://www.zoology.ufl.edu/bolker/R/windows"
install.packages("turtle",contriburl=bbcontrib)
install.packages("bbmisc",contriburl=bbcontrib)
install.packages("boa",contriburl=bbcontrib)
install.packages("coda")
In each case, answer y to whether you want to delete
the source files; you won't need them again.
The first command specifies the location of the
turtle, bbmisc, and boa libraries (the coda
library comes from the default source for R packages).
The install.packages commands download and
install packages.
The package can read plain text data files that are separated by white space (spaces and/or tabs) or commas. If your data are in Microsoft Excel, you should export them as a comma-separated (CSV) file. If they are in Word, save them as plain text. The expected data format is that each row of data represents a haplotype, each column except the last represents samples from a particular rookery, and the last column is the samples from the mixed population. Each row and column should be named; your life will be simpler if the names do not have spaces or punctuation other than periods in them (a common convention in R is to replace spaces with periods, e.g. North.FL for "North FL"). Do not label the haplotype column; R detects the presence of column names by checking whether the first row has one fewer item than the rest of the rows in the file.
For example, a plain text file (with haplotype labels H1 and H2 and rookery labels R1-R3) could look like this:
R1 R2 R3 mix H1 1 2 3 4 H2 3 4 5 6Or a comma-separated file could look like this:
R1,R2,R3,mix H1,1,2,3,4 H2,3,4,5,6
To read in your data, you first need to make sure that R knows how to find them. The best thing to do is to use the File/Change working directory option under the file menu to move to a directory you will use for analysis, which should contain the data files you want to use and will contain R's working files. Once you have changed to the appropriate directory, you can read in your data files and assign the data to a variable (for example) mydata:
mydata <- read.table("myturtles.dat")
if you
are using space-separated data, or
mydata <- read.csv("myturtles.csv")
if you have comma-separated values.
To make sure that everything came out OK, type
the name of the variable alone at the command prompt: e.g.
mydatato print out the data.
Next, convert your data to a form that the turtle package can use with the as.turtle.data command:
mydata <- as.turtle.data(mydata)You can also do the two previous steps as a single command:
mydata <- as.turtle.data(read.csv("mydata.csv"))
Once your data are converted to turtle.data form, you can produce a summary plot of the data:
plot(mydata)The default plot is a barplot, with the proportions of each haplotype sampled in each rookery represented by a separate bar; the mixed population data are shown as the rightmost bar.
Before proceeding, you will need to ``condense'' your data set by (1) excluding any haplotype samples that are found only in the mixed population (which will break some estimation methods, and provide no useful information on turtle origins) and (2) lumping together all haplotypes that are found only in a single rookery and the mixed population (distinguishing among such haplotypes provides no extra information in our analyses, and slows down estimation). You can do this by typing
mydata <- hapfreq.condense(mydata)(To examine the condensed form of the data, you can print them by typing mydata at the command prompt, or plot(mydata) to see the graphical summary.)
Some data are already entered in the package:
data(lahan98)makes the haplotype frequency data from Lahanas et al. 1998 [3]available as variable lahan98.
l98c <- hapfreq.condense(as.turtle.data(lahan98))converts it into a turtle.data object and condenses the haplotypes to make it ready for analysis.
data(bolten98)gives you the data from Bolten et al. 1998 [2]available as bolten98; convert and condense as above.
mydata.cml <- cml(mydata) mydata.cml plot(mydata.cml)When you print CML results, R will tell you there is no estimate for the rookery frequencies, because CML assumes that the true rookery frequencies are equal to the sample rookery frequencies, rather than estimating the rookery frequencies independently.
The default plot for estimation results shows the estimated proportions of the mixed population contributed by each rookery as a stacked bar chart. Rookeries contributing less than a specified minimum contribution (default equal to 0.01) are not included in the bar.
Standard unconditional maximum likelihood analysis (UML) takes a little longer, but is equally straightforward:
mydata.uml <- uml(mydata) mydata.uml plot(mydata.uml)UML estimates also include estimates of the true haplotype frequencies in each rookery, which are printed with the contribution estimates and can be included in the graphical summary as follows:
par(ask=TRUE) plot(mydata.uml,plot.freqs=TRUE) par(ask=FALSE)(The par commands tell R to wait for user input, or not, between successive plots.)
mydata.umlboot <- genboot(mydata,"uml")will generate standard (nonparametric) bootstrap confidence intervals for a UML fit to mydata, by resampling the data with replacement 1000 times (by default). This is fairly slow with a realistic size data set. (You can ignore warnings about singular matrix, returning equal contribs.) You can find out the results by typing
intervals(mydata.umlboot)
mydata.mcmc <- tmcmc(mydata)runs a standard (as defined in [4,1]) Markov Chain Monte Carlo analysis of your data, with default settings.
mydata.mcmc intervals(mydata.mcmc) plot(mydata.mcmc)do the standard things: print the results, show confidence intervals, plot the results. (By default the information on haplotype frequencies in rookeries is not saved - it tends to be voluminous - and so this does not show up in the MCMC results.)
When you are running MCMC analyses, you have to check that the Markov chains have converged (i.e. that you've run everything long enough for a reliable estimate).
calc.randl.0(mydata)runs Raftery and Lewis diagnostics on your data set: these criteria attempt to determine how long a single chain has to be in order for it to give ``sufficiently good'' estimates. This function actually runs an iterative procedure, repeating the chain until the R&L criterion is satisfied.
calc.gandr(mydata)tests the Gelman-Rubin criterion, which starts multiple chains from widely spaced starting points and tests to ensure that the chains ``overlap'' - i.e., that between-chain variance is small relative to within-chain variance.
bbcontrib <- "http://www.zoology.ufl.edu/bolker/R/windows"
install.packages("turtle",contriburl=bbcontrib)
install.packages("bbmisc",contriburl=bbcontrib)
install.packages("boa",contriburl=bbcontrib)
install.packages("coda")
mydata <- hapfreq.condense(as.turtle.data(read.csv("myfile.dat")))
mydata.mcmc <- tmcmc(mydata) mydata.mcmc intervals(mydata.mcmc) plot(mydata.mcmc)