Data mining resources for species distribution modeling and macroecology

Research projects in ecology, evolution, and conservation biology increasingly rely upon ecological modeling and GIS data and tools for manipulating and analyzing spatially linked data from across a variety of spatial scales. Of particular importance are species distribution modeling (SDM; also known as Ecological Niche Models, or ENMs, or Environmental Niche Models) and niche divergence tools, geospatial data layers representing Earth environments from a variety of perspectives and scales, and GIS and mapping software.

This post provides a list of links to resources (mainly databases and websites) from which researchers can mine data for conducting SDM and macroecology analyses. I intend to curate this list and update it as I learn about or use new data or data mining resources. This is the first draft of the list, and so it will necessarily contain some gaps; however, I have tried to link to the most popular databases and software programs. Websites in the References section provide additional lists, but most of the software they list has been incorporated herein. Species distribution modeling software and GIS methods will not be reviewed here (see manuscripts and online lists of these elsewhere).

Along with my collaborators, I am currently using these data sources and software programs to conduct species distribution modeling analyses of North American conifers and Neotropical freshwater fishes.


 

Species Occurrence & Range Map Data

 


 

Climate Data Layers

 

NOTE: I don’t have time now but I plan to add links to download pages for several global circulation models, as well as future climate models, here in the near future. ~J

 


 

Other GIS Data Layers & Maps

 


 

GIS Software

Note: Check back… more links coming soon! ~J

 


 

References

CMEC (2017) Software and GIS data. Center for Macroecology, Evolution and Climate. Natural History Museum of Denmark, available at: http://macroecology.ku.dk/resources/software/.

WWF (2017) Conservation Science Data and Tools. World Wildlife Fund, available at: https://www.worldwildlife.org/pages/conservation-science-data-and-tools.

 


 

Other Resources

MVZ (Museum of Vertebrate Zoology) GIS Portal, University of California, Berkeley –  https://mvzgis.wordpress.com/

Setting DNA substitution models in BEAST

JC model of sequence evolution from TreeThinkers.

BEAST (Bayesian Evolutionary Analysis Sampling Trees) software provides a growing set of clear options for specifying evolutionary models of DNA substitution for alignments loaded into the program. However, the set of models that are readily available and “spelled out” in drop-down menus in the BEAUti (Bayesian Evolutionary Analysis Utility) GUI is much smaller than the standard set of 11 substitution schemes considered by other software programs for model selection (giving rise to a total of 88 models in jModelTest, or 56 models in PartitionFinder). This is a problem that I have heard several BEAST users bring up in online discussion groups, and that several people have asked me about personally. In this post, I shed some light on the problem and clear up some of the confusion by giving details on how users can easily specify different substitution models for their BEAST runs.

Background: Substitution models in BEAST

I will assume that readers are familiar with DNA substitution models. Also, there are a number of details about how BEAST handles DNA substitution models, which you can learn from the BEAST2 book or the manual, but that I will not cover. For example, by default, BEAST normalizes the rate matrix so that the mean rate is 1.0, and so on. Moving on to the issues…

It seems that many people are having issues with the fact that it’s not a simple “one-click” procedure to specify the diverse variety of substitution models that are frequently selected during model selection inference conducted prior to setting up BEAST runs. This is because BEAST1 has traditionally given obvious options for only three of the most popular models of DNA substitution (TrN, HKY, and GTR; Jukes-Cantor was added as a BEAUti option in v1.8.3), while BEAST2 versions give users obvious options to specify one of four models: JC69, TrN, HKY, and GTR. Thus other models such as TIM, K80, and F81 can seem cryptic to some users, who will often exclaim in dismay, “Where is the option for model X???” or “I don’t see how to specify model Y!!!” and then post a question about it (which may languish without ever being directly answered).

Solving the problem: Learning BEAST functionalities “under your nose”

The real solution to the above problems is to realize which options are available to you. In particular, BEAUti gives several important options in the Site Model panel that each user must learn to use to tailor the models of evolution for each alignment/partition. These options are located right “under your nose,” and Tables 1 and 2 below show where they are and how to use them:

Table 1. Details for setting substitution models in BEAST1 (e.g. v1.8.3).

Best-fit substitution model | (Base) Model to select in BEAUti | Additional specifications in BEAUti
JC69 | JC69 | None
TrN | TN93 | None
TrNef | TN93 | base Frequencies set to “All Equal”
K80 (K2P) | HKY | base Frequencies set to “All Equal”
F81 | GTR | turn off operators for all rate parameters
HKY | HKY | None
SYM | GTR | base Frequencies set to “All Equal”
TIM | GTR | edit XML file by hand so that all other rates are equal to the AG rate
TVM | GTR | unchecking AG rate operator
TVMef | GTR | unchecking AG rate operator and setting base Frequencies to “All Equal”
GTR | GTR | None

Table 2. Details for setting substitution models in BEAST2 (e.g. 2.4.0+).

Best-fit substitution model | (Base) Model to select in BEAUti 2 | Additional specifications in BEAUti 2
JC69 | JC69 | None
TrN | TN93 | None
TrNef | TN93 | base Frequencies set to “All Equal”
K80 (K2P) | HKY | base Frequencies set to “All Equal”
F81 | GTR | fix all rate parameters to 1.0 (uncheck the “estimate” box)
HKY | HKY | None
SYM | GTR | base Frequencies set to “All Equal”
TIM | GTR | fix CT and AG rate parameters to 1.0 (uncheck the “estimate” box)
TVM | GTR | fix the AG rate parameter to 1.0 (uncheck the “estimate” box)
TVMef | GTR | fix the AG rate parameter to 1.0 (uncheck the “estimate” box), and also set base Frequencies to “All Equal”
GTR | GTR | None

 

 

I have quickly constructed these tables based on different options available in BEAST versions, my knowledge of the substitution models, and the list of model code for different substitution models on the BEAST1 website (information for XML format for BEAST1 / earlier versions than present). How can you use the above tables to set a desired site model? Easy. You simply check Tables 1 and 2 and for a given model you might want to set (first column), the tables tell you which model to select in BEAUti (second column) and the modifications, if any, you will need to make to that model or the XML file itself to get the model you want (third column). 

Another way…

If you would like more detail, you can also draw on other resources available online to do this properly. For example, if you wanted to set the F81 model for a partition of your dataset, you could also search the substitution model code page mentioned above for “F81”. If you did this you would find the following block for BEAST1:

<!-- *** SUBSTITUTION MODEL FOR PARTITION Gene2 *** -->
<!-- The F81 substitution model (Felsenstein, 1981) -->
<gtrModel id="F81_Gene2">
	<frequencies>
		<frequencyModel dataType="nucleotide">
			<frequencies>
				<parameter id="F81_Gene2.frequencies" value="0.25 0.25 0.25 0.25"/>
			</frequencies>
		</frequencyModel>
	</frequencies>
	<rateAC>
		<parameter id="F81_Gene2.ac" value="1.0"/>
	</rateAC>
	<rateAG>
		<parameter id="F81_Gene2.ag" value="1.0"/>
	</rateAG>
	<rateAT>
		<parameter id="F81_Gene2.at" value="1.0"/>
	</rateAT>
	<rateCG>
		<parameter id="F81_Gene2.cg" value="1.0"/>
	</rateCG>
	<rateGT>
		<parameter id="F81_Gene2.gt" value="1.0"/>
	</rateGT>
</gtrModel>

We can infer from this that the F81 model can be specified in any BEAST version by first specifying the GTR model and then fixing all of the substitution rates to 1.0. Note that base frequencies are still estimated under F81: the “0.25 0.25 0.25 0.25” values in the block above serve as starting values, and additionally setting base frequencies to “All Equal” would turn the model into JC69. You would specify this information in the Site Model panel in BEAUti. You would also need to make sure that priors are not included for the fixed rate parameters, which you can do by commenting out the rate priors within the XML file. Likewise, you should delete the rate parameters from the log, since there is no need to log parameters that are not being estimated.

~J

Additional Reading

Substitution model basics from Wikipedia: substitution model, models of DNA evolution  

Beast users google group threads:

BEAST2 website FAQ: link

PartitionFinder website: PartitionFinder

Automatically install or update all phylogenetics packages in R

In case you are unaware of this trick, it is possible to simultaneously install or update all of the phylogenetics packages available in R in one fell swoop.

A number of different CRAN Task Views have been created summarizing R packages and utilities for different fields of analysis or research. Luckily, there is not only a compilation of these located on the CRAN-R website here, but also Brian O’Meara (@omearabrian #omearabrian) has developed a nice CRAN Task View entitled “Phylogenetics, Especially Comparative Methods.” This means that you can view all of the software for phylogenetics and phylogenetic comparative methods in R at one time from the corresponding Phylogenetics website. Here is what these websites look like:

[Screenshots: CRAN Task Views page; O’Meara’s Phylogenetics task view]

 

 

 

The most important thing, though, is that you can automatically install the entire Phylogenetics, Especially Comparative Methods view! Here’s how you do it! First, open R and install the ctv package (ctv stands for “CRAN Task Views”), e.g. with install.packages("ctv"), then load the ctv library by typing:

library(ctv) 

Next, you can install the Phylogenetics, Especially Comparative Methods view using the ‘install.views’ function, and later you can update this view using the ‘update.views’ function. If this is your first time using R for phylogenetics, you will not have any of the phylogenetics software, so you will do the following to install the view:

install.views("Phylogenetics") 

If you have already been working with phylogenies in R and have some of the related packages installed, you will want to use update.views as follows:

update.views("Phylogenetics") 

This will check through all of the packages in the Phylogenetics, Especially Comparative Methods view and update packages you already have installed, while also installing for the first time any packages that are missing. Here, an important note to make is that you must 1) be connected to the internet to do this, and 2) realize that this could take a substantial amount of time to install all of the packages. So, only do this when you have time!!

~J

How to kill an R process without forcing R to quit on mac

From time to time, we make mistakes in programming or testing a new R script or function, only to find that R “freezes” and appears to be stuck, or is working but gives the impression that it will take an eternity to complete the computation. The process might be based on a maximum-likelihood estimation of a parameter that requires convergence, or you could have accidentally (e.g. by default) run a function that needs to visit the total number of models possible for your dataset or a certain amount of parameter space. Alternatively, there might be an issue with FORTRAN coding. Or, the function you’re using might require a maximum number of iterations to be specified, or else it will use an exhaustive search. R can “hang” for these and many other reasons.

Today, I’m doing a short post to show you how to get out of this situation by killing the process in R from outside the R environment. Specifically, I’ll show you how to do this from mac Terminal, i.e. the command line.

First do the following at the command line to obtain a list of processes including R:

ps aux | grep R 

Look through the resulting list of PIDs (process IDs, which are unique identifiers for each process) and info for the R process, which will likely be using a great deal of CPU (%CPU; e.g. %CPU >90%), and thus will be near the top of the list (which will be in decreasing order of CPU usage). Here is an example of the results of running this command in my terminal to list ongoing processes, including an R analysis (this is just an excerpt; the output was much longer):

Last login: Sun Apr 3 23:45:14 on ttys000
Justin-Bagleys-MacBook-Pro-3:~ justinbagley$ ps aux | grep R
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
justinbagley 334 93.2 12.7 9618560 2122612 ?? R Tue05PM 317:07.03 /Applications/R.app/Contents/MacOS/R -psn_0_57358
justinbagley 339 7.2 2.1 4949212 355924 ?? R Tue05PM 47:05.88 /System/Library/CoreServices/Finder.app/Contents/MacOS/Finder
_windowserver 200 1.7 1.4 14600476 231856 ?? Ss Tue05PM 113:19.80 /System/Library/Frameworks/ApplicationServices.framework/Frameworks/CoreGraphics.framework/Resources/WindowServer -daemon
_mdnsresponder 94 0.2 0.0 2544976 3864 ?? Ss Tue05PM 0:39.06 /usr/sbin/mDNSResponder
justinbagley 491 0.1 0.2 3713764 33524 ?? S Tue05PM 11:19.43 /System/Library/CoreServices/Dock.app/Contents/Resources/DashboardClient.app/Contents/MacOS/DashboardClient
justinbagley 23663 0.0 0.0 3358044 6476 ?? S Fri10AM 0:02.38 /Applications/Adobe Acrobat Reader DC.app/Contents/Frameworks/RdrCEF helper.app/Contents/MacOS/RdrCEF helper --type=renderer --disable-3d-apis --disable-databases --disable-direct-npapi-requests --disable-file-system --disable-notifications --disable-shared-workers --no-sandbox --lang=en-US --lang=en-US --log-severity=disable --product-version=ReaderServices/15.10.20056 Chrome/45.0.2454.85 --enable-delegated-renderer --num-raster-threads=4 --gpu-rasterization-msaa-sample-count=8 --content-image-texture-target=3553 --video-image-texture-target=3553 --channel=23661.1.1456228839

An alternative way of getting this information on mac is to use the Activity Monitor app, but we’re not covering that here. OK, now armed with this information you can use the kill command to identify and stop the analysis. In my example above, the process ID for the R process is 334, and this process is consuming 93.2% of the CPU and 12.7% of the computer’s memory, so we can stop this analysis by entering the following command at the command line:

kill -s INT 334 

Two great things about this are 1) that this is easily accomplished at the command line, and 2) that you don’t have to call R or alter R, or do a “force quit” (e.g. by pressing option + command + Esc); you only have to stop the offending process. So, this means that R will not be closed as a result of these actions, but will be “liberated” from the hung process. As a result, your log of commands and output will not be lost.
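If you would rather not scan the ps output by eye, the lookup can be scripted. The following is only a minimal sketch, not a robust tool: the function name busiest_r_pid is made up for this example, and it assumes the ps aux column layout shown above, with the command path in column 11 and an R executable whose path ends in /R.

```shell
# Print the PID of the busiest R process found in `ps aux`-style input.
# Assumed columns: USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
# (busiest_r_pid is a hypothetical helper name, not a standard command.)
busiest_r_pid() {
  awk '
    # Keep the row with the highest %CPU whose command path ends in "/R".
    $11 ~ /\/R$/ && ($3 + 0) >= max { max = $3 + 0; pid = $2 }
    END { if (pid != "") print pid }
  '
}

# Possible usage (check the PID before sending the signal):
#   pid=$(ps aux | busiest_r_pid)
#   kill -s INT "$pid"
```

Nothing is printed when no matching process is found, so it is worth checking that the pid variable is non-empty before calling kill.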

Just as a tip, if you are worried about losing your command history at any time, you can usually save this from R even if the R environment/GUI hangs on an analytical process by first clicking the “Show/Hide R command history” button at the top of the R GUI, and then clicking “Save History.”

~J

Latest *BEAST trace: Most beautiful thing I’ve seen today

My new *BEAST (starbeast) species tree runs are converging wonderfully in BEAST v1.8.3, so I am off to a great start this morning. Here, I’m including a screenshot of the results of one of these runs, which I ran for 200 million generations using 3 normally distributed calibration points, as visualized in Tracer. Note the good posterior trace, very high ESS scores for all parameter estimates, and good use of burn-in (default 10% value works quite well in this case).

This is the most beautiful thing I’ve seen in my research today, or even this week!

[Screenshot: 38_STARBEAST_PSmle_post_trace_Mar10]

I am now checking and processing the results from multiple such runs using different species schemes (trait mapping schemes/files). They seem like they are going to give me a nice estimate of the species tree for this dataset.

Currently, the second steps of these runs are still running, using path sampling/stepping-stone sampling to calculate marginal likelihood estimates (MLE) for each model. When this is completed, I will calculate Bayes factors for each model and use them for Bayes factor species delimitation (BFD) sensu Grummer et al. (2014). 
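As a rough sketch of that last step (not necessarily the exact workflow of Grummer et al.), the Bayes factor comparing two models is commonly reported as 2lnBF, i.e. twice the difference between their log marginal likelihood estimates. The function name and the numbers below are hypothetical and are not results from these runs.

```shell
# Compute 2lnBF = 2 * (lnMLE_model1 - lnMLE_model0) from two log marginal
# likelihood estimates (e.g. from path sampling or stepping-stone sampling).
# two_ln_bf is a made-up helper name for this illustration.
two_ln_bf() {
  awk -v m1="$1" -v m0="$2" 'BEGIN { printf "%.2f\n", 2 * (m1 - m0) }'
}

# Hypothetical values only:
two_ln_bf -10500.00 -10512.50   # prints 25.00
```

Positive values favor the first model; a commonly cited rule of thumb treats 2lnBF values above 10 as very strong support.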

Is anyone else working on BFD analyses like this right now? Would love to hear from you.

~J

References

Grummer JA, Bryson RW Jr., Reeder TW. 2014. Species delimitation using Bayes factors: simulations and application to the Sceloporus scalaris species group (Squamata: Phrynosomatidae). Systematic Biology 63(2): 119-133.

Getting BEAST v1.8.3 running! Beagle, CUDA, Java, and LINUX

OK, I recently posted here about the new BEAST v1.8.3 release to let people know it was available (just echoing the announcement from Andrew Rambaut). Not sure about you, but when I get ahold of a new version of BEAST, I generally get it set up in my file system and try to run it. It’s good to go into this type of thing with a little dose of skepticism, as in my experience new versions of BEAST contain bug fixes but also seem to introduce new bugs or problems that you can run into. I think this is why some users avoid the latest updates; I mean, I still see papers publishing analyses using BEAST 1.6.1, for example, which was released a couple of years ago. At any rate, a test run can be very illuminating, so I downloaded the updated BEAST distribution and tried testing a very simple xml input file that should definitely have run. Unfortunately, I found that BEAST wasn’t running “out of the box” for me. Luckily, this was due to issues having nothing to do with the new version of BEAST!

Here, I’m going to detail how I encountered and overcame issues installing and running the new BEAST version. I hope this helps others set up BEAST on their computers and get it going. I’m going to start with the download, then touch on Java, CUDA, Beagle, and putting BEAST on a LINUX-based supercomputing cluster. Some actions I’ll cover may require admin privileges. If you don’t feel like reading the whole post, here is a brief summary:

Before using the new version of BEAST,

  1. update Java,
  2. update your Beagle library,
  3. update NVIDIA CUDA drivers, and
  4. put the UNIX/LINUX version of the updated BEAST software up on LINUX-based supercomputers and use it…

While this is probably a good enough reminder for veteran users, BEAST beginners will want to read the whole post that follows if the topic of installing and running BEAST on mac or UNIX/LINUX is of interest.

Assumption: updated version of Java

NOTE: Everything we’ll go through below assumes that you have the most recent version of Java running on mac. Without an updated Java version, you might complete some of these installs and updates only to find that BEAST fails to run on your system. So, before proceeding through the rest of the post, I recommend that you first stop and verify your version of Java by clicking here, clicking the “Verify Java version” button, and doing an install/update as needed based on the results…

Also note that having an older version of Java on your system could pose problems for running BEAST (see this beast-users Google group thread).
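You can also check the installed Java version from the command line with java -version. As a small, hedged sketch, the helper below (java_major_version is a name invented here) pulls the major version number out of that output; it assumes the usual format in which the first line carries the version in double quotes, such as "1.8.0_74" for Java 8 or "11.0.2" for newer releases.

```shell
# Extract the major Java version from the first line of `java -version` output.
# Assumes the version appears in double quotes, e.g. "1.8.0_74" or "11.0.2".
# (java_major_version is a hypothetical helper name.)
java_major_version() {
  awk -F'"' 'NR == 1 {
    n = split($2, parts, /[._-]/)
    # Old-style strings such as 1.8.0_74 carry the major version second.
    if (parts[1] == "1" && n > 1) print parts[2]; else print parts[1]
  }'
}

# Possible usage (java -version writes to stderr, hence the redirect):
#   java -version 2>&1 | java_major_version
```

Which minimum Java version a given BEAST release needs is not covered here, so check the release notes for your version.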

Problems from the start because of my mac

Given I use mac, I installed BEAST by downloading the 1.8.3 dmg file from the BEAST v1.8.3 GitHub site (see screenshots of site and links below) and opening it, then moving the new software files to a folder in my Applications directory.

[Screenshot: BEAST v1.8.3 GitHub site]

My first attempt with BEAST v1.8.3 failed and resulted in several warnings and errors, one of the most significant of which was an exhortation to download the newest version of the Beagle library. I don’t have a screenshot of the error, but it occurred near the top of the run log… (look for something that starts like this: “WARNING: …”). The moral of the story here is that when you get a new version of BEAST, you should automatically check to make sure you have an updated version of Beagle.

Updating Beagle…and CUDA drivers on mac

Beagle is a free, open source application programming interface (API) and library for phylogenetic inference calculations developed by Mark Suchard, Andrew Rambaut, and colleagues. The Beagle library was designed to help ease the computational burden of phylogenetic modeling with codon-based/partitioned models. By drawing on graphics processing units (GPUs) and cores using Beagle, the authors found a vast increase in computation speed for Bayesian phylogenetic inference, relative to runs using CPUs for computation (see papers describing the method/library here and here).

Fortunately, updating Beagle is very simple. To do this, I went to the Beagle download site on GitHub, and, according to the properties of my Macbook Pro laptop, I downloaded the appropriate binary installers by clicking on the “BEAGLE v2.1.2 for Mac OS X 10.6 and later” link, as shown in the screenshots below right.

[Screenshots: Beagle README and download links; Beagle library GitHub front page]

Surprisingly, this took me to Dropbox (screenshot below), so you may need to create a Dropbox account to download the file. I already use Dropbox, so I just signed in and clicked “Download” to get the “BEAGLE-2.1.2.pkg” file. Once you have this file on mac, all you need to do is double click on it and click through the install windows. 

[Screenshot: Beagle download on Dropbox]

It is tempting to assume that installing Beagle would be sufficient; however, this may not be the case. Why is that? Well, you’ll probably want to run Beagle using a GPU, and so you will want to do one more step. If you look at the install instructions for Beagle, for example by clicking the “Instructions for installing BEAGLE on Mac OS X” link for mac users, you will find that, beyond getting the most recent Beagle version, the authors recommend that you proceed in a second step to install/update the NVIDIA CUDA drivers for mac. You simply go here to get the updated drivers (top of list; screenshot below right), then download and click on the new file to install CUDA. If you already have older CUDA drivers on your mac, there’s no need to uninstall them or worry about potential issues with a new install (e.g. “installing over” previous versions); this is not an issue for mac users, but I am unfamiliar with potential related issues on Windows.

[Screenshot: CUDA driver download page]

After installing the above Beagle and CUDA updates, my test file ran when trying BEAST v1.8.3 on my MacBook Pro! So, that was great news!! 

However, I don’t actually run my final BEAST runs locally. I run BEAST on an enormous supercomputing cluster at the Brigham Young University Fulton Supercomputing Lab. So, my next step was prepping and setting BEAST up on our cluster.

Setting the new BEAST up on a LINUX-based supercomputing cluster

As general comments, let me say that, with administrator access, you’ll be able to do more and do it faster when you install new programs. But without superuser privileges you probably won’t be able to follow the Beagle library install instructions for Linux given on GitHub, word for word. In this case, if you are also trying to get BEAST v1.8.3 / Beagle 2.1.2 up on a Linux-based supercomputing cluster, I recommend that you get with your cluster administrator or maintenance group and request that they do the install for you (first beagle, then BEAST). This would be the best practice.

However, for those of you interested in a shortcut, the quickest and dirtiest way to proceed if you already have an earlier version of Beagle installed on your cluster would be to simply skip the Beagle update, and add the new BEAST to your supercomputing account and run it. I tested whether the new version of BEAST would run with an older version of Beagle and found that it ran just fine!

Either way, whether you update Beagle or not, you will want to update BEAST on your cluster by downloading the UNIX/LINUX binary and installing it. Remember, you can get the appropriate UNIX/LINUX download from the BEAST v1.8.3 GitHub site. Here, the correct file is the one with the “.tgz” extension (the site doesn’t tell you this is for UNIX, but it is; the file with the “.zip” extension is for Windows machines). This will be opened and converted to a “.tar” file on mac, so you will just need to double click on that and install from there to get the new software folder. Then move the folder to the appropriate user or lib directory on your supercomputer and remember to point to it in your run scripts.
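If you prefer to unpack the archive directly on the cluster rather than on your mac, a minimal sketch might look like the following. The function name unpack_beast, the archive name BEASTv1.8.3.tgz, the destination directory, and the bin/beast launcher path inside the unpacked folder are all assumptions for illustration; check them against the file you actually downloaded.

```shell
# Unpack a BEAST .tgz archive into a destination directory.
# Usage: unpack_beast ARCHIVE DEST_DIR
# (unpack_beast is a hypothetical helper; paths below are examples only.)
unpack_beast() {
  archive=$1
  dest=$2
  mkdir -p "$dest" || return 1
  # -x extract, -z gunzip, -f archive file, -C change into DEST_DIR first
  tar -xzf "$archive" -C "$dest"
}

# Possible usage, assuming these file and folder names:
#   unpack_beast BEASTv1.8.3.tgz "$HOME/apps"
#   "$HOME/apps/BEASTv1.8.3/bin/beast" my_analysis.xml
```

Your run scripts would then point at wherever the unpacked folder ends up.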

~J

Dealing with negative phylogenetic branch lengths in BEAST starting trees

Negative branch lengths and the BEAST

When conducting phylogenetic analyses, you will from time to time encounter negative branch lengths. Negative branch lengths are a nuisance because they can cause substantial visualization problems or keep analyses (e.g. downstream analyses in other programs) from running.

For example, it will be desirable in many cases to add a starting tree to input files for datasets with phylogeographical sampling (e.g. multiple individuals in a standard BEAST analysis, or a *BEAST species tree analysis with multiple individuals sampled per species/population modeled). However, intraspecific sampling has the downside that it lends itself to negative phylogenetic branch lengths in maximum clade credibility (MCC) trees (final annotated tree resulting from BEAST analysis). This can happen when a clade in the MCC tree occurs at low frequency relative to many other trees that are being summarized. But watch out!! TreeAnnotator will not alert you that negative branches are present in the MCC tree output. You are unlikely to catch this issue unless you visualize your MCC tree in FigTree, which in more conspicuous cases will show some really funky looking branches running the opposite direction!
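Since TreeAnnotator stays silent about this, a quick command-line check can help before you reuse a tree. This is only a sketch under the assumption that branch lengths in your tree file follow a colon, as in Newick strings; the function name count_negative_branches is invented for this example.

```shell
# Count branch lengths that start with a minus sign in Newick-style text
# read from standard input (branch lengths are assumed to follow a colon).
# (count_negative_branches is a hypothetical helper name.)
count_negative_branches() {
  grep -Eo ':-[0-9]' | wc -l | tr -d ' '
}

# Possible usage:
#   count_negative_branches < my_mcc.tree
```

A result of 0 suggests the tree has no negative branch lengths; anything else means the tree needs attention before it is used as a starting tree.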

This is problematic because when you take a tree that unbeknownst to you has negative branch lengths and try to use it as a starting tree, BEAST will hang up and return the following error message:

java.lang.RuntimeException: Negative branch length: -0.0036835877071336926 
at beast.evolution.likelihood.BeagleTreeLikelihood.traverse(Unknown Source).

Solutions

You can deal with this problem by manipulating the branch lengths of your tree so that small/negative branch lengths are converted to zero in one of several ways:

  1. manually, e.g. in a text editor, 
  2. using regular expressions i.e. grep (=super search and replace!)
  3. manipulating tree edge lengths in R without relying on a specific function but taking advantage of phylo structure,
  4. using the R collapse multichotomy function in APE.

In one of his Boopsboops blog posts, Rupert Collins provides an example of this from a COI neighbor-joining phylogeny of cypriniform fishes, and he also gives two examples of how to change negative branch lengths to zero: one using regular expressions, and one using R code that sets tree edge lengths (branch lengths) less than zero to zero. So, you can simply check out his blog post for info on #2 and #3 in the list above. However, when modifying edge lengths in R, think carefully before also zeroing small positive values close to zero; I mean, in a tree scaled in millions of years, a branch length of 0.01 still extends back 10 thousand years, and you may not want to convert it to zero. Searching around online, I also found the following solution using sed posted by Andreas on biostars.org:

sed -e 's,:-[0-9\.]\+,:0.0,g' in.tree > out.tree

So, that’s another way to use the command line to solve the negative branch length problem! In the sed example, “in.tree” is the input tree name and “out.tree” is an output tree name; both can obviously be changed to fit your preferences.
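One caveat about the sed command above: the \+ operator is a GNU extension to basic regular expressions, so it may not behave the same way in every sed implementation, and the pattern stops at the “E” in values written in scientific notation (e.g. -1.5E-4). The variant below is a sketch that uses extended regular expressions via the -E flag and also consumes an optional exponent. The function name is made up, and I am assuming your sed accepts -E, as current GNU and BSD versions do.

```shell
# Replace every negative branch length in Newick-style text with 0.0.
# Reads from standard input and writes to standard output.
# (zero_negative_branches is a hypothetical helper name.)
zero_negative_branches() {
  # ":-" followed by digits/dots, plus an optional exponent such as E-4
  sed -E 's/:-[0-9.]+([eE][-+]?[0-9]+)?/:0.0/g'
}

# Possible usage:
#   zero_negative_branches < in.tree > out.tree
```

As with the original command, this only rewrites the text of the tree, so it is worth viewing the result in FigTree afterwards.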

For #4 in the list above, you would use 

di2multi(phy, tol = 1e-08)

which according to APE documentation “deletes all branches smaller than tol and collapses the corresponding dichotomies into a multichotomy.” This will take internal zero- or negative-length branches and collapse them. Here, phy is a class “phylo” object containing your tree.

As a slight side note, the APE function multi2di, which is a companion function to di2multi, can be used to accomplish the opposite task–arbitrarily resolving multichotomies into series of dichotomous branches. The general usage of multi2di is as follows:

multi2di(phy, random = TRUE)

If random is set to FALSE, then polytomies will be resolved in their order of appearance in the tree.

Known issues

‘Purists’ will likely notice that there are some issues with the above suggestions. In particular, when you collapse branches or convert small/negative branch lengths to zero-length branches, this has the effect of shortening the tip-to-node distance for the adjacent branch. The difference doesn’t get added back in. So, if you are working with an ultrametric time tree (e.g. output from a previous BEAST analysis), the above manipulations will yield a final tree that is technically, and by an ever so slight margin, non-ultrametric. If the tree of interest wasn’t ultrametric to begin with (e.g. it was from RAxML), then the resulting tree simply will not preserve the overall branch lengths. 

The good news is that, after you fix negative branch lengths in your BEAST starting tree, and assuming that you 1) have your starting tree properly specified in your xml input file and 2) the tree fits any calibration priors, your BEAST analysis should start without being impeded by these kinds of errors. Of course, the manipulations I’ve discussed above will also fix graphical errors caused by negative branch lengths. OK, I hope the above helps you get your BEAST analyses running full speed ahead when you encounter negative branch length issues.

~J

New version of BEAST–v1.8.3–released!

In case you were unaware, Andrew Rambaut has made a new version of BEAST 1, v1.8.3, available for Mac, Windows and Linux (.dmg, .zip, .tgz) at the BEAST 1 GitHub website.

According to Andrew, new features of this version include:

  • Generalized Stepping Stone sampling for Marginal Likelihood Estimation.
  • Continuous quantile version of uncorrelated relaxed clock as option in BEAUti [from Li & Drummond 2012, MBE].
  • Option in BEAUti to log complete histories in Markov Jump Counting.
  • Jukes-Cantor added as an option for BEAUti nucleotide substitution model.
  • New tree transition kernel (SubTreeLeap operator) implemented in BEAST.
  • Defaulting to randomizing rate categories for DiscretizedBranchRates.
  • BEAUti ‘guess dates’ options are now persistent from run to run (and shared with Path-O-Gen/Tempest).
  • Transmission tree model of Hall et al (2016) implemented.
  • Fast, general multidimensional scaling (MDS) implemented.
  • Clock panel in BEAUti simplified.
  • Parameter linking available in Priors panel in BEAUti.
  • Command line is logged into log file headers for reference.
  • Plus bug fixes.

I personally am downloading the update and running stepping stone sampling for MLEs for Bayes factors. What in the new version appeals most to you?

~ J

Update: BEAST v2.3.2 released!


Hi guys, OK, in case you haven’t heard, there is a new version of BEAST2, so make sure that you go get that update on GitHub. According to Remco Bouckaert, the new version, v2.3.2, improves the ability of the user to interact with data in BEAUti, and it also improves the efficiency of some BEAST functions, particularly the MRCAPrior. Moreover, LogCombiner has been fixed so that burn-in editing finishes properly, and LogAnalyser now has a one-line-per-file mode. Supposedly, BEAST now has “more sensible error messages” in several packages.

However, I find that BEAST is still plagued with problematic error reporting in some cases. Take for example this infamous error: “Parsing error – the input file is not a valid XML file”. The program developers have made good suggestions for dealing with such vague errors. But hopefully the guys will continue to improve error reporting and related issues much more in future versions.

~J

Update to BEAST v2.3.1


A couple of months ago the developers released a new version of BEAST, v2.3.1. I like to remind myself and others to update BEAST and its utilities to this latest version, which you can download here.

BEAST v2.3.1 contains a number of important updates. Most notably, TreeAnnotator now has the default memory set to 4 GB, and this will greatly help ambitious people who try to load and summarize large numbers of trees, or the posterior distribution of trees from an analysis with many tree tips. If you haven’t updated in a while, then you will get the greatest benefit from the present update and from managing all of the new packages that have become available in the last couple of years. For example, you’ll find that the new version of BEAST is compatible with new packages for structured coalescent approximation (e.g. BASTA), species delimitation, new ways of handling monophyletic constraints and fossil calibrations (e.g. CladeAge, if you haven’t seen it), and new models for generating species trees from genomic datasets (e.g. SNAPP using SNP data).

~J