Many R users are unfamiliar with the process of updating all their R packages after upgrading to a new version of R.

Would you like to learn how to update all your R packages in a safe and straightforward way?

The method described below should work on all platforms. However, I don't use R on Windows, so I don't know the Windows-specific directory paths; I will describe the steps on a Linux system.

For example, I upgraded R from 4.2.1 to 4.3.1 recently.

As we can see in the R library folder, there are two sub-folders: 4.2 (/home/ylj/R/x86_64-pc-linux-gnu-library/4.2) and 4.3 (/home/ylj/R/x86_64-pc-linux-gnu-library/4.3).

First, we need to copy all the packages from the 4.2 folder into the 4.3 folder. Alternatively, you can do the reverse, but in that case the folder needs to be renamed to match the new R version.
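As a sketch, the copy step on Linux could look like this (library paths as in this post; adjust them to your own setup — `-n` avoids overwriting any package already present in the new library):

```shell
# old and new user libraries (paths from this post; change to yours)
OLD=$HOME/R/x86_64-pc-linux-gnu-library/4.2
NEW=$HOME/R/x86_64-pc-linux-gnu-library/4.3
mkdir -p "$NEW"
if [ -d "$OLD" ]; then
  # -a preserves attributes; -n keeps any package already present in 4.3
  cp -an "$OLD"/. "$NEW"/
fi
```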

Then, in the R console (terminal or RStudio), run each line below.

## list all installed packages
pkglist <- data.frame(installed.packages(lib.loc = "/path/to/lib")) # here, lib.loc = "/home/ylj/R/x86_64-pc-linux-gnu-library/4.3"
## update all packages (BiocManager handles both CRAN and Bioconductor packages)
BiocManager::install(pkglist$Package, update = TRUE, ask = FALSE, checkBuilt = TRUE)
## restart with a fresh R session,
## then check whether any packages weren't updated
pkgupdated <- data.frame(installed.packages(lib.loc = "/path/to/lib"))
pkgfailed <- pkgupdated[package_version(pkgupdated$Built) < "4.3.1", ]

For the packages left in pkgfailed, we have to reinstall them manually (e.g. with BiocManager::install(pkgfailed$Package)).


I have started a category dedicated to the valuable insights gained from my supervisors and peers, along with my own reflections. This category aims to share thoughts from the ongoing, voluntary, and self-motivated pursuit of knowledge for personal or professional reasons (lifelong learning).

In this initial post, I will continuously update and share insightful quotes from various individuals.

J: You are using only two numbers to determine your life. A confusion matrix isn't enough; let's use AUC.

T: If this decision is hard to make, that means this decision does not matter. If this decision does not matter, then you can decide by tossing a coin.

T:

“Must a name mean something?” Alice asks Humpty Dumpty, only to get this answer: “When I use a word… it means just what I choose it to mean – neither more nor less.”

This reminds me of a famous statement from the 1960s: The medium is the message. Here you seem to be letting Google Docs or Google Forms/Sheets determine how to organise your thoughts. That doesn't seem to me to be the right way around.


I have worked in multiple dry labs over the last 10 years. Each laboratory has its own style, but few of them train students in good dry-lab habits. I learned many good habits in Dr. Malay Basu's lab. Now, I would like to share these habits and tips for doing bioinformatics work in a dry lab. All the suggestions assume a Linux environment.

Habits

No.1

Maintain good documentation for all analyses. I learned this in my first week at Malay's lab. We recommend following the style of the R Markdown reports from MD Anderson Cancer Center.

The report should include:

  1. Executive Summary (*)
    • Introduction
    • Data and Methods
    • Results
  2. Data Munging
  3. Analysis
    • Step 1
    • Step 2
  4. Appendix (*)
    • Working dir
    • Script descriptions
    • List of Figures
    • List of Tables

If you want to share the report with others, it is better to generate an HTML report; if you want to view the report as an MD file on GitHub, generate a GitHub MD report instead.
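With rmarkdown, one Rmd file can declare both output formats in its YAML header (render them all with rmarkdown::render(..., output_format = "all")); a minimal sketch:

```yaml
---
title: "Analysis report"
output:
  html_document: default      # standalone HTML for sharing
  github_document: default    # .md that renders directly on GitHub
---
```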

Always call sessionInfo() at the bottom of the report to print information about the current R session.

No.2

Each research project should also follow a clean structure:

  • bin
  • original_data
    • README - describing where the data came from
    • Data in compressed form
      1. If the raw data is huge, keep only the processed data; the processing steps should be captured in the “exp” directory.
  • “exp” - Actual experiments. Each directory must be named as
    1. YYYYMMDD-some_identifiable_info-#githubissue-INITIALS
    2. Each directory should link its input files in an “in” folder and generate output files in an “out” folder. The output files should have the date and time appended to their names.
    3. The entire experiment should be driven by an RMD file, a single “run.sh”, or a “Makefile”. There must be an RMD/MD file in each directory. You must write the executive summary and the list of figures and tables, with their descriptions, in the report.
  • “design” (?)
    • Reserved for PI.
  • docs
    • Reserved for PI.
  • reports
    1. “index.htm”: a top-level index file listing every file in the directory with a few lines of description for each.
    2. A list of files whose names match the directory names of “exp”, preferably in HTML format generated from RMDs.
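Setting up one such experiment directory can be sketched as follows (the date, issue number, and initials here are made up):

```shell
EXP=exp/20230901-qc_check-42-YL     # YYYYMMDD-info-#issue-INITIALS (hypothetical values)
mkdir -p "$EXP"/in "$EXP"/out       # linked inputs go to in/, results to out/
touch "$EXP"/run.sh                 # single entry point that drives the experiment
```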

You can upload the project structure and code to any cloud storage; make a backup every day or week.

No.3

In any case, back up the server data every week; lost data is a huge pain for every researcher.

No.4

After a project is finished, review the whole project's data and make sure every folder in the project has a document recording what you did, and every script has a note explaining what it is for.

No.5

When linking data from one sub-folder to another within a project folder, use a relative symbolic link (ln -rs). Then, when you move the project folder somewhere else, the links will not break.
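A quick demonstration of why the relative flag matters (directory and file names here are toy examples):

```shell
mkdir -p proj/data proj/analysis
echo "hello" > proj/data/input.txt
# ln -rs stores the target relative to the link, here ../data/input.txt
ln -rs proj/data/input.txt proj/analysis/input.txt
mv proj proj_moved                     # move the whole project tree
cat proj_moved/analysis/input.txt      # the link still resolves
```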

Tips

No.1

Use tmux or screen when you want a program to keep running after you exit the terminal session; both also work with interactive commands. I prefer tmux because it has more features than screen. Please avoid nohup, since you cannot interact with the program afterwards.

When you have an RStudio Server, you may sometimes wish to run a program in the RStudio terminal after closing the RStudio interface, but I still suggest running the program inside tmux within the RStudio terminal.

The main reason is that when the RStudio terminal session stops responding, you can still interact with the tmux session from another SSH session.
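The basic tmux workflow, sketched with a detached session (the session name and job are placeholders):

```shell
tmux new-session -d -s demo 'sleep 60'   # start a detached session running a job
tmux ls                                  # confirm the session is alive
# reattach later with: tmux attach -t demo   (detach again with Ctrl-b d)
tmux kill-session -t demo                # stop it when done
```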

No.2

Always monitor memory usage via ps aux --sort -rss | head. Some parallel programs can leak memory, and you may not notice the leak even when the program appears to run successfully.

No.3

Control+C aborts the application almost immediately, while Control+Z shunts it into the background, suspended. If you suspend an application this way, keep in mind that it is stopped, not running, until you resume it with fg or bg.

No.4

Do not save large files to the /home directory if your server has limited storage for /home. Use df -h to check the size of each system disk.

Use du -sh * | sort -rh to check the size of each folder.

No.5

It is better to learn a pipeline-building framework (e.g. Snakemake or Nextflow) and Docker; they make it much easier to keep everything reproducible.
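As one sketch on the Docker side, a Dockerfile can pin the exact R version of an analysis (the image tag, packages, and script name below are illustrative; rocker images ship the install2.r helper):

```dockerfile
# pin the exact R version used for the analysis
FROM rocker/r-ver:4.3.1
# install the packages the analysis needs
RUN install2.r --error ggplot2 data.table
WORKDIR /work
# run.sh is the hypothetical single entry point of the experiment
COPY run.sh /work/run.sh
CMD ["bash", "run.sh"]
```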

No.6

You need basic knowledge of conda, and you should use it when running Python code.

Also make sure you know where it is installed. I once found that a user with su privileges installed Miniconda in their /home directory, and every user's conda environment was switched to that person's /home installation. That's ridiculous.

No.7

Always compress datasets into .xz or .gz format. R, Python, and most other programming languages have functions to read compressed datasets, so you don't need to uncompress any dataset for analysis. That can save a lot of space.
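For example, with gzip (the file name is a toy example; R's read.csv and Python's pandas.read_csv both read .gz files directly):

```shell
printf 'id\tvalue\n1\t10\n' > data.tsv   # a toy dataset
gzip -k data.tsv                         # -k keeps the original file
zcat data.tsv.gz | head -n 1             # stream it without writing a decompressed copy
```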

No.8

When generating a modified file, never overwrite the original files.
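A tiny illustration (file names made up): write the modified content to a new file and leave the original untouched:

```shell
echo "foo baseline" > input.txt               # the original file
sed 's/foo/bar/' input.txt > input.fixed.txt  # modified copy under a new name
```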

No.9

Bioinformatics is a field that changes rapidly. Stay hungry, keep learning.


This is an attic for storing some useful R code snippets; I don't want to search for them every time, so I keep them here.

Check whether packages are installed

check.package <- function(pkg){
  ## packages that are requested but not yet installed
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  ## load every requested package, returning TRUE/FALSE for each
  sapply(pkg, require, character.only = TRUE)
}


packages.name <- c("reshape", "ggplot2", "gridExtra")
check.package(packages.name)

Draw beautiful arrowheads

require(ggplot2)
require(grid)

## sample 100 rows from ggplot2's built-in seals dataset
d <- seals[sample(1:nrow(seals), 100), ]

ggplot(d, aes(x = long, y = lat)) +
  geom_segment(aes(xend = long + delta_long / 100,
                   yend = lat + delta_lat / 100),
               arrow = arrow(angle = 10, type = "closed"),
               colour = "black",
               arrow.fill = "red",
               size = 0.5)

TBC


webm is an audiovisual media file format, primarily intended as a royalty-free alternative for use in the HTML5 video and audio elements. It has a sister project, WebP, for images. On a Fedora system, you can use GNOME's built-in screencast tool to record a video of your screen (30 seconds by default). What I want is a GIF animation, which is easier to transfer and display online.

So how do you convert a webm file to a gif file on Linux?

The best way is to use ffmpeg:

# pass 1: build an optimised 256-colour palette from the video
ffmpeg -y -i input.webm -vf palettegen palette.png
# pass 2: render the gif at 10 fps using that palette
ffmpeg -y -i input.webm -i palette.png -filter_complex paletteuse -r 10 output.gif

After that, I also recommend using the GNU Image Manipulation Program (GIMP) to crop unwanted parts out of the animation; cropping also reduces the file size.