Craig Venter

Posted in Uncategorized and tagged Human Genome on May 4, 2026

If I don’t write it down, I never seem to find time to do it. Craig Venter passed away on April 29, and it makes me feel that my world has grown old and far away from those exciting people.

For me, it marks the end of an era.

Venter is someone I first learned about from textbooks when I studied genomics. My mentor in the US once worked at the J. Craig Venter Institute, and he told me that he could focus on a project there and not talk to anyone for a week (which I doubt). Still, that place seems amazing.

The most important thing I learned in the last two years

Posted in Learning and tagged Lifelong Learning on Mar 29, 2026

What people say to you may not reveal their true character; watch how they treat others.

I finished my PhD in the last two years, but I haven’t updated my blog since 2023. It was not just that I was too busy with research; I was also learning to see how people treat others and, for me, how to manage up (apparently, I am not good at it).

Academic life is all about how to manage your time, your thoughts, your health, and your emotions. Thanks to the good environment, peers, and more experienced supervisors, I graduated.

I will keep writing. My views are my own.

How to update all R packages after installing a new version of R?

Posted in HowTo and tagged update R packages on Jul 5, 2023

Many R users are unfamiliar with the process of updating all their R packages after upgrading to a new version of R.

Would you like to learn how to update all R packages in a safe and straightforward way?

The method described below should work on all platforms. However, I don’t use R on Windows, so I don’t know the specific directory paths there; I will describe the steps on a Linux system.

For example, I upgraded R from 4.2.1 to 4.3.1 recently.

As we can see in the R library folder, there are two sub-folders, one is 4.2 (/home/ylj/R/x86_64-pc-linux-gnu-library/4.2) and the other is 4.3 (/home/ylj/R/x86_64-pc-linux-gnu-library/4.3).

First, we need to copy all the packages from the 4.2 folder to the 4.3 folder. Alternatively, it is also possible to perform the reverse operation, but in that case, the folder needs to be renamed.
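As a rough sketch of the copy step, the shell loop below copies each package folder from the old library into the new one and skips any package that already exists there, so nothing freshly installed gets overwritten. The temporary folders and the package names (pkgA, pkgB) are made up so that the example runs anywhere; in practice the two folders would be the 4.2 and 4.3 paths mentioned above.

```shell
# Demo libraries in a temporary folder (hypothetical stand-ins for the
# real 4.2 and 4.3 library folders).
base=$(mktemp -d)
old_lib="$base/4.2"
new_lib="$base/4.3"
mkdir -p "$old_lib/pkgA" "$old_lib/pkgB" "$new_lib/pkgB"
echo "Built: 4.2.1" > "$old_lib/pkgA/DESCRIPTION"
echo "Built: 4.2.1" > "$old_lib/pkgB/DESCRIPTION"
echo "Built: 4.3.1" > "$new_lib/pkgB/DESCRIPTION"

# Copy every package folder that is not yet present in the new library.
for pkg in "$old_lib"/*/; do
    name=$(basename "$pkg")
    if [ ! -e "$new_lib/$name" ]; then
        cp -r "${pkg%/}" "$new_lib/$name"
    fi
done

ls "$new_lib"
```

Skipping existing folders is a deliberate choice here: packages that came with the new R version stay as they are, and only the missing ones are carried over to be rebuilt later.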

Then, in an R terminal or RStudio, run each line below.

## list all installed packages in the new library
pkglist <- data.frame(installed.packages(lib.loc = "/path/to/lib")) # here, lib.loc = "/home/ylj/R/x86_64-pc-linux-gnu-library/4.3"
## update (rebuild) all packages; this requires the BiocManager package
BiocManager::install(pkglist$Package, update = TRUE, ask = FALSE, checkBuilt = TRUE)
## restart the R session, then
## check whether any packages were not updated
pkgupdated <- data.frame(installed.packages(lib.loc = "/path/to/lib"))
## compare as versions rather than strings, so that e.g. "4.10.0" sorts after "4.3.1"
pkgfailed <- pkgupdated[package_version(pkgupdated$Built) < "4.3.1", ]

All packages listed in pkgfailed have to be installed manually.

Lifelong Learning

Posted in Learning and tagged Lifelong Learning on Jul 1, 2023

I have started a category dedicated to discussing the valuable insights gained from my supervisors and peers, and to sharing personal reflections and thoughts. This category aims to share thoughts during the ongoing, voluntary, and self-motivated pursuit of knowledge for either personal or professional reasons (lifelong learning).

In this initial post, I will continuously update and share insightful quotes from various individuals.

J: You are using only two numbers to determine your life. A confusion matrix isn’t enough, let’s use AUC.

T: If this decision is hard to make, that means this decision does not matter. If this decision does not matter, then you can decide by tossing a coin.

“Must a name mean something?” Alice asks Humpty Dumpty, only to get this answer: “When I use a word… it means just what I choose it to mean – neither more nor less.”

This reminds me of a famous statement from the 1960s: The medium is the message. Here you seem to be letting Google docs or Google form/spreadsheet determine how to organise your thoughts. That doesn’t seem to me to be the right way around.

Keep good dry lab habits

Posted in bioinfo and tagged Rstudio, tmux, Ctrl-z on Apr 28, 2021

I have worked in multiple dry labs during the last 10 years. Each laboratory has its own style, but few of them train students in good dry lab habits. I learned a lot of good habits in Dr. Malay Basu’s lab. Now, I would like to share these habits and tips for doing bioinformatics research in a dry lab. All the suggestions are based on the Linux environment.

Habits

No.1

Maintain good documentation for every analysis. I learned this in my first week at Malay’s lab. We recommend following the style of the R Markdown reports from MD Anderson Cancer Center.

The report should include:

Executive Summary (*)
- Introduction
- Data and Methods
- Results
Data Munging
Analysis
- Step 1
- Step 2
Appendix (*)
- Working dir
- Script descriptions
- List of Figures
- List of Tables

If you want to share the report with others, it is better to generate an HTML report. If you want the report to be readable as an MD file on GitHub, you need to generate a GitHub MD report.

Always call sessionInfo() at the bottom of the report to print the information collected about the current R session.

No.2

Each research project should also follow a clean structure:

bin
original_data
- README - stating where the data came from
- Data in compressed form.
  1. If the raw data is huge, keep only the processed data. The processing steps should be captured in the “exp” directory.
“exp” - Actual experiments. Each directory must be named as
1. YYYYMMDD-some_identifiable_info-#githubissue-INITIALS
2. Each directory should link its input files in an “in” folder and generate its output files in an “out” folder. Each output file name should have the date and time appended to it.
3. The entire experiment should be run through an RMD file, a single “run.sh”, or a “Makefile”. There must be an RMD/MD file in each directory. You must write the executive summary, the list of figures and tables, and their descriptions in the report.
“design” (?)
- Reserved for PI.
docs
- Reserved for PI.
reports
1. “index.htm”: A top-level index file listing every file in the directory, with a few lines of description for each.
2. A list of files whose names match the directory names in “exp”, preferably in HTML format generated from the RMDs
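The structure above can be sketched as a few shell commands. This is only an illustration: the project root is a temporary folder, and the experiment label (qc_check), the issue number (#12), and the initials (AB) are made-up placeholders for the naming pattern.

```shell
# Hypothetical project root; replace with your own project folder.
project="$(mktemp -d)/my_project"

# Top-level folders from the structure above.
for d in bin original_data exp design docs reports; do
    mkdir -p "$project/$d"
done

# One experiment folder named YYYYMMDD-some_identifiable_info-#githubissue-INITIALS,
# with "in" and "out" sub-folders.
exp_dir="$project/exp/$(date +%Y%m%d)-qc_check-#12-AB"
mkdir -p "$exp_dir/in" "$exp_dir/out"

# Placeholder files that each part is expected to contain.
touch "$project/original_data/README" "$exp_dir/run.sh" "$project/reports/index.htm"

# Output files carry the date and time in their names.
out_file="$exp_dir/out/summary_$(date +%Y%m%d_%H%M%S).txt"
echo "demo output" > "$out_file"

find "$project" | sort
```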

You can upload the project structure and code to any cloud storage, and make a backup every day or every week.

No.3

Whatever else you do, back up the server data every week; lost data is a huge pain for every researcher.
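One simple way to do this, shown here only as a sketch, is a date-stamped tar archive. The source and destination folders below are temporary demo folders; on a real server they would be the data folder and a separate backup disk, and the command would typically be scheduled (for example with cron).

```shell
# Demo folders (hypothetical stand-ins for the real data and backup locations).
src=$(mktemp -d)
dest=$(mktemp -d)
echo "gene counts" > "$src/results.txt"

# Create a compressed archive whose name carries today's date.
archive="$dest/backup-$(date +%Y%m%d).tar.gz"
tar -czf "$archive" -C "$src" .

# List the archive contents to confirm the backup is readable.
tar -tzf "$archive"
```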

No.4

After a project is finished, review the whole project’s data and make sure every folder in the project has a document recording what you did, and every script has a note explaining what it does.

No.5

When linking data from one sub-folder to another within a project folder, use a relative symbolic link (ln -rs). Then, when you move the project folder to another place, the links will not break.
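A small demonstration, assuming GNU coreutils (the -r option of ln is a GNU extension); the folder and file names are made up for the example:

```shell
# Build a tiny demo project in a temporary folder.
root=$(mktemp -d)
mkdir -p "$root/proj/original_data" "$root/proj/exp/in"
echo "ACGT" > "$root/proj/original_data/seq.txt"

# Create a relative symbolic link from exp/in to the original data.
ln -rs "$root/proj/original_data/seq.txt" "$root/proj/exp/in/seq.txt"
link_target=$(readlink "$root/proj/exp/in/seq.txt")
echo "link points to: $link_target"

# Move the whole project; the relative link still resolves.
mv "$root/proj" "$root/moved"
content=$(cat "$root/moved/exp/in/seq.txt")
echo "content after move: $content"
```

An absolute link would still point at the old location after the move, which is why the relative form is the safer default inside a project.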

Tips

No.1

Use tmux or screen when you want a program to keep running after you exit the terminal session; both work with interactive commands. I prefer tmux because it has more features than screen. Please try to avoid nohup, which leaves you unable to control the program later.

When you have an RStudio Server, you may sometimes wish to run a program in the RStudio terminal and keep it running after you close the RStudio interface. Even then, I suggest running the program with tmux inside the RStudio terminal.

The main reason is that when the RStudio terminal session stops responding, you can still interact with the program in tmux from another SSH session.

No.2

Always monitor memory usage with ps aux --sort -rss | head. Some parallel programs may cause a memory leak, and you may not notice the leak even while the program is running successfully.

No.3

Control+C aborts the application almost immediately, while Control+Z suspends it and shunts it into the background. If you suspend an application this way, keep in mind that it is not running: it is still waiting for your command (fg resumes it in the foreground, bg resumes it in the background).

No.4

Do not save large files to the /home directory if your server has limited storage for /home. Use df -h to check the size and free space of each file system.

Use du -sh * | sort -rh to check the size of each folder.
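Here is a small sketch of both commands on a temporary demo folder (the folder and file names are made up); on a server you would run them in /home or in your project folder instead.

```shell
# Demo folder with one small and one large sub-folder.
work=$(mktemp -d)
mkdir "$work/small" "$work/large"
head -c 1024 /dev/urandom > "$work/small/a.bin"       # about 1 KB
head -c 5242880 /dev/urandom > "$work/large/b.bin"    # about 5 MB

# Free space on the file system that holds the demo folder.
df -h "$work"

# Size of each entry, largest first.
(cd "$work" && du -sh * | sort -rh)

# Keep only the name of the biggest entry.
largest=$(cd "$work" && du -sh * | sort -rh | head -n 1 | cut -f2)
echo "largest entry: $largest"
```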

No.5

It is worth learning at least a pipeline-building framework or Docker; either one makes it much easier to keep everything reproducible.

No.6

You need to have basic knowledge of conda, and use it when running Python code.

Also make sure you know where it is installed. I once saw a user with superuser rights install miniconda in their own /home directory, after which every user’s conda environment pointed to that person’s /home version. That’s ridiculous.

No.7

Always compress datasets into .xz or .gz format. R and Python (and other programming languages) have functions to read compressed datasets, so you don’t need to uncompress a dataset for analysis. That may save a lot of space.
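The same idea works on the command line. As a minimal sketch with a made-up file (counts.tsv), the data is compressed once and then read by streaming, without ever writing an uncompressed copy back to disk:

```shell
# Create a small demo dataset (hypothetical file name and content).
data=$(mktemp -d)
printf 'gene\tcount\nTP53\t42\n' > "$data/counts.tsv"

# Compress it; gzip replaces counts.tsv with counts.tsv.gz.
gzip "$data/counts.tsv"
ls "$data"

# Read the compressed file by streaming it to standard output.
second_line=$(gzip -dc "$data/counts.tsv.gz" | sed -n '2p')
echo "$second_line"
```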

No.8

When generating a modified file, never overwrite the original file.
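In the shell, this means writing the result to a new file instead of editing in place. The sketch below uses a made-up BED file; the noclobber option (set -C) is an optional extra guard that makes ">" refuse to overwrite an existing file.

```shell
# Demo input file (hypothetical name and content).
work=$(mktemp -d)
printf 'chr1\t100\t200\n' > "$work/regions.bed"

# Write the modified version to a new file rather than editing in place.
sed 's/^chr//' "$work/regions.bed" > "$work/regions.nochr.bed"

# Optional guard: with noclobber on, ">" fails on existing files.
set -C
if ( echo "oops" > "$work/regions.bed" ) 2>/dev/null; then
    protected="no"
else
    protected="yes"
fi
set +C
echo "original protected: $protected"

cat "$work/regions.bed"          # original file, unchanged
cat "$work/regions.nochr.bed"    # modified copy
```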

No.9

Bioinformatics is a rapidly changing field. Stay hungry, keep learning.

Biomedical science and Mathematics

To Infinity and Beyond!
