
Collecting meta data from Entrez

When writing a research proposal, it is often useful to show how the sequence data of interest has grown. For example, if you want to argue that sequences from agricultural organisms are growing faster than human sequences, you need to collect the number of sequences for agricultural organisms and compare them with the human counts. The overall GenBank statistics posted on NCBI's web page may not be detailed enough to describe this growth.

By using the show index, preview, and limits functions in Entrez, you can quickly collect meta information such as the number of entries.

dbEST      Total records   Records for last 3 years   Growth rate for last 3 years
human      8,315,231       177,492                    2.1%
mouse      4,853,547       3,289                      0.1%
cattle     1,559,494       45,232                     2.9%
pig        1,620,570       144,207                    8.9%
chicken    600,423         1,041                      0.2%
insects    4,493,137       1,864,326                  41.5%
bacteria   1,266           1,012                      79.9%
fungi      2,893,583       1,508,814                  52.1%
plant      22,633,681      7,290,397                  32.2%


To complete the table above, we need to count the total records for each species in dbEST as of the current date. This is quite simple:
1) choose the EST database on NCBI's front page;
2) press the search button;
3) click advanced search;
4) choose the "Organism" field;
5) type "human" in the query box;
6) click show index;
7) press "add to Search Box";
8) press the preview button (not the search button here); and
9) repeat from step 3) for every species.
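The same per-organism count can also be requested programmatically. The following is a minimal sketch, assuming NCBI's E-utilities esearch endpoint and assuming the EST database is exposed under the name "nucest" (that name is an assumption and may differ or no longer be served). The helper names build_count_url, parse_count, and fetch_count are hypothetical and do not come from the original post.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(organism, db="nucest"):
    """Build an esearch URL that asks only for the number of matching records.

    db="nucest" is an assumed database name for EST records.
    """
    params = {"db": db, "term": f"{organism}[Organism]", "rettype": "count"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_count(xml_text):
    """Extract the integer inside the top-level <Count> element."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


def fetch_count(organism, db="nucest"):
    """Query esearch over the network and return the record count."""
    with urlopen(build_count_url(organism, db)) as response:
        return parse_count(response.read().decode("utf-8"))


# Usage (requires network access):
# for organism in ["human", "mouse", "cattle"]:
#     print(organism, fetch_count(organism))
```

Restricting the term to the [Organism] field mirrors step 4) above. Counts obtained this way reflect the current state of the database, so they will not match the figures in the table.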

Select DB of interest

In advanced search

Now you can check whether your query word is a valid term in Entrez, and then choose "human (10064832)" from the drop-down items.

Here the number in parentheses is the number of entries for the organism human. Try choosing "All Fields" with the same query and compare the numbers. They will probably differ. Why?

One query has now been built, and its result is stored as #4 in the Search History.

By repeating this for every species, you can get results similar to the above.

Now we need to find the total number of ESTs for the last 3 years. First go back to the front page of dbEST and click the limits link. You will see the following screen; choose the appropriate item in "Published in the last" and press the search button. When the results list page appears, click "advanced search" again. If you see a page similar to the one below, you are on the right track.
In this page, query #11 is the total number of ESTs published in the last 3 years.

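To find human ESTs published in the last 3 years, it is best to combine the two queries with the "AND" Boolean operator, using their shortcut symbols from the Search History. That is, type a query like the following (for example, #4 AND #11 with the history numbers used above), then push the "Preview" button.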
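Outside the web interface, the "organism AND published in the last 3 years" restriction can be approximated in a single request. This is only a sketch under stated assumptions: it uses the E-utilities esearch parameters reldate and datetype, approximates 3 years as 1095 days, assumes that "pdat" corresponds to the "Published in the last" limit, and assumes the database name "nucest". The function name build_recent_count_url is hypothetical.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
DAYS_PER_YEAR = 365  # approximation; leap days are ignored


def build_recent_count_url(organism, years=3, db="nucest"):
    """Build an esearch URL counting records for an organism within the last N years.

    reldate limits results to the last N days of the date given by datetype.
    db="nucest" and datetype="pdat" are assumptions, not taken from the post.
    """
    params = {
        "db": db,
        "term": f"{organism}[Organism]",
        "reldate": years * DAYS_PER_YEAR,
        "datetype": "pdat",
        "rettype": "count",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Usage: print the URL, then open it in a browser or fetch it with urllib.
# print(build_recent_count_url("human"))
```

Here the organism term and the date limit are sent together, which plays the same role as joining two Search History entries with AND.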


By repeating this for every organism, you should get results similar to those below. You have to calculate the growth rate for each organism yourself: in the table above, it is the number of records for the last 3 years divided by the total records.
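The growth-rate column can be reproduced with a few lines of Python. This is a minimal sketch that assumes the rate is defined as (records for last 3 years) / (total records), which is consistent with the figures in the table; the function name growth_rate_percent is hypothetical, and the counts are copied from the table in this post.

```python
def growth_rate_percent(recent_records, total_records):
    """Return recent records as a percentage of total records, to one decimal."""
    return round(100.0 * recent_records / total_records, 1)


# (total records, records for last 3 years), copied from the table in this post
counts = {
    "human": (8_315_231, 177_492),
    "mouse": (4_853_547, 3_289),
    "cattle": (1_559_494, 45_232),
    "pig": (1_620_570, 144_207),
    "chicken": (600_423, 1_041),
    "insects": (4_493_137, 1_864_326),
    "bacteria": (1_266, 1_012),
    "fungi": (2_893_583, 1_508_814),
    "plant": (22_633_681, 7_290_397),
}

# Compute the rate for every organism, keyed by organism name.
rates = {name: growth_rate_percent(recent, total)
         for name, (total, recent) in counts.items()}

for name, rate in rates.items():
    print(f"{name:<10}{rate:>6.1f}%")
```

Keeping the raw counts and the calculation together makes it easy to refresh the table when the numbers are collected again at a later date.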
