- Making genomic data come alive with circos plots
- Circos plots are a great way to show genomic data and are famous (and infamous!) for their ability to show several different data types across dozens of chromosomes in a single plot. But it isn’t always easy to make these plots — this article covers some of your best options.
- What are circos plots for?
- Chromosomes
- Chord diagram
- Phylogeny
- The dark side of circos plots
- How to actually make a circos plot?
- The original circos
- R packages
- Circos table viewer
- Circa
- Circos plots are awesome, but…
- Circos overview: circles are good
- Pie Chart
- Circos
- The terrifying dinosaur corn genome
- Using News Reports to Track Wildlife Black Markets
- Circos on Cancer Discovery Covers
- Circos charts the placenta transcriptome
- Circos Maps America’s Restless Interstate Migration Without a Map
- Circos on cover of UCSF Magazine
- Circos on Cover of Cancer Cell
- Circos reaches 500 literature citations
- Circos deals with 8 Gb Rye Genome
- Circos Stages Mesolithic to Neolithic Transition
- Circos in 54 million pixels
- Circos Tracks CO2 Emissions
- Circos Round — Lotus Sacred
- 6.9e11 g of oil and Circos was there
- Plants Love Circos
- Circos for R
- Circos Interchange Diagrams — Networks and Flow
- Circos connects to the connectome
- Circos is the Method for Visualizing Translocations
- Circos Paints Chromosomes of Capsella Rubella
- Circos on the Cover Of Journal of Pathology
- Circos on the Cover Of Nature’s Asian Journal of Andrology
- What is Circos?
- Circular visualization
- Popular and Pretty
Making genomic data come alive with circos plots
Circos plots are a great way to show genomic data and are famous (and infamous!) for their ability to show several different data types across dozens of chromosomes in a single plot. But it isn’t always easy to make these plots — this article covers some of your best options.
Sep 26, 2017
Circos is really the brainchild of Martin Krzywinski, who released it to the world in 2009 with this brilliant paper: “Circos: an Information Aesthetic for Comparative Genomics.”
What are circos plots for?
There are a few different types of data that circos plots can be very useful for within biology/genomics research. One way to think about this is what determines the coordinates around the circle.
Chromosomes
The first type is genomic data, where each chromosome is a segment around the circle, and all of the data points for a chromosome are plotted onto specific positions within that chromosome’s slice of the pie. Chromosome-based circos plots are the ones that can take a lot of different types of data, as long as all the data points know which chromosomes they are on and what their positions are within those chromosomes.
A standard data format for a genomic circos plot would be where each row is a data point and each column represents a variable like chromosome, position, p-value, gene expression, etc.
The data for these chromosome-based circos plots can be any features that have chromosomes and positions: structural variants, repetitive elements, homology, evolutionary conservation scores, SNPs, genes, differential gene expression, DHS peaks, copy number profiles, ChIP-seq peaks, CpG islands, … the list goes on. This kind of data can be downloaded from the UCSC genome browser’s table browser, or generated yourself from sequencing data by running any of hundreds of different bioinformatics tools.
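To make the idea of chromosome-based coordinates concrete, here is a minimal Python sketch of how a (chromosome, position) pair could be turned into an angle around the circle. The chromosome names and lengths are toy values invented for illustration, and the helper names `chromosome_offsets` and `to_angle` are hypothetical; real tools also leave gaps between chromosomes, which this sketch ignores.

```python
# Toy chromosome lengths in base pairs (hypothetical values, not a real genome).
CHROM_LENGTHS = {"chr1": 1000, "chr2": 600, "chr3": 400}


def chromosome_offsets(lengths):
    """Return each chromosome's cumulative start and the total genome length."""
    offsets = {}
    running = 0
    for chrom, length in lengths.items():
        offsets[chrom] = running
        running += length
    return offsets, running


def to_angle(chrom, pos, lengths=CHROM_LENGTHS):
    """Map (chromosome, position) to degrees around the circle."""
    if not 0 <= pos < lengths[chrom]:
        raise ValueError(f"position {pos} is outside {chrom}")
    offsets, total = chromosome_offsets(lengths)
    return 360.0 * (offsets[chrom] + pos) / total


# One row per data point, one key per variable, as in a typical tabular input.
rows = [
    {"chrom": "chr1", "pos": 500, "value": 0.8},
    {"chrom": "chr2", "pos": 0, "value": 0.1},
    {"chrom": "chr3", "pos": 200, "value": 0.5},
]

for row in rows:
    row["angle"] = to_angle(row["chrom"], row["pos"])
    print(row)
```

Any track (SNPs, peaks, copy number) can reuse the same mapping, which is why so many data types fit on one chromosome-based plot.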
Chord diagram
The second type is a chord diagram, which describes relationships or flow between different things.
Chord diagrams have been used to show people changing jobs into different sectors, international migrations, and other quantifiable connections between different segments. For example, each segment is a country, and the coordinates within the segment correspond to a percentage of the population that emigrated from that country. Generally these chords can be arranged differently within each segment and still show the same information, which is clearly different from chromosome-based coordinates where the positions within a chromosome segment are important.
The data for a chord diagram usually takes the form of an adjacency matrix, such that the rows and columns have the same names, and the values in the matrix determine how wide the chords are between any two segments. The size of a segment is then the sum across a row or column in that adjacency matrix. Notice that the chord diagram above has two halves, one of which may be column names and the other row names. This is one way to deal with asymmetric adjacency matrices where you need to distinguish flow from A to B from flow from B to A. Other ways include showing both in the same chord but varying the sizes of the attachments to represent the difference between A->B and B->A flow.
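As a small sketch of the adjacency-matrix idea, the Python below computes segment sizes and the two ends of one chord from an asymmetric matrix. The labels and flow values are made up for illustration, and `chord_ends` is a hypothetical helper rather than part of any circos tool.

```python
labels = ["A", "B", "C"]

# flows[i][j] is a hypothetical flow from labels[i] to labels[j].
flows = [
    [0, 30, 10],   # from A
    [20, 0, 5],    # from B
    [15, 25, 0],   # from C
]

# Segment size when segments represent sources: the sum across each row.
outgoing = {labels[i]: sum(flows[i]) for i in range(len(labels))}

# Segment size when segments represent destinations: the sum down each column.
incoming = {labels[j]: sum(row[j] for row in flows) for j in range(len(labels))}


def chord_ends(a, b):
    """Return (flow a->b, flow b->a), the widths of the two ends of one chord."""
    i, j = labels.index(a), labels.index(b)
    return flows[i][j], flows[j][i]


print("outgoing:", outgoing)
print("incoming:", incoming)
print("A<->B chord ends:", chord_ends("A", "B"))
```

Drawing the chord with unequal ends, as `chord_ends` suggests, is the single-chord alternative to splitting the circle into a row half and a column half.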
Phylogeny
Another distinct type of circos plot is a circular phylogenetic tree. Here the coordinates around the circle correspond to leaf nodes on the phylogenetic tree. This is distinct from both chromosomal coordinates and from chord diagrams. The data to use for this type of circos plot is any dataset from which you could create a regular phylogenetic tree. One benefit to making the tree circular is increased space for the leaf nodes because you get the whole circumference of the tree to spread them out rather than just its diameter.
Those were some of the most common types of circos plots used in genomics. There are many more examples out there with months of the year, sports, and all sorts of other applications, many of which will still fall loosely into these categories — if you turn the chromosomes into general segments like months, seasons, etc.
The dark side of circos plots
For chromosome-based circos plots where many different kinds of data can fit on the same plot, there is a clear trend in the literature of cramming as much information in there as possible. Some people go overboard and plot upwards of 10 different datasets with clashing color schemes and tiny points. This leads other people to hate on circos plots, which is unfair since many circos plots show perfectly respectable amounts of data. The key, as with all visualization, is to focus on showing the patterns in the data that are actually interesting, rather than plotting everything you have.
On the other hand, a complicated circos plot can make a standard paper look quite impressive. If the sentiment you are going for is “we did every analysis under the sun” then an overcrowded circos plot is a great way to make that point.
How to actually make a circos plot?
You have a number of options for creating your circos plots, which range from extremely flexible to very limited applications and from taking days writing hundreds of lines of configuration to clicking around for 20 minutes.
The original circos
Circos itself is software written in Perl, and the way that you create a plot is by writing a long configuration file that says exactly how your plot should look. This option is very flexible and lets you make literally any kind of circos plot. The downside is that for most people I have talked to, it takes them at least a few days to make their first circos plot — and trust me those people were not lacking technical skills. Circos is a serious power tool, but it can be difficult to install (oh dependency hell) and takes a lot of reading and iteration to get the configuration file right. However, circos also has the most flexibility, so it is up to you to decide if it is worth the effort to create your super-customized masterpiece of a circos plot without taking any of the shortcuts below.
R packages
Various R packages now exist that let you create a circos plot by writing R code. These include circlize, RCircos, CIRCUS, and OmicCircos. Each of these packages has pros and cons relative to the others, so I recommend finding one with examples that resemble what you want to do and seeing if you can adapt the example code to your needs.
Circos table viewer
For making chord diagrams from an adjacency matrix, there is a great web tool called the circos table viewer that makes the process much easier than working with the original circos software in Perl.
Circa
For chromosome-based circos plots, Circa allows you to make circos plots without writing any code. It doesn’t do everything imaginable like the original circos does, but it is very easy to use and allows you to tweak everything in real time.
Full disclosure: I made Circa myself after struggling with circos plotting for weeks during graduate school, and it is a paid tool. But check out this 1-minute demo video if you want to see it in action.
If you find the original circos software and the R packages too time-consuming, Circa will get you results much faster and without any coding, but only for the chromosome-based circos plots. For chord diagrams, check out the free circos table viewer, and for phylogeny circos plots, try the circlize R package.
Circos plots are awesome, but…
A wise man once said “With great power comes great responsibility”
With the power to create beautiful circos plots from your genomic data comes the responsibility to avoid the dark side and resist stuffing twenty datasets into a single plot. Regardless of your motivations for good or evil, the options above will help you get there.
Leave a comment and tell me if you love/hate circos plots!
Circos overview: circles are good
Circos is an open-source software package for visualizing data and information. It lays data out in a circle, which is ideal for exploring relationships between objects. It also simply looks good.
Each of you has surely had to present information graphically at least once. And it surely seemed that an ordinary graph or chart holds too little information when there is so much you want to tell and show. This lack of space is felt most when you need to show how one object relates to another, along with their similarities and differences. This is where the circle comes to the rescue.
Pie Chart
Many of you have probably used pie charts. Even if you have not, you will likely agree that, of the ways of conveying information shown below, the pie chart most clearly shows that Vova holds a controlling stake:
Name | Number of shares |
---|---|
Vanya | 2345 |
Petya | 3454 |
Vova | 5989 |
However, standard pie charts have a very narrow range of uses, and adding just one more column to the table makes them unusable in this example.
Circos
Circos handles this task easily. I will demonstrate the advantages of its “circular” approach with a simple example: suppose we have three types of cars (Small, Medium, Executive) and three countries where they were sold (A, B, C). The number of cars sold in each country can be described, for example, by this table:
Car type \ Country | A | B | C |
---|---|---|---|
Small | 120 | 300 | 340 |
Medium | 150 | 100 | 45 |
Executive | 30 | 15 | 25 |
And here is the colorful picture that Circos can automatically produce from these numbers:
The labels of our columns and rows are placed around the perimeter of the circle. Countries are on the left and car classes on the right. The color tied to each label is shown on the inner (thickest) ring; for example, country A is red and the Small class is purple. The arc length is proportional to the share of that country among all countries, or of that car class among all classes. The ribbons running through the center of the circle show in what proportion cars of a given class were bought in each country; the purple ribbons show that small cars are almost equally popular in countries B and C, while country A buys fewer than half as many.
For convenience, every ribbon in the right half-circle begins with a short stretch painted in the color of the segment it leads to. The diagram carries both an absolute scale (number of cars, inner ring) and a relative scale (percent, outer ring), which let you read both the share of each car class within a country and the distribution of each class across all countries.
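The absolute and relative scales can be reproduced by hand from the car sales table. The Python sketch below is only an illustration of that arithmetic, not how Circos computes its layout; the `percent` helper is hypothetical.

```python
# Car sales from the table above: class -> country -> number of cars sold.
sales = {
    "Small": {"A": 120, "B": 300, "C": 340},
    "Medium": {"A": 150, "B": 100, "C": 45},
    "Executive": {"A": 30, "B": 15, "C": 25},
}
countries = ["A", "B", "C"]

# Absolute scale: the arc length of each segment is its row or column total.
class_totals = {car: sum(by_country.values()) for car, by_country in sales.items()}
country_totals = {c: sum(sales[car][c] for car in sales) for c in countries}
grand_total = sum(class_totals.values())


def percent(part, whole):
    """Relative scale: share of `part` in `whole`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# How the Small class is split across countries (the purple ribbons).
small_split = {c: percent(n, class_totals["Small"]) for c, n in sales["Small"].items()}

print("class totals:", class_totals)
print("country totals:", country_totals)
print("grand total:", grand_total)
print("Small split (%):", small_split)
```

The Small split comes out to roughly 16% for country A against about 39% and 45% for B and C, which matches the reading of the purple ribbons described above.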
Of course, this example is artificial, and it may seem that Circos was created purely for entertainment, but that is not so. Circos was created as a visualizer for complex data in bioinformatics, in particular comparative genomics. It was the need to explore structural differences between genomes that inspired Martin Krzywinski of Canada’s Michael Smith Genome Sciences Centre to create the tool in 2004. Since then Circos has been used in many bioinformatics (and other) projects, and figures made with it have appeared on the covers of major scientific journals.
To create the illustration for this article I used the online version of the visualizer, but Circos can also be installed locally by downloading the distribution from the official site.
This article was written by my colleague AlexeyGurevich, who could not publish it himself for lack of an invitation.
UPD: Below, SergeiStartsev mentioned the JavaScript library D3, which can also make diagrams like these, with animation for the web as well.
The terrifying dinosaur corn genome
Amblin Entertainment and Legendary Pictures, the studios that produced Jurassic World, try to inject genome science into the movie. Unfortunately, since we don’t quite know how to construct viable genomes of extinct species, much less grow the creatures themselves, we don’t know whether the depiction of the science is right. Perhaps theirs is exactly what a genome lab would look like in a dino-building facility.
But, we can get fewer things wrong. In the Creation Lab companion website, a Circos image is used to illustrate a triceratops genome.
Unfortunately, this is an image of the B73 Maize reference genome (B73 RefGen_v1), as published in Science in The B73 Maize Genome: Complexity, Diversity, and Dynamics.
Schnable PS, Ware D, Fulton RS et al. 2009 The B73 maize genome: complexity, diversity, and dynamics Science 326 (5956) : 1112-1115
Using News Reports to Track Wildlife Black Markets
The international black market in wildlife, alive or dead, is notoriously difficult to track. Hunters and smugglers don’t report their take for the same reasons that drug dealers don’t report profits to the IRS. But if you could actually track those networks, maybe you could do something about them. That’s what sent Nikkita Patel, a veterinary epidemiologist at the University of Pennsylvania, to an unusual source of data on the illegal wildlife trade: the news.
The image shows the illegal global rhinoceros trade network before (top) and after (bottom) a hypothetical targeted disruption. Created with Circos online table viewer.
Circos on Cancer Discovery Covers
The July 2013 issue cover shows a Circos plot of relative copy number changes in 38 oral squamous cell carcinoma tumors.
The September 2012 issue cover shows a collection of Circos images of somatic mutations in melanoma tumors.
Circos charts the placenta transcriptome
Saben et al. use Circos to visualize the transcriptome and gene expression of placenta from 20 healthy women in their article A comprehensive analysis of the human placenta transcriptome.
Circos Maps America’s Restless Interstate Migration Without a Map
Circos on cover of UCSF Magazine
The Fall 2013 issue of UCSF Magazine has my Circos illustration of personalized medicine. The human outline motif is incorporated into other design elements in the issue.
The look of the image is inspired by Nature’s Encode cover by Carl De Torres.
To learn how to generate the cover and variants, read the Circos Encode Cover Tutorial.
Circos on Cover of Cancer Cell
Yang et al. used network analysis approaches to characterize a subtype of ovarian cancer associated with poor overall survival.
E-cadherin is a protein encoded by the CDH1 gene and is responsible for cell-cell adhesion. Yang linked the expression of E-cadherin to specific miRNAs that influenced the regulatory network singled out in this cancer subtype.
Circos reaches 500 literature citations
To celebrate, I’ve made a commemorative poster that features over 400 Circos images from the literature.
Circos deals with 8 Gb Rye Genome
Because of its large 8 Gb genome, the genomic analysis of rye has lagged behind other cereals.
To address this, Martis et al. established a linear gene order model for 72% of the rye genes based on synteny information from rice, sorghum and B. distachyon.
Although it appears that six major translocations shaped the modern rye genome, highly dissimilar conserved syntenic gene content, gene sequence diversity signatures, and phylogenetic networks were found for individual rye syntenic blocks.
Martis MM, Zhou R, Haseneyer G et al. 2013 Reticulate Evolution of the Rye Genome Plant Cell
Circos Stages Mesolithic to Neolithic Transition
Bollongino et al. present evidence of a slow transition from Mesolithic hunter-gatherer groups to Neolithic farmers.
Previous theories that the foragers disappeared shortly after the arrival of farmers are at odds with palaeogenetic and isotopic data analysis from Neolithic human skeletons from the Blätterhöhle burial site in Germany. Instead of an abrupt transition, the data suggest a more complex pattern of coexistence that persisted for over 2000 years.
Circos in 54 million pixels
Ruddle et al. demonstrate their commodity hardware 54 million pixel data display in exploring copy number variation data.
Circos Tracks CO2 Emissions
Kanemoto et al. report on the disturbing trend of emissions leakage, in which developed countries are displacing emissions-intensive production offshore.
The report confirms previous findings that, adjusting for trade, developed countries’ emissions have increased, not decreased. A connection is made to the kind of emissions displacement that has already occurred for air pollution, where, despite aggressive legislation in major emitters, total global air pollution emissions have increased.
The conclusion warns us that «if regulatory policies do not account for embodied imports, global emissions are likely to rise even if developed countries emitters enforce strong national emissions targets.»
Circos Round — Lotus Sacred
The pleasing roundness of Circos is used by Ming et al. to depict the sacred lotus genome in the publication «Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.)».
The sacred lotus has religious significance in both Buddhism and Hinduism and has been used as a food and herbal medicine product in Asia for over 7,000 years. Its seeds have exceptional longevity, remaining viable for as long as 1,300 years.
The plant is also noted for its exceptional water repellency, known as the lotus effect. This property is due to the closely packed nanoscopic protuberances of its self-cleaning leaf surface, which have been adapted for the manufacture of a self-cleaning industrial paint, Lotusan.
6.9e11 g of oil and Circos was there
Rivers et al. describe the effects of the Deepwater Horizon blowout on the microbial blooms of petroleum-degrading bacteria.
By sequencing 66 million community transcripts, the identity of metabolically active microbes and their roles in petroleum consumption was revealed.
Plants Love Circos
Circos frequently appears in plant literature, twice on the cover of Plant Biotechnology Journal in the last year.
Circos has appeared 8 times each in the Plant Journal and Plant Cell.
Circos for R
Zhang et al. implement Circos in R.
Same round shape you expect. And now, in everyone’s favourite open source statistics and data analysis environment.
Circos Interchange Diagrams — Networks and Flow
Zeng et al. introduce a new type of visualization based on Circos, the interchange diagram, in their paper Visualizing Interchange Patterns in Massive Movement Data.
The design is applied to displaying movement data, such as daily trips made by passengers in a city. By incorporating interactivity, this visualization method is helpful to understand interchange patterns at different spatial (between trains, between cities) and time scales (different times of day).
Circos has been used for urban planning before. The town of Caceres in Spain has used Circos to communicate their urban planning strategy.
Zeng W, Fu C-W, Arisona SM et al. 2013 Visualizing Interchange Patterns in Massive Movement Data Computer Graphics Forum 32 : 271-280
Circos connects to the connectome
Methods to visualize the connectome are reviewed in Craddock et al., and Circos is one of them.
A good layman’s description of the work can be found at the Neuroskeptic blog.
Circos is the Method for Visualizing Translocations
Genomic rearrangements can cause disease and are implicated in many cancers. Being able to see the patterns in these changes across samples and patients is important.
In the review article End-joining, Translocations and Cancer, Bunting and Nussenzweig demonstrate how compositing the genome circularly adds value and clarity to the presentation.
Bunting SF, Nussenzweig A 2013 End-joining, translocations and cancer Nat Rev Cancer
Circos Paints Chromosomes of Capsella Rubella
Slotte et al. use Circos to show the genomic structures, chromosome painting and comparative genomic mapping in C. rubella, A. lyrata and A. thaliana.
Their figure illustrates how Circos is effective at showing two-way comparisons of syntenic structure. For three-way comparison, consider hive plots.
Circos on the Cover Of Journal of Pathology
The June 2013 issue of the Journal of Pathology features a pair of Circos plots on the cover. The images are from the paper by Weier et al. describing TMPRSS2 and ERG rearrangements in prostate cancer.
«TMPRSS2–ERG rearrangements occur in approximately 50% of prostate cancers and therefore represent one of the most frequently observed structural rearrangements in all cancers.»
Circos on the Cover Of Nature’s Asian Journal of Andrology
The May 2013 Special Issue of Asian Journal of Andrology presents the outcomes from the Sixth Annual Forum on Prostate Disease (6th FPD), which was held on June 8-9, 2012 in Shanghai, China [source: nature.com]. The cover art for the issue shows a Circos plot of 90 significantly recurrent molecular alterations in prostate cancer from an analysis of 372 prostate tumors discussed in the Wyatt et al. review article.
The review summarizes the current state of understanding of prostate cancer, «including the sentinel role of copy number variation, the growing spectrum of oncogenic fusion genes, the potential influence of chromothripsis, and breakthroughs in defining mutation-associated subtypes. Increasing evidence suggests that genomic lesions frequently converge on specific cellular functions and signalling pathways, yet recurrent gene aberration appears rare».
What is Circos?
Circular visualization
Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive.
Circos is ideal for creating publication-quality infographics and illustrations with a high data-to-ink ratio, richly layered data and pleasant symmetries. You have fine control over each element in the figure to tailor its focus points and detail to your audience.
Circos is flexible. Although originally designed for visualizing genomic data, it can create figures from data in any field—from genomics to visualizing migration to mathematical art. If you have data that describes relationships or multi-layered annotations of one or more scales, Circos is for you.
Circos can be automated. It is controlled by plain-text configuration files, which makes it easily incorporated into data acquisition, analysis and reporting pipelines (a data pipeline is a multi-step process in which data is analyzed by multiple and typically independent tools, each passing their output as the input to the next step).
Popular and Pretty
Have you noticed how beautifully everyday science and technology is rendered in movies? Information is delivered seamlessly from interfaces oozing with style and function. While others complain that the movie doesn’t get the science facts right, I contrarily note that it doesn’t get the science look right. No busy scientist is able to make such great design and typeface choices!
Circos attempts to bring a different aesthetic to science and strike a balance between flexibility and ease-of-use. Circos makes no assumptions about your data, uses an extremely simple input data format, and makes image creation and customization easy. It’s helping to make science look better, one figure at a time.