Plant Breeder & Data Analyst
Current Position - Genetics Intern - Syngenta
Summary:
Ph.D. in Plant Breeding and Genetics with 6 years of research experience in genetics, genomics, and data analysis of crop plants. Skilled in experimental design, statistical analysis, handling large genomic datasets, and developing and deploying R/Shiny packages for plant breeding applications. Interested in a research or industry role involving genetics, plant breeding, bioinformatics, or artificial intelligence for plant breeding.
Publication Statistics:
Cumulative Impact Factor: 34.171
Total Citations: 39 (Google Scholar)
Technical Skills:
- Programming Languages: Proficient in R, Python, and the Linux command line; experience with Shiny and Google Colab.
- Phenomics and Genomics: Expertise in molecular breeding techniques, NGS data analysis, QTL/GWAS, genomic selection, designing field trials, managing phenotyping pipelines, and analyzing high-throughput phenotypic and genotypic data.
- Data Science: Multivariate analysis, machine learning, and large-scale genomic data analysis and visualization.
🎓 Education
🔬 Research Experience
Ph.D. research @ TNAU (2021 - 2023)
- Developed and applied advanced tissue culture techniques like embryo culture to rapidly generate somaclonal variants of rice with improved salinity stress tolerance.
- Identified and validated novel genomic regions (QTLs) governing salinity tolerance through molecular mapping in rice F2 populations along with thorough phenotypic and biochemical characterization.
- Employed multi-omics approaches including GC-MS metabolomics to uncover key genes and metabolites linked to superior osmotic/ionic adjustment under salt stress.
Research scholar @ ICRISAT (2018 - 2020)
- Single-plant phenotyping of diverse germplasm accessions (sorghum, pearl millet, and pigeonpea; 1,980 plants/samples genotyped in total) to examine intra- and inter-accession genetic diversity.
- Single-plant genotyping of accessions using DArTSeq-based SNPs.
- Combined whole-genome genotyping, simulations, and predictive modeling to quantify genomic diversity within and between isolated subpopulations across a species’ range, and developed optimized statistical frameworks to guide sustainable sampling regimes that limit genetic drift.
💼 Work Experience
Data Science Consultant @ Fiverr (December 2020 - Present)
- 5⭐ rated freelancer with over 200 hours of completed data analysis and visualization projects in R, drawing on more than six years of experience in genomic data analysis and bioinformatics.
- Proficient in handling big data and performing complex modelling using popular R libraries and packages such as tidyr, data.table, dplyr, plyr, tensorflow, ggplot2, ggdendro, ggtree, ggheatmap, and circos.
- Received positive feedback from clients for knowledge, professionalism, and mastery of R programming. Demonstrated ability to deliver high-quality work, as evidenced by the portfolio available on the profile.
💻 Programming and data analysis skills
- Proficient in full-stack development of R packages using modular coding practices.
- Created production-grade Shiny web applications for interactive data analysis and visualization; experienced with the golem framework for structuring and deploying Shiny apps at scale.
- Machine learning model building for image classification/segmentation tasks; trained CNNs and other deep learning architectures in R and Python (PyCharm).
- Multivariate data analysis of large-scale omics datasets including genomics, phenomics and metabolomics using cutting-edge bioinformatics tools.
- Experience with analysis of next-generation sequencing data including quality control, read mapping, variant calling, expression quantification, metagenomic profiling, and associated statistical analysis using standard workflows in R and Python.
- Advanced visualization of multi-dimensional biological data using Circos, ggtree, ggtreeExtra, Cytoscape, and other platforms.
Additional Skills:
- Git/GitHub for version control and collaborative coding.
- High performance computing on clusters for scalable data analysis.
- Bioconductor for analyzing genomic/transcriptomic experiments.
- Workflow automation to improve reproducibility and speed up analyses.
🌱 R Packages developed
✅ PBGeno
GitHub
Developed pbgeno, an R package that streamlines data analysis workflows for plant breeders. Key features include calculating genetic distance matrices, structure-based clustering of genotypes, estimating diversity statistics, quantifying polymorphism, creating publication-quality visualizations, automating routine tasks, and converting proprietary marker genotypes into standardized formats for genome-wide association mapping.
✅ PBPerfect
PBPerfect (Visit Page) is an interactive web tool enabling reproducible multivariate analysis and visualization of phenotypic and genotypic data. It features basic statistics, experimental designs, SSR workflows, multivariate analysis, mating designs, and dynamic graphics, with outputs exported as publication-standard tables and graphics that require no further formatting.
✅ PBMLT: Plant Breeding Multilocation Trial Data Analysis Software
PBMLT (Visit Page) is a comprehensive and user-friendly platform that provides plant breeders with an all-in-one solution for analyzing multi-environment trial data through:
- Powerful Analysis of Variance (ANOVA) to explore significance of variation across locations and treatments.
- Additive Main Effects and Multiplicative Interaction (AMMI) analysis for in-depth genotype-environment interaction studies.
- Calculation of essential AMMI-based stability indices like ASV, ASTAB, ASI, MASI, SIPC, ZA for identifying adaptable lines.
- Evaluation of overall productivity using metrics such as mean AVAMGE for high-yielding genotype selection.
- Scaled stability measures like SSIASTAB and ASI_SSI for ranking lines based on trait stability.
- Interactive visualizations including Biplots, GGE plots, WASS plots for straightforward interpretation of complex data.
- Centralized platform integrating meta-analysis, statistical analytics, and genotype-environment interaction analysis.
✅ PBlinkagemap
PBlinkagemap (Visit Page) enables easy creation of linkage maps and identification of associated quantitative trait loci (QTLs) from genomic and phenotypic datasets.
It allows users to:
- Import chromosome, marker, map distance and trait score data
- Interactively explore results on linkage maps
- Visualize QTL locations and effects
By handling computationally intensive linkage analysis and mapping behind the scenes, PBlinkagemap makes it simple for users to go from datasets to QTL discovery through an intuitive interface.
✅ PB-GWAS 🧬
PB-GWAS (Visit Page) makes powerful genome-wide association studies accessible through an easy-to-use web app 👩‍💻
Key features:
📥 File upload in 4 clicks
🏃‍♂️ One-click GWAS launch
⚙️ Adjust parameters via sidebar
📈 Interactive result plotting
📄 Full PDF report downloading
By eliminating coding barriers, PB-GWAS allows both new and advanced users to leverage GAPIT workflows with no programming expertise required. Whether you want to map simple or complex traits, PB-GWAS provides the automated analysis to accelerate discoveries 🔬
✅ PBHaploMineR 🧬
PBHaploMineR (Visit Page) provides a toolkit to streamline pangenome haplotype mining and comparison from next-generation sequencing data. This R package aims to make large-scale haplotype analysis efficient and accessible for species with reference pangenomes.
Key Features:
- Sequence Import - Functions to import raw reads from multiple platforms and store in standardized schema
- Haplotype Calling - Optimized algorithms for pangenome-wide haplotype calling, incorporating structural variation
- HapViz - Interactive visualization system to explore and compare haplotypes in context of pangenome structure
- HapCompare - Statistically compare haplotypes between groups of samples/accessions and identify associated genomic signatures
- Parallelization - Built-in parallelization to scale analyses across HPC infrastructure
PBHaploMineR is still under development and testing. The first stable version is expected in Q1 2024.
🎤 Workshops and Conferences
✍️ Articles & Blogs
📜 Publications
- Allan Victor, Mani Vetriventhan, Ramachandran Senthil, S. Geetha, Santosh Deshpande, Abhishek Rathore, Vinod Kumar, Prabhat Singh, Surender Reddymalla, and Vânia CR Azevedo. “Genome-wide DArTSeq genotyping and phenotypic based assessment of within and among accessions diversity and effective sample size in the diverse sorghum, pearl millet, and pigeonpea landraces.” Frontiers in Plant Science 11 (2020): 587426. (DOI: https://doi.org/10.3389/fpls.2020.587426)
- Allan, V. (2023) ‘PB-Perfect: A Comprehensive R-Based Tool for Plant Breeding Data Analysis’, PB - Perfect. Available at: https://allanbiotools.shinyapps.io/pbperfect/.
- Backiyalakshmi, C., Mani Vetriventhan, Santosh Deshpande, C. Babu, Allan Victor, D. Naresh, Rajeev Gupta, and Vania CR Azevedo. “Genome-wide assessment of population structure and genetic diversity of the global finger millet germplasm panel conserved at the ICRISAT Genebank.” Frontiers in Plant Science 12 (2021): 692463. (DOI: https://doi.org/10.3389/fpls.2021.692463)
- Vetriventhan, Mani, Hari D. Upadhyaya, Vania CR Azevedo, Allan Victor, and Seetha Anitha. “Variability and trait‐specific accessions for grain yield and nutritional traits in germplasm of little millet (Panicum sumatrense Roth. Ex. Roem. & Schult.).” Crop Science 61, no. 4 (2021): 2658-2679. (DOI: https://doi.org/10.1002/csc2.20527)
- Jagadesh, M., Duraisamy Selvi, Subramanium Thiyageshwari, Thangavel Kalaiselvi, Allan Victor, Munmun Dash, Keisar Lourdusamy, Ramalingam Kumaraperumal, Pushpanathan Raja, and U. Surendran. “Exploration of microbial signature and carbon footprints of the Nilgiri Hill Region in the Western Ghats global biodiversity hotspot of India.” Applied Soil Ecology (2023): 105176. (DOI: https://doi.org/10.1016/j.apsoil.2023.105176)
- Jagadesh, M., Cherukumalli Srinivasarao, Duraisamy Selvi, Subramanium Thiyageshwari, Thangavel Kalaiselvi, Aradhna Kumari, Santhosh Kumar Singh, and Allan Victor. “Quantifying the Unvoiced Carbon Pools of the Nilgiri Hill Region in the Western Ghats Global Biodiversity Hotspot—First Report.” Sustainability 15, no. 6 (2023): 5520. (DOI: https://doi.org/10.3390/su15065520)
- Jagadesh, M., Duraisamy Selvi, Subramanium Thiyageshwari, Cherukumalli Srinivasarao, Thangavel Kalaiselvi, Keisar Lourdusamy, Ramalingam Kumaraperumal, and Victor Allan. “Soil Carbon Dynamics Under Different Ecosystems of Ooty Region in the Western Ghats Biodiversity Hotspot of India.” Journal of Soil Science and Plant Nutrition 23, no. 1 (2023): 1374-1385. (DOI: https://doi.org/10.1007/s42729-023-01129-2)
- Allan Victor, N. Meenakshi Ganesan, R. Saraswathi, R. Gnanam, and C. N. Chandrasekhar. “Exploring the phenotypic diversity of rice: A multivariate analysis of local landraces and elite cultivars of Tamil Nadu and Exotic Lines.” Electronic Journal of Plant Breeding 14, no. 3 (2023): 857-866. (DOI: https://doi.org/10.37992/2023.1403.099)
- Allan Victor, S. Geetha, Mani Vetriventhan, and Vânia CR Azevedo. “Genetic diversity analysis of geographically diverse landraces and wild accessions in sorghum.” Electronic Journal of Plant Breeding 11, no. 03 (2020): 760-764. (DOI: https://doi.org/10.37992/2020.1103.125)
📚 References
| Name | Position | Organization | E-mail | LinkedIn |
| --- | --- | --- | --- | --- |
| Dr. Vania de Azevedo | Former Head, Plant Genetic Resources | ICRISAT, Hyderabad, India | azevedovcr@gmail.com | Visit Page |
| Dr. Mani Vetriventhan | Senior Scientist, Plant Genetic Resources | ICRISAT, Hyderabad, India | M.Vetriventhan@cgiar.org | Visit Page |
| Mr. Rajaguru Bohar | Regional Genotyping Coordinator (South Asia) / Senior Scientist (Project management) | CIMMYT | wishmeguru@gmail.com | Visit Page |