About

I am a postdoctoral fellow at the Ontario Institute for Cancer Research (OICR), working with Professor Lincoln Stein in the Adaptive Oncology department.

I started my PhD in the field of Natural Language Processing, and about a year later I moved to the amazing field of Computational Biology, joined the COMBINE lab, and started working under the supervision of Professor Rob Patro. Since then, my main focus has been on designing and implementing time- and space-efficient data structures, as well as reference-based and reference-free indexing tools, that are used to query, search, align, assemble, or compare sequencing read datasets. I also have experience in abundance estimation, specifically for metagenomic samples. In the COMBINE lab, we combined statistical approaches and probabilistic models with succinct data structures and hashing algorithms, both to optimize approximate alignment and sequence search over large collections of sequencing or assembled data and to obtain a more statistically accurate interpretation of the estimated results.

I have recently become more and more interested in applying such tools to better understand sequencing data and to make novel discoveries through modeling and statistical analysis. That is why I started my postdoc at a cancer research institute, and especially with Prof. Stein: to get closer to the biology of things, to experience firsthand the main computational challenges that arise in the effort to cure cancer, and to look for computational improvements that can increase either the accuracy of cancer diagnosis and prognosis or the performance of the dedicated computational pipelines.

Before my PhD, I worked in industry for five years in several areas, mostly developing data mining and outlier detection applications on database platforms, as well as enterprise web applications in Java.

To see my complete CV, click here

List of Publications

2020 - PuffAligner: An efficient and accurate aligner based on the Pufferfish index
Fatemeh Almodaresi, Mohsen Zakeri, and Rob Patro
bioRxiv.
2019 - An Efficient, Scalable and Exact Representation of High-Dimensional Color Information Enabled via de Bruijn Graph Search
Fatemeh Almodaresi, Prashant Pandey, Michael Ferdman, Rob Johnson, and Rob Patro
In International Conference on Research in Computational Molecular Biology (RECOMB 2019), pages 1–18.
2018 - Pufferfish: A space and time-efficient compacted de Bruijn graph index
Fatemeh Almodaresi, Hirak Sarkar, and Rob Patro
Bioinformatics, 34(13):i169–i177 (appeared in the proceedings of ISMB 2018).
2018 - Mantis: A fast, small, and exact large-scale sequence search index
Prashant Pandey, Fatemeh Almodaresi, Michael A. Bender, Michael Ferdman, Rob Johnson, and Rob Patro
Cell Systems, 7(2):201–207.
2017 - Rainbowfish: A Succinct Colored de Bruijn Graph Representation
Fatemeh Almodaresi, Prashant Pandey, and Rob Patro
In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017), volume 88 of Leibniz International Proceedings in Informatics (LIPIcs), pages 18:1–18:15.
2017 - On the distribution of lexical features at multiple levels of analysis
Fatemeh Almodaresi, Lyle Ungar, Vivek Kulkarni, Mohsen Zakeri, Salvatore Giorgi, and H Andrew Schwartz
In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 79–84.
2017 - Improved data-driven likelihood factorizations for transcript abundance estimation
Mohsen Zakeri, Avi Srivastava, Fatemeh Almodaresi, and Rob Patro
Bioinformatics, 33(14):i142–i151 (appeared in the proceedings of ISMB 2017).
2013 - The relation between friendship and academic performance in university students: Role of personality
Fatemeh Almodaresi, Naser Mozayani, Mohammadreza Jahed Motlagh, and Meisam Ahmadi
Journal of Technology of Education, 8(2):81–92 (in Persian).

List of Projects

  • Mantis: a space- and time-efficient data structure for indexing and querying large collections of raw sequencing read experiments. The index is based on a colored de Bruijn graph representation and therefore supports graph-based operations, such as graph traversal and bubble calling, that are useful for assembly and variant detection. In our recent work, we have scaled the index to more than 30,000 raw sequencing read samples and made it incrementally updatable, so that it can grow gradually as new samples are added.
  • Pufferfish + PuffAligner: Pufferfish is an efficient data structure for indexing colored compacted de Bruijn graphs that balances time and space requirements by making use of succinct data structures and a minimal perfect hash function. It provides the underlying data structure for mapping short sequencing reads to a huge population of references while keeping the mapping information for each reference separately. It appeared in the proceedings of ISMB 2018. PuffAligner, our more recent work, is a highly sensitive aligner built on top of Pufferfish for aligning different types of short sequencing reads to a huge population of references, and it performs particularly well when the reference sequences are highly similar to one another.
  • Rainbowfish: a new data structure for storing and querying colored de Bruijn graphs. On large datasets, it reduces storage requirements by more than a factor of twenty compared with state-of-the-art tools, without hurting query performance.
  • Grouper: an extension of RapClust for clustering the contigs of a de novo transcriptome assembly. We improved clustering accuracy by making use of orphan reads, i.e., read pairs whose two ends map to different reference sequences.
  • MLDD: the Multi-Level Distribution Detection project, which I worked on in the Data Science and NLP lab in 2015. Using statistical tests and classification models such as Naive Bayes, we show how the distribution of NLP features in social media changes across levels of analysis (county, user, and message). This can strongly affect the prior assumptions made in further text analysis; in particular, we show that the central limit theorem can be applied in social media language analysis as well.
  • AutismFD: in this project, I designed a game application to improve facial emotion detection in children with autism. Besides collaborating with psychology students to design the method, I implemented the idea as a tool in C#. This package was used in a treatment center to help children with autism identify facial emotions and to track their progress over time.
  • DDFactorization: we designed a new, efficient, and accurate factorization of the likelihood across equivalence classes of read fragments in Salmon, a transcript-level abundance estimation tool. With the new model, estimation accuracy improves significantly without increasing run time.
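Several of the projects above (Mantis, Pufferfish, Rainbowfish) build on the idea of a colored de Bruijn graph, in which each k-mer is annotated with the set of samples (its "color") that contain it. The Python snippet below is a minimal, uncompressed sketch of that idea under simplifying assumptions. It is not the implementation of any of these tools. It skips the graph structure, canonical (reverse-complement) k-mers, and all succinct and hashing machinery, and the function names are hypothetical. What it shows is how k-mers that occur in exactly the same samples can share one stored color class. It also includes a simple thresholded query that reports the samples containing at least a given fraction of the query's k-mers.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_index(samples, k):
    """Hypothetical toy index: k-mer -> color-class id, color-class id -> sample set.

    Each distinct set of samples (a "color class") is stored once and shared
    by every k-mer that occurs in exactly those samples.
    """
    kmer_colors = defaultdict(set)
    for sample_id, seq in enumerate(samples):
        for km in kmers(seq, k):
            kmer_colors[km].add(sample_id)

    class_ids = {}      # frozenset of sample ids -> color-class id
    classes = []        # color-class id -> frozenset of sample ids
    kmer_to_class = {}  # k-mer -> color-class id
    for km, colors in kmer_colors.items():
        key = frozenset(colors)
        if key not in class_ids:
            class_ids[key] = len(classes)
            classes.append(key)
        kmer_to_class[km] = class_ids[key]
    return kmer_to_class, classes


def query(seq, k, kmer_to_class, classes, threshold=1.0):
    """Return the samples that contain at least `threshold` of the query's k-mers."""
    qkmers = list(kmers(seq, k))
    if not qkmers:
        return set()
    hits = defaultdict(int)
    for km in qkmers:
        cid = kmer_to_class.get(km)
        if cid is not None:
            for sample_id in classes[cid]:
                hits[sample_id] += 1
    return {s for s, n in hits.items() if n >= threshold * len(qkmers)}


# Usage with three tiny made-up "samples" and k = 4.
samples = ["ACGTACGT", "ACGTTTTT", "GGGGACGT"]
kmer_to_class, classes = build_colored_index(samples, k=4)
print(len(kmer_to_class), len(classes))                           # 11 4
print(query("ACGTAC", 4, kmer_to_class, classes))                 # {0}
print(query("ACGTAC", 4, kmer_to_class, classes, threshold=0.3))  # {0, 1, 2}
```

In this example, 11 distinct k-mers collapse into only 4 color classes. In the real tools, the plain dictionaries would be replaced by compact structures, such as the minimal perfect hash function and succinct data structures mentioned for Pufferfish. This sketch only illustrates why sharing color classes saves space when many k-mers occur in the same set of samples.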


Contact