I’m a fourth-year PhD student in Computer Science at Stony Brook University. After completing my undergraduate degree I worked in several areas, mostly developing data mining applications on database platforms and enterprise web applications in Java. About a year ago I moved to the field of Computational Biology, joined the Combine lab, and started working under the supervision of Professor Rob Patro. My current focus is on designing and implementing time- and space-efficient data structures, along with reference-based and reference-free indexing tools used to query, search, align, assemble, or compare sequencing read datasets. In addition to succinct data structures and hashing algorithms, we use statistical approaches and machine learning methods to optimize our approximate alignments and assembly outputs and to arrive at a more statistically accurate interpretation of the estimated results.

To see my complete CV, click here

List of Publications

2019 - An Efficient, Scalable and Exact Representation of High-Dimensional Color Information Enabled via de Bruijn Graph Search
Fatemeh Almodaresi, Prashant Pandey, Michael Ferdman, Rob Johnson, and Rob Patro
Accepted at RECOMB 2019.
2018 - Pufferfish: A space and time-efficient compacted de Bruijn graph index
Fatemeh Almodaresi, Hirak Sarkar, and Rob Patro
Bioinformatics, 34(13):i169–i177 (appeared in the proceedings of ISMB 2018).
2018 - Mantis: A fast, small, and exact large-scale sequence search index
Prashant Pandey, Fatemeh Almodaresi, Michael A. Bender, Michael Ferdman, Rob Johnson, and Rob Patro
Cell Systems, 7(2):201–207.
2017 - Rainbowfish: A Succinct Colored de Bruijn Graph Representation
Fatemeh Almodaresi, Prashant Pandey, and Rob Patro
In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017), volume 88 of Leibniz International Proceedings in Informatics (LIPIcs), pages 18:1–18:15.
2017 - On the distribution of lexical features at multiple levels of analysis
Fatemeh Almodaresi, Lyle Ungar, Vivek Kulkarni, Mohsen Zakeri, Salvatore Giorgi, and H Andrew Schwartz
In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), volume 2, pages 79–84.
2017 - Improved data-driven likelihood factorizations for transcript abundance estimation
Mohsen Zakeri, Avi Srivastava, Fatemeh Almodaresi, and Rob Patro
Bioinformatics, 33(14):i142–i151 (appeared in the proceedings of ISMB 2017).
2013 - The relation between friendship and academic performance in university students: Role of personality
Fatemeh Almodaresi, Naser Mozayani, Mohammadreza Jahed Motlagh, and Meisam Ahmadi
Journal of Technology of Education, 8(2):81–92 (in Persian).

List of Projects

  • Pufferfish : This tool provides an efficient data structure for indexing colored compacted de Bruijn graphs. It balances time and space by combining succinct data structures with a minimal perfect hash function. Pufferfish provides the underlying data structure for mapping short sequencing reads to a very large population of references while keeping the mapping information for each reference individually. Submitted to RECOMB 2018. We are also writing an aligner on top of it, which we expect to be efficient since it takes advantage of the uniqueness property of unitigs in the de Bruijn graph.
  • Rainbowfish : This tool provides a new data structure to store and query colored de Bruijn graphs. For large data sets, it reduces storage by more than twenty times compared to state-of-the-art tools without hurting query performance.
  • Grouper : An extension of RapClust, Grouper is a tool for clustering contigs of a de novo transcriptome assembly. We improved clustering accuracy by making use of orphan reads, for which each end of the pair is mapped to a different reference sequence.
  • MLDD : The Multi-Level Distribution Detection (MLDD) project is one I worked on in the Data Science and NLP lab in 2015. Using statistical tests and classification models such as Naive Bayes, we show how the distribution of NLP features in social media changes across levels of analysis (county, user, and message). This can strongly affect prior assumptions for further text analysis, as we show that the central limit theorem can be applied in social media language analysis as well.
  • AutismFD : In this project I designed a game application to improve facial emotion detection in children with autism. Besides collaborating with psychology students to design the method, I implemented the idea as a tool in C#. The package was used in a treatment center to help children with autism identify facial emotions and to track their progress over time.
  • DDFactorization : We designed a new, efficient, and accurate factorization of likelihoods across equivalence classes of read fragments in Salmon, a transcript-level quantification tool. With the new model, estimation accuracy improves significantly without hurting run time.
