By analyzing bacterial data, researchers have discovered thousands of rare new CRISPR systems that have a range of functions and could enable gene editing, diagnostics, and more.
Microbial sequence databases contain a wealth of information about enzymes and other molecules that could be adapted for biotechnology. But these databases have grown so large in recent years that they’ve become difficult to search efficiently for enzymes of interest.
Now, scientists at the McGovern Institute for Brain Research at MIT, the Broad Institute of MIT and Harvard, and the National Center for Biotechnology Information (NCBI) at the National Institutes of Health have developed a new search algorithm that has identified 188 kinds of new rare CRISPR systems in bacterial genomes, encompassing thousands of individual systems. The work appears in Science.
The algorithm, which comes from the lab of pioneering CRISPR researcher Professor Feng Zhang, uses big-data clustering approaches to rapidly search massive amounts of genomic data. The team used their algorithm, called Fast Locality-Sensitive Hashing-based clustering (FLSHclust) to mine three major public databases that contain data from a wide range of unusual bacteria, including ones found in coal mines, breweries, Antarctic lakes, and dog saliva. The scientists found a surprising number and diversity of CRISPR systems, including ones that could make edits to DNA in human cells, others that can target RNA, and many with a variety of other functions.
The new systems could potentially be harnessed to edit mammalian cells with fewer off-target effects than current Cas9 systems. They could also one day be used as diagnostics or serve as molecular records of activity inside cells.
The researchers say their search highlights an unprecedented level of diversity and flexibility of CRISPR and that there are likely many more rare systems yet to be discovered as databases continue to grow.