Text to DNA Encoder
An exploration of encoding digital information into synthetic DNA sequences. Click on the Live Site link to test it out for yourself!
DNA has a storage density that makes hard drives look primitive: a single gram can theoretically hold hundreds of petabytes (Church et al., 2012). This project is the beginning of an exploration into what it actually takes to encode information that way. Right now it is a simple setup that converts text into a DNA oligo sequence you can actually order from IDT (Integrated DNA Technologies)! I am continuing to build on this infrastructure to encode more complex information, such as photographs and mathematical content, and to optimize the storage method so that fewer nucleotides are needed per bit of information.
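With four possible bases, one nucleotide can carry at most log2 4 = 2 bits. Here is a rough yardstick for that optimization goal. It assumes text is stored as 8-bit characters, which is an assumption: a smaller character set needs fewer bits. Under that assumption, the floor before any error correction or sequence constraints is:

```latex
\text{bits per nucleotide} \le \log_2 4 = 2
\quad\Longrightarrow\quad
n_{\text{nt}} \ge \frac{8\ \text{bits/char} \times N_{\text{chars}}}{2\ \text{bits/nt}} = 4\,N_{\text{chars}}
```

Constraints cost capacity. The rotating code of Goldman et al. (2013) never repeats the previous base, so each position chooses among three bases and carries at most log2 3 ≈ 1.58 bits.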
The encoder takes a text input and maps it to a sequence of nucleotides (A, C, G, T) using a rotational character set scheme. The challenge isn't just encoding; it's encoding well. A DNA strand with unbalanced GC content or long runs of the same base (homopolymers) is difficult to synthesize and sequence reliably (Goldman et al., 2013). So the algorithm searches across 64 rotational shifts to find an oligo that both decodes accurately and is physically viable.
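The exact rotational scheme isn't spelled out here, so the sketch below assumes one plausible reading of the search described above. It uses a hypothetical 64-character set (`ALPHABET`) mapped onto the 64 possible codons (4^3 = 64). Shift k rotates every character's codon assignment by k positions, and a leading header codon stores k so the decoder can undo it. The quality thresholds are also assumptions: homopolymer runs of at most 3, and GC content as close to 50% as possible. Every name in the block is hypothetical and not taken from the project.

```python
import itertools
import string
from typing import Tuple

# Hypothetical 64-character set: 26 upper + 26 lower + 10 digits + space + period.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + " ."
# Every 3-nucleotide codon (4**3 = 64), in the order AAA, AAC, AAG, AAT, ACA, ...
CODONS = ["".join(p) for p in itertools.product("ACGT", repeat=3)]
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}
assert len(ALPHABET) == len(CODONS) == 64


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((sum(1 for _ in run) for _, run in itertools.groupby(seq)), default=0)


def encode_with_shift(text: str, shift: int) -> str:
    """Header codon recording `shift`, then one rotated codon per character.

    Raises KeyError for characters outside the hypothetical ALPHABET.
    """
    body = "".join(CODONS[(CHAR_INDEX[ch] + shift) % 64] for ch in text)
    return CODONS[shift] + body


def decode(seq: str) -> str:
    """Read the shift from the header codon, then undo the rotation codon by codon."""
    shift = CODON_INDEX[seq[:3]]
    codons = [seq[i:i + 3] for i in range(3, len(seq), 3)]
    return "".join(ALPHABET[(CODON_INDEX[c] - shift) % 64] for c in codons)


def score(seq: str, max_run: int = 3) -> Tuple[int, float]:
    """Lower is better: first penalize runs longer than max_run, then GC drift from 50%."""
    return (max(0, max_homopolymer(seq) - max_run), abs(gc_content(seq) - 0.5))


def best_encoding(text: str) -> Tuple[int, str]:
    """Try all 64 rotations and return (shift, oligo) for the best-scoring oligo."""
    candidates = []
    for shift in range(64):
        seq = encode_with_shift(text, shift)
        # Verification step; always passes for this bijective mapping, but mirrors
        # the encode -> verify -> decode loop for schemes where it might not.
        if decode(seq) == text:
            candidates.append((score(seq), shift, seq))
    _, shift, seq = min(candidates)
    return shift, seq
```

For example, `best_encoding("Hello DNA.")` returns the chosen shift and a 33-nt oligo: one header codon plus one codon per character, which is 2 bits per nucleotide for 6-bit characters. `decode` then recovers the original string. Scoring with a tuple makes homopolymer violations dominate GC balance. A weighted sum would be an equally reasonable design choice.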
It's at an early stage, but the foundation is there: encode, verify quality, decode back. I've incorporated Reed-Solomon error correction (Organick et al., 2018) and will be adding more features soon.
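How the project's Reed-Solomon layer is built isn't described, so what follows is only a minimal sketch of the idea, not the project's implementation. It uses Reed-Solomon in its evaluation form over GF(2^8). The k data bytes are the values of a polynomial of degree below k at k distinct field points, and `nsym` parity bytes are that same polynomial evaluated at further points. Any k surviving symbols pin the polynomial down. So up to `nsym` erased symbols can be rebuilt by Lagrange interpolation, such as positions lost when a strand drops out. This sketch handles erasures only. Correcting errors at unknown positions needs a full decoder such as Berlekamp-Welch or Berlekamp-Massey. The function names and the primitive polynomial 0x11D are assumptions.

```python
from typing import List, Optional

# Exponent/log tables for GF(2^8), built from the primitive polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11D) with generator 2.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_value = 1
for _power in range(255):
    GF_EXP[_power] = _value
    GF_LOG[_value] = _power
    _value <<= 1
    if _value & 0x100:
        _value ^= 0x11D
for _power in range(255, 512):  # duplicate so log sums never need a modulo
    GF_EXP[_power] = GF_EXP[_power - 255]


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def gf_div(a: int, b: int) -> int:
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(256)")
    if a == 0:
        return 0
    return GF_EXP[(GF_LOG[a] - GF_LOG[b]) % 255]


def lagrange_eval(points, x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through the points."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = gf_mul(num, x ^ xm)  # subtraction is XOR in characteristic 2
                den = gf_mul(den, xj ^ xm)
        total ^= gf_mul(yj, gf_div(num, den))
    return total


def rs_encode(data: bytes, nsym: int) -> List[int]:
    """Systematic codeword: the original bytes followed by nsym parity symbols."""
    n = len(data) + nsym
    if n > 255:
        raise ValueError("codeword longer than 255 symbols")
    xs = [GF_EXP[i] for i in range(n)]  # distinct nonzero evaluation points
    data_points = list(zip(xs[:len(data)], data))
    return list(data) + [lagrange_eval(data_points, x) for x in xs[len(data):]]


def rs_recover(received: List[Optional[int]], k: int) -> bytes:
    """Rebuild the k data bytes; None marks an erased symbol."""
    points = [(GF_EXP[i], y) for i, y in enumerate(received) if y is not None]
    if len(points) < k:
        raise ValueError("too many erasures: need at least k surviving symbols")
    points = points[:k]  # any k points determine the polynomial
    return bytes(lagrange_eval(points, GF_EXP[i]) for i in range(k))
```

For example, `rs_encode(b"DNA", 4)` yields a 7-symbol codeword whose first three symbols are the original bytes. Blanking any four positions with `None` still lets `rs_recover(..., 3)` return `b"DNA"`.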
References
Church, G. M., Gao, Y., & Kosuri, S. (2012). Next-generation digital information storage in DNA. Science, 337(6102), 1628. https://doi.org/10.1126/science.1226355
Goldman, N., Bertone, P., Chen, S., Dessimoz, C., LeProust, E. M., Sipos, B., & Birney, E. (2013). Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature, 494(7435), 77–80. https://doi.org/10.1038/nature11875
Organick, L., Ang, S. D., Chen, Y.-J., Lopez, R., Yekhanin, S., Makarychev, K., et al. (2018). Random access in large-scale DNA data storage. Nature Biotechnology, 36(3), 242–248. https://doi.org/10.1038/nbt.4079