About This Project
AI-driven protein design is being transformed by recent breakthroughs such as AlphaFold. However, applying AI to engineer enzymes for climate-related applications, such as carbonic anhydrase (CA), remains a challenge that requires an intelligent design pipeline and suitable computational metrics. In this project, we will leverage existing AI architectures, propose new ones, and use molecular dynamics (MD) tools to engineer more stable CA variants.
What is the context of this research?
Atmospheric CO2 removal (CDR) and point-source capture (PSC) of CO2 are widely accepted as necessary for decarbonizing within climate goals (1). Direct air capture (DAC) is a CDR pathway with ideal verifiability and durability. Both DAC and PSC are cost-constrained, primarily by the capital expenditure (CapEx) of the gas contactor and by the energy required to drive large swings in temperature or pH to regenerate CO2 from the capture material (2).
Those high cost and energy requirements are driven by a thermodynamic trade-off between the rate of CO2 absorption and the CO2 regeneration energy: CO2 capture materials with high absorption rate, which reduce cost by reducing the gas contactor size, typically have high CO2 regeneration energy, and vice versa (3).
What is the significance of this project?
Carbonic anhydrases (CAs) catalyze fast CO2 absorption in solvents with low CO2 regeneration energy, resolving the trade-off described above (4). CA could reduce DAC and PSC cost by reducing parameter swing size or gas contactor size, provided it is stable under DAC or PSC process conditions, which may include high pH, temperature, or ionic strength. For example, thermostable CA produced by protein engineering (PE) can already reduce PSC cost by more than 30% (5,3).
AI-driven PE and screens of many natural variants are revolutionizing PE but haven’t been applied to CA. Ultrastable CAs produced using those tools likely could reduce DAC and PSC cost substantially. While modeling is needed to quantify application-specific benefits and target CA properties, PE for ultrastable CA can begin now and later be adapted to specific uses.
What are the goals of the project?
While modeling analyses are ultimately required to provide target properties for ultrastable CAs to be used in development of novel CA-enhanced DAC and PSC, initial efforts to use AI-based PE and screens of many variants should target many-fold CA stability improvements compared to the state of the art while retaining high activity (kcat/KM ~10^8 M^-1 s^-1). For a comprehensive discussion of state-of-the-art CA engineering and performance, see (6) and (4).
Examples of the state of the art are:
- temperature stability
- pH stability: 90% activity retention after 24 hrs at pH 11.0 (9)
Stability demonstrations should be performed in solvents relevant to DAC and PSC, such as 10-20% K2CO3.
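To make a "many-fold" stability target concrete, the pH benchmark above (90% activity retained after 24 hrs) can be converted into an inactivation rate and a half-life. The sketch below is only illustrative: it assumes simple first-order (single-exponential) loss of activity, which is not stated in the cited work and may not hold for a real CA, and the function names and the 5-fold improvement factor are hypothetical.

```python
import math


def inactivation_rate(retained_fraction: float, hours: float) -> float:
    """Rate constant k (1/h), ASSUMING first-order decay A(t) = A0 * exp(-k * t)."""
    if not 0.0 < retained_fraction < 1.0:
        raise ValueError("retained_fraction must be strictly between 0 and 1")
    if hours <= 0.0:
        raise ValueError("hours must be positive")
    return -math.log(retained_fraction) / hours


def half_life(rate_per_hour: float) -> float:
    """Half-life in hours for a first-order process with rate constant k."""
    return math.log(2.0) / rate_per_hour


def retained_after(rate_per_hour: float, hours: float) -> float:
    """Fraction of activity remaining after `hours` under the same assumption."""
    return math.exp(-rate_per_hour * hours)


# Benchmark quoted in the text: 90% activity retained after 24 h at pH 11.0.
k_ref = inactivation_rate(0.90, 24.0)
t_half_ref = half_life(k_ref)

# Hypothetical target: a variant that inactivates 5x more slowly.
fold_improvement = 5.0  # illustrative value, not taken from the source
k_target = k_ref / fold_improvement
t_half_target = half_life(k_target)
retained_target_24h = retained_after(k_target, 24.0)

print(f"reference: k = {k_ref:.5f} 1/h, half-life = {t_half_ref:.1f} h")
print(f"target:    half-life = {t_half_target:.1f} h, "
      f"24 h retention = {retained_target_24h:.3f}")
```

Under this assumption the benchmark corresponds to a half-life of roughly 158 hours, and a variant that inactivates 5 times more slowly would retain about 98% of its activity after 24 hrs. Real inactivation curves should be measured in the relevant solvent before any such extrapolation is trusted.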
The pipelines we propose and their in-silico metrics are computationally intensive. We already have some in-house computational capacity, but we would use this budget to expand our access to compute. We will also use part of the budget to keep the pipeline code up to date as new models are released.
We estimate that the project will take 5-6 months to complete. The month-by-month timeline is given below.
Dec 31, 2023
Phase 1: Data Collection - All CAs in the literature, evolutionary data, sequences, and structures (PDB or AlphaFold2). Training supervised models.
Jan 31, 2024
Phase 2: Completion of proposed Structure and Sequence Pipelines
Mar 31, 2024
Phase 3: Developing New Architectures for Joint Sequence and Structure Modeling
Apr 15, 2024
Phase 4: In-silico Metrics and Organization of Data for Web Interface
May 31, 2024
Phase 6: Publication of the code for the pipelines, models, and serverless web interface, and deployment of the web UI
Meet the Team
We are an academic team. The work will be done primarily by Manvitha Ponnapati and Allan Costa, PhD students at MIT advised by Dr. Joseph Jacobson. Our lab focuses on engineering computational tools for protein design. Members of our team have previously designed proteins for therapeutic and diagnostic applications. This project is one of our first explorations of protein engineering for climate problems.
Manvitha Ponnapati has experience in machine learning, drug discovery, and protein engineering. Her work primarily focuses on translating deep learning models for protein engineering into intelligent libraries for experimental screens.
Allan dos Santos Costa
Allan dos Santos Costa has experience in building foundational models that enable rational protein engineering pipelines. He is one of the contributors to ESMFold, which enabled structure prediction of 617 million metagenomic protein sequences. His work primarily focuses on physics-inspired architectures with applications to small-molecule, nucleic acid and protein data.
We are submitting a solution statement in response to a problem statement from the homeworld.bio repository, so the problem statement is copied here.