Supramolecular Cylinders Target Bulge Structures in the 5′ UTR of the RNA Genome of SARS‐CoV‐2 and Inhibit Viral Replication

Abstract The untranslated regions (UTRs) of viral genomes contain a variety of conserved yet dynamic structures crucial for viral replication, providing drug targets for the development of broad spectrum anti‐virals. We combine in vitro RNA analysis with molecular dynamics simulations to build the first 3D models of the structure and dynamics of key regions of the 5′ UTR of the SARS‐CoV‐2 genome. Furthermore, we determine the binding of metallo‐supramolecular helicates (cylinders) to this RNA structure. These nano‐size agents are uniquely able to thread through RNA junctions and we identify their binding to a 3‐base bulge and the central cross 4‐way junction located in stem loop 5. Finally, we show these RNA‐binding cylinders suppress SARS‐CoV‐2 replication, highlighting their potential as novel anti‐viral agents.

Introduction SARS-CoV-2 is a novel coronavirus that causes COVID-19, and as of 1st March 2021 there have been 113 267 303 recorded cases and 2 520 550 deaths worldwide. [1] Emerging so soon after other major coronavirus outbreaks (SARS, MERS), this global pandemic has highlighted the need for greater preparedness to tackle newly emergent viruses that may spread with lethal consequences. Fundamental understanding of viral processes needs to be coupled to the development of a variety of broad-acting antiviral strategies to interfere with these processes, in order to maximise the armory of drugs that we have available to treat novel pathogens. To date, antiviral drug designs have largely targeted viral proteins, [2,3] especially those with enzymic functions such as proteases and polymerases. [4,5] An alternative antiviral approach is to target viral nucleic acid structures that are essential for replication. With current advances in sequencing technology, the sequence of a new virus can be identified within the first weeks of an outbreak, identifying both the protein coding regions and the untranslated regions (UTRs). The role of the UTRs is not completely understood for many viral families, but their conserved structures underline their functional importance. Where UTRs have been studied to determine function (retrovirus HIV-1, [6,7] flavivirus, [8][9][10][11] to a lesser extent coronavirus [12][13][14]) they have been shown to have dynamic structures crucial for viral replication. [15,16] These non-coding RNA regions are highly structured with multiple stem loops, bulges, crosses, and pseudo-knots, with common structural elements seen in many viral UTRs. These structures play a role in RNA-RNA interactions (both within the viral genome and with host machinery) and in protein binding for the initiation of mRNA production, translation, and viral replication. Moreover, these RNA structures may act as trans-acting elements or mediate translational frameshifting, a common feature in viruses with plus-strand RNA genomes.
Nucleic acid sensors mediate the early detection of, and host response to, virus infections, and recognise either viral nucleic acids or "unusual" cellular nucleic acids present upon infection. [17] Sensors from the RIG-I-Like Receptor (RLR) family are key pattern recognition receptors for coronaviruses, [18,19] which detect RNAs with specific structures such as 5'-triphosphate or 5'-diphosphate ends. [20,21] Therefore UTR structures within double-stranded viral RNA provide attractive drug targets, both for direct inhibition of viral replication [13] and induction of host innate immune responses.
Compared to protein- and DNA-recognition, RNA-recognition by drugs has been much less explored. Nucleic acid recognition often focuses on sequence recognition, but for RNA, which folds into complex shapes, its structure provides an opportunity for specific targeting; indeed, it is the structure of the UTR that is conserved for function, rather than sequence. Small molecule libraries have been screened for RNA binding (analogous to protein drug screens), [22][23][24] and agents targeting RNA structures include small molecules that hydrogen bond within the heart of trinucleotide DNA/RNA repeats, [25] and planar RNA quadruplex binders. [26][27][28][29][30][31] We have explored nano-size metallo-supramolecular cylinders (Figure 1) as RNA-binding agents. [32] They are larger than traditional small molecules, with extensive aromatic surfaces to stack with the RNA bases (Figure 1b) and cationic charge (4+) that ensure strong binding and excellent shape-fit for RNA cavities. We have characterized the binding of cylinders in an RNA 3-way junction [32] by crystallography (Figure 1c) and showed analogous binding in an RNA bulge structure. [33,34] Furthermore, we demonstrated cylinder binding to an RNA 3-base bulge in the TAR region of the HIV-1 genome (located in its UTR), which prevented HIV-1 replication. [34] Given this anti-viral activity against HIV-1, we were interested to assess whether these cylinders would bind structures in the UTR of SARS-CoV-2. We now report combined modelling and biophysical approaches to define the 3D structures of the SARS-CoV-2 5' UTR, and demonstrate cylinder binding to specific bulge structures in the 5' UTR. Furthermore, we show that cylinders inhibit SARS-CoV-2 viral replication in cells.

Results and Discussion
To create a 3D dynamic model of the 5' UTR from the published genome sequence [38] (original Wuhan strain, NC_045512), our approach was to predict the secondary structures in silico, obtain experimental evidence to verify these structures, and then model the tertiary structure and its dynamic behavior, again with experimental validation. RNA secondary structure prediction has improved dramatically over the last decade, with free energy approximations and machine learning algorithms available (adding to the attraction of RNA as a rapid-response drug target). However, there are significant challenges with longer RNA sequences, which can yield multiple distinct structures that occupy a small space in the energy landscape. We compared approximately 10 folding prediction algorithms (see Supplementary Information), with many failing to cope well with the large size of the SARS-CoV-2 5' UTR. Three representative predictions are shown in Figure 2. The free energy RNAfold [39] and Mxfold2 [40] algorithms gave similar predictions, both akin to the known UTR structures of related coronaviruses, [16,41] while the machine learning based Vfold [42] gave a quite distinct structure.
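One simple way to put a number on how far two secondary-structure predictions disagree is the base-pair distance between their dot-bracket strings. The sketch below is illustrative only: the dot-bracket strings are toy examples, not the actual RNAfold, Mxfold2 or Vfold output for the 5' UTR, the function names are hypothetical, and pseudoknot brackets are not handled.

```python
def pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, found = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            found.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return found


def bp_distance(a, b):
    """Number of base pairs present in one structure but not in the other."""
    if len(a) != len(b):
        raise ValueError("structures must describe the same sequence length")
    return len(pairs(a) ^ pairs(b))


# Toy predictions for a hypothetical 12-nucleotide RNA (assumed data).
pred_a = "((((....))))"
pred_b = "(((......)))"
pred_c = "((..))((..))"

print("a vs b:", bp_distance(pred_a, pred_b))
print("a vs c:", bp_distance(pred_a, pred_c))
```

A small distance, as between the first two toy structures, corresponds to the situation described for RNAfold and Mxfold2; a large one corresponds to a quite distinct fold.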
To experimentally probe the UTR, we used SHAPE (Selective 2'-Hydroxyl Acylation Analyzed by Primer Extension Sequencing) analysis, where the 5' UTR RNA sequence was first folded in vitro and the open strand (non-duplex) RNA sites (e.g. single stranded, bulges, hairpins) acylated with 1-methyl-7-nitroisatoic anhydride (1M7). These sites were then identified through a reverse transcription reaction that generates DNA fragments which end at the 1M7 tagged sites and were readily analysed by gel electrophoresis (Figure 3A). Two primers (RT1 and RT2) conjugated with fluorescent IRDye700 were used to cover the whole 5' UTR sequence. RT1 mapped the UTR from position +1 to +140, and RT2 the distal region of the UTR (+141 to +300).
[Figure 1, caption fragment: … [35] L', [36] and L'' [37] form analogous cylinders that bear further aryl rings on their external surfaces. C) View of the crystal structure of a cylinder bound in an RNA 3-way junction cavity from PDB 4JIY [32] showing its unique binding.]
The results (summarized as a diagram in Figure 3B) demonstrate that the RNAfold/Mxfold predicted structures best represent that formed in vitro. In particular, the long run of acylation around position G confirms that the Vfold prediction does not adequately describe the experimental data. The additional stem-loop (SL4) predicted by RNAfold but not Mxfold is acylated (region K), which suggests that if such a stem loop forms it may be transient. Recent studies of the whole RNA viral genome in cellulo by Miska [43] (COMRADES assay) and Pyle [44,45] (long amplicons with SHAPE-MaP) show dynamic folding and interaction between the 5' UTR and the 3' UTR, but that these key stem-loop structures (SL1, 2, 3, 5 depicted in Figure 2) are retained, affording further support and confidence that our in vitro findings are physiologically relevant.
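The comparison between acylation sites and predicted structures can be sketched as a simple score: the fraction of SHAPE-reactive positions that a given prediction leaves unpaired. This is a minimal illustration under stated assumptions, not the analysis used here (the gel bands were read against the predicted structures directly); the structures, the reactive positions and the function names are all hypothetical.

```python
def unpaired_positions(dot_bracket):
    """1-based positions that a dot-bracket structure leaves unpaired."""
    return {i + 1 for i, ch in enumerate(dot_bracket) if ch == "."}


def shape_agreement(dot_bracket, acylated):
    """Fraction of acylated positions that are unpaired in the prediction."""
    acylated = set(acylated)
    if not acylated:
        raise ValueError("need at least one acylated position")
    return len(acylated & unpaired_positions(dot_bracket)) / len(acylated)


# Toy data (assumed): two candidate folds of a 12-nt RNA and the
# positions where 1M7 acylation was detected.
fold_1 = "((((....))))"
fold_2 = "((..))((..))"
acylated_sites = [5, 6, 7, 8]

for name, fold in (("fold_1", fold_1), ("fold_2", fold_2)):
    print(name, shape_agreement(fold, acylated_sites))
```

A prediction that places a long run of acylated nucleotides inside a duplex scores poorly, which is the argument made above against the Vfold structure.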
The extensive whole-genome sequencing of SARS-CoV-2 affords the opportunity to monitor single nucleotide polymorphisms (SNPs) in the 5' UTR. We examined the sequences available in GISAID [46] that were deposited before 7 January 2021 and that contained complete 5' UTRs. Interestingly, the positions of SNPs within the UTR (Figure 3C) often occur near the acylated positions in our SHAPE experiment (Figures 3B, S6), suggesting that positions where the nucleotide has greater flexibility, and hence less structural importance for the UTR, are more likely to be substituted. Although not corrected for frequency, it is interesting to note that around 60% (19/31) of the SNP sites identified to date involve replacement with a U residue, with the largest subset (11/31) being a C-to-U mutation (Figure S6). These mutations do not affect the key structures of the 5' UTR.
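The proportions quoted above follow directly from the site counts. The short check below uses only the counts given in the text (31 SNP sites, 19 replaced by U, 11 of them C-to-U); the variable and function names are illustrative.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


total_snp_sites = 31   # SNP sites identified in the 5' UTR
replaced_by_u = 19     # sites where the substituted base is U
c_to_u = 11            # subset of those that are C-to-U

print(f"replaced by U: {percent(replaced_by_u, total_snp_sites):.1f}% of sites")
print(f"C-to-U:        {percent(c_to_u, total_snp_sites):.1f}% of sites")
print(f"C-to-U share of U replacements: {percent(c_to_u, replaced_by_u):.1f}%")
```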
After identifying the distinct stem loops (SLn) that were conserved throughout the results from the secondary structure prediction, we attempted the more challenging step of creating a 3-dimensional representation of the structure. We focused on SL3 and SL5 as they have a variety of different structural features including bulges and loops. Although the exact structure/function of SL5 is not yet determined (to our knowledge), it contains the initiation codon and it is similar to the SL5 of SARS-CoV-1, [12,13] suggesting a functional role. Understanding the tertiary structure and behaviour from the sequence is more complicated than predicting the secondary structure, since RNA is an inherently flexible molecule and a single static conformation will not be sufficient to understand the binding properties. Recent advances in molecular dynamics parameterization of RNA and wider availability of high-performance computer facilities can provide new insights into the dynamic structure of the RNA and show the key regions of flexibility, usually bulges and junctions, where both the secondary and tertiary structure is highly dynamic. After creating initial models using the short list of open-source software available, the ROSETTA platform (FARFAR2) [41,47] gave a starting structure most consistent with the SHAPE analysis (notably the SL5 junction point having nucleotide interactions rather than being very open). We explored the dynamics around this central structure.
We employed the recent RNA force field developed by Mathews, [48,49] which retains NMR characteristics of RNA structures even in non-minimum starting conformations, and coupled it with Markov state modeling [50] to analyse the conformational space accessed across different simulations. We started with 3 independent 1 microsecond molecular dynamics simulations of the SL5 alone, and then performed additional 1 microsecond simulations with both enantiomers of the cylinder (three runs of at least 1 microsecond each; with parent cylinder and both enantiomers) to identify RNA regions that can be recognised by the cylinder. The simulations total 9 μs. Additionally, Markov state modelling revealed microstates where the cylinder can be positioned within the RNA helix in the bulge regions. We also performed simulations on the SL3, comprising overall 4 μs. Just as for the secondary structure predictions, the observations in the molecular dynamics of SL5 were verified experimentally by the SHAPE results, and by using these two techniques in concert we gain a molecular level understanding of the three dimensional structure and dynamic behaviour of the RNA (Figure 3C,E), and of how the cylinder binds.
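The core step of Markov state modelling can be sketched in a few lines: frames from several trajectories are assigned to discrete states, transitions are counted at a chosen lag time, and the counts are row-normalised into a transition matrix. The example below is a minimal, generic illustration and not the workflow used for these simulations; the state labels, the toy trajectories and the function names are assumptions, and a production analysis would also involve clustering, lag-time selection and a reversible estimator.

```python
def count_transitions(trajectories, n_states, lag=1):
    """Count i -> j transitions at the given lag across all trajectories."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1
    return counts


def transition_matrix(counts):
    """Row-normalise a count matrix; rows with no counts stay all zero."""
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical discretised trajectories: state 0 = cylinder away from
# the bulge, state 1 = cylinder positioned in the bulge.
toy_trajectories = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1],
]

counts = count_transitions(toy_trajectories, n_states=2, lag=1)
for row in transition_matrix(counts):
    print([round(p, 3) for p in row])
```

Pooling counts over independent runs, as done here for the two toy trajectories, is what allows states visited in different simulations to be compared within one model.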
Considering the SL5 RNA in the absence of cylinder, molecular dynamics reveal the following features of the stem:
a) There is a bulge at G138-U140 which is highly flexible, with a lot of transient stacking between its bases (region W in Figure 3). G138 base pairing with C10 elongates the bulge, forcing U139-U141 to point outwards from the helical axis. This is seen experimentally in SHAPE. This sharp twist of the backbone often creates a bend in the stem.
b) There is a mismatch at C15 (halfway between regions L and W); however, there are many transient non-Watson-Crick base pairings between A14-A16 and C133, and those nucleotides did not produce a SHAPE signal; that is, there is no significant bulge or base flipping outwards and the helix is contiguous.
c) The next bulge (U21-U25; region L) is different. Relative stability is provided by three G:C base pairs (G20:C128, C24:G126, C26:G124), causing flagging out of A23 as seen on SHAPE (region M).
d) At the 4-way junction (regions N, R) the base pairings ("CUG"36-37 and "CAG"78-80) hold throughout the simulation (3 μs), creating an additional 7 nucleotide bulge on SL5a (G72-A79) where on the opposite strand there are only C38 and A39. Although C38 remains stacked on G37 and transiently binds nucleotides of the opposite strand, A39 lacks both strong stacking and base pairing, and therefore it can be seen on SHAPE. The junction is less open (i.e. contains more pairing) than the secondary structure prediction, and this is reflected in the SHAPE experiment where there is only limited acylation.
e) Higher up on SL5a, CG Watson-Crick (WC) pairs create rigidity which stops at U47, which stacks strongly on C46, allowing stable non-WC base pairing with U67 but leaving U48 randomly pairing U66 and G66 (regions O, Q). U48 and G66 are both identified by SHAPE. The stem closes with strong CG pairings and a short loop (region P), whose bending exposes U91 and U96, and they are identified by SHAPE.
f) On SL5b five CG pairs add rigidity, allowing/stabilising non-WC pairings. However, between C86:G100 and G89:C98 (region S) there is an additional base, and as U87 and G99 strongly stack on C86:G100, A88 is exposed and tagged by SHAPE. On the loop (region T) stacking continues strongly up to U92 and G95, creating a tight bend exposing U93.
g) The short SL5c is also stabilised by 2 CG pairs, and all three A residues are stacked together but point outwards from the stem (region U).
These combined simulation/experimental pictures of the RNA dynamics were then complemented by analogous SHAPE experiments and MD simulations of the SL5 RNA in the presence of the [Fe2L3]4+ cylinder (Figure 4). Four batches of simulations were carried out in the presence of cylinder: for each enantiomer of the cylinder, and with the cylinders positioned either away from the RNA or inside the bulges. Importantly, the MD simulations locate the cylinder binding sites on SL5 at the same positions that are affected experimentally in the SHAPE analysis, and not at the other areas of SL5 that are unaffected in SHAPE. As seen in free SL5, the bulges serve as dynamic hinges giving flexibility to the surrounding stems. In the simulations where the cylinders started away from the RNA, they quickly localized on those hinges, drastically reducing the flexibility of the hinge (in regions W, L, N, R). From studies with three-base bulges (on HIV TAR) we know that such hinges can open and that from such a binding position the cylinder can reorient and insert, though this can take very long on the time scales of simulations; [51] we can model this by pre-positioning the cylinder at or close to this position. The cylinders bind strongly to these structures. [32][33][34]51] Once the cylinder is in the SL5 bulge (Figure 4A, cylinder D), the simulations show that the helical structure of the surrounding stems is disturbed, opening up the stem nucleotides to attack from 1M7, and this is confirmed experimentally in SHAPE, leading to an increase in the signal in these regions (around L and M and towards W, close to the RT primer).
In addition to the bulge as a site of binding, in the simulations the cylinder can also insert into the cavity at the central cross (4-way junction) (Figure 4, cylinder A), protecting A193. This cavity is larger than the 3-base bulge and thus, although the binding site may not offer as good a structural fit, it will be kinetically quite accessible. Binding to this site was also confirmed experimentally by the disappearance of this SHAPE signal (A193, RNA position N) at increased concentration of cylinder. At the loading of cylinder used in the simulation, interaction with the stems containing regions U and T was not observed. The SHAPE results suggest that these regions are also affected as the loading increases.
In SL3 there are no large bulges similar to that found in SL5; however, mismatched pairs create a distortion in the helical structure that can lead to exposure of nucleotides to 1M7. Specifically, molecular dynamics simulations (Figure 5) on the free RNA (no cylinder) revealed short-lived pairings of different types from G96:C126 to A102:U120. Furthermore, higher up the stem, U104:A118 to G106:G115 is also a region of multiple cross-strand pairings. Equally important for understanding the SHAPE results is the transient stacking between this stem's nucleotides revealed in the 3D model.
In the presence of cylinders, we observed that the cylinder is attached to the stem loop (Figure 5C) in a stable manner, decreasing the flexibility of those residues and thus protecting the loop nucleotides from acylation, where we saw a reduced signal in SHAPE (Figure 4, region I). Cylinders can also bind lower on the stem (region H/J), and this leads to an enhancement of acylation, as seen on the stem of SL5.
Alongside the SHAPE experiments with the [M2L3]4+ iron(II) cylinder (M = Fe), we also compared the analogous nickel(II) and ruthenium(II) cylinders (M = Ni, Ru; Figure 4). Changing the metal does not affect the overall cylinder structure or charge, and analogous patterns/effects are seen in the SHAPE mapping, confirming that they bind the RNA at the same locations and that it is the cylinder shape/charge that is responsible for the binding, not the choice of metal. High cylinder excess (two last conditions, 1.25 and 2.5 nmoles, corresponding to 25 and 50 cylinders per UTR) in most cases severely affected RNA structures, and so SHAPE bands become less well defined, indicating more random RT stops. In PCR experiments the [Ru2L3]4+ cylinder is stable to the heat cycles and can inhibit polymerase amplification; [52] the reverse transcription efficiency seems similarly affected at the highest concentrations of this cylinder. Some small gel shifts are also observed at high cylinder loading, possibly suggesting some cylinder binding to the DNA transcript.
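The two loadings quoted above can be cross-checked against each other: dividing the cylinder amount by the stated number of cylinders per UTR gives the amount of UTR RNA each condition implies. The value obtained is an inference from those two ratios, not a figure stated in the text, and the function name is illustrative.

```python
def implied_rna_nmol(cylinder_nmol, cylinders_per_utr):
    """Amount of UTR RNA (nmol) implied by a cylinder amount and a ratio."""
    return cylinder_nmol / cylinders_per_utr


# (cylinder amount in nmol, cylinders per UTR) for the two highest loadings.
conditions = [(1.25, 25), (2.5, 50)]

for cylinder_nmol, ratio in conditions:
    rna_nmol = implied_rna_nmol(cylinder_nmol, ratio)
    print(f"{cylinder_nmol} nmol cylinder at {ratio}:1 "
          f"implies {rna_nmol * 1000:.0f} pmol UTR RNA")
```

Both conditions imply the same RNA amount, so the two quoted ratios are mutually consistent.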
We also tested the effect of two substituted cylinders based on ligands L' and L'', to confirm the key binding area of the cylinder design (Figure 4B). These cylinders bear additional aryl rings at their ends while the central regions of the cylinder (which insert into the junctions/bulges) are unchanged. Both show similar patterns in the SHAPE analysis to the cylinders of ligand L, but while [Ni2L''3]4+ had a very similar impact on folding, the isoquinoline cylinder [Ni2L'3]4+ caused some changes in the SHAPE pattern even at the lowest cylinder concentrations. The results suggest that it may be possible to modify the cylinder structure to modulate the affinity for the binding sites.
Having established that the cylinder can bind to and modify the structure and reactivity of the SARS-CoV-2 5' UTR in vitro, we explored the potential of the cylinders to inhibit viral replication in cellulo. Simian Vero cells were infected with SARS-CoV-2 virus England 2 (Wuhan strain; identical 5' UTR to reference sequence) in the presence and absence of the Ru and Ni cylinders, [M2L3]4+ (M = Ru, Ni), and the frequency of cells expressing the viral encoded spike glycoprotein was quantified (Figure 6). Both cylinders reduced spike-protein-expressing cells in a dose-responsive manner, with the ruthenium cylinder being more effective and reducing the frequency of infected cells to < 5% at the highest doses tested (75 μM). MTT cell metabolic activity/viability assays confirmed that the cylinder is not cytotoxic to Vero cells in the timeframe of these experiments (see Supplementary Information).

Conclusion
We have shown that by combining experimental SHAPE results with molecular dynamics simulations we can create 3D models of the structure and dynamics of key individual stems that make up the 5' UTR of SARS-CoV-2. These stems contain a number of intriguing structural motifs also found in the UTRs of other viruses, and which offer the possibility of developing new anti-viral agents that act against a broad spectrum of diseases. The unique nucleic acid binding activity of the supramolecular cylinders is ideally suited to target these types of structures, and we show that the cylinders can bind non-covalently to an RNA bulge in stem loop 5, as well as the central cross (4-way junction) of that loop. The ability to bind at different crucial RNA structural sites that are essential for virus replication limits the opportunity for the virus to mutate and to evade drug action. In line with their RNA binding, these nanosized supramolecular helicates inhibit infection at concentrations where they have negligible cellular toxicity.
These helicate cylinders are currently the only metallo-supramolecular architectures that have been demonstrated to thread through RNA bulge and junction structures, but there is a growing interest in metallo-supramolecular designs to bind nucleic acid structures. [53,54] While the SHAPE experiments provide further demonstrations of cylinder selectivity for junctions and bulges over other nucleic acid structures, an exciting possibility is that cylinders might also be able to bind host-cell RNA structures, machinery on which the virus depends for replication, causing a dual anti-proliferation effect. The results herein suggest that nucleic acid binding metallo-supramolecular architectures, and the cylinder designs in particular, merit further exploration as anti-viral agents.

We thank Henri Huppert for expert assistance with developing CellInsight quantification algorithms to measure infection in the microneutralisation assay and Jack Dismorr for synthetic assistance. Simulations used the Bluebear and Castles HPC facility (U. Birmingham). [55]