Scientific Manuscripts
Overview
Scientific manuscripts are often the most tangible and externally visible manifestation of your research. Writing a good manuscript is hard work, but it is a very useful skill to have. Believe me: delivering clear and concise statements that accurately convey information is broadly useful and widely appreciated in both academic and industrial settings. In addition, Prof. Webb (and by extension, the group) takes pride in producing high-quality, well-written, and attractive manuscripts. At times, it may seem the extra effort is not worth it. Do people notice? If something is great, maybe. If something is bad, definitely. The goal is to effectively communicate the science. This necessitates clear writing and sensible figure composition. Anything less is a distraction. This is important.
The following sections aim to provide guidance and promote consistency for manuscript preparation and publication. Following the recommendations set forth should not only reduce the editorial burden on Prof. Webb but also simplify the writing process and result in an overall better product in less time. I recommend all students read this multiple times and revisit as necessary. Over time, you will and should develop your own sense of style and approach to writing, but you should also appreciate that what follows is a distillation of many years of experience of writing and observation. Therefore, this should establish a common baseline for navigating “the process” with useful and actionable approaches to requisite steps therein.
The Process: a 1000 ft view
Overall, the process of developing, submitting, and publishing a manuscript in our group involves roughly 10 steps. There are some deviations from this (e.g., to meet a pressing deadline), but I would consider this the ideal path for development.
Conceptualization - Folks often wonder when they should begin writing a paper. We can debate when the actual writing should begin, but the process begins at the outset. Research should generally be undertaken with a given hypothesis, question, demonstration, or application at the forefront of our minds. For me, it is helpful to consider “what would be the title for a paper featuring this research?” Additionally, I consider possible analyses and calculations based on how they might manifest as figures. What would the order of the figures be? This conceptual sketch of the paper gets us off the ground and into something tangible.
Execution - During this stage, the focus is on progressing research based on the concept and not on the actual writing or figure composition. As research progresses, we should continually revisit and evolve our conceptual narrative based on the data and what we learn. Therefore, we should always reflect on the research goals and questions to guide the next steps in the research. Often group meetings and small presentations are opportunities to evaluate a working narrative and get feedback. Of course, the final product may substantially differ from that initially envisioned in Step 1. Note that there is a difference between “losing sight” of the narrative and “evolving” the narrative. The former means we have been perhaps too careless and taken by whimsy; the latter typically means we have encountered something unexpected that necessarily requires us to revisit our initial questions.
Storyboarding - During this phase, you will distill your results into a set of digestible figures coupled with text that delivers clear points. Commonly, we return to Step 2 for additional work that seems necessary to strengthen the narrative and future manuscript. By the end of this part, this altogether yields a cohesive scientific narrative with which all coauthors agree.
Drafting - At this stage, the storyboard is converted to actual manuscript form. This is a major undertaking, but with the hard work that comes prior, it should not be as laborious as expected. The result should resemble a paper in form and function, with no major elements missing, although significant revision will follow.
Internal revision (self \(\times M\)) - After producing a complete document, it is probably tempting to inform me that “the manuscript is ready for review!” You are right, but the first reviewer should be you and not me. The Drafting phase already substantially depends on self-revision but usually in a micro sense. Here, you should take time to assess the manuscript both holistically in its presentation and in its details. Soliciting the opinion of a labmate would also be a good idea. If the manuscript is too rough to show them, then it needs more work from your side first. Ideally, with the recommendations from this guide, subsequent steps should go more smoothly.
Internal revision (me \(\times N\)) - Once you have a completed and (what you believe to be) cohesive draft, ping me for comments. At first, I will look for major structural/logical/organizational elements that require attention. I will pay more attention to the progression of ideas and presentation of the figures. I will read the introductory paragraphs to ensure that we are providing necessary information in proper context of the literature; I will look at the reference list. The text associated with methods/results will be visited briefly, usually at a subsection/topic sentence/conclusion level. The rationale is that structural changes will impact the writing, such that it is premature to focus too heavily on it. I expect that comments will be specifically and transferably addressed. By transferably, I mean that the essence of the comment should be taken to heart and applied throughout the manuscript, even if not specifically noted elsewhere; this especially applies to figures. At early stages and iterations, I will likely make it partway through the manuscript, while you improve the latter parts based on earlier suggestions. We will do this as many times as necessary to obtain a high-quality draft. Ideally, \(M \gg N\).
Fine-tuning and proofing - With a high-quality draft, we will turn to fine-tuning and word-smithing. Edits primarily emphasize clarity, concision, and technical precision. Recently, we have found it useful to also solicit the group to catch obvious typos and grammatical errors prior to a submission. This exercise is also beneficial for the group, as they can also experience writing in the group before they might have done it themselves.
Submission and peer-review - Before submitting, we need to put something together, which I call a submission package. The submission package consists of the elements below.
Main text manuscript files
Cover letter (I typically write but sometimes have you draft initially)
Supporting information text
Supporting information files/data (sometimes submitted to data repository if warranted)
List of ~5 possible referees
TOC figure (if required–be sure to make!)
While it is not so many things, putting all this together generally takes more time than it seems like it should. After submitting to the journal (and a preprint server), we then wait to see if the paper is sent out for review and wait to receive comments. This is usually a 1-2 month process for most journals. We then receive around 2-4 referee reports that determine the next steps.
Revision and Response to Reviewers - After receipt of the referee reports, I will forward them to all coauthors and set up a time to discuss. Because we have put a lot of effort into preparing the manuscript, the most common outcome is that referees will indicate either “minor revisions” or “major revisions,” and the moniker of “major revisions” may practically reduce to fairly minor changes but those that the referee feels more strongly about. It is rare that addressing referee critiques requires additional calculations or new simulations; again, we will have thought very hard about the scientific narrative and the analyses to make its case. There are also often some small requests from the editorial office that need to be addressed. It is my preference to keep the revision period as short as is reasonable while being wholly responsive. Unless there are other urgent things going on, addressing the critiques is a top priority. Ideally, we need only do one round of revisions, but sometimes two are needed. There will be an additional section on crafting compelling response documents.
Acceptance and proofing - After receiving our response documents and reviewing changes, the editorial staff will make a decision on the manuscript, which we aim to be “accept” right out of the gate. The process from here is easy, and you can rejoice and breathe a sigh of relief. There are usually some small administrative things that need to happen (signing author agreements, etc.), but closely examining the proofs is by far the most important activity that requires your participation and attention. Here, the journal staff will typeset a version of the manuscript in their template and distribute this along with some questions. We want to principally identify typographical, grammatical, or equation-based errors. Very small tweaks to language can be recommended if our prior text was ambiguous or inaccurate in some way, but generally, this is not the time for any substantive changes. This is our last chance to make edits, so we need to catch everything!
Software and Workflows
Manuscript preparation
Most of the manuscripts in the group are prepared using LaTeX and collaboratively shared via Overleaf. When we make a decision to start drafting, one of us will create and share an Overleaf document using an appropriate journal template. We will choose the appropriate template based on our intended journal target, but it is relatively easy to swap templates in LaTeX. On rarer occasions, Word may be used, but this is predominantly if we are participating in a collaborative work with experimental groups that insist on its use.
DropBox Folder
When we start to write a manuscript, a folder will be created to manage all final files. This will be in Webb_group-->Manuscripts--><folder>. Predominantly, this folder is used to organize everything for submission and coordinate preparation of figures.
Reference manager
In an effort to make managing references easier, we have set up a workflow through GitHub to update and share a common, growing bibliography file. I highly suggest using a program like JabRef (tutorial by Carlos on the DropBox) to keep track of papers and references. I have managed my references manually and also previously used BibDesk, Zotero, and Mendeley. Don’t make my mistakes; start using a reference manager now. For me, JabRef takes it. One nice thing about JabRef, other than the organization, is that it can be configured to automatically provide citation keys in a desired format; my preference is that we use citation keys of the form
R:YearPublished_LastNameOfFirstAuthor_ShortTitle
I suggest starting your reference set from my working version, which is located in the group DropBox folder:
Webb_group --> References --> group_references.bib.
The GitHub is at https://github.com/webbtheosim/group_references
Honestly, we don’t push/pull from this as often as we should; it tends to happen just around manuscript-writing time.
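If you want to check keys outside of JabRef, the citation-key format above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name citation_key is hypothetical, the "R:" prefix is read literally from the pattern above, and collapsing the short title into capitalized words without spaces is my guess at a reasonable convention, not a group rule. The author and title in the usage line are made-up placeholders, not a real reference.

```python
import re
import unicodedata


def citation_key(year: int, first_author_last_name: str, short_title: str) -> str:
    """Build a key shaped like R:YearPublished_LastNameOfFirstAuthor_ShortTitle.

    Hypothetical helper: the capitalization and accent handling are assumptions.
    """

    def clean(text: str) -> str:
        # Decompose accented characters and drop anything that is not ASCII.
        ascii_text = (
            unicodedata.normalize("NFKD", text)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        # Keep alphanumeric runs only and capitalize the first letter of each.
        words = re.findall(r"[A-Za-z0-9]+", ascii_text)
        return "".join(word[0].upper() + word[1:] for word in words)

    return f"R:{year}_{clean(first_author_last_name)}_{clean(short_title)}"


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(citation_key(2023, "Müller", "polymer design"))
```

Whatever convention is used, the point is that keys stay predictable, so that anyone in the group can guess the key for a paper without opening the bibliography file.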
Journal Templates and Formats
As mentioned before, it is desirable to use a journal template where possible. Also, when we have selected a target journal, you should review their style guide and examine recent articles from that journal to get a sense of format/presentation. What sections/subsections are present? Where are the methods located? Are there restrictions on content length or number of figures? Are captions formatted in any particular way? It’s better to take stock of these things sooner rather than later.
Figure Software
Most of the plots that we create are generated using Python. In terms of compositing plots into figures with multiple panels, I recommend using PowerPoint, Inkscape, or Illustrator. I actually prefer PowerPoint because it is simple, although there is a trick that needs to be employed to ensure that the figures are generated at high resolution. It is useful for me to have access to the source figures so I can make edits as needed during crunch time. Also, it is useful to have these things for presentations and the like. Your source files should be deposited in the appropriate DropBox folder. There will be more about figures later.
Storyboarding
So, it’s time to write a paper, huh? I have experienced doing this several different ways. In my opinion, storyboarding is the most efficient approach and makes the actual writing easiest. Here, the objective is to solidify in your mind and mine what it is that we are actually trying to convey. To do so, you are encouraged to create a PowerPoint presentation with the following elements:
1. Draft of all figures and corresponding *full* figure captions
2. For each figure, a couple of *complete* sentences that summarize the key takeaways and major findings that derive from the figure.
3. A draft of the abstract.
The discussion in a manuscript is centered around the figures, so it makes sense to start by nailing these down. The figures themselves we can consider as our evidence for specific conclusions. We will quickly ascertain whether the evidence, as presented, is sufficient to justify the conclusions, which you will also be summarizing. Notice that I request a reporting of key takeaways and major findings as complete sentences. The rationale for this is two-fold. First, I find that presentation of ideas in the forms of bullets and sentence fragments allows one to get away with corresponding fragmented thoughts. By writing a complete sentence, one has to mentally work through the idea and ensure that it is coherent enough to warrant punctuation. Second, these sentences may comprise the first versions of topic or concluding sentences in manuscript drafts.
In terms of writing an abstract (without yet any manuscript text!), the logic is similar to forcing one to write a complete sentence. To write the abstract, the scientific narrative must be clear enough to you that it can be expressed in a succinct and well-organized paragraph. Often people leave the abstract till the end, but I view this as a mistake/missed opportunity. Writing the abstract early gives the manuscript direction from the get-go, and this should resonate with the present set of figures. If the content and meaning of each figure is clear and if the progression of figures is cohesive, then this should not be a tall task. If it is unclear, then it means we probably still have work to do. This abstract is, of course, just a first pass and will be revisited later during the manuscript preparation stage.
Guidelines for Figure Preparation
The objective…
Figures are the foci of most manuscripts. It is important that these are clean, cohesive, illustrative, and polished. You might have been asked or asked yourself “Wait, what are you showing here? What does this mean?” Those are not questions we want asked. It should be evident/discernible from the plots themselves. Furthermore, note that some kinds of plots are more effective than others at making specific points. I have heard comments in the past about how certain styles of plots are “bad.” That wholly depends on what the purpose of the plot is. Are you trying to compare how something changes as a function of an independent variable, or are you aiming to make comparisons between functional behaviors? The same style plot may be bad or good but not in isolation; it depends on the purpose/interpretation.
There is also often a tendency to describe the components of the figure (e.g., this line goes up and then goes down, this line is above that line, etc.). This also relates to recommendations on writing, but in my view, those kinds of statements are pedantic and unnecessary. There can be exceptions for some subtle features here and there, but major points ought to be predicated on features that should be obvious to a discerning person based on the figure and choices made during its composition.
I consider scientific writing to be similar to (but less boring than) prosecution in a court of law. The concept is that we are building a case for an idea to be accepted into the corpus of scientific understanding. The figures (and their composite panels) are our evidence that the idea should be accepted, that it is true beyond a reasonable doubt. In fact, the analyses and their portrayal should be convincing enough that a reasonable jury of our peers (scientific colleagues), upon examination of the figures presented, would be inclined to agree that it is correct and reasonable.
Not only should the figures be clear enough that readers have such an inclination, but actually, I would prefer that readers formulate the same conclusions from the figures, even in the absence of our exposition. Note that this preference is also a practical matter, as many readers process papers by just flipping through the figures. If the figures are engaging and compelling, then perhaps the paper will get a closer read.
General requests/advice
Prepare figures at the size intended for printed display. This presents the highest chance that the appearance of the figure will be as intended with features and text both predictable and evident at a high resolution. Single-column figures are usually allotted a width of 3.3in; double-column, spanning figures have a width of 7.0in. It is good to check journal guidelines.
Utilize consistent, human-readable text size. From an accessibility and aesthetics perspective, all text should essentially be in the range of font size 8-12. I don’t like having anything smaller than 8. Text size typically diminishes as we move inwards towards the data, such that panel labels are at 12 \(\geq\) axes labels \(\geq\) tick labels \(\sim\) legend text \(\geq\) annotation text. Panel labels should be a consistent size across all figures.
Utilize consistent font schemes throughout. Employ the same kind of font style across your figures. I have some preference for the simple sans serif fonts (e.g., Arial), although I admit that symbols in equations sometimes become hard to discern.
Pay attention to the style of good figures. Closely examine figures created in papers previously published by Webb and the group. It is also good to note nice figures in articles that you have read, particularly from journals similar to that targeted for submission.
Use stylistic “choices” to your advantage. More broadly, the statement might be “have a reason for things.” We should use things like color schemes and other visual elements with intent. These choices should help the reader understand and appreciate salient aspects of the data. We can also connect with and exploit intuition. Are we showing data across a range of temperatures? Maybe data at high temperatures should be associated with warm colors and data at low temperatures should be associated with cool ones. You should also “put yourself in the reader’s shoes.” Is it possible that a reader may form unintended mental associations based on your choices? If so, we may reconsider how the data is presented. I will help you through this. Be thoughtful, and we will go from there.
Be considerate to those with color blindness. We will talk more about colors elsewhere. To be brief here, if possible, we should try to use aesthetic color palettes that are also broadly accessible. In general, we will avoid pure reds and greens. In addition to color, note that things like markers can be different shapes to distinguish data types.
Pay close attention to alignment/placement/arrangement of panels, objects, and text. Misaligned or unevenly spaced elements give an impression of sloppiness to those who notice. I am hyper attentive to these details (seriously), and the issues are easily rectified.
Avoid defying convention or inventing new conventions. This statement applies mostly to trivial things like panel labeling schemes, axes labels, etc. Unconventional but coherent representations of data are certainly fair to consider.
Deploy useful, existing data visualization/plotting strategies. Other researchers elsewhere may have come up with really cool ways to show their data. Make use of such good ideas! We shouldn’t use objectively worse visualization approaches just because we did not come up with the better ones. Sometimes there is also room to innovate. Attribution will be given where appropriate.
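Several of the requests above (print-size figures, text between 8 and 12 pt, a sans serif font, colors chosen with intent) can be set once in a plotting script rather than fixed by hand later. The following is a minimal sketch, assuming matplotlib and NumPy are the plotting tools; the guide only says plots are made in Python. The function name make_figure, the specific label sizes, the "coolwarm" colormap, and the plotted curves are all illustrative placeholders, not group standards or real data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Widths quoted in the guide; always check the target journal's guidelines.
SINGLE_COLUMN_IN = 3.3
DOUBLE_COLUMN_IN = 7.0

# Assumed sizes that respect the 8-12 pt range, shrinking towards the data.
plt.rcParams.update({
    "font.family": "sans-serif",
    "font.sans-serif": ["Arial", "DejaVu Sans"],  # falls back if Arial is absent
    "axes.labelsize": 10,
    "xtick.labelsize": 8,
    "ytick.labelsize": 8,
    "legend.fontsize": 8,
})


def make_figure(temperatures, width_in=SINGLE_COLUMN_IN, aspect=0.75):
    """Plot one placeholder curve per temperature, cool-to-warm by temperature."""
    fig, ax = plt.subplots(figsize=(width_in, width_in * aspect))
    t_min, t_max = min(temperatures), max(temperatures)
    x = np.linspace(0.0, 1.0, 50)
    for t in temperatures:
        # Map low temperatures to cool colors and high temperatures to warm ones.
        frac = (t - t_min) / (t_max - t_min) if t_max > t_min else 0.5
        y = np.exp(-x * 300.0 / t)  # synthetic curve, not real data
        ax.plot(x, y, color=plt.cm.coolwarm(frac), label=f"{t:g} K")
    ax.set_xlabel("x (arb. units)")
    ax.set_ylabel("y (arb. units)")
    # Panel label at the largest text size, placed in axes coordinates.
    ax.text(-0.2, 1.05, "(a)", transform=ax.transAxes, fontsize=12)
    ax.legend(frameon=False)
    return fig, ax


if __name__ == "__main__":
    fig, ax = make_figure([250, 300, 350])
    print(fig.get_size_inches())
```

Saving with fig.savefig to a vector format such as PDF keeps the figure at the intended physical size, so the text sizes set here are the sizes the reader actually sees.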
Figures vs. Panels
In most cases, figures contain multiple panels (a,b,c,d, etc.) that each feature different plots of data. As a guiding principle, the intent of a figure is to present thematically related data that convey an overall message/purpose. At early stages of the research process, we may plot things to gain traction, but at the stage of manuscript preparation, you should refine these ideas and make the figures with purpose. It is not the other way around. It is unlikely that the same ole way that you have been conveying the data in weekly meetings is the optimal version for broad dissemination.
The panels within the figure should be constructed to make one (or perhaps a few, although this is less desirable) discrete, tangible points. The goal is to have presentations of the data that are meaningful, illustrative, and accurate. Therefore, having a single panel do a lot of heavy lifting is a recipe for reader (and PI) confusion or skepticism. To draw out the courtroom analogy in a macabre way, imagine we are prosecuting someone on a murder charge, and we have the murder weapon (a gun). It would be unconvincing to simply present the murder weapon in its entirety as evidence. No, we separately present a comparison of fingerprints of the defendant compared to those extracted from the weapon, ballistics forensics that show that the bullet recovered from the crime scene is consistent with the ammunition type of the firearm in question, a serial number trace that links the purchase of the firearm to the defendant, etc.
With the framework above, subsections of written discussion are often centered about figures, while paragraphs are centered about related panels. It is OK to have a figure that contains only a single (unlabeled) panel.
Captions
Captions are important. There are basically two camps. In the first, the captions feature some limited commentary and interpretation of the data. In the second, the caption text is restricted to explaining elements that comprise the figure (i.e., there is no exposition on the meaning of the results). I lean strongly towards the second camp and prefer captions that are largely neutral with respect to any conclusions. In theory, our figures are so well-constructed and explained that the reader has almost no choice but to form the same conclusions as we have. In addition, these points will be delivered in the main text, so it seems redundant to also produce them here. In the same vein, I do not favor explaining stylistic elements of figures in the text; that information should be in the caption. That being said, making a neutral observation or calling attention to something atypical or necessary to be noted can be OK.
Captions must provide a full description of everything present in the figure/panels to properly understand and interpret the data and sufficiently explain all components such that the reader can understand and appreciate the figure/panels. Sometimes journals place restrictions on length, which is annoying.
A common initial error is to under-describe or presume pre-existing knowledge. It is easy to skim or bypass things you know. It is much harder to try to figure out (no pun intended) something that has been ambiguously presented.
At the stage of drafting, I prefer to have more information here than less.
In composing the caption, the following are guidelines/recommendations:
Begin the caption with a summative fragment or statement. Our panels should be thematically related. This summative statement usually indicates what that theme is. Some examples from recent papers include:
“Performance of ML model for prediction of mean square radius of gyration, \(\langle R_\text{g}^2 \rangle\), for class I and class II polymers.”
“Overview of systems studied.”
“Relationship between sequence characteristics of precursors and structural features of resulting single-chain nanoparticles.”
“Active-learning approach to heteromeric sequence design.”
“Physicochemical features of sequences at the thermodynamics–dynamics Pareto front.”
“ML guides design of highly stable polymer–protein hybrids.” – Notice this last one is a departure from the others in style. It is a collaborative paper and not out-of-place for the venue (Advanced Materials). The caption text itself is not very elaborative of the features present.
Avoid combining multiple plots into single panels. It’s generally preferable to give plots their own label, rather than group them and additionally use “left, right, center, top, bottom” etc. There are some exceptions to this. It’s definitely a no-go if the plot types/quantities are not the same.
Note all “non-standard” or added elements. This would include things like insets, guidelines, or marginal axes.
Explain less common data representations. Sometimes this changes over time. For example, violin plots were once rare, but now they are all the rage. Nevertheless, it is better to be thorough than to risk misinterpretation.
Explain error bars/regions, ticks, notches, etc. Are error bars standard errors, standard deviations, some % confidence intervals? How were they obtained? Are the notches percentiles? Which ones? Etc. This should not devolve into fleshed out methodological descriptions.
Note elements that are shared across panels. Many of our studies will feature comparisons of the same kind of data in multiple regimes or systems. Including the same legends and/or axes labels across all panels can make a figure overcluttered. Therefore, it may be appropriate for legends or axes labels to be shared across panels to reduce clutter without losing information. This is indicated by a statement like “Note that the axes labels are shared for panels (A)-(C).” Legends may be placed in the margins or be present in a single panel. If there is an association between parts of a panel and another panel (e.g., via color), then that may be notable.
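On the question of how error bars were obtained: one common answer in captions is "standard error of the mean obtained from bootstrap resampling." Below is a minimal sketch of that calculation using only the Python standard library, so that the caption statement and the plotted error bars come from the same place. The function names bootstrap_sem and error_bar_note, the default of 1000 resamples, and the sample values are assumptions for illustration; use whatever procedure your analysis actually employed and report that.

```python
import random
import statistics


def bootstrap_sem(values, n_resamples=1000, seed=0):
    """Estimate the standard error of the mean by bootstrap resampling.

    Resamples `values` with replacement, computes the mean of each resample,
    and returns the standard deviation of those means.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    n = len(values)
    resample_means = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        resample_means.append(statistics.fmean(resample))
    return statistics.stdev(resample_means)


def error_bar_note(n_resamples):
    """Return a caption sentence stating what the error bars represent."""
    return (
        "Error bars reflect the standard error of the mean obtained from "
        f"bootstrap resampling ({n_resamples} resamples)."
    )


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]  # placeholder values, not real data
    print(bootstrap_sem(data))
    print(error_bar_note(1000))
```

Keeping the caption sentence next to the calculation is a small guard against the caption saying one thing (e.g., standard deviation) while the plot shows another.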
Examples of captions
Below are some examples that illustrate the expected level of detail and composition. The first example provides a baseline format.
Fig. X. Overall summary of figure theme or point. (A) A plain and simple fragment statement of what is the panel. Subsequent sentences that may follow are full sentences, which will explain panel-specific elements. (B) The simple fragment statement of this panel. (C-E) A common fragment statement that applies to (C) this condition, (D) that condition, and (E) the other condition. Note that some subset of panels may share something in common like axes. In all panels, something may be common such as error bars reflecting standard error of the mean obtained from bootstrap resampling.
Fig. 5. Targeted sequence design of size-specific polymers. (A) Statistical comparison of 〈\(R_\text{g}^2\)〉 distributions obtained from explicit MD simulations of all candidate polymers. (B) Average composition maps of the CUs for candidate rod-like, swollen, and globular targets with 〈\(R_\text{g}^2\)〉\(\sigma^{−2}\) = 3800 (top), 2000 (middle), and 250 (bottom). In (A), from left to right, the first 20 sequences are the globular targets, the next 20 are the swollen targets, and the remaining 20 are the rod-like targets; within each set, the sequences are ordered by ascending 〈\(R_\text{g}^2\)〉, given by the white dots. The violin plots indicate the distribution of values underlying the mean, with a notch at the median value and a bar extending from the 25th to the 75th percentile values. For reference, the target value is indicated by the horizontal line, and the shaded region indicates the average spread between 25th and 75th percentiles for class I polymers of similar size. The color of each violin is based on the average composition of the total polymer sequence. In (B), the colors are resolved by CU and backbone/pendant group but averaged over all sequences for each specific target size. The color contributions for each bead type are shown in the boxed legend, with ∅ indicating no bead present.
Figure 2: Machine learning guides design of highly stable polymer-protein hybrids. a-c) Copolymer designs and their measured REAs for HRP, GOx, and Lip. Marginal axes at the top contain Gaussian kernel density estimate distributions of REA in the seed dataset (blue), active learning iterations 1-4 (orange), and the final exploitation round (green). Medians of distributions are indicated by vertical lines. Main axes show the experimentally measured REA for all tested PPHs; individual markers are vertically located in bins according to their degree of polymerization with random fluctuations added within bins to improve visual clarity. The marker color reflects the composition of the copolymer according to the ternary diagram (bottom right). d-f) Representation of active learning path traversed through copolymer chemical space for each enzyme. The chemical space is represented as a ternary diagram with coordinates providing the fraction of incorporation of hydrophobic, hydrophilic, and ionic monomers in copolymers. Colored stars indicate the mean composition of copolymers proposed during a given active learning iteration. The ternary diagrams are additionally colored by maximum REA observed for a PPH in a given region of the chemical space spanned by the ternary axes. g-i) Individual chemical compositions of copolymers proposed during each stage of active learning. The centroid of all points at a given iteration yields the position of the stars (d-f). The crosses denote copolymers that showed undesirable gelation during synthesis (see Experimental Section, Handling polymer gelation).
Figure 4: Morphological dispersity of single-chain nanoparticles originating from given precursor sequences. (a) A comparison of the distribution of radius of gyration \(P(\langle R_\text{g}\rangle)\) for distinct single-chain nanoparticles. The distributions are obtained from 24 independent replicate simulations of the same precursor chain sequence. The data are for SCNPs formed from ten precursor sequences each from \(f=0.1\) and \(\beta=0.2\) (yellow, left) and \(f=0.1\) and \(\beta=0.8\) (green, right). The width of violins corresponds to the density obtained from Gaussian kernel density estimation; the edges of boxplots in the violin depict the interquartile range, while white dots indicate the median value. A comparison of the distribution of manifold-coordinate vectors \(P(\langle \mathbf{Z} \rangle)\) for single-chain nanoparticles formed from selected precursor chain sequences with parameters of (b) \(f=0.1\) and \(\beta=0.2\) and (c) \(f=0.1\) and \(\beta=0.8\). In panels (b) and (c), the color reflects Gaussian kernel density estimation over the \(\langle \mathbf{Z} \rangle\) for each of the 24 distinct SCNPs formed by each precursor sequence. The color scheme reflecting precursor parameters and sequence labels is the same across panels.
Figure 1: Overview of systems studied. (a) The structure of the monomeric repeat unit and BigSMILES\({}^{71,72}\) string along with study-specific reference name for all studied polymers. The reference name is shown inside a colored block; the same color is used to distinguish the polymer in all subsequent figures. (b) Simulation snapshots demonstrating distinct wetting behavior on amorphous polymer surfaces (top, left-to-right) as well as varying the number of water molecules present (bottom). Additional structural considerations for wettability in the present study include (c) crystallinity and (d) tacticity. In addition to amorphous polymer surfaces, crystalline surfaces are prepared for PE, PVC, and N66; these are respectively referenced as {\PEx}, {\PVCx}, and {\Nssx}. Amorphous surfaces formed by atactic and isotactic PVA (referenced as {\PVAd}) are also compared. Molecular images are visualized using OVITO.\({}^{73}\) The elements are colored such that carbon is gray, fluorine is blue, chlorine is green, oxygen is red, and hydrogen is white; water molecules are cyan.
Figure 6: Structural analysis of water in proximity to polymers without the capacity for conventional hydrogen bonding. (a) The average orientation of water relative to the polymer-water interface \(\langle \alpha_\text{w} \rangle_d\) as a function of distance \(d\) from the interface. The inset schematically depicts the quantity. The region shaded in green corresponds to the interfacial (first) layer of water \(L_1\). The solid and dashed lines distinguish between amorphous and crystalline surfaces, respectively. (b) The distribution of hydroxyl group orientations \(P(\alpha_{\text{OH}})\) relative to the polymer-water interface for water molecules in \(L_1\). Only the lesser angle formed by the two hydroxyl groups per water is used for the calculation. The legend in (a) also applies to (b). (c) Schematic diagrams and representative snapshots that illustrate typical water configurations in proximity to (left) PTFE, (middle) PE, and (right) PVC surfaces.
Selected sources supporting things I said:
These are (mostly) not just wild, dictatorial mandates. Many have come to the same conclusions and strategies as I have. Some editorials have been written on these topics, and they can be worth a read.
Guidelines for Writing
The objective…
Although figures are the foci of the manuscript, they cannot by themselves comprise a manuscript. Writing is still of paramount importance, addressing gaps and components of a scientific manuscript that cannot be fulfilled by figures alone. Scientific writing should be clear, understandable, accurate, precise, and (as possible) concise.
Structure of the main text
Abstract - All manuscripts have abstracts. Good abstracts are essential. For some, it may be the only part that someone reads. If the abstract is compelling, then they may dive in or reserve it for another day. Less desirably, someone will disregard it as of no interest because it is uninteresting or convoluted. The abstract is essentially the whole article writ short. It should capture the essence of the paper: the motivation/rationale, the methods/approach, the new knowledge, and its effect. A good abstract will often follow the structure of
Introduction/motivation (1-2 sentences). This provides a high-level description of the topic area to contextualize the article and identify it as relevant to the readership. This may be a brief “X are promising materials for Y and Z” or “There is fundamental interest in X for Y reason” type of thing.
Gap statement (1ish sentence). This needs to explain what has precluded something from happening previously, if the topic is indeed so important. What is the nature of the obstacle? Is there missing knowledge and why? This is the first chance to contextualize the importance of our work.
Our work/approach (1-2ish sentences). This dovetails with the gap statement. What have we done or what do we do to address the gap?
Introduction - All manuscripts also have an introduction. The structure of the introduction is often an expansion of the abstract, except that sentences are now fleshed out into paragraphs. The Introduction usually has three functional purposes. It must explain and review relevant topics/literature so readers can both (i) understand the manuscript and (ii) appreciate its context. It must (iii) succinctly explain and introduce our contribution.
Interestingly, introductions are best crafted after we have a thorough understanding of our own results and their implications. It is obviously necessary to be well-versed and have command of the literature as you are conducting the study and preparing the results. This is a given. However, the way in which you have understood the literature is distinct from the way in which we need to characterize/present it to the reader. If there are particular subtleties to our results, even if these are perceptually minor in the sea of literature, then such elements need to be properly introduced to provide the right context for our results to be favorably received.
Artful introductions are really something to appreciate. Dissecting introductions is an excellent exercise in critical reading. It is also useful because you can then more readily access the headspace of the writer/researcher who did the work by paying attention to the subtext or meta-level organization.
I advocate for a structure along the following lines:
High-level introduction (1 paragraph). This paragraph needs to generally orient the reader in the field and generate immediate interest in the relevance of the current study. Note that this is an expansion of the first two points of the Abstract. References in this paragraph are usually more of a general topical nature.
Gap/question-oriented review of relevant literature (multiple paragraphs). In subsequent paragraphs, you will unpack information that is necessary for the reader to understand the context and innovation of the work. Note that this information is only alluded to in the abstract, but the referenced studies likely comprise a bulk of your dedicated reading. Too often introductions become unfocused and unwieldy. Here, I like to provide paragraphs that make an important observation or point, indicate relevant literature/findings for that statement, and conclude with a remark that sets the stage for our work. For example, the paragraph could open with the statement that MD simulations are broadly useful for characterizing something pertinent to the area of our study. Subsequent sentences may delve into how MD simulations have been used to extract insights in some cases. The final sentence may pivot to note how the specific angle that is relevant to our study has not been addressed or perhaps to indicate discrepancies/unresolved knowledge.
This study… (~1 paragraph). The paragraph that follows the review of relevant literature should detail how the present study (the one you are writing about) addresses gaps/questions highlighted elsewhere in the introduction. In this paragraph, we want to frame our study and define its intended scope; this can safeguard against criticisms that we didn’t do \(Y\) or \(Z\). Often, the open may be “In this work, XXX” or “Here, we YYY” or “To address ZZZ, we AAA” with the variables indicating the major goal/objective/approach. We also highlight one or two salient findings or contributions to the literature that serve as teasers and further prime readers for what’s to come. We need not be too specific but also not overly vague; there should be some concreteness to the mention of methods and results so that it can be distinguished from just a generic simulation study. From reading this paragraph, the reader should have an appreciation for what we have done and why we think it is an important/useful contribution to the literature. It is important that we define specifically what we aim to accomplish. Sometimes we can fall into the trap of being too subtle for outside readers, and this can be very problematic during critical review where a referee may think we are addressing something distinct from our primary intellectual contribution.
Optional roadmap (~few sentences of last paragraph). Occasionally, for longer papers, methods-oriented papers, or those with non-standard formats, it may be advisable to include a brief roadmap to highlight the organization of the paper.
Results/Discussion - In theory, if you have followed through with the roadmap/storyboarding approach, this section should be fairly easy to write. It may not be perfect, but a suitable draft can be readily crafted by just expanding on the points/sentences provided with respect to each of the figures. In your first drafts, I strongly advise you to not get fancy. I am not suggesting to be overly terse, but avoid making observations that are off-topic or saying many words when fewer would suffice. The more straightforward and direct, the better. If discussion is lacking, it will become apparent. It is usually easier to add commentary after the fact.
Some journals and fields allow for presentation of results and then discussion. I have never written a paper in this way, and I question if I ever will. For me, it seems awkward to decouple the two. In addition, it’s harder to follow because one has to circle back to the figures to provide the discussion. Nevertheless, the germ of the idea is to keep indisputable facts/observations separate from more speculative discussion or interpretations. That is fair, but we can also achieve this within a singular section rather than two separate ones. I will also note that some journals have effectively replaced Conclusions sections with those labeled as Discussion, but that is different from what we are talking about here.
Conclusions - My preference is to keep Conclusions brief and succinct. Usually, the section will feature 1-3 paragraphs. Minimally, we should offer a brief recap of the manuscript and reflect on the broader implications of that work. Here, we want to focus on the most important methods/messages. If sensible, we should mention any known limitations of the work. Because this relates to the scope, we need not necessarily perceive this as a weakness; rather, we are just delivering proper context for its evaluation and trying to prevent mischaracterization. It is also natural to highlight future research or opportunities that are enabled by the research. Sometimes aforementioned limitations can also be framed as opportunities for future work. A sensible organization proceeds as:
Basic summary of the paper addressing what we did and for what purpose.
Recap of major findings. We do not need to dwell much on particulars of the data but more so on what we gleaned from the work. This could be unpacked into a narrative walkthrough, but it is also OK to keep this snippy.
Contextualization and opportunities. After recapping the work, we can offer some commentary as it relates to present literature or prospective future work.
Methods - Rigorous and complete methodological description is essential for proper scientific communication. Months or even years of time have been wasted on incomplete, ambiguous, or erroneous reporting of methods. I think many students presume that this section is relatively easy to write, but in reality, it can be quite challenging, especially given the standards of quality and consistency to which we aspire. Challenging does not necessarily imply overly complex or long. That being said, I usually find that initial drafts from students are overly descriptive/pedantic in Results and severely under-developed in terms of methods. Everything must be present in sufficient detail that another researcher (who is knowledgeable in the field/methods) could reproduce the work and its conclusions. To reproduce the work, they need to be able to effectively parse the methods.
Another common issue is leaving the Methods until the end, in terms of your writing. There are a number of reasons you should not do this. First, the notation defined in the Methods will need to be consistent throughout the text. Nailing down notation early will mean we know what to say and how to refer to variables/analyses in the results/discussion/figures. This saves us time. Second, you should have total command of what has been done here. This will also remind you of all the ins and outs of your analyses. Third, I need it to review. We discuss details all the time, but sometimes, things may be left unspoken or underappreciated in our conversations. I need the methods to understand precisely what is going on. We can often uncover some kind of issues at this stage, and it’s better to catch them earlier than later.
The positioning of Methods (as well as its naming) varies depending on the nature of the manuscript and the target journal. Conventional locations are (i) following the introduction but before the results, (ii) following all discussion and conclusions, or (iii) within the supporting information. There are also combinations of those three, typically with (i) or (ii) mixed with (iii). In this case, essential methods for understanding the manuscript will be included in the main text, while complete methods required for reproducing the results would be relegated to supporting information.
I usually like to start with a description of the systems/models. Then, we should provide details regarding any simulations or machine learning methods. After, we can describe essential analyses or other elements. If the techniques used in the present paper have been used elsewhere in our prior work, then it makes sense to use the methodological description in our prior paper as a template for the current paper. This includes things like standard descriptions of MD via LAMMPS or training/cross-validation of ML models. In this vein, it is totally OK to borrow the language or “plagiarize” the format of our own methods sections. As my friend Prof. Savoie says “The idea that you should rewrite an identical protocol just for the sake of rewriting it is absurd.”
Other than reviewing prior Methods sections, here are some general tips:
Be direct, complete, factual, and technically precise. Don’t get fancy.
Try to report everything in a logical order. Sometimes other methods or kinds of analysis will depend on the systems that are studied or the types of simulations that are run. So, describe those things first before you get into that kind of analysis.
Don’t sweat “awkward transitions.” Things should be presented logically, but we do not need some sort of seamless flow. Use sub- or sub-sub-sections to your advantage. These formatting choices cue the reader for a transition to a new topic.
Care about equation/symbolic/notational accuracy, precision, and consistency. We do not want to leave elements of our theory/method/analysis to be ambiguous or confusing. Try your best to come up with an internally consistent set of variables/notation. In doing so, avoid variable overload (i.e., using something like \(\sigma\) in different contexts, even if subscripted).
As an addendum, use notation and language that is consistent with the literature, if at all possible. There is no need to re-invent concepts/terms that have been established, and additionally, we do not want to confuse readers by associating a certain term with an unexpected variable or vice versa.
Do not overly rely on something reported in a previous manuscript. Undoubtedly, you may have encountered a paper that indicates that the methods were described in a previous paper only to (1) not be able to access such paper, (2) be further re-directed, or (3) find that the methods remain incomplete. That’s annoying. We may note that the procedures/protocols are similar to those employed previously, but our work should be self-contained. The main text could be shortened with such a statement like this followed by a brief recap. Even so, the full methodological description should be present in the SI.
Avoid software-specific jargon. The results and procedure should be robust and agnostic of the software. We should report the algorithms and techniques used rather than the command issued. For example, rather than ‘Simulations are performed using fix nvt,’ you should instead indicate the use of a Nosé-Hoover thermostat. Importantly, this should ensure that you understand the commands/analysis.
If there are options/hyperparameters to the commands issued in a program, indicate them. If default parameters were used, explicitly indicate those. Here is an example:
“The number of hydrogen bonds \(n_{\text{HB}}\) between water−water, water−polymer, and polymer−polymer is monitored and averaged across simulation trajectories for different surfaces. A hydrogen bond is considered as formed when two conditions are fulfilled: (i) the distance between donor and acceptor atoms does not exceed 3.5 Å and (ii) the angle formed by donor, hydrogen, and acceptor atoms exceeds 150°. These calculations are facilitated by the MDAnalysis package.\({}^{84}\)”
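Writing the criterion out explicitly is a useful check that the reported thresholds match what the analysis actually does. The following is a minimal sketch in plain NumPy of the two geometric conditions quoted above; it is not the MDAnalysis implementation cited in the example. The function name `is_hydrogen_bond` and its keyword arguments are assumptions made for illustration, and periodic boundary conditions are ignored.

```python
import numpy as np


def is_hydrogen_bond(donor, hydrogen, acceptor, d_max=3.5, angle_min=150.0):
    """Return True if a donor-hydrogen-acceptor triplet satisfies both criteria.

    Positions are length-3 arrays in angstroms. The defaults mirror the
    thresholds quoted in the example text; periodic images are NOT considered
    (an assumption of this sketch).
    """
    donor = np.asarray(donor, dtype=float)
    hydrogen = np.asarray(hydrogen, dtype=float)
    acceptor = np.asarray(acceptor, dtype=float)

    # (i) the donor-acceptor distance must not exceed d_max
    distance = np.linalg.norm(acceptor - donor)

    # (ii) the angle at the hydrogen (donor-H-acceptor) must exceed angle_min
    v1 = donor - hydrogen
    v2 = acceptor - hydrogen
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

    return bool(distance <= d_max and angle > angle_min)
```

Exposing `d_max` and `angle_min` as explicit arguments makes it harder to forget to report them, including when the defaults are used.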
Supporting Information Text - Most manuscripts will also be accompanied by some supporting information (SI). It’s fair to wonder what goes in there. You may find some SI to be 50+ pages long and haphazardly put together. This is to the benefit of virtually no one.
My essential philosophy is as follows:
I prefer fairly lean SI. The SI is not the place for critical information or results.
By consequence, the SI is a fair place to deposit results or calculations that are not viewed as essential for the major findings of the work but are perhaps informative or supportive of our pursuit of such findings. These may preempt some criticism or address a potential concern. For example, if we were concerned about the simulation length and the convergence of a particular calculation, then results illustrating the timescale of convergence may be warranted. Another example would be if we used a specific measure/calculation to arrive at some conclusion, but a different calculation might have been reasonably pursued (and we in fact did so), then we can include the secondary calculations in the SI to bolster our arguments in the main text.
The SI may also feature duplicate analysis for other conditions or systems. In this scenario, the main text may detail some findings in a particular set of conditions as shown in a specific figure. The same calculations and analyses may be replicated in other scenarios, and we can draw conclusions from those as well. However, it may be cumbersome to present each figure in the main text. Instead, we can include in the SI with a callout from the main text.
The SI is also the place for laborious reporting of fine methodological details. The SI is not routinely visited by most of the readership, but the folks that do are often precisely interested in those details. For example, a graduate student may be trying to reproduce a calculation to verify an implementation, and they will benefit from this precise reporting.
Everything in the SI needs to have a callout or reference from the main text. If the main text does not reference it, then why are we including it?
Formatting is important, but we do not have space constraints. I therefore strongly suggest you use LaTeX to compose the SI. It is simple and straightforward to let it handle the arrangement and formatting.
The amount of discussion/text varies. If we are using the results in the SI for justification of something, then there should be accompanying discussion. Figures should have complete captions that hold to the same standard as main-text figures. Brief presentations of the figures are good to include.
Supporting information files/data - We may also like to supply either data or example input files/scripts with our manuscripts. This is in the interest of reproducibility. However, the inclusion of input files is not a substitute for appropriate description of methodology.
General Advice and Style
Here, I want to highlight some key stylistic elements that should make writing and editing easier for us.
Equations and Mathematical Notation
Equations and notation should be rigorous in presentation and precise in definition. There are a few stylistic conventions that you should adopt:
Italicized/math type should be used for variables, i.e., quantities with values that can change. For example, if we are probing the self-diffusion coefficient for a given molecular species, this may be symbolically reported as \(D_i\) where both \(D\) and \(i\) are in italics/math type because the quantity itself is changing as may be the molecular species (e.g., H\({}_2\)O vs. CO\({}_2\)). Pay careful attention to both parts (italicizing variables and not italicizing static quantities/terms) because I will definitely notice.
The corollary is that italics/math type should not be used for quantities that do not change or firm names of things. For example, if we are considering correlation times, we may have \(\tau_{\text{res}}\) and \(\tau_{\text{rot}}\) to distinguish a residence time from a characteristic time for rotation; notice that we have \(\text{res}\) rather than \(res\). By convention, neither character in the Boltzmann constant should be italicized (\(\text{k}_\text{B}\) vs. \(k_B\)). Make sure this is also true in your figures!
The letters in common (e.g., \(\cos x\), \(\ln x\)) or abbreviated function names (e.g., \(\text{MSD}(t)\)) should also not be italicized.
Use the proper “\times” symbol \(\times\) rather than “x” to indicate the multiplication. Use \(\times 10^a\) rather than “e+a” or similar such nonsense.
If using accents like overbars or hats, then restrict these to the space above the symbol that they are modifying and not its sub/superscripts. For example, \(\hat{\mu}_{\text{w}}^{\text{ex}}\) rather than \(\hat{\mu_{\text{w}}^{\text{ex}}}\).
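The conventions above can be collected into a single LaTeX fragment for reference. This is only an illustrative sketch: it assumes the `amsmath` package (for `\text` and `gather*`), and the number shown is a placeholder rather than a result.

```latex
% Assumes \usepackage{amsmath}, which provides \text and gather*
\begin{gather*}
  % variables and changing indices in math (italic) type
  D_i \\
  % fixed descriptive labels upright: "res" and "rot" are names, not variables
  \tau_{\text{res}}, \quad \tau_{\text{rot}} \\
  % common and abbreviated function names upright
  \cos x, \quad \ln x, \quad \text{MSD}(t) \\
  % proper multiplication sign and power of ten (placeholder number)
  1.5 \times 10^{3} \quad \text{rather than 1.5e+3} \\
  % accent covers only the base symbol, not its sub/superscripts
  \hat{\mu}_{\text{w}}^{\text{ex}}
\end{gather*}
```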
Common Grammatical Errors/Constructs
I believe that proper grammar and punctuation are important for our papers. I have held many conversations on how to approach grammar in scientific writing; people close to me know that I think pretty deeply about these things. It has been reasonably argued that grammar and language are not crucial for scientific communication. That can be fair, but poor grammar and punctuation can also be distracting. I can still understand but remain distracted. Additionally, why make mistakes that we know how to avoid? Anyway, I give a lot of credit to my high school English teachers; shout out to Kim Cuevas who really ingrained some of these things into my brain at a young age. Both of my academic advisors were/are excellent writers and editors as well. I continue to learn and refine skills in language, so that we make fewer and fewer mistakes and are more consistent. I want to make clear that I make mistakes in writing all the time; however, I am also pretty good at recognizing and correcting these errors.
This seems like a good resource.
Here are some things to be aware of:
Uncertainty. If you are uncertain about a grammatical rule/construct, then you can ask/research it. Alternatively, rephrase the sentence to avoid the issue altogether.
Tense. Regarding tense, my main preference is consistency. Maintaining consistency, I somewhat favor using present and present simple passive tense. This is particularly true of main-text results and discussion. Methods may be in the present tense, “Simulations are performed for…” or in the past tense, “Simulations were performed…” Once established, it should be consistent. Recently, I have liked using the past tense. The use of progressive or perfect progressive is rarely needed.
Wordiness and concision. Using too many words is not only unnecessary but also grammatically wrong. I want to avoid the impression that things must always be short. The overall objective is clarity and technical precision. The issue is that using too many words can obfuscate the meaning. There is certainly a point where a terse description actually changes the meaning. At that point, we have gone too far. There are several examples of phrases in common vernacular that fit the criterion of wordiness. You may write them by instinct, but you should strike them during revision.
“In order to XXX, YYY”; just use “To XXX, YYY” because the “in order” adds nothing.
“The reason why is XXX”; just use “The reason is XXX” or “This is because XXX.”
“Due to the fact” –> “Because”
Also, always be on the lookout for straightforward simplifications. During writing, you should always strive to simplify. It is remarkable to me how unnecessarily wordy our initial writing may be.
“Therefore, it is clear that X has an effect on the growth of Y.” vs. “Therefore, X affects the growth of Y.”
“This implies that each particle is characterized by having several distinct neighbors.” –> “Particles possess several distinct neighbors.”
Simplify sentence structure. Unless we are very deliberate, we weave words and revisit concepts to make sense of everything as we go in everyday speech. This is a weakness of my speech patterns. In writing, we have the advantage of revising our thoughts before others see them. We should avoid having many twists, turns, and interjections. Those sentences are annoying to read. This is often achievable by simply rearranging the sentence. Simpler sentence construction also means fewer errors in punctuation because less punctuation is required. Especially in the age of large language models, be very judicious about the use of interspersed hyphenated phrases. They may not be necessary with simpler and more direct sentence constructions.
Oxford Comma. We will use the “Oxford comma” unless it is against the formatting guidelines of the journal. The Oxford comma is used during the construction of lists with three or more items/phrases. This has the construction of “A, B, C, and D” or perhaps “E, F, but not G.” As I recall, the omission of the Oxford comma was mostly to save space in print; that’s not an important consideration for us.
Faulty parallelism. Faulty parallelism is another common error that we fall prey to every now and again. This most often happens in the construction of lists, but we can generalize this to other repetitive structures. When repeating a structure, we should maintain the style/grammatical constructs throughout the series.
“Camden likes to walk, climbing, and water play.” vs. “Camden likes to walk, climb, and play in water.”
Commas, in general. Misuse of commas bugs me more than any other thing, and it happens a lot. We have already covered some proper use of commas. Commas are most often used in the construction of lists; to set off non-restrictive clauses, non-essential appositive phrases, or conditional statements; or to demarcate independent clauses around a conjunction. If you are using a comma outside of these scenarios, it is probably a mistake. Strictly, a comma splice joins two independent clauses with only a comma. A closely related error erroneously introduces a comma before a conjunction that does not join independent clauses, thereby stranding a sentence fragment. This happens frequently around words commonly used as conjunctions, such as “and” and “but,” when they are not incorporated to signify a list.
Camden ate some toast, and some strawberries. (unnecessary comma)
This not only illustrates that A depends on B, but also C. (unnecessary comma again)
Semicolons. Semicolons are interesting. The most common use is to separate two independent clauses when the second clause is closely and obviously related to the first. In principle, we could have two different sentences. The use of the semicolon therefore cues the reader into the association. Additionally, it may simplify the latter clause by relying on information introduced in the first. This is also true of the “A; however, B” type of things. A second, less common use would be in constructing lists of complex phrases that may already contain commas.
Adverb Placement. By my preference, we will place adverbs before the verb that they modify, rather than after (e.g., “strongly depends” vs. “depends strongly”). I find this makes the language more direct and linear, but that’s just my opinion.
Writing Numbers. There are all kinds of rules about this that hardly anybody cares about or enforces anymore. For numbers 0-10, write them out; otherwise, use the digit representation. If the number begins a sentence, either write it out or modify the sentence to avoid the issue.
Using symbols to lead sentences. Just don’t do it. I am not sure it’s not allowed, per se, but don’t begin sentences with numbers or mathematical notation. Change the sentence construction to avoid this.
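The wordy phrases listed under “Wordiness and concision” lend themselves to a mechanical check during revision. The following is a minimal Python sketch; the function name `flag_wordy_phrases` and the replacement table are illustrative assumptions adapted from the examples given above. A match is a prompt to reread the sentence, not an automatic correction.

```python
import re

# Phrase -> suggested replacement, adapted from the examples in this guide.
# The table is illustrative and deliberately short (an assumption of this sketch).
WORDY_PHRASES = {
    "in order to": "to",
    "the reason why": "the reason",
    "due to the fact that": "because",
    "has an effect on": "affects",
}


def flag_wordy_phrases(text):
    """Return (line_number, phrase, suggestion) tuples for each wordy phrase found."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for phrase, suggestion in WORDY_PHRASES.items():
            # Word boundaries avoid matches inside longer words; case is ignored.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            if re.search(pattern, line, flags=re.IGNORECASE):
                hits.append((line_number, phrase, suggestion))
    return hits


if __name__ == "__main__":
    draft = "In order to test this, we ran simulations."
    for line_number, phrase, suggestion in flag_wordy_phrases(draft):
        print(f"line {line_number}: '{phrase}' -> consider '{suggestion}'")
```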
Other words of advice
Word choice. This is very broad, and many recommendations are essentially variations on this. As you review your text, you should consider the question: “Is this what I really mean or want to convey? Will someone not involved with the work understand this statement?” If the answer is no or unknown, then describe it differently.
Avoid possessives, anthropomorphizing, and contractions. I don’t like using possessives with objects/ideas, and I am willing to use “of” to avoid it. In addition, contractions (e.g., don’t, can’t, we’re) should not be used.
Don’t say things you don’t need to say. This is essentially more specification to “stay on topic.” The intent of this advice though is a little bit different. In essence, we do not want to be put in a position of trying to defend observations or statements that are not essential to the real claim we are making. Certain claims may be speculative or under-supported, but are they important to the present study? If not, then simply avoid the inevitable criticism by not making the statement. Another manifestation is when one offers an unnecessary commentary or evaluation of degree/quality. How important is that assessment to the actual conclusions of the work? Here, I recommend avoiding unnecessary qualifiers. Finally, we want to be careful to avoid using language or making assessments that could unintentionally offend or mislead prospective readers. We may often want to motivate the use of molecular simulation or machine learning for a topic. That does not need to necessarily come at the expense of other approaches. Focusing on more positive or neutral assessments is preferred. Sometimes we can and should remark on limitations, but it needs to be done tactfully. This can often be accomplished through simple rephrasing. Is something truly “impossible”? Someone might disagree with that but agree that it is “challenging.” The general advice is to stick to statements that are factually correct and generally unobjectionable (by a reasonable person).
Common wording issues
The use of “We.” Surprisingly, I am not overly zealous as to how we should or should not use “We” in scientific writing. This was heavily discouraged “back in the day.” That being said, I would probably recommend not using “We” by default. Then, there are two distinct uses of “We” that follow. First, we might use it to signify extrapolation or speculation beyond what is known from the data. In this case, “we” is to imply that it is our thoughts, which may not be in line with those of the reader. Similarly, we may use “we” to indicate a specific choice among many. Here, we are recognizing that reasonable people may have pursued the research a different way, for example, but by making it clear that this is a choice, the reader/referee may be more inclined to accept the idea. Second, “we” may be used in the royal sense. This is most relevant in the context of narrative flow. You can think about this as being a tour guide for the paper.
The use of “that” vs. “, which”. The words that and which are usually used to set off a clause that modifies the previous one in a sentence; the distinction is whether the new clause is essential or non-essential to the context of the first clause. If it is essential, we should use that, and no comma should be used. If it is not essential, we should use which, and the new clause should be set off by a comma.
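“compose” versus “comprise”. In traditional usage, the whole comprises its parts, whereas the parts compose the whole. Comprise is therefore used when the subject contains the object of the sentence, whereas compose is used when the subject makes up the object of the sentence. “The side-chain comprises two beads.” “Two beads compose the side-chain.” “The side-chain is composed of two beads.”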
Avoid absolutes unless you really mean it. Only Sith deal in absolutes. This falls into saying things you don’t need to say. A few common examples follow, and then hopefully, you get the idea. All it takes is one counter example, and then the statement is invalid.
“constraints” versus “restraints”. Constraints provide an absolute condition that is not violated. I find people often use “constrain” when they actually mean something more akin to a restraint.
“prevent” and similar. Along the same lines, prevent is an absolutist term. A mistake might be suggesting a particular measure “prevents finite-size effects.” However, can that be actually guaranteed? It is more likely and safer to suggest that the measure “mitigates finite-size effects.”
“optimal”. Another dangerous word. This word has a certain sense of rigor to it that may not be guaranteed by the methodology employed.
Topic and Takeaway Sentences
Virtually every paragraph in the manuscript should start with a sentence/statement that cues the reader in to what the paragraph is about or means; this is the topic sentence. Use them. They provide information to the reader and structure for the writer (you/us).
Many (not all) paragraphs will also feature a takeaway sentence that concludes the paragraph. Takeaway statements usually expand on the idea of the topic sentence, now equipped with the information that has preceded it. The function and style of topic/takeaway sentences change depending on the section of the paper in which they appear.
Useful Exercise. Consider whether the narrative of a manuscript is logical if reduced only to topic and takeaway sentences. In so doing, it may become readily apparent that your paragraph is starting with “the wrong” information. This also allows one to easily and critically assess the structure and organization of the manuscript. If you have strong topic/takeaway sentences, the flow should feel very natural. A reader should understand our essential scientific narrative and argument if processing the manuscript in this manner. The balance of content is evidence in support of that narrative. Now that you know where to look/notice, examine any number of papers by Webb and members of the group for different examples of topic sentences. As you read other papers, you may see similar strategies emerge.
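If the draft is available as plain text, this exercise can be partly automated. What follows is only a minimal sketch in Python, not a group tool: the function names `split_sentences` and `skeleton` are hypothetical, paragraphs are assumed to be separated by blank lines, and the sentence splitter is deliberately naive (it breaks after `.`, `!`, or `?` followed by whitespace and a capital letter, so abbreviations such as “et al.” before a capitalized word will trip it up).

```python
import re
from typing import List, Optional, Tuple


def split_sentences(paragraph: str) -> List[str]:
    """Naively split a paragraph into sentences.

    Assumption: a sentence ends with '.', '!' or '?' followed by
    whitespace and an uppercase letter. This is a rough heuristic.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph.strip())
    return [p for p in parts if p]


def skeleton(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (topic, takeaway) for each paragraph of the text.

    Paragraphs are assumed to be separated by blank lines. The takeaway
    is None when a paragraph contains only one sentence.
    """
    result = []
    for paragraph in re.split(r"\n\s*\n", text):
        sentences = split_sentences(paragraph)
        if not sentences:
            continue  # skip empty paragraphs
        topic = sentences[0]
        takeaway = sentences[-1] if len(sentences) > 1 else None
        result.append((topic, takeaway))
    return result


if __name__ == "__main__":
    # Hypothetical two-paragraph draft, used only for illustration.
    draft = (
        "Simulations are widely used. Many scales are out of reach. "
        "Coarse-graining helps.\n\n"
        "Here, a new approach is proposed."
    )
    for topic, takeaway in skeleton(draft):
        print("TOPIC:   ", topic)
        print("TAKEAWAY:", takeaway if takeaway else "(none)")
        print()
```

Reading only the printed topic/takeaway pairs in order is a quick way to check whether the narrative holds together. Keep in mind that not every paragraph has a takeaway sentence, so the last sentence that the script reports is a candidate to inspect rather than a guaranteed takeaway.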
Types of topic sentences. There are several different flavors of topic sentences that one can use. Just remember: paragraphs must have a focus and that should be clear from the outset! Below, a few types are summarized. Certainly, there are more. As appropriate, the takeaway sentence follows in parentheses.
Factual - these sentences usually open paragraphs that will provide information necessary to understand the narrative. These are most common in an Introduction as a means to motivate the present study. In Methods or Results/Discussion, such sentences may introduce a new concept, particularly in more of a narrative/letter format. Here are some examples:
Many condensates are believed to form spontaneously as a result of phase separation taking place close to a local thermodynamic equilibrium (\(1,17\)). (However, understanding the general relationship between thermodynamic stability and the internal dynamics of biomolecular condensates requires additional and systematic exploration (\(23,24\)).)
Polymer-protein hybrids (PPHs) have emerged as attractive materials that leverage polymers to improve protein solubility and stability in often denaturing and abiological environments (\(1-5\)). (Thus, fit-for-purpose PPHs could facilitate myriad applications–biofuel production (\(8\)), plastics degradation (\(9,10\)), pharmaceutical synthesis (\(11\))– but a robust strategy for their design remains elusive.)
Significant progress has been made in both developing facile chemical pathways for synthesis of possible precursors (i.e., the initial, unfolded polymer chains) and the characterization of resultant morphologies for SCNPs. (Knowledge relating a precursor to structure formation (and its reliability) in resulting SCNP will be crucial in achieving target functional properties of these materials.)
Molecular dynamics (MD) simulations provide a powerful way to investigate water-surface interactions with molecular-level resolution.\({}^3\) (Studies of water interactions on chemically distinct and flexible polymer surfaces appear nascent.)
While the effect of modifying friction on the polymer relaxation times using the IFRM is roughly linear with salt concentration, simple and empirical descriptions for the viscosity of electrolyte solutions often have other functional dependences.\({}^{32-36}\)
Rationale or presentation-based - these kinds of sentences are useful for introducing an idea/mode of analysis/conceptual shift. Here are some examples:
Polypeptides are chosen for characterization based on a policy of expected hypervolume improvement (EHVI), which has previously been demonstrated to converge towards a true set of Pareto-optimal sequences.
To quantify changes in polymer dynamics, we compute Rouse-mode relaxation times for the atomistic polymers after mapping to discrete chains with N = 320 (SI, Section 2).
Because the class II polymers do not have a fixed-length repeat pattern of CUs, the input featurization approach must be generalized to account for variable sequence lengths.
We iterate with a Learn-Design-Build-Test cycle (Figure 1) to identify high-performing PPHs.
To further understand the physics underlying hydrophobicity and the robustness of the nanoscale hydrophobicity metrics, we also consider structural and stereochemical variations of polymer surfaces (Figs. 1c,d). (Together, these results suggest that the simulations capture key physical interactions that dictate wettability, and these interactions are reflected by computed nanoscale hydrophobicity metrics.)
We next investigate the impact of precursor patterning on the consistency of forming SCNP morphologies.
We characterized 7,680 unique SCNPs across the \(\beta-f\) space by the number of free backbone beads \(n_f\), the number of domains \(n_d\), and the radius of gyration \(R_\text{g}\) to quantitatively examine how particular morphologies arise as a result of \(\beta\) and \(f\). (Collectively, these results illustrate how sequence patterning of precursors can be manipulated to bias the morphological characteristics of SCNPs.)
Given the identification of highly stable PPHs for each enzyme, we sought to understand the important chemical features of copolymers that gave rise to their performance.
Point-driven - these sentences are effectively topic/takeaway packaged together. Often if a rationale-based statement leads the paragraph, this sentence will quickly follow. In the case that the rationale is already clear, then this becomes the first sentence. Here are some examples:
Both \(\beta \Delta f_{\sigma}\) and \(\beta \hat{\mu}\) quantitatively distinguish hydrophobic behavior across the amorphous polymers studied (Fig. 3).
Figure 2a-c shows that the active learning paradigm facilitated identification of numerous, diverse copolymers that enhanced retained activity for each of the three enzymes.
This aggregate-level analysis reveals how sequence (dis)similarity does not necessarily dictate differences in macroscopic properties in obvious ways.
Outside the low-\(B_2\)/low-\(D\) regime, differences between counterfactuals and Pareto-optimal sequences are more difficult to resolve.
Figure 2A shows that fingerprints that use either size-explicit or size-implicit representations of the polymer significantly improve ML models trained to predict properties in Dataset A.
Figure 4A reveals that most SFP-based strategies with size representation perform similarly, irrespective of the type of CU fingerprint and the prediction task for Dataset A.
The IFRM does not reproduce MD results at finite salt concentrations when local increases in friction are limited to \(3\zeta_0\).
Discussing figures
If you are paying attention, you might have noticed something about the way in which figures are referenced in the topic sentences. Namely, we should generally say what the figure or data means and not necessarily what it contains.
Why? It is generally unnecessary fluff to detail the type of plot. A sentence like “The radial distribution functions of Y are shown in Figure XA” delivers no useful content that would not be known or supplied elsewhere. We just added a sentence for no reason. Recall that we have crafted and detailed the figure sufficiently such that we need not do so again. Deliver your interpretation, “cite” the appropriate figure/panel as evidence, and move on. It is preferred to simply “cut to the chase” and indicate the major finding from our analysis. In the prior example, we are not typically interested in the specific structure of a radial distribution function but in what that function tells us about the structure of solution/assembly. So, something more along the lines of “All systems exhibit similar aggregation irrespective of ligand chemistry (Fig. XA)” is more informative.
Here are some examples:
“Figures 5a,b demonstrate that interfacial water dynamics indeed vary dramatically based on polymer surface chemistry.”
“To gain further insight into hydrophobic ordering, we examine how water specifically interacts with the polymers that do not exhibit hydrogen bonding (i.e., PTFE, PE, {\PEx}, PVC, {\PVCx}). Fig. 6 shows that water orientation with respect to the polymer-water interface differs according to polymer chemistry. In Figure 6a, the average orientation of water is significantly perturbed in the vicinity of PE and PVC whereas there is no orientational preference in the vicinity of PTFE.”
“Figure 3 demonstrates that both \(\beta \Delta f_{\sigma}\) and \(\hat{\mu}_\text{ex}\) can quantitatively distinguish hydrophobic behavior across the amorphous polymers studied.”
“Figures 6a,b show that sequence variability is highest for the mean value and dispersity of \(P(\langle R_\text{g}\rangle)\) at low \(f\) and high \(\beta\) values.”
“Figure 4 illustrates that a single precursor sequence can give rise to a set of SCNPs with diverse morphological characteristics.”
“Simulation durations of at least 2 microseconds are required to reach equilibrium, resulting in the formation of an interface between equilibrium condensed and dilute phases (Fig. 3a). As anticipated, the differences between the condensed and dilute-phase densities are anticorrelated with the second-virial coefficients, which decrease from P1 to P35 (Fig. 3b).”
A paragraph that anticipates that some primer may be needed: “To further explore the relationship between copolymer features and PPH activity, we computed Shapley additive explanations (SHAP) values [41,42] to quantify how chemical features of the copolymers (fractions of incorporation and DP) contribute to REA predictions by our GPR models. Here, positive SHAP values indicate positive contributions to REA (negative SHAP values suggest negative contributions), and we use the mean absolute SHAP value of a feature as a proxy for its overall importance to model prediction. Figure 3c shows that different copolymer features have distinct impact on REA predictions. To elucidate these differences, we compare SHAP values for the fractions of incorporation for each monomer (Figure 3d-f) and DP (Figure 3g-i) for each enzyme. Although we previously associated hydrophobic chemistry with high-performing PPHs for HRP (Figure 2f,i), Figure 3d reveals that the \emph{exclusion} of BMA is favorable (higher REA), while the \emph{inclusion} of MMA, a similar hydrophobic monomer, is associated with higher REA. Similar observations can be readily identified for Lip (Figure 3f), for which SPMA and TMAEMA monomers (both highly ionic) represent the most and least important features based on their mean absolute SHAP values. Such differences in SHAP values between monomers with the same chemical classifications underscore the intricacy of designing effective polymer-enzyme pairing.”
There are occasional exceptions: when readers might benefit from a primer on the manner of analysis, when relevant features are uncommon, or when describing the presentation helps simplify the discussion. If a figure must be described, it is better to explain the essence or intent of the analysis rather than the specific manifestation. For example:
“Figure 2d-i examines both the progression of active learning and PPH performance as a function of the chemical constitution of copolymers” rather than “Figure 2d-i visualizes the distribution of designed copolymers produced during iterations of active learning projected onto ternary plots of copolymer composition coded by ionicity, hydrophobicity, with copolymers.”
“Fig. 6b also reveals subtle differences in the distribution of hydroxyl group orientations. For PE and PE\({}^*\), \(P(\alpha_\text{OH})\) for PE has a wide shoulder at around \(30^\circ\), which is commensurate with the formation of the well-known “dangling OH bond”\({}^{119,120}\) that facilitates the formation of a hydrogen-bond network; such phenomena has been noted at the water-air interface and for other water-hydrophobic surfaces.\({}^{121,122}\)”
Examples
Introduction
Below, we will supply a couple of example Introduction sections (without formatted references). I will also provide some commentary in parentheses that follow the paragraphs.
Intro Example 1
The following example is from “Systematic Computational and Experimental Investigation of Lithium-Ion Transport Mechanisms in Polyester-Based Polymer Electrolytes”. It’s notable that this was my second first-author paper as a graduate student. At this stage, I had not formally conceptualized how to write good papers. Nevertheless, I think the introduction (and paper) are both very well-constructed. At the time, most of this had been achieved via mimicry and constant editing rather than conscious organization.
Solvent-free, solid polymeric electrolytes (SPEs)\({}^1\) are of interest for the development of safe, stable, and cost-effective battery technologies. Candidate SPEs typically require both a strong coordinating affinity for the conducting cation and a suitable distance between coordinating centers.\({}^{2,3}\) Consequently, poly(ethylene oxide) (PEO) and PEO-based polymers have been extensively characterized, although ambient temperature ionic conductivities in such polymers are not satisfactory for many practical applications.\({}^{4,5}\) (Here, we immediately establish the general topic area of SPEs and highlight a key need for the field based on the inefficiencies of the most well-characterized systems.)
Significant theoretical evidence suggests that ion transport in polymers is intrinsically coupled to polymer motion.\({}^{6−15}\) In particular, numerous theoretical studies of ion transport in PEO-based SPEs have shown that lithium cations are typically coordinated by 4−7 oxygen atoms (from one or two independent chains) and diffuse via three principal mechanisms: interchain hopping, intrachain hopping, and codiffusion with short polymer chains (<10 000 g/mol). Efforts to improve lithium-ion conductivity in PEO-based polymers have thus mainly focused on disrupting polymer crystallinity and lowering the glass-transition temperature \(T_\text{g}\), such as through the use of plasticizing additives,\({}^{16,17}\) cross-linked, comb, or graft polymer architectures,\({}^{18−22}\) incorporation of comonomers into the PEO backbone,\({}^{23−30}\) and polymer blends.\({}^{31,32}\) Despite these efforts, ionic conductivities in state-of-the-art, PEO-based SPEs remain limited at ambient temperatures.\({}^{21}\) (This paragraph does heavy lifting and is careful with its language based on the results of the present study. There are two components. First is the emphasis on the “coupling to polymer motion,” which we present in the context of PEO-based polymers. This is a central notion that we challenge in the manuscript, but our observations are made for non-PEO-based polymers. The second point that can be gleaned is from the last sentence, which comments that conductivities remain limited despite all these strategies. This point serves as a bridge to the next paragraph, which motivates a specific aspect of the current study.)
Non-PEO-based polymer architectures provide new opportunities for enhancing ionic conductivity by altering ion−polymer and polymer−polymer interactions and are thus of interest for the design of next-generation SPEs. Ionic conductivity characteristics have been experimentally investigated in several novel polymers that include polyesters, polyphosphazenes, polyamines, polysilanes, polysiloxanes, and polycarbonates.\({}^{33−40}\) However, few theoretical studies on the mechanisms of ion transport in such polymers have been performed, and it is not known to what extent the transport mechanisms present in PEO are shared in other polymer architectures. The design of new SPEs requires an improved understanding of the mechanisms that facilitate lithium-ion transport in polymers and the identification of new polymer architectures that efficiently realize these mechanisms. (This paragraph clearly articulates the motivation for examining non-PEO-based polymers and provides context to later appreciate our specific contributions. We are also emphasizing a need for mechanistic understanding of the kind that MD studies can supply but that have yet to be performed.)
Here, experimental synthesis and electrochemical characterization are combined with long-timescale molecular dynamics (MD) simulations to investigate lithium-ion transport in six new SPEs (Figure 1). Modular synthesis produces six polyesters that have either of two backbone motifs and one of three side chains (Figure 1, top). These polymers are then characterized using both simulation and experiment (Figure 1, middle), which demonstrates the effect of polymer composition and architecture on ionic conductivity (Figure 1, bottom). By comparing experimental observables with the corresponding quantities from simulation, we identify the primary trends regarding polymer architecture and conductivity. Agreement between simulation and experiment then provides a connection between macroscopic properties and molecular-level processes, which enables a detailed theoretical analysis of the molecular processes that give rise to the observed trends. This complementary approach provides a better understanding of ion transport in novel polymer electrolytes than would be obtained from either an independent experimental or theoretical study. (This is the obvious “This study” paragraph. Note that we are totally laying out the scope and essence of our approach. Without being too detailed about the specifics of the results, we are conveying what we are able to achieve and what this means.)
Brief Intro Example 2
The following is pulled from “Graph-Based Approach to Systematic Molecular Coarse-Graining”. To cut down on length and better emphasize the structure of the introduction, I am only going to include first and last sentences for the review of relevant literature portion. Notice how the narrative flow still “works.” This is because the bodies of the paragraphs are just literature support for the bracketing sentences. The paragraphs are also logically arranged such that one feeds into the next. This need not be seamless, but the delivery of information should be logical in its progression.
Molecular simulations are widely used in the study of physical, chemical, and biological systems. However, many interesting phenomena–protein folding, macromolecular self-assembly, polymer rheology, etc.–involve spatiotemporal scales that are generally inaccessible by naive simulations with atomistic models. To reach such scales, a number of coarse-graining strategies have been proposed in which groups of atoms are combined or lumped into individual interaction sites.\({}^{1−5}\) Coarse-grained (CG) models are widely used to explore behaviors at mesoscopic length and time scales, to engage in high-throughput studies in both chemical space and state space, and to make insightful interpretations of experimental observables.\({}^6\) Coarse-grained models are also critical components of multiscale simulation techniques that actively employ atomistic and CG representations in the same simulation, such as in adaptive resolution and hybrid resolution modeling.\({}^{7−9}\)
Development of CG models generally involves two interrelated challenges.\({}^{10,11}\) …XXX… The representations and interactions in these models are frequently derived from higher-resolution models, generally atomistic, as part of bottom-up or multiscale modeling strategies.
Over the past two decades, significant efforts have been devoted to the issue of determining interactions in CG descriptions. …XXX… In contrast, the issue of generating suitable CG representations, or mapping schemes, has received relatively little attention, despite it being a keystone to any coarse-graining problem.\({}^{25}\)
In the majority of applications, CG representations are introduced as an arbitrary ansatz that maps groups of atoms to CG sites for which appropriate coarse-grained potentials can then be developed. …XXX… Existing tools\({}^{22}\) that facilitate producing ansatz CG representations become cumbersome for macromolecules and polymers. Few tools or methodologies are available for automated, systematic generation of CG representations.
Efforts to develop systematic methodologies for defining CG representations for simulations have largely been restricted to applications concerning large biomolecules.\({}^{26−31}\) …XXX… Although these strategies are viable for biomolecular complexes, strategies for general molecular coarse-graining are still needed.
In this work, a graph-based coarse-graining (GBCG) approach is proposed to systematically generate CG representations. The essence of the methodology is to represent the chemical connectivity of a molecule as a molecular graph, mapping atoms (or CG sites) to nodes and bonds to edges, and to derive successively coarser representations through the basic graph operation of edge contraction. The result of this procedure is a consistent, hierarchical set of possible CG representations that may be further parametrized and employed in stand-alone CG simulations or in multiresolution modeling applications that can exploit the use of CG representations with varying spatial resolution. One attractive feature of GBCG is that the CG representations naturally preserve the chemical topology of higher-resolution representations because the graph-reduction process is intrinsically tied to the chemical connectivity. Consequently, GBCG typically produces intuitive representations, but as a byproduct of a simple, robust, systematic, and unambiguous protocol rather than by ansatz. Several illustrative examples are presented here, ranging from simple molecules to complex polymers, to demonstrate the generic capabilities and implementations of the method.
Brief Intro Example 3
Let’s do the same again, now for “Molecular Dynamics Investigation of Nanoscale Hydrophobicity of Polymer Surfaces: What Makes Water Wet?”
From droplets on lotus leaves\({}^1\) to thin layers on laboratory substrates,\({}^2\) water is ubiquitous in everyday experience, and water−surface interactions are highly varied.\({}^3\) For example, interfacial water is known to mediate interactions involving biological macromolecules,\({}^{4−6}\) while microscopic water droplets have been linked to the phenomena of surface flashover\({}^7\) and contact charging.\({}^{8−10}\) Understanding water−surface interactions may crucially advance sustainability efforts, as these can govern figures-of-merit associated with the catalytic efficiency of hydrogen evolution\({}^{11}\) or water harvesting and purification by membranes.\({}^{12,13}\) Furthermore, tailoring surfaces with chemistry and geometric factors to modulate water interactions may yield novel coatings for antifogging technology, advanced textiles, and underwater electronics.\({}^{14}\) Overall, understanding interactions and properties of interfacial water is of significant interest, in terms of both fundamental physics as well as technological importance.
The concept of hydrophobicity\({}^{15,16}\) is commonly invoked to describe the nature of water−surface interactions and explain wettability. …XXX… Methods to predict, characterize, and understand the hydrophobic behavior of polymers would thus have broad utility.
Numerous experimental methods are available to characterize surface hydrophobicity and water−surface interactions. …XXX… Nonetheless, experiments may be nontrivial or provide signals that are based on the ensemble behavior of water, which obfuscates clear molecular-level understanding.
Molecular dynamics (MD) simulations provide a powerful way to investigate water−surface interactions with molecular-level resolution.\({}^3\) …XXX… Studies of water interactions on chemically distinct and flexible polymer surfaces appear relatively nascent.
Surface hydrophobicity can also be quantified via MD simulations. …XXX… nanoscale structural and thermodynamic measures of hydrophobicity are rarely compared, to each other or to macroscopic observations.
In this study, we characterize the wetting behavior of water and its mechanistic origins on six chemically distinct commodity polymers (Figure 1a,b): polytetrafluoroethylene (PTFE), polyethylene (PE), polyvinyl chloride (PVC), poly(methyl methacrylate) (PMMA), Nylon-66 (N66), and poly(vinyl alcohol) (PVA). By comparing results across different polymers, we broadly elucidate the role of surface chemistry, while additional selected systems are prepared to contrast behavior in crystalline versus amorphous (Figure 1c) as well as atactic versus isotactic (Figure 1d) systems. Across all systems, a variety of nanoscale hydrophobicity metrics are computed and compared to macroscopic experimental observations to assess the utility and relevance of characterizing hydrophobicity at the nanoscale with MD simulations. Meanwhile, spatially resolved structural and dynamic analysis is used to rationalize the hydrophobic ordering of polymer surfaces and further inform a molecular-level understanding of what typifies hydrophobic behavior for water−polymer interactions.
Conclusions
1-paragraph conclusion, example 1
In conclusion, we have presented a new practical paradigm for soft materials design that combines CG modeling, ML, and model optimization. This unique combination addresses technical challenges related to experimental synthesis and characterization as well as soft materials modeling. The approach is exemplified through the mapping of sequence to structure relationships in a nontrivial region of the CG polymer genome. Although this paradigm only relies on simulation data, we anticipate that integration with experimental data will be both possible and highly effective in certain applications. Overall, the results reported here highlight significant potential for enhancing efforts to design polymer-based materials via the combination of CG modeling and ML.
1-paragraph conclusion, example 2
The work presented here provides a unified Rouse-based description of how ion-polymer complexation manifests as increased polymer friction across the full range of relevant salt concentrations. Whereas previous work emphasized that friction was locally increased by an order-of-magnitude, we show that this leads to substantial deviation from MD results for subdiffusive motion in the dilute Li+ regime, and a 3-fold increase is more plausible. However, this 3-fold increase for complexed polymer segments severely underestimates polymer relaxation times at finite concentrations, and interestingly, MD simulations show suppressed polymer motion even when polymer atoms are farther than 10 Å from any Li+. By analogy with strong electrolyte solutions, we include a term in the bead friction that phenomenologically accounts for neglected effects such as ion−ion interactions and associative effects. While such effects might be captured by introducing length- and time-dependent friction contributions to the Rouse matrix using a Generalized Langevin Equation Rouse model,\({}^{38,39}\) this approach brings additional complexity and multiple parameters. Here, by using one empirical parameter, the simply modified IFRM achieves good agreement with MD simulations and recent experimental data. These results demonstrate the importance of factors other than direct ion-polymer complexation in the suppression of dynamics in polymer electrolytes and invite further study to understand the origins of this phenomenon.
Longer (too long?) conclusions, example 1
In this work, we examined how sequence patterning of intramolecular cross-linking moieties (linkers) in polymer chains impacts the formation of single-chain nanoparticles (SCNPs). To do so, we simulated the formation of 7,680 unique SCNPs from precursor chains, which altogether comprehensively spanned a parameter space defined by blockiness \(\beta\) and the fraction of linkers in the chain \(f\). The topologies and morphologies of the SCNPs were then subsequently characterized using graph-theoretic and machine learning methods with subsequent analysis elucidating the general roles of \(\beta\) and \(f\) on morphological outcomes of SCNPs. With respect to graph theory, we show that the algebraic connectivity–a common metric in spectral analysis–provides a very powerful, descriptive topological descriptor that can be quantitatively linked to SCNP morphologies. Importantly, this descriptor provides complementary information to more traditional descriptors based on domain-assignment, particularly informing on the structure of single-domain and globular morphologies. With respect to machine learning, we show that manifold-learning techniques can distinguish subtle variations in SCNP morphology and regression algorithms can construct useful relationships between topological descriptors and shape parameters. Finally, we also introduce analyses of morphological dispersity in SCNPs, considering measures on both three-dimensional structure as well as the population of local environments.
Overall, we identify several major trends with respect to SCNP topology, morphology, and precursor parameters. Low-\(f\) precursors tend to adopt topologically diffuse structures with several domains and large chain segments between domains; in this regime, increasing \(\beta\) biases structure-formation towards fewer and larger domains but a higher proportion of the polymer chain that is not within any topological domain. By contrast, precursors with high \(f\) consistently give rise to SCNPs with globular structures that show weak sensitivity to \(\beta\). In addition, while low-\(\beta\) precursors tend to result in SCNPs with several domains and dense nanostructures, resulting in rod-like morphologies, high-\(\beta\) precursors result in fewer domains that adopt more ball-and-chain morphologies. By examining the dispersity of structures formed from every precursor, we found that low-\(f\) precursors yield SCNPs with substantial diversity in size but overall consistent local environments, while high-\(f\) precursors generate SCNPs with the opposite trend. Furthermore, comparing the distributions of SCNPs generated from specific sequences that all correspond to the same \(\beta\) and \(f\) highlighted the potential to leverage sequence patterning for morphological control, manifest in either specific average values or distribution characteristics.
Ultimately, the methods and results herein may spawn several directions of future research. First, the strong connection between the spectral properties of the SCNP and their morphological properties is intriguing. In future work, we aim to quantify the physical implications of spectral properties on SCNP structure and understand their relationship to other SCNP properties (e.g., response to shear flow, mechanical unfolding, etc.). Furthermore, there is significant opportunity to establish how precursor patterning dictates the mechanism(s) or pathway(s) of SCNP formation. Examination of formation pathways may facilitate additional understanding as to how to more finely control SCNP morphology. Finally, our results point to the possibility of tuning the dispersity of SCNP morphologies in numerous ways. Approaches to navigate such a design task, particularly in the context of experimentally verifiable systems, will be needed.