Section 1 Background

The term cyberinfrastructure traces its roots to Presidential Decision Directive NSC-63 [25], and is commonly used today to refer to “computational systems, data and information management, advanced instruments, visualization environments, and people, all linked together by software and advanced networks to improve scholarly productivity and enable knowledge breakthroughs and discoveries not otherwise possible” [13]. What cyberinfrastructure “looks like” varies greatly between disciplines, but its constant feature is the use of modern technologies that make the work of scholars more efficient, thereby producing more and better advancements in STEM. As such, the National Science Foundation [17] and other federal agencies have made the development of cyberinfrastructure a key priority in supporting STEM scholarship, as noted below (emphasis added):

NSF invests in powerful cyberinfrastructure that enhances the ability of researchers and educators to access and use scientific data and infrastructure. These assets include high-performance computing systems, large-scale data repositories, software suites, networks, and digital access to research equipment and instrumentation as well as education resources. [21]

Subsection 1.1 Cyberinfrastructure in Mathematics Research

One of the features of many areas of mathematics, including the applicant's area of general topology, is the expansive breadth of open problems that still technically require just pencil and paper. Nonetheless, as technology advances, we are seeing more uses for tools beyond the blackboard for making advancements in even abstract mathematics. One obvious example is the use of high-performance (or even conventional) computing to brute-force the exploration of finite mathematical spaces [4].
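As a small illustration of this kind of brute-force exploration (a minimal Python sketch written for this discussion, not drawn from [4]), the following counts the topologies on an \(n\)-point set by testing every family of subsets for closure under unions and intersections. The function name count_topologies and the bitmask encoding of subsets are assumptions made for the example.

```python
from itertools import combinations


def count_topologies(n: int) -> int:
    """Count the topologies on an n-point set by exhaustive search.

    Each subset of {0, ..., n-1} is encoded as an integer bitmask.
    A family of subsets is a topology on a finite set when it contains
    the empty set and the whole set and is closed under pairwise
    unions and intersections.
    """
    full = (1 << n) - 1
    # Proper nonempty subsets; the empty set and whole set are always included.
    middle = list(range(1, full))
    count = 0
    for size in range(len(middle) + 1):
        for chosen in combinations(middle, size):
            family = {0, full, *chosen}
            closed = all(
                (a | b) in family and (a & b) in family
                for a in family
                for b in family
            )
            if closed:
                count += 1
    return count


if __name__ == "__main__":
    for n in range(1, 4):
        print(n, count_topologies(n))
```

Counting this way finds 4 topologies on a two-point set and 29 on a three-point set. The number of candidate families doubles with every additional subset, so the same search quickly outgrows conventional hardware as \(n\) increases.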

But perhaps underrated are the systems mathematicians use to share bleeding-edge advancements and collaborate with the research community. Mathematics written on paper or a chalkboard cannot travel far on its own; generally, mathematicians use software such as LaTeX to typeset proofs and formulas for distribution in PDF format. Then, rather than asking colleagues from across the country to physically visit their office computer, the researcher uses technologies such as email, a personal website, and/or a preprint server to disseminate their work. High-quality work must then be submitted to a journal to be vetted by peers, a process handled sometimes through email, and other times via non-trivial content management systems hosted in the cloud. These results are then published, but the content is now burned into inaccessible (in many senses of the word) PDFs locked behind paywalls. A typical strategy for finding recent developments in mathematics is to Google keywords, but a search for “selection games”, an area of mathematical game theory studied by the applicant, quickly reveals the limitations of this approach.

One approach to overcoming this limitation is the development of databases of mathematical objects. As of writing, the Catalogue of Mathematical Datasets [6] enumerates \(83\) such tools. One of these is the \(\pi\)-Base (a.k.a. pi-Base) community database of topological counterexamples [12], created by software engineer James Dabbs. Clontz joined the project in 2017 to serve as its lead mathematical editor. In the beginning the database was treated similarly to a wiki, with unreviewed contributions made by several mathematicians, students, and other unknown contributors. To address this, changes were made to the software to enforce a level of peer review, and Clontz was awarded an internal Faculty Development Council grant to hire a student to audit and expand the content of the database.

This effort was successful, and today the pi-Base is commonly cited on mathematics discussion boards used by students and researchers alike [26] [27]. However, it also exposed critical inefficiencies in the user experience (UX) of both contributing to the database and reviewing contributions.

Another key element of cyberinfrastructure in mathematics research is the tooling used to create collaborative spaces and venues for dissemination, both virtual and in person. Of course, recent events have exposed the utility of virtual collaborations, and tools such as Zoom and Sococo have done well to meet this need both for small meetings and large conferences. But organization of large conferences, whether in person or otherwise, is not easily done without the use of technology. Historically, the Topology Atlas [23] met this need, not just in topology but in many other areas of mathematics as well, but this abstract and conference scheduling platform was shut down in 2020 following twenty-five years of service. Its shuttering has left a void that creates inefficiencies for researchers, distracting from their core work.

Subsection 1.2 Cyberinfrastructure in Mathematics Instruction and RUME

Logistics are frequently a limiting factor in the adoption of evidence-based practices in instruction, particularly in undergraduate mathematics education [22]. Often, faculty are willing, if not eager, to change instruction in ways that benefit students, but do not have the resources to implement such change.

Likewise, the authors of [24] observed the limitations of educational software that technically works, but isn't designed for the platforms that students and instructors readily have in hand. For example, while CalcPlot3D has always been free and open-source software, it was originally limited in its reach due to being written in Java. Once the application was rewritten in JavaScript (a similarly named but unrelated programming language), students and instructors were no longer required to be at a computer station with a Java runtime installed, but could instead use the program from any device with a web browser.

Partially supported by NSF DUE 2011807, Clontz has developed two software applications to support mathematics instruction and Team-Based Inquiry Learning (TBIL), a flavor of Team-Based Learning that was the focus of the University's most recent Quality Enhancement Project. The first is the CheckIt Platform [7], which allows instructors to write minimal code to generate randomized mathematics exercises that can be automatically exported not only to LaTeX/PDF for printing, but also published to the web as practice exercises, and to LMSes including Canvas, D2L, and Moodle. The second is Scratchee [9], a virtualization of the Instant Feedback Assessment Technique (IF-AT) [11] integral to TBIL. In addition, Clontz serves as a collaborator on the PreTeXt project [5], developing a user-friendly platform for authoring in the PreTeXt markup language, which produces both PDF and accessible HTML documents (including textbooks and research manuscripts, as well as this proposal [10]) from the same source. Furthermore, documents authored in PreTeXt can also be automatically published as Braille [2], an uncommon feature for commercial textbooks, much less free Open Educational Resources (OER), providing access to mathematics often out of reach to blind students.

In addition to supporting mathematics instruction directly, the CheckIt Platform is also being used to support Research in Undergraduate Mathematics Education (RUME). Exercises on the platform are designed to assess particular learning outcomes; in order to measure the effectiveness of instruction as part of DUE 2011807, CheckIt-generated assessments will be used at several campuses across the country. This allows instructors to administer as many versions of each exercise as needed for logistical purposes, while still ensuring that each version of the exercise measures exactly the same learning outcome.

Finally, the development of CheckIt itself raises several interesting RUME questions that Clontz aims to explore in future collaborations with education researchers. For example, the process of authoring an exercise that aims to assess a particular learning outcome is much simpler than authoring an exercise template to be seeded with randomized data. In mathematics this is sometimes achievable by simply randomizing numerical elements of the exercise; for example, the template {{a}}x + {{b}}y = {{c}} expressing the standard equation of a line might be randomized to \(4x+5y=-2\text{,}\) \(-3x+y=0\text{,}\) and so on. However, what constraints are appropriate for this randomization to ensure it still serves as a valid assessment of a given outcome? Certainly, it seems unlikely that examples such as \(531284127x-4512874312y=341893123\) are necessary. But should the line occasionally be expressed in slope-intercept form \(y=mx+b\) instead? And how can the stem of a question be randomized to ensure that students are synthesizing complete instructions, rather than only memorizing patterns developed from seeing solutions to similarly generated exercises?
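To make the question of constraints concrete, the following is a minimal Python sketch of seeding the template above under one possible set of constraints: nonzero coefficients on \(x\) and \(y\), and all values bounded in absolute value. It is not CheckIt's actual interface; the names format_term, render_line, and random_line, and the default bound of 9, are hypothetical choices made for illustration.

```python
import random


def format_term(coef: int, var: str, leading: bool) -> str:
    """Format a single term such as '4x', '-x', or '+5y'.

    Assumes coef is nonzero. A coefficient of 1 or -1 is written
    without the digit, and a leading positive term has no '+' sign.
    """
    sign = "-" if coef < 0 else ("" if leading else "+")
    magnitude = "" if abs(coef) == 1 else str(abs(coef))
    return f"{sign}{magnitude}{var}"


def render_line(a: int, b: int, c: int) -> str:
    """Fill the template ax + by = c with concrete integers."""
    return f"{format_term(a, 'x', True)}{format_term(b, 'y', False)}={c}"


def random_line(rng: random.Random, bound: int = 9) -> tuple[int, int, int]:
    """Draw (a, b, c) with a and b nonzero and every value in [-bound, bound].

    The bound is an illustrative constraint that keeps the arithmetic
    small enough to be incidental to the learning outcome.
    """
    nonzero = [k for k in range(-bound, bound + 1) if k != 0]
    a = rng.choice(nonzero)
    b = rng.choice(nonzero)
    c = rng.randint(-bound, bound)
    return a, b, c


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so a version can be regenerated
    for _ in range(3):
        print(render_line(*random_line(rng)))
```

Here render_line(4, 5, -2) yields 4x+5y=-2 and render_line(-3, 1, 0) yields -3x+y=0, matching the two randomizations mentioned above. Whether such bounds preserve the validity of the assessment is exactly the kind of question left open.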

Subsection 1.3 Free and Open-Source Software (FOSS)

The focus of this project is to produce Free and Open-Source Software that will benefit scholars, instructors, and students of mathematics. Frequently, the NSF requires that software products it funds be FOSS. It's worth clarifying what is meant by this.

Open-source software is most easily defined. All code written as part of this project will be made available to the public via Clontz's GitHub [8] (or other publicly available repositories as appropriate). This means that anyone will be able to obtain a copy of any software developed during this sabbatical, use this software to benefit their research or instruction, and contribute corrections or improvements to the codebase to benefit others.

The word free in FOSS does the heaviest lifting. Primarily, it means that this software will be explicitly licensed for free use and adaptation by anyone who wishes, removing legal barriers that might prevent its adoption by other researchers or instructors.

But for the purposes of this project, “free” also implies that the software, whenever possible, will be developed mindfully to avoid dependencies on non-free infrastructures. For example, technically the Canvas Learning Management System is FOSS [15]; however, that does not mean that it can truly be adopted without cost. Maintenance of a learning management system server incurs both technology costs and person-hour costs, which is why many campuses, including the University, simply pay Instructure to provide the service rather than use its FOSS directly. Technical debt can never be completely avoided; however, by making smart design decisions in the development of software packages that aren't intended to turn a profit, this debt can be kept minimal. To this end, most of the software produced will either be written in HTML/JavaScript, which can be freely hosted and run in any modern web browser, or will produce such static HTML/JS products for dissemination.