
Section 1 Background

The term cyberinfrastructure traces its roots back to Presidential Decision Directive NSC-63 [27], and is commonly used today to refer to “computational systems, data and information management, advanced instruments, visualization environments, and people, all linked together by software and advanced networks to improve scholarly productivity and enable knowledge breakthroughs and discoveries not otherwise possible” [14]. What cyberinfrastructure “looks like” varies greatly between disciplines, but one constant is the use of modern technologies that make the work of scholars more efficient, thereby producing more and better advancements in STEM. As such, the National Science Foundation [20] and other federal agencies have made the development of cyberinfrastructure a key priority in supporting STEM scholarship.

The NSF has doubled down on this commitment with the establishment of its “Technology, Innovation and Partnerships” directorate in Spring 2022, the first new NSF directorate in over 30 years [24]. Included in this directorate is the Pathways for Open-Source Ecosystems solicitation [22], whose first round of proposals was due in May 2022.

Subsection 1.1 Cyberinfrastructure in Mathematics Research

One of the features of many areas of mathematics, including the applicant's area of general topology, is the expansive breadth of open problems that still technically require just pencil and paper. Nonetheless, as technology advances, we are seeing more uses for tools beyond the blackboard for making advancements in even abstract mathematics. One obvious example is the use of high-performance (or even conventional) computing to brute-force the exploration of finite mathematical spaces [4].
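As a small illustration of what brute-force exploration of a finite space can look like, the following Python sketch counts the topologies on a labelled set of \(n\) points by testing every candidate family of subsets. The problem and the function names `is_topology` and `count_topologies` are chosen here for illustration only; this is not the computation carried out in [4].

```python
from itertools import combinations


def is_topology(points, opens):
    """Return True if `opens` (a set of frozensets) is a topology on `points`."""
    if frozenset() not in opens or points not in opens:
        return False
    # For a finite family, closure under pairwise unions and
    # intersections implies closure under all unions and intersections.
    return all((a | b) in opens and (a & b) in opens
               for a in opens for b in opens)


def count_topologies(n):
    """Count topologies on the labelled set {0, ..., n-1} by exhaustive search."""
    points = frozenset(range(n))
    subsets = [frozenset(c)
               for r in range(n + 1)
               for c in combinations(range(n), r)]
    # The empty set and the whole space are always open, so only the
    # remaining subsets need to be chosen.
    optional = [s for s in subsets if s and s != points]
    count = 0
    for r in range(len(optional) + 1):
        for chosen in combinations(optional, r):
            opens = set(chosen) | {frozenset(), points}
            if is_topology(points, opens):
                count += 1
    return count


if __name__ == "__main__":
    for n in range(1, 4):
        print(n, count_topologies(n))
```

For \(n = 2\) and \(n = 3\) the search reports 4 and 29 topologies, matching the known counts. The number of candidate families grows doubly exponentially in \(n\), which is why larger cases call for smarter algorithms or high-performance computing.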

But perhaps underrated are the systems mathematicians use to share cutting-edge advancements and collaborate on them with the research community. Mathematics written on paper or a chalkboard cannot travel far on its own; generally, mathematicians use software such as LaTeX to typeset proofs and formulas for distribution in PDF format. Then, rather than asking colleagues from across the country to physically visit their office computer, the researcher uses technologies such as email, a personal website, and/or a preprint server to disseminate their work. High-quality work must then be submitted to a journal to be vetted by peers, a process handled sometimes through email, and other times via non-trivial content management systems hosted in the cloud. These results are then published, but this content is now burned into inaccessible (in many senses of the word) PDFs locked behind paywalls. A typical strategy for finding recent developments in mathematics is to Google keywords, but a search for “selection games”, an area of mathematical game theory studied by the applicant, quickly reveals the limitations of this approach.

One approach to overcoming this limitation is the development of databases of mathematical objects. As of writing, the Catalogue of Mathematical Datasets [6] enumerates \(84\) such tools. One of these is the \(\pi\)-Base (a.k.a. pi-Base) community database of topological counterexamples [13], created by software engineer James Dabbs. Clontz joined the project in 2017 to serve as its lead mathematical editor. In the beginning the database was treated similarly to a wiki, with unreviewed contributions made by several mathematicians, students, and other unknown contributors. To address this lack of review, changes were made to the software to enforce a level of peer review, and Clontz was awarded an internal Faculty Development Council grant to hire a student to audit and expand the content of the database.

This effort was successful, and today the pi-Base is commonly cited on mathematics discussion boards used by students and researchers alike [28] [29]. However, it also exposed critical inefficiencies in the user experience (UX) of both contributing to the database and reviewing contributions.

In [7], Buzzard points to the \(\pi\)-Base as an important example of how semantic search helps researchers query the literature more quickly and thoroughly than standard search engines. Furthermore, he points to the increasing use of computer-verified proof techniques as another emerging element of cyberinfrastructure in mathematics research. In particular, the Lean Prover [19] from Microsoft Research and the mathlib [17] library of mathematics written in Lean are becoming the de facto standard for formalization of mathematical results that can be verified by computer.
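To make the idea of semantic search concrete, the sketch below shows one way a database of spaces, properties, and theorems could answer a query that a keyword search cannot: it derives unstated properties from recorded ones before matching. The data layout and the names `SPACES`, `THEOREMS`, `deduce`, and `search` are assumptions made for this illustration and do not describe the actual pi-Base software or its data model.

```python
# Hypothetical miniature database: each space records only some properties.
SPACES = {
    "real line": {"metrizable": True, "second countable": True, "compact": False},
    "closed unit interval": {"metrizable": True, "compact": True},
    "Sierpinski space": {"compact": True, "Hausdorff": False},
}

# Standard implications, stored as (hypothesis, conclusion) pairs.
THEOREMS = [
    ("metrizable", "Hausdorff"),
    ("compact", "Lindelof"),
    ("second countable", "Lindelof"),
]


def deduce(traits, theorems):
    """Extend a {property: bool} record using implications and contrapositives."""
    known = dict(traits)
    changed = True
    while changed:
        changed = False
        for hypothesis, conclusion in theorems:
            if known.get(hypothesis) is True and conclusion not in known:
                known[conclusion] = True
                changed = True
            if known.get(conclusion) is False and hypothesis not in known:
                known[hypothesis] = False
                changed = True
    return known


def search(spaces, theorems, query):
    """Return names of spaces whose recorded or derived properties match `query`."""
    results = []
    for name, traits in spaces.items():
        known = deduce(traits, theorems)
        if all(known.get(prop) is value for prop, value in query.items()):
            results.append(name)
    return results


if __name__ == "__main__":
    # Hausdorff and Lindelof are never recorded directly for the real line;
    # both are derived from the theorems before the query is matched.
    print(search(SPACES, THEOREMS,
                 {"Hausdorff": True, "Lindelof": True, "compact": False}))
```

In this toy example the query for a Hausdorff, Lindelöf, non-compact space returns only the real line, even though two of those three properties appear nowhere in its record.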

Subsection 1.2 Cyberinfrastructure in STEM Higher Education

Logistics are frequently a limiting factor in the adoption of evidence-based practices in instruction, particularly in undergraduate mathematics education [25]. Often, faculty are willing, if not eager, to change instruction in ways that benefit students, but do not have the resources to implement such change.

Likewise, the authors of [26] observed the limitations of educational software that technically works, but isn't designed for platforms that are readily in the hands of students and instructors. For example, while CalcPlot3D has always been free and open-source software, it was originally limited in its reach due to being written in Java. By rewriting the application in JavaScript (a similarly-named but unrelated programming language), students and instructors were no longer required to be at a computer station with a Java runtime installed, but could instead utilize the program from any device with a web browser.

Partially supported by NSF DUE 2011807, Clontz has developed two software applications to support mathematics instruction and Team-Based Inquiry Learning (TBIL), a flavor of Team-Based Learning that was the focus of the University's most recent Quality Enhancement Project. The first is the CheckIt Platform [8], which lets instructors write minimal code to generate randomized mathematics exercises that can be automatically exported to LaTeX/PDF for printing, published to the web as practice exercises, and exported to LMSes including Canvas, D2L, and Moodle. The second is Scratchee [10], a virtualization of the Instant Feedback Assessment Technique (IF-AT) [12] integral to TBIL. In addition, Clontz serves as a collaborator on the PreTeXt project [5], developing a user-friendly platform for authoring the PreTeXt markup language that produces both PDF and accessible HTML documents (including textbooks and research manuscripts) from the same source, including this proposal [11]. Furthermore, documents authored in PreTeXt can also be automatically published as Braille [2], an uncommon feature for commercial textbooks, much less free Open Educational Resources (OER), providing access to mathematics often out of reach to blind students.

In addition to supporting mathematics instruction directly, the CheckIt Platform is also being used to support Research in Undergraduate Mathematics Education (RUME). Exercises on the platform are designed to assess particular learning outcomes; in order to measure the effectiveness of instruction as part of DUE 2011807, CheckIt-generated assessments will be used at several campuses across the country. This allows instructors to administer as many versions of each exercise as needed for logistical purposes, while still ensuring that each version of the exercise measures exactly the same learning outcome.
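The idea of many versions measuring one outcome can be sketched in a few lines of Python. The function `generate_exercise` and its output format are hypothetical and are not the CheckIt Platform's actual API; the sketch only assumes that each version is produced from a seed, so that every seed yields a different exercise of the same form.

```python
import random


def generate_exercise(seed):
    """Build one version of 'solve ax + b = c for x' from a seed.

    Hypothetical format for illustration; not the CheckIt API.
    """
    rng = random.Random(seed)  # private generator, so each seed is reproducible
    a = rng.randint(2, 9)      # coefficient of x
    x = rng.randint(-9, 9)     # intended solution
    b = rng.randint(1, 9)      # constant term
    c = a * x + b              # right-hand side is computed, so x always solves it
    return {
        "a": a, "b": b, "c": c, "answer": x,
        "prompt": f"Solve {a}x + {b} = {c} for x.",
    }


if __name__ == "__main__":
    for seed in range(3):
        version = generate_exercise(seed)
        print(version["prompt"], "Answer:", version["answer"])
```

Because the numbers vary while the structure of the task is fixed, every version assesses the same skill (solving a one-variable linear equation), and regenerating with the same seed reproduces the same version for grading or review.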

Subsection 1.3 Free and Open-Source Software (FOSS)

The focus of this project is to produce Free and Open-Source Software that will benefit scholars, instructors, and students of mathematics. Frequently, the NSF requires that software products it funds be FOSS. It's worth clarifying what is meant by this.

Open-source software is most easily defined. All code written as part of this project will be made available to the public via Clontz's GitHub [9] (or other publicly available repositories as appropriate). This means that anyone will be able to obtain a copy of any software developed during this sabbatical, use this software to benefit their research or instruction, and contribute corrections or improvements to the codebase to benefit others.

The word free in FOSS does the heaviest lifting. Primarily, it means that this software will be explicitly licensed for free use and adaptation by anyone who wishes, removing legal barriers that might prevent its adoption by other researchers or instructors.

But for the purposes of this project, “free” also implies that the software, whenever possible, will be developed mindfully to avoid dependencies on non-free infrastructures. For example, technically the Canvas Learning Management System is FOSS [16]; however, that does not mean that it can truly be adopted without cost. Maintenance of a learning management system server incurs both technology costs and person-hour costs, which is why many campuses, including the University, simply pay Instructure to provide the service rather than utilize its FOSS directly. Technical debt can never be completely avoided; however, by making smart design decisions in the development of software packages that aren't intended to turn a profit, this debt can be kept minimal. To this end, most of the software produced will either be written in HTML/JavaScript, which can be freely hosted and run in any modern web browser, or will produce such static HTML/JS products for dissemination.

Finally, NSF's recognition of the critical role FOSS products play in the cyberinfrastructure of STEM research is evidenced by solicitations such as its new Pathways for Open-Source Ecosystems solicitation mentioned earlier. Clontz's $266K one-year Phase I proposal as PI to establish an Open-Source Ecosystem for the PreTeXt community has been recommended for funding by an NSF program officer as part of the inaugural round of awards, and will lead to a $1.5M two-year Phase II proposal to be submitted in Fall 2023.