In the current age of computing, the value of software development and computer programming is without question. After a periodic lull following the dot-com era, the occupation outlook for programmers continues to be extremely strong. Software developers and engineers continue to be regarded as some of the best and most in-demand jobs by Glassdoor Best Jobs in America, LinkedIn Most in demand jobs, and US News 100 Best Jobs. The US Bureau of Labor Statistics projects that the job outlook for Software Developers, Quality Assurance Analysts, and Testers will increase 22% from 2020-2030, which much higher than the overall average. With the high demand for developers, companies have found it difficult to implement the changes they need to not only maintain their existing systems, but also to keep pace with competitors as the demand for digitization increases post-COVID.
It is unrealistic to expect that business and industry experts will spend additional time to learn the tools and processes necessary to develop software to implement the calculations and logic specific to their subject domain. This is where no-code platforms have come to the forefront – to help smaller teams or “citizen developers” to implement solutions that may be too costly or complex to be done as part of a large-scale IT project. However, the most complex part of no-code is not necessarily the user interface, the orchestration of different activities, or even the storage of data. The most challenging task is to be able to define the logic that drives the application in a way that is easy for users to maintain.
With Coherent Spark, we enable users to take Microsoft Excel spreadsheet files and to convert them into customizable and easily consumable APIs. Why the focus on Microsoft Excel? The adoption of Excel has been extremely rapid given its initial introduction to Windows with Excel 2.0 in 1987. In 1996, Microsoft reported that Excel was the world’s most popular spreadsheet program with more than 30 million users. In 2016 Microsoft reported that “More than 1.2 billion people use Microsoft Office in 140 countries and 107 languages around the world.” With Excel’s capability to define calculations and manage data, Excel is an accessible way to manage logic and calculations. Given its popularity and familiarity with so many users, Excel is an ideal candidate to be the starting point behind solutions that scale from small citizen developed no-code solutions to complex transactions for the world’s largest corporations.
This paper is separated into 2 parts. In Part I we establish why Excel is a programming language from a Computer Science and Software Engineering perspective. In Part II we describe how Coherent Spark can help users overcome End User Computing risk and empowers them to apply Continuous Integration / Continuous Development processes in an Excel context.
Part I: Excel is a programming language
Programming language concepts
“A programming language is a formal language comprising a set of strings that produce various kinds of machine code output. Programming languages are one kind of computer language, and are used in computer programming to implement algorithms.” (Wikipedia)
Programming languages have been around even before computers and have continued to evolve with advances in Computer Science. Although some people may only consider something a programming language if it is complex and requires coding, Excel itself is a programming language (even when excluding Visual Basic for Applications and the new Office Script). Felienne Hermans, Computer Science professor at Leiden University has conducted several presentations about Excel and why she believes it is important for traditional programmers to learn. In her talk “Spreadsheets are Code” she summarized why she believes Excel workbooks form a programming language:
- They are used for similar problems when compared to traditional programming languages.
- Excel formulas are Turing complete. This means that Excel formulas can “approximately simulate the computational aspects of any other real-world general-purpose computer or computer language” (Wikipedia). She represented the operations of a computer using Excel functions.
- They also suffer from the same problems – challenging documentation, potentially a broad set of users for workbooks, and having a long shelf life.
From a Coherent perspective, Excel is a programming language simply because it is a way for people define calculations and logic. If Excel workbooks are a programming language, then Excel users are programmers as well! With this mindset we can take a broader view of Excel and leverage concepts from Computer Science and Software Engineering to understand how spreadsheets can be better used and developed.
It is therefore useful to apply some programming classification to Excel to understand how it fits within the spectrum of programming languages:
Classification | Description |
---|---|
High-level (vs low-level) | The need to interact with how a computer operates and the use of machine code understandable by the microprocessor and memory has been abstracted from the user. Generally high-level programming languages are easier to write, debug and trace while sacrificing some potential performance. Performance concerns are alleviated with improvements in compiler developments and general advances in computing hardware. |
Declarative (vs Imperative) | In older iterations of Excel, it would be considered more as a declarative programming language as it focuses on the outcome of a calculation without so much consideration of how the result is delivered and the control flow. With the recent introduction of LET and LAMBDA functions, an argument could be made that Excel contains some traces of imperative programming concepts. More will be discussed in the section Recent Excel enhancements. |
Interpreted (vs compiled) | Excel workbooks are run by the Excel application which acts as an interpreter. In more general terms the interpreter then converts the program into machine code for the computer. Compare this to compiled programming languages which require conversion to machine code before execution. Compiled code technically performs faster however takes more time to debug and evaluate execution. |
General purpose (vs domain specific) | We consider Excel to be a general-purpose programming language which can be used for many different applications. Examples of domain specific languages would again be HTML and SQL which focus on specific applications. |
It is also an interesting comparison to understand how Python, one of the most popular programming languages used today, is classified as a programming language. Python is also a high-level interpreted general purpose programming language. Python is not popular because of its underlying speed but more for its relative ease of use, general purpose nature, and broad applications. Despite not being the fastest or the most efficient, it is used for extremely complex problems including acting as the primary interface for Big Data and Deep Learning applications.
Excel language evaluation1
In “Concepts of Programming Languages Eleventh Edition” by Robert W. Sedra, the author defines 3 core criteria readability, writability and reliability and 11 characteristics to evaluate programming languages (Sedra, 1.3 Language Evaluation Criteria). For defining calculations and logic, we assert that Excel is readable, writable, and a reliable programming language. The evaluation was also made with consideration of the programming experience with the Excel application as an Integrated Development Environment (IDE).
Readability
One of the advantages of Excel to more traditional code is that the “coding area” is 3-dimensional. Formulas can be arranged in 2-dimensions within a single worksheet or across multiple worksheets or even workbooks as an additional dimension. This makes it possible to layout the meaning and significance of the code in a manner that is easier for others to read but also intuitive to flip between different sections of code more easily or introducing visual elements. There are also many ways of commenting code in Excel, either directly in the worksheet or via notes and comments. This contrasts with the traditional program which is written as a top-to-bottom document and across different files. This is not to imply all Excel files are easy to read, these features can also be misused, like how regular code can be poorly written and documented. Finally Excel also includes interactive tools to make it easy to read, trace, and evaluate functions and programs.
Excel meets all the characteristics of being a readable programming language:
Characteristic | Description |
---|---|
Simplicity |
“A language with a large number of basic constructs is more difficult to learn than one with a smaller number” (Sedra, 1.3.1.1 Overall Simplicity)
Although Excel has over 480 functions, there are only 11 logical functions that are used to control the execution of program flow. User written formulas follow basic algebraic principles and the in-built Excel functions are called using a similar structure. |
Orthogonality |
“Relatively small set of primitive constructs can be combined in a relatively small number of ways to build the control and data structures of the language.” (Sedra, 1.3.1.2 Orthogonality)
Unless specified for particular functions, Excel formulas and functions can be applied against different constant values entered as parameters, individual cell references, range references (multiple cell references), and named ranges. This ability to mix and match makes the language more easily read and written. A good example of this capability is the SUM function accepts both individual numbers, individual cell references, range of cell references. This is superior to having a SUM function that would only work for each specific case. |
Data types |
“The presence of adequate facilities for defining data types and data structures in a language is another significant aid to readability.” (Sedra, 1.3.1.3 Data Types)
While Excel is not a strictly typed language, it does have different “Number Categories” present in the properties of a cell. These are important as certain functions and operators may only match to certain “Number Categories”. |
Syntax design |
“Special words. Program appearance and thus program readability are
strongly influenced by the forms of a language’s special words… Form and meaning. Designing statements so that their appearance at least partially indicates their purpose is an obvious aid to readability. Semantics, or meaning, should follow directly from syntax, or form.” (Sedra, 1.3.1.4 Syntax Design)
The Excel syntax follows algebraic principles using mathematical operators +-*/ and brackets (). Built-in functions usually have names that suggest what they do and are reserved words that are represented in CAPITAL LETTERS. Parameters of a function are always defined with brackets () following the named function. |
Writability
The important characteristics noted in readability also apply to the writability of a programming language. In addition to the above characteristics, a good programming language with writable characteristics would include:
Characteristic | Description |
---|---|
Support for abstraction |
“The abstraction principle can be generalized as the "don't repeat yourself" (DRY) principle, which recommends avoiding the duplication of information in general, and also avoiding the duplication of human effort involved in the software development process.” (Wikipedia)
This is an area in which Excel historically has been weak as formulas are repeated many times especially for applications related to data processing. This is significantly improved with the introduction of the LAMBDA function described in Recent Excel enhancements. |
Expressivity |
“Language has relatively convenient, rather than cumbersome, ways of specifying computations” (1.3.3.2 Expressivity)
Although Excel does not have necessarily many short forms in expressing formulas, the broad function set does allow for relatively compact methods of expressing a computation. Furthermore, the programming canvas of a worksheet enables additional capabilities not present in typical code. |
Reliability
The important characteristics noted in readability and writability also apply to the reliability of a programming language. Excel has been around for more than 30 years and provides backwards compatibility to deprecated functions more than 20 years old.
Characteristic | Description |
---|---|
Type checking |
“Type checking is simply testing for type errors in a given program, either by the compiler or during program execution.” (Sedra, 1.3.3.1 Type Checking)
Although as noted above that Excel is not a strictly typed language, type checking is still important as we need to ensure that formulas and functions are being applied to the correct type. For example, the SUM function cannot be applied to text. Excel resolves a traceable #VALUE! error when this occurs. |
Exception handling |
“The ability of a program to intercept run-time errors (as well as other unusual conditions detectable by the program), take corrective measures, and then continue is an obvious aid to reliability. (Sedra, 1.3.3.2 Exception Handling)
Excel reports formula errors and offers ability to calculate around or ignore these errors as defined by the user, e.g., using IFERROR, SUBTOTAL, AGGREGATE functions. |
Restricted aliasing |
“Aliasing is having two or more distinct names in a program that can be used to access the same memory cell.” (Sedra, 1.3.3.3 Aliasing)
In Excel it is possible to define overlapping Named Ranges that refer to the same range of values. The consequence of doing so is relatively transparent given the reference using Named Ranges is done by values rather than by reference. |
Software design perspective
If we consider Excel as a good programming language, it should also support good software development practices. In “Code Complete Second Edition” by Steve McConnell, he describes the “Desirable Characteristics of a Design” (McConnell, Chapter 5). A few key points are highlighted with respect to the how Excel is compatible with effective software design.
Characteristic | Description |
---|---|
Minimal complexity |
“Avoid making “clever” designs. Clever designs are usually hard to understand.”
The framework of functions and well-understood approaches in Excel makes the construction and understanding of Excel files less complex. Worksheets can also be laid out such that even “clever” designs can be explained in a logical manner than if they were laid out in code, especially for users who are not familiar with more traditional programming. |
Ease of maintenance |
“Ease of maintenance means designing for the maintenance programmer. Continually imagine the questions a maintenance programmer would ask about the code you’re writing.”
In a well-designed Excel file, assumptions and data can be clearly delineated from the formulas that depend on them while also seeing the inter-related changes. This makes it easy to manage changes to the different components. Because they can be contained together it is possible to trace calculations from end to end using Excel’s Formula Auditing tools. Even for the difficult spreadsheets, they are easier to trace than more complex software structures. |
Extensibility |
“Extensibility means that you can enhance a system without causing violence to the underlying structure.”
This is a corollary to the Ease of maintenance. It is easy to add on additional components to an Excel file, especially if the data and formulas are well defined. |
Portability |
“Portability means designing the system so that you can easily move it to another environment.”
Excel files are easily shared between computers and environments given the key dependency is only to have Excel. Microsoft also generally has maintained exceptional backwards compatibility for Excel. |
Standard techniques |
“Try to give the whole system a familiar feeling by using standardized, common approaches.”
Microsoft maintains the standard function library for Excel techniques and practices for Excel are well established and communicable to other users. |
Recent Excel enhancements
After the Office 2007 release which included the introduction of the XML file format and the ribbon user interface, the Excel application did not see too many enhancements over the next decade. The emphasis was placed more around Microsoft’s Power BI suite that worked around the capabilities of Excel. However this has changed in recent years with the introduction of some new elements that enhance the programmability of spreadsheets while also making them easier to maintain and extend.
LET and LAMBDA
The LET and LAMBDA functions offer a completely new way to write formulas in a more readable and reusable way by including Anonymous functions. In Computer Science, this refers to functions that do not need to be assigned an identifier. This means that they can be applied in a temporary manner.
The LET function makes it easier to define a formula by making it easy to define local variables within a formula. An example of a piecewise function for the sample spreadsheet, there are a number of terms that are repeated. Without using named ranges, formulas are not very natural to trace using cell addresses. With the LET function, new intermediate terms can be defined and reused, and cell addresses can be renamed to variable names. This also removes the need to link to dependent cells many times to facilitate tracing of formulas.
Piecewise function
Sample Excel worksheet
Formula without LET | With LET function |
---|---|
= IFS(
(B1*B2)^3>= 10, (B1*B2)^2+B1^2, (B1*B2)^3>=-10, (B1*B2)^3+B2^2, (B1*B2)^3) |
= LET(X, B1, Y, B2, XY, B1*B2,
IFS( XY^3>= 10, XY^2+X^2, XY^3>=-10, XY^3+Y^2, XY^3 )) |
The LET formula is useful for individual formulas, but if this formula needs to be repeated frequently, having the LET function does not making it easier to reuse this function. The LAMBDA function allows users to create reusable user-defined functions without the need for Visual Basic for Applications. An example of how the LAMBDA function could be used is shown below. This will empower much cleaner spreadsheets and the ability to centrally modify commonly used functions in a workbook.
Define a Named Range PIECEW_XY | Using PIECEW_XY in a formula |
---|---|
= LAMBDA(X, Y,
LET(XY, X*Y, IFS( XY^3>= 10, XY^2+X^2, XY^3>=-10, XY^3+Y^2, XY^3))) |
= PIECEW_XY(B1, B2) |
The LAMBDA function has been further extended by helper functions which can apply a LAMBDA function across a range of data. This eliminates the need to use workarounds that relied on the SEQUENCE function to perform some of these behaviours. Some other interesting applications of LAMBDA functions when combined with Dynamic Arrays include the ability to define PivotTable-like objects.
Show Changes
For Excel files that have been uploaded to OneDrive or SharePoint it is now possible to show the historical changes made to the file in Show Changes. It is not yet Git for Excel but this is a great step in improving auditability and collaboration in a workbook and addressing the issue of unaudited and untraceable changes.
Dynamic array functions
Dynamic array functions allow users to reduce the overhead needed to manage data that may change size depending on other data or inputs. This works along with the spilled range operator to make it easier to apply formulas dependent on a dynamic array. Below the Dynamic Array function SEQUENCE is used to create a list of 3 values and their square roots, with only 2 formulas defined by the user. Changing the value of ‘n’ in B3 can extend or shorten the square root table.
Data Types
Data Types allow the definition of object models which may have different properties. For example, the built-in Stocks Data Type includes properties such as price, previous close, 52 week high/low, etc.
Office Script
For users who require additional programming capabilities that work online in Office 365, Microsoft has developed Office Script. It has some overlapping capabilities with Visual Basic for Applications. Script Lab is an Add-in to help users learn and write Office Script.
Part II: Excel development with Spark
How Coherent Spark can help to overcome EUC risk
End User Computing (EUC) applications, such as Excel, are used in many organizations to develop new applications that are independent of traditional IT development processes. This relates to the concept of the Citizen Developer noted in the Overview. From Deloitte’s paper “End User Computing Solving The Problem”, they noted that: “User-developed and user-controlled applications, by definition, are not subject to the same development, monitoring, and reporting rigor and control as traditional applications. And often, management lacks visibility into exactly how pervasive the use of EUCs has become throughout the enterprise.”
With our cloud-based logic engine, Coherent Spark, we take Excel files and convert them into easily consumable APIs. This enables users to translate their business logic into something that can be more formally consumed in an application. To address EUC risk, we developed several features within Coherent Spark to help make Excel files more enterprise-level.
- Logging and hashing – All actions on Coherent Spark are logged and given a hashed identifier on the system. This enables the ability to trace actions that are made to the API services created.
- Versioning and effective dating – API services created on Coherent Spark are versioned with all historical versions accessible on the system. Furthermore, versions can have effective dates applied to them to control over which dates and times they will respond.
- Testing Center – Coherent Spark has a comprehensive framework for applying a model across every record of a dataset. This allows users to create test cases and easily compare them against the values they see in Excel.
- Data Dictionaries – The ability to define a consistent API across different Excel models helps to ensure there is consistency between work that is done across teams.
Excel development in consideration of Coherent Spark
When an Excel file is being used to developing an application that leverages Coherent Spark for calculation and logic, the way the file can be setup can be greatly simplified.
When Excel files are used to create EUC applications, a lot of additional work is required to build a user interface in the file. This may involve worksheets that look like user interfaces or the use of ActiveX controls and Visual Basic for Applications. Sheets also require additional protection to prevent users from interacting with sections of the spreadsheet they are not supposed to or to prevent logic from being inadvertently distributed to users. When an application is developed in conjunction with Coherent Spark, there is no longer the need to build the user interface elements into Coherent Spark as the UI would be built separately, either through Coherent’s Form Builder technology or any other no-code applications that can be integrated with Coherent Spark.
A key capability of Coherent Spark’s Testing Center is that it allows iteration of a model against a dataset. This capability can extend to simulations or scoring of models that may be difficult to define even using code in Data Science programming languages such as Python and R. This helps to reduce the need to create large spreadsheets that are capable of ingesting large datasets. The Excel calculation and logic can be streamlined to focus on how a single calculation is performed. This improves readability and auditability of the Excel files.
Where a calculation is relatively complex and is separated into multiple Excel files for easier management across teams, Coherent Spark also enables models to be chained such that the outputs from one model flow into a subsequent model. This enables some encapsulation capabilities into Coherent Spark in a more controlled manner. Therefore, models that may have been previously connected via External Links (which are inherently brittle) can be connected in a better way with Chaining.
Coherent Spark as CI/CD for Excel
“Continuous integration is a DevOps software development practice where developers regularly merge their code changes into a central repository, after which automated builds and tests are run.” (AWS)
“Continuous delivery is a software development practice where code changes are automatically prepared for a release to production.” (AWS)
Continuous Integrations/Continuous Delivery (CI/CD) is a concept that may be new to many, but very familiar to those who work in modern Development Operations (DevOps) teams. CI/CD practices have been implemented in many organizations to improve the quality of software delivery.
In a team that practices Continuous Integration, the changes the development teams make are frequently included in build systems and tested. After a developer has completed their code, they run unit tests of their changes on their personal environment. If these succeed, the code is then checked into a central build server for further unit testing. In practice this may be done more in daily cycles rather than in a truly continuous manner, however the key point is to maintain a fast and frequent process of including new code changes for testing. This aids in identifying issues earlier, especially those arising from conflicting code between developers. The opposite of Continuous Integration would be a development process that waits over a long period of time to accumulate different code changes before creating a testing build. There are risks that code built out over a long period of time will have more issues, like potential conflicts between code, and will therefore take much longer to test and resolve issues.
In Continuous Delivery, finalized code from the Continuous Integration process can be deployed in an automated fashion to higher level environments e.g., Staging, User Acceptance, Production. This facilitates testing from a broader view of users and the ability to deploy changes big or small to customers much faster than if they were conducted using more manual processes.
At Coherent we believe that Coherent Spark is the CI/CD for Excel. When users are satisfied with their Excel files (similar to the personal unit testing), the model can be uploaded to Coherent Spark and immediately converted into an API. This enables further unit testing as well as bulk testing via the Testing Center. The converted API is immediately documented and ready to be consumed. Any further iterations of models are also instantly converted into API updates that can be used by downstream applications without any interruption. By facilitating a rapid change and delivery process, this enables users to share their calculations and logic in a more consistent and broader way that also promotes rapid testing and deployment of new calculations and enhancements.
Conclusions
Excel deserves more attention as a programming language and one that can help users to get things done. Its properties as a programming language can be classified and analyzed using standard Computer Science principles. Not only that but Excel supports good software design principles to enable users to develop robust software. Microsoft in recent years have also invested resources into advancing Excel to improve its programmability and capabilities to keep evolving it with the needs of users and enabling new use cases and functions with Excel.
With Coherent Spark and our capabilities to convert Excel calculations and logic into APIs, we open the doors to making Excel a viable platform for enterprise-grade development. Coherent Spark helps to address end-using computing risk by allowing for collaboration within a framework that versions, hashes, and logs actions on the platform. We make it easy for Excel users to realize their development aspirations much faster and even easier than other no-code platforms and extending their capabilities to conduct continuous integration and continuous development with Excel.
Simon Tam
Coherent Spark Product Director
Simon is the Product Manager for Spark at Coherent, leading a team to develop the platform’s features and capabilities. He is a qualified actuary and with 15+ years’ experience in the Property & Casualty / General Insurance space with an exclusive focus on pricing, data and analytics. Having held roles in Canada, UK for a big 4 consultancy across banking and insurance, and a multi-national insurer based in Hong Kong, Simon has significant experience in understanding the challenges that analytical and business users face in executing and deploying calculations and logic.
-
This author does not have any more posts.