The Value of Making Your Data Sources Reusable across Test Automation Tools

[article]
Summary:
Many automation tools have a mechanism for storing data used in their test scripts. Typically, the specifics of these mechanisms differ across tools, making it difficult to use the data outside the tool itself. Using an external, reusable data source allows organizations to avoid the cost of migrating or duplicating existing data, thereby future-proofing their frameworks.

In most traditional automation approaches, the tools and the scripts built on them require data to interact with and check conditions on the system under test. This data typically falls into one of two categories: configuration data or oracle data.

Configuration data is just that: data that is used to configure the tool, the scripts, and the system under test (SUT), such as street addresses, credit card information, and user IDs. In the setup of more complex SUTs, the configuration data is typically also more complex and more voluminous. For example, a hospital management system could require many different patient types with unique names, identification numbers, diagnosis codes, lengths of stay, and other patient demographic information.
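To make that kind of configuration data portable, it can live in a format that no single tool owns. Below is a minimal sketch in Python, assuming a hypothetical patients.csv file with made-up column names; it illustrates tool-agnostic storage, not any particular tool's mechanism:

```python
import csv

# A hypothetical, tool-agnostic configuration file (patients.csv) might contain:
#   name,patient_id,diagnosis_code,length_of_stay
#   Jane Doe,100231,E11.9,4
with open("patients.csv", newline="") as f:
    patients = list(csv.DictReader(f))

# Any tool or script that can read CSV can consume the same data.
for patient in patients:
    print(patient["name"], patient["diagnosis_code"])
```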

In testing, an oracle can be broadly defined as a way to determine whether a test has passed or failed. Oracles play this key role in automation as well: the data in the oracle indicates whether a particular assertion, or a script as a whole, is considered to have passed or failed. Though there are many types of oracles, the type of interest here is one where we already have the expected result for specific inputs; based on Douglas Hoffman’s oracle descriptions in his article “Heuristic Test Oracles,” we’ll call this a sampling oracle.

As an example, suppose we need to test the behavior of a sum function. The function takes two integers as input and returns the sum of those integers as a result. Now, consider the following table:

Case | Input 1 | Input 2 | Expected Result
-----|---------|---------|----------------
1    | 0       | 0       | 0
2    | 1       | 0       | 1
3    | 0       | 1       | 1
4    | 1       | 1       | 2

In the table, each pair of inputs has an expected result. If we are automating the checking of the previously described sum function, the tool or script could iterate through the input values, apply the sum function to the pair of inputs, and compare the actual result of the function call with the corresponding expected result. If any of the cases produces a result that does not match the expected result, that case is reported as a failure. In that context, the table above is being used as a sampling oracle.
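Here is a minimal sketch of that iteration in Python. The hardcoded list of cases mirrors the table above, and sum_function is a stand-in for the real function under test; both names are illustrative assumptions, not any specific tool's API:

```python
# Stand-in for the function under test; illustrative only.
def sum_function(a, b):
    return a + b

# The sampling oracle from the table above:
# (case, input 1, input 2, expected result).
cases = [
    (1, 0, 0, 0),
    (2, 1, 0, 1),
    (3, 0, 1, 1),
    (4, 1, 1, 2),
]

# Apply the function to each pair of inputs and compare
# the actual result with the expected result.
for case_id, input_1, input_2, expected in cases:
    actual = sum_function(input_1, input_2)
    if actual != expected:
        print(f"Case {case_id} failed: expected {expected}, got {actual}")
```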

Clearly, this is a simplistic and incomplete example. Most of the products and features we test would require far larger and more complex sampling oracles. As with configuration data, more complex applications tend to have more complex sampling oracles.

Reusing Data: Save Time, Cost, and Trouble

Many automation tools have a mechanism for storing data that will be used in their test scripts; typically, the specifics of these mechanisms differ across tools, making each a proprietary mechanism. These proprietary mechanisms are convenient ways to store the data that needs to be used in test scripts, and they are frequently used to store both configuration data and oracle data. The challenge, however, is that it can be difficult to use this data outside the tool itself. Even if there is a mechanism to import and export the data, converting it into a format that’s appropriate for use in another tool is the user’s responsibility, and the effort for this conversion can be significant.
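As one illustration of the alternative, the sum oracle from earlier could live in a plain CSV file that no single tool owns. The file name and column names below are hypothetical; the point is simply that any tool or script that can read CSV can reuse the same data without a conversion step:

```python
import csv

# sum_oracle.csv is a hypothetical external data source, e.g.:
#   case,input1,input2,expected
#   1,0,0,0
#   2,1,0,1
with open("sum_oracle.csv", newline="") as f:
    for row in csv.DictReader(f):
        actual = int(row["input1"]) + int(row["input2"])
        if actual != int(row["expected"]):
            print(f"Case {row['case']} failed: "
                  f"expected {row['expected']}, got {actual}")
```

Because the data lives outside any one tool's proprietary store, moving to a different tool means changing only the code that reads the file, not migrating or duplicating the data itself.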

User Comments

Bill Roske:

Nice article, Paul!  Just one additional thought.  I tend to stay away from "local" files or packages, like .CSV or Excel.  When running tests in parallel, Excel is not very resource friendly. You get a new instance of Excel for each test.  And it can be a challenge to deal with collisions when opening shared files.  In addition, when suites scale and you start wanting to run on multiple machines (VMs?), you have to deploy those files as well, AND you may not have licenses for Excel on those other machines, either.   When data is being shared between tests and frameworks, the most central place would likely be a database architected specifically for that data.

January 23, 2017 - 1:20pm
Paul Grizzaffi:

Thanks Bill!

I agree with your points on Excel, but I differ on CSV. I've seen teams do manageable things with CSV files and avoid the need for a database. Typically they handle the local vs. non-local aspect by storing the CSV files in the repository with the test scripts so they are version controlled and deployable.

January 23, 2017 - 2:12pm
Mark Bentsen:

Great article, Paul. In reusing data, have you ever thought of it as a canary in the coal mine that lets you understand the expertise of the tester? Think of three scenarios. Someone who uses data that touches the boundaries of the logic or the equivalence partitions is likely more technical and understands exactly what is being decisioned in the code. A tester who uses data that is very "real world" likely understands the end user and has a solid customer perspective. Most importantly, testers who are just using varying non-realistic numbers signal they are looking at a black box and do not have expertise on what the customer needs and expects from the technology.

January 24, 2017 - 3:50pm
Paul Grizzaffi:

Thanks, Mark!

I never consciously thought about assessing a tester's expertise based on how they approach test data. It is an interesting thought, though. I've always kind of taken a holistic view of a tester's (or automator's) skill.

January 24, 2017 - 4:49pm
