
What is Software Testing?

Software testing is the practice of:

  • validating goodness and identifying badness

  • in software product code and features

  • for the purpose of enforcing high quality standards

There are 3 fundamental categories of software testing:

  1. Functional: Does the software work correctly?

  2. Performance: Does the software work within desired system metrics?

  3. Experimental: Does the software work with improved feature metrics after a change?

Functional Testing

Functional testing determines if software features work correctly. Each functional test yields a deterministic pass/fail result.

“Correctness” means mutually-agreed goodness. It may be determined by:

  • requirements

  • specifications

  • test cases and plans

  • domain knowledge

  • user expectations

  • common sense

Modes and Methods

There are two modes of functional testing:

  1. Scripted: Test cases are written, reviewed, and then run according to plan.

  2. Exploratory: Experts interact with product features without a script in attempts to uncover issues.

There are also two methods for running functional tests:

  1. Automated: Test software runs tests automatically without manual intervention.

       • Best at being defensive to protect software against regressions.

       • A good fit for scripted tests.

  2. Manual: A human tester exercises system behavior through direct interaction.

       • Best at being offensive to find bugs in new features.

       • A good fit for exploratory tests and for scripted tests that are difficult to automate.

The methods and modes are complementary: one does not supplant another.

Levels and Layers

There are two access levels for functional testing:

  1. White-box: Tests interact directly with product code, thus covering code.

  2. Black-box: Tests interact with a live instance of the built product, thus covering features.

There are also three layers for functional testing:

  1. Unit: Very short white-box tests for individual “units” (functions, methods, or classes) of code.

  2. Integration: Black-box tests that cover where two components come together (often service layer).

  3. End-to-End: Lengthier black-box tests that cover an execution path through a system (often Web UI layer).

These three layers form the Testing Pyramid.


The “Testing Pyramid” is an industry-standard guideline for functional test case development. Love it or hate it, the Pyramid has endured since the mid-2000s because it continues to be practical. So, what is it, and how can it help us write better tests?


The Testing Pyramid has three classic layers:

  • Unit tests are at the bottom. Unit tests directly interact with product code, meaning they are “white box.” Typically, they exercise functions, methods, and classes. Unit tests should be short, sweet, and focused on one thing/variation. They should not have any external dependencies – mocks/monkey-patching should be used instead.

  • Integration tests are in the middle. Integration tests cover the point where two different things meet. They should be “black box” in that they interact with live instances of the product under test, not code. Service call tests (REST, SOAP, etc.) are examples of integration tests.

  • End-to-end tests are at the top. End-to-end tests cover a path through a system. They could arguably be defined as a multi-step integration test, and they should also be “black box.” Typically, they interact with the product like a real user. Web UI tests are examples of end-to-end tests because they need the full stack beneath them.

All layers are functional tests because they verify that the product works correctly.
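To make the bottom layer concrete, here is what a unit test might look like under these definitions. This is a minimal pytest-style sketch; the `discount_price` function is a hypothetical piece of product code, not from any real project:

```python
# test_pricing.py -- a minimal white-box unit test (hypothetical example).
# It calls product code directly and checks one focused behavior,
# with no external dependencies.

def discount_price(price, percent):
    """Product code under test: apply a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_discount_price_applies_percentage():
    # Short, sweet, and focused on one thing/variation.
    assert discount_price(100.00, 25) == 75.00
```

An integration test for the same logic would instead call a live service endpoint that uses `discount_price` internally, and an end-to-end test would exercise it through the checkout page in a browser.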


The Testing Pyramid is triangular for a reason: there should be more tests at the bottom and fewer tests at the top. Why?

  1. Distance from code. Ideally, tests should catch bugs as close to the root cause as possible. Unit tests are the first line of defense. Simple issues like formatting errors, calculation blunders, and null pointers are easy to identify with unit tests but much harder to identify with integration and end-to-end tests.

  2. Execution time. Unit tests are very quick, but end-to-end tests are very slow. Consider the Rule of 1’s for Web apps: a unit test takes ~1 millisecond, a service test takes ~1 second, and a Web UI test takes ~1 minute. If test suites have hundreds to thousands of tests at the upper layers of the Testing Pyramid, then they could take hours to run. An hours-long turnaround time is unacceptable for continuous integration.

  3. Development cost. Tests near the top of the Testing Pyramid are more challenging to write than ones near the bottom because they cover more of the system. They’re longer. They need more tools and packages (like Selenium WebDriver). They have more dependencies.

  4. Reliability. Black box tests are susceptible to race conditions and environmental failures, making them inherently more fragile. Recovery mechanisms take extra engineering.

The total cost of ownership increases when climbing the Testing Pyramid. When deciding the level at which to automate a test (and if to automate it at all), taking a risk-based strategy to push tests down the Pyramid is better than writing all tests at the top. Each proportionate layer mitigates risk at its optimal return-on-investment.
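The Rule of 1’s makes the execution-time argument easy to check with back-of-the-envelope arithmetic. The suite size below is hypothetical:

```python
# Back-of-the-envelope suite runtime for 1,000 tests per layer,
# using the Rule of 1's: unit ~1 ms, service ~1 s, Web UI ~1 min.
COUNT = 1000

unit_seconds = COUNT * 0.001    # ~1 second total
service_seconds = COUNT * 1     # ~17 minutes total
ui_seconds = COUNT * 60         # ~17 hours total

print(f"unit: {unit_seconds:.0f} s")
print(f"service: {service_seconds / 60:.1f} min")
print(f"Web UI: {ui_seconds / 3600:.1f} h")
```

Even before reliability and development cost enter the picture, a thousand Web UI tests already blow any continuous integration time budget.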

Test Automation

Test automation is indispensable for software development success. Tests should be automated when they give a positive return on investment.



Benefits of test automation:

  • Tests can be rerun at any time the same way.

  • Tests can run as part of CI/CD.

  • Teams can run more tests in less time.

  • Setup and cleanup can also be automated.

  • Frameworks can provide reports, logs, and screenshots.

  • Manual testers can focus on exploratory testing.

Words of caution:

  • Test automation is software development and requires the same skills and practices.

  • Automation is inherently fragile because it depends upon ever-changing products.

  • Interruptions and intermittent failures can easily break automated tests.

  • Tests should be automated at the same time the feature is developed to give the best feedback.
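Automated setup and cleanup deserve an example. In pytest this is normally done with fixtures; the sketch below shows the same idea using only the standard library, with a plain dict standing in for a real resource such as a database or browser session:

```python
# Sketch: automated setup and cleanup via a context manager.
# The dict "database" is a stand-in for a real external resource.
from contextlib import contextmanager

@contextmanager
def user_record():
    db = {"user": {"name": "test"}}   # setup: create the resource
    try:
        yield db["user"]              # hand the record to the test
    finally:
        db.clear()                    # cleanup: runs even if the test fails

def test_user_has_name():
    with user_record() as user:
        assert user["name"] == "test"
```

Because cleanup lives in `finally`, an interrupted or failing test still leaves the environment clean, which addresses the fragility caution above.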

Functional test frameworks and related packages are available in all major programming languages. Don’t reinvent wheels. Some popular examples:

C/C++:

  • Catch

  • Google Test

Java:

  • Cucumber-JVM

  • JBehave

  • JUnit

  • Serenity BDD

  • TestNG

JavaScript:

  • CucumberJS

  • Jasmine

  • Karma

  • Mocha

  • Protractor

  • Serenity/JS

Ruby:

  • Cucumber

  • RSpec

  • Shoulda

  • Test::Unit

Perl:

  • Test Anything Protocol

  • Test::More

  • Test::Simple

PHP:

  • Behat

Python:

  • Behave

  • pytest

  • unittest

Scala:

  • ScalaTest

Best practices for functional test automation:

  • Each test should focus on one main behavior or variation.

  • Tests should run independently of each other. (They should be runnable in any order.)

  • Tests should cover behaviors as close to the point of origin as possible.

  • Tests should be self-descriptive and intuitively understandable.

  • Count(unit) > Count(integration) > Count(end-to-end)

  • Target near-100% code coverage for unit tests.

  • Do not automate every test – use a risk-based strategy with ROI.

  • Try to run automated tests in parallel, especially in CI/CD.
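Several of these practices come together in parametrized tests: each parameter set is one focused, independently runnable case with a self-descriptive name. A minimal pytest sketch, with a hypothetical `is_palindrome` function as the code under test:

```python
# Sketch: one behavior per test, expressed as independent parametrized cases.
import pytest

def is_palindrome(text):
    """Hypothetical product code: case- and punctuation-insensitive check."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]

@pytest.mark.parametrize("text,expected", [
    ("racecar", True),    # simple palindrome
    ("Racecar", True),    # case-insensitive
    ("race car", True),   # ignores spaces
    ("python", False),    # non-palindrome
])
def test_is_palindrome(text, expected):
    # Each parameter set runs as its own independent test case,
    # so the cases can run in any order -- or in parallel.
    assert is_palindrome(text) == expected
```

With a plugin such as pytest-xdist, the same suite can run in parallel (e.g. `pytest -n auto`), which supports the CI/CD point above.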

Performance Testing

Performance testing determines if a software product works well. As a precondition, features must work correctly (meaning, functional tests pass).

Performance testing should not use functional testing frameworks; it should use performance-specific tools like JMeter or Visual Studio load testing.

“Good” performance means minimal impact to the 4 primary software performance metrics in the system under test:

  1. Processor Usage

  2. Memory Usage

  3. Response Time

  4. Throughput
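As a simple illustration of the last two metrics, here is a toy measurement loop. The operation is a stand-in for a real call to the system under test; real performance testing would drive realistic load with dedicated tools as noted above:

```python
# Sketch: measuring response time and throughput for a toy operation.
import time

def operation():
    return sum(range(10_000))   # stand-in for a call to the system under test

start = time.perf_counter()
calls = 0
while time.perf_counter() - start < 0.5:   # drive load for half a second
    operation()
    calls += 1
elapsed = time.perf_counter() - start

throughput = calls / elapsed     # operations per second
avg_response = elapsed / calls   # seconds per operation

print(f"{throughput:.0f} ops/sec, {avg_response * 1000:.3f} ms avg response")
```

Processor and memory usage, the other two metrics, are typically sampled from outside the process by the load tool or by system monitors.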


Lately, I’ve been doing lots of code reviews, all of which exclusively cover end-to-end test automation: new tests, old fixes, config changes, and framework updates. Below are the big things I emphasize in test automation code reviews, in addition to the standard review checklist items.

#10: No Proof of Success

Tests need to run successfully in order to pass review, and proof of success (such as a log or a screenshot) must be attached to the review. However, if the product under test is not ready or has a bug, this could also mean a successful failure with proof that the critical new sections of the code were exercised. Tests should also be run in the appropriate environments, to avoid the “it-ran-fine-on-my-machine” excuse later.

#9: Typos and Bad Formatting

Typos and bad formatting reflect carelessness, cause frustration, and damage reputation. They are especially bad for Behavior-Driven Development frameworks.

#8: Hard-Coded Values

Hard-coded values often indicate hasty development. Sometimes, they aren’t a big problem, but they can cripple an automation code base’s flexibility. I always ask the following questions when I see a hard-coded value:

  • Should this be a shared constant?

  • Should this be a parameterized value for the method/function/step using it?

  • Should this be passed into the test as an external input (such as from a config file or the command line)?
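For example, a hard-coded URL could evolve through those three questions. The names, URL, and environment variable below are hypothetical:

```python
# Sketch: the same URL expressed three ways, from most to least rigid.
import os

# 1. Hard-coded (rigid -- flagged in review):
def get_login_url_hardcoded():
    return "https://staging.example.com/login"

# 2. Shared constant (one place to change):
BASE_URL = "https://staging.example.com"

def get_login_url_constant():
    return BASE_URL + "/login"

# 3. External input (flexible across environments, falls back to the constant):
def get_login_url_external():
    base = os.environ.get("TEST_BASE_URL", BASE_URL)
    return base + "/login"
```

The third form lets the same test run against staging or production just by changing an environment variable, with no code change.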

#7: Incorrect Test Coverage

It is surprisingly common to see an automated test that doesn’t actually cover the intended test steps. A step from the test procedure may be missing, or an assertion may yield a false positive. Sometimes, assertions may not even be performed! When reviewing tests, keep the original test procedure handy, and watch out for missing coverage.
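Here is a sketch of a false-positive assertion next to a real one. The `search` function is a hypothetical stand-in for the feature under test:

```python
# Sketch: a false-positive assertion versus a real assertion.

def search(records, term):
    """Hypothetical feature under test: return records containing term."""
    return [r for r in records if term in r]

records = ["apple pie", "banana bread"]

# False positive: only asserts the call ran, not that it found anything.
results = search(records, "cherry")
assert results is not None        # passes even though nothing matched!

# Real assertion: checks the intended behavior against expected data.
results = search(records, "apple")
assert results == ["apple pie"]   # fails loudly if coverage is wrong
```

The first assertion would let a completely broken search feature pass review, which is exactly the kind of gap to catch by comparing tests against the original test procedure.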

#6: Inadequate Documentation

Documentation is vital for good testing and good maintenance. When a test fails, the documentation it provides (both in the logs it prints and in its very own code) significantly assists triage. Automated test cases should read like test procedures. This is one reason why self-documenting behavior-driven test frameworks are so popular. Even without BDD, test automation should be flush with comments and self-documenting identifiers.

#5: Poor Code Placement

Automation projects tend to grow fast. Along with new tests, new shared code like page objects and data models are added all the time. Maintaining a good, organized structure is necessary for project scalability and teamwork. Test cases should be organized by feature area. Common code should be abstracted from test cases and put into shared libraries. Framework-level code for things like inputs and logging should be separated from test-level code. If code is put in the wrong place, it could be difficult to find or reuse. Make sure new code is put in the right place.

#4: Bad Config Changes

Even the most seemingly innocuous configuration tweak can have huge impacts:

  • A username change can cause tests to abort setup.

  • A bad URL can direct a test to the wrong site.

  • Committing local config files to version control can cause other teammates’ local projects to fail to build.

  • Changing test input values may invalidate test runs.

  • One time, I brought down a whole continuous integration pipeline by removing one dependency.

As a general rule, submit any config changes in a separate code review from other changes, and provide a thorough explanation to the reviewers for why the change is needed. Any time I see unusual config changes, I always call them out.

#3: Framework Hacks

A framework is meant to help engineers automate tests. However, sometimes the framework may also be a hindrance. Rather than improve the framework design, many engineers will try to hack around the framework. Sometimes, the framework may already provide the desired feature! I’ve seen this very commonly with dependency injection – people just don’t know how to use it. Hacks should be avoided because test automation projects need a strong overall design strategy.

#2: Brittleness

Test automation must be robust enough to handle bumps in the road. However, test logic is not always written to handle slightly unexpected cases. Here are a few examples of brittleness to watch out for in review:

  • Do test cases have adequate cleanup routines, even when they crash?

  • Are all exceptions handled properly, even unexpected ones?

  • Is the Selenium WebDriver always disposed of?

  • Will SSH connections be automatically reconnected if dropped?

  • Are XPaths too loose or too strict?

  • Is a REST API response code of 201 just as good as 200?
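Two of those checks can be sketched in a few lines. `FakeDriver` below is a stand-in for a real Selenium WebDriver, and the status-code helper treats response codes as plain integers:

```python
# Sketch: robustness against crashes and overly strict status checks.

class FakeDriver:
    """Stand-in for a Selenium WebDriver."""
    def __init__(self):
        self.quit_called = False
    def quit(self):
        self.quit_called = True

def run_ui_test(driver, steps):
    try:
        steps(driver)      # test logic may raise mid-test
    finally:
        driver.quit()      # the driver is ALWAYS disposed of

def assert_success(status_code):
    # Treat any 2xx code as success: 201 Created is as good as 200 OK.
    assert 200 <= status_code < 300, f"unexpected status {status_code}"

def failing_steps(driver):
    raise RuntimeError("element not found")   # simulate a crash

driver = FakeDriver()
try:
    run_ui_test(driver, failing_steps)
except RuntimeError:
    pass
assert driver.quit_called   # cleanup ran despite the crash
```

In pytest, the `try`/`finally` pattern is usually expressed as a fixture with teardown, but the principle is the same: cleanup must not depend on the test succeeding.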

#1: Duplication

Many testing operations are inherently repetitive. Engineers sometimes just copy-paste code blocks, rather than seek existing methods or add new helpers, to save development time. Plus, it can be difficult to find reusable parts that meet immediate needs in a large code base. Nevertheless, good code reviews should catch code redundancy and suggest better solutions.
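A common fix is to extract a shared builder so each test states only what varies. The `create_order` helper and its fields below are hypothetical:

```python
# Sketch: de-duplicating repeated setup blocks into one shared helper.

def create_order(customer="default", items=None, expedited=False):
    """Shared builder: replaces copy-pasted order-construction blocks."""
    return {"customer": customer, "items": items or [], "expedited": expedited}

# Before: every test repeated the full construction block.
# After: each test makes one intention-revealing call.
standard = create_order(items=["widget"])
rush = create_order(items=["widget"], expedited=True)

assert standard["expedited"] is False
assert rush["expedited"] is True
```

When the duplicated logic later changes, there is exactly one place to update instead of a scattering of pasted copies.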
