Parameterized JUnit Tests

Useful when multiple tests share the same structure but take different inputs and produce different results. Here’s the skeleton:

import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class SumTests {
	// name is optional, {index} will be used by default;
	// placeholders {0}, {1}, ... refer to the constructor arguments
	@Parameters(name = "{index}: {0}+{1}={2}")
	public static Collection<Object[]> data() {
		return Arrays.asList(new Object[][] {
			{1, 1, 2},
			{3, 5, 2}, // fails
			{-1, 1, 0}
		});
	}

	public SumTests(int a, int b, int sum) {
		this.a = a;
		this.b = b;
		this.sum = sum;
	}

	@Test
	public void sum() {
		assertEquals("Sum is not correct", sum, a + b);
	}

	private int a, b, sum;
}

Selenium WebDriver on Firefox: Working with Add-Ons

Running Selenium WebDriver on Firefox with Static Add-Ons

  1. Create a special profile for Firefox
  2. Install add-ons on that profile
  3. Start Firefox as described here

Installing Add-On when Starting Selenium WebDriver on Firefox

import java.io.File;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;
// ...
final String addOnPath = "C:\\Temp\\youraddon.xpi";
File addOnFile = new File( addOnPath );
FirefoxProfile profile = new FirefoxProfile();
profile.addExtension( addOnFile );
WebDriver driver = new FirefoxDriver( profile );

Getting List of Installed / Active Add-Ons with Selenium WebDriver on Firefox

There’s no easy way to achieve this, unfortunately, so the method below is really an ugly hack, but it gets the job done:

  • Firefox is loaded with the about:addons page
  • The page contains the list of add-ons in JSON format, which can be parsed
import java.net.URLDecoder;

import org.json.JSONArray;
import org.json.JSONObject;
// ...
FirefoxProfile profile = new FirefoxProfile();
profile.setPreference( "browser.startup.homepage", "about:addons" );
WebDriver driver = new FirefoxDriver( profile );
driver.get( "about:addons" );
String source = driver.getPageSource();

// Cut the URL-encoded JSON that describes the add-ons out of the
// <browser id="discover-browser"> element of the page source
final String addOnsSectionStart = "<browser id=\"discover-browser\"";
source = source.substring( source.indexOf( addOnsSectionStart ) + addOnsSectionStart.length() );
source = source.substring( source.indexOf( "{%22" ) );
source = source.substring( 0, source.indexOf( "\" clickthrough" ) );
source = URLDecoder.decode( source, "UTF-8" );

JSONObject addonObjects = new JSONObject( source );
JSONArray jsonAddonArray = addonObjects.names();
final String[] fields = { "type", "version", "user disabled", "is compatible", "is blocklisted", "name" };
for ( int i = 0; i < jsonAddonArray.length(); i++ ) {
	JSONObject jsonAddonObject = addonObjects.getJSONObject( jsonAddonArray.getString( i ) );
	for ( String field : fields ) {
		System.out.print( jsonAddonObject.optString( field ) + "\t" );
	}
	System.out.println();
}

The output will look like this:

type       version        user disabled   is compatible   is blocklisted   name
plugin                    false           true            false            QuickTime Plug-in 7.7.1
plugin                    false           true            false            Java(TM) Platform SE 6 U26
theme      17.0.1         false           true            false            Default
plugin     2.4.2432.1652  false           true            false            Google Updater
extension  2.28.0         false           true            false            Firefox WebDriver

Test Automation vs. Mechanization

Wikipedia has clear definitions for both of those terms:

Mechanization provided human operators with machinery to assist them with the muscular requirements of work. Whereas automation is the use of control systems and information technologies reducing the need for human intervention. In the scope of industrialization, automation is a step beyond mechanization.

In testing, however, I see those two concepts often dangerously mixed: people tend to mechanize testing, thinking they are automating it. As a result, the estimated value of such “automation” is completely wrong: it will not take risks into account, it will not find important regressions, and it will give a tester a false sense of safety and completeness.  Why? Because of the way mechanization is created.

Both automation and mechanization have their value in testing. But they are different, and therefore should always be clearly distinguished. Whereas automation is a result of application risk analysis, based on knowledge of the application and an understanding of testing needs (and thus finding tools and ways to cover those risks), mechanization goes in the opposite direction: it looks for out-of-the-box tools and the easiest ways to mechanize some testing or code-review procedures, and makes opportunistic use of those methods. Mechanization, for instance, is great for evaluating a module/function/piece of code. It is, if you will, a quality control tool, but it does not eliminate the need for quality assurance testing, either manual or automated.

It’s like making a car: each part of the car was inspected (probably in a mechanized, or at least very standardized, way) and has a “QC” stamp on it. But after all the parts are assembled, do they know if and how the car will drive? If car manufacturers did, they would not have to pay their test drivers. Yet test drivers are the ones who provide the most valuable input, the one closest to the actual consumer’s experience. In software, testers can play the role of both quality control personnel and test drivers. Taking away one of those roles, or thinking that the test drive can be replaced with a “QC” stamp, is not the way to optimize testing. And the mere existence of mechanical testing should not be a reason to consider an application “risk free” or “regression proof”. Otherwise you may end up in a situation where your car does not drive, and you don’t even know it.


First test your testers, then trust your testers

Quite frankly, I am lucky: the developers I worked with (well, most of them) were quite open to interactions with testers, didn’t take the “bad news” testers brought personally, and were able to learn to work with testers efficiently. In other words, they valued the services testers provided, which I think is one of the most important factors in this relationship.

However, even those who valued testers often pointed out that testing lacks predictability, and the path a tester follows looks almost accidental. And this is where I think the major difference between developers and testers lies: while developers’ work is algorithmic by nature (define-design-do), and thus it’s possible to develop a fairly detailed plan ahead of time, the same level of planning may not be possible when you try to break things. It doesn’t mean that testers always “shoot in the dark”: they have to have a master plan and an initial evaluation of risks. But very often the most efficient next step a tester takes depends to a large extent on the findings and outcomes of the previous one. And thus it can be difficult for a tester to specify precisely what and how he or she will be testing several days from now.

Another aspect: developers have a clear finish line. A developer is done when he or she has implemented the required functionality and it works (whatever that means for a particular culture, project, or developer). For testers the finish line is always arbitrary (you can hardly run out of test cases), and can only be defined as the abstract moment when the tester ran out of important test cases. The term important is well defined in the context of testing: important to stakeholders, company reputation, etc. But since it doesn’t have an associated precise mathematical formula, testers often have a hard time explaining to others when they will be done, causing developers to feel uneasy. It takes some skill and experience (not to mention talent, which I think is the main factor in any work) for a tester to choose the best point in time to say “I’m done”. And it takes some skill and experience for developers to trust the tester’s judgment. First test your testers, then trust your testers. The first part of this statement could be the subject of a separate post, and the second is described well in Product Bistro: Love Developers, and Trust QA by Mitchell Ashley.


3 simple rules of information sharing

1. Anything reusable must be in a shared location. Personal email accounts, local file folders, or someone’s head are not places where reusable information can be stored. The format of information stored in a shared location also has to be readable by anyone.

2. Store only unique information, that is, information that may take time to recover (results of research, internal information, abstracts from multiple sources, etc.). Do not duplicate or link anything that is a “Google search away”, unless it’s used so often that it makes sense to store it in a well-known location. Another exception is rare information, or information located on a site that may disappear or change its contents (e.g. someone’s personal page, a forum, etc.).

3. When you run into outdated or incorrect information in the shared repository, update it or at least mark it as such. This will save time for others, and will also make sure that the quality of the information does not degrade over time.


On regressions and excuses

Two of the most popular (and least pleasant) conversations between testers and developers go like this:

    • Testers: “It turns out they changed it, but they didn’t bother to tell us!”
      Dev: “Oh, we didn’t think it was important!”
    • Testers: “We told them about this issue a long time ago, but they didn’t care”
      Dev: “From their explanation it didn’t sound like an important issue”

However, those conversations are easily eliminated by following two simple rules:

    • Any change that can affect some portion of the code somewhere should trigger a discussion with the goal of understanding the risks. A change can be “accepted” with no further investigation/testing if it’s not considered risky by all sides, or it can create the need for additional investigation and testing. I don’t call for a discussion around each line of code: I’m sure every team can come up with a unit that makes sense for a particular product (e.g. a component, module, or any other definition of functional area). Such a discussion is especially important when no other changes are being made and no extensive testing is planned in the same functional area, and thus regressions can slip through undiscovered.
    • Similarly, any issue or concern that is raised should not be discarded without understanding how it affects the product, which risks it involves, whether it requires further investigation and testing, etc. This especially applies to those “last day” findings, which are often discarded too quickly, because everyone is in a “Let’s ship it!” mood.

Performance testing: how to survive terminology and start thinking about the goals

When it comes to those types of testing where we measure how an application performs under different conditions, and compare its metrics against other applications or standards, almost every testing culture has its own definitions for the same terms, which is very confusing. Even “performance testing” itself may mean completely different things to different people. Commonly there are three groups of perceptions:

  • Performance testing seen as an umbrella term for any type of testing related to an application’s quantifiable limits, stability, throughput, etc. Thus it includes load, stress, volume, endurance, soak, peak-rest, spike, storm, and other types of testing. Examples of this view are the Software Performance Testing article on Wikipedia and Microsoft’s Fundamentals of Web Application Performance Testing.
  • Performance testing seen as a specific type of testing, separate from other types such as load, stress, etc. One of the most ardent defenders of this point of view explains his position in his blog post.
  • And in the middle there are all those who believe that performance is an umbrella term for some types of testing but does not include others. For example, I once had a lengthy discussion with a colleague who believed performance testing to be any type of testing that deals with operations on the application (for example load, stress, endurance), but not with operations on data (e.g. volume).

Advocates of each approach have their supporting arguments, of course, and references to literature. So when working in a specific company, it’s always a good idea to either find an existing common vocabulary or establish a new one. But beyond that, does it really matter what you call it? Not at all. Even though I do have my preferences (I would prefer one common umbrella term, and performance testing seems perfect for this role, and I would like to see agreement on what each of the other terms means), I am ready to give up on any definitions if it shifts the focus from reaching the testing goals to fighting over terms.

And as opposed to definitions, the goals of performance testing are usually quite clear.

The first goal is to make sure the anticipated workload can be supported: the application will not “break”, and its performance characteristics (time, throughput, or any other measurements relevant to the application) will not degrade below acceptable limits. Such testing is 80% planning and 20% execution, as the environment, transaction/event distribution, amount of operations/events, and volumes of data must be carefully planned to represent typical production environments as closely as possible. This involves two important preparatory steps:

  • Understanding the performance goals, finding the appropriate and relevant metrics, defining the typical distribution of operations and the typical environment, etc. This step involves interviewing multiple stakeholders, getting confusing “wish lists”, or the answer “I don’t know”… all of which could be the topic of a separate post.
  • Testing each transaction or operation by itself (including a concurrency test) to eliminate obvious problems (e.g. functional bugs, memory leaks, or other inefficiencies) within the transaction itself.

Usually the first outcome of this type of testing is the necessity to deal with resolvable bottlenecks, i.e. inefficiencies within the environment and the application itself. At later stages some further fine-tuning may be required (e.g. database / application maintenance procedures and policies, hardware recommendations, etc.). This type of testing is complete when:

  • The application is able to support all anticipated workload scenarios: it doesn’t break, and its performance characteristics are acceptable.
  • We have collected the application’s performance characteristics for each of those scenarios. These can be used as a benchmark for other tests, or for other versions of the application.
  • We can specify the hardware requirements and maintenance procedures required to support the anticipated workload.

Once we know that the anticipated workload can be properly supported, the workload on the application is increased to and beyond its limits. This can be subdivided into two sub-goals:

  • Finding the workload limit: the maximal workload the application can handle with acceptable performance characteristics. This is the point before the application breaks or its performance significantly degrades. At this point the performance characteristics may be non-optimal, just acceptable.
  • Finding out what happens when the workload exceeds the limit, that is, the application breaks, its performance characteristics degrade to unacceptable levels, or one of the “unresolvable” bottlenecks is reached.

This testing can use the same environment and transaction/event distribution as the load testing, but the amount of operations and volumes of data must be increased gradually to reach and exceed the limit. One of the most important goals of this testing is to make sure the anticipated workload is not dangerously close to the workload limit, and that the application fails predictably and somewhat gracefully.
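The gradual increase described above can be sketched as a simple step-load driver: run the scenario mix at each level, measure a key characteristic, and stop once it degrades past the acceptable limit. Everything below (the fake workload model, the threshold, the step size) is an illustrative placeholder, not a real load-testing tool:

```java
import java.util.function.IntToDoubleFunction;

// Sketch of a step-load search for the workload limit.
public class StepLoadTest {

    /**
     * Increases the workload in fixed steps until the measured response
     * time exceeds the acceptable threshold; returns the last workload
     * level that was still acceptable (0 if even the first level failed).
     *
     * runAtLoad is a placeholder for "run the scenario mix at N concurrent
     * users and return the average response time in seconds".
     */
    public static int findWorkloadLimit(IntToDoubleFunction runAtLoad,
                                        int startLoad, int step, int maxLoad,
                                        double maxAcceptableSeconds) {
        int lastAcceptable = 0;
        for (int load = startLoad; load <= maxLoad; load += step) {
            double responseTime = runAtLoad.applyAsDouble(load);
            if (responseTime > maxAcceptableSeconds) {
                break; // limit exceeded: the previous level is the limit
            }
            lastAcceptable = load;
        }
        return lastAcceptable;
    }

    public static void main(String[] args) {
        // Fake workload model: response time grows quadratically with load.
        IntToDoubleFunction fakeRun = load -> 0.5 + (load / 100.0) * (load / 100.0);
        int limit = findWorkloadLimit(fakeRun, 10, 10, 200, 2.0);
        System.out.println("Workload limit: " + limit + " users");
    }
}
```

In practice each level takes a long time to run, so a coarser step followed by a finer search between the last acceptable level and the first failing one can save time.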

Sometimes understanding whether the application performs well under the anticipated load is impossible, since nobody seems to know what the anticipated load is. In such cases, working backwards (i.e. finding the application’s limit and understanding whether it’s acceptable and how close it is to the typical workload) can help.

At the same time we can start the lengthiest test of all: the test that verifies whether the anticipated workload can be sustained over a long period of time. This type of testing is commonly called endurance testing. “Long” is defined differently for different types of applications, but in an enterprise environment we should be talking about at least a few weeks. As a basis we can still use the same scenarios as in the first test, but it’s good to extend them with some additional “real environment” features, such as periodic issues with the underlying infrastructure (e.g. lost network connections) and erroneous inputs, if those are not part of the original tests already. This testing can reveal “hidden” issues, like memory corruption caused by multiple failures, or insignificant memory leaks that turn into a real problem over time. It also allows us to find out the magical “number of hours the application can run without a failure” measure, so popular with managers.

When it’s already known how the anticipated workload is handled and what the application’s limits are, the workload can be sharply or slowly increased from regular to maximum and then decreased back to regular, as if it were going through a rush hour, peak, or spike. Another version of this test takes the workload from none to maximum and then back to idle. The goal of both tests is to observe how the application behaves and how long it takes to recover after a peak. This test might be more important for certain applications where such waves of workload are expected on a regular basis. In that case it might be a good idea to combine this type of testing with long-duration testing, by creating alternating anticipated-maximal-anticipated-low-… workloads on the application for an extended period of time. Also, the distribution of transactions or operations during the spike might be different from the distribution under regular workload (for example: in the morning many people try to log in, while very few have started doing anything else).

Another common test that a distributed enterprise application may require is one that determines whether scaling up or scaling out the hardware on which the application runs will be more beneficial for handling an increasing workload. Scaling up (also called vertical scaling) adds more hardware resources to existing machines (e.g. more memory, a faster hard drive, etc.), while scaling out (also called horizontal scaling) adds additional machines and distributes the work between them. Here’s an example of such testing performed by Pentaho.

Another interesting area is the size and growth rate of the data in the application’s back-end storage (e.g. database or file system). Naturally this testing is part of all of the above tests, as all of them will produce large quantities of data, and thus resolving many bottlenecks will require estimating and adapting to data growth; we may also want to run tests on a large, developed database rather than on an empty one. However, this testing can also be done separately, with the goal of providing recommendations specifically targeting DBAs / system administrators, who may not be involved with, or familiar with, the application itself.

Did I forget anything? I most surely did.


What to Automate (possible approach)

1. Choose “low-hanging fruit”: automate test cases that represent the most popular use cases and look easiest to automate.

2. Go by the “breadth first” principle rather than “depth first”: it’s better to have some basic coverage for the majority of areas in the project than very in-depth automation for only one area:

  • If no automation exists for the project at all yet, start by creating automation for the single most obvious use case in the majority of areas (except for those areas where creating automation is significantly more complicated). This type of automation can answer the question of whether the functionality works at all. At this stage there’s no need to worry about reusable libraries or about tests being extensible: you are likely to make enough mistakes to want to rewrite them later. But by making those mistakes, you will also understand what you actually need and how you could organize things better.
  • Once you have basic automation for the majority of the features, look at:
    • The most popular features
    • The most popular use cases (including positive ones, and the basic errors)
    • The “cheapest” areas in which to extend automation
    Target at least 50-70% savings in time as the exit criterion for this step.
  • If you still have some time and no other project at hand, look at the remaining functionality and the way the automation is organized:
    • Are there any areas where existing automation could be extended to cover all or most of the known test cases?
    • Can you improve the organization of your automation to allow “one click” runs, as well as binary (pass/fail) result reports?
    • Is your automation organized in a way that allows other people to pick it up and use it effectively?

This is also a good time to invest in test infrastructure, reusable libraries, etc. Ideally, while working on this stage, you gradually get rid of the primitive tests you created before and replace them with more sophisticated ones.

3. Maximize time savings and minimize maintenance time:

  • How much time will the automation save per day/week/iteration/month/release/year?
  • How many times are you likely to run it? How many times must you run it, and how many times would you like to run it, for additional confidence for example?
  • How long will the automation run compared to a similar manual test?
  • How long does it take to create the automation?
  • How much additional work is expected every time before the automation can run?
  • How much maintenance will be required if the tested feature changes (slightly or significantly), and how likely is the feature to change in the near future?
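The questions above boil down to simple break-even arithmetic: automation pays off once its creation cost is recovered by the time saved per run. A minimal sketch, with purely illustrative names and numbers:

```java
// Rough break-even estimate for a single automated test case.
// All times are in hours; the figures in main() are illustrative.
public class AutomationRoi {

    /**
     * Number of runs after which automation starts saving time:
     * creation cost divided by the time saved on each run.
     */
    public static double breakEvenRuns(double creationTime,
                                       double manualRunTime,
                                       double automatedRunTime) {
        double savingPerRun = manualRunTime - automatedRunTime;
        if (savingPerRun <= 0) {
            return Double.POSITIVE_INFINITY; // automation never pays off
        }
        return creationTime / savingPerRun;
    }

    public static void main(String[] args) {
        // 10 hours to automate, 30 min manual vs. 3 min automated per run:
        double runs = breakEvenRuns(10.0, 0.5, 0.05);
        System.out.printf("Break-even after ~%.0f runs%n", Math.ceil(runs));
    }
}
```

Expected maintenance can be folded in by adding it to the creation time, or by subtracting per-run maintenance work from the per-run saving.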

Firefox Profiles

In essence, a Firefox profile encapsulates user information: bookmarks, cookies, history, private information (e.g. passwords), and so on. This makes profiles very useful on many occasions, e.g.:

  • Different people sharing the same computer who are not willing to create different users at the operating system level. Especially with blooming social networks, blogs, and other sites where logging in is required, it’s much easier to keep different profiles than to log out and log in each time, on each site.
  • Even for the same person using the same machine for different purposes, this feature allows separating the roles of the machine from a web browsing perspective, e.g. “home”, “work”, etc.
  • Sometimes creating a separate profile is also good when working on a specific project which requires a large number of bookmarks, which can be deleted once the project is complete.
  • Finally, profiles are very useful in testing, as they allow simulating different users without the pain of creating different sessions at the OS level.

Without further configuration, Firefox uses a default profile (in early versions of Firefox you could see it every time you opened the browser). In Firefox 2 or 3, however, you need to start the Profile Manager explicitly to see which profile you are using and to create/delete additional profiles.

Summary of the commands:

  1. Start Profile Manager:
         firefox.exe -ProfileManager

     The switch is not case-sensitive, so you can also type -profilemanager. In addition, you can use the -p or -P switch: their meaning is slightly different according to the command line reference, but they open the same Profile Manager anyway. Usually there’s no need to specify the full path to Firefox (e.g. C:\Program Files\Mozilla Firefox\). With the -ProfileManager switch alone, Profile Manager will only start if you don’t have another instance of Firefox running. Thus:

  2. Start Profile Manager while another instance of Firefox is already running, or open Firefox with multiple profiles simultaneously (works with Firefox 2 or higher):
         firefox.exe -P -no-remote

  3. Start Firefox with a specified profile:
         firefox.exe -P "profile name" [-no-remote]

     The -no-remote switch in this case, again, is only required if another instance of Firefox with a different profile is already running.

  4. Create a new profile:
         firefox.exe -CreateProfile "profile name" [-no-remote]

     Once again, the -CreateProfile switch is not case-sensitive, and the -no-remote switch is only needed if another instance of Firefox is already running.

  5. Create a new profile in a non-default location:
         firefox.exe -CreateProfile "profilename profile_path" [-no-remote]

     Here the profile name and path must be quoted together.

  6. Finally, you can start Firefox with a profile that is not defined through the Profile Manager (useful when you need to test something with a profile you received, say, from a customer):
         firefox.exe -profile "X:\myprofile" [-no-remote]

  7. And this is how to recreate a default profile

Profiles are defined in a file called profiles.ini, located on Windows in the %APPDATA%\Mozilla\Firefox folder. An explanation of the file structure and the meaning of the values is provided on the Profiles.ini_file Mozillazine page.
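For reference, a minimal profiles.ini with two profiles looks roughly like this (the section names are fixed, but the profile names, the random hash in the default path, and the paths themselves vary per installation; these values are illustrative):

```ini
[General]
StartWithLastProfile=1

[Profile0]
Name=default
IsRelative=1
Path=Profiles/xxxxxxxx.default
Default=1

[Profile1]
Name=testing
IsRelative=0
Path=C:\Temp\testing-profile
```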

I couldn’t find an official reference to confirm this number, but from what I’ve seen, Firefox allows up to 20 profiles per Windows user.


“End User Experience” Testing

Sometimes running load, usability, functional, and UI testing separately is not enough, as each operates on a certain sub-set of variables, assuming the others to be static. It’s like projecting a cube in 2D. This is why one of the tests I like to do is “End User Experience” testing: simulating a real user performing a real set of tasks.


1. Choose a few transactions or scenarios most commonly performed by the users. Say, if I did this type of testing for WordPress, I would probably choose “Add New Post”, “Search on site”, and a few more.

2. Define an overall goal for each transaction. It’s best if the goal is close to what a typical real user would do. For example: if the average post length on WordPress is about 240 words, the tested transaction “Add New Post” may have an overall goal of creating a post with 240 words.

3. Break the transactions into steps, and define data for each step: what exactly will you do during the transaction? How will you navigate from step to step? Which options, features, shortcuts will you use? And so on. Since there’s usually more than one way to accomplish the same task, defining those actions is very helpful for the analysis: it takes away the guessing game of “how did I actually accomplish it?”, and it also allows you to concentrate later on transactions that seem problematic. For example: in order to add a new post, I may go to the Dashboard, or I may just click the “New Post” button in the top menu. My final results may differ depending on how I accomplished it, and thus it’s important to remember which way it was done.

At this step, we have something like the following table:

Transaction    Goal                         Actions & Data
Add New Post   Create post with 240 words   1. Navigate to Dashboard
                                            2. Click Add New in left-side menu
                                            3. Provide title (4 words)
                                            4. Type 100 words
                                            5. Provide link with 10 words
                                            6. Type 130 more words
                                            7. Click Publish
etc.           etc.                         etc.


There are many ways to perform this test, for example:

  • A single “experienced” user runs the designed test at a normal (not too fast, not too slow) pace, noting the time it took him or her to accomplish different steps, various inconveniences (was the scrollbar present? was the font too small?), and issues.
  • The same “experienced” user runs the same test, but this time an automatic load test is running in the background.
  • Same as above, but this time let a “novice” user run the test (how fast will he or she discover how to accomplish the steps? How much time will this user’s mistakes cost him or her? Will their missteps cause any additional problems?)
  • And so on.