Why do you use Perl for unit tests?

Apr 9, 2014 at 6:50 PM
I'm trying to understand the Visual F# infrastructure.
What is perl doing here exactly?
Coordinator
Apr 14, 2014 at 7:43 PM
The "functional tests" (to keep them distinct from the "unit tests" which use NUnit) use Perl for a few reasons. Mostly historical, but also practical.

The main driver script, RunAll.pl, has roots going back to the '90s, and some form of it has been used for testing C#, VB, F#, and other languages at Microsoft for many years. In fact, it was borrowed from the C# team when F# first started making a home in the Visual Studio organization around the VS 2010 timeframe - that's what the C# and VB teams used, so it made sense for F# to use the same thing. With Roslyn, C# and VB made a clean break and rebuilt their entire testbed in xUnit, which is what you see today in their open source repo.

Of course, it might seem odd today to use Perl to test a managed language. We are well aware of the various actual or perceived shortcomings:
  • It isn't "fashionable"
    • Mostly just a perception issue, but when fewer developers have experience with the language, you get real issues such as...
  • It's difficult to maintain
    • The intersection of experienced F# and Perl developers isn't especially large, so fixing or updating our test infra isn't easy. Not to mention RunAll.pl itself is in general a house of horrors...
  • It's slow
    • Almost totally a perception issue, at least for our purposes. The overall design of the tests is what makes execution slow, not the Perl driver.
  • It doesn't natively integrate with the technology under test
    • It would make things much simpler if we could use F#/.NET components directly in the test infrastructure.
Using Perl does afford us some legitimate upsides:
  • It's cross-platform
    • With a few environment tweaks, the majority of the FSharpQA test suite can be (and has been) run on Linux/Mono using the same Perl script drivers.
  • It allows for fast iteration
    • Using a script-based system allows us to edit and re-execute tests much more quickly than if all tests were compiled into an NUnit or xUnit test assembly. The "core unit tests" today take about 10 seconds to compile on my (pretty strong) laptop. The set of functional tests is 3-4 times larger in terms of number of test cases, but those test cases are generally much more involved. Compiling them all into an NUnit-style test assembly would take at least 2-3 minutes, probably more.
Hopefully that gives some context.
Apr 14, 2014 at 9:51 PM
Thanks Lincoln. That really helped.

One reason I asked was the additional dependency. This alone isn't really a problem, of course, but this project already has a lot of additional steps. From your comment I gather this might not be the easiest part to improve ;-)

Cheers,
Steffen
Apr 15, 2014 at 9:06 AM
Edited Apr 15, 2014 at 9:09 AM
OK, I looked around a bit.

I know this probably has very low (zero?) priority for the F# team, but if you are interested in changing (and hopefully improving ;-)) this, then crowdsourcing via "up-for-grabs" might help here.

I see a couple of interesting tasks:
  • Basic tests for records, unions, tuples, ... should be moved to the "unit test" suite
  • complicated "functional tests" should stay as .fsx files in this folder
  • a small .NET .exe (commited or built during the bootstapper phase) could work as a test runner and remove the dependency.
  • the output of the test runner should be improved. It seems it is only reporting passed or failed (based on the exit code).
What do you think?
Coordinator
Apr 17, 2014 at 7:04 PM
It's a priority to make our repo usable for developers, and that includes an effective test system. The current system is usable, but not easy to get started with. Migrating to a better (and, hopefully, standard) system is definitely a possibility (and something we have been looking at internally for a while), but it will take time.

For now, to address your individual comments:
  • The current "unit tests" are focus on validating the runtime functionality of what's exposed by FSharp.Core. Runtime functionality is only one slice of what's captured by the phrase "basic tests". Validating syntax, compiler diagnostics, generated IL, execution via FSI vs compiled, etc are still probably "basic tests", but don't really fit in the current unit tests. But yes, it's definitely useful to fill in gaps in runtime functionality testing, as your recent PRs have been doing, and that indeed belongs in the unit tests.
  • Eventually even "complicated tests" could be handled by some kind of unit test framework, even if that means shelling out to start external scripts. But for now, the current system does ok.
  • Using a standard test driver (e.g. NUnit, xUnit) would be useful as an eventual goal, due to widespread familiarity, support, potential integration into IDEs, etc. I don't personally see much to be gained from replacing Perl + custom Perl scripts with a new custom .NET driver with 100% duplicate behavior. I'd rather see us put that effort into migrating to a standard framework.
  • The test runner drops detailed failure logs to disk (FSharp_Failures.log, FSharpQA_Failures.log), but only outputs basic pass/fail info to the terminal. That's in line with most test frameworks, in my experience. Did you have a different scenario in mind?
May 15, 2014 at 2:39 PM
Lincoln -- you could use NUnit to run these tests (instead of scripting them with Perl), without compiling everything into an NUnit-compatible test assembly.

I'm thinking of something very similar to the way I've implemented my compiler tests for the tools within my Facio project; for reference: https://github.com/jack-pappas/facio/blob/master/FSharpYacc.Tests/TestCases.fs

The basic idea is to write a "dynamic" NUnit test assembly which does not actually include any of the code you want to test. Instead, you implement a TestCaseSource backed by code which discovers the tests at run-time (i.e., when the test assembly is loaded by NUnit) and creates a TestCase for each one. Once that's done, you implement a test or tests which use this TestCaseSource and invoke each test as a separate process, checking the process' exit code (and its stdout/stderr streams, if applicable) to determine whether the test passes or fails.
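
To make that concrete, here's a rough F# sketch of the shape I have in mind (the folder layout, the *.fsx pattern, the fsi.exe invocation, and all of the module/type/member names are hypothetical placeholders; I'm assuming the NUnit 2.6-era attributes):

```fsharp
module FunctionalTestSketch

open System.Collections
open System.Diagnostics
open System.IO
open NUnit.Framework

/// Test-case source: instantiated by NUnit when the assembly is loaded.
/// It scans a (hypothetical) test folder and yields one TestCaseData per script.
type FunctionalTestCases () =
    interface IEnumerable with
        member __.GetEnumerator () =
            let cases =
                Directory.EnumerateFiles ("tests/functional", "*.fsx", SearchOption.AllDirectories)
                |> Seq.map (fun path ->
                    TestCaseData(path).SetName (Path.GetFileNameWithoutExtension path))
            (cases :> IEnumerable).GetEnumerator ()

[<TestFixture>]
type FunctionalTests () =
    /// One NUnit test per discovered script: run it in its own process
    /// (fsi.exe is just an example driver here) and treat exit code 0 as a pass.
    [<TestCaseSource(typeof<FunctionalTestCases>)>]
    member __.``functional test case`` (scriptPath : string) =
        let psi = ProcessStartInfo ("fsi.exe", sprintf "\"%s\"" scriptPath, UseShellExecute = false)
        use proc = Process.Start psi
        proc.WaitForExit ()
        Assert.AreEqual (0, proc.ExitCode, sprintf "%s exited with code %d" scriptPath proc.ExitCode)
```

Dropping a new .fsx file into the folder would then be all it takes for it to show up as its own test on the next run.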

This would be very compatible with the current setup, since that already launches the tests in a similar way via the command line and checks the exit code to determine pass/fail. The benefits of my proposed approach over the existing one are:
  • The test setup should be much less fragile than the existing setup.
  • It doesn't require Perl to be installed on the machine in order to run the tests. The only non-built-in tool required for the tests is the NUnit runner, which can even be fetched via NuGet.
  • Running the tests via NUnit means we'll be able to take advantage of the plethora of tools which provide NUnit integration (e.g., TeamCity). NUnit also outputs all pass/fail information into a single XML file, so it'll be easy to perform any additional processing of the results if there's a need to do so.
Coordinator
May 15, 2014 at 9:21 PM
Jack - This is very interesting. I was not aware of native support for dynamic generation of test cases in NUnit - do I understand correctly that your test DLL is loaded, the "Source" elements are constructed up front and executed (sniffing around and returning some collection of test cases, which NUnit keeps track of and exposes via UI), then you can execute any of those individual tests on demand? If so, that's pretty slick :-)

That could be a really nice approach, thanks for bringing it up. The main thing keeping me wary of moving to full NUnit was the compilation cost, but this works around that issue nicely. I'm not sure I agree with all your advantages, though. I don't see why it would be particularly more or less fragile than the current situation - it operates in the same basic way. And it's just replacing the requirement to install one 3rd-party tool with another. Though I grant you that NUnit is likely an easier pill to swallow than Perl.

For our case, we would not want to execute test cases in separate processes. Requiring spin-up and teardown of fsc.exe thousands of times was one of the big perf drags in the current system, now mostly mitigated by our hosted compiler infrastructure. We'd want to keep everything in-proc as much as possible, and to support parallel execution.

The big downside, of course, is that almost all of the Perl driver/parsing/execution code would need to be reimplemented in F#, which would take time and lead to instability for some period. But any overhaul will likely have a similar cost...
May 17, 2014 at 3:30 PM
Lincoln - Yes, that's correct. When your test assembly is loaded by NUnit, it'll see that you have a test method (marked with [<Test>]) which is also annotated with [<TestCaseSource(...)>]. The test runner creates an instance of the type you specify as the argument to [<TestCaseSource(...)>], loads all of the test cases from it, then executes the test method once for each TestCase returned by the TestCaseSource. This means you can run arbitrary code to dynamically build up the list of test cases you want to run against a test method.

In the current test setup for Facio, I recursively traverse the directory structure of the TestCases folder in the Facio repository to find all of the *.fsl and *.fsy files, and return each one as a TestCase; the test method constructs the command-line arguments for the tool from the data in the TestCase, then runs the tool against each file in a separate process and determines whether the test passes or fails based on the exit code of the process. This approach makes it easy to build a test infrastructure where you can just drop repro cases for bugs or stress tests for the compiler into some folder and they'll automatically be included in the next test run without any additional work.

I made a mistake when I said the current setup for the fsharpqa tests was fragile. I confused them with the tests in the 'tests/fsharp' folder, which are constructed as a series of batch (*.bat) files; when these tests were merged in, I had a heck of a time trying to get them to run correctly on my machine (they couldn't locate certain tool paths, for example), and even once they did, it wasn't straightforward to comprehend the results. The upside of the approach I described above (using NUnit) is that it would make it straightforward to have a single, robust test setup that incorporates any kind of tests you want to run -- whether they be standard unit tests compiled into the test assembly, snippets you want to run through a specific part of the compiler infrastructure, or bug repro cases you want to invoke the full compiler on from the command line.

You don't have to execute the test cases in a separate process if you don't want to. The approach I'm proposing makes the test methods generic (in the general sense of the word), so they're essentially lightweight test runners in their own right; you could implement two versions of the test method, one which runs all of the tests in-proc and another which runs each test in a separate process, then apply the [<Category(...)>] attribute to them (e.g., [<Category("InProcess")>], [<Category("OutOfProcess")>]) so you can choose which tests to run (or not) at run-time. As for running tests in parallel -- I don't think NUnit supports this yet, but xUnit and MbUnit have some support for it (though their support for dynamically discovering test cases doesn't seem to be as complete as NUnit's).
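
For illustration, the split might look roughly like this (compileInProc is a hypothetical stand-in for an in-process compiler hook, and the case list is a placeholder for the same kind of directory-scanning TestCaseSource as above):

```fsharp
module TestRunnerSplitSketch

open System.Diagnostics
open NUnit.Framework

/// Hypothetical stand-in for an in-process compilation entry point;
/// imagine it returns an exit-code-like result.
let compileInProc (scriptPath : string) : int =
    ignore scriptPath
    0

[<TestFixture>]
type CompilerTestRunners () =

    /// Placeholder case list; in practice this would be the same
    /// directory-scanning source as in the earlier sketch.
    static member Cases = seq { yield "tests/functional/basic.fsx" }

    [<Category("InProcess")>]
    [<TestCaseSource("Cases")>]
    member __.``compile in-proc`` (scriptPath : string) =
        Assert.AreEqual (0, compileInProc scriptPath)

    [<Category("OutOfProcess")>]
    [<TestCaseSource("Cases")>]
    member __.``compile out-of-proc`` (scriptPath : string) =
        // Assumes the compiler executable is on PATH; real code would resolve the tool path.
        use proc = Process.Start ("fsc.exe", scriptPath)
        proc.WaitForExit ()
        Assert.AreEqual (0, proc.ExitCode)
```

You could then pick one flavor or the other at run-time via the runner's category include/exclude options.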

I agree that the downside of all this is that it'd take some non-trivial amount of time and effort to implement such a setup. However, it would provide an excellent opportunity for the F# community to contribute to the compiler / core libraries, especially since it doesn't require contributors to have a working knowledge of compiler implementation. IMO it's worth it overall, because in the end we'd have a much more streamlined approach to testing the compiler and libraries, which means it'll be easier to integrate contributed repro cases into the test suite and more likely that everyone will run the full test suite (as they should) when contributing changes to the compiler.