Sunday, February 28, 2010

Standardized Build Procedures

When working on any software development project, it is fundamental to be able to compile, package, deliver, and track the project output. Quite often, though, this area is overlooked, and all of the design effort goes toward managing the complexity of the software itself. The result is often an ad hoc build script that grows more tangled and unmanageable over time, wasting valuable development time on non-deliverable items. As the project moves into production, the problem usually grows even further out of control, because additional branches are required in the source control system and each one needs its own nearly duplicate build script. Eventually, a build strategy without a defined architecture or clear procedures will collapse under its own weight.

By putting some forethought into your build process, you can avoid a lot of confusion and frustration. Here are some areas to focus on to get you started.

Use Source Control
Using a source control system should be a given. This article will not cover the use of source control other than to say that you cannot effectively construct software (especially in a team) without a central source control system.


Determine a Branching Strategy
Successful software projects always have more than one version. If you have a QA department, you will have more than one version before you even get to production. Plan for success upfront, and determine how you will deal with multiple versions. There are many different strategies for dealing with this problem, and your particular needs will vary based on the size of your organization, how your product gets installed, and many other factors. Be sure to research your options and work through various scenarios before deciding. This decision will not only have a huge impact on future development; it will also affect you daily if your teammates are always confused about which branch they should be working in.

Version the Product

How you version your product has a profound effect on your ability (or inability) to determine what release the code was from, what branch the code came from, when the code was compiled, what exactly was in the version, whether or not it was a major release, minor release, patch, or hot-fix, etc. Brainstorm about possible scenarios your project might face in the future, and consider what you might need to remember or know about each event.
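
As one possible sketch (Python is used here purely for illustration), a version string could combine major and minor numbers with a build timestamp, so the compile time can be recovered from the version itself. The major.minor.yyMMddHHmm layout and the names make_version and parse_build_time are assumptions for this example, not a prescribed scheme.

```python
from datetime import datetime


def make_version(major: int, minor: int, built_at: datetime) -> str:
    """Build a version string in the hypothetical major.minor.yyMMddHHmm layout."""
    return f"{major:02d}.{minor:02d}.{built_at:%y%m%d%H%M}"


def parse_build_time(version: str) -> datetime:
    """Recover the build timestamp from the last segment of the version."""
    stamp = version.split(".")[-1]
    return datetime.strptime(stamp, "%y%m%d%H%M")


if __name__ == "__main__":
    version = make_version(1, 1, datetime(2010, 2, 28, 7, 23))
    print(version)
    print(parse_build_time(version))
```

A real scheme would probably also need to encode the branch and the release type (major, minor, patch, hot-fix), which this sketch leaves out.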


Define All Desired Build Steps

Brainstorm on what your ideal build process would accomplish, and define the steps. You do not have to implement all of the steps at once, but being aware of your goals will help you make the right architectural and design decisions. Here are some high-level areas to consider:

Clear Old Build Content
Be sure to remove all old content from the source directory before getting the latest from source control, otherwise unexpected results will start to appear in the build output. It is also a good idea to clear out the "last output" directory at the beginning of the build. It is a waste of time to have a good build go bad at the last step because someone is doing a manual install directly from this folder and the files are locked.

Create Missing Directories
The build should not fail because of missing folders. Define a folder structure, and proactively verify it at the beginning of each build.
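
The two preparation steps above (clearing old content, then verifying the folder structure) could be sketched together as follows. The folder names "source", "last_output", and "logs" and the function name prepare_build_dirs are placeholders invented for this example.

```python
import shutil
from pathlib import Path


def prepare_build_dirs(
    root: Path,
    to_clear=("source", "last_output"),             # hypothetical folder names
    required=("source", "last_output", "logs"),     # hypothetical folder names
) -> None:
    """Delete stale content, then make sure every expected folder exists."""
    for name in to_clear:
        target = root / name
        if target.exists():
            # rmtree raises if a file cannot be removed (e.g. it is locked),
            # so the build fails here, at the start, instead of at the last step.
            shutil.rmtree(target)
    for name in required:
        (root / name).mkdir(parents=True, exist_ok=True)
```

Running this as the very first build step means a locked file in the output folder stops the build immediately rather than after compilation and testing have already run.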

Set Version Information
Coders are famous for saying "it works in my environment" when something they own fails in someone else's environment. Assuming that this is the case, you should always first ask what was deployed to the failed environment. If you have an adequate versioning strategy and you tag all of your assemblies, stored procedures, configuration files, etc. with this version information, you can instantly know the version, where it came from, and what is supposed to be in it.

Manipulate SQL Scripts
SQL scripts can be very problematic and there is a lot that can be done in the build process to lessen the stress (e.g. consolidating files to minimize installer changes, or adding standardized logging at the beginning/end of scripts).
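
As a rough sketch of both ideas, the build could wrap each script in log markers and concatenate the results into one file for the installer. This assumes the scripts are T-SQL and are run by a tool that understands the GO batch separator (such as sqlcmd); the function names are invented for this example.

```python
from pathlib import Path


def wrap_script(name: str, body: str) -> str:
    """Surround one script with PRINT markers so install logs show progress."""
    safe = name.replace("'", "''")  # escape quotes for the T-SQL string literal
    return (
        f"PRINT 'BEGIN {safe}';\nGO\n"
        f"{body.rstrip()}\nGO\n"
        f"PRINT 'END {safe}';\nGO\n"
    )


def consolidate_sql(script_paths, output_path: Path) -> None:
    """Concatenate the scripts, in the order given, into a single file."""
    parts = [
        wrap_script(Path(p).name, Path(p).read_text(encoding="utf-8"))
        for p in script_paths
    ]
    Path(output_path).write_text("\n".join(parts), encoding="utf-8")
```

Because the installer only ever sees the consolidated file, developers can add, rename, or split scripts without a matching installer change.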

Run Static Analysis Against SQL Scripts
SQL scripts can be validated for common errors using regular expressions. For example, when a T-SQL segment is run in an environment that uses a linked server, it cannot use the notation [database]..[table] and must use [database].dbo.[table] instead. A regular expression like [^\.\s]\.\.[^\.\s] can be used to detect these problems. (Please note, this particular regex is an oversimplification and will incorrectly flag text in comments as well as in T-SQL.)
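
Here is a minimal sketch of that check using the regex above. The function name find_double_dot and the sample script are made up for illustration.

```python
import re

# Same pattern as in the text: two dots with a non-dot, non-space
# character on each side.
DOUBLE_DOT = re.compile(r"[^.\s]\.\.[^.\s]")


def find_double_dot(sql: str):
    """Return (line_number, line) pairs for lines using database..table."""
    return [
        (number, line.strip())
        for number, line in enumerate(sql.splitlines(), start=1)
        if DOUBLE_DOT.search(line)
    ]


if __name__ == "__main__":
    script = (
        "SELECT 1 FROM [Sales]..[Orders]\n"
        "SELECT 1 FROM [Sales].dbo.[Orders]\n"
        "-- see a..b\n"
    )
    for number, line in find_double_dot(script):
        print(f"line {number}: {line}")
```

On the sample script this reports line 1 (a real problem) and line 3 (a comment), which shows the false-positive limitation mentioned above.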

Update Configuration files
The configuration needs of development environments are different than the needs of the installer. Use the build process as a buffer between the developers and the installer, which minimizes the amount of work the installer has to perform. Here are some areas to consider:

  • Tokenize Installer Variables - Often installers will deploy the same configuration file that was checked into source control and replace specific values. Tokenizing these values in the build via embedded logic (e.g. regular expressions, XPath expressions, etc.) rather than in the installer loosens the coupling between the installer and the contents of the file, and allows flaws to be found at build time rather than at install time.
  • Set Correct Initial Logging Levels - Developers often change the logging level while diagnosing issues, but there is likely a specified level for the initial install. Instead of relying on the correct setting being checked in, set it in the build process. Just remember that it is set in the build, to avoid being perplexed when it needs to be changed.
  • Remove Development Environment Information - Make sure development credentials are not accidentally visible in QA or customer environments. Be sure to also consider removing comments in configuration files, because developers will often toggle between environments by copying and pasting from comments stored in the configuration files.
  • Adjust for Strongly Named Assemblies - When strong naming is required, version information may need to be updated in configuration files for frameworks such as Spring.NET, NHibernate, and others.
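
The tokenizing idea from the list above could be sketched like this for a setting stored as an add element with key and value attributes. The @@DB_SERVER@@ token syntax, the DbServer key, and the function name are all assumptions for illustration; your installer will define its own token format.

```python
import re


def tokenize_setting(config_text: str, key: str, token: str) -> str:
    """Replace the value of <add key="KEY" value="..."> with an installer token."""
    pattern = re.compile(
        r'(<add\s+key="' + re.escape(key) + r'"\s+value=")[^"]*(")'
    )
    # A function replacement inserts the token literally, so characters such
    # as backslashes in the token are not treated as regex escapes.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + token + m.group(2), config_text
    )
    if count != 1:
        # Fail the build now instead of shipping an untokenized file.
        raise ValueError(f"expected exactly one '{key}' setting, found {count}")
    return new_text


if __name__ == "__main__":
    config = '<appSettings><add key="DbServer" value="devbox01" /></appSettings>'
    print(tokenize_setting(config, "DbServer", "@@DB_SERVER@@"))
```

Raising an error when the setting is missing or duplicated is what moves the failure from install time to build time.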


Compile
This might be an obvious step, but do not forget to consider which configuration to compile with (e.g. Debug or Release).

Run Unit Tests
There are many testing frameworks available, but no matter which you choose, proper unit testing leads to higher code quality and better separation of concerns. (Do not confuse this with integration tests.)

Run Static Analysis of Assemblies
Tools such as FxCop and NDepend can help flag poor development practices early so that architects and team leads can perform targeted code reviews in problem areas.

Copy Folders to Intermediate Output Folder
See Organize Build Output below.


Create Installer
Typically this step will only be performed on a main build server or on the build engineer's machine.

Deploy Integration Environments
Automating the deployment of your project can be extremely useful. This is especially true when setting the stage for integration tests. However, it is important to perform manual installs as well. Gradually, hacks can creep into the build process that alter the target system, and these can inadvertently mask deficiencies in the installer. Make your installer do its own upgrading.

Run Integration Tests
Combine a unit testing framework (e.g. NUnit, MbUnit, or MSTest) and an integration testing suite (e.g. Selenium) to test the entire stack within your application. Just be careful in this area, because integration tests can be extremely difficult to maintain as your application grows and changes. Also, do not rely on these automated tests for all of your development testing. Automation is a great tool, but it will never completely replace manual testing.

Purge Old Content
When the build is broken, it can be very disruptive to the entire team. Keep your build from failing due to something as trivial as disk space by purging old build output. For most organizations that produce "shrink-wrapped software," it makes sense to keep all publicly released/supported versions and all releases to QA for the current version, but to automatically purge all but the last 2-5 development builds.
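
A purge step for development builds might look something like the sketch below. It assumes each build lives in its own folder and that folder names sort chronologically (for example, because they end in a timestamp); the function name is hypothetical.

```python
import shutil
from pathlib import Path


def purge_old_builds(dev_builds: Path, keep: int = 5):
    """Delete all but the newest `keep` build folders; return the deleted names."""
    builds = sorted(
        (p for p in dev_builds.iterdir() if p.is_dir()),
        key=lambda p: p.name,  # assumes names sort oldest to newest
    )
    stale = builds[:-keep] if keep > 0 else builds
    for path in stale:
        shutil.rmtree(path)
    return [p.name for p in stale]
```

Point this only at the development build folder, so that QA and public releases stored elsewhere are never touched.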

Consider Multiple Builds
I am a huge fan of continuous integration (CI), but I feel that it is often seen as "just the build." This leads to very stripped-down implementations of CI that are focused on little more than compilation, because time-consuming tasks (e.g. running integration tests) and tasks that reduce the likelihood of getting a "clean build" (e.g. unit tests failing, static analysis rules being broken, etc.) are eventually removed to achieve short-term time gains. Consider building modularity and reusability into your scripts, and offering multiple builds for multiple scenarios.

  • CI Environment - This is used to monitor the health of the system, enforce governance, produce the actual deliverables, etc. The CI environment should be considered a fail-safe measure. Ideally, this build should most often be triggered by an "IfModified" timer rather than being forced by a developer.
  • Developer Build - This is a stripped-down version of the CI build that runs directly on the developer's machine. It lets developers test and pick up the latest schema changes without the hassle of waiting for the CI build. It also reduces a common problem when two developers check in at nearly the same time: both want to run the build to make sure they did not break it, but the second developer's changes were not checked in in time for the first developer's build.
  • Local-Only Developer Build - This is very similar to the "Developer Build," but instead of building in a build structure that automatically retrieves the latest from source control, it builds directly in the developer's working directory. This gives developers the ability to test their changes before checking in, and alleviates the problems of "compiling by continuous integration."

Choose a Failure Model
It is useful for continuous integration builds to collect as many errors and warnings as possible (e.g. by running all unit tests, applying all static analysis rules, etc.) to minimize the iterations needed to fix the errors. However, for developer builds, it may be more beneficial to fail on the first error.
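
The two failure models can share one step runner that differs only in a flag. In this sketch a build step is simply a named function that raises an exception on failure; that shape and the names Step and run_steps are assumptions for illustration.

```python
from typing import Callable, List, Tuple

# A build step is a (name, action) pair; the action raises on failure.
Step = Tuple[str, Callable[[], None]]


def run_steps(steps: List[Step], fail_fast: bool) -> List[str]:
    """Run the steps in order and return the collected error messages.

    fail_fast=True stops at the first failure (developer builds);
    fail_fast=False keeps going and reports everything (CI builds).
    """
    errors = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            errors.append(f"{name}: {exc}")
            if fail_fast:
                break
    return errors
```

Note that collecting errors only makes sense for steps that are independent of each other; there is little point in running unit tests after compilation has failed.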

Choose a Build Technology

There are plenty of build technologies out there, and many of them overlap in what they do. Before you choose one or a combination, you need to determine what your needs and preferences actually are (e.g. custom extensions, web interface, console interface, etc.). Only then can you make a good choice of build technology.

Define a Build Folder Structure

Build servers are frequently shared between multiple branches and even projects. Determining a predefined folder structure will prevent all of these individual entities from corrupting each other. Base your structure on variables that can be used within your build framework. (e.g. The source directory could be defined as "$(root)\products\$(product)\$(branch)\$(source)")
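
Most build tools expand variables like these for you. Purely to illustrate the idea, here is a small sketch that expands $(name) placeholders and refuses to continue when a variable is undefined; the function name and the sample values are made up.

```python
import re


def expand(template: str, variables: dict) -> str:
    """Replace each $(name) placeholder with its value from `variables`."""

    def lookup(match):
        name = match.group(1)
        if name not in variables:
            # An undefined variable would silently produce a wrong path.
            raise KeyError(f"undefined build variable: {name}")
        return variables[name]

    return re.sub(r"\$\((\w+)\)", lookup, template)


if __name__ == "__main__":
    template = r"$(root)\products\$(product)\$(branch)\$(source)"
    values = {  # sample values, invented for this example
        "root": r"C:\builds",
        "product": "MyProject",
        "branch": "main",
        "source": "src",
    }
    print(expand(template, values))
```

Because every branch and product resolves to its own path, two builds on the same server cannot overwrite each other's files.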

Organize Build Output

Whether your build process creates an installer or not, the output of your build should be organized. When the build process does create an installer, you should also organize the intermediate files that are consumed by the installer project. When an installer has to pull random files from multiple locations, it becomes extremely confusing over time, and it is hard to determine what is actively being used. It also becomes frustrating for developers, because file paths are tightly coupled to the installer project and cannot be changed without a change in the installer. Have your build create an output folder structure, and organize all of your files (e.g. content files, SQL scripts, etc.) within this structure. Then have the installer project pull directly from that folder structure. This not only makes things more discoverable and easier to understand, but also loosens the coupling between the developers and the installer, because the build process can rename and/or modify files on the fly before they are consumed by the installer.

Expose Output
Do not make your teammates guess where the build output is for current or past iterations. Also try to avoid long UNC paths (e.g. \\BuildMachine\c$\builds\products\MyProject\01.01.1002280723\installer\setup.exe). Open shares on the build server or another common repository with short meaningful names (e.g. \\BuildMachine\DevBuilds, \\BuildMachine\QaReleases, \\BuildMachine\GARelease).

Ship Deliverables in a Repeatable Way
It is very important that deliverables are shipped in a consistent way, and that manual verification is incorporated into this process. When the team thinks it is ready to turn over an iteration, go through a published turnover checklist that the entire team is aware of and involved in. This list should include items such as:

  • Performing documented manual turnover tests
  • Creating release notes using a standard template - Notes should contain:
    • Included features, enhancements, and fixes
    • Known issues not in the work/bug tracking system
  • Copying the deliverable to a specific location
  • Creating an email based off of a standard template - The email should contain:
    • The delivered version number
    • The path to the deliverables (attachments are often too large)
    • The release notes
    • The upcoming release schedule
  • Sending the email to a predefined distribution list instead of a random list of recipients

Track Releases

Maintain a change list containing dates, version numbers, and included changes for all releases, but especially for hot-fixes and patches. You never know when you will need this information in the future, and when it is needed, it is usually an emergency.

Summary
It is just as easy, if not easier, to end up with spaghetti code in your build process as it is in your application code. Use standard architecture principles to create a more maintainable system (e.g. gather requirements, abstract the details, identify areas of reuse, identify steps that can be run in parallel, etc.). Also, always remember to maintain consistency in your process, and do not rely on your automation to do all of the work for you.
