Bring up to standard PHP legacy code - Application Status

In this post we will look at how to get a clear picture of a project's status before refactoring the application, so that we can plan accordingly.

Before we do any work and start migrating the project, we need a good idea of what we are dealing with:

  • PHP versions?
  • Dependencies? What versions?
  • Is there a framework?
  • Existing unit tests?
  • Did the developers before us follow best practices?

Let's be frank: most of the time when we take over a legacy application, there won't be any documentation for us. It would have been great, but if documentation existed, I would bet the application wouldn't have decayed to the state it is in right now.

Now, it would be easy to blame the previous developers, to say that they were all idiots who didn't know what they were doing. But who can honestly say they have always coded the perfect application, without any errors or failures, all documented and all tested, within the deadlines? It's very tempting when reading a legacy application to think:

What were they thinking? They don't know how to write code *ugh*

But unless you know exactly what the developers went through (maybe there were conflicts in the project management, maybe there was a shortage of manpower, maybe the market was so volatile that the product needed to shift many times to allow the company to survive, etc.), you can't judge or blame.

But rejoice, better times are here! You can migrate this big ol' monolith toward a brighter future.

To get a better idea of what the project is, there are a few things to look at first. Maybe you just got the source files without the dev environment, or just access to a Git repository without any README.md.

First things first

The first thing I would look for is a composer.json or composer.lock file. The first one should give a ton of information; the second one gives more detail if needed, say if the composer.json is a bit thin. composer.lock tells us exactly what the application depends on and in which versions. But the composer.lock file is tedious to read, which is why I suggest going over the composer.json file first. Depending on how rigorous the previous developers were, you may find out several things:

  • PHP version restrictions
  • Any framework dependencies (hopefully for you, Composer has been around long enough for developers to have used it whenever they needed a framework)
  • Any tests
  • Whether or not the developers were using tools, as listed in the require-dev section

With all that information you should have a really good idea of what's going on here.
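As a rough sketch of that first pass, the snippet below pulls the interesting parts out of a composer.json file. It only relies on the standard Composer keys "require" and "require-dev"; the function name summarize_composer and the shape of the returned dictionary are my own choices for illustration, not part of any tool.

```python
import json
from pathlib import Path


def summarize_composer(path):
    """Return the parts of a composer.json that describe the project's status."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    require = data.get("require", {})
    require_dev = data.get("require-dev", {})
    return {
        # The "php" key in "require" holds the PHP version constraint, if any.
        "php": require.get("php"),
        # "ext-*" keys are required PHP extensions.
        "extensions": sorted(k for k in require if k.startswith("ext-")),
        # Everything else is a package needed at runtime (frameworks included).
        "packages": {
            name: constraint
            for name, constraint in require.items()
            if name != "php" and not name.startswith("ext-")
        },
        # Development tools (test runners, linters, ...) usually live here.
        "dev_tools": dict(require_dev),
    }
```

Calling summarize_composer("composer.json") from the project root and printing the result should be enough to spot the PHP constraint, a framework package, or a test runner such as phpunit/phpunit in the dev tools.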

Uh oh!

If the project is really old and there are no Composer files, it will be a little trickier. You may have to look for custom scripts or for any vendor, vendor_custom, vendor2, etc. folders.

Another great find would be the entry point of the application: any index.php, web/index.php or public/index.php. Or maybe even a main.php if the previous developer had a strong C or Java influence.

If you start seeing files with a .php4 or .php5 extension, be careful: it may be a very old project where the developers had to support PHP 4 and maybe upgraded to PHP 5. If that is the case, I would recommend a different strategy for migrating the application, as it might simply be too much.
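The checks from this section can be automated in a few lines. This is only a sketch: the function name inspect_layout is hypothetical, and the lists of entry points and extensions simply mirror the examples mentioned above, so extend them for the project at hand.

```python
from pathlib import Path

# Candidate entry points and legacy extensions, taken from the examples above.
ENTRY_POINTS = ("index.php", "web/index.php", "public/index.php", "main.php")
OLD_EXTENSIONS = {".php4", ".php5"}


def inspect_layout(root):
    """Report Composer files, vendor-like folders, entry points and old file extensions."""
    root = Path(root)
    return {
        "composer_files": [
            name for name in ("composer.json", "composer.lock") if (root / name).is_file()
        ],
        # Any top-level folder starting with "vendor": vendor, vendor_custom, vendor2...
        "vendor_dirs": sorted(
            p.name for p in root.iterdir() if p.is_dir() and p.name.startswith("vendor")
        ),
        "entry_points": [e for e in ENTRY_POINTS if (root / e).is_file()],
        # Walk the whole tree looking for .php4 / .php5 files.
        "old_extension_files": sorted(
            p.relative_to(root).as_posix()
            for p in root.rglob("*")
            if p.is_file() and p.suffix.lower() in OLD_EXTENSIONS
        ),
    }
```

An empty "composer_files" list combined with several "vendor_dirs" is the situation described above, where dependencies were most likely copied in by hand.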

Guzzle 3

In order to show you examples and explain things more concretely, I'll make use of a famous old project: Guzzle 3 (it has been deprecated; the last meaningful commit is from Apr 29, 2015). The rest of the series will be based on it.

PHPLoc

PHPLoc is a quick and handy tool that gives away some details about the project. They may be small details, but they hold a lot of knowledge.

To install it, follow the instructions:

$ wget https://phar.phpunit.de/phploc.phar
$ chmod +x phploc.phar
$ mv phploc.phar /usr/local/bin/phploc

Here are the results for Guzzle 3:

guzzle3 on master
$ phploc src
phploc 5.0.0 by Sebastian Bergmann.

Directories                                         43
Files                                              233

Size
  Lines of Code (LOC)                            25221
  Comment Lines of Code (CLOC)                    8070 (32.00%)
  Non-Comment Lines of Code (NCLOC)              17151 (68.00%)
  Logical Lines of Code (LLOC)                    5081 (20.15%)
    Classes                                       4311 (84.85%)
      Average Class Length                          18
        Minimum Class Length                         0
        Maximum Class Length                       162
      Average Method Length                          2
        Minimum Method Length                        0
        Maximum Method Length                       42
    Functions                                        0 (0.00%)
      Average Function Length                        0
    Not in classes or functions                    770 (15.15%)

Cyclomatic Complexity
  Average Complexity per LLOC                     0.34
  Average Complexity per Class                    8.41
    Minimum Class Complexity                      1.00
    Maximum Class Complexity                     86.00
  Average Complexity per Method                   2.25
    Minimum Method Complexity                     1.00
    Maximum Method Complexity                    34.00

Dependencies
  Global Accesses                                    0
    Global Constants                                 0 (0.00%)
    Global Variables                                 0 (0.00%)
    Super-Global Variables                           0 (0.00%)
  Attribute Accesses                              1754
    Non-Static                                    1683 (95.95%)
    Static                                          71 (4.05%)
  Method Calls                                    1992
    Non-Static                                    1849 (92.82%)
    Static                                         143 (7.18%)

Structure
  Namespaces                                        44
  Interfaces                                        50
  Traits                                             0
  Classes                                          183
    Abstract Classes                                13 (7.10%)
    Concrete Classes                               170 (92.90%)
  Methods                                         1515
    Scope
      Non-Static Methods                          1430 (94.39%)
      Static Methods                                85 (5.61%)
    Visibility
      Public Methods                              1353 (89.31%)
      Non-Public Methods                           162 (10.69%)
  Functions                                         25
    Named Functions                                  0 (0.00%)
    Anonymous Functions                             25 (100.00%)
  Constants                                         75
    Global Constants                                 0 (0.00%)
    Class Constants                                 75 (100.00%)
PHPLoc results for Guzzle3

This gives a lot of information on the code's health and quality: rough ideas of the size, the presence of comments, the complexity of the code, and the length of classes (to me this is the most important one, as long classes are often a chore to read and understand).

If you start seeing a lot of static classes/methods or global constants/variables, you may be dealing with a project that relies on magic and stateful behavior. That will probably be one of the first things you'll have to change, for both security and development purposes.

Having a lot of interfaces may indicate that the developers intended to have a lot of swappable objects, which suggests they tended to follow SOLID practices. That is always good news.

Here we can see the project isn't too big and can be easily taken on.
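If you want to compare these numbers across several projects, the text report can be turned into a dictionary. The sketch below assumes the indented layout shown in the output above; the " / " separator used to build metric names and the helper names parse_phploc and share are my own conventions, and the sample is just a trimmed copy of the Guzzle 3 report.

```python
import re

# Trimmed copy of the Guzzle 3 report shown above.
SAMPLE = """\
Size
  Lines of Code (LOC)                            25221
  Comment Lines of Code (CLOC)                    8070 (32.00%)

Dependencies
  Global Accesses                                    0

Structure
  Methods                                         1515
    Scope
      Non-Static Methods                          1430 (94.39%)
      Static Methods                                85 (5.61%)
"""


def parse_phploc(report):
    """Map each metric, named by its indentation path, to its numeric value."""
    metrics = {}
    stack = []  # (indent, name) pairs describing the current nesting path
    for raw in report.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        # The name and the value are separated by a run of two or more spaces.
        parts = re.split(r"\s{2,}", raw.strip(), maxsplit=1)
        # Drop path entries that are not parents of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, parts[0]))
        if len(parts) == 2:
            number = re.match(r"\d+(?:\.\d+)?", parts[1])
            if number:
                metrics[" / ".join(name for _, name in stack)] = float(number.group())
    return metrics


def share(metrics, part, whole):
    """Return part / whole, or 0.0 when the total is missing or zero."""
    total = metrics.get(whole, 0)
    return metrics.get(part, 0) / total if total else 0.0
```

With the sample above, share(parse_phploc(SAMPLE), "Structure / Methods / Scope / Static Methods", "Structure / Methods") gives the proportion of static methods, the kind of number worth watching for the stateful behavior mentioned earlier. Which value counts as "a lot" is a judgment call that I leave to you.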

Exakat

Exakat is a tool that I like very much, as it can give a ton of information by analyzing the code base in an isolated environment. It does its best to find anything on any given code, and it has many tools embedded. I would also recommend it if you want to keep track of technical debt.

Follow the documentation to get it ready:

$ docker pull exakat/exakat
$ mkdir projects
$ docker run -it -v $(pwd)/projects:/usr/src/exakat/projects --rm --name my-exakat exakat/exakat exakat init -p guzzle3 -R https://github.com/guzzle/guzzle3.git
$ docker run -it -v $(pwd)/projects:/usr/src/exakat/projects --rm --name my-exakat exakat/exakat exakat project -p guzzle3
Init Exakat for Guzzle3

The last command will take a while. In my experience, across platforms and whether you use a native install, Docker or Vagrant, Exakat is quite unstable. It is definitely a plus to have, but I wouldn't spend too much time on it if it doesn't work for you.

After ~5 minutes of running for me, an HTML report gets generated in projects/guzzle3/report. It contains an index.html file and some assets. For convenience I use PHP's built-in web server to serve it, as some cross-origin scripting issues may occur when just opening the file directly. You may use whatever you want to achieve that (Python, Go, Apache, etc.):

$ php -S localhost:8500
Exakat default dashboard

Exakat will give you a comprehensive list of what was found: code issues, performance, security, and compatibility with PHP versions.

You should know that the default report (Ambassador), created on the first run, is the most complete one. If you want more specific reports, or a different format to be processed in a CI pipeline for instance, you may want to take a look at all the different reports available. I also invite you to tweak the config file to suit your needs.

You'll also find some of the metrics that we got from PHPLoc. Exakat also comes with extensive documentation on each error it thinks it has found, along with the contextual code, which helps a lot.

All in all, Exakat is a really nice tool if you want a deeper look into a PHP code base. You can even generate incremental reports, which is very useful when you run it in a CI pipeline to track your project's progression. Be careful though: for a big code base it may require a lot of CPU/RAM resources and time. Here, for 25k+ LoC, it took around 5 minutes on my machine, but for a 500k+ LoC project I had to inspect once, it took over 3 hours.

Do note that you may also have to install extensions for Exakat so that it does not report false positives, especially if you use a framework in your project.

Unit Tests

The next thing I will look at is tests. Tests are really important to a project: not only do they assure us that we can change the files and still provide the same outputs for the same inputs, but they also give an idea of how to use the classes. And when we put the pieces together, they can even give us a rough idea of how to use the project as a whole.

Unfortunately, most of the time when we take over a legacy project, chances are that we won't find any tests, or at least any well-written ones.
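A quick way to see where a project stands is to count test files against the rest of the code. The sketch below assumes the tests follow the usual PHPUnit naming convention (files ending in Test.php), which a legacy project may well ignore; the function name count_tests and the default list of skipped folders are hypothetical.

```python
from pathlib import Path


def count_tests(root, skip_dirs=("vendor",)):
    """Count PHP files named *Test.php against all other PHP files under root."""
    root = Path(root)
    tests = sources = 0
    for path in root.rglob("*.php"):
        # Ignore third-party code so that it does not distort the numbers.
        if any(part in skip_dirs for part in path.relative_to(root).parts):
            continue
        if path.name.endswith("Test.php"):
            tests += 1
        else:
            sources += 1
    return tests, sources
```

A result such as (0, 400) answers the question immediately. A healthier count says nothing about the quality of the tests, so it is only a starting point before reading them.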

Any configuration files

Finally, I would look at any configuration files present in the project root folder. They can be anything from .ini to .yml to .xml files. Those files indicate the presence of meta tools for the project. They help give us an idea of what the standards were back then, if any, and they might help us (provided we can run the tools) understand the build process or the development process.
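Here is a minimal sketch of that last check. The mapping from file names to tools is an illustrative and incomplete list of common conventions, the .yaml and .dist suffixes are my own additions to the extensions named above, and the function name find_config_files is hypothetical.

```python
from pathlib import Path

# File name -> tool it usually configures. Illustrative, far from complete.
KNOWN_CONFIGS = {
    "phpunit.xml": "PHPUnit",
    "phpunit.xml.dist": "PHPUnit",
    "phpcs.xml": "PHP_CodeSniffer",
    "phpcs.xml.dist": "PHP_CodeSniffer",
    "phpstan.neon": "PHPStan",
    "build.xml": "Ant or Phing",
    ".travis.yml": "Travis CI",
}
# .ini, .yml and .xml come from the text; .yaml and .dist are assumptions.
CONFIG_SUFFIXES = {".ini", ".yml", ".yaml", ".xml", ".dist"}


def find_config_files(root):
    """List (file name, tool or None) for configuration files in the root folder."""
    found = []
    for path in sorted(Path(root).iterdir(), key=lambda p: p.name):
        if not path.is_file():
            continue
        if path.name in KNOWN_CONFIGS or path.suffix in CONFIG_SUFFIXES:
            found.append((path.name, KNOWN_CONFIGS.get(path.name)))
    return found
```

Files reported with None are the ones to open by hand: they look like configuration, but the sketch does not know which tool reads them.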

Conclusion

Having a plan when taking on a big project is really necessary, because if we are not careful enough the rabbit hole can be endless. This is particularly true for legacy projects, where it is usually hard to know where to start, which tasks to take on and in what order. Getting an idea of what we will be up against is very helpful, not only for you but also for the people around you that you will need to communicate with.

The two tools presented here will help you get that and elaborate an accurate strategy. Looking at thousands of files manually isn't the best strategy: you'd have to be really lucky or very experienced to know what you're looking for. Those tools help us find that, and generate lists and metrics.

Then snoop around for the most important tool: unit tests. They give us an idea of how to use the code, which is documentation in itself. After that you can look around for additional meta tools.