Before we do any work and start migrating the project, we need a good idea of what we are dealing with:
- PHP versions?
- Dependencies? What versions?
- Is there a framework?
- Existing unit tests?
- Did the developers before us follow best practices?
Let's be frank: most of the time when we take over a legacy application, there won't be any documentation for us. It would have been great, but if any existed, I would bet the application wouldn't have decayed to the state it is in right now.
Now, it would be easy to blame the previous developers and say that they were all idiots who didn't know what they were doing. But who can honestly say they always coded the perfect application without any errors or failures, all documented, all tested, within the deadlines? It's very tempting when reading a legacy application to think:
What were they thinking? They don't know how to write code *ugh*
But unless you know exactly what the developers went through (maybe there were conflicts in the project management, maybe there was a shortage of manpower, maybe the market was so volatile that the product needed to shift many times to allow the company to survive, etc.), you can't judge or blame.
But rejoice, better times are here! You can migrate this big ol' monolith toward a brighter future.
To get a better idea of what the project is, there are a few things to look at first. Maybe you just got the source files without the dev environment, or just access to a git repository without any
First things first
The first thing I would look for is any composer.json and composer.lock files. The first one should give a ton of info; the second one gives us more details if needed, say if the composer.json is a bit weak.
composer.lock tells us exactly what the application depends on and in which versions. But the composer.lock file is a bit tedious to read, which is why I suggest going over the composer.json file first. Depending on how rigorous the previous developers were, you may find out several things:
- PHP version restrictions
- Any framework dependencies (Composer has been around long enough that developers would use it whenever they needed a framework, hopefully for you)
- Any tests
- Whether the developers were using tools in the
With all that info you should have a really good idea of what's going on here.
If the project is really old and there are no composer files, it will be a little bit trickier: you may have to look for custom scripts or folders named vendor2 and the like.
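As a rough sketch of that first look, the helper below lists Composer manifests and vendor-like folders in a source tree. The function name find_dependency_hints is hypothetical, and matching every directory whose name starts with "vendor" is my own assumption about how such folders tend to be named.

```shell
# Hypothetical helper (not part of any tool): list dependency hints.
# Usage: find_dependency_hints [project_root]
find_dependency_hints() {
    root="${1:-.}"
    # Composer manifests anywhere in the tree, skipping copies that
    # live inside vendor-like folders.
    find "$root" -type f \( -name 'composer.json' -o -name 'composer.lock' \) \
        ! -path '*/vendor*/*' -print
    # Directories named vendor, vendor2, ... (assumed naming pattern).
    # -prune stops find from descending into them.
    find "$root" -type d -name 'vendor*' -prune -print
}
```

Running find_dependency_hints path/to/project prints one path per line. An empty result for the first part is a strong hint that dependencies were copied in by hand.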
Another great find would be the entry point of the application, such as a public/index.php, or maybe even a main.php if the previous developers had a strong C or Java influence.
If you start seeing files with a .php5 extension, be careful: it may be a very old project where the developers had to support PHP 4 and maybe upgraded to PHP 5. If that is the case, I would recommend a different strategy for migrating the application, as it might simply be too much.
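Both checks can be done quickly from the command line. This is only a sketch: the function names are hypothetical, and skipping a folder named vendor is an assumption about where third-party code lives.

```shell
# Hypothetical helpers for a first look at a legacy tree.

# Print candidate entry points (index.php or main.php).
find_entry_points() {
    root="${1:-.}"
    find "$root" -type f \( -name 'index.php' -o -name 'main.php' \) \
        ! -path '*/vendor/*' -print
}

# Count files with a .php5 extension, a sign of a very old code base.
count_php5_files() {
    root="${1:-.}"
    # tr strips the padding some wc implementations add to the count.
    find "$root" -type f -name '*.php5' ! -path '*/vendor/*' -print \
        | wc -l | tr -d ' '
}
```

If count_php5_files returns anything other than 0, it is worth pausing to reconsider the migration strategy before going further.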
To show you examples and explain things in a more concrete way, I'll make use of a famous old project: Guzzle 3 (it has been deprecated, the last meaningful commit is from Apr 29, 2015). The rest of the series will be based on it.
PHPLoc is a quick and handy tool that gives away some small details about the project. They might be details, but they hold a lot of knowledge.
To install it, follow the instructions:
$ wget https://phar.phpunit.de/phploc.phar
$ chmod +x phploc.phar
$ mv phploc.phar /usr/local/bin/phploc
As results for Guzzle3
This gives a lot of info on the health and quality of the code: a rough idea of its size, whether comments exist, how complex the code is, and how long the classes are (to me this is the most important one, as long classes are often a chore to read and understand).
If you start seeing a lot of static classes/methods or global constants/variables, you may be dealing with a project that relies on magic and stateful behavior. It may be one of the first things you'll have to change, both for security and development purposes.
Having a lot of interfaces may indicate that the developers intended to have a lot of swappable objects, which suggests they tended to follow SOLID practices, and that is always good news.
Here we can see the project isn't too big and can be easily taken on.
Exakat is a tool that I like very much, as it can give a ton of info by analyzing the code base in an isolated environment. It does its best to find anything in any given code, and it has many tools embedded. I would also recommend it if you're looking to keep track of technical debt.
Following the documentation to get it ready:
The last command will take a while. In my experience, depending on the platform and on whether you use a native install, Docker or Vagrant, Exakat can be quite unstable. It is definitely a plus to have, but I wouldn't spend too much time on it if it doesn't work for you.
After a run of about 5 minutes on my machine, an HTML report gets generated in projects/guzzle3/report. It contains an index.html file and some assets. For convenience I use PHP's built-in server to serve it, as some cross-origin issues may occur when opening the file directly. You may use whatever you want to achieve that (Python, Go, Apache, etc.):
$ php -S localhost:8500
Exakat will give you a comprehensive list of what was found: code issues, performance, security, and compatibility with PHP versions.
You should know that the default report (Ambassador) created on the first run is the most complete one. If you want a more specific report, or a different format to be processed in a CI pipeline for instance, take a look at all the different reports available. I also invite you to tweak the config file to suit your needs.
You'll also find some of the metrics we got from PHPLoc. Exakat also comes with extensive documentation on each error it thinks it has found, along with the contextual code, which helps a lot.
All in all, Exakat is a really nice tool if you want a deeper look into a PHP code base. You can even generate incremental reports, which is very useful when you run it in a CI pipeline to get an idea of your project's progression. Be careful though: for a big code base it may require a lot of CPU/RAM resources and time. Here, for 25k+ LoC, it took around 5 minutes on my machine, but for a 500k+ project I had to inspect once, it took over 3 hours.
Do note that you may also have to install extensions for Exakat to keep it from reporting false positives, especially if you use a framework in your project.
The next thing I try to look at is tests. Tests are really important to a project: not only do they assure us that we can change the files and still produce the same outputs from the same inputs, they also give an idea of how to use the classes. And when we put the pieces together, they can even give us a rough idea of how the project works.
Unfortunately, most of the time when we take over a legacy project, chances are that we won't find any tests, or at least no well-written ones.
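A quick way to see where a project stands is to compare the number of test files with the number of PHP files overall. The sketch below assumes the tests follow the PHPUnit convention of file names ending in Test.php; the function name test_file_summary is hypothetical.

```shell
# Hypothetical helper: how many test files versus PHP files overall?
# Assumes PHPUnit-style names (FooTest.php) and a vendor/ folder to skip.
test_file_summary() {
    root="${1:-.}"
    tests=$(find "$root" -type f -name '*Test.php' ! -path '*/vendor/*' -print \
        | wc -l | tr -d ' ')
    # This total includes the test files themselves.
    total=$(find "$root" -type f -name '*.php' ! -path '*/vendor/*' -print \
        | wc -l | tr -d ' ')
    echo "test files: $tests / php files: $total"
}
```

The number says nothing about the quality of the tests, but a count of zero answers the question immediately.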
Any configuration files
Finally, I would look at any configuration files present in the project root folder. These can take many forms, .xml files among them. Those files indicate the presence of meta tools for the project. They help give us an idea of what the standards were back then, if any. They might also help us (given we can run them) understand the build process or the development process.
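To get that overview in one go, something like the following can list the likely candidates. The list of extensions is my own assumption, not an exhaustive one, and list_root_configs is a hypothetical name. Note that -maxdepth is not strictly POSIX, although both GNU and BSD find support it.

```shell
# Hypothetical helper: list likely configuration files in the project root.
# The patterns below are assumed examples; adjust them to the project.
list_root_configs() {
    root="${1:-.}"
    # -maxdepth 1 keeps the search in the root folder only.
    find "$root" -maxdepth 1 -type f \
        \( -name '*.xml' -o -name '*.dist' -o -name '*.yml' -o -name '*.yaml' \
           -o -name '*.json' -o -name '*.ini' -o -name '.*' \) -print | sort
}
```

Each file name that shows up is a lead worth following: it usually points to a tool that was part of the build or development process at some point.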
Having a plan when taking on a big project is really needed, because if we are not careful enough the rabbit hole can be endless. This is particularly true for legacy projects, where it is usually hard to know where to start, which tasks to take on and in what order. Getting an idea of what we are up against is very helpful, not only for you but also for the people around you that you will need to communicate with.
The two tools presented here will help you get that idea and elaborate an accurate strategy. Looking at thousands of files manually isn't the best strategy: you'd have to be really lucky or very experienced to know what you're looking for. Those tools help us find that, and generate lists and metrics.
Then comes snooping around for the most important tool: unit tests. They give us an idea of how to use the code, which makes them documentation. After that you can look around for additional meta tools.