At the heart of the Newslinn platform is image validation. One aspect we are researching is Error Level Analysis (ELA). It is specific to the JPEG image file format and is a niche area of fraudulent image detection.
I’m going to attempt to describe ELA and how Newslinn hopes to use it within our platform. We need to cover some ground first…
What’s a JPEG?
A JPEG is a file format for saving photographs. There’s a lot of history to it, but in short, it was designed for the internet, designed specifically for storing photographs, and it is the main file format that digital cameras and smartphones use to store photos.
JPEG is a really cool file format because it does things that other image file formats don’t.
As a side note, other image file formats include GIF, PNG, BMP and TIFF.
The JPEG file format was made for the internet back when the internet was “low bandwidth”. Everything on the internet needed to be small in file size, so the JPEG file format had a ‘compression’ feature: when a photograph is stored as a JPEG file, the file is much smaller than if it were stored in another image file format, e.g. a Bitmap (BMP) file.
For this example photograph:
JPEG File Size = 377 KB
BMP File Size = 5,343 KB
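To put those two file sizes in perspective, here is a quick Python calculation. It uses only the numbers above and shows how much smaller the JPEG is:

```python
# File sizes of the example photograph, in KB (from the figures above).
jpeg_kb = 377
bmp_kb = 5343

ratio = bmp_kb / jpeg_kb       # how many times smaller the JPEG is
saving = 1 - jpeg_kb / bmp_kb  # fraction of space saved by using JPEG

print(f"JPEG is about {ratio:.1f}x smaller")      # JPEG is about 14.2x smaller
print(f"That is a saving of about {saving:.0%}")  # That is a saving of about 93%
```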
The JPEG file format achieves that reduction in file size through compression.
How does JPEG Compression work?
To compress a photo, the JPEG file format splits it into tiny squares of 8×8 pixels. When a JPEG is saved with heavy compression (a low quality setting), you can see these tiny squares as blocky artefacts.
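Here is a minimal Python sketch of that first step. It splits a greyscale image, stored as a list of pixel rows, into 8×8 blocks. The function name `split_into_blocks` is hypothetical. The sketch assumes the width and height are multiples of 8; real encoders pad the edges when they are not.

```python
def split_into_blocks(pixels, size=8):
    """Split a 2D grid of pixel values (a list of rows) into size x size blocks.

    Blocks are returned in reading order: left to right, then top to bottom.
    Assumes the width and height are multiples of `size`.
    """
    height = len(pixels)
    width = len(pixels[0])
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            block = [row[left:left + size] for row in pixels[top:top + size]]
            blocks.append(block)
    return blocks


# A 16x16 greyscale "photo" splits into four 8x8 blocks.
image = [[(x + y) % 256 for x in range(16)] for y in range(16)]
blocks = split_into_blocks(image)
print(len(blocks))  # 4
```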
What it does is quite genius.
To lower the file size, the JPEG file format throws away detail that the eye is unlikely to miss, and one effect is that the number of colours in the photograph drops. So say the original example photograph contains about 45,265 unique colours; the JPEG version might contain only about 20,000.
How it goes about that is connected with these tiny 8×8 squares. JPEG takes each 8×8 square and keeps roughly its average colour, but it stores the finer variation around that average only approximately. It converts the square’s pixel values into a set of frequency components (a Discrete Cosine Transform, or DCT) and then rounds those components off (a step called quantization). Small differences between neighbouring pixels get smoothed out, which is what reduces the number of colours.
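Here is a minimal sketch, in plain Python, of what happens to one 8×8 square. It is a simplification rather than a real encoder. It uses a single rounding step (`step=16`, an arbitrary choice) for every coefficient. Real JPEG uses an 8×8 quantization table scaled by the quality setting, works on separate brightness and colour channels, and then entropy-codes the result. The function names are hypothetical.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def c(k):
    # Scale factor that makes the transform orthonormal (the k = 0 term is special).
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def basis(k, x):
    # The k-th cosine basis function evaluated at pixel position x.
    return c(k) * math.cos((2 * x + 1) * k * math.pi / (2 * N))


def dct2(block):
    """2D DCT-II: 8x8 pixel values -> 8x8 frequency coefficients."""
    return [[sum(block[x][y] * basis(u, x) * basis(v, y)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse transform: 8x8 coefficients -> 8x8 pixel values."""
    return [[sum(coeffs[u][v] * basis(u, x) * basis(v, y)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize(coeffs, step):
    """The lossy step: round each coefficient to the nearest multiple of step."""
    return [[round(value / step) * step for value in row] for row in coeffs]


# A smooth 8x8 greyscale patch (values 100..121), like a patch of sky.
block = [[100 + 2 * x + y for y in range(N)] for x in range(N)]

coeffs = dct2(block)
kept = quantize(coeffs, step=16)
restored = idct2(kept)

nonzero = sum(1 for row in kept for value in row if value != 0)
mean_before = sum(map(sum, block)) / N**2
mean_after = sum(map(sum, restored)) / N**2
print(f"non-zero coefficients kept: {nonzero} of {N * N}")
print(f"average before: {mean_before:.2f}, after: {mean_after:.2f}")
```

Most of the 64 coefficients end up as zero and cost almost nothing to store, which is where the saving comes from. The block’s average brightness survives almost unchanged.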
That’s all well and good: it’s easy enough to understand how the JPEG file format makes a photograph’s file size small.
It is this compression method that enables ‘Error Level Analysis’ on the JPEG file format.
What is Error Level Analysis (ELA)?
Error Level Analysis is a way to see what areas of a photograph have been changed.
So if someone took a photo with their smartphone, opened it up in Photoshop and changed something about that photo – Error Level Analysis is a way to try and detect what was changed.
It’s not an exact science yet but it’s useful to bring about suspicion if nothing else.
How does ELA work?
ELA works because of how JPEG compresses photographs in tiny 8×8 squares, and because each time a JPEG image is saved, it gets compressed again.
That is where the magic is.
When a photograph is first saved as a JPEG, it is compressed for the first time.
If the image is then opened into Photoshop, edited and saved again as a JPEG, it gets compressed again.
What this means is that the “original” parts of the photographic image have been compressed twice – once by the camera that took the photo and again by Photoshop.
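Whereas the “edited” part of the photographic image was only compressed once, by Photoshop.
To the human eye, you cannot notice the difference by looking at the image. However, you can find it by re-saving the image as a JPEG at a known quality and comparing the re-saved copy with the original, pixel by pixel. Parts that have already been through compression change very little on the re-save. The edited part changes noticeably more, so it stands out in the difference image.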
This is the basis of JPEG Error Level Analysis.
(The edit to the car is crude, but just go with it and imagine it was perfect.)
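The idea can be simulated end to end in a toy model. This is an assumption-laden illustration, not how Newslinn or any real tool implements ELA. The hypothetical `toy_jpeg_save` stands in for a JPEG save: it applies the DCT-and-rounding step to each 8×8 block of a greyscale image, and a smaller rounding step plays the role of a higher quality. The steps are:

- The camera saves at step 20.
- Photoshop pastes in new content and saves at step 10.
- ELA re-saves at step 20 and compares.

These step values are arbitrary demo choices.

```python
import math

N = 8  # JPEG works on 8x8 blocks

# Orthonormal 8x8 DCT-II matrix: row u is the u-th cosine basis vector.
DCT = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        for x in range(N)] for u in range(N)]
DCT_T = [list(row) for row in zip(*DCT)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def toy_jpeg_save(image, step):
    """Toy stand-in (hypothetical) for saving a greyscale image as JPEG.

    Per 8x8 block: DCT, round coefficients to multiples of `step`, inverse DCT.
    Real encoders also convert colour, use a quantization table, round to
    whole pixel values and entropy-code the result; none of that is modelled.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, N):
        for left in range(0, width, N):
            block = [row[left:left + N] for row in image[top:top + N]]
            coeffs = matmul(matmul(DCT, block), DCT_T)
            coeffs = [[round(c / step) * step for c in row] for row in coeffs]
            pixels = matmul(matmul(DCT_T, coeffs), DCT)
            for y in range(N):
                out[top + y][left:left + N] = pixels[y]
    return out


def error_levels(image, step):
    """ELA: re-save at a fixed quality, return |original - re-saved| per pixel."""
    resaved = toy_jpeg_save(image, step)
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image, resaved)]


# 1. The "camera" takes a 16x16 greyscale photo and saves it (first compression).
photo = [[50 + 7 * x + 3 * y for x in range(16)] for y in range(16)]
camera_jpeg = toy_jpeg_save(photo, step=20)

# 2. "Photoshop" pastes new content into the top-left 8x8 block, then saves
#    the whole image again at a finer step (a higher quality).
edited = [row[:] for row in camera_jpeg]
for y in range(N):
    for x in range(N):
        edited[y][x] = (13 * x * x + 29 * y) % 256
photoshop_jpeg = toy_jpeg_save(edited, step=10)

# 3. ELA: re-save at a fixed quality (step 20) and compare with the input.
ela = error_levels(photoshop_jpeg, step=20)
pasted = max(ela[y][x] for y in range(N) for x in range(N))
untouched = max(ela[y][x] for y in range(16) for x in range(16)
                if y >= N or x >= N)
print(f"largest error in pasted block: {pasted:.2f}")
print(f"largest error elsewhere:       {untouched:.6f}")
```

In this toy, the untouched area comes back with an error of essentially zero, while the pasted block does not. With real JPEG files, the untouched area’s error is small rather than zero. This is because real encoders also round to whole pixel values and convert colour.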
While ELA isn’t an exact science, it’s a useful tool to add to the mix for fraud verification. It still requires a trained eye, as ELA output images vary widely and many of those variations might trigger some level of suspicion.
But combined with other verification factors it can be quite interesting, and this is what the Newslinn platform does.
If the edited photograph moves parts of the same image around instead of overlaying a new image, it is very hard to detect, as the compression levels are all the same.
The same can be said if the image is airbrushed and part of it removed.
- ELA can complement existing verification techniques.
- ELA has problems with images that have high-contrast colours next to each other.
- Understanding the output of ELA takes time and experience.
- ELA is an interactive verification technique: it’s not enough to look at a single static image; you need to be able to adjust settings and see the output in real time.
- You can get around ELA once you know how.
- JPEG files that were barely ‘compressed’ when saved (for example, saved at maximum quality) will not work.
- If something gets removed from an image and replaced with another part of the same image, this will be very hard to detect.
Other things similar to ELA
Later on, I’ll write up Edge Detection, Histogram Analysis and Air Brush Detection: three other ways to investigate an image to see what might have changed.
Some Python Code
Let’s finish this off with some code.
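Here is a minimal ELA sketch in Python. It assumes the Pillow imaging library (`pip install Pillow`) and illustrates the general technique; it is not Newslinn’s code. The function name `error_level_analysis` and the default `quality=90` are assumptions. Since ELA is interactive, you would normally try several quality values.

```python
import io

from PIL import Image, ImageChops


def error_level_analysis(source, quality=90):
    """Return an ELA image for a JPEG, given a file path or file object.

    Re-saves the photo at a known JPEG quality, takes the per-pixel
    difference from the original, then stretches that difference so
    faint changes become visible.
    """
    original = Image.open(source).convert("RGB")

    # Re-save in memory at the chosen quality and read it straight back.
    buffer = io.BytesIO()
    original.save(buffer, "JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")

    # Absolute difference per pixel and per channel: the "error level".
    diff = ImageChops.difference(original, resaved)

    # Scale so the largest difference maps to full brightness (255).
    max_diff = max(high for _, high in diff.getextrema()) or 1
    scale = 255.0 / max_diff
    return diff.point(lambda value: min(255, int(value * scale)))
```

For example, `error_level_analysis("photo.jpg").save("photo_ela.png")` writes an ELA image for a hypothetical `photo.jpg`. Brighter areas changed more on the re-save and may deserve a closer look.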
If you want access to our actual code base / shared GitHub, just get in touch: contact at newslinn dot com.