DRC Kinshasa 28th of November 2011. Elections Day, Voting Day and Ballots counting. MONUSCO / Myriam Asmani

How do we verify Journalists? (exactly)

This post answers a question that has come up a few times.

A core part of what Newslinn is about is creating a network, apps and tools specifically for journalists – this is at the heart of Newslinn, i.e. improving communication in both directions between citizens, organisations and journalists.

This means that journalists are our core users. They get to post, promote, initiate chat messaging etc.

This means that when a “journalist” signs up, we need to verify who they are and whether they are a journalist, and also build the features that enable them to be ‘crowd-verified’.

Newslinn implements a ‘4 star’ verification system.

Level 1 / Star 1

When a journalist initially signs up, we collect the typical metadata that is available during normal web browsing activity (IP address, user agent, time, day, duration of visit).

We also ask each journalist to fill out our sign up form – this (currently) works off their Twitter account and also asks for their mobile phone number and email address.

Both mobile phone and email address are ‘verified’ as part of the sign-up process for journalists. A PIN code is sent by SMS to the phone number and an email is sent to the email address.

Both need to be confirmed by the journalist for the journalist to receive their first star.
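As a rough illustration of the PIN step described above, here is a minimal Python sketch. The function names and the 6-digit PIN length are assumptions made for this example; it is not Newslinn's production code, and actually sending the SMS or email is left out.

```python
import hmac
import secrets


def generate_pin(digits=6):
    """Return a random numeric PIN as a zero-padded string (length is an assumption)."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


def pin_matches(expected, submitted):
    """Compare the stored PIN with the submitted one in constant time."""
    return hmac.compare_digest(expected.encode(), submitted.strip().encode())


def earns_first_star(phone_confirmed, email_confirmed):
    """Both the phone and the email must be confirmed before star 1 is given."""
    return phone_confirmed and email_confirmed
```

The PIN is generated with `secrets` rather than `random` so that it is not predictable, and both confirmations are required before the first star is awarded.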

We have future features planned for this sign-up process, including syncing with LinkedIn and uploading photographs of journalism degree certificates, most of which will be 100% optional for the journalist.

When someone uses Twitter to sign up to a website (Newslinn or any other), Twitter as standard shares a lot of data with the third party (how many followers, when the account was created, etc.) and this is used as part of the validation for journalists. When a journalist’s Twitter account was created only recently, Newslinn gets an email alert.
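The “recently created account” check could look something like the sketch below. The 30-day threshold and the function name are assumptions for illustration only; the post does not say what counts as “recent”, and the email alert itself is not shown.

```python
from datetime import datetime, timedelta, timezone

RECENT_ACCOUNT_DAYS = 30  # assumed threshold, not stated in the post


def is_recent_account(created_at, now=None, threshold_days=RECENT_ACCOUNT_DAYS):
    """Return True if the account was created within the last threshold_days."""
    now = now or datetime.now(timezone.utc)
    return now - created_at < timedelta(days=threshold_days)
```

A sign-up whose account creation date passes this check would be the trigger for the alert.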

The end point here is that everything Newslinn can do technically and automatically can and will contribute to the 1 star verification. This is by no means 100% foolproof or scam-proof, but it requires genuine interest from the person to go through the onboarding process, including sharing their mobile phone number, email address and IP address (albeit automatically).

Part of Star 1 is whitelisting the email domain name for verification, i.e. if someone signs up using a work email address from their newsroom (which we have on our whitelist), this counts in their favour, as opposed to using a Yahoo or Gmail email address.
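A domain whitelist check like the one described can be sketched in a few lines of Python. The domain names and category labels below are placeholders invented for this example, not the real Newslinn whitelist.

```python
# Hypothetical lists: the real whitelist is not published in this post.
NEWSROOM_DOMAINS = {"example-newsroom.com"}
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com"}


def classify_email_domain(email):
    """Label an address as 'newsroom', 'free' or 'unknown' by its domain."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    if domain in NEWSROOM_DOMAINS:
        return "newsroom"
    if domain in FREE_MAIL_DOMAINS:
        return "free"
    return "unknown"
```

A 'newsroom' result would count in the journalist's favour, while 'free' or 'unknown' would simply add no extra credit.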

Level 2 / Star 2

The second star gets attributed to journalists who have been manually approved by someone internally at Newslinn. This means that someone here has looked at their application, searched social networks for that person (for LinkedIn or alternative profiles) and checked whether that user has an internet footprint. We also use the FullContact API to see more information. Ultimately this is a review of what was sent in by the journalist. In most cases it’s an easy process, as journalists tend to have a strong social network footprint.

Level 3 / Star 3

This star is currently theoretical as we have not yet built the systems. It is going to be a rating system based on external users and journalists upvoting or verifying a journalist, i.e. people outside of Newslinn who recognise a journalist as ‘a journalist’. This will be approaching the concept of “quality” over “verification”, which is something we want to be sensitive about. When we develop this system we will be working with journalists on our Slack chat to work through potential factors to use.

Level 4 / Star 4 – PTSD

Outside of how ‘verified’ or ‘true’ a person is as a journalist, there is a huge responsibility for Newslinn to protect journalists from PTSD (post-traumatic stress disorder). Regardless of whether or not a journalist is ‘verified’, they may or may not have the support systems or processes for managing PTSD when handling shared photos of war or other stressful events. The Newslinn level 4 star is restricted mostly to newsrooms that have PTSD support in place for their journalists. Although we label this as a ‘star’, it’s more of a restriction than a verification level.

Why do this?

Initially, when Newslinn was being developed, we needed to verify people, regardless of whether that person is a journalist or not. Newslinn is not a social network – we are a vertical network, designed to serve a purpose – hence we don’t want random people with free email accounts joining and creating accounts without purpose. To that effect we needed to verify people signing up. Verifying a mobile phone, email and IP address works very well for this.

Beyond that (i.e. beyond the level 1 star), citizens and organisations want the control to share with journalists they feel they can trust. Sometimes this means ‘newsroom’ journalists, i.e. those employed in a newsroom; sometimes it means journalists with X years of experience; other times it means those they can trust with sensitive material. Our purpose is to facilitate citizens and organisations communicating with groups of journalists that can be classified under a level of trust. By doing this, incredible features can be built on top of this trust system. One example is someone wanting to share something deep and sensitive with only a select few ‘types’ of journalists, and purposefully journalists they don’t know. This is something social networks can’t achieve. On a social network a person would need to build a contact list of journalists, connect with them (or harvest their email addresses) and only then be in a position to share with them. This is ineffective at best. There need to be better ways for citizens and organisations to share with journalists, in particular for citizens where there is a use case of community whistleblowing and the need to share with only ‘newsroom’ journalists. The catch-22 is that the citizen doesn’t know which journalist they can and should share with.
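To make the idea of sharing with a trust group (rather than with named contacts) a little more concrete, here is a minimal Python sketch. The `Journalist` fields and the `select_recipients` function are hypothetical names invented for this example; they only mirror the criteria mentioned above (star level, newsroom employment, years of experience).

```python
from dataclasses import dataclass


@dataclass
class Journalist:
    handle: str
    stars: int                 # verification level, 1 to 4
    newsroom: bool = False     # employed in a newsroom
    years_experience: int = 0


def select_recipients(journalists, min_stars=1, newsroom_only=False, min_years=0):
    """Return the journalists that meet the sender's chosen trust criteria."""
    return [
        j for j in journalists
        if j.stars >= min_stars
        and (j.newsroom or not newsroom_only)
        and j.years_experience >= min_years
    ]
```

The sender picks the criteria, not the individual people, which is the point: they never need to know the journalists in advance.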

This is the deeper reason why Newslinn exists – to solve the unknown unknown. People can spend time searching on Twitter or LinkedIn for journalists, to then connect and follow them, or they can have confidence in sharing with groups of journalists that they decide on. Newslinn does enable 1-to-1 communication, but our key benefit is that a citizen or organisation can share without directly knowing who the journalist is, and instead group journalists and discover new journalists before sharing, and ultimately enable news to be shared with many more journalists, much more easily than before.


Practical JPEG Error Level Analysis

At the heart of the Newslinn platform is image validation. One aspect we are researching is ELA, Error Level Analysis. This is specific to the JPEG image file format and a niche area of fraudulent image detection.

I’m going to attempt to describe ELA and how Newslinn is hoping to use it within our platform. We will need to cover some ground first.

What’s a JPEG?

So a JPEG is a type of file format for saving photographs. There’s a lot of history to it, but in short, it was designed purposefully for storing photographs, it became the standard photo format of the internet, and it is the main file format that digital cameras and smartphones use to store photos.

A JPEG file is a really cool file format – because it’s cool, it does things that other file formats don’t.

As a side note, other image file formats are GIF, PNG, BMP, TIFF.

JPEG Compression

The JPEG file format really took off back when the internet was “low bandwidth”. Everything on the internet needed to be small in file size, and the JPEG file format had a feature of ‘compression’ so that when a photograph was stored as a JPEG file, the file size was much smaller than if it was stored in another image file format, e.g. a Bitmap (BMP) file.

So for this photograph:

[Image: original photograph]

JPEG file size = 377 KB

BMP file size = 5,343 KB

The JPEG file format is able to do that file size reduction because it uses compression.
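To put a number on that reduction, the two file sizes quoted above can be compared directly. This is just arithmetic on the figures in this post; the variable names are ours.

```python
jpeg_kb = 377
bmp_kb = 5343

ratio = bmp_kb / jpeg_kb        # how many times larger the BMP is
saving = 1 - jpeg_kb / bmp_kb   # fraction of the file size saved by JPEG

print(f"The BMP is about {ratio:.1f}x larger than the JPEG")
print(f"The JPEG is about {saving:.0%} smaller than the BMP")
```

In other words, the BMP is roughly 14 times the size of the JPEG, so the JPEG saves about 93% of the file size for this photograph.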

How does JPEG Compression work?

To compress a photo, the JPEG file format splits the photo up into tiny squares of 8×8 pixels. When a JPEG is saved with heavy compression (a low quality setting) you can see the tiny squares.

[Image: screenshot showing the 8×8 squares]

What it does is quite genius.

To lower the file size, the JPEG file format reduces the number of colours in the photograph. So say the original example photograph contains about 45,265 individual unique colours, the JPEG file format reduces those to only 20,000 colours.

How it goes about that is connected with these tiny 8×8 squares. Ultimately, the JPEG file format takes one of the 8×8 squares, figures out how many colours are in it, decides what colour is the average and uses that average colour to replace other colours in the 8×8 square, thus reducing the number of colours. (This is a simplified picture: the real algorithm transforms each square into frequencies and rounds off the fine detail rather than taking a plain average, but the effect is similar.)
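The simplified “average colour per 8×8 square” idea can be sketched in plain Python on a small greyscale image stored as a list of rows. This only illustrates the mental model used in this post. Real JPEG encoders use a discrete cosine transform and quantisation, not a plain average, and the function names here are invented for the example.

```python
def average_blocks(pixels, block=8):
    """Replace every block x block tile of a greyscale image with its mean value."""
    height, width = len(pixels), len(pixels[0])
    result = [row[:] for row in pixels]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            tile = [pixels[y][x] for y in rows for x in cols]
            mean = round(sum(tile) / len(tile))
            for y in rows:
                for x in cols:
                    result[y][x] = mean
    return result


def count_unique(pixels):
    """Count the distinct pixel values in the image."""
    return len({value for row in pixels for value in row})


# A 16x8 test image where every pixel has a different value.
pixels = [[x * 10 + y for x in range(16)] for y in range(8)]
print(count_unique(pixels))                  # 128
print(count_unique(average_blocks(pixels)))  # 2
```

The test image is two 8×8 squares wide, so 128 distinct values collapse to just two, one per square.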

So that’s all well and good. It’s easy enough to understand how a JPEG file format makes a photographic image file size small.

It is this compression method that enables ‘Error Level Analysis’ on the JPEG file format.

What is Error Level Analysis (ELA)?

Error Level Analysis is a way to see what areas of a photograph have been changed.

So if someone took a photo with their smartphone, opened it up in Photoshop and changed something about that photo – Error Level Analysis is a way to try and detect what was changed.

It’s not an exact science yet but it’s useful to bring about suspicion if nothing else.

How does ELA work?

ELA works because of how JPEG compresses photographs into 8×8 tiny squares – and it works because each time a JPEG image is saved, it gets compressed again.

So that is where the magic is.

So when a JPEG photograph is first saved, it compresses the photo for the first time.

If the image is then opened into Photoshop, edited and saved again as a JPEG, it gets compressed again.

What this means is that the “original” parts of the photographic image have been compressed twice – once by the camera that took the photo and again by Photoshop.

Whereas, the “edited” part of the photographic image, was only compressed once, by Photoshop.

To the human eye, you cannot notice the difference by looking at the image. However, you can by comparing two images together (the photo as it is, and a copy of it re-saved as a JPEG) and looking at the differences.

This is the basis of JPEG Error Level Analysis.
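The principle can be shown with a toy model in Python. This is not real JPEG compression: the `save_as_jpeg` function below simply pulls each value halfway towards the nearest multiple of 16, an assumption chosen only to mimic the way each re-save changes already-compressed pixels less and less.

```python
STEP = 16  # toy quantisation step, chosen for illustration


def save_as_jpeg(values):
    """Toy lossy save: move each value halfway towards the nearest multiple of STEP."""
    return [v + (round(v / STEP) * STEP - v) / 2 for v in values]


def error_levels(values):
    """How much each value would change if the image were saved one more time."""
    return [abs(v - s) for v, s in zip(values, save_as_jpeg(values))]


photo = save_as_jpeg([23.0, 87.0, 140.0, 201.0, 55.0, 99.0])  # saved by the camera
photo[2:4] = [135.0, 185.0]                                   # region edited afterwards
photo = save_as_jpeg(photo)                                   # saved again by the editor

levels = error_levels(photo)
print(levels)
```

The two edited values (positions 2 and 3) have been saved once, the rest twice, so the edited values show a higher error level than their neighbours. That gap is what an ELA image makes visible.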

Practical Example

[Image: original JPEG photograph, saved as a JPEG, “compressed once”]

[Image: edited JPEG photograph, “compressed twice”. The editing of the car is crude, but just go with it, imagine it was perfect.]

[Image: ELA on original image]

[Image: ELA on edited image]

Exact Art

While ELA isn’t an exact science, it’s a useful tool to add into the mix for fraud verification. It still requires a trained eye as the resulting ELA images can produce a wide range of variations that might trigger a level of suspicion.

But combining that with other factors for verification can be quite interesting – this is what the Newslinn platform does.

Tips

If the edit moves parts of the image around instead of overlaying a new image, then it is very hard to detect, as the compression levels are all the same.

[Image: edited image, car direction changed]

[Image: ELA of edited photo]

The same can be said if the image is airbrushed and part of it removed.

[Image: edited image, car removed]

[Image: ELA of image with car removed]

  1. ELA can complement existing verification techniques
  2. ELA has problems when it comes to images that have high contrast colours beside each other.
  3. Understanding the output of ELA takes time and experience
  4. ELA is an interactive verification technique, it’s not enough to just look at a single static image, you need to be able to adjust settings and see output in real-time.
  5. You can get around ELA once you know how to.
  6. JPEG files that have not been ‘compressed’ when being saved will not work.
  7. If something gets removed from an image and replaced with another part of the same image, this will be very hard to detect.

Other things similar to ELA

I’ll write up later on Edge Detection, Histogram Analysis and Air Brush Detection, three other ways to investigate an image to see what might have changed.

Some Python Code

Let’s finish this off with some code.

[Image: screenshot of code]
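Here is a minimal ELA sketch in Python, assuming the Pillow imaging library is installed. It is not the Newslinn code base: the function name `error_level_analysis` and the default quality of 90 are choices made for this example. It re-saves the photo as a JPEG in memory, takes the per-pixel difference, and brightens the result so it is visible.

```python
import io

from PIL import Image, ImageChops


def error_level_analysis(image, quality=90):
    """Return an image showing how much each pixel changes when re-saved as JPEG."""
    original = image.convert("RGB")

    # Re-save the photo in memory as a JPEG at a known quality.
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")

    # Per-pixel absolute difference between the photo and its re-saved copy.
    diff = ImageChops.difference(original, resaved)

    # The differences are usually very dark, so stretch them to the full range.
    max_diff = max(high for _, high in diff.getextrema()) or 1
    factor = 255.0 / max_diff
    return diff.point(lambda value: min(255, int(value * factor)))
```

To try it on a file, something like `error_level_analysis(Image.open("photo.jpg")).save("photo_ela.png")` would write the ELA image next to the original (the file names are placeholders). Changing `quality` and re-running is the “interactive” part mentioned in the tips.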

If you want access to our actual code base / shared GitHub, just get in touch: contact at newslinn dot com.

 


Dropbox + Twitter = Newslinn Profile

We have made available a new feature for journalists who are part of our LinkedIn Early Access Group:

Dropbox + Twitter = Newslinn Profile

It enables a journalist to receive photos from anyone directly into their own organised photo library, without having to expose their direct email address, manage FTP details, create a shared Google Doc or Dropbox folder etc.

Users still go through two-factor mobile authentication (if they haven’t already) and photos are validated in real-time. However, we don’t restrict photos as much when they are shared directly with a journalist (our rule engine becomes much more like a spam filter).

The Newslinn Profile allows for better photo classification, photo captioning and tagging of people in the photos. Because of this, there are more steps involved in sharing. Contrast that with the Newslinn ‘city’ email address, where sharing photos is immediate and easily accessible.

Example Newslinn Profile

You can see an example profile here. Newslinn Profiles are synced with your Twitter profile so that there is consistency.

Example:
www.newslinn.com/shanedevane

Short URL Example:
nws.li/shanedevane

Next Steps

There is still some fine-tuning, testing and more features to add onto our Newslinn Profile.

Screenshots

[Image: Newslinn Profile]

[Image: Newslinn Photo Dashboard “Journalist Inbox”]

[Image: photo metadata as part of ‘Photo Page’]

[Image: GPS and mapping data as part of ‘Photo Page’]

Supporting Protests, Activists and Local Citizens

First off – what is Newslinn?

Newslinn is a really easy way for someone to take a photo with their phone and share it with multiple journalists in Dublin using a simple email. There’s more to it than that, but that’s the bones of it (see what Newslinn does, our short history, validation technology).

So how do you ‘support’ protesters exactly?

We’ve made this completely free for anyone that’s part of a protest or that wants to share news photos of events happening in their community. The tools we’re building allow you to see which newsrooms are viewing your photos and why they are using them. There are no smartphone apps to download and no social network profile requirements – so you can share photos without first having to make them public on a social network – this means you can share protest photos directly with journalists to support the protesters even if you’re not directly involved in the cause.

How can I use it?

Using your smartphone, take a photo and email it to dublin@newslinn.com

Include any information you want in the subject line and body of the email (the more information the better).

Behind the scenes your photo and email address will get validated automatically and then be presented in our Photo Stream for journalists to see and use immediately. By sharing photos using Newslinn you are agreeing to let journalists download, use and write about your photo.

Send a Test Photo

You can see how things work by sending a test photo. This photo will appear in the Newslinn Photo Stream ‘test’ area that journalists can access (yes, journalists will see your test photo).

1. Using your smartphone, take a photo
(most people just take a photo from out the window or a pet etc.)

2. Email it to dublin@newslinn.com using the normal email app on your phone

3. Subject line can be ‘test’

4. Body of the email can be ‘test’

After a few minutes you will get an automatic email reply that will guide you through the Newslinn validation system and you’ll be able to see how things are working. Later on, you’ll be able to login and see who viewed your photos and what’s happening with them.

Which journalists are using you?

We have 45+ Dublin-based journalists in our early-access program (see our LinkedIn Early Access Group). They are from a range of newspapers, including The Irish Times and the Irish Independent, as well as many freelance journalists, and the list is steadily growing.

Do you sell my photos? Do you make money from my photos?

We don’t sell photos. Our mission is to build a free and open platform that facilitates trusted communication with journalists using photos – enabling local communities to report and share what’s happening with journalists in their city.

If you do want to sell photos you should read our blog post on how to sell photos.


What We Do – ‘The Condensed Version’

What

We make it easy for protesters, activists, advocates and anyone in the local community to share news photos in real-time with multiple journalists in their city using nothing more than a simple email.

To compare ourselves to industry players: we are trying to be an open version of the likes of CNN iReport, Guardian Witness or Chicago Tribune Community, but as a platform that’s open, built from the ground up on our validation technology and free for trusted journalists.

How

By building a trust-based platform that’s free for journalists and local communities to share newsworthy photos – built on the idea of ‘trust and validation first’ – and making this platform really easy and simple for anyone to use.

Why

Our mission is to enable anyone around the world to communicate in real-time with journalists and to make it easier for a single journalist to manage more crowd-powered news from any topic or industry.

Future

We will be advancing our technology to work in low-bandwidth internet areas while still being easily accessible by email, MMS or web. Our research is focusing on ways to use technology to enable communication across all levels of devices while still providing trust and validation for journalists.

We’re also researching user anonymity and validation technology for sensitive news photos.


Quick History of Newslinn

Newslinn initially started in October 2014 when we began researching the idea – it started with the technology for photo validation – and grew into the vision of making a trusted open crowd-reporting platform for local communities and journalists.

Our research led us to applying machine learning concepts to UGC (User Generated Content) photo validation, and this led us to our first patent.

After surveying and talking to over 130 journalists in Ireland we set about creating our initial prototype based on their thoughts, feedback and patience.

We tested a live prototype in early January 2015 – under the name of ‘On The Spot Photos’.

Since then we have grown from a research project into a ‘social’ research project and we’re beginning the next part of the research in partnership with one of Enterprise Ireland’s registered knowledge providers (news about this later).

Over the coming months we’ll be working to further develop our technology with an aim to launch a startup in early 2016.

Timeline

  • Started customer development in October 2014
  • Started initial technical research in December 2014
  • Attended DublinBeta in December 2014
  • Launched first prototype in January 2015
  • Got early stage users in February 2015
  • Began work on version 0.1 in March 2015
  • Filed first patent in June 2015
  • Incorporated in June 2015
  • Launched early version 0.1 in August 2015
  • Started early beta trials in September 2015

Sell your Videos to Newspapers

…without giving away any commission!

At the heart of Newslinn’s mission is building the technology to facilitate trusted news communication using photos. Because of that, we don’t stand in the way of people wanting to sell video.

This is a bit counter-intuitive, especially if you compare Newslinn to existing video footage ‘marketplaces’, whereby you submit your video and they try to sell it to the media, giving you a percentage of what they take in (some of them say it’s a 50:50 split, but most cover their ‘costs of business’ before splitting commission, so it’s more like 70:30).

Our business model is very different.

What this means – is that as long as you are willing to share screenshots of your video – ie. share photos – for use by journalists and bloggers then what you decide to do with the original video is up to you.

What does this really mean?

Providing that you are happy to upload screenshots of your video for use by journalists – you can stipulate that you are also selling the original video within the description of the material you are uploading. This means that you deal with the media agencies on your own terms.

Alternatively you can upload your video to YouTube, and you share video screenshots with us – but you stipulate that you want ‘link attribution’ – which means if a journalist or blogger wishes to use your photos, they are required to link to the original video.

This works best of course when you use a short URL service that links to your YouTube hosted video.

Newslinn Social Project

To answer a common question: Newslinn, as of September 2015, is a social project. What this means is that we’ve gone past our ‘research phase’ and we’ve actually built something! A lot of people played a role in making Newslinn what it is now – this was based on talking to many journalists, surveying writers and photographers, and face-to-face meetings with a handful of editors.

So it means we’re onto the next phase of our research – which is to do with applying and learning what our technology can do and what value it brings to those using it – the next 3 months will be very important for us. It will define where we go and how we are going to get there – and hopefully identify the community of people that will join us and benefit from Newslinn.

Our ultimate aim is to become a start-up – we’re not there yet as there is more experimenting to be done and more learning to do.


Our UGC Technology

A lot of people have asked, and challenged us, on what we mean when we say ‘proprietary UGC technology’. What do we really mean, and have we really developed something of worth or are we just a glorified online photo album?

It’s hard to answer this in the correct frame, as everyone I know (data scientists, developers, investors and journalists) has asked it and they each have their own depth of technical ability.

Regardless, I’ll start at the top and work down.

Artificial Intelligence System

At the heart of our system is something called a ‘rule engine’ or ‘expert system’ (wikipedia here). Rule engines tend to exist in systems where many factors come into play in making a decision. Typical examples are medical diagnostics and financial lending, i.e. areas where numerous pieces of data are combined to form a final decision.

What they are exactly is a clever way to store and manage many different predefined decisions in a single place.

As a basic high level example of a decision or ‘rule’ to use the correct term – Newslinn has a rule whereby if the image is too small – it is not accepted into our photo stream. So that’s a very simple rule and very easy to program.

But rules can get far more complex and involve any number of data points. As another example (without going too extreme), below is an actual rule we currently use. It validates the dimensions and the ‘dpi’ of a photo, and also whether the user is known to us.

[Image: example rule in PyKE]

This is an example of a single rule. Currently we have over 23 rules (as of September 2015) and growing; our aim is to have about 70-100 rules. Each rule caters for a particular use case and, for us, declines or accepts photos. The rules can be as creative and inventive as needed and use a mixture of data points from user internet devices, user session data, image data, photo metadata, photo object data et al.

What the rule engine allows us to do is create any number of decisions, tie them together, and manage them in a really efficient way.
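To give a feel for how rules like these can be stored and evaluated in one place, here is a minimal sketch in plain Python. It is not PyKE syntax and not the rule shown in the screenshot: the `Photo` fields, the rule names, the thresholds (640×480, 72 dpi) and the way the checks are combined are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Photo:
    width: int
    height: int
    dpi: int
    sender_known: bool


@dataclass
class Rule:
    name: str
    passes: Callable[[Photo], bool]


# Hypothetical rules and thresholds, for illustration only.
RULES = [
    Rule("minimum_dimensions", lambda p: p.width >= 640 and p.height >= 480),
    Rule("dpi_or_known_sender", lambda p: p.dpi >= 72 or p.sender_known),
]


def evaluate(photo, rules=RULES):
    """Run every rule and return (accepted, names of the rules that failed)."""
    failed = [rule.name for rule in rules if not rule.passes(photo)]
    return (not failed, failed)
```

Adding a new decision means adding one more `Rule` to the list, which is the appeal of keeping all the rules in a single place. A call like `evaluate(Photo(100, 100, 72, True))` returns `(False, ["minimum_dimensions"])`, so the photo is declined and the failing rule is named.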

There are two major challenges with rule engines.

1) gathering all the little data points you need – we address this using our own proprietary image data extraction system which ranges from basic metadata collection to ELA and face detection.

2) identifying the rules you need ahead of time – this is the magic of the rule engine. It needs ‘experts’ to understand the domain and the application of the rule engine in the system. This is what we actively do each time we talk to journalists or analyse what photos people are sharing – and it’s the focus of our ‘Dublin Social Project’.

Machine Learning System

The rule engine aids in fraud measures and validation of photos. Sitting outside of the rule engine is our machine learning application. It is part of the validation workflow but can be thought of as separate from the rule engine.

So while the rule engine manages fraud, the machine learning system manages ‘classification’ of whether or not a photo is ‘more like’ a ‘good’ photo or if a photo is ‘more like’ a ‘bad’ photo.

To do this we’re using something called ‘supervised classification’ – which is a top level term for a method of grouping things together based on comparing it with other things that you already know about (wikipedia here).

This is part of our further research into the application of machine learning into news validation and UGC – so I’ll go into it further in another post.

We are currently experimenting with scikit-learn’s SGDClassifier and LinearSVC.
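As a small illustration of supervised classification, here is a nearest-centroid classifier in plain Python. This is a deliberately simple stand-in and not the scikit-learn classifiers mentioned above; the two features per photo (a sharpness score and a megapixel count) and the sample values are invented for the example.

```python
def train_centroids(samples, labels):
    """Average the feature vectors of each label to get one centroid per class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=distance)


# Hypothetical training data: (sharpness score, megapixels) for already-labelled photos.
samples = [(0.9, 8.0), (0.8, 12.0), (0.2, 0.3), (0.1, 0.5)]
labels = ["good", "good", "bad", "bad"]
centroids = train_centroids(samples, labels)
```

A new photo is then labelled by whichever group of known photos it is ‘more like’, e.g. `classify(centroids, (0.7, 9.0))` gives `"good"`.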

Exact Technologies we use

Python, PyKE, scikit-learn, Django, PostgreSQL, PHP, MySQL, Celery, Redis, MongoDB, Node.js, Socket.IO, JavaScript, jQuery, HTML, CSS, Less, Vagrant, Jenkins, S3, EC2, Ubuntu, Bash, Supervisor.


Online Beta Testers

We are looking for public beta testers to take part in our first public beta tests for Newslinn. This is open to participants over the next two months (until January 2016).

This will require sharing a photo and logging into the site.

You can also see our progress on LaunchSky and our CEO is doing an AMA on Reddit for those interested in posting a question.

Feedback Requirements

Feedback can be given on any of the topics below, whichever is your strongest area: UX, overall concept, web design, a feature on the site, or the business model. In particular we are looking for ‘honest first impression’ feedback.

Feedback will be on our LinkedIn Beta testing group.

Apply for Public Beta Test

Public beta testing is open to anyone. If you want to join, you will first need to join our LinkedIn Beta testing group; those accepted will be able to participate in testing.

Irish Postcard – Token of Appreciation

As a small token of appreciation for public beta testers we will send a postcard from Ireland to anywhere in the world with a custom message.

Unfortunately we can’t offer a selection of postcards to pick from but will do our best if you are looking for something in particular.