This article was contributed by Rizky Satrio, R&D Product Manager at ITSEC Asia.

Introduction

According to Wikipedia, a digital signature (DS) is a mathematical scheme for verifying the authenticity of digital messages or documents. It is commonly used as a non-repudiation tool.

The type of digital signature used in a PDF varies, from a plain message digest to richer formats such as CMS Advanced Electronic Signatures (CAdES). This article explains how to validate the digital signature inside a PDF file, scoped by the following two problem statements:

  • Digital signature validation on PDF
  • Type of signature: CAdES Detached, PKCS7 Detached or ETSI.RFC3161

Digital signature in PDF

If you open a PDF file in a text editor, you will find structures with tags in them, similar to an XML file. The digital signature is placed inside a dictionary of type Sig (the signature dictionary). Search the PDF file for /Type /Sig to find it. Below is an example:

Figure 1. Example of Digital Signature Inside PDF

There you will also see Filter and SubFilter tags. For further explanation of the values of these tags, refer to section 12.8.1 of the PDF 32000-1:2008 specification. Further down in the Sig dictionary you will find a Contents tag. This tag holds the digital signature as a hexadecimal string, that is, the hex-encoded bytes of the signature object (not Base64). The example below shows a CAdES signature inside the Contents tag. Note also the ByteRange tag, which we will use in the digital signature validation process.

Figure 2. Example of digital signature contents tag
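
As a rough illustration of the two tags discussed above, the sketch below pulls /ByteRange and /Contents out of a signature dictionary given as text. The class and method names are hypothetical, and the regular expressions assume a simple, uncompressed dictionary. A real validator should rely on a PDF parser such as PDFBox rather than on pattern matching.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads /ByteRange and /Contents from signature dictionary text. */
final class SigDictionarySketch {

    private static final Pattern BYTE_RANGE = Pattern.compile(
            "/ByteRange\\s*\\[\\s*(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s*\\]");
    private static final Pattern CONTENTS = Pattern.compile(
            "/Contents\\s*<([0-9A-Fa-f\\s]*)>");

    /** Returns the four ByteRange integers: offset1, length1, offset2, length2. */
    static int[] parseByteRange(String dict) {
        Matcher m = BYTE_RANGE.matcher(dict);
        if (!m.find()) {
            throw new IllegalArgumentException("No /ByteRange entry found");
        }
        int[] range = new int[4];
        for (int i = 0; i < 4; i++) {
            range[i] = Integer.parseInt(m.group(i + 1));
        }
        return range;
    }

    /** Decodes the hexadecimal string stored in /Contents into raw bytes. */
    static byte[] parseContents(String dict) {
        Matcher m = CONTENTS.matcher(dict);
        if (!m.find()) {
            throw new IllegalArgumentException("No /Contents entry found");
        }
        // White space inside a PDF hex string is ignored.
        String hex = m.group(1).replaceAll("\\s", "");
        if (hex.length() % 2 != 0) {
            hex = hex + "0"; // PDF rule: a missing final hex digit is read as 0
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

Because the space for the signature is usually reserved before signing, the decoded bytes may end with zero padding after the actual signature object.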

Digital signature validation

We will focus on PAdES-B signature on PDF (/SubFilter type ETSI.CAdES.detached). It basically uses a CAdES-detached signature placed in the Contents tag. The validation steps are shown in the figure below.

Figure 3. PDF Signature Validation Process

Here are the steps:

  1. Parse the content of the /Contents tag as CMS (Cryptographic Message Syntax)
  2. Verify that the signature in the CMS is valid (check it against the signed attributes in the CMS, using the certificate that matches the signerInfo).
  3. If valid, get the message digest from the signed attributes inside the CMS. Also get the digest algorithm used (SHA1, SHA256, etc.)
  4. Calculate the actual message digest of the PDF (using the ByteRange and the digest algorithm from step 3)
  5. Compare the digests from steps 3 and 4. If they are equal, the digital signature value is valid
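
To make the two-layer nature of steps 2 to 5 more concrete, here is a simplified sketch that uses only the JDK. It is not CMS: it assumes the "signed attributes" consist of nothing but the SHA-256 digest of the content, and all class and method names are hypothetical. The signature covers the attributes, and the attributes in turn carry the digest of the content.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PublicKey;
import java.security.Signature;

/** Hypothetical, simplified model of steps 2 to 5 (no CMS or PDF parsing). */
final class TwoLayerCheckSketch {

    static boolean validate(byte[] signedAttrs, byte[] signature,
                            PublicKey signerKey, byte[] signedContent) {
        try {
            // Step 2: the signature must cover the signed attributes.
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(signerKey);
            verifier.update(signedAttrs);
            if (!verifier.verify(signature)) {
                return false;
            }
            // Step 3: read the claimed digest (assumption: the attributes ARE the digest).
            byte[] claimedDigest = signedAttrs;
            // Step 4: recompute the digest over the signed content.
            byte[] actualDigest = sha256(signedContent);
            // Step 5: compare both digests.
            return MessageDigest.isEqual(claimedDigest, actualDigest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] sign(KeyPair keyPair, byte[] signedAttrs) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(keyPair.getPrivate());
            signer.update(signedAttrs);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keyPair = newRsaKeyPair();
        byte[] content = "bytes covered by ByteRange".getBytes(StandardCharsets.UTF_8);
        byte[] attrs = sha256(content);
        byte[] sig = sign(keyPair, attrs);
        System.out.println(validate(attrs, sig, keyPair.getPublic(), content)); // true
        byte[] tampered = "modified bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(validate(attrs, sig, keyPair.getPublic(), tampered)); // false
    }
}
```

In a real CAdES signature the signed attributes are a DER-encoded set that contains the message digest among other attributes, which is why the article parses them with Bouncy Castle.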

Note that the steps above do not deal with digital certificate validation, CRL checking or OCSP checking. These will be covered in another article. So let's break down the above steps into Java code. The actual code can be found on my GitHub. We will be using two external libraries: PDFBox and Bouncy Castle.

1-Parsing CMS content inside the /Contents tag

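First, open the PDF and iterate over its signature dictionaries:

//Keep the raw bytes, they are needed later to read the ByteRange
ByteArrayInputStream pdfBytes=new ByteArrayInputStream(
    Files.readAllBytes(Paths.get(pdfFile.getAbsolutePath())));

//PDDocument.load(File) is the PDFBox 2.x API
pdfDoc=PDDocument.load(pdfFile);

pdfDoc.getSignatureDictionaries().forEach(signature-> {
    try {
        //Get PKCS#7 Data
        CMSSignedData signedData=new CMSSignedData(signature.getContents());
        //Steps 2 to 5 go here
    } catch (Exception e) {
        logApp.error("Cannot process signature", e);
    }
});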

OpenPdf.java

Then get the /Contents tag inside the signature:

//Get PKCS#7 Data
CMSSignedData signedData=new CMSSignedData(signature.getContents());

2-Verify CMS Signature is valid

First, get the signerInfo from the CMS:

//Get SignerInfo
SignerInformation signerInfo=signedData.getSignerInfos().iterator().next();

Then get the public key from the signer's certificate inside the CMS:

//Getting PublicKey
Collection<X509CertificateHolder> matches = signedData.getCertificates().getMatches(signerInfo.getSID());
byte[] pubByte=matches.iterator().next().getSubjectPublicKeyInfo().getEncoded();

//Rebuild the key from its encoded form (this example only handles RSA keys)
X509EncodedKeySpec keySpec=new X509EncodedKeySpec(pubByte);
KeyFactory kf = KeyFactory.getInstance("RSA");
PublicKey pubKey=kf.generatePublic(keySpec);

GetCMSPublicKey.java

Then determine the signature algorithm:

//1.2.840.113549.1.1.1 is the OID of rsaEncryption
if(signerInfo.getEncryptionAlgOID().trim().equals("1.2.840.113549.1.1.1")) {
    encAlgo="RSA";
}

//digest is the MessageDigest created from signerInfo.getDigestAlgOID() in step 3
if(encAlgo!=null) {
    if(digest.getAlgorithm().equals("1.3.14.3.2.26")) {
        encAlgo="SHA1withRSA";
    }
    else if(digest.getAlgorithm().equals("2.16.840.1.101.3.4.2.1")) {
        encAlgo="SHA256withRSA";
    }
    else if(digest.getAlgorithm().equals("2.16.840.1.101.3.4.2.2")) {
        encAlgo="SHA384withRSA";
    }
    else if(digest.getAlgorithm().equals("2.16.840.1.101.3.4.2.3")) {
        encAlgo="SHA512withRSA";
    }
}

GetSignatureAlgorithm.java
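
The same OID-to-name mapping can also be written as a lookup table, which may be easier to extend than a chain of if/else branches. The sketch below is one possible shape, using only the JDK. The class and method names are hypothetical, and it covers just the RSA and SHA OIDs that appear in the snippet above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup from CMS algorithm OIDs to a JCA signature algorithm name. */
final class SignatureAlgorithmNames {

    static final String RSA_ENCRYPTION_OID = "1.2.840.113549.1.1.1";

    private static final Map<String, String> DIGEST_NAMES = new HashMap<>();
    static {
        DIGEST_NAMES.put("1.3.14.3.2.26", "SHA1");
        DIGEST_NAMES.put("2.16.840.1.101.3.4.2.1", "SHA256");
        DIGEST_NAMES.put("2.16.840.1.101.3.4.2.2", "SHA384");
        DIGEST_NAMES.put("2.16.840.1.101.3.4.2.3", "SHA512");
    }

    /** Returns e.g. "SHA256withRSA", or empty when either OID is not covered here. */
    static Optional<String> jcaName(String encryptionOid, String digestOid) {
        if (encryptionOid == null || !RSA_ENCRYPTION_OID.equals(encryptionOid.trim())) {
            return Optional.empty();
        }
        String digestName = DIGEST_NAMES.get(digestOid);
        return digestName == null
                ? Optional.empty()
                : Optional.of(digestName + "withRSA");
    }
}
```

Returning an empty Optional for unknown OIDs forces the caller to decide explicitly what to do with a signature whose algorithm is not supported, such as a non-RSA signature.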

We then check the validity of the signature within the CMS:

Signature rsaSign=Signature.getInstance(encAlgo);       
rsaSign.initVerify(pubKey);
rsaSign.update(signerInfo.getEncodedSignedAttributes());
boolean cmsSignatureValid=rsaSign.verify(signerInfo.getSignature());

3-Get the message digest algorithm and message digest data inside CMS

MessageDigest digest=MessageDigest.getInstance(signerInfo.getDigestAlgOID());

//Get Attribute
Attribute attribute1 =signerInfo.getSignedAttributes().get(PKCSObjectIdentifiers.pkcs_9_at_messageDigest);
Attribute attribute2=null;

if(signerInfo.getUnsignedAttributes()!=null) {
    attribute2 =signerInfo.getUnsignedAttributes().get(PKCSObjectIdentifiers.id_aa_signatureTimeStampToken);
}

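//toString() yields a one-character prefix followed by hex digits; drop the prefix, decode the hex, re-encode as Base64
messageDigest=Base64.getEncoder().encodeToString(
    Hex.decode(attribute1.getAttributeValues()[0].toString().substring(1)));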

MDAlgorithm.java
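
The last statement above packs three conversions into one line. The sketch below reproduces them with the JDK only, under the assumption that the text form of the attribute value is a `#` followed by hex digits (which is what the `substring(1)` call suggests). The class and method names are hypothetical.

```java
import java.util.Base64;

/** Hypothetical helper: turns text such as "#0a1b" into the Base64 form of the same bytes. */
final class DigestTextSketch {

    static String hashHexToBase64(String hashPrefixedHex) {
        if (hashPrefixedHex == null || !hashPrefixedHex.startsWith("#")) {
            throw new IllegalArgumentException("Expected '#' followed by hex digits");
        }
        // 1. Drop the leading '#'.
        String hex = hashPrefixedHex.substring(1);
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex digits");
        }
        // 2. Decode each pair of hex digits into one byte.
        byte[] raw = new byte[hex.length() / 2];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        // 3. Re-encode the raw digest bytes as Base64 so they can be compared as strings.
        return Base64.getEncoder().encodeToString(raw);
    }
}
```

Base64 is used here only as a convenient text form for the later comparison. Comparing the raw byte arrays directly would work just as well.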

4-Calculate the Message Digest in PDF

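First, we extract the bytes covered by the ByteRange, that is, the parts of the file before and after the /Contents value.

byte[] contentToSigned=getByteRangeData(pdfBytes, signature.getByteRange());

//ByteRange holds two (offset, length) pairs: {offset1, length1, offset2, length2}
private byte[] getByteRangeData(ByteArrayInputStream bis,int[] byteRange) {
    int length1=byteRange[1]+byteRange[3];
    byte[] contentSigned=new byte[length1];
    //Read the first range
    bis.skip(byteRange[0]);
    bis.read(contentSigned, 0, byteRange[1]);
    //Jump over the gap that holds the /Contents value, then read the second range
    bis.skip(byteRange[2]-byteRange[1]-byteRange[0]);
    bis.read(contentSigned, byteRange[1], byteRange[3]);
    //Rewind so the stream can be reused for the next signature
    bis.reset();
    return contentSigned;
}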

CalculateMDPdf.java
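
For comparison, here is a variant of the same extraction that works on a plain byte array and validates the ByteRange values before copying. It is only a sketch using the JDK, with hypothetical class and method names, and it assumes the whole file is already in memory.

```java
/** Hypothetical helper: copies the two ByteRange segments out of the PDF bytes. */
final class ByteRangeSketch {

    static byte[] signedBytes(byte[] pdf, int[] byteRange) {
        if (pdf == null || byteRange == null || byteRange.length != 4) {
            throw new IllegalArgumentException("Expected PDF bytes and four ByteRange values");
        }
        int offset1 = byteRange[0];
        int length1 = byteRange[1];
        int offset2 = byteRange[2];
        int length2 = byteRange[3];

        // Use long arithmetic so that hostile values cannot overflow the checks.
        long endOfFirst = (long) offset1 + length1;
        long endOfSecond = (long) offset2 + length2;
        if (offset1 < 0 || length1 < 0 || length2 < 0
                || offset2 < endOfFirst || endOfSecond > pdf.length) {
            throw new IllegalArgumentException("ByteRange does not fit the file");
        }

        // Concatenate the segment before the /Contents value and the segment after it.
        byte[] out = new byte[length1 + length2];
        System.arraycopy(pdf, offset1, out, 0, length1);
        System.arraycopy(pdf, offset2, out, length1, length2);
        return out;
    }
}
```

For a file whose bytes read `HEAD<abcd>TAIL`, a ByteRange of `[0 4 10 4]` yields `HEADTAIL`: everything except the six bytes of the bracketed value in the middle.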

We then calculate the Message Digest on the PDF.

//Calculate MD in PDF
String mdPdf=Base64.getEncoder().encodeToString(digest.digest(contentToSigned));

5-Compare the message digest from CMS and the calculation in PDF

If the two digests are equal, the signature is valid. If they differ, the signature is not valid.

if(mdPdf.equals(messageDigest)) {
    logApp.info("Message Digest Signature ID {} is valid, data integrity is OK",signatureSID);
}
else    {
    logApp.info("Message Digest Signature ID {} is invalid, data integrity is NOT OK",signatureSID);
}

Conclusion

This post walked through a very simple example of digital signature validation within a PDF file. We hope that it is enough to provide a starting point for understanding how validation works, and also how digital signatures in PDF work. If you have any questions or suggestions, please leave your comments below.
