Frequently Asked Questions
How do I get started on the platform?
Automating data extraction on the platform involves the following steps:
Creating a project
Naming the types of tables you want to extract (for example, Balance Sheet, Income Statement, etc.)
Uploading documents to label
Assigning human labellers to the project
Training the system using the platform
Each of these steps is explained below.
How do I create a project?
When you log in to the platform, you will land on the ‘Projects’ page. The grid on this page lists the projects that have already been created.
To create a new labelling project, click on the blue + icon.
You will now be prompted to enter the following project details:
Project Name – Enter the name of the project
Input – Select how you will make documents available for labelling. The current options are ‘Direct Upload’ and ‘AWS’. The ‘AWS’ option lets you route documents stored in AWS to the project for labelling.
Once you have filled in the fields, click on the ‘Save’ button. The new project will now be created.
You can now upload the documents to be processed using the ‘Upload Files’ button. To upload documents from AWS instead, enter your AWS URL, user name and password.
When the upload is complete, define the tables that you want to extract from the uploaded documents for your use case. Give every table you extract a name that indicates what the table represents. To define a table, click on ‘Add Processor’.
A window opens with the following fields:
Extract – Enter the name of the table you want to extract
Instructions – Enter instructions to labellers on finding and extracting the table, if any.
Color – Assign each table a color so that users can locate tables quickly. Click on the checkerboard icon to the left in the dropdown and choose a color.
Repeat the process for every table that you want to extract for your use case.
Once you have added all the relevant tables, save the project. The project is now ready to be executed.
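If it helps to plan a project before entering it on the platform, the fields described above can be written down as a small data structure. The Python below is only an illustrative sketch: the class names, the validation rules (including the assumption that table names should be unique within a project) and the sample values are hypothetical and are not the platform's API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of the form fields described above.
# This is NOT the platform's real data model or API.
INPUT_OPTIONS = ("Direct Upload", "AWS")


@dataclass
class Processor:
    extract: str             # 'Extract': name of the table to extract
    instructions: str = ""   # 'Instructions': optional guidance for labellers
    color: str = "#000000"   # 'Color': used to locate the table quickly


@dataclass
class Project:
    name: str                       # 'Project Name'
    input: str = "Direct Upload"    # 'Input'
    processors: List[Processor] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("Project Name must not be empty")
        if self.input not in INPUT_OPTIONS:
            raise ValueError(f"Input must be one of {INPUT_OPTIONS}")

    def add_processor(self, processor: Processor) -> None:
        # Assumption: table names are unique within a project.
        if any(p.extract == processor.extract for p in self.processors):
            raise ValueError(f"Duplicate table name: {processor.extract}")
        self.processors.append(processor)


# Sample values for illustration only.
project = Project(name="Annual Reports", input="Direct Upload")
project.add_processor(Processor("Balance Sheet", color="#1f77b4"))
project.add_processor(Processor("Income Statement", color="#ff7f0e"))
print([p.extract for p in project.processors])
```

Listing the tables this way before you start makes it easier to check that every table you need has a name, instructions and a color before you click ‘Add Processor’ for each one.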
How do I start extracting data?
To extract data, you first need to train the platform. You contribute to the training by creating labelled (annotated) training sets on the platform, as follows.
On the ‘Projects’ page, click on the project you want to train data extraction models for.
The grid will now list the documents that are available for training.
Click on a document that you want to use to train. The document will now open in a PDF viewer.
Scroll to the table you want to label. Now, select the name of the table in the ‘Extractions:’ dropdown, for example ‘Financial Highlights’.
This tells the platform which table you are about to label (in this example, the ‘Financial Highlights’ table).
Now, click on the ‘Shape tool’ icon on the far right of the screen.
Draw a box around the relevant table.
This labels the table inside the box with the selected name (‘Financial Highlights’ in this example).
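Conceptually, a label ties together a table name, a page and the box you drew. The Python sketch below shows one way to represent that; the `TableLabel` class, its fields and the coordinate convention are assumptions made for illustration and do not describe how the platform stores labels.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TableLabel:
    # Hypothetical record of one labelled table (not the platform's format).
    extraction: str   # name chosen in the 'Extractions:' dropdown
    page: int         # page of the PDF on which the box was drawn
    x0: float         # left edge
    y0: float         # lower coordinate on the vertical axis
    x1: float         # right edge
    y1: float         # higher coordinate on the vertical axis

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def make_label(extraction: str, page: int,
               corner_a: Tuple[float, float],
               corner_b: Tuple[float, float]) -> TableLabel:
    """Build a label from the two corners of a dragged box.

    The corners are normalised so that the box is the same whichever
    direction it was drawn in.
    """
    (ax, ay), (bx, by) = corner_a, corner_b
    return TableLabel(extraction, page,
                      min(ax, bx), min(ay, by),
                      max(ax, bx), max(ay, by))


# Sample values for illustration only.
label = make_label("Financial Highlights", 3, (400.0, 520.0), (80.0, 300.0))
print(label)
```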
How do I see the parsed data from the selected table?
Right-click on the selected table border and click on ‘Edit’.
You will be taken to a page that displays the table contents parsed into a grid.
I see the data parsed from my table, but the data does not look clean. What should I do?
You can clean the parsed data using the controls on the form. The most common clean-up functions are explained here.
To remove unwanted rows, select the checkbox on each row you want to delete and click on the ‘Delete’ button.
To rename a column so that it matches the column name in the table, right-click on the column and go to ‘Change Header’.
If a column from the table was missed in the extraction, right-click on a column and click ‘Add Column’ to add the missing column.
If the grid has an unwanted column, right-click on that column and select ‘Delete Column’ to remove it.
To change an extracted item or value, click on the cell and make your updates.
To add a missing row, click on the ‘Clone Record’ button. A new row will be added.
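To make the effect of each clean-up function concrete, the Python sketch below applies the same operations to a small grid held in memory. The `Grid` class and the sample values are hypothetical, and the behaviour of `clone_record` (copying a row directly beneath itself) is an assumption about what ‘Clone Record’ does; none of this is the platform's code.

```python
from typing import Iterable, List


class Grid:
    """Hypothetical in-memory stand-in for the parsed-table grid."""

    def __init__(self, headers: List[str], rows: List[List[str]]) -> None:
        self.headers = list(headers)
        self.rows = [list(r) for r in rows]

    def delete_rows(self, indices: Iterable[int]) -> None:
        # 'Delete': drop every row whose checkbox was selected.
        drop = set(indices)
        self.rows = [r for i, r in enumerate(self.rows) if i not in drop]

    def change_header(self, col: int, name: str) -> None:
        # 'Change Header': rename one column.
        self.headers[col] = name

    def add_column(self, col: int, name: str, fill: str = "") -> None:
        # 'Add Column': insert an empty column at position col.
        self.headers.insert(col, name)
        for row in self.rows:
            row.insert(col, fill)

    def delete_column(self, col: int) -> None:
        # 'Delete Column': remove the column from headers and all rows.
        del self.headers[col]
        for row in self.rows:
            del row[col]

    def set_cell(self, row: int, col: int, value: str) -> None:
        # Editing a cell by clicking on it.
        self.rows[row][col] = value

    def clone_record(self, row: int) -> None:
        # Assumption: 'Clone Record' copies a row directly beneath itself.
        self.rows.insert(row + 1, list(self.rows[row]))


# Sample data for illustration only.
grid = Grid(["Col1", "Col2", "Col3"],
            [["Page 12", "", ""],
             ["Revenue", "1,200", "x"],
             ["Net income", "300", "x"]])
grid.delete_rows([0])            # remove the stray page-header row
grid.change_header(0, "Item")
grid.change_header(1, "2023")
grid.delete_column(2)            # remove the unwanted column
grid.add_column(2, "2022")       # add the column that was missed
grid.set_cell(0, 2, "1,050")
grid.clone_record(0)             # new row to hold the missing record
grid.set_cell(1, 0, "Cost of sales")
grid.set_cell(1, 1, "700")
grid.set_cell(1, 2, "640")
print(grid.headers)
print(grid.rows)
```

Deleting stray rows first, as in this sketch, keeps the row positions stable for the later edits.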
I have cleaned the table. What do I do now?
Click on ‘Save’ to save your work.
Your table is now ready, and you can close the window. When you have labelled all the tables you want in a document, click on ‘Publish’ at the top left of the screen.
All labelled tables will now be published and included in the output generated for this document.
Can I download the data from a document as an Excel file to preview it?
Yes. Click on the ‘Download’ button.
Data from the document will now be downloaded in an Excel file.
I have a question that this FAQ does not answer. How can I get help?
Please contact firstname.lastname@example.org with your questions for support.