Reading And Writing Multiple Files In Spring Batch Using MultiResourceItemReader & ItemReader

Hello everyone,

Greetings!

Today I'm going to show you how to use MultiResourceItemReader to read multiple files in Spring Batch.

Requirements

The Spring Batch metadata tables must already exist. If they do not, see the script to add Spring Batch metadata tables.
You should know how to process a single file in Spring Batch. If you need a reference, visit Spring Batch Example - CSV To Database with Spring Boot & Oracle.

Let's Get Started

We will write code that reads several CSV files in the format below, calculates a percentage and a pass or fail result for each student, and finally saves each record to the DB.

Roll-No,Maths-Marks,English-Marks,Science-Marks,Email-Address

1,44,55,77,test@yopmail.com

Create a POJO with the fields above.

package com.student.report.model;

import lombok.*;

import javax.persistence.*;
import java.math.BigDecimal;

@Setter
@Getter
@ToString
@AllArgsConstructor
@NoArgsConstructor
@Entity
@Table(name = "STUDENT_MARKS")
public class StudentReportCard {

    @Id
    @Column(name = "ROLL_NO")
    private long rollNo;

    @Column(name ="EMAIL_ADDRESS")
    private String emailAddress;

    @Column(name = "MATHS_MARKS")
    private BigDecimal mathsMarks;

    @Column(name = "SCIENCE_MARKS")
    private BigDecimal scienceMarks;

    @Column(name = "ENGLISH_MARKS")
    private BigDecimal englishMarks;

    @Column(name = "PERCENTAGE")
    private BigDecimal percentage;

    @Column(name = "RESULT")
    private String result;
}

Next, let's add DB configuration to application.properties to use Oracle DB.

spring.datasource.url=jdbc:oracle:thin:@localhost:1521:orcl
spring.datasource.username=username
spring.datasource.password=password
spring.datasource.driver-class-name=oracle.jdbc.driver.OracleDriver
spring.jpa.hibernate.ddl-auto=create

Since we are using Apache Commons CSV to read the CSV files, we need to add the following dependency:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.8</version>
</dependency>

Now we need to configure a MultiResourceItemReader that takes all the files from a specific folder and registers them as its resources. It then delegates to a class that implements ResourceAwareItemReaderItemStream, where you can add the logic to read each file and move it to a different directory.

public MultiResourceItemReader<StudentReportCard> configureFileReader() {

	MultiResourceItemReader<StudentReportCard> itemReader
		= new MultiResourceItemReader<>();
	List<FileSystemResource> fileSystemResources
		= new ArrayList<>();
	// Files.list keeps the directory open, so the stream is closed
	// automatically by try-with-resources.
	try (Stream<Path> stream = Files.list(
			Paths.get("F://CodeSpace//Students//"))) {
		stream.forEach(x ->
			fileSystemResources.add(new FileSystemResource(x.toFile())));

		Resource[] resources
			= fileSystemResources.toArray(new Resource[0]);

		itemReader.setResources(resources);
		itemReader.setDelegate(csvReader());
		// Non-strict mode: an empty folder does not fail the step.
		itemReader.setStrict(Boolean.FALSE);
	} catch (IOException e) {
		e.printStackTrace();
	}
	return itemReader;
}

As you can see, I first fetch all the files from F://CodeSpace//Students//, then set those files as the reader's resources, and then set CSVReader as the delegate.

Spring Batch then hands the files to CSVReader one at a time by calling setResource before each file is opened.
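The folder-listing step can be tried on its own, without Spring. The sketch below is only an illustration: the class name InputFileLister and the method listFiles are hypothetical and not part of the project. It assumes you only want regular files (sub-folders are skipped) and it sorts them so the order is predictable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the project above.
final class InputFileLister {

    // Returns the regular files directly inside dir, sorted by path.
    static List<Path> listFiles(Path dir) throws IOException {
        // Files.list keeps a directory handle open until the stream is
        // closed, so it is wrapped in try-with-resources.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile) // skip sub-folders
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Lists the folder given as the first argument, or the current one.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        listFiles(dir).forEach(System.out::println);
    }
}
```

Each returned Path could then be wrapped in a FileSystemResource exactly as the configuration above does.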

CSVReader must implement the following methods:

public void open(ExecutionContext executionContext)
	throws ItemStreamException
public void update(ExecutionContext executionContext)
	throws ItemStreamException
public StudentReportCard read()
	throws Exception
public void close()
	throws ItemStreamException

Inside the open method, get the file from the resource and load all of its CSV records. The read method then returns one record per call; each record is passed to the processor and later to the writer. When read returns null, Spring Batch treats the current file as finished. Finally, the close method is where you can add cleanup code, code to move files, reset variables, etc.
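To make that open, read, close cycle concrete, here is a small sketch that does not use Spring Batch at all. InMemoryLineReader is a hypothetical stand-in I made up for illustration; it only mimics the contract (load everything in open, return one record per read, return null at the end, reset in close) using lists of strings in place of files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the reader contract; not a Spring Batch class.
final class InMemoryLineReader {

    private List<String> lines = new ArrayList<>();
    private int current = 0;

    // open: load every record of one "file" up front.
    void open(List<String> fileLines) {
        lines = new ArrayList<>(fileLines);
        current = 0;
    }

    // read: return one record per call; null means this file is exhausted.
    String read() {
        return current < lines.size() ? lines.get(current++) : null;
    }

    // close: reset state so the same instance can serve the next file.
    void close() {
        lines = new ArrayList<>();
        current = 0;
    }

    public static void main(String[] args) {
        InMemoryLineReader reader = new InMemoryLineReader();
        List<List<String>> files = Arrays.asList(
                Arrays.asList("1,44,99,77,test1@yopmail.com",
                              "2,46,75,78,test2@yopmail.com"),
                Arrays.asList("3,44,55,22,test3@yopmail.com"));
        // The same reader instance is reused for every file.
        for (List<String> file : files) {
            reader.open(file);
            for (String line = reader.read(); line != null; line = reader.read()) {
                System.out.println(line);
            }
            reader.close();
        }
    }
}
```

Because one instance is reused for every file, forgetting to reset the position in close (or open) would make the second file appear empty or start in the wrong place.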

Below is the complete code for CSVReader to read multiple files.

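package com.student.report.reader;

import com.student.report.model.StudentReportCard;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.file.ResourceAwareItemReaderItemStream;
import org.springframework.core.io.Resource;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.math.BigDecimal;
import java.util.List;

public class CSVReader implements
        ResourceAwareItemReaderItemStream<StudentReportCard> {

    private Resource resource;

    private File file=null;

    private CSVParser csvParser;

    private Reader reader;

    private List<CSVRecord> csvRecords;

    private int currentRecord=0;

    // Called by MultiResourceItemReader before each file is opened.
    @Override
    public void setResource(Resource resource) {
        this.resource = resource;
    }

    // Loads every record of the current file into memory.
    @Override
    public void open(ExecutionContext executionContext)
            throws ItemStreamException {
        try {
            file=resource.getFile();
            reader=new FileReader(file);
            CSVFormat csvFormat=CSVFormat.DEFAULT
                    .withDelimiter(',');
            // The first line of each file is the header row and is skipped.
            csvParser=new CSVParser(reader,
                    csvFormat.withHeader(
                                    "Roll-No",
                                    "Maths-Marks",
                                    "English-Marks",
                                    "Science-Marks",
                                    "Email-Address")
                            .withFirstRecordAsHeader());
            csvRecords= csvParser.getRecords();
            currentRecord=0;
        } catch (IOException e) {
            // Fail the step instead of continuing with no records loaded.
            throw new ItemStreamException(
                    "Could not read " + resource, e);
        }
    }

    // Returns one record per call, or null when the file is exhausted.
    @Override
    public StudentReportCard read() throws Exception {
        if(currentRecord<csvRecords.size()){
            CSVRecord csvRecord=csvRecords.get(currentRecord);
            StudentReportCard studentReportCard
                    =new StudentReportCard();
            studentReportCard.setRollNo
                (Long.parseLong(csvRecord.get("Roll-No")));
            studentReportCard.setMathsMarks
                (new BigDecimal(csvRecord.get("Maths-Marks")));
            studentReportCard.setScienceMarks
                (new BigDecimal(csvRecord.get("Science-Marks")));
            studentReportCard.setEnglishMarks
                (new BigDecimal(csvRecord.get("English-Marks")));
            studentReportCard.setEmailAddress
                (csvRecord.get("Email-Address"));
            currentRecord++;
            return studentReportCard;
        }
        return  null;
    }

    // Releases the file handle and resets state for the next file.
    @Override
    public void close() throws ItemStreamException {
        try {
            if(csvParser!=null){
                csvParser.close(); // also closes the underlying reader
            }
        } catch (IOException e) {
            throw new ItemStreamException(
                    "Could not close " + resource, e);
        }
        resource=null;
        file=null;
        reader=null;
        csvParser=null;
        currentRecord=0;
    }

    @Override
    public void update
            (ExecutionContext executionContext)
            throws ItemStreamException {
        // No restart state is stored in this example.
    }
}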
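The close method is also a natural place to move a finished file out of the input folder so that the next run does not pick it up again. The sketch below shows one way to do that with plain java.nio. The class ProcessedFileMover, the method moveToArchive, and the idea of an archive folder are all assumptions made for illustration; none of them appear in the project above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the project above.
final class ProcessedFileMover {

    // Moves file into archiveDir (created if missing) and returns
    // the new location. An existing file with the same name is replaced.
    static Path moveToArchive(Path file, Path archiveDir) throws IOException {
        Files.createDirectories(archiveDir);
        Path target = archiveDir.resolve(file.getFileName());
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ProcessedFileMover <file> <archive-folder>
        if (args.length == 2) {
            System.out.println(
                    moveToArchive(Paths.get(args[0]), Paths.get(args[1])));
        }
    }
}
```

If you call something like this from close, close the parser and reader first; on Windows a file that is still open usually cannot be moved.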

Now let's look at the complete reader, processor, and writer configuration in BatchConfig.java.

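package com.student.report.config;

import com.student.report.model.StudentReportCard;
import com.student.report.processor.StudentMarksProcessor;
import com.student.report.reader.CSVReader;
import com.student.report.writer.StudentMarksWriter;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.file.MultiResourceItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

@Configuration
public class BatchConfig {

	@Autowired
	private JobBuilderFactory jobBuilderFactory;

	@Autowired
	private StepBuilderFactory stepBuilderFactory;

	@Bean(name = "generateReportCard")
	public Job generateReportCard() {
		return jobBuilderFactory
			.get("generateReportCard")
			.incrementer(new RunIdIncrementer())
			.start(processMarksCSVFile())
			.build();
	}

	@Bean
	public Step processMarksCSVFile() {
		return stepBuilderFactory.get("processMarksCSVFile")
			.<StudentReportCard,StudentReportCard>chunk(1)
			.reader(configureFileReader())
			.processor(studentMarksProcessor())
			.writer(writeStudentMarks())
			.build();
	}

	@Bean
	public MultiResourceItemReader<StudentReportCard>
	configureFileReader() {

		MultiResourceItemReader<StudentReportCard> itemReader
			= new MultiResourceItemReader<>();

		List<FileSystemResource> fileSystemResources
			= new ArrayList<>();

		// try-with-resources closes the directory stream when done.
		try (Stream<Path> stream = Files.list(
				Paths.get("F://CodeSpace//Students//"))) {

			stream.forEach(x ->
				fileSystemResources.add(new FileSystemResource(x.toFile())));

			Resource[] resources
				= fileSystemResources.toArray(new Resource[0]);

			itemReader.setResources(resources);
			itemReader.setDelegate(csvReader());
			// Non-strict mode: an empty folder does not fail the step.
			itemReader.setStrict(Boolean.FALSE);
		} catch (IOException e) {
			e.printStackTrace();
		}
		return itemReader;
	}

	@Bean
	@StepScope
	public CSVReader csvReader() {
		return new CSVReader();
	}

	@Bean
	public StudentMarksProcessor studentMarksProcessor() {
		return new StudentMarksProcessor();
	}

	@Bean
	public StudentMarksWriter writeStudentMarks() {
		return new StudentMarksWriter();
	}
}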

Now let's create a processor that calculates the percentage and the pass or fail result for each student in each file.

package com.student.report.processor;

import com.student.report.model.StudentReportCard;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemProcessor;

import java.math.BigDecimal;
import java.math.RoundingMode;

public class StudentMarksProcessor implements
    ItemProcessor<StudentReportCard,StudentReportCard> {

    private static final Logger LOGGER =
            LoggerFactory.getLogger(StudentMarksProcessor.class);

    @Override
    public StudentReportCard process
            (StudentReportCard studentReportCard)
            throws Exception {

        BigDecimal percentage=
                calculatePercentage(studentReportCard);
        studentReportCard.setPercentage(percentage);
        // 35 percent or more is a pass.
        if(percentage.compareTo(new BigDecimal(35))>=0){
            studentReportCard.setResult("Pass");
        }else{
            studentReportCard.setResult("Fail");
        }
        return studentReportCard;
    }

    // (English + Maths + Science) * 100 / 300, rounded to 2 decimals.
    private BigDecimal calculatePercentage
            (StudentReportCard studentReportCard){

        return ((studentReportCard.getEnglishMarks()
                   .add(studentReportCard.getMathsMarks())
                   .add(studentReportCard.getScienceMarks()))
                   .multiply(new BigDecimal(100)))
             .divide(
                new BigDecimal(300),2, RoundingMode.HALF_UP);
    }
}
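If you want to check the arithmetic by hand, the calculation can be run outside Spring Batch. The sketch below repeats the same formula in a standalone class; PercentageExample and its method names are hypothetical, and it assumes, as the division by 300 implies, that each of the three subjects is marked out of 100.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical standalone version of the processor's arithmetic.
final class PercentageExample {

    // Same threshold as the processor: 35 percent or more is a pass.
    static final BigDecimal PASS_MARK = new BigDecimal("35");

    // (maths + english + science) * 100 / 300, rounded to 2 decimals.
    static BigDecimal percentage(String maths, String english, String science) {
        BigDecimal total = new BigDecimal(maths)
                .add(new BigDecimal(english))
                .add(new BigDecimal(science));
        return total.multiply(new BigDecimal("100"))
                .divide(new BigDecimal("300"), 2, RoundingMode.HALF_UP);
    }

    static String result(BigDecimal percentage) {
        return percentage.compareTo(PASS_MARK) >= 0 ? "Pass" : "Fail";
    }

    public static void main(String[] args) {
        // 44 + 99 + 77 = 220, and 220 * 100 / 300 = 73.33 after rounding.
        BigDecimal p = percentage("44", "99", "77");
        System.out.println(p + " " + result(p)); // prints "73.33 Pass"
    }
}
```

The explicit scale of 2 in divide matters: 22000 / 300 has no exact decimal representation, and BigDecimal.divide without a scale or rounding mode would throw an ArithmeticException for it.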

Let's use Spring Data JPA to create a repository layer that the writer will use to store student report cards.

@Repository
public interface StudentReportCardRepository 
	extends JpaRepository<StudentReportCard,Long> {
}

Next, let's create a writer that will be used to store student report cards.

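package com.student.report.writer;

import com.student.report.model.StudentReportCard;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;

import java.util.List;
// Also import StudentReportCardRepository from the package you placed it in.

public class StudentMarksWriter
	implements ItemWriter<StudentReportCard> {

    private static final Logger LOGGER =
            LoggerFactory.getLogger(StudentMarksWriter.class);

    @Autowired
    private StudentReportCardRepository
    	studentReportCardRepository;

    // Saves every report card in the current chunk.
    @Override
    public void write(List<? extends StudentReportCard> list)
            throws Exception {
            list.forEach(x->{
             LOGGER.info("Storing "+x.toString());
               studentReportCardRepository.save(x);
            });
    }
}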

To run a batch job, configure the job to run at scheduled intervals as shown below.

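package com.student.report;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@SpringBootApplication
@EnableBatchProcessing
@EnableScheduling // required for @Scheduled methods to be triggered
public class StudentReportMgtApplication {

	@Autowired
	JobLauncher jobLauncher;

	@Autowired
	Job generateReportCard;

	public static void main(String[] args) {
		SpringApplication
			.run(StudentReportMgtApplication.class, args);
	}

	// Runs once every minute.
	@Scheduled(cron = "0 */1 * * * ?")
	public void perform() throws Exception
	{
		// A unique parameter makes every launch a new job instance.
		JobParameters params = new JobParametersBuilder()
			.addString("JobID",
				String.valueOf(System.currentTimeMillis()))
			.toJobParameters();
		jobLauncher.run(generateReportCard, params);
	}
}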

Time to test the code!!

Put some files in the configured location (in my case F://CodeSpace//Students//) and run the application.

I am placing the 3 files below.

Student_Marks_Std7

Roll-No,Maths-Marks,English-Marks,Science-Marks,Email-Address

1,44,99,77,test1@yopmail.com

2,46,75,78,test2@yopmail.com

Student_Marks_Std8

Roll-No,Maths-Marks,English-Marks,Science-Marks,Email-Address

3,44,55,22,test3@yopmail.com

4,46,75,55,test4@yopmail.com

Student_Marks_Std9

Roll-No,Maths-Marks,English-Marks,Science-Marks,Email-Address

6,44,77,77,test6@yopmail.com

7,77,75,78,test7@yopmail.com

The output is printed as follows.

Job: [SimpleJob: [name=generateReportCard]] launched 
	with the following parameters: [{run.id=30}]
 Executing step: [processMarksCSVFile]
 Storing StudentReportCard(rollNo=1,
 	emailAddress=test1@yopmail.com
 	,mathsMarks=44, scienceMarks=77, englishMarks=99, 
    	percentage=73.33, result=Pass)
 Storing StudentReportCard(rollNo=2,
 	emailAddress=test2@yopmail.com,
 	mathsMarks=46, scienceMarks=78, englishMarks=75,
    	percentage=66.33, result=Pass)
 Storing StudentReportCard(rollNo=3,
 	emailAddress=test3@yopmail.com,
    	mathsMarks=44, scienceMarks=22, englishMarks=55,
        percentage=40.33, result=Pass)
 Storing StudentReportCard(rollNo=4,
 	emailAddress=test4@yopmail.com,
    	mathsMarks=46, scienceMarks=55, englishMarks=75,
        percentage=58.67, result=Pass)
 Storing StudentReportCard(rollNo=6,
 	emailAddress=test6@yopmail.com,
    	mathsMarks=44, scienceMarks=77, englishMarks=77,
        percentage=66.00, result=Pass)
 Storing StudentReportCard(rollNo=7,
 	emailAddress=test7@yopmail.com,
    	mathsMarks=77, scienceMarks=78,englishMarks=75, 
        percentage=76.67, result=Pass)
 Step: [processMarksCSVFile] executed in 152ms
 Job: [SimpleJob: [name=generateReportCard]] completed 
 	with the following parameters: [{run.id=30}] 
    	and the following status: [COMPLETED] in 181ms

Below is the project structure for reference