Development of a performance model for classifying broiler farms

Broiler meat is the world's second most consumed meat, and consumption is forecast to approach 35 kg/capita/year by 2027. Brazil ranks third in broiler production globally and is the world's largest exporter of chicken meat. To provide proper rearing conditions, broiler farms need to follow good husbandry and welfare practices. The present study aimed to develop a performance classification model using data mining to evaluate broiler farmers based on detailed flock housing and performance information. The input dataset, from 49 broiler farms of a cooperative in Northeastern Brazil, was organized with details on the housing characteristics, rearing environment, management, and flock performance. We also added the cooperative's technical classification, derived from the housing conditions, and the production index. The input classification had weights attributed to each housing feature. The output variable (target) was defined as the performance classification (PC) index. The dataset was processed in RapidMiner software with the random forest algorithm, using 80% of the data for training and 20% for testing. The most prominent variables in classifying performance were feed conversion, daily weight gain, the productivity index, and the cooperative classification criteria. The developed model points to a way of auto-classifying farms and allows the cooperative to evaluate the farmers' production based on broiler production and management practices. It was possible to create 'If-Then' rules that support appropriate decision-making by broiler farmers to comply with good practice norms.


INTRODUCTION
Consumption of chicken meat has increased worldwide in recent years, and consumers have become more demanding regarding this product's quality and safety. The broiler meat supply chain varies between countries and regions, presenting significant structural disparities and inequalities in product quality and market access (Carron et al., 2017). Performance and health data are recorded regularly on broiler farms, and data volume tends to increase as farmers adopt new recording equipment (Van Hertem et al., 2016). Nowadays, many technologies can help broiler farmers monitor flocks in real time, improve the recording of events, and make it possible to forecast health issues (Van Hertem et al., 2016).
Brazil is the third largest broiler producer and the world's largest chicken meat exporter (ABPA, 2020). The structure of the broiler supply chain in Brazil is quite simple. The farmer has technical help to build the housing and receives day-old chicks and feed to grow the birds until slaughter age, usually 42 days. Under such an arrangement, the enterprise or cooperative generally records information on the flocks' growth and mortality (Baracho et al., 2019; Caldas et al., 2021; De Jong and Van Riel, 2020). However, the profile of broiler farmers changes from one region to another. While some broiler farmers adopt modern technology, others simply do the basics to profit from broiler meat production.
Good production practices (GPP) lead to essential health security and adequate housing conditions (Menezes et al., 2010; Baracho et al., 2019). Among the good practices in broiler production, thermal comfort during rearing is very important (Bhadauria et al., 2017). Irreversible economic losses may result from sudden changes in ambient temperature and relative humidity and from improper adjustment of these variables. Broilers have a short life cycle and are particularly sensitive to hot weather (Mendes and Patricio, 2004; Moretti et al., 2020).
The organizational framework of the broiler supply chain in Brazil follows the integration pattern. However, despite the promising outlook associated with the integration system, integrating companies (or cooperatives) still use hierarchical governance mechanisms for broiler production. Some farmers acquire and maintain new facilities for raising birds (Caldas et al., 2021). Software resources and computational modeling have been increasingly used to investigate mitigation solutions for poultry management. Depending on the needs and complexities of the activity, modeling in conjunction with analytical or semi-empirical methods is used to develop new GPP procedures and methods (Nam and Han, 2016).
Data mining is the procedure of discovering information from a specified set of known data. The technique uses mathematical analysis to obtain patterns and trends in data (Witten et al., 2017). Such a method has been applied in predictive medicine (Asri et al., 2016; Chen et al., 2018; Dos Santos et al., 2019), animal production (Morotta et al., 2017), cybersecurity (Van der Walt et al., 2018), and risk management (Er Kara et al., 2020). Although the use of GPP is already known, using the data mining approach to solve classification problems and to predict non-linear results is relatively new (Karim and Rahman, 2013; Zhang et al., 2018). According to data mining methodology, the analysis was divided into five distinct phases: 1) Understanding the domain of knowledge to which the study refers; 2) Knowledge and understanding of the database for this domain; 3) Data preparation (cleaning, construction, selection, integration, and formatting); 4) Modeling; 5) Evaluation of results, allowing for reconsiderations and cyclical reassessments (Chapman et al., 2000).
Decision trees are characterized by a relatively fast construction process and a straightforward interpretation of the final model (Lee et al., 2012). They are based on an 'If-Then' rules approach to learning from a set of independent observations. Random forest is a machine learning method that can be used to develop prediction models. It provides higher accuracy than a single decision tree model while preserving the qualities of tree models (Speiser et al., 2019). In prediction modeling, a frequent aim is to identify the main predictors that should be included in a simpler model. This task can be accomplished through variable selection, choosing the predictors according to a criterion such as model accuracy. Developing prediction models with variable selection may reduce data collection and enhance prediction efficiency.
The present study aimed to develop a data mining model for predicting broiler farms' performance classification, based on data describing broiler production and management practices and using a random forest approach.

Broiler farm scoring system
The study analyzed the data obtained from a poultry cooperative (an integration business framework) located in Teresina, State of Piaui, Brazil, which operates with members from the metropolitan region. It is one of the three largest cooperatives in the state and slaughters nearly 130,000 broilers/day. It has 49 members classified by technological standard, according to attributes and scores established by the cooperative. Table 1 shows the parameters used for classifying farmers according to the technical level. A scoring system was adopted to define the farm classes. The score was obtained using a ranking based on the traits (mortality, conversion rate, quality, and health of the flock) and the elements of the rearing conditions (age of the building, conditions of feeders and drinkers, presence of fans, conditions of curtains, presence of roof lining, roof overhang, biosecurity conditions, fogging system, condition of the silo, and presence and condition of water reservoir). Each variable had a specific weight based on the impact that variable had on the farm's final profit. The weights were assigned based on the good practices adopted by the broiler farmer, as suggested by Menezes et al. (2010). This approach is supported by Santini and Pigatto (2008), who indicate that adopting technologies for automatic control of temperature, humidity, water, and feed supply is essential to improve the feed conversion of chickens, which directly affects broiler development. Classes of broiler farm are detailed in Table 1.
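The weighted scoring described above can be sketched as a weighted sum. This is only an illustration: the function name `farm_score`, the attribute names, the ratings, and the weights are all hypothetical, since the cooperative's actual values (Table 1) are not reproduced in this text.

```python
from typing import Mapping


def farm_score(ratings: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of attribute ratings.

    ASSUMPTION: the score is a plain weighted sum; the cooperative's
    real aggregation rule and weights are not given in the text.
    """
    missing = set(weights) - set(ratings)
    if missing:
        raise KeyError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[name] * ratings[name] for name in weights)


# Hypothetical ratings and weights, for illustration only.
example_ratings = {"fans": 2, "feeders_and_drinkers": 3}
example_weights = {"fans": 10, "feeders_and_drinkers": 5}
example_score = farm_score(example_ratings, example_weights)
```

A higher weight makes the corresponding attribute count more toward the final score, which mirrors the idea that each variable is weighted by its impact on the farm's profit.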
To incorporate performance indicators, we added data on the productivity index (PI, Eq. 1), where PI = productivity index, %; DWG = daily weight gain, kg; F = feasibility index, %; and FC = feed conversion.
The feasibility index (F) expresses, as a percentage, the relation between the birds housed and those removed for slaughter, and is therefore related to flock mortality. Mortality of up to 0.8% in the first week and 0.5% per week thereafter is accepted as normal (Mendes and Patrício, 2004).
Feed conversion is the quotient of feed consumption and the flock's total weight at the removal of birds (Eq. 2). The feed consumed by the dead birds is accounted for, so the higher the mortality, the worse the feed conversion.
$FC = AFC / AWG$ (2)
where AFC = average broiler feed consumption; AWG = average weight gain per broiler.
Figure 1 shows the schematic of the broiler production supply chain and the section delimited for this study.
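The two performance indicators can be illustrated with a short sketch. Feed conversion follows the definition in the text (feed consumption divided by weight gain). Eq. 1 is not reproduced here, so the productivity index below assumes a commonly used production efficiency form, PI = (DWG × F) / FC × 100 with DWG in kg and F in %; this form, the function names, and the input values are assumptions, not the cooperative's confirmed formula.

```python
def feed_conversion(avg_feed_consumption_kg: float, avg_weight_gain_kg: float) -> float:
    """FC = AFC / AWG, as defined in the text."""
    if avg_weight_gain_kg <= 0:
        raise ValueError("average weight gain must be positive")
    return avg_feed_consumption_kg / avg_weight_gain_kg


def productivity_index(dwg_kg: float, feasibility_pct: float, fc: float) -> float:
    """ASSUMED form of Eq. 1: PI = (DWG * F) / FC * 100.

    dwg_kg is the daily weight gain in kg, feasibility_pct the
    feasibility index in %, and fc the feed conversion.
    """
    if fc <= 0:
        raise ValueError("feed conversion must be positive")
    return dwg_kg * feasibility_pct / fc * 100.0


# Hypothetical flock: 4.9 kg of feed and 2.8 kg of gain per bird.
fc = feed_conversion(4.9, 2.8)
pi = productivity_index(dwg_kg=0.06, feasibility_pct=95.0, fc=fc)
```

Under this assumed form, a lower feasibility index or a higher feed conversion both reduce PI, which is consistent with the remark that higher mortality worsens performance.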

Data Mining Approach
The data were organized into a spreadsheet containing the variables to be processed as attributes: (1) the farm's identification, (2) the respective score, (3) the classification regarding the technical level, and the data on the flocks' performance, namely (4) average weight gain and (5) average feed conversion. The target was the performance classification (PC), integrating all variables' effects in the final results. Figure 2 shows the schematic of the processing and data analysis. We used the range of productivity proposed by Caldas et al. (2021) to discretize the target into the possible performance classes for mixed flocks. The arrangement is described in Table 2. The database was tabulated in a spreadsheet and processed using RapidMiner® software (Mierswa and Klinkenberg, 2018). The confusion matrix was analyzed to obtain the model accuracy (Eq. 3) using the PC. The kappa statistic, which reflects the model's reliability as agreement beyond chance, is defined in Eq. 4. When two measurements agree only at the chance level, the value of kappa is zero. When the two measurements fully agree, the value of kappa is 1.0.
Table 3 shows the descriptive statistics of the attributes analyzed. The target (PC) distribution is the outcome of the studied broiler farms (n=51). The 'Very high' class represents 6% of the farms, 'High' represents 41%, 'Average' 26%, 'Low' 25%, and 'Very low' 2%.
[Table 1 note: Total 51. Scores were given based on the technological level and good practice norms (Menezes et al., 2010; Santini and Pigatto, 2008).]
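Accuracy and kappa can both be read off a confusion matrix. Eqs. 3 and 4 are not reproduced in this text, so the sketch below assumes the standard definitions: accuracy as the share of correctly classified examples, and Cohen's kappa as (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function names and the example matrix are hypothetical.

```python
from typing import Sequence

Matrix = Sequence[Sequence[int]]


def accuracy(cm: Matrix) -> float:
    """Share of examples on the diagonal (rows = observed, cols = predicted)."""
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm: Matrix) -> float:
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    total = sum(sum(row) for row in cm)
    p_o = accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(col) for col in zip(*cm)]
    # Chance agreement from the marginal totals of each class.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one class occurs")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical 3-class confusion matrix, for illustration only.
example_cm = [[4, 1, 0], [0, 3, 1], [0, 0, 2]]
example_accuracy = accuracy(example_cm)
example_kappa = cohen_kappa(example_cm)
```

With a perfectly diagonal matrix kappa equals 1.0, and with predictions that match the observed classes only as often as chance predicts it equals 0, matching the two limiting cases described above.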

Data Mining Results
Data were processed using the random forest algorithm, and we obtained trees with an accuracy of 81.8% and kappa = 0.747. Each tree generates a prediction by following its branches according to the splitting rules. Class predictions are based on the majority of the examples reaching a leaf, while numerical estimates are obtained by averaging the values reaching a leaf. The resulting model is a voting model over the output of all random trees, since all predictions are considered equally important and are based on sub-sets of the samples (Speiser et al., 2019).
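The voting step can be illustrated in a few lines. This is a minimal sketch of equal-weight majority voting over per-tree class predictions, not RapidMiner's internal implementation; the function name `forest_vote` and the example predictions are hypothetical.

```python
from collections import Counter
from typing import Sequence


def forest_vote(tree_predictions: Sequence[str]) -> str:
    """Return the class predicted by most trees (each tree has one equal vote)."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    counts = Counter(tree_predictions)
    # most_common(1) returns the label with the highest vote count.
    return counts.most_common(1)[0][0]


# Hypothetical predictions from five trees for a single farm.
example_class = forest_vote(["High", "Average", "High", "High", "Low"])
```

Because every tree is trained on a different sub-set of the samples, individual trees may disagree, and the vote smooths out those differences.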
Figure 2. Schematic of the data recording, processing, and analysis to reach the performance classification. The scores were given based on Santini and Pigatto (2008) and Menezes et al. (2010). FC = feed conversion (efficiency); PI = productivity index, %; and DWG = daily weight gain, kg.
Three decision trees (Figures 3 to 5) were selected based on the importance of the main attribute in broiler production performance (Baracho et al., 2019; De Jong and Van Riel, 2020). Figure 3 presents a tree where the main attribute was feed conversion (FC). If the FC is ≤ 1.753, then PI needs to be checked. If PI is ≤ 333.874, then PC is 'High,' representing 22.5% of the total sample. If PI > 333.874, then PC is 'Very high,' representing 5% of the total sample. In this case, the PC is identified in two instances, whereas when FC > 1.753, PC is identified as a function of PI and DWG, and results are shown after three instances. In the second instance, however, PI represents 72.5% of the total samples.
Figure 4 describes a tree where the main attribute was the productivity index (PI). If the PI is ≤ 301.249, then PC is 'Low,' representing 27.5% of the sample. If PI is > 301.249, then the DWG needs to be checked. If DWG > 61.293, then the PC is 'Very high,' representing 5% of the sample, in a second instance. If DWG ≤ 61.293, then the score needs to be checked. If the score ≤ 3692.5, then PC is 'Average,' expressing 12.5% of the sample, also in a second instance. If the score > 3692.5, then one needs to check PI. If PI > 313.946, then PC is 'High,' reflecting 45.0% of the samples. If PI ≤ 313.946, then PC is 'Average,' describing 10.0% of the samples. Both of these predictions occur in a third instance.
Figure 5 presents the tree using the overall classification of rearing conditions (A, B, C, D, and E) as a basis for the performance classification.
If the farm is classified initially as A, then one has to check the PI. If the PI ≤ 315.57, then PC is 'Average,' representing 7.5% of the samples; otherwise, PC is 'Very high,' representing 5.0% of the samples. This is a weak branch (10.0%) and points out the PC in just one instance. If the classification is B, then one has to check the PI. If the PI ≤ 334.426, then PC is 'High,' representing 15.0% of the samples; otherwise, PC is 'Very high,' representing 5.0% of the samples. Both branches point out a PC after one instance, and this last branch signifies 20.0% of the samples. As for classification C (35.0% of the samples), it is a larger branch: if PI ≤ 296.349, then PC is 'Low,' indicating 27.5% of the samples. If PI > 296.349, then the score needs to be checked. If the score > 4219.250, then PC is 'High,' indicating 15.0% of the samples. On the other hand, if the score ≤ 4219.250, then the score needs to be re-checked. If the score > 4051.75, then PC is 'Average' (5.0%). If the score ≤ 4051.75, then PC is 'High,' signifying 10.0% of the samples. If the broiler farm classification is D, then the score needs to be checked. If the score > 3803.5, then the PC is 'Low.' If the score is ≤ 3803.5, then the PC is 'Average.' These leaves represent 7.5% and 12.5% of the samples, respectively. For the farm classification E, if the score is > 2462.5, then PC is 'Average' (7.5%). If the score ≤ 2462.5, then PC is 'High,' representing 7.5% of the samples.
Figure 5. Selected tree to predict the performance classification of broiler farm using the preliminary classification as the primary attribute. PI = productivity index, %; Score = a number representing the farm's technology level; and DWG = daily weight gain, kg.
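The 'If-Then' rules read from the Figure 4 tree (the one rooted at the productivity index) translate directly into nested conditions. The sketch below uses the thresholds exactly as reported in the text; the function name `classify_by_pi` is hypothetical, and the inputs are assumed to be in the same units as those printed in the tree.

```python
def classify_by_pi(pi: float, dwg: float, score: float) -> str:
    """Performance classification following the Figure 4 rules as described.

    Thresholds are copied from the text; dwg and score are assumed to
    use the units printed in the tree.
    """
    if pi <= 301.249:
        return "Low"
    # From here on PI > 301.249, so daily weight gain is checked next.
    if dwg > 61.293:
        return "Very high"
    # DWG <= 61.293: the farm's technology score decides the next split.
    if score <= 3692.5:
        return "Average"
    # Score > 3692.5: PI is checked a second time.
    if pi > 313.946:
        return "High"
    return "Average"


# Hypothetical farm, for illustration only.
example_pc = classify_by_pi(pi=320.0, dwg=60.0, score=4000.0)
```

Written this way, a farmer or cooperative technician can apply the rules to a flock's figures by hand or in a spreadsheet, which is the self-classification use the study has in mind.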
The independent variables used were related to the rearing environment and management and to broiler performance. Although the studied cooperative of broiler farmers is still limited to the local market, it represents nearly 20% of the broiler meat market of Teresina, which is the largest city in the state. It is one of the largest slaughter companies in the state (130,000 broilers/day).
As in other regions, the broiler farms in the studied region attempt to improve the technology applied to environmental control. The cooperative members mostly use the intensive system of broiler rearing, whose objective is to guarantee a greater number of birds per area, which is more attractive for the producer due to the possibility of greater financial return, as mentioned by Paulino et al. (2019). Such an initiative allows a higher flock density while maintaining welfare standards, resulting in greater economic viability.
The results suggest that the cooperative's present broiler farm technical classification method relies mainly on the rearing conditions and does not effectively assess the overall broiler farm performance. The proposed model takes the feed conversion and daily gain into account. The tree presented in Figure 3 shows that feed conversion plays a critical role in the farm performance classification, as previously pointed out by Mendes and Patricio (2004). However, Borotto and Freitas (2020) emphasize that housing conditions might play a crucial part in determining productivity. In the proposed index, most samples are classified considering the relation to PI and DWG (Baracho et al., 2019), mainly when the FC is below 1.753. As for DWG's importance, Figure 4 indicates that for the studied farms (72.5%), when the daily gain is higher than 61.293, the productivity is 'High.' That branch of the tree presents the minority classes, which together represent the most considerable portion of the farms with respect to the productivity index. Such results agree with the current literature (Moro et al., 2005; Rodrigues et al., 2011); however, the influence of the rearing conditions and health status should not be neglected (Lee et al., 2012; Bhadauria et al., 2017; De Jong and Van Riel, 2020).
The resulting tree (Figure 5) suggests that the minority classes C, D, and E, which aggregate most farms with low technical scores in rearing conditions, might be able to reach a 'High' performance index. This result agrees with Pinotti and Paulillo (2006), who identified some existing factors that are not aligned with the management and governance structures in the organization of agents in broiler chains. The producers probably take some management decisions beyond the rearing conditions that improve their output.
Beyond an appropriate productivity index, meeting the 2030 United Nations (UN) agenda is desirable. A unique characteristic of the agenda is its broad view of the necessary means of implementation, which expands traditional financing for development to include new ways of facilitating least developed regions' access to markets, technology, capacity development, and policy support (Neven, 2014). The competitiveness of broiler farms under tropical climates depends mostly on governance and management (Rushton, 2009; Caldas et al., 2020). Nevertheless, health issues and animal welfare requirements must comply with consumers' concerns (Menezes et al., 2010; Brito et al., 2020; De Jong and Van Riel, 2020). Therefore, a productivity index focused mainly on the rearing environment's variables might not be the only straightforward solution.

CONCLUSION
Three models were developed to predict the performance classification of broiler farms. The first was based on broiler feed conversion, productivity index, and daily weight gain. The second was built on the productivity index, the farm score related to the farm's technology level, and broiler daily weight gain. The third took the cooperative's preliminary classification as the primary attribute, combined with the farm productivity index, the farm score, and the broiler daily weight gain. The proposed data mining models for assessing the performance classification allow farmers to auto-classify themselves and permit the cooperative to evaluate their production based on broiler production and management practices. The simple rules model also allows self-auditing, giving the associates the possibility of constant monitoring and of upgrading as farmers raise their production prospects and targets.