14.7 Information Technology In The Indian Statistical System
Historical Background
14.7.1 Before the introduction of electronic computers in India, Unit Record Machines with 80-column Punched Cards as the medium of data-input were extensively in use during the late forties at the Indian Statistical Institute (ISI) for processing of survey data. The first-ever electronic computer in India - the Hollerith Electronic Computer Model 2M (HEC 2M) from the UK was installed for statistical use at ISI in 1949. Limited input-output capabilities of this and the next computer - the Russian URAL II installed at the ISI made them unsuitable for survey data processing. These were used mainly for complex calculations in other types of statistical applications. Use of computers in the special tabulation work of the NSS began at ISI in 1965 on the more versatile IBM 1401 computer system.
14.7.2 A Computer Centre was created in the Department of Statistics in 1967. The Computer Centre was initially equipped with a Honeywell-400 computer system. It was used for the processing of survey data and for providing computing support to various ministries and departments of the Government of India. Major jobs carried out by the Computer Centre have already been described.
14.7.3 Later, with the growth in demands on the use of Information Technology (IT) tools, the Government of India established the Department of Electronics (DoE) and the National Informatics Centre as part of DoE to cater to the needs of different ministries and departments of the Government. The nodal role of the Computer Centre was changed and this was to cater mainly to the need of Department of Statistics.
14.7.4 It must be admitted, however, that till the late 1990s, the processing of NSS data was not very successful. There were long delays in completing the tabulation of any round of the survey and the backlog went on accumulating. There are good reasons why this happened. Raw data as obtained by enumerators from the field usually have many defects. Before tabulation can start, these have to be subjected to detailed scrutiny and rectification, a process called "data cleaning". This is a very difficult and time-consuming task, and may take up more than 80 per cent of the total computer time. Deep knowledge of the subject of investigation is essential to make an effective plan for data processing and tabulation. Mere software skills are not enough. This lesson was learnt at great cost when, worried by the delays in completion of tabulation at the Computer Centre, the Governing Council of NSSO entrusted the work of processing the data of the 35th Round of the NSS to the National Informatics Centre of the Department of Electronics. Not only was the job completely bungled, the entire volume of data was lost in the process. Success came only later, when it was realised that survey data processing, to be successful, would have to be done in-house by the statisticians conducting the survey.
14.7.5 Earlier, the DPD procured 108 data entry machines of type D20 and three small computers of model UPTRON S-1650 for data cleaning. The DPD was responsible for data transcription and cleaning on the above equipment – later on the newly-acquired Personal Computers (PCs) – and the final tabulation was done on the mainframe H-400 computer at the Computer Centre. This was not much of a success, mainly because of problems of communication between the Computer Centre and the DPD, which were located at two distant places. This procedure continued up to the 50th Round (survey period: July 1993-June 1994).
Current Status
14.7.6 A new approach was taken for the processing of NSS data from the 51st Round onwards. The entire responsibility was given to the DPD, relieving the Computer Centre of any responsibility in the matter. The Data Preparation Centres of DPD were to transcribe data from schedules in small batches and generate error reports using a set of in-house software developed by the Division. After manual cleaning of the data by the Data Preparation Centres, these were sent to the DPD (Headquarters) located at Kolkata for automated scrutiny and final tabulation. With this new approach of in-house tabulation of data by the DPD, backlog in tabulation and report writing work has now been completely wiped out by the NSSO.
14.7.7 Besides routine data processing, computers are used in sample selection and as a desktop printing device in preparing manuals, presentation of tables, etc.
14.7.8 With the advent of PCs, statistical computation in the Department of Statistics has been largely decentralised. From the reference year 1995-96, summary and detailed tabulation of ASI data are now done at the Industrial Statistics Wing of the CSO at Calcutta. The FOD carries out at Faridabad the processing of agricultural data collected by it under the Timely Reporting Scheme and the Improvement of Crop Statistics Scheme. The National Accounts Division (NAD) has its own system of PCs for compilation of national accounts.
14.7.9 One important step in the process of computerisation of survey data processing in NSSO is the introduction in 1995-96 of Palmtop computers for the collection of data. As an experimental measure, in the 52nd Round of NSS, socio-economic data were collected in Haryana directly on palmtop computers. The basic idea was to download the data collected on palmtop computers directly on to main computers for processing and thereby avoid the intermediate step of data entry. The palmtop computer had just two lines of display of 16 characters each, and a memory of only 64 Kilo Bytes. The FOD used these gadgets subsequently to collect field data under the scheme of Improvement of Crop Statistics (ICS), Middle Class Price Collection (MCPC), etc. These projects were planned in a hurry. Operational problems under difficult conditions in the field were not examined carefully. Limitations of hardware and software were not taken into account. The project was a total failure.
14.7.10 Wrongly diagnosing the problem as mainly one of limited capacity of the equipment, another experimental project was attempted in the 54th Round of the NSS, this time using more efficient Palmtop computers with larger memory. The data were collected from Orissa and Maharashtra using these newer Palmtop computers. However, with the total failure this time again, the project has apparently been shelved. The 700 or so pieces bought at a cost of more than 1.5 crores of rupees are lying unused and may not be even usable any longer. But, there is now a proposal to undertake a pilot study for using even better and much more expensive modern laptop computers with large memory and hard disk capacity for data collection by the field staff!
14.7.11 A great step forward in meeting user requirements was the adoption by the Government in 1999 of a National Policy on Data Dissemination. According to this policy, the Government is committed to supply the user, at marginal cost, unit-level data from all surveys after the expiry of three years from the completion of fieldwork or after the reports based on survey data are released, whichever is earlier. To protect the privacy of information, all identification particulars of the informant would be removed from the data before making these publicly available.
14.7.12 By the above policy, the Computer Centre has been entrusted with the responsibility of creation and maintenance of a National Data Warehouse of Official Statistics. Under this project, the Computer Centre will preserve data generated by various Central and State Government departments and public sector undertakings on electronic media, organise them in the form of databases and provide remote access facilities to end-users through a network. The Computer Centre has already initiated action for the creation of such a Warehouse. The Computer Centre has been preserving a large volume of data generated through various socio-economic surveys conducted by NSSO, Follow-up Enterprises Surveys by the MoS&PI, and Annual Surveys of Industries conducted by the CSO. These data are being disseminated regularly to a large number of national and international users on Floppy and Compact Disk (CD).
14.7.13 The Computer Centre has also been given the responsibility of creating and updating the website of the MoS&PI which is hosted by the National Informatics Centre. The site is being regularly updated.
14.7.14 Computers are being used in almost all the Central ministries, departments and organisations in one way or another. The Directory of Statistics published by the CSO gives information on computerised databases maintained by various organisations at the Centre as well as the State Directorates of Economics and Statistics.
14.7.15 Several attempts were made by the NSSO to use the communication network managed by the National Informatics Centre (NICNET) for transmission of Core Items of Monthly Progress Report (MPR) and Middle Class Price Collection (MCPC) data. However, it was found that in many cases data transmission through NICNET was not at all satisfactory. As a result, the system of sending filled-in schedules to the CSO through the traditional postal service was continued and transmission of Core Items of MPRs and MCPC using NICNET was discontinued. Similarly, an attempt by the FOD to transmit ASI summary data through NICNET also had to be given up because of duplication or loss of data in transmission and failure to install revised versions of software in a number of Centres of the National Informatics Centre.
14.7.16 During 1998, a decision was taken to install e-mail in all the 172 field offices of FOD, NSSO, through the network of National Informatics Centre. After a period of two years, the connection could be provided only in 116 offices. Even in the offices connected, the transmission of data was not very successful.
General observations on the use of Information Technology in the Statistical System of India.
14.7.17 Statistical data processing involves the following types of work:
Transcription of data from filled-in schedules to a computer-readable medium. This is labour-intensive, error-prone work;
Verification of transcription, in which the transcription operation is repeated and the copy is mechanically compared with the original (100 per cent verification is the norm in NSS);
Computerised check of internal consistency of the transcribed data; preparation of list of errors and their rectification by reference to the original document, or even by revisiting the informant;
Consolidation of data files and check of completeness of coverage;
Computerised check for missing or inconsistent data and replacing these by rule-based imputed values;
Calculation of weights;
Preparation and scrutiny of Tables;
Calculation of standard errors of estimates – optional;
Printing the Tables and Survey Report in appropriate lay out.
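The verification and consistency-check steps listed above can be illustrated with a minimal sketch. It assumes that each schedule is transcribed twice into a list of records and that the two passes are in the same order; the field names (`hh_id`, `members`, `earners`) and the two edit rules are hypothetical and are not taken from any NSS schedule.

```python
def compare_entries(first_pass, second_pass):
    """Compare two independent transcriptions of the same schedules.

    Returns one tuple (record_index, field, first_value, second_value)
    for every field on which the two passes disagree. Both lists are
    assumed to hold the records in the same order.
    """
    mismatches = []
    for i, (a, b) in enumerate(zip(first_pass, second_pass)):
        for field in a:
            if a[field] != b.get(field):
                mismatches.append((i, field, a[field], b.get(field)))
    return mismatches


def consistency_errors(record):
    """Apply two hypothetical edit rules to one household record."""
    errors = []
    if record["members"] <= 0:
        errors.append("household size must be positive")
    if record["earners"] > record["members"]:
        errors.append("earners exceed household size")
    return errors


first = [{"hh_id": 1, "members": 5, "earners": 2},
         {"hh_id": 2, "members": 3, "earners": 4}]
second = [{"hh_id": 1, "members": 6, "earners": 2},
          {"hh_id": 2, "members": 3, "earners": 4}]

# Transcription mismatches go back to the data entry operator.
print(compare_entries(first, second))
# Consistency errors go into the error list for reference to the schedule.
print([consistency_errors(r) for r in first])
```

In this sketch a transcription mismatch and an internal inconsistency are reported separately, because the first is settled by looking at the schedule again while the second may require going back to the informant.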
14.7.18 Currently by and large, in survey data processing, a system of flat files is in use. Imputation of missing or wrong values is done through cold deck methods.
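As an illustration of the cold deck method, the sketch below fills a missing value from a fixed external table, for example class averages from an earlier round, rather than from a donor in the current data. The field names and the values in `cold_deck` are invented for the example, and the actual rules used in official processing are not described in this report.

```python
def cold_deck_impute(records, field, class_key, cold_deck):
    """Fill missing values of `field` from a fixed external table.

    `cold_deck` maps an imputation class (here a stratum) to the value
    substituted for a missing entry. Records whose class is absent from
    the table are left unchanged. The input list is not modified.
    """
    result = []
    for rec in records:
        rec = dict(rec)  # work on a copy
        if rec.get(field) is None:
            donor = cold_deck.get(rec[class_key])
            if donor is not None:
                rec[field] = donor
                rec[field + "_imputed"] = True  # flag for later scrutiny
        result.append(rec)
    return result


# Hypothetical records and cold-deck values, for illustration only.
records = [
    {"hh_id": 1, "stratum": "rural", "monthly_exp": None},
    {"hh_id": 2, "stratum": "urban", "monthly_exp": 1200},
]
cold_deck = {"rural": 800, "urban": 1500}

for rec in cold_deck_impute(records, "monthly_exp", "stratum", cold_deck):
    print(rec)
```

Flagging each imputed value, as done here, keeps it possible to tabulate with and without imputed entries.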
14.7.19 Survey Design involves the following activities:
Preparation and Maintenance of Sampling Frame of First Stage Units;
Selection of First Stage Units according to the sampling design;
Choice of Sampling Design;
Design of Schedule or Questionnaire.
Computers are being or can be used in each of these activities.
14.7.20 At present, a standard sampling design is used – two or three-stage stratified sampling using a circular systematic method of selection with probability of inclusion proportional to size. The sample size is determined on the basis of availability of field investigators. There is a great scope for imaginative use of computers in improving the sampling design.
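A minimal sketch of circular systematic selection with probability proportional to size, in its usual textbook form, is given below. It assumes integer size measures, takes the sampling interval as the integer nearest to the total size divided by the sample size, and omits stratification and the later stages; the function name and the `start` argument are introduced only for this example.

```python
import bisect
import itertools
import random


def circular_systematic_pps(sizes, n, start=None):
    """Select n units by circular systematic sampling with PPS.

    `sizes` holds non-negative integer size measures, at least one of
    them positive. Returns the 0-based indices of the selected units in
    order of selection. A unit larger than the sampling interval may be
    selected more than once.
    """
    cumulative = list(itertools.accumulate(sizes))
    total = cumulative[-1]
    interval = max(1, round(total / n))
    if start is None:
        start = random.randint(1, total)  # random start in 1..total
    selected = []
    for j in range(n):
        # Move round the circle: wrap the selection point into 1..total.
        point = (start + j * interval - 1) % total + 1
        # Unit i covers the range (cumulative[i-1], cumulative[i]].
        selected.append(bisect.bisect_left(cumulative, point))
    return selected


sizes = [10, 20, 30, 40]  # hypothetical size measures of four units
print(circular_systematic_pps(sizes, 4, start=5))   # fixed start for illustration
print(circular_systematic_pps(sizes, 4))            # random start
```

With a fixed `start` the selection is reproducible, which is convenient when a sample list has to be regenerated or audited.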
14.7.21 A large number of Statistical Software Packages are now available for sophisticated analytical work. Though some of these are available to them, official statisticians in India seldom use them. As a matter of fact, routine statistical work involves very little technical computation.
14.7.22 A sound statistical system should ensure speedy transmission of information at different levels: field to the data preparation centres, data preparation centres to the main data processing centre, main data processing centre to the data warehouse and finally to users. At present only a few of the above offices are electronically connected through a communication network, either through the NICNET or through e-mail provided by various Internet Service Providers. Though networking of all important statistical offices is necessary, it need not be a dedicated network at present.
14.7.23 A number of standard classifications like the National Industrial Classification (NIC), National Classification of Occupations (NCO), Indian Trade Classification based on Harmonised Commodity Description and Coding System {ITC(HS)}, Standard Classification of Diseases, etc. are presently in use. There is a need to develop a computerised system to facilitate searching the appropriate code of any classificatory variable easily with the help of some key words about it. The system needs to be made available on the Internet through website. Such an arrangement would help the potential users of Classification.
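The kind of key-word search envisaged above could work along the following lines. This is only a sketch under simple assumptions: the classification is held as a mapping from code to description, and a description matches if it contains any of the query words. The codes and descriptions shown are made up for the example and are not actual NIC or NCO entries.

```python
def search_codes(classification, query):
    """Return (code, description) pairs whose description contains
    at least one word of the query, best matches first."""
    words = query.lower().split()
    hits = []
    for code, description in classification.items():
        text = description.lower()
        score = sum(1 for word in words if word in text)
        if score:
            hits.append((score, code, description))
    # More matching words first; ties are broken by code.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [(code, description) for _, code, description in hits]


# Made-up entries for illustration; not taken from any real classification.
sample_classification = {
    "A01": "Growing of cereal crops",
    "A02": "Growing of vegetables",
    "B10": "Manufacture of cotton textiles",
}

for code, description in search_codes(sample_classification, "growing cereal"):
    print(code, description)
```

A production system would need more than substring matching, for instance handling of synonyms and spelling variants, but the ranking by number of matching words already lets a user work from a partial description.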
Deficiencies.
14.7.24 Though the NSSO has been conducting several rounds of surveys year after year, there has been no specific attempt to build up a specialist group for survey data processing – with deep knowledge not only of computer software and hardware but also of the subject matter of the surveys. On the contrary, the group that had worked hard to acquire the required skill and knowledge and was able to remove the long-standing backlog of unfinished tabulation of NSS data, was unceremoniously split up and transferred to areas where the skills acquired by them may not be of much use to the new organisation.
14.7.25 The system of processing State-level data is very weak. Inadequate hardware and software and a shortage of trained manpower are problems in many of the States.
14.7.26 Acquiring expensive equipment and embarking upon a large-scale experiment without adequate examination of the pros and cons, as in the case of palmtop computers, is counterproductive and wasteful.
14.7.27 The department often has to arrange consultations with specialists working in different places. This is done by arranging a meeting at a common place, which costs quite a lot in travel expenses and wasted time.
14.7.28 In the past, the Government constituted ad hoc technical committees to recommend changes in the IT set up of its department. Delays in implementation usually meant that the recommendations were outdated by the time they were implemented.
14.7.29 According to the National Policy on Data Dissemination, the survey results and unit-level data should be made available to data users in India and abroad after the expiry of three years from the completion of the fieldwork or after reports based on survey data are released, whichever is earlier. Naturally, the unit-level data disseminated should be authentic in the sense that if any re-tabulation is done using these data, the new results would agree exactly with the corresponding published table. The Commission carried out an exercise to examine this. Table I of the ASI 1995-96 Report was re-tabulated from the publicly available unit-level data. Large unexplainable discrepancies were found between the two. This raises a serious question: whether the unit-level data supplied or the tables published in the report were wrong. This jeopardises the credibility of the statistical system. It brings out the importance of extreme care in as commonplace a task as the preparation of statistical tables.
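A re-tabulation check of this kind can be sketched as follows. The sketch assumes that each unit-level record carries a multiplier (weight) and that a published cell is the weighted total of a variable within a group; the field names and figures are hypothetical and have no connection with the ASI exercise described above.

```python
from collections import defaultdict


def retabulate(unit_records, group_key, value_key, weight_key):
    """Weighted total of `value_key` for each value of `group_key`."""
    totals = defaultdict(float)
    for rec in unit_records:
        totals[rec[group_key]] += rec[weight_key] * rec[value_key]
    return dict(totals)


def discrepancies(recomputed, published, rel_tol=1e-6):
    """Cells where the recomputed and published figures disagree.

    Returns {group: (published_value, recomputed_value)}. A cell that is
    missing on either side is reported with None in its place.
    """
    diffs = {}
    for group in sorted(set(recomputed) | set(published)):
        new = recomputed.get(group)
        old = published.get(group)
        if new is None or old is None:
            diffs[group] = (old, new)
        elif abs(new - old) > rel_tol * max(abs(old), 1.0):
            diffs[group] = (old, new)
    return diffs


# Hypothetical unit-level records and published cells.
units = [
    {"state": "S1", "output": 100.0, "weight": 2.0},
    {"state": "S1", "output": 50.0, "weight": 1.0},
    {"state": "S2", "output": 10.0, "weight": 3.0},
]
published = {"S1": 250.0, "S2": 35.0}

print(discrepancies(retabulate(units, "state", "output", "weight"), published))
```

Running such a comparison before release, for every published table, would show whether the disseminated unit-level data and the report agree.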
14.7.30 Understandably, the dissemination policy is likely to put the statistical system under considerable strain but in the larger interest of the user community, the Commission would urge the Government to continue it.
Recommendations.
14.7.31 The Commission makes the following recommendations:
The Government must develop and nurture expertise and skills in various areas of specialisation, statistical software being one of the most important amongst them. Training and transfer policies must be framed accordingly. Transferring specialist officials to positions in which their specialised knowledge is of no use is a waste. A software group consisting of officers systematically trained in IT tools should be set up in the National Statistical Office, to meet all software requirements of the NSO. When in-house expertise and resources are not available, data processing or software development projects could be given to agencies of proven competence.
The area of application of computers should be widened to cover statistical modelling, forecasting, simulation, and other sophisticated “applicable” theoretical methods.
It is essential to establish strong communication links between:
The National Statistical Organisation (NSO) and all its subordinate offices,
The NSO and all Central Ministries with substantial statistical output,
The NSO and all State Directorates of Economics and Statistics (DESs),
The NSSO and its SDRD, DPD, FOD and CPD,
Headquarters of FOD, DPD and their respective subordinate offices,
State DESs and Statistics Divisions of the Departments.
These offices should be networked through one or more Internet Service Providers, and/or one or more Virtual Private Networks. A dedicated computer network is neither necessary nor desirable and would not at all be cost-effective.
Urgent steps must be taken to strengthen computer hardware and software systems in the State DESs.
To cut down travel expenses and waste of time, it would be more economical and convenient to go in for video conferencing facilities, which are comparatively inexpensive when held between a pair of participants.
Before investing in expensive sophisticated equipment, a feasibility study including cost-benefit analysis must be carried out. When the equipment is to be used by primary workers under field conditions, as in the case of palmtop or laptop computers, practical difficulties of maintenance, repair, local availability of consumables and the procedural problems of handing over expensive Government property to primary workers should be carefully examined.
Specifically in respect of palmtop and laptop computers, the Commission is of the view that these are not needed for collection of data in large-scale sample surveys at present except when information content is small. The large number of palmtop computers already purchased, if they are still serviceable, should be used in surveys with small information content - price data collection, for example.
However, methodological studies on Computer Aided Interviews, as a collaborative venture of survey practitioners, software specialists, subject-matter specialists and psychologists, are recommended. The first attempt should be to reduce the questionnaire to a reasonable size, which can be honestly answered in less than one hour. The development of appropriate software is of second priority. It should be emphasised that software for laptop computers can very well be developed on PCs and no investment in laptop computers would be necessary for this methodological study.
For mobile applications, a few laptops should be available in each large statistical office.
In the ASI or in the envisaged Survey of Non-Manufacturing Industries, attempts should be made to collect information on electronic media from enterprises, which use computers for accounting purposes.
The existing practice of publishing survey results in the form of multiple cross-classified tables with, in many cases, a large number of empty cells should be stopped. Only readable reports, with simple tables and their interpretation, should be published. For experts and professionals, the results and unit-level data should be made available in an electronic medium like a Compact or Floppy Disk. Survey results and other important statistical information should be put on the website of the NSO.
There should be regular computer training programmes for statistical personnel at all levels.
The Commission has noted with serious concern that there are occasions when the unit-level data as well as summary tables computed from them, both disseminated under the National Policy on Data Dissemination, do not match. In order to establish its credibility, the Government should investigate the reasons for the discrepancies and assign institutional responsibility for the failure. A case of immediate concern is the data and results of ASI 1995-96.
A Standing Technical Committee on IT should be set up in the proposed NCS, to lay down policies and review their implementation.
A website of all classifications, concordance tables along with online database query system should be developed for public use. This system should help the user in identifying a code on the basis of part description or key words.
14.7.32 A recommendation relating to conversion of the Computer Centre as the Data Storage and Dissemination Office is given earlier in paragraph 14.5.17.