What is data classification and grading
The term is easiest to understand when split into three parts.
Data: any record of information, in electronic or non-electronic form.
Data classification: the process of sorting an organization's data into categories according to its attributes or characteristics, following defined principles and methods, and establishing a classification system and ordering so that the data can be better managed and used.
Data grading: the process of assigning sensitivity levels to the classified data according to defined grading principles, in order to support the formulation of security policies for opening up and sharing the organization's data.
The value and significance of data classification and grading
Classifying and grading data identifies the specific value of each kind of data to the organization and determines the appropriate strategy for protecting its confidentiality, integrity, and availability.
For example, companies commonly divide data into four levels: top secret, confidential, secret, and public. Any data above the public level is sensitive data. Because the levels differ in value, the organization needs to invest different amounts of extra effort and apply level-specific strategies in managing them, so that unauthorized access to sensitive information does not cause it significant losses.
For example: top-secret data must be encrypted with AES-256, and access and use must be approved by the data security management team; confidential data must be encrypted with AES-256, and access and use require CTO approval; secret data must be encrypted with AES-256, and access and use require approval by the department head; public data may be stored in plaintext, and access and use require approval by the requester's direct manager.
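The example above can be written down as a small lookup table. This is only a sketch: the encryption and approver values come from the example, while the names `HandlingPolicy`, `POLICIES`, and `policy_for` and the level keys are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingPolicy:
    encryption: str  # storage requirement for this level
    approver: str    # who must approve access and use


# Values mirror the example in the text; the keys are illustrative names.
POLICIES = {
    "top_secret": HandlingPolicy("AES-256", "data security management team"),
    "confidential": HandlingPolicy("AES-256", "CTO"),
    "secret": HandlingPolicy("AES-256", "department head"),
    "public": HandlingPolicy("plaintext", "direct manager"),
}


def policy_for(level: str) -> HandlingPolicy:
    """Return the handling policy for a level, failing loudly on unknown levels."""
    try:
        return POLICIES[level]
    except KeyError:
        raise ValueError(f"unknown data level: {level!r}") from None
```

Failing on an unknown level, rather than falling back to a default, keeps unlabeled data from silently receiving the weakest protection.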
When it comes to driving this work forward, I personally subscribe to the saying “30% technology, 70% management; control the details, management first”. Standardized policies and processes are among the key tools for putting that management thinking into practice.
1. Develop a data classification and grading management policy
Write data classification and grading into the organization's management policies so that it becomes a standard, and make the following clear:
1) Purpose and scope of the policy
2) Organizations and responsibilities involved in data classification and grading
3) Principles of data classification and grading
4) Overview of the organization's data classification
The organization's data is divided into three categories:
User data, business data, and company data
5) Overview of the organization's data grading
The organization's data is divided into five levels:
Top secret (G1): This is extremely sensitive information. If it is damaged or leaked, the organization may face serious financial or legal risks. Examples include financial information and system or personal authentication information.
Confidential (G2): This is highly sensitive information. If it is damaged or leaked, the organization may face financial or legal risks. Examples include credit card information, personally identifiable information (PII), personal health information (PHI), and trade secrets.
Secret (G3): Data whose damage or leakage may have a negative impact on operations, such as contracts with partners and suppliers, employee reviews, etc.
Internal disclosure (G4): Information that is not disclosed publicly, such as sales manuals, organizational charts, employee information, etc.
External disclosure (G5): Data that can be freely disclosed, such as marketing materials, contact information, price lists, etc.
6) Principles for using and protecting organizational data at each level
7) The management process for granting access to and extracting organizational data at each level
Data at different levels has different access permissions and different approval processes for extraction.
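As a rough sketch, the five levels above can be modeled as an ordered enumeration so that code can compare sensitivity. The names `Grade` and `strictest`, and the rule that a table takes the strictest grade of its fields, are assumptions for illustration rather than part of the policy described here.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Sensitivity levels; a smaller number means more sensitive."""
    G1 = 1  # top secret
    G2 = 2  # confidential
    G3 = 3  # secret
    G4 = 4  # internal disclosure
    G5 = 5  # external disclosure


def strictest(grades):
    """Return the most sensitive grade in a non-empty collection.

    Assumed convention: a table or database is treated as being as
    sensitive as its most sensitive field.
    """
    return min(grades)


# Example: a table holding a G1 field is handled as G1 overall.
table_grade = strictest([Grade.G3, Grade.G1, Grade.G5])
```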
2. Develop a data asset classification and grading inventory
Here is one approach to classification and grading. At the top level, data is divided into three categories: user data, business data, and company data. Each of these first-level categories can be further subdivided into second-level and third-level categories. A grade reflecting the data's value is then defined at the most detailed level, and the results are consolidated into an organization-wide classification and grading inventory that guides the organization's overall data governance and data classification and grading work.
1) Data classification
a) User data classification
User data refers to individuals' personal information. Requirements and definitions for this type of data are fairly clear around the world, so it can be classified by reference to the relevant standards.
The definition of personally identifiable information in NIST SP 800-122:
Personally identifiable information (PII) is “any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.” Examples of PII include but are not limited to:
Name, such as full name, maiden name, mother’s maiden name or alias;
Personal identification number, such as social security number (SSN), passport number, driver’s license number, taxpayer identification number or financial account or credit card number;
Address information, such as street address or email address;
Personal characteristics, including photographic images (especially facial or other identifying characteristics), fingerprints, handwriting or other biometric data (for example, retinal scans, voice signatures, facial geometry);
Information about an individual that is linked or linkable to one of the above (for example, date of birth, place of birth, race, religion, weight, activities, geographic indicators, employment information, medical information, education information, financial information).
The definition of personal data in the GDPR:
“Personal data” means any information relating to an identified or identifiable natural person (“data subject”); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.
b) Business data classification
Business data is closely tied to the organization's line of business. For example, for Taobao and Jingdong it is mostly order, logistics, and product detail data; for iQIYI and Youku it is mostly video data. It also includes some general data, such as market data and business analysis data. To understand it, talk to the business PO; the categories follow from the characteristics of the business, so they are not detailed here.
c) Classification of company data
Company data mainly includes second-level categories such as personnel data, financial data, legal data, procurement data, log data, code data, and system data. These second-level categories fall into two types: general data, such as logs and system data, and customized data, such as personnel and finance data.
The breakdown of each second-level category is not covered in detail here. As an example of developing a customized data category, take personnel data.
Ask the product manager or the R&D team of the personnel system for the system's data information table. From it you can see clearly which data the personnel system uses and derive third-level categories such as company, employee information, department, and position.
A sample follows, for reference only:
|Data classification||Data grade|
|First-level classification||Second-level classification||Third-level classification||G1||G2||G3||G4||G5|
|Company data||Personnel data||Employee authentication data: account password, identity verification token||√|
|Employee personal private data: ID card number, mobile phone number, bank card number||√|
|Employee personal non-private data: date of joining, rank||√|
|Employee family data: relationship to the employee, gender||√|
|Employee education data: school name, degree, graduation type||√|
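The three-level hierarchy in the sample can also be kept as a nested structure that tooling can walk. The following is a minimal sketch covering only the personnel rows from the sample; `TAXONOMY` and `leaf_paths` are hypothetical names, and grades are deliberately left out.

```python
# First level -> second level -> list of third-level categories.
TAXONOMY = {
    "Company data": {
        "Personnel data": [
            "Employee authentication data",
            "Employee personal private data",
            "Employee personal non-private data",
            "Employee family data",
            "Employee education data",
        ],
    },
}


def leaf_paths(taxonomy):
    """Yield (first, second, third) tuples for every third-level category."""
    for first, seconds in taxonomy.items():
        for second, thirds in seconds.items():
            for third in thirds:
                yield (first, second, third)


# Each leaf path is the unit that later receives a grade.
paths = list(leaf_paths(TAXONOMY))
```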
2) Data grading
Data grading is also a process of qualitative analysis of the data. When assigning a level to each type of data, consider the following questions:
What compliance risks would leaking or destroying the data create?
What economic risks would leaking or destroying the data create for the organization?
What software and hardware costs would leaking or destroying the data incur?
What damage would leaking or destroying the data do to the organization's brand and public reputation?
Define identity verification data, the organization's financial statements, etc. as G1
Define sensitive personal information as G2
Define the organizational structure, general personal information, etc. as G3
Define organizational mailboxes as G4
Define the organization's externally published information as G5
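These grading examples can be captured as a simple rule table. This is a sketch only: the category strings paraphrase the examples above, and `GRADING_RULES` and `grade_for` are hypothetical names. Anything the rules do not cover returns `None` so that it can be sent for manual grading.

```python
# (grade, categories) pairs taken from the examples in the text.
GRADING_RULES = [
    ("G1", {"identity verification", "financial statements"}),
    ("G2", {"sensitive personal information"}),
    ("G3", {"organizational structure", "general personal information"}),
    ("G4", {"organizational mailboxes"}),
    ("G5", {"externally published information"}),
]


def grade_for(category):
    """Return the grade for a known category, or None if it needs manual review."""
    for grade, categories in GRADING_RULES:
        if category in categories:
            return grade
    return None
```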
3. Develop data usage specifications
a) Data extraction
- Scope: internal or external use
- Volume: how much data is extracted
- Level: the sensitivity level of the data
The data extraction process is refined and formalized on the basis of these three aspects.
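One way to make the three aspects concrete is to record them on every extraction request and derive the risk factors that the process must handle. Everything in this sketch is an assumption for illustration: the `ExtractionRequest` fields, the `BULK_ROW_THRESHOLD` value, and the choice to treat G1 and G2 as high sensitivity are not taken from any actual policy.

```python
from dataclasses import dataclass

# Hypothetical threshold; a real policy would define its own volume bands.
BULK_ROW_THRESHOLD = 10_000


@dataclass(frozen=True)
class ExtractionRequest:
    external: bool  # scope: True if the data leaves the organization
    row_count: int  # volume: number of records requested
    grade: int      # level: 1 (G1, most sensitive) to 5 (G5)


def risk_factors(req):
    """List which of the three aspects raise the risk of this request."""
    factors = []
    if req.external:
        factors.append("external use")
    if req.row_count >= BULK_ROW_THRESHOLD:
        factors.append("bulk volume")
    if req.grade <= 2:  # assumption: G1 and G2 count as high sensitivity
        factors.append("high sensitivity")
    return factors
```

A request with no risk factors could follow the lightest process, and each additional factor could add a review step.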
b) Granting access
Define different approval procedures according to the sensitivity level of databases, tables, and fields, and grant access on a least-privilege basis: ideally per field, normally per table, and only in special cases per database. Examples follow, for reference only:
- G1 level data: access is granted only after approval by the department head, data attribution team, data security team, internal audit, legal affairs, and data security governance team
- G2 level data: access is granted only after approval by the department head, data attribution team, data security team, internal audit, and legal affairs
- G3 level data: access is granted after approval by the department head, data attribution team, and data security team, with internal control and legal affairs CC'd
- G4 level data: access is granted after approval by the department head and the data attribution team
- G5 level data: access is granted after approval by the department head
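The approval chains in this example map naturally onto a lookup from level to required approvers. The roles below are copied from the list; the names `APPROVERS`, `CC`, `pending_approvers`, and `can_open` are hypothetical, and a real workflow system would also track the order and timing of approvals.

```python
# Required approvers per level, in the order given in the example.
APPROVERS = {
    "G1": ["department head", "data attribution team", "data security team",
           "internal audit", "legal affairs", "data security governance team"],
    "G2": ["department head", "data attribution team", "data security team",
           "internal audit", "legal affairs"],
    "G3": ["department head", "data attribution team", "data security team"],
    "G4": ["department head", "data attribution team"],
    "G5": ["department head"],
}

# Parties that are notified but do not approve.
CC = {"G3": ["internal control", "legal affairs"]}


def pending_approvers(grade, approved):
    """Return the approvers for this grade who have not yet signed off."""
    return [role for role in APPROVERS[grade] if role not in approved]


def can_open(grade, approved):
    """Access may be granted only when no approver is pending."""
    return not pending_approvers(grade, approved)
```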
4. Rolling out data classification and grading
1) Policy release
Data classification and grading is one of the core parts of data governance. The policy should be approved by at least the company's technical committee, data governance team, legal affairs, internal control, and the CTO, and then communicated through multiple channels such as internal platforms, email, and security awareness campaigns.
Purpose: to establish a top-down execution model from the moment the policy is launched, so that execution is led from the company's strategic level.
2) Policy implementation
Implementing data governance strategies requires identifying precisely where data enters and leaves.
The company's core production data lives in MySQL databases on the production network and is delivered to Hive, which is run by the big data center, through data synchronization and other means.
Given this situation, data classification and grading is implemented at the Hive layer, with labeling done semi-automatically.
Build a data map and a permission request and approval system on top of Hive, backed by a MySQL database. The data map is a panoramic view of the data: it can be queried for information about databases, tables, and fields, and it labels data semi-automatically. Common fields are tagged automatically and saved, while uncommon fields are tagged manually and saved. Once the system matures, classification, grading, and labeling can be fully automated. Data users request access based on the database, table, field, and sensitivity level information shown on the data map, and requests for data at different levels go through different approval procedures.
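The semi-automatic labeling step could look roughly like the sketch below: field names that match a known pattern are tagged automatically, and everything else is queued for manual tagging. The patterns, the grades attached to them, and the names `AUTO_RULES` and `tag_fields` are all assumptions for illustration; a real data map would read column metadata from Hive and maintain a much larger rule set.

```python
import re

# Hypothetical rules for common fields: (name pattern, grade).
AUTO_RULES = [
    (re.compile(r"passw(or)?d|token", re.IGNORECASE), "G1"),
    (re.compile(r"id_card|mobile|phone|bank_card", re.IGNORECASE), "G2"),
]


def tag_fields(field_names):
    """Split fields into auto-tagged ones and ones needing manual tagging."""
    tagged, manual = {}, []
    for name in field_names:
        for pattern, grade in AUTO_RULES:
            if pattern.search(name):
                tagged[name] = grade
                break
        else:
            # No rule matched: leave the field for a human to label.
            manual.append(name)
    return tagged, manual


tagged, manual = tag_fields(["user_password", "mobile_no", "dept_remark"])
```

As manually assigned labels accumulate, recurring field names can be promoted into the rule list, which is one route toward the fully automatic labeling mentioned above.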
In addition, the list of sensitive information levels maintained in Hive can in turn drive the management of production MySQL data, such as encrypting or desensitizing sensitive information at rest and desensitizing it when it is used.
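As an illustration of desensitization on use, a common approach is to mask the middle of a value and keep only a few leading and trailing characters. The helper below is a minimal sketch; `mask_middle` and its default widths are assumptions, not the method used in the system described here.

```python
def mask_middle(value, keep_start=3, keep_end=4, mask_char="*"):
    """Mask the middle of a string, keeping a few characters at each end.

    Values too short to keep both ends are masked entirely.
    """
    if len(value) <= keep_start + keep_end:
        return mask_char * len(value)
    hidden = len(value) - keep_start - keep_end
    # Slice from an explicit index so that keep_end=0 keeps no tail.
    tail = value[len(value) - keep_end:]
    return value[:keep_start] + mask_char * hidden + tail


masked_phone = mask_middle("13812345678")  # "138****5678"
```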
5. Verification and evaluation
1) Manual verification and evaluation
Regularly review, by manual inspection, whether data labels are correct and how sensitive data is stored and used.
2) Automated verification and evaluation
Based on the data classification and grading inventory and the list of sensitive information levels maintained in Hive, define sensitive information discovery rules, proactively scan data at rest and data in motion, and automatically detect and alert on data that is not protected as the policy requires. (This will be covered in detail in the later chapter on data identification.)
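A discovery rule can be as simple as a regular expression run over sampled records, with an alert for every value that still appears in the clear. The sketch below assumes two illustrative rules (an 11-digit mainland China mobile number and an email address) and hypothetical names `DISCOVERY_RULES` and `scan_records`; production rules would need validation and tuning to control false positives.

```python
import re

# Illustrative patterns only.
DISCOVERY_RULES = {
    "mobile_number": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def scan_records(records):
    """Return (row_index, column, rule_name) for every unprotected match."""
    alerts = []
    for row_index, row in enumerate(records):
        for column, value in row.items():
            if not isinstance(value, str):
                continue
            for rule_name, pattern in DISCOVERY_RULES.items():
                if pattern.search(value):
                    alerts.append((row_index, column, rule_name))
    return alerts


sample = [
    {"note": "call 13812345678", "name": "a"},
    {"note": "138****5678"},  # already masked, so no alert
]
alerts = scan_records(sample)
```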
Summary
Data classification and grading is one of the core tasks of an organization's data governance and data security work. It is a means of tying information security to the value of the data and ensuring that the data is effectively protected.
Data security practitioners should ask themselves the following questions: How much data does the organization have, and how is it distributed? What counts as sensitive data, and where is it? Who has access to the organization's data? Have appropriate measures for continuous protection, monitoring, and alerting been taken? These questions deserve careful thought and concrete answers, and the data classification and grading discussed here addresses the second of them.