-----------------------------------------------------------------
FILES ON THE CD
-----------------------------------------------------------------

1- 406552.pdf   : Master's Thesis
2- 406552.rar
	2.1. Source Code     : Program
		2.1.1. datasets (INPUT FILES)  : Includes 5 datasets [Classic3, LA1, Wap, Reuters and Hitech]
		2.1.2. outputs (OUTPUT FILES)  : SVM uses these files during the run
		2.1.3. supervised_results      : The micro- and macro-averaged F-measure results

		2.1.4. Tez_406552 : Project (main)
		2.1.5. Tez_files  : Code files (libraries + classes)
		2.1.6. SVM_light  : SVM code
			svm_arzu_v2 : for global policy
			svm_arzu    : for local policy

	2.2. Example Input   : For Hitech dataset
	2.3. Example Outputs : For Hitech dataset
		Example-1
		* For Combination in global policy |3| tf-idf & CHI (g)
		  "tez_406552.exe hitech TFIDF 100 tfidf CHI Method_3"
		  output
		  micro-f: 0.621359  macro-f: 0.532103
		Example-2
		* For Combination in local policy  |S| CHI & ACC (l)
		  "tez_406552.exe hitech ACC_cl 1000 tfidf CHI_cl Score"
		  output
		  micro-f: 0.658879  macro-f: 0.589394
		Example-3
		* For Individual Term weighting Acc2 (g)
		  "tez_406552.exe hitech ACC 2000 tfidf NO Individual"
		  output
		  micro-f: 0.66103  macro-f: 0.603025




-----------------------------------------------------------------
Hardware Requirements
-----------------------------------------------------------------
2 GB RAM
1 GB disk space

-----------------------------------------------------------------
Software Requirements
-----------------------------------------------------------------

C++ Compiler - Visual C++ 2008

-----------------------------------------------------------------
RUN THE CODE
-----------------------------------------------------------------


INPUT FILES after preprocessing:
For each dataset:

	..\allwords\terms.txt : List of all TERMs.
				 Contains 3 fields:
				 1- ID of the TERM,
				 2- idf score of the TERM and
				 3- the TERM itself.

	..\allwords\topics.txt : List of all TOPICs.
				 Contains 2 fields:
				 1- ID of the TOPIC and
				 2- the TOPIC itself.

	..\allwords\train-docIDs.txt : IDs of the training DOCUMENTs

	..\allwords\train-topic-matrix.txt : TOPICs of the training DOCUMENTs
					    * Its dimensions are number of DOCUMENTs x number of TOPICs.
					    * An entry is "1" if the DOCUMENT has the TOPIC, otherwise "0".

	..\allwords\train-data-matrix.txt : Data MATRIX
					    * tfidf scores of the TERMs in the DOCUMENTs.
					    * Its dimensions are number of DOCUMENTs x number of TERMs.
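
As an illustration only, terms.txt could be read as sketched below. This is not the thesis code: TermEntry and readTerms are hypothetical names, and the layout (one TERM per line, fields separated by whitespace in the order ID, idf, TERM) is an assumption, because only the three fields themselves are documented above. The sketch stays within C++03 so that it should also build with Visual C++ 2008.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One row of terms.txt: ID of the TERM, its idf score, and the TERM itself.
// (Hypothetical type, introduced for this sketch.)
struct TermEntry {
    int id;
    double idf;
    std::string term;
};

// Reads "ID idf TERM" rows from a stream.
// ASSUMPTION: one TERM per line, fields separated by whitespace.
// Lines that do not match this layout (including empty lines) are skipped.
std::vector<TermEntry> readTerms(std::istream& in) {
    std::vector<TermEntry> terms;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        TermEntry entry;
        if (fields >> entry.id >> entry.idf >> entry.term) {
            terms.push_back(entry);
        }
    }
    return terms;
}
```

To read the real file, pass a std::ifstream opened on terms.txt to readTerms. If the actual files use another delimiter, only the extraction line inside the loop needs to change.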

---------------------------------------------------------------
In order to RUN the program:
---------------------------------------------------------------
	- In the command prompt, change to the directory "..\Tez_406552\Debug"
	- Run the code with the parameters "tez_406552.exe P1 P2 P3 P4 P5 P6"

	P1:	Name of the dataset.
		[wap, classic3, hitech, reut or la1]

	P2:	First feature selection method.
		For Global Policy: [TFIDF, CHI, IG, DF or ACC].
		For Local Policy : [TFIDF_cl, CHI_cl, IG_cl, DF_cl or ACC_cl].

	P3:	Number of keywords.
		[10, 30, 50, 100, 200, 500, 1000, 1500 or 2000]

	P4:	Term weighting method.
		[tfidf]

	P5:	Second feature selection method.
		For Individual   : [NO].
		For Global Policy: [TFIDF, CHI, IG, DF or ACC2].
		For Local Policy : [TFIDF_cl, CHI_cl, IG_cl, DF_cl or ACC2_cl].

	P6:	Combination method.
		[Individual, Score, Rank, Method_1, Method_2, Method_3, Method_4, Method_5, Method_6 or Method_7]

	- Example RUNs
	  * For Combination in global policy |3| tf-idf & CHI (g)
		"tez_406552.exe hitech TFIDF 100 tfidf CHI Method_3"
	  * For Combination in local policy  |S| CHI & ACC (l)
		"tez_406552.exe hitech ACC_cl 1000 tfidf CHI_cl Score"
	  * For Individual Term weighting Acc2 (g)
		"tez_406552.exe hitech ACC 2000 tfidf NO Individual"

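A wrapper or test script could check P1..P6 against the value lists above before starting a run. The following is a minimal sketch of such a check, not part of tez_406552.exe. The names isOneOf and validateParams are hypothetical. Two points are assumptions rather than documented behaviour: values are matched case-sensitively, and P5 = NO is accepted only together with P6 = Individual (and vice versa).

```cpp
#include <cstddef>
#include <string>

// Returns true if `value` equals one of the entries of the array `allowed`.
template <std::size_t N>
bool isOneOf(const std::string& value, const char* const (&allowed)[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        if (value == allowed[i]) return true;
    }
    return false;
}

// Returns an empty string if P1..P6 are acceptable, otherwise a short
// message naming the first parameter that failed the check.
std::string validateParams(const std::string& p1, const std::string& p2,
                           const std::string& p3, const std::string& p4,
                           const std::string& p5, const std::string& p6) {
    // Value lists copied from the parameter description in this README.
    static const char* const datasets[] =
        {"wap", "classic3", "hitech", "reut", "la1"};
    static const char* const firstMethods[] =
        {"TFIDF", "CHI", "IG", "DF", "ACC",
         "TFIDF_cl", "CHI_cl", "IG_cl", "DF_cl", "ACC_cl"};
    static const char* const keywordCounts[] =
        {"10", "30", "50", "100", "200", "500", "1000", "1500", "2000"};
    static const char* const secondMethods[] =
        {"NO", "TFIDF", "CHI", "IG", "DF", "ACC2",
         "TFIDF_cl", "CHI_cl", "IG_cl", "DF_cl", "ACC2_cl"};
    static const char* const combinations[] =
        {"Individual", "Score", "Rank", "Method_1", "Method_2", "Method_3",
         "Method_4", "Method_5", "Method_6", "Method_7"};

    if (!isOneOf(p1, datasets))      return "P1: unknown dataset";
    if (!isOneOf(p2, firstMethods))  return "P2: unknown first feature selection method";
    if (!isOneOf(p3, keywordCounts)) return "P3: unsupported number of keywords";
    if (p4 != "tfidf")               return "P4: unknown term weighting method";
    if (!isOneOf(p5, secondMethods)) return "P5: unknown second feature selection method";
    if (!isOneOf(p6, combinations))  return "P6: unknown combination method";

    // ASSUMPTION: "NO" and "Individual" are only meaningful together.
    if ((p5 == "NO") != (p6 == "Individual")) {
        return "P5/P6: NO and Individual must be used together";
    }
    return "";
}
```

All three example runs above pass this check, while a call such as validateParams("hitech", "TFIDF", "75", "tfidf", "CHI", "Rank") returns the P3 message.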
	

In the code:

1-      When you CHANGE the dataset:
	* Edit tez_2709\feature_vector.h, svm_arzu\svm_arzu.main.cpp and
	  svm_arzu_v2\svm_arzu.main.cpp to match the dataset you chose.

2- 	Use svm_arzu_v2 for global policy and
	use svm_arzu for local policy.

	svm_arzu_v2 and svm_arzu: Classify the documents.


3 -	You do NOT need to re-preprocess the datasets. If you want to, uncomment the line for the related dataset by removing its leading "//".
	To preprocess:
	- Use preprocessor.cpp
	- Change the code in boss.cpp according to your choice.

	For wap and classic3 datasets: preprocessor.cpp,
	For la1 and hitech datasets  : preprocessor_v2.cpp,
	For the reuters dataset      : preprocessor_reut.cpp


4 -	CHANGE the code according to your choice:

	For global policy : tfidf_aggr_corpus.cpp,
	For local policy  : tfidf_aggr_class.cpp.

	tfidf_aggr_corpus.cpp and tfidf_aggr_class.cpp: Select features according
							to your choice and reduce dimensionality.
	
	
5 -	data_transform_after_svm.cpp : For each TOPIC, if the document is assigned
				       a positive score by SVM, the document is assigned to that TOPIC.
	
6 -     calculate_f_results.cpp      : Calculate micro- and macro-averaged F-measures

NOTES:
Do not forget:
* Put these folders in the C: directory (or another directory): datasets, outputs, supervised_results, svm_arzu_v2, svm_arzu, Tez_406552 and Tez_files.
* If you do not use the "C:" directory, then change the "c:" path in all code files to your directory.
* Change the dataset features in the code according to your choice.



