Cem Rifki Aydin, Computer Engineering, Bogazici University, M.Sc. Thesis, 2014
AN APPROACH FOR DICTIONARY-BASED CONCEPT MINING IN TURKISH

Files on the CD:
	Thesis_TDK_Concepts_FrFLSc_FBE: This is the Java project that extracts concepts from the corpus documents. It contains two Java source files:
		"Thesis_TDK_Concepts_FrFLSc_FBE.java", which is the main class, and "Tree.java", which helps extract the concepts by building a
		hierarchical structure of document words from the TDK (Türk Dil Kurumu) dictionary.
	The Java program takes four folder names as command-line arguments. These folders are included in the project folder and are as follows:

	TDK_Disamb_Upd: This folder contains the Turkish dictionary words with their definition texts. The words are in a disambiguated form, that is, their POS
		tags are also indicated. The files in this folder are .txt files.
	CorporaNoun: This folder has four sub-folders containing the nouns that occur in the corpora named Forensic, Forensic News, Sport News and Gazi, from
		which concepts are to be extracted. The files are in .txt format.
	Concepts: This is the output folder to which the concepts of the corpus documents are written. The output files are in .txt format.
	tempTokenized: This folder contains all files of the corpora stated above in a specific format. In this format, nouns appear as they are,
		whereas words with other POS tags, such as adverbs and adjectives, are represented by a character sequence (xx). These files are also in .txt format.
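
	The tokenized format described above can be illustrated with a minimal sketch. It is not taken from the thesis code: the class name TokenizedLineSketch
		and the method extractNouns are hypothetical, and the sketch assumes that tokens are separated by whitespace and that the placeholder is the literal
		string "xx". The actual files may differ in both respects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the thesis project. */
final class TokenizedLineSketch {

    // Assumption: non-noun words are replaced by this literal placeholder.
    static final String NON_NOUN_MARKER = "xx";

    /** Returns the tokens of a line that are not the non-noun placeholder. */
    static List<String> extractNouns(String line) {
        List<String> nouns = new ArrayList<>();
        // Assumption: tokens are separated by one or more whitespace characters.
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty() && !token.equals(NON_NOUN_MARKER)) {
                nouns.add(token);
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Prints [mahkeme, karar]
        System.out.println(extractNouns("xx mahkeme xx karar xx"));
    }
}
```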

Hardware Requirements:
	Any RAM capacity above 1 GB suffices to compile and run the project. CPU clock rate does not matter. The project involves no graphics, so
		a high-performance graphics card is not needed. A mouse makes running the project easier, but an experienced user can navigate the
		widgets and menus without a mouse or touchpad. At least 512 MB of free disk space is needed to store the project.

Software Requirements:
	This thesis work was run on Windows 7, but it can also be run on Windows XP, Windows Vista, Windows 8, Linux and Mac OS X. A JDK and a JRE are
		needed to run the project: jre6 or newer JRE versions, and jdk1.7 or newer JDK versions. The project was written in Eclipse (any version
		works); nonetheless, it can also be run from the command prompt or terminal. The commands to be entered at the command prompt are as follows:

		C:\Users\ASUS\workspace\Thesis_TDK_Concepts_FrFLSc_FBE\src> javac -encoding UTF-8 Thesis_TDK_Concept_FrFLSc_FBE.java
		C:\Users\ASUS\workspace\Thesis_TDK_Concepts_FrFLSc_FBE\src> java -Dfile.encoding=UTF8 Thesis_TDK_Concept_FrFLSc_FBE TDK_Disamb_Upd\ CorporaNoun Concepts tempTokenized
		
		One has to change into the src folder of the project first, which can be done with the cd command. The text C:\Users\ASUS\workspace\Thesis_TDK_Concepts_FrFLSc_FBE\src>
			shown above is not a command; it is the prompt, which shows the location of the Java files. The command "javac -encoding UTF-8 Thesis_TDK_Concept_FrFLSc_FBE.java" simply compiles the
			file "Thesis_TDK_Concept_FrFLSc_FBE.java", whereas the other command, "java -Dfile.encoding=UTF8 ...", runs the compiled program. The encoding options must
			be specified because the corpora to be processed contain Turkish characters. In order to compile and run source files with the "javac" and "java" commands,
			the JDK bin folder needs to be added to Path. This can be done in Windows by the following steps:
				1. Right click Computer and select Properties,
				2. Select the Advanced tab (on Windows 7, click Advanced system settings first),
				3. Select Environment Variables,
				4. An entry such as ";C:\Program Files\Java\jdk1.7.0_09\bin" should be appended to the end of the PATH variable string in System Variables,
				5. If a command prompt is open, it has to be closed and reopened before the javac and java commands can be run.
		All required libraries are included in JDK 1.7 or newer versions, so no external library needs to be downloaded or referenced.
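
		As an illustration of relying only on JDK classes, the following sketch checks that four existing folders were passed on the command line and
			reads a text file with an explicit UTF-8 charset, so that Turkish characters are preserved even if the platform default encoding differs.
			It is a hedged example rather than the thesis implementation: the class FolderArgsSketch and its methods validate and readUtf8Lines are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the thesis project. */
final class FolderArgsSketch {

    /** Returns an error message, or null if exactly four existing folders were given. */
    static String validate(String[] args) {
        if (args.length != 4) {
            return "Expected 4 folder names, got " + args.length;
        }
        for (String name : args) {
            if (!new File(name).isDirectory()) {
                return "Not a folder: " + name;
            }
        }
        return null;
    }

    /** Reads all lines of a file as UTF-8, independent of the platform default encoding. */
    static List<String> readUtf8Lines(Path file) throws IOException {
        // Files.readAllLines(Path, Charset) is available from JDK 1.7 onwards.
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String error = validate(args);
        if (error != null) {
            System.err.println(error);
            System.exit(1);
        }
        System.out.println("All four folders found.");
    }
}
```

		Passing the charset explicitly in the code is an alternative to the -Dfile.encoding option; whether the thesis code itself does so is not stated here.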